Though roughly 80% of the world's data is in video format, generative AI has primarily focused on text and images because of the complexity of video analysis, which requires processing visual, textual, and audio information simultaneously. Video analysis is complex not only because it is multimodal: the need to recognize objects, emotions, and context, and to effectively search, engage, and communicate with video data, presents further challenges.
Enter Twelve Labs, a startup building multimodal foundation models for video understanding. The overarching problem that Twelve Labs solves is video-language alignment. Twelve Labs specializes in creating machine learning systems that produce powerful video embeddings aligned with human language. This means their models can interpret and describe video content using text. This technology gives customers the ability to search for specific moments in a vast video archive, either by providing text descriptions or by interacting with Twelve Labs’ models using text prompts. It also enables the generation of various types of content, such as summaries, chapterizations, and highlights. Ultimately, Twelve Labs is revolutionizing the way we search for and comprehend videos, addressing current limitations in AI. Their technology has versatile applications, including ad insertion, content moderation, media analysis, and highlight reel creation, making them a significant player in the field of video data interaction.
Twelve Labs initially caught our attention when a team of four young AI engineers won the 2021 ICCV VALUE Challenge, outperforming AI teams from tech giants such as Tencent, Baidu, and Kakao. We were extremely impressed by the rapid progress of the model and the company’s growth since the challenge. In a short period of time, Twelve Labs has become a leader in the field, featured in the NVIDIA GTC 2023 Keynote and attracting talent like Minjoon Seo, a professor at the Korea Advanced Institute of Science & Technology (KAIST), who now serves as Chief Scientist. The expertise that Minjoon brings as a distinguished NLP research scientist, coupled with that of CTO Aiden Lee, an expert in computer vision (CV) AI, further validates Twelve Labs’ ability to create powerful large multimodal models for video understanding.
Twelve Labs is not only providing a cutting-edge video understanding solution but also a developer platform that is set to launch APIs for video moment retrieval, classification, and video-to-text generation to address downstream tasks. Essentially, Twelve Labs is bringing a new video interface that makes video just as easy to work with as text, giving enterprises and developers programmatic access to all of the semantic information that resides in their video data. This developer-friendly approach has already attracted 20,000 developers to the platform during the beta phase. Further, they recently announced that their Pegasus-1 model already outperforms existing models in video summarization benchmarks, demonstrating a significant improvement in video understanding.