Multimodal Video Understanding
Jae Lee is the cofounder and CEO of Twelve Labs, where the company is building video understanding infrastructure to help developers build programs that can see, hear, and understand the world. He was previously the Lead Data Scientist at the Ministry of National Defense in South Korea. He holds a bachelor's degree in computer science from UC Berkeley.
In this episode, we cover a range of topics including:
- What is multimodal video understanding
- State of play in multimodal video
- The founding of Twelve Labs
- The launch of Pegasus-1
- Four core principles: Efficient Long-form Video Processing, Multimodal Understanding, Video-native Embeddings, Deep Alignment between Video and Language Embeddings
- Differences between multimodal and traditional video analysis
- In what ways can malicious actors misuse this technology?
- The future of multimodal video understanding
Jae's favorite books:
- Deep Learning (Authors: Ian Goodfellow, Yoshua Bengio, Aaron Courville)
- The Giving Tree (Author: Shel Silverstein)
--------
Where to find Prateek Joshi:
Newsletter: https://prateekjoshi.substack.com
Website: https://prateekj.com
LinkedIn: https://www.linkedin.com/in/prateek-joshi-91047b19
Twitter: https://twitter.com/prateekvjoshi
Infinite Machine Learning: Artificial Intelligence | Startups | Technology