A Beginner’s Guide to Multimodal AI

Dev Jadhav
Published in MLwithDev
13 min read · 6 days ago

A Primer on Multimodal LLMs

When OpenAI’s Sora made its grand entrance in February 2024, generating lifelike videos with ease, it left many observers amazed. Sora, a leading example of a multimodal LLM (MM-LLM), uses text to guide video generation, a line of research that has been gaining momentum for several years. Over the past year, MM-LLMs have made exceptional progress, ushering in a new age of AI that can understand and create content across multiple modalities. These MM-LLMs mark a big step beyond traditional text-only LLMs, pulling in information from text, images, and audio…
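Before diving deeper, it helps to see what “multiple modalities” looks like in practice. The sketch below, assuming the OpenAI Python SDK (v1.x) and a vision-capable model such as gpt-4o, sends a single user message that combines text and an image; the model name and image URL are placeholders, not recommendations.

```python
# Minimal multimodal request: one user message carrying both text and an image.
# Assumes OPENAI_API_KEY is set in the environment; the image URL is a placeholder.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",  # any vision-capable chat model follows the same pattern
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what is happening in this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/street-scene.jpg"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```

The key idea is that the `content` field is a list of typed parts rather than a plain string, which is how a single prompt can mix modalities.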

