Why is everyone in AI talking about world models?
Some AI luminaries are positioning them as the next big phase for AI progress.
For years, the startup Runway has built its reputation as a purveyor of AI in Hollywood, signing on studios and filmmakers to use its video models. But the company recently opened up a new line of business aimed at a wider clientele, including robotics companies and video game makers.
Runway’s new family of world models is designed to combine the photorealistic imagery it offers moviemakers with physics prompts that will generate fully simulated real-world environments.
“Our perspective is that world models are really the most important problem that we need to solve in order to further advance the field,” Runway CTO and co-founder Anastasis Germanidis told us. “The next stage will be about building systems that can interact with the physical world and understand the physical world. And text alone cannot get us there.”
After two years in the works, the project is arriving at a buzzy time for the concept of world models. Some AI luminaries argue that the scaling laws that have allowed AI labs to squeeze better performance from ever-bigger models won’t hold much longer. World models—billed in some cases as a way to better orient foundation models in real-life physical environments—have been floated as a next phase for AI progress.
Early machine learning pioneer Yann LeCun is reportedly preparing to leave his longtime post as Meta’s chief scientist to found a startup focused on world models. World Labs, founded by computer vision pioneer Fei-Fei Li, recently launched its first commercial world model, Marble. Google, Nvidia, and Meta have built their own, too.
“I’ve been not making friends in various corners of Silicon Valley, including at Meta, saying that within three to five years, this [world models, not LLMs] will be the dominant model for AI architectures, and nobody in their right mind would use LLMs of the type that we have today,” LeCun said at a recent MIT symposium, according to The Wall Street Journal.
What in the world?
But what exactly is a world model? Like “agents” before it, the term is definitionally vague, not actually new, and in danger of becoming freighted with hype, experts told us. At their most general, world models are representations of the physical world that capture the relationships between objects and can predict how those objects will behave over time.
“A lot of people look at world models as something that can understand how the world changes. And you can interpret that sentence in many different ways,” said Ranjay Krishna, an assistant professor at the University of Washington and researcher at the Allen Institute for AI. “One simple version of this interpretation is to say that if I was to take an action, like push something, what would happen? Maybe something might fall down, maybe it might collide with something else. Being able to predict how things would change, how the future states of the world are going to look…that’s one interpretation.”
Keep up with the innovative tech transforming business
Tech Brew keeps business leaders up-to-date on the latest innovations, automation advances, policy shifts, and more, so they can make informed decisions about tech.
Krishna said other interpretations might include a more viewpoint-oriented conception of a fixed world or an understanding of social actors within a world.
World models can also refer to a system of understanding within foundation models that might help them fill in cognitive gaps, Eric Landau, CEO and co-founder of AI data platform Encord, told us. Encord offers what it claims is the world’s largest open-source multimodal dataset, which can help train world models.
“[LLMs] take in statistical patterns, and they output other statistical patterns without having a deeper understanding of what is driving those statistical patterns,” Landau said. “It’s fundamentally missing some of the components that make human thinking the core of what it is. We can reason from first principles. We have a deeper understanding of what happens around us.”
For Runway, Germanidis said building a world model was a natural continuation of work the company was already doing on its video models (which are sort of “a poor man’s world model,” in Krishna’s words). Runway’s new family of world models includes first-person navigation through a generated world, a simulation specific to robotics, and one involving conversational avatar characters.
“Data collection in robotics is super slow,” Germanidis said. “[The new use case is] essentially providing a simulated environment where you can run your robotics model.”
Roadblocks ahead
While the concept of world models is not new, the term may be gaining traction now because of advances in image and video generation models, which can serve as a base for a world model, Krishna said. But major challenges lie ahead in scaling up world models, he added. One is the amount of available data.
“A lot of [existing] video data is not directly usable,” Krishna said. “You need to figure out how the world is changing, how the camera is moving around in that world, and then be able to encode that into your model somehow. And that’s not very easy.”
Another potential hurdle is the massive amount of compute needed for video models, which dwarfs even the demands of LLMs, according to Krishna. And plotting out a world is a much more structured process than stringing together a sentence, he said.
“If this continues, we likely will get to a point where things are going to really start speeding up…With the amount of investment and the amount of people interested in this space, there is a good chance that we’ll figure out some of those big challenges,” Krishna said. “I think that’s why people are excited. It’s likely that we’re going to see more startups. But just like with any other technology, I’m sure that we will see that same sort of plateau. But the question is, ‘How long until we see that plateau?’”