RT-2, Google’s New Breakthrough To Build Wall-E
Achieving Embodied Intelligence
Since the release of ChatGPT in November 2022, it seems the whole world revolves around AI.
But don’t worry, this isn’t yet another ‘ChatGPT is amazing’ article; today we’re talking about something far more revolutionary.
Google DeepMind’s brand-new robot, RT-2.
But even though robotic arms aren’t “new”, RT-2 is unparalleled in what it can do. In fact, to create RT-2, Google had to build a new class of AI models, something completely unseen until now.
Embodied intelligence is here.
And, maybe, so is Wall-E.
A New Model Class
What society has achieved with AI in the last six months is absolutely incredible.
In simple terms, we’ve democratized the very first machines that communicate with humans the same way humans do, via natural language.
But the potential of AI far exceeds simple text, and many researchers have set their eyes on an even bigger prize.
On a path to building multimodality
In a recent podcast, AI luminary Andrew Ng commented that Computer Vision, the AI field that trains models to process and understand images, was around “two years behind text-prompting”, though he expected it to spark a revolution on par with those models.
And what he was referring to with “text-prompting” was none other than Large Language Models, or LLMs, of which ChatGPT is the greatest example.
But everybody knows that, if we want to build machines with human-grade capabilities, we need much more than that, because we humans are multimodal.
In layman’s terms, we don’t build our understanding of our world based only on text. We have eyes, we have ears, we have touch… all our senses help us build a representation of what the world is.
In fact, these senses teach us about the world when we’re young, years before we learn to read our very first sentence.
Consequently, it’s only natural for AI researchers to want to build models that not only…