
A robot hand taught itself to solve a Rubik’s Cube after creating its own training regime

Researchers at OpenAI have developed a new method for transferring complex manipulation skills from simulated to physical environments.

Oct 15, 2019

Over a year ago, OpenAI, the San Francisco–based for-profit AI research lab, announced that it had trained a robotic hand to manipulate a cube with remarkable dexterity.

That might not sound earth-shattering. But in the AI world, it was impressive for two reasons. First, the hand had taught itself how to fidget with the cube using a reinforcement-learning algorithm, a technique modeled on the way animals learn. Second, all the training had been done in simulation, but it managed to successfully translate to the real world. In both ways, it was an important step toward more agile robots for industrial and consumer applications.

“I was kind of amazed,” says Leslie Kaelbling, a roboticist and professor at MIT, of the 2018 results. “It’s not a thing I would have imagined that they could have made to work.”

In a new paper today, OpenAI has released the latest results with its robotic hand, Dactyl. This time Dactyl has learned to solve a Rubik’s cube with one hand—once again through reinforcement learning in simulation. This is notable not so much because a robot cracked the old puzzle as because the achievement demanded a new level of dexterity.

“This is a really hard problem,” says Dmitry Berenson, a roboticist at the University of Michigan who specializes in machine manipulation. “The kind of manipulation required to rotate the Rubik’s cube’s parts is actually much harder than to rotate a cube.”

During testing, Dactyl successfully solved the Rubik's cube even under unexpected circumstances.
OpenAI

From the virtual to physical world

Traditionally, robots have only been able to manipulate objects in very simple ways. While reinforcement-learning algorithms have seen great success in achieving complex tasks in software, such as beating the best human player in the ancient game of Go, using them to train a physical machine has been a different story. That’s because the algorithms must refine themselves through trial and error—in many cases, millions of rounds of it. It would probably take much too long, and a lot of wear and tear, for a physical robot to do this in the real world. It could even be dangerous if the robot thrashed about wildly to collect data.
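The trial-and-error loop at the heart of reinforcement learning can be illustrated with a toy example far simpler than robotics. This is an illustrative sketch, not OpenAI's system: an epsilon-greedy agent learns which of two actions pays off better purely by sampling noisy rewards, which is why so many rounds are needed.

```python
import random

def train_bandit(true_rewards, episodes=5000, epsilon=0.1, seed=0):
    """Learn reward estimates for each action by trial and error."""
    rng = random.Random(seed)
    estimates = [0.0] * len(true_rewards)
    counts = [0] * len(true_rewards)
    for _ in range(episodes):
        # Explore occasionally; otherwise exploit the best-known action.
        if rng.random() < epsilon:
            a = rng.randrange(len(true_rewards))
        else:
            a = max(range(len(estimates)), key=lambda i: estimates[i])
        # The environment returns a noisy reward for the chosen action.
        r = true_rewards[a] + rng.gauss(0.0, 0.1)
        counts[a] += 1
        estimates[a] += (r - estimates[a]) / counts[a]  # incremental mean
    return estimates

est = train_bandit([0.2, 0.8])
print(est)  # the estimate for action 1 should approach 0.8
```

Even for this two-action problem, thousands of episodes are needed to converge; a physical robot with dozens of joints would need vastly more, which is why training happens in simulation.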

To avoid this, roboticists use simulation: they build a virtual model of their robot and train it virtually to do the task at hand. The algorithm learns in the safety of the digital space and can be ported into a physical robot afterwards. But that process comes with its own challenges. It’s nearly impossible to build a virtual model that exactly replicates all the same laws of physics, material properties, and manipulation behaviors seen in the real world—let alone unexpected circumstances. Thus, the more complex the robot and task, the more difficult it is to apply a virtually trained algorithm in physical reality.

This is what impressed Kaelbling about OpenAI’s results a year ago. The key to its success was that the lab scrambled the simulated conditions in every round of training to make the algorithm more adaptable to different possibilities.

“They messed their simulator up in all kinds of crazy ways,” Kaelbling says. “Not only did they change how much gravity there is—they changed which way gravity points. So by trying to construct a strategy that worked reliably with all of these crazy permutations of the simulation, the algorithm actually ended up working in the real robot.”

In the latest paper, OpenAI takes this technique one step further. Previously, the researchers had to randomize the parameters in the environment by hand-picking which permutations they thought would lead to a better algorithm. Now the training system does this by itself. Each time the robot reaches a certain level of mastery in the existing environment, the simulator tweaks its own parameters to make the training conditions even harder.
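The self-adjusting curriculum the paper describes (OpenAI calls the technique automatic domain randomization) can be sketched roughly as follows. The class, parameter names, step size, and threshold here are illustrative assumptions, not the paper's actual implementation: the idea is simply to widen the ranges the simulator samples its physics parameters from each time the policy masters the current band.

```python
import random

class AutoDomainRandomizer:
    """Widen simulation parameter ranges as the policy improves."""
    def __init__(self, base, step=0.05):
        self.base = base                        # nominal values, e.g. gravity
        self.spread = {k: 0.0 for k in base}    # current randomization band
        self.step = step

    def sample_env(self, rng):
        # Draw each parameter uniformly from its current band around nominal.
        return {k: rng.uniform(v - self.spread[k] * abs(v),
                               v + self.spread[k] * abs(v))
                for k, v in self.base.items()}

    def update(self, success_rate, threshold=0.9):
        # Curriculum step: make conditions harder once the policy masters them.
        if success_rate >= threshold:
            for k in self.spread:
                self.spread[k] += self.step

rng = random.Random(0)
adr = AutoDomainRandomizer({"gravity": 9.8, "cube_mass": 0.09})
adr.update(success_rate=0.95)   # policy passed the threshold, so widen bands
env = adr.sample_env(rng)       # next training environment is slightly harder
```

The manual version Kaelbling describes corresponds to a human picking `spread` values by hand; the new system in effect runs `update` automatically inside the training loop.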

The result is an even more robust algorithm that can move at the precision required to rotate a Rubik’s cube in real life. Through testing, the researchers found that Dactyl also successfully solved the cube under various conditions that it hadn’t been trained on. For example, it was able to complete the task while wearing a rubber glove, while having a few fingers bound together, and while being prodded by a stuffed toy giraffe.

General-purpose robots

OpenAI believes the latest results provide strong evidence that its approach will unlock more general-purpose robots that can adapt in open-ended environments such as a home kitchen. “A Rubik’s cube is one of the most complicated rigid objects out there,” says Marcin Andrychowicz of OpenAI. “I think other objects won’t be much more complicated.”

Though there are more complex tasks that involve more objects or deformable objects, he says, he feels confident that the lab’s method can train robots for all of them: “I think this approach is the approach to widespread adoption of robotics.”

Both Berenson and Kaelbling, however, remain skeptical. “There can be an impression that there’s one unified theory or system, and now OpenAI’s just applying it to this task and that task,” Berenson says of the previous and current paper. “But that’s not what’s happening at all. These are isolated tasks. There are common components, but there’s also a huge amount of engineering here to make each new task work.”

“That’s why I feel a little bit uncomfortable with the claims about this leading to general-purpose robots,” he says. “I see this as a very specific system meant for a specific application.”

Part of the problem, Berenson believes, is reinforcement learning itself. By nature, the technique is designed to master one particular thing, with some flexibility for handling variations. But in the real world, the number of potential variations extends beyond what can reasonably be simulated. In a cleaning task, for example, you could have different kinds of mops, different kinds of spills, and different kinds of floors.

Reinforcement learning is also designed for learning new capabilities largely from scratch. That is neither efficient in robotics nor true to how humans learn. “If you’re already a reasonably competent human and I tried to teach you a motor skill in the kitchen—like maybe you’ve never whipped something with a spoon—it’s not like you have to learn your whole motor control over again,” says Kaelbling.

Moving beyond these limitations, Berenson argues, will require other, more traditional robotics techniques. “There will be some learning processes—probably reinforcement learning—at the end of the day,” he says. “But I think that those actually should come much later.”


The US Securities and Exchange Commission (SEC) has just halted Telegram's massive—and massively hyped—$2 billion digital token sale.

Those halcyon days: In early 2018, exuberant investors poured billions of dollars into Telegram’s ambitious plan to launch a global cryptocurrency network. In return, they got rights to digital tokens that Telegram, a messaging app with 300 million monthly users, promised would be useful on its future network. That was slated to launch by October 31 of this year.

The news: That initial exuberance has turned to uncertainty. According to the SEC, Telegram Group and a subsidiary company called TON Issuer Inc. conducted an illegal sale of unregistered securities in the US. The defendants sold 2.9 billion digital tokens, called “Grams,” to 171 investors around the world, raising $1.7 billion. Since $425 million worth was sold in the US, the SEC, which is supposed to look out for US investors, was paying attention.

Since the planned network hasn’t yet launched, and the tokens can’t be used for anything yet, they are subject to the same kinds of strict regulations that govern stocks and bonds, says the SEC. That means they should have been registered with the agency, and Telegram should have provided investors with information about its business operations, financial condition, management, and risk factors. “We have repeatedly stated that issuers cannot avoid the federal securities laws just by labeling their product a cryptocurrency or a digital token,” Steven Peikin, co-director of the SEC’s division of enforcement, said in a statement.

What now? According to Bloomberg, the company is “evaluating ways to resolve the agency’s concerns,” and has in fact been in talks with the SEC for 18 months. The company said it was “surprised and disappointed” by the lawsuit, and now it may delay the launch of its network.

The takeaway: The SEC is not done cleaning up the mess created by the initial coin offering boom of 2017 and early 2018. Its lawsuit against Telegram comes just two weeks after it settled with Block.One, which had raised $4 billion via a token sale to finance the development of the EOS blockchain network.

It’s not clear why Block.One, which had to pay a $24 million penalty, didn’t get as harsh a penalty as Telegram (though it could be that Block.One was more cooperative with the SEC). But the bottom line is what Peikin said: calling something a cryptocurrency doesn’t exempt it from existing laws.



Astronomy

Astronomers have spotted a toddler dwarf galaxy 9.4 billion light-years away by turning a cluster of other galaxies into a magnifying glass for x-rays, according to a new paper published in Nature.

What is it: The dwarf galaxy is one 10,000th the size of the Milky Way itself, yet it is brimming with activity. It is currently going through a phase of intense new star formation, resulting in high-energy x-rays pulsing through the region. This is the first time scientists have ever been able to watch this sort of galaxy life stage with x-ray observations. 

How did they do it? Massive galaxy clusters bend and magnify the light of objects behind them, much as a glass of water bends a beam of light passing through it. This is called gravitational lensing, and scientists can use the phenomenon to study electromagnetic signals from distant parts of the universe and pinpoint where they originate.

Gravitational lensing has never before been used to study x-ray emissions, but the principles apply just the same as they would for light. The team used NASA’s Chandra X-ray Observatory to study the Phoenix cluster, 5.7 billion light-years away—a structure that’s a quadrillion times more massive than the sun. It’s a perfect natural lens to magnify x-ray emissions as they pass through. 

After subtracting for x-rays coming from the Phoenix cluster itself, the team found “lensed” emissions magnified 60 times over, coming from a dwarf galaxy 9.4 billion light-years away, born when the universe was a third its current age.

So what? What exactly happened in the first 5 billion years of the universe is still pretty murky. The authors of the study say the new detection shows it’s possible to use natural x-ray magnifiers to identify things that were born soon after the Big Bang. Tools like Chandra could now be used to comb over other aspects of the ancient universe and solve cosmological questions in far greater detail.


The news: In the third quarter of this year 40% of the UK’s electricity came from renewables like wind, biomass, and solar, while fossil fuels—virtually all gas in the UK’s case—made up 39%.

Why? The milestone is largely due to a few new offshore wind farms that came online from July to September this year. The UK’s wind farm industry is booming as turbines become bigger and more efficient, making projects easier to justify commercially.

But … It’s worth noting that 12% of electricity in the quarter came from biomass, which may count as renewable but isn't necessarily carbon free. “In some circumstances [biomass] could lead to higher emissions than from fossil fuels,” Carbon Brief noted, adding that the Committee on Climate Change has urged the UK to shift away from biomass power plants.

The global context: The country has made rapid progress, considering that fossil fuels made up four-fifths of its electricity just a decade ago. However, others are further ahead still, and it’s worth remembering the UK accounts for a tiny proportion of global carbon emissions anyway. Germany passed the same milestone as the UK last year, while Sweden met it seven years ago. The power grids in Iceland, Norway, and Costa Rica run almost entirely on renewable energy already.

However: In the grand scheme of things, three countries matter most: China, the US, and India. Between them, they account for about half of all global carbon emissions. China’s share of renewables in its power-generation mix is almost 27%. In the US it’s just 18%.

Update: This piece was updated to note that biomass produces climate emissions.


Ever wondered whether people were happier in the past? We now have a much better idea, thanks to a new technique that involves analyzing the sentiment behind the words used in millions of pieces of writing.

The study: A team of researchers, led by Thomas Hills at the University of Warwick, analyzed 8 million books and 65 million newspaper articles published between 1820 and 2009. They assigned happiness scores to thousands of words in different languages and then calculated the relative proportion of positive and negative language for the four different countries.

These scores were used to create historical happiness indices for the UK, the US, Germany, and Italy. The researchers took into account the fact that certain words change their meaning over time (gay, for example). The collection was drawn from Google Books, and it represents a digitized record of more than 6% of all books ever published. The method was validated by comparing the findings with survey data on well-being from the 1970s, collected through about 1,000 face-to-face interviews each year in every European Union country (the Eurobarometer). Their study is published in Nature Human Behaviour today.
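The core scoring step, averaging per-word happiness ratings over a text, can be sketched in a few lines. The valence values below are invented for illustration and are not the researchers' actual scores.

```python
# Hypothetical word-valence table on a 1-9 happiness scale (illustrative only).
VALENCE = {"peace": 7.8, "joy": 8.2, "war": 2.1, "strike": 3.0, "happy": 8.5}

def happiness_index(text):
    """Mean valence of the scored words in a text, or None if none match."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    scored = [VALENCE[w] for w in words if w in VALENCE]
    return sum(scored) / len(scored) if scored else None

# Only "joy" and "peace" appear in the valence table here.
print(happiness_index("Joy and peace returned."))
```

Tracking this average across texts grouped by year and country yields a historical index, which is roughly what the study does at the scale of millions of books and articles.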

The findings: There is an awful lot of data to work through, and the findings probably won’t be hugely surprising to anyone with a decent grasp of world history (both world wars made people generally very unhappy, for example). The low point of happiness in the US was around the time of the Fall of Saigon in 1975, while for the UK it was the Winter of Discontent of 1978-79, when there were widespread public sector strikes.  

The big picture: New ways to measure well-being and happiness could help to inform national policies. The UK’s statistics authority, for example, has been measuring levels of national well-being since 2010. And this year New Zealand included national well-being as an official metric in its economic planning. Using written material to provide historical context could prove invaluable to the fledgling discipline. 


