Researchers are training AI to take our advice
In bid to keep the unexpected at bay, lab makes algorithms that accept human guidance
SAN FRANCISCO— At OpenAI, the artificial intelligence lab founded by Tesla’s chief executive, Elon Musk, machines are teaching themselves to behave like humans. But sometimes, this goes wrong.
Sitting inside OpenAI’s San Francisco offices on a recent afternoon, researcher Dario Amodei showed off an autonomous system that taught itself to play Coast Runners, an old boat-racing video game. The winner is the boat with the most points that also crosses the finish line.
The result was surprising: The boat was far too interested in the little green widgets that popped up on the screen. Catching these widgets meant scoring points. Rather than trying to finish the race, the boat went point-crazy. It drove in endless circles, colliding with other vessels, skidding into stone walls and repeatedly catching fire.
Amodei’s burning boat demonstrated the risks of the AI techniques that are rapidly remaking the tech world. Researchers are building machines that can learn tasks largely on their own. This is how Google’s DeepMind lab created a system that could beat the world’s best player at the ancient game of Go. But as these machines train themselves through hours of data analysis, they may find their way to unexpected, unwanted and perhaps harmful behaviour.
That’s a concern as such techniques move into online services, security devices and robotics. Now, a small community of AI researchers, including Amodei, is starting to explore mathematical techniques that aim to keep the worst from happening.
At OpenAI, Amodei and his colleague Paul Christiano are developing algorithms that can not only learn tasks through hours of trial and error, but also receive guidance from human teachers along the way.
With a few clicks here and there, the researchers have a way of showing the autonomous system that it needs to win points in Coast Runners while also moving toward the finish line. They believe that these kinds of algorithms — a blend of human and machine instruction — can help keep automated systems safe.
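The blend the researchers describe can be illustrated with a minimal, hypothetical sketch — this is not OpenAI’s actual algorithm, and the click-based feedback signal and `shaped_reward` function are invented for illustration: the game’s own score is combined with occasional human feedback marking real progress, so an agent chasing points alone is steered back toward finishing the race.

```python
# Hypothetical sketch of human-guided reward shaping, not OpenAI's actual
# algorithm: blend the game's own score with a human teacher's occasional
# feedback (a "click" marking genuine progress toward the finish line).

def shaped_reward(game_points, human_click, click_bonus=2.0):
    """Combine the environment's reward with a human feedback bonus.

    game_points: points the game awarded this step (e.g. for widgets)
    human_click: True if the human marked this step as real progress
    """
    bonus = click_bonus if human_click else 0.0
    return game_points + bonus

# A stretch where the boat circles for widgets (points, but no clicks)
# now scores less than one where it heads for the finish line (clicks).
circling = sum(shaped_reward(p, c) for p, c in [(1, False), (1, False), (1, False)])
finishing = sum(shaped_reward(p, c) for p, c in [(0, True), (1, True), (0, True)])
print(circling < finishing)
```

Under this toy scheme, the point-crazy strategy is no longer the highest-scoring one, which is the intuition behind letting a human teacher reshape what the machine strives for.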
For years, Musk, along with other pundits, philosophers and technologists, has warned that machines could spin out of our control and learn malicious behaviour their designers didn’t anticipate. At times, these warnings have seemed overblown, given that today’s autonomous car systems can be tripped up by even the most basic tasks, such as recognizing a bike lane or a red light.
But researchers such as Amodei are trying to get ahead of the risks. In some ways, what these scientists are doing is a bit like a parent teaching a child right from wrong.
Many specialists in the AI field believe a technique called reinforcement learning — a way for machines to learn specific tasks through extreme trial and error — could be a primary path to artificial intelligence. Researchers specify a particular reward the machine should strive for, and as it navigates a task at random, the machine keeps close track of what brings the reward and what doesn’t. When OpenAI trained its bot to play Coast Runners, the reward was more points.
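The trial-and-error loop can be seen in a toy example — a hypothetical illustration using tabular Q-learning on a five-cell track, not the system OpenAI used: the agent wanders at random, tracks which moves eventually brought the reward, and gradually learns to head for the goal.

```python
# Hypothetical toy reinforcement-learning example (tabular Q-learning),
# invented for illustration: an agent on a 5-cell track learns by trial
# and error that moving right toward the goal is what earns the reward.
import random

N_STATES = 5          # cells 0..4; reaching cell 4 ends the episode
ACTIONS = [-1, 1]     # move left or right
EPSILON = 0.1         # how often to explore at random
ALPHA, GAMMA = 0.5, 0.9  # learning rate and discount factor

q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    nxt = max(0, min(N_STATES - 1, state + action))
    reward = 1.0 if nxt == N_STATES - 1 else 0.0  # reward only at the goal
    return nxt, reward, nxt == N_STATES - 1

random.seed(0)
for _ in range(200):                      # episodes of trial and error
    state, done = 0, False
    while not done:
        if random.random() < EPSILON:     # sometimes try something random...
            action = random.choice(ACTIONS)
        else:                             # ...otherwise repeat what worked
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        nxt, reward, done = step(state, action)
        best_next = max(q[(nxt, a)] for a in ACTIONS)
        q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
        state = nxt

# After training, the learned policy prefers moving right in every cell.
print(all(q[(s, 1)] > q[(s, -1)] for s in range(N_STATES - 1)))
```

Nothing here tells the agent which way the goal lies; the preference for moving right emerges purely from which actions led to the reward — which is also why a poorly chosen reward, as in the Coast Runners boat, produces behaviour nobody intended.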
This video-game training has real-world implications. If a machine can learn to navigate a racing game such as Gran Turismo, researchers believe, it can learn to drive a real car. This is why Amodei and Christiano are working to build reinforcement learning algorithms that accept human guidance along the way. This can ensure systems don’t stray from the task at hand.
Much of this work is still theoretical. But given the rapid progress of AI techniques and their growing importance across so many industries, researchers believe starting early is the best policy.
“There’s a lot of uncertainty around exactly how rapid progress in AI is going to be,” said Shane Legg, who oversees the AI safety work at DeepMind.