Robots That Can Adapt on the Job


EMERYVILLE, California — Companies like OpenAI and Midjourney build chatbots, image generators and other artificial intelligence tools that operate in the digital world. Now, a start-up founded by three former OpenAI researchers is using the same technology development methods behind chatbots to build A.I. technology that can navigate the physical world.

Covariant, headquartered in Emeryville, California, is creating ways for robots to pick up, move and sort items as they are shuttled through warehouses and distribution centers. Its goal is to help robots gain an understanding of what is going on around them and decide what they should do next. The technology also gives robots a broad understanding of the English language, letting people chat with them as if they were chatting with ChatGPT.

The technology, still under development, is not perfect. But it is a sign that the A.I. systems that drive online chatbots and image generators will also power machines in warehouses, on roadways and in homes.

Covariant, backed by $222 million in funding, does not build robots. It builds the software that powers robots.

The A.I. systems that drive chatbots and image generators are called neural networks, named for the web of neurons in the brain. By pinpointing patterns in vast amounts of data, these systems can learn to recognize words, sounds and images — or even generate them on their own.

Companies are now building systems that can learn from different kinds of data at the same time. By analyzing both a collection of photos and the captions that describe those photos, for example, a system can grasp the relationships between the two. It can learn that the word “banana” describes a curved yellow fruit.

OpenAI used this kind of system to build Sora, its new video generator. By analyzing thousands of captioned videos, the system learned to generate videos when given a description of a scene, like “a gorgeously rendered papercraft world of a coral reef, rife with colorful fish and sea creatures.”

Covariant, founded by Pieter Abbeel, a professor at the University of California, Berkeley, and three of his former students, Peter Chen, Rocky Duan and Tianhao Zhang, used similar techniques in building a system that drives robots. The company helps operate sorting robots in warehouses across the globe. It has spent years gathering data — from cameras and sensors — that shows how these robots operate.

“It ingests all kinds of data that matter to robots — that can help them understand the physical world and interact with it,” Dr. Chen said.

By combining that data with the huge amounts of text used to train chatbots, the technology gives a robot the power to handle unexpected situations. The robot knows how to pick up a banana, even if it has never seen a banana before. If you tell it to “pick up a banana,” it knows what that means. If you tell it to “pick up a yellow fruit,” it understands that, too.

The technology, called R.F.M., for robotics foundational model, makes mistakes, much like chatbots do. As companies train this kind of system on increasingly large collections of data, researchers believe it will rapidly improve.

In the past, engineers typically programmed robots to perform the same precise motion again and again, such as picking up a box of a certain size or attaching a rivet in a particular spot on a car. Those robots could not deal with random situations.

But by learning from hundreds of thousands of examples of what happens in the physical world, robots can begin to handle the unexpected.

“What is in the digital data can transfer into the real world,” Dr. Chen said.

