An AI learned to play hide-and-seek. The strategies it came up with on its own were astounding.

A new release from OpenAI shows how complex behavior emerges.

Cartoon-like figures peek around a corner at one another.
Kelsey Piper
Kelsey Piper is a contributing editor at Future Perfect, Vox’s effective altruism-inspired section on the world’s biggest challenges. She explores wide-ranging topics like climate change, artificial intelligence, vaccine development, and factory farms, and also writes the Future Perfect newsletter.

This week, leading AI lab OpenAI released their latest project: an AI that can play hide-and-seek. It’s the latest example of how, with current machine learning techniques, a very simple setup can produce shockingly sophisticated results.

The AI agents play a very simple version of the game, where the “seekers” get points whenever the “hiders” are in their field of view. The “hiders” get a little time at the start to set up a hiding place and get points when they’ve successfully hidden themselves; both sides can move objects around the playing field (like blocks, walls, and ramps) for an advantage.
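To make the scoring rule concrete, here is a minimal sketch of how a per-step reward along these lines could be computed. It is not OpenAI's code. The vision cone with no walls, the 90-degree field of view, the +1/-1 values, and the names `in_field_of_view` and `step_rewards` are all assumptions made for illustration.

```python
import math

def in_field_of_view(seeker_xy, facing_deg, hider_xy, fov_deg=90.0):
    """True if the hider lies inside the seeker's vision cone.

    Assumption: no walls or boxes block the view; only the angle matters.
    """
    dx = hider_xy[0] - seeker_xy[0]
    dy = hider_xy[1] - seeker_xy[1]
    if dx == 0 and dy == 0:
        return True  # same spot: treat as seen
    angle_to_hider = math.degrees(math.atan2(dy, dx))
    # Smallest signed difference between the two angles, in [-180, 180).
    diff = (angle_to_hider - facing_deg + 180.0) % 360.0 - 180.0
    return abs(diff) <= fov_deg / 2.0

def step_rewards(seekers, hiders, in_prep_phase):
    """Return (hider_reward, seeker_reward) for one time step.

    seekers: list of ((x, y), facing_deg); hiders: list of (x, y).
    Assumption: nobody scores while the hiders are still setting up.
    """
    if in_prep_phase:
        return 0.0, 0.0
    any_seen = any(
        in_field_of_view(pos, facing, hider)
        for pos, facing in seekers
        for hider in hiders
    )
    # Hypothetical scoring: seekers gain when any hider is visible.
    return (-1.0, 1.0) if any_seen else (1.0, -1.0)

# A seeker at the origin facing along the x-axis.
seekers = [((0.0, 0.0), 0.0)]
seen = step_rewards(seekers, [(5.0, 1.0)], in_prep_phase=False)
hidden = step_rewards(seekers, [(-5.0, 0.0)], in_prep_phase=False)
prep = step_rewards(seekers, [(5.0, 1.0)], in_prep_phase=True)
```

Nothing in this rule mentions boxes, ramps, or shelters. Any use the agents make of those objects has to be discovered as a way of changing who can see whom.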

The results from this simple setup were quite impressive. Over the course of 481 million games of hide-and-seek, the AI seemed to develop strategies and counterstrategies, and the AI agents moved from running around at random to coordinating with their allies to make complicated strategies work. (Along the way, they showed off their ability to break the game physics in unexpected ways, too; more on that below.)

It also shows how much can be done with a simple AI technique called reinforcement learning, in which AI systems get “rewards” for desired behavior and are set loose to learn, over millions of games, the best way to maximize those rewards.
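The reward-and-repeat loop can be sketched in a few lines. The example below is a toy, not the system OpenAI trained: it uses a simple epsilon-greedy "bandit" learner, and the two action names and their payoff probabilities are made up for illustration. The agent is never told which action is better. It only sees the reward that follows each choice.

```python
import random

def train(reward_prob, episodes=5000, epsilon=0.1, seed=0):
    """Learn an average-reward estimate for each action by trial and error.

    reward_prob maps each (hypothetical) action name to its chance of
    earning a reward of 1. The learner cannot read this table directly.
    """
    rng = random.Random(seed)
    actions = list(reward_prob)
    value = {a: 0.0 for a in actions}  # running estimate of reward
    count = {a: 0 for a in actions}    # how often each action was tried
    for _ in range(episodes):
        if rng.random() < epsilon:
            action = rng.choice(actions)  # explore: try something at random
        else:
            action = max(actions, key=lambda a: value[a])  # exploit best guess
        reward = 1.0 if rng.random() < reward_prob[action] else 0.0
        count[action] += 1
        # Incremental average: nudge the estimate toward the observed reward.
        value[action] += (reward - value[action]) / count[action]
    return value

learned = train({"run_at_random": 0.2, "hide_behind_box": 0.8})
best = max(learned, key=learned.get)
print(best)
```

Real systems replace the two-entry table with a neural network and the two actions with continuous movement, but the basic idea is the same: try things, observe the reward, and shift toward whatever paid off.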

Reinforcement learning is incredibly simple, but the strategic behavior it produces isn’t simple at all. Researchers have in the past leveraged reinforcement learning among other techniques to build AI systems that can play complex wartime strategy games, and some researchers think that highly sophisticated systems could be built just with reinforcement learning. This simple game of hide-and-seek makes for a great example of how reinforcement learning works in action and how simple instructions produce shockingly intelligent behavior. AI capabilities are continuing to march forward, for better or for worse.

You can watch the whole video here, or check out these highlights.

The first lesson: how to chase and hide

It may have taken a few million games of hide-and-seek, but eventually the AI agents figured out the basics of the game: chasing one another around the map.

OpenAI via YouTube

The second lesson: how to build a defensive shelter

AI agents have the ability to “lock” blocks in place. Only the team that locked a block can unlock it. After millions of games of practice, the AI agents learned to build a shelter out of the available blocks; you can see them doing that here. In the shelter, the “seeker” agents can’t find them, so this is a win for the “hiders” — at least until someone comes up with a new idea.

OpenAI via YouTube

Using ramps to breach a shelter

Millions of games later, the seekers figured out how to handle this behavior by the “hiders”: they could drag a ramp over, climb the ramp, and find the hiders.

OpenAI via YouTube

After a while, the hiders learned a counterattack: they could freeze the ramps in place so the seekers couldn’t move them. OpenAI’s team notes that they thought this would be the end of the game, but they were wrong.

Box surfing to breach shelters

Eventually, seekers learned to push a box over to the frozen ramps, climb onto the box, and “surf” it over to the shelter, where they could once again find the hiders.

OpenAI via YouTube

Defending against box surfing

There’s an obvious counterstrategy for the hiders here: freezing everything around so the seekers have no tools to work with. Indeed, that’s what they learned to do.

OpenAI via YouTube

That’s how a game of hide-and-seek between AI agents with millions of games of experience goes. The interesting thing here is that none of the behavior on display was directly taught or even directly rewarded. Agents only get rewards when they win the game. But that simple incentive was enough to encourage lots of creative in-game behavior.

Many AI researchers think that reinforcement learning can be used to solve complicated tasks with real-world implications, too. The way powerful strategic decision-making emerges from simple instructions is promising — but it’s also concerning. Solving problems with reinforcement learning leads, as we’ve seen, to lots of unexpected behavior — charming in a game of hide-and-seek, but potentially alarming in a drug meant to treat cancer (if the unintended behavior causes life-threatening complications) or an algorithm meant to improve power plant output (if the AI arranges to exploit some obscure condition in its goals rather than simply provide consistent power).

That’s the hazardous flip side of techniques like reinforcement learning. On the one hand, they’re powerful techniques that can produce advanced behavior from a simple starting point. On the other hand, they’re powerful techniques that can produce unexpected — and sometimes undesired — advanced behavior from a simple starting point.

As AI systems grow more powerful, we need to give careful consideration to how to ensure they do what we want.
