
It’s disturbingly easy to trick AI into doing something deadly

How “adversarial attacks” can mess with self-driving cars, medicine, and the military.

Javier Zarracina/Vox
Sigal Samuel
Sigal Samuel is a senior reporter for Vox’s Future Perfect. She writes primarily about the future of consciousness, tracking advances in artificial intelligence and neuroscience and their staggering ethical implications. Before joining Vox, Sigal was the religion editor at the Atlantic.

Artificial intelligence researchers have a big problem. Even as they design powerful new technologies, hackers are figuring out how to trick the tech into doing things it was never meant to — with potentially deadly consequences.

The scariest part is that hackers can do this using something as simple as stickers.

In a recent report, Tencent’s Keen Security Lab showed how they were able to bamboozle a Tesla Model S into switching lanes so that it drove directly into oncoming traffic. All they had to do was place three stickers on the road, forming the appearance of a lane line. The car’s Autopilot system, which relies on computer vision, detected the stickers and interpreted them to mean that the lane was veering left. So it steered the car that way.

Had this happened in the real world, the results could have been lethal. Luckily, it was an experiment designed by security researchers who were probing the technology for weaknesses, to make sure it can withstand hackers who may want to carry out so-called “adversarial attacks” on machine learning systems.

That’s a very real risk, and it’s a growing source of concern to AI researchers. It has serious implications for fields that rely heavily on AI, from self-driving cars to medicine to the military.

Machine learning is a type of AI that involves feeding computers example after example of something, until they “learn” to make their own determinations. The aim of adversarial machine learning is to trick the computers by feeding them inputs that’ll mess up their determinations.

Placing stickers on the road is one example of that. In another commonly cited example, researchers placed stickers on a stop sign to make a self-driving car think the sign says there’s a speed limit of 45 miles per hour. This can be done with other types of objects, too. Researchers have even designed a sticker that fools AI into thinking a banana is a toaster.
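The core trick behind these attacks can be sketched in a few lines: take an input the model handles correctly, then nudge every pixel a tiny, bounded amount in the direction that most hurts the model’s score. The toy linear “classifier” below is entirely hypothetical, invented here for illustration; real attacks such as the fast gradient sign method apply the same idea to neural networks.

```python
import numpy as np

# A toy linear "classifier": positive score means "benign", negative means
# "malignant". The weights are arbitrary, fixed for reproducibility.
weights = np.sin(np.arange(64) + 1.0)

def classify(image):
    return "benign" if image @ weights > 0 else "malignant"

# An input the model gets right: it points along the weight vector.
image = weights / np.linalg.norm(weights)

# Fast-gradient-style perturbation: move each pixel by at most epsilon,
# each in the direction that lowers the score (against the weight's sign).
epsilon = 0.2
perturbation = -epsilon * np.sign(weights)
adversarial = image + perturbation

print(classify(image))        # benign
print(classify(adversarial))  # malignant: tiny per-pixel changes flip the label
```

No single pixel changes by more than epsilon, yet the small contributions all push the score the same way, which is why such perturbations can be imperceptible to humans while decisive to the model.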

Responding to the Keen Security Lab report, Tesla co-founder and CEO Elon Musk said it was “solid work by Keen, as usual.” This is not the first time Keen, a leading security research team, has probed the vulnerabilities of a Tesla.

However, a Tesla spokesperson responding to the recent report said the vulnerability it identifies is “not a realistic concern given that a driver can easily override Autopilot at any time by using the steering wheel or brakes and should always be prepared to do so.” But that seems too flippant. Realistically, people in self-driving cars are not going to be prepared to jump into action at any moment, because the very premise of Autopilot will have conditioned them to think they can afford to let their minds wander.

UC Berkeley computer science professor Dawn Song, who studies adversarial attacks, says that within the research community, people are taking the risk of such attacks seriously. “Everyone has recognized the importance of this topic — researchers from Google and Facebook as well as OpenAI are actively working in this domain,” she told me, adding that the last two years have seen “an explosion” of interest as AI gets more powerful, more ubiquitous, and therefore more dangerous.

Although an adversarial attack involving the use of stickers to fool AI hasn’t yet been observed in the real world, there’s a sense that it may not be long before bad actors try this sort of thing. “Once you understand how to do it,” Song said, “it’s very cheap and easy to do.”

The presence of these risks doesn’t mean we should jettison all AI and the many benefits it offers us. But it does mean we should be figuring out how to make our AI systems robust in the face of attacks. To do that, we need to use our imaginations to anticipate what hackers might come up with, always staying one step ahead of them.

How adversarial attacks could affect medicine, warfare, and more

Song has studied various types of adversarial machine learning methods, one of which MIT Technology Review sums up like this:

One project, conducted in collaboration with Google, involved probing machine-learning algorithms trained to generate automatic responses from e-mail messages (in this case the Enron email data set). The effort showed that by creating the right messages, it is possible to have the machine model spit out sensitive data such as credit card numbers. The findings were used by Google to prevent Smart Compose, the tool that auto-generates text in Gmail, from being exploited.

Another scenario looks at an adversarial attack that targets the health care system. A study by Harvard and MIT researchers, published last month in Science, showed how machine-learning systems can be fooled into participating in medical fraud.

Let’s say you’re a doctor and your patient has a mole. An image of it is fed into a machine-learning system, which correctly identifies it as benign. But then you add a “perturbation” to the image — a layer of pixels that changes how the system reads the underlying image. Suddenly the mole is classified as malignant. You claim that an excision is necessary and you request reimbursement for it. Because you’ve gamed the classification, the health insurance company is willing to dish out the money.

Demonstration of how adversarial attacks could target various medical AI systems. N. Cary/Science

The study authors point out that adversarial attacks could also be carried out with noble intentions. They imagine a hypothetical opioid risk algorithm and how it could be fooled:

Many adversarial attacks could be motivated by a desire to provide high-quality care. A hypothetical illustration can be drawn from the opioid crisis. In response to rampant overprescription of opiates, insurance companies have begun using predictive models to deny opiate prescription filings on the basis of risk scores computed at the patient or provider level. What if a physician, certain that she had a patient who desperately needed oxycontin but would nonetheless run afoul of the prescription authorization algorithm, could type a special pattern of algorithmically selected billing codes or specific phrases into the record to guarantee approval?

The authors argue that “the specific contours of the healthcare insurance industry make it a very feasible ground zero for the movement of adversarial attacks from theory to practice.”

The military implications are even more worrisome. “Imagine you’re in the military and you’re using a system that autonomously decides what to target,” Jeff Clune told The Verge in 2017. “What you don’t want is your enemy putting an adversarial image on top of a hospital so that you strike that hospital. Or if you are using the same system to track your enemies; you don’t want to be easily fooled [and] start following the wrong car with your drone.”

DARPA, the Defense Department’s advanced research agency, is actively studying the risks of adversarial attacks — and how to defend against them — through a recently launched program called Guaranteeing AI Robustness against Deception (GARD). Program Director Hava Siegelmann says GARD wants to make AI resistant to a wide array of attacks, and is looking to biology for inspiration about how to do that. “The kind of broad scenario-based defense we’re looking to generate can be seen, for example, in the immune system, which identifies attacks, wins and remembers the attack to create a more effective response during future engagements,” said Siegelmann.

Song has also been working on methods to increase the resilience of machine-learning systems. One of her recent papers looks at how you can identify a perturbation overlaid on an image by checking for consistency between different patches of the image. Since adversarial attackers will have no way of knowing which patches of the image you’re going to test for consistency, it’ll theoretically be hard for them to design a perturbation that evades detection.
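A stripped-down version of that patch-consistency idea can be sketched as follows. The patch “classifier” here is just a brightness threshold, and the patches are scanned on a fixed grid; both are stand-ins invented for illustration. Song’s actual method samples patches unpredictably, precisely so an attacker can’t tailor the perturbation to pass every check.

```python
import numpy as np

# Toy patch "classifier": a hypothetical stand-in for a real image model.
def classify_patch(patch, threshold=0.5):
    return int(patch.mean() > threshold)  # 1 = "bright", 0 = "dark"

def patch_consistency(image, patch_size=4, stride=4):
    """Classify every patch on a grid and return the fraction agreeing with
    the majority label. Low agreement suggests a localized perturbation."""
    labels = []
    for y in range(0, image.shape[0] - patch_size + 1, stride):
        for x in range(0, image.shape[1] - patch_size + 1, stride):
            labels.append(classify_patch(image[y:y + patch_size, x:x + patch_size]))
    majority = max(set(labels), key=labels.count)
    return labels.count(majority) / len(labels)

# A clean, uniformly bright image: every patch agrees.
clean = np.full((16, 16), 0.8)

# The same image with a small dark "sticker": patches covering the sticker
# now disagree with the rest, and the inconsistency exposes the tampering.
attacked = clean.copy()
attacked[0:6, 0:6] = 0.0

print(patch_consistency(clean))     # 1.0
print(patch_consistency(attacked))  # 0.8125 -- three patches disagree
```

The defender’s advantage is informational: the attacker must fool every patch the defender might test, while the defender only needs one inconsistent patch to raise a flag.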

“I believe this is a very promising direction going forward,” she told me.

It was a relief to hear a researcher sounding a hopeful note. Even with strong defenses against adversarial attacks, it’s scary to think what could be coming round the bend. Without them, it’s downright terrifying.
