
It’s disturbingly easy to trick AI into doing something deadly

How “adversarial attacks” can mess with self-driving cars, medicine, and the military.

Javier Zarracina/Vox
Sigal Samuel
Sigal Samuel is a senior reporter for Vox’s Future Perfect. She writes primarily about the future of consciousness, tracking advances in artificial intelligence and neuroscience and their staggering ethical implications. Before joining Vox, Sigal was the religion editor at the Atlantic.

Artificial intelligence researchers have a big problem. Even as they design powerful new technologies, hackers are figuring out how to trick the tech into doing things it was never meant to — with potentially deadly consequences.

The scariest part is that hackers can do this using something as simple as stickers.

In a recent report, Tencent’s Keen Security Lab showed how they were able to bamboozle a Tesla Model S into switching lanes so that it drove directly into oncoming traffic. All they had to do was place three stickers on the road, forming the appearance of a lane line. The car’s Autopilot system, which relies on computer vision, detected the stickers and interpreted them to mean that the lane was veering left. So it steered the car that way.

Had this happened in the real world, the results could have been lethal. Luckily, it was an experiment designed by security researchers who were probing the technology for weaknesses, to make sure it can withstand hackers who may want to carry out so-called “adversarial attacks” on machine learning systems.

That’s a very real risk, and it’s a growing source of concern to AI researchers. It has serious implications for fields that rely heavily on AI, from self-driving cars to medicine to the military.

Machine learning is a type of AI that involves feeding computers example after example of something, until they “learn” to make their own determinations. The aim of adversarial machine learning is to trick the computers by feeding them inputs that’ll mess up their determinations.

Placing stickers on the road is one example of that. In another commonly cited example, researchers placed stickers on a stop sign to make a self-driving car think the sign says there’s a speed limit of 45 miles per hour. This can be done with other types of objects, too. Researchers have even designed a sticker that fools AI into thinking a banana is a toaster.
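The core trick behind these attacks can be sketched in a few lines: take an input the model handles correctly, then nudge every pixel a tiny, bounded amount in the direction that most hurts the model’s score. The toy linear “classifier” below is entirely hypothetical, invented here for illustration; real attacks such as the fast gradient sign method apply the same idea to neural networks.

```python
import numpy as np

# A toy linear "classifier": positive score means "benign", negative means
# "malignant". The weights are arbitrary, fixed for reproducibility.
weights = np.sin(np.arange(64) + 1.0)

def classify(image):
    return "benign" if image @ weights > 0 else "malignant"

# An input the model gets right: it points along the weight vector.
image = weights / np.linalg.norm(weights)

# Fast-gradient-style perturbation: move each pixel by at most epsilon,
# each in the direction that lowers the score (against the weight's sign).
epsilon = 0.2
perturbation = -epsilon * np.sign(weights)
adversarial = image + perturbation

print(classify(image))        # benign
print(classify(adversarial))  # malignant: tiny per-pixel changes flip the label
```

No single pixel changes by more than epsilon, yet the small contributions all push the score the same way, which is why such perturbations can be imperceptible to humans while decisive to the model.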

Responding to the Keen Security Lab report, Tesla co-founder and CEO Elon Musk said it was “solid work by Keen, as usual.” This is not the first time Keen, a leading security research team, has probed the vulnerabilities of a Tesla.

However, a Tesla spokesperson responding to the recent report said the vulnerability it identifies is “not a realistic concern given that a driver can easily override Autopilot at any time by using the steering wheel or brakes and should always be prepared to do so.” But that seems too flippant. Realistically, people in self-driving cars are not going to be prepared to jump into action at any moment, because the very premise of Autopilot will have conditioned them to think they can afford to let their minds wander.

UC Berkeley computer science professor Dawn Song, who studies adversarial attacks, says that within the research community, people are taking the risk of such attacks seriously. “Everyone has recognized the importance of this topic — researchers from Google and Facebook as well as OpenAI are actively working in this domain,” she told me, adding that the last two years have seen “an explosion” of interest as AI gets more powerful, more ubiquitous, and therefore more dangerous.

Although an adversarial attack involving the use of stickers to fool AI hasn’t yet been observed in the real world, there’s a sense that it may not be long before bad actors try this sort of thing. “Once you understand how to do it,” Song said, “it’s very cheap and easy to do.”

The presence of these risks doesn’t mean we should jettison all AI and the many benefits it offers us. But it does mean we should be figuring out how to make our AI systems robust in the face of attacks. To do that, we need to use our imaginations to anticipate what hackers might come up with, always staying one step ahead of them.

How adversarial attacks could affect medicine, warfare, and more

Song has studied various types of adversarial machine learning methods, one of which MIT Technology Review sums up like this:

One project, conducted in collaboration with Google, involved probing machine-learning algorithms trained to generate automatic responses from e-mail messages (in this case the Enron email data set). The effort showed that by creating the right messages, it is possible to have the machine model spit out sensitive data such as credit card numbers. The findings were used by Google to prevent Smart Compose, the tool that auto-generates text in Gmail, from being exploited.

Another scenario looks at an adversarial attack that targets the health care system. A study by Harvard and MIT researchers, published last month in Science, showed how machine-learning systems can be fooled into participating in medical fraud.

Let’s say you’re a doctor and your patient has a mole. An image of it is fed into a machine-learning system, which correctly identifies it as benign. But then you add a “perturbation” to the image — a layer of pixels that changes how the system reads the underlying image. Suddenly the mole is classified as malignant. You claim that an excision is necessary and you request reimbursement for it. Because you’ve gamed the classification, the health insurance company is willing to dish out the money.

Demonstration of how adversarial attacks could target various medical AI systems. N. Cary/Science

The study authors point out that adversarial attacks could also be carried out with noble intentions. They imagine a hypothetical opioid risk algorithm and how it could be fooled:

Many adversarial attacks could be motivated by a desire to provide high-quality care. A hypothetical illustration can be drawn from the opioid crisis. In response to rampant overprescription of opiates, insurance companies have begun using predictive models to deny opiate prescription filings on the basis of risk scores computed at the patient or provider level. What if a physician, certain that she had a patient who desperately needed oxycontin but would nonetheless run afoul of the prescription authorization algorithm, could type a special pattern of algorithmically selected billing codes or specific phrases into the record to guarantee approval?

The authors argue that “the specific contours of the healthcare insurance industry make it a very feasible ground zero for the movement of adversarial attacks from theory to practice.”

The military implications are even more worrisome. “Imagine you’re in the military and you’re using a system that autonomously decides what to target,” Jeff Clune told The Verge in 2017. “What you don’t want is your enemy putting an adversarial image on top of a hospital so that you strike that hospital. Or if you are using the same system to track your enemies; you don’t want to be easily fooled [and] start following the wrong car with your drone.”

DARPA, the Defense Department’s advanced research agency, is actively studying the risks of adversarial attacks — and how to defend against them — through a recently launched program called Guaranteeing AI Robustness against Deception (GARD). Program Director Hava Siegelmann says GARD wants to make AI resistant to a wide array of attacks, and is looking to biology for inspiration about how to do that. “The kind of broad scenario-based defense we’re looking to generate can be seen, for example, in the immune system, which identifies attacks, wins and remembers the attack to create a more effective response during future engagements,” said Siegelmann.

Song has also been working on methods to increase the resilience of machine-learning systems. One of her recent papers looks at how you can identify a perturbation overlaid on an image by checking for consistency between different patches of the image. Since adversarial attackers will have no way of knowing which patches of the image you’re going to test for consistency, it’ll theoretically be hard for them to design a perturbation that evades detection.
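A stripped-down version of that patch-consistency idea can be sketched as follows. The patch “classifier” here is just a brightness threshold, and the patches are scanned on a fixed grid; both are stand-ins invented for illustration. Song’s actual method samples patches unpredictably, precisely so an attacker can’t tailor the perturbation to pass every check.

```python
import numpy as np

# Toy patch "classifier": a hypothetical stand-in for a real image model.
def classify_patch(patch, threshold=0.5):
    return int(patch.mean() > threshold)  # 1 = "bright", 0 = "dark"

def patch_consistency(image, patch_size=4, stride=4):
    """Classify every patch on a grid and return the fraction agreeing with
    the majority label. Low agreement suggests a localized perturbation."""
    labels = []
    for y in range(0, image.shape[0] - patch_size + 1, stride):
        for x in range(0, image.shape[1] - patch_size + 1, stride):
            labels.append(classify_patch(image[y:y + patch_size, x:x + patch_size]))
    majority = max(set(labels), key=labels.count)
    return labels.count(majority) / len(labels)

# A clean, uniformly bright image: every patch agrees.
clean = np.full((16, 16), 0.8)

# The same image with a small dark "sticker": patches covering the sticker
# now disagree with the rest, and the inconsistency exposes the tampering.
attacked = clean.copy()
attacked[0:6, 0:6] = 0.0

print(patch_consistency(clean))     # 1.0
print(patch_consistency(attacked))  # 0.8125 -- three patches disagree
```

The defender’s advantage is informational: the attacker must fool every patch the defender might test, while the defender only needs one inconsistent patch to raise a flag.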

“I believe this is a very promising direction going forward,” she told me.

It was a relief to hear a researcher sounding a hopeful note. Even with strong defenses against adversarial attacks, it’s scary to think what could be coming round the bend. Without them, it’s downright terrifying.
