Watson claims to predict cancer, but who trained it to ‘think?’

Watson cannot read handwriting. Machine learning’s true potential is tied to human inputs.

The computer housing case for IBM’s Watson computer in New York City.
Andrew Spear / Getty

By beating humans at games of Go and Jeopardy, artificial intelligence engines like Google’s DeepMind and IBM’s Watson have captured attention for their promise of solving bigger human problems. Watson, for example, is being enlisted to help doctors predict cancer in patients.

The American internet pioneer Douglas Engelbart suggested that AI’s grandest promise is the amplification of human ability. Whether it’s automating rote cognitive tasks like tagging people in photos or assisting in complex workflows like cancer treatment, the human-augmentation promise feels almost inevitable in every product and domain.

Data has crowned a new king in AI. In deep learning, the technical approach at the root of AI fever, every breakthrough in the last several years has occurred because there exists a large and highly accurate training dataset — a dataset that relies on human input. It turns out that progress toward Engelbart’s hypothesis of amplification of human ability requires massive human effort first, in order to actually power the AI.

The emergence of large and highly accurate datasets has allowed deep learning to “train” algorithms to recognize patterns in digital representations of sounds, images and other data, leading to remarkable breakthroughs that outperform previous approaches in almost every application area. For example, self-driving cars rely on massive amounts of data collected over several years from efforts like Google’s people-powered street canvassing, which provides the ability to “see” roads (and was started to power services like Google Maps). The photos we upload and collectively tag as Facebook users have led to algorithms that can “see” faces. And even Google’s 411 audio directory service from a decade ago was suspected to be an effort to crowdsource data to train a computer to “hear” about businesses and their locations.

Watson’s promise to help detect cancer also depends on data: decades of doctor notes containing cancer patient outcomes. However, Watson cannot read handwriting. To access the data trapped in the historical doctor reports, researchers presumably had to employ an army of people to painstakingly type and retype (for accuracy) the data into computers before Watson could be trained on it. This is yet another example of the substantial manual effort required to capture training data, the core input of deep learning.

Just as Watson researchers recognized that the keys to cancer prediction lie within oncologists’ backroom shelves, a growing number of technology leaders in health and other regulated industries are realizing that they are not data-poor. They are turning toward their paper processes and legacy paper archives and seeing the stacks and folders with the eyes of a digital prospector looking at her iron mountain.

Large insurance organizations are using deep learning models from my company, Captricity, to sift through the hieroglyphics of hundreds of millions of pages of policyholder data. They are extracting data from death certificates so the next generation of insurance products can leverage what they recognize to be their sole business advantage: training data that literally spans lifetimes.

In the nonprofit sector, PATH, a global health nonprofit, uses the same deep learning models to digitize data from photos of pages in bound clinical registers, so that kids who attend rural clinics can more efficiently get their vaccines. A recent effort has allowed PATH to find systematic tracking problems and reprioritize its efforts to keep Tanzanian kids healthy.

Modern AI is in an era of building the foundation for interpreting the most common mediums of human communication: photos, videos, sounds and writing. For AI to become as revolutionary as is hoped (and expected), able to do such things as predicting cancer, it must master these fundamental capabilities before moving on to augmentation. Before accepting the hype around future applications of AI, we should first ask: Where did the training data come from?


Kuang Chen, PhD, is the founder and CEO of Captricity, a leading Data-as-a-Service (DaaS) company that transforms handwritten paper forms into digital data. On a mission to democratize data access, the company’s crowd-guided deep learning software helps organizations in both the public and private sectors fight expensive, time-consuming and ineffective paper processes. Reach him @kuang.

This article originally appeared on Recode.net.
