Medicine’s Big Problem with Big Data: Information Hoarding

Information that may offer medical insights has been locked away in the filing cabinets of doctors’ offices.

Researchers at IBM, Berg Pharma, Memorial Sloan Kettering, UC Berkeley and other institutions are exploring how artificial intelligence and big data can be used to develop better treatments for diseases (as we explored in a separate story on Saturday).

But one of the biggest challenges for making full use of these computational tools in medicine is that vast amounts of data have been locked away — or never digitized in the first place.

The results of earlier research efforts or the experiences of individual patients are often trapped in the archives of pharmaceutical companies or the paper filing cabinets of doctors’ offices.

Patient privacy issues, competitive interests and the sheer lack of electronic records have prevented information sharing that could potentially reveal broader patterns in what appeared to any single doctor like an isolated incident.

When you can analyze clinical trials, genomic data and electronic medical records for 100,000 patients, “you see patterns that you don’t notice in a couple,” said Michael Keiser, an instructor at the UC San Francisco School of Medicine.

Given that promise, a number of organizations are beginning to pull together medical data sources.

Late last year, the American Society of Clinical Oncology announced the initial development of CancerLinQ, a “rapid learning system” that allows researchers to enter, access and analyze anonymized medical records of cancer patients.

Similarly, in April the CEO Roundtable on Cancer, a nonprofit representing major pharmaceutical companies, announced the launch of Project Data Sphere. It’s an open platform populated with clinical datasets from earlier Phase III studies conducted by AstraZeneca, Bayer, Celgene, Memorial Sloan Kettering, Pfizer, Sanofi and others.

The data has been harmonized and scrubbed of patient-identifying details, enabling independent researchers or those working for life sciences companies to use it freely. They have access to built-in analytical tools, or can plug the data into their own software.

The pooled data might uncover little-known drug candidates that showed some effectiveness against certain mutations but were basically abandoned when they didn’t directly attack the principal target of a particular study, said Dr. Martin Murphy, chief executive of the CEO Roundtable on Cancer.

In some cases, it could also eliminate the need for control groups — those who receive the standard of care plus a placebo instead of the experimental treatment — since earlier studies have already indicated the outcomes for those patients. (That would be an important development because the fear of receiving a placebo is a major reason many patients decide against participating in clinical trials.)

The effort is happening now in part because of improving technology and in part because companies are coming around to the view that they’ll all be better off with the insights gleaned from this pooled data.

“It’s a recognition that it’s costing a lot more money to develop another drug,” Murphy said. “The low-hanging fruit was long ago harvested.”

Other information sharing efforts include the Global Alliance for Genomics and Health, the molecular databases maintained by EMBL-EBI and the National Institutes of Health’s Biomarkers Consortium.

Meanwhile, last month Google Ventures led a $130 million round in Flatiron Health, which has built an “oncology cloud” that aggregates information from billing systems and electronic medical records.

The system makes sense of data stored in inconsistent and unstructured formats from doctors’ offices and hospitals to enable analysis of what’s happening across broad cancer patient populations. Ideally, it can highlight what’s working for which types of cancer patients.

“Flatiron is focused on what we (and the industry) call ‘real world’ patient clinical data, whereby we’re trying to aggregate and organize data on the 96 percent of patients who do not participate in a prospective clinical trial,” co-founder Nat Turner said in an email.

“To really understand what’s working and how others are treating and what outcomes are being achieved, institutions should be open to de-identified data sharing and anonymous benchmarking, which is part of the Flatiron vision,” he said.

To be sure, there is good reason to proceed with some caution here. Medical information is highly sensitive, so any privacy risks demand careful consideration.

Supposedly “de-identified” data has proven to be anything but on several notable occasions in the past. And electronic medical records have been compromised already.

But to the degree that there’s a social tradeoff here, many come down firmly on the side of: let’s try to save lives. Old habits and out-of-date regulations still mean the shift isn’t happening nearly fast enough, if you ask David Patterson, a professor of computer science at UC Berkeley who is developing machine learning tools for cancer research.

“Those of us in the computer field are used to Internet time and Moore’s law,” he said. “For me as an outsider, it’s very frustrating that we can’t get bureaucratic agreement so that we can collect lots of data sets together.”

“Patient privacy is important but so is making progress on cancer,” he said. “The upside of collecting lots of information together is we can make progress on this terrible disease.”

No one interviewed for this article could point to a breakthrough treatment produced by these techniques to date. After all, the tools are new, the data sets are just coming together and clinical trials take years.

But nearly all agreed researchers are on the verge of something big.

“The tips of your shoes are just poking over the edge of the peaks,” Murphy said. “No one has been over this before in cancer.”

This article originally appeared on Recode.net.
