We need a Data Journalism Archive. Before it becomes just another 404 error.

Are we about to enter a dark age of data journalism?

The internet has made it possible to see the world’s information without moving a muscle, no matter how old that information is. You can absorb the first news page of the Guardian, from May 1821, which had data journalism at its heart, even then.

And the web has revolutionized online journalism so that the way we consume the news changes daily; the basics of modern data journalism are grounded in the ability to visualize data in more and more sophisticated ways.

It has also made the archiving of news content easier. In the past, archivists in each organization would preside over rooms full of old clippings and background information. The web made that process straightforward: Everything would be archived online, and those collections of the past would even become sources of present-day content, such as the New York Times’s archive, which is regularly sourced and raided by both academics and journalists.

But data journalism is not part of these archives. Much of it has become a victim of code rot – allowed to collapse or degrade so much that as software libraries update or improve, it is left far behind. Now when you try to find examples of this work, as likely as not you will end up at a 404 page.

Philip Meyer’s work on the Detroit riots.

Data journalism itself also has a long history, certainly predating 2009. You can see it in the first fall of Abraham Lincoln or in the work of Philip Meyer in investigating the causes of the Detroit riots. But the thing is, you can still see that work. Created in print before the word “interactive” had even been coined, it is kept so you can use it as inspiration. Without Precision Journalism, would Reading the Riots even have existed? The past (by which I mean less than five years ago) has a lot to teach us about the way we work today.

Paul Bradshaw has collated examples of modern data journalism, asking, “Is there a canon of data journalism?” And while Minard’s famous chart or Florence Nightingale’s “butterflies” still exist, it is striking how much of it has vanished forever. This is just a sample:

1) ChicagoCrime.org

The progenitor of interactive news databases in the form you can see them at places like ProPublica, started by the godfather of modern data journalism, Adrian Holovaty (this May was its 10th anniversary). For years it produced just a 404 page. Now it links to a tiny section of EveryBlock.

2) The US Congress Votes Database

The site is frozen on a February 2014 vote; a tiny “this page has been archived” note at the top is an inadequate replacement for a project that has no equivalent now.

3) MPs’ expenses

Investigate your MP’s expenses. Except you can’t.

It would be possible to fill this article with examples of things I worked on that no longer work at the Guardian — the World Data Explorer or the Libya bombing interactive, for instance. But I’ve chosen this because it was the first large-scale newsroom crowdsourcing exercise, switched off because it couldn’t be maintained. It’s now no longer even viewable.

4) Fixing DC’s Schools

Another pioneering piece of work from Holovaty, a forerunner of apps that are now commonplace among local media sources; the front page leads nowhere and the interactive design doesn’t work anymore.

5) Represent, from the New York Times

Represent. Or not.

This page has been “about to relaunch” for some time now (some say years). The site originally let people in New York find their members of Congress and track all sorts of things about them.

It’s an issue across journalism. In the Atlantic last week Adrienne LaFrance wrote how key pieces of journalism are disappearing from the web:

“If you want to save something online, you have to decide to save it. Ephemerality is built into the very architecture of the web, which was intended to be a messaging system, not a library.”

If even Pulitzer Prize–winning articles are at risk, where does that leave everyday data journalism?

At the same time, for many publishers, every word, no matter how facile or pointless, is saved as if it were a work of studied genius. This is the fantastic thing about archives: They give you a picture of a world from the past, one that can shape how you produce the future. But it’s only the words that are saved. Meanwhile, a map, an interactive guide, or even just a set of interactive charts will vanish as if they never ever existed.

Data journalism, at its best, bridges the gap between those who have the data and those who want to understand it. It raises data from the prerogative of the few into the consciousness of the many. It can change the world, illuminating that which others would rather keep secret and misunderstood.

But if we’re not careful, this golden age of data journalism will only be remembered in a few animated gifs, texty analysis pieces, and CSV downloads. Data will have returned to those who always owned it in the past; the rest of us will have to keep reinventing the wheel.

Nobody says archiving is easy, but what will be left otherwise? This article will, which is plainly ironic: an article about the disappearing web, left to survive. So will long, dry academic pieces of data analysis. But the apps, charts, and visuals that bring them to life? They will vanish as if they never existed.

It’s time for a Data Journalism Archive. Before we forget everything we know.

Simon Rogers is a data journalist and has worked at the Guardian, Twitter, and now at Google. This piece was originally published on his blog.
