The best illustration you’ll see that correlation doesn’t equal causation

At r ≈ .66 this is actually one of the weaker correlations Tyler Vigen found. Still, it makes you wonder…
Courtesy of Tyler Vigen
Dylan Matthews
Dylan Matthews was a senior correspondent and head writer for Vox’s Future Perfect section. He is particularly interested in global health and pandemic prevention, anti-poverty efforts, economic policy and theory, and conflicts about the right way to do philanthropy.

“Correlation doesn’t equal causation.” You’ve heard it in statistics class, as a caveat in a million blog posts writing up data or a study (including some of mine), as a critique of those studies, and, naturally, as the premise for an XKCD cartoon. But I’ve rarely seen the point made as vividly as it was by Tyler Vigen, a law student at Harvard who, in his spare time, put together a website that finds very, very high correlations between things that are absolutely not related, like margarine consumption and the divorce rate in Maine:

Screenshot_2014-05-12_12.46.40

Courtesy of Tyler Vigen

Or whole milk consumption and the marriage rate in Mississippi:

Screenshot_2014-05-12_12.46.30

Courtesy of Tyler Vigen

Or the amount of money spent on pets in the US and the number of lawyers in California:

Screenshot_2014-05-12_12.45.09

Courtesy of Tyler Vigen

Those all have correlation coefficients in excess of 0.99! That is very, very high! By comparison, Alan Abramowitz’s extremely accurate “Time for Change” model of presidential elections (it predicted Obama would get 52.2 percent of the two-party vote; he got 51.4) has a correlation coefficient of 0.97, which Abramowitz correctly calls “extraordinary.” The point is that a strong correlation isn’t nearly enough to draw strong conclusions about how two phenomena are related to each other. Abramowitz’s model is worth trusting not just because of its high correlation but because it predicts presidential elections based on factors that logically should matter to voters, like the state of the economy and what party currently controls the White House. That gives it theoretical plausibility, something that a theory in which, say, US whole milk consumption is driven by the marital status of Mississippians lacks.

Vigen tells me he got most of the data from the Centers for Disease Control and Prevention and the Census. “The death rates, precipitation data, and sunlight data were exported from the CDC,” he says. “I wrote a script to cull through the exported data and make it usable by my program. For the bulk of the rest of the data, I manually copied it from US Census spreadsheets directly into a master spreadsheet. I only did about 100 variables the second way, so a lot of the correlations are between the interesting variables I copied and the less interesting (but sometimes humorous) ones from the CDC.”
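To see why comparing enough variables against each other is bound to turn up striking matches, here is a minimal sketch in Python. It is not Vigen's program, and it uses no real data: the 200 series are made-up random walks of 10 "yearly" values each, and the names `pearson_r`, `random_walk`, and `strongest_pair` are hypothetical helpers written for this illustration. The sketch computes the correlation coefficient for every pair of series and reports the strongest one.

```python
import itertools
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def random_walk(rng, length):
    """A made-up yearly series: each value drifts randomly from the last."""
    value, series = 0.0, []
    for _ in range(length):
        value += rng.gauss(0, 1)
        series.append(value)
    return series


def strongest_pair(series_by_name):
    """Return (r, name_a, name_b) for the pair with the largest |r|."""
    pairs = itertools.combinations(series_by_name.items(), 2)
    return max(
        ((pearson_r(a, b), name_a, name_b) for (name_a, a), (name_b, b) in pairs),
        key=lambda item: abs(item[0]),
    )


# Assumption: 200 unrelated, randomly generated variables stand in for the
# CDC and Census series; none of them has anything to do with any other.
rng = random.Random(0)
variables = {f"variable_{i}": random_walk(rng, 10) for i in range(200)}

r, name_a, name_b = strongest_pair(variables)
print(f"{name_a} vs {name_b}: r = {r:.3f}")
```

With 200 variables there are 19,900 pairs to check, and each series has only 10 data points, so some pair of unrelated series will line up almost perfectly by chance alone. The search finds a high correlation because it was given so many chances to find one, not because the two variables are connected.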

Vigen says he might add more data in the future, but he’s sure producing some striking nonsense correlations with what’s in there now. The number of “suicides by hanging, strangulation and suffocation” seems to track the size of the legal profession quite well, both nationally:

Screenshot_2014-05-12_12.45.00

Courtesy of Tyler Vigen

And in North Carolina:

Screenshot_2014-05-12_12.45.30

Courtesy of Tyler Vigen

And “deaths by getting tangled in one’s bedsheets” jibes very well with trends in ski company revenue:

Screenshot_2014-05-12_18.25.52

Courtesy of Tyler Vigen

Hat-tip Business Insider.
