Wikipedia Deep Dive

Correlation does not imply causation

Based on Wikipedia: Correlation does not imply causation

In the Middle Ages, Europeans noticed something peculiar about lice. Healthy people were crawling with them, but sick people had almost none. The conclusion seemed obvious: lice must be good for you. When the lice left, people got sick.

They had it exactly backwards.

Lice, it turns out, are extraordinarily sensitive to body temperature. Even the slight fever that precedes noticeable illness sends them scurrying to find a new host. The lice weren't protecting anyone—they were just the first to notice something was wrong. But without thermometers, which hadn't been invented yet, the fever went undetected. By the time symptoms appeared, the lice had already fled, creating the illusion that their departure had caused the sickness.

This is perhaps history's most charming example of one of the most persistent errors in human reasoning: mistaking correlation for causation.

The Fallacy That Refuses to Die

The principle is simple enough to state: just because two things happen together doesn't mean one caused the other. Logicians have been warning us about this for centuries, giving it the Latin name cum hoc ergo propter hoc—"with this, therefore because of this."

It's closely related to another fallacy called post hoc ergo propter hoc—"after this, therefore because of this"—which assumes that because one event follows another, the first must have caused the second. Your team won every game when you wore your lucky socks? The socks must be magic.

And yet, despite being one of the oldest identified logical errors, we fall for it constantly. Our brains are pattern-recognition machines, evolved on the African savanna where noticing connections—that rustling in the grass often preceded a predator attack, for instance—was essential for survival. We're wired to see causation everywhere, even where it doesn't exist.

Which Way Does the Arrow Point?

Sometimes a correlation is real, but we get the direction completely wrong. This is called reverse causation, and it's surprisingly common.

Consider the windmill. A visitor from a pre-industrial society might observe that whenever windmills spin quickly, strong winds blow across the land. Windmills cause wind! The logic seems airtight until you remember that wind existed long before humans invented windmills, and plenty of windy places have no windmills at all.

A more consequential example involves cholesterol and mortality. For years, studies showed that people with low cholesterol died at higher rates than those with normal levels. Some researchers concluded that low cholesterol was dangerous—perhaps even that we should stop trying to lower it.

But the arrow pointed the other way. Certain diseases, particularly cancer, cause cholesterol levels to drop as the body wastes away. The low cholesterol wasn't killing anyone; it was a symptom of something that was. People weren't dying because their cholesterol dropped—their cholesterol dropped because they were dying.

The same pattern appears with alcoholism and liver disease. When alcoholics are diagnosed with cirrhosis, many stop drinking. But their mortality rates go up anyway. A careless analysis might conclude that quitting alcohol is deadly. In reality, these people are dying from the cirrhosis—their sobriety just happens to coincide with the advanced stage of their disease.

The Hidden Third Variable

Perhaps the sneakiest way correlations mislead us is through what statisticians call confounding variables—hidden factors that influence both of the things we're observing.

Here's a classic: people who sleep with their shoes on are much more likely to wake up with headaches. Should we launch a public health campaign against bedtime footwear?

Of course not. The missing variable is alcohol. People who go to bed drunk often don't bother removing their shoes. The drinking causes both the shod sleeping and the morning headache. The shoes are innocent.

Ice cream sales and drowning deaths rise and fall together throughout the year. Does ice cream cause drowning? No—summer causes both. Hot weather drives people to buy ice cream and to swim in lakes and pools. The ice cream and the drowning are connected only through their shared relationship to temperature and season.
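If you like seeing this in numbers rather than words, here's a tiny Python sketch with invented figures: temperature drives both ice cream sales and drownings, and the apparent relationship between the two evaporates once you account for it. Everything here is made up purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical daily data: temperature drives both ice cream sales and
# drownings; neither one affects the other directly.
n = 365
temperature = rng.normal(20, 8, n)                      # degrees C
ice_cream = 50 + 3.0 * temperature + rng.normal(0, 10, n)
drownings = 0.5 + 0.1 * temperature + rng.normal(0, 1, n)

# The raw correlation looks impressive...
print(np.corrcoef(ice_cream, drownings)[0, 1])

def remove_linear_effect(y, x):
    """Return what is left of y after subtracting its linear dependence on x."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)

# ...but once temperature is controlled for, the correlation is roughly zero.
resid_ice = remove_linear_effect(ice_cream, temperature)
resid_drown = remove_linear_effect(drownings, temperature)
print(np.corrcoef(resid_ice, resid_drown)[0, 1])
```

The first number comes out strongly positive; the second hovers near zero, which is the statistical fingerprint of a confounder at work.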

A fascinating example from medical research involved children, nightlights, and nearsightedness. A study published in Nature in 1999 found that children who slept with a light on were more likely to develop myopia. Parents panicked. The study made headlines.

Then researchers at Ohio State University looked more carefully. They found no direct link between nightlights and myopia. What they did find was that nearsighted parents were more likely to leave lights on in their children's rooms—perhaps because they themselves had trouble navigating dark spaces. And nearsighted parents tend to have nearsighted children, thanks to genetics. The nightlight was just a marker for parental myopia, not a cause of anything.

When Both Directions Are True

Sometimes the relationship between two variables isn't one-way at all. They cause each other, creating feedback loops that are fiendishly difficult to untangle.

Think about poverty and education. Does poverty cause poor education? Obviously yes—families struggling to afford food and housing have fewer resources for books, tutoring, and the stability that helps children learn. But does poor education cause poverty? Also yes—without skills and credentials, people struggle to find well-paying jobs.

Neither direction is wrong. They're both right, simultaneously, reinforcing each other in a cycle that can persist across generations.

Predator and prey populations work the same way. More rabbits mean more food for foxes, so fox populations grow. But more foxes mean more rabbit deaths, so rabbit populations shrink. Then fox populations crash from hunger, rabbit populations rebound, and the cycle continues. Asking whether foxes cause rabbit population changes or vice versa misses the point entirely.
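A toy simulation makes the loop visible. The sketch below uses the classic Lotka-Volterra predator-prey equations with made-up parameters; the point isn't the specific numbers but that each population's rises and falls are driven by the other's.

```python
import numpy as np

# Minimal discrete-time Lotka-Volterra sketch (illustrative parameters only):
# rabbits grow on their own and are eaten by foxes; foxes starve without
# rabbits and multiply when prey is plentiful.
alpha, beta, delta, gamma = 1.1, 0.4, 0.1, 0.4
dt, steps = 0.01, 5000
rabbits, foxes = 10.0, 5.0
history = []

for _ in range(steps):
    d_rabbits = (alpha * rabbits - beta * rabbits * foxes) * dt
    d_foxes = (delta * rabbits * foxes - gamma * foxes) * dt
    rabbits += d_rabbits
    foxes += d_foxes
    history.append((round(rabbits, 2), round(foxes, 2)))

# Both populations oscillate, each rising and falling in response to the other.
print(history[::1000])
```

Asking which series "causes" the other is the wrong question; the causation is the loop itself.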

Cyclists tend to have lower body mass indexes than non-cyclists. The obvious explanation is that cycling burns calories and builds fitness. But studies that follow people as they take up cycling show smaller effects on weight than you'd expect from comparing cyclists to non-cyclists. What gives?

Part of the answer is reverse causation: thinner people are more likely to enjoy cycling in the first place. They're more comfortable on a bike seat, less winded going up hills, and perhaps less self-conscious in lycra. The causation runs both directions—cycling makes you thinner, and being thinner makes you more likely to cycle.

The Coincidences Are Endless

Sometimes correlations are pure coincidence—statistical flukes that emerge from the chaos of an infinitely complex world.

For six decades, the outcome of the Washington Redskins' (now Commanders') last home game before the presidential election predicted who would win the presidency. If the team won, the incumbent party kept the White House. If they lost, the challenger prevailed. This "Redskins Rule" held from 1940 through 2000: sixteen elections in a row.

Football games don't influence elections. The correlation was meaningless noise that happened to persist, purely by chance, for an unusually long time. It finally broke down in 2004, as all such spurious correlations eventually do.

Germany has the "Mierscheid Law," which correlates the Social Democratic Party's vote share with crude steel production. Russia has had alternating bald and hairy leaders for nearly two hundred years. The Torah, if you search hard enough, appears to contain hidden codes predicting modern events.

None of these mean anything. When you examine enough relationships between enough variables, some will appear correlated by pure chance. The universe is very large, and coincidences are cheap.
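You can watch this happen on your own computer. The sketch below, with purely illustrative numbers, generates a couple of hundred completely independent random variables and then goes hunting for correlations between them; a handful always look convincing.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(42)

# 200 completely independent random "variables", 30 observations each --
# roughly the shape of many small observational datasets.
data = rng.normal(size=(200, 30))

# Scan every pair and keep the ones that look strongly correlated.
spurious = []
for i, j in combinations(range(len(data)), 2):
    r = np.corrcoef(data[i], data[j])[0, 1]
    if abs(r) > 0.6:
        spurious.append((i, j, round(r, 2)))

# With nearly 20,000 pairs, a few clear the bar by pure chance.
print(len(spurious), "pairs with |r| > 0.6 out of", 200 * 199 // 2)
```

None of those pairs mean anything; they are the statistical equivalent of the Redskins Rule, manufactured on demand.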

The Opposite Error

Here's where things get tricky. Knowing that correlation doesn't prove causation, some people conclude that correlations are worthless—that we should dismiss any evidence that isn't a randomized controlled trial.

This is just as wrong as the original fallacy.

The tobacco industry exploited this for decades. Cigarettes and lung cancer are correlated, they acknowledged, but correlation isn't causation! Maybe people genetically predisposed to lung cancer also happen to enjoy smoking. Maybe some hidden third variable causes both. Without a controlled experiment—randomly assigning people to smoke or not smoke for decades—we can't prove anything.

The statistician Ronald Fisher, one of the founders of modern statistics, actually made this argument publicly, casting doubt on the smoking-cancer link. He neglected to mention that he was being paid by tobacco companies.

The problem is that controlled experiments are often impossible or unethical. We can't randomly assign children to be abused to study its effects on academic performance. We can't randomly assign countries to different economic policies to see what happens. We can't wait for a controlled trial before acting on strong correlational evidence about public health threats.

Correlation isn't proof of causation, but it's evidence. When multiple correlational studies from different angles all point the same direction, when we have plausible biological mechanisms, when we can rule out obvious confounders, when the correlation is strong and consistent—at some point, reasonable people conclude that the link is real.

This is how medicine often works. The Bradford Hill criteria—named after the epidemiologist Austin Bradford Hill, who helped establish the smoking-cancer link—provide a framework for judging when correlational evidence is strong enough to support causal conclusions. They include factors like the strength of the association, its consistency across studies, biological plausibility, and whether there's a dose-response relationship.

The Smartphone Question

All of this matters profoundly when we try to understand how technology affects human wellbeing—particularly in children.

Studies consistently find correlations between smartphone ownership and various measures of psychological health in adolescents. But which way do the arrows point? Does heavy phone use cause anxiety and depression? Or do anxious, depressed teenagers seek out the numbing distraction of infinite scrolling? Or is there some third variable—family instability, perhaps, or socioeconomic stress—that increases both phone use and psychological distress?

The honest answer is that we don't know with certainty. We can't randomly assign children to grow up with or without smartphones and check back in twenty years. We're left with correlational evidence, which must be interpreted carefully.

Some researchers emphasize the correlations, arguing that the consistency of findings across studies and countries suggests a real causal relationship. Others point to methodological problems—small effect sizes, publication bias toward alarming findings, the impossibility of controlling for all confounders.

What we can say is that correlation, while not proof, is not nothing. It's a starting point, a reason to investigate further, a justification for precaution while better evidence accumulates. The tobacco industry was wrong to dismiss correlational evidence of smoking's dangers, but public health authorities would have been equally wrong to ignore it.

Living With Uncertainty

The real lesson of "correlation does not imply causation" isn't that we should dismiss all correlations. It's that we should think carefully about them.

When you encounter a claimed connection between two things, ask yourself: Could the causation run the other way? Could there be a third factor causing both? Could this be pure coincidence, especially if someone went looking for patterns in a large dataset? What would it take to rule out these alternatives?

Modern statistical methods like Granger causality tests and convergent cross mapping try to tease out causal relationships from correlational data. They don't provide certainty, but they help. So does the simple practice of considering alternative explanations before jumping to conclusions.
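As a purely illustrative example, here's how a Granger-style check might look in Python using the statsmodels library, on a toy pair of series where one genuinely leads the other. The data and coefficients are invented; the point is the question the test asks: do past values of one series help predict the other beyond its own history?

```python
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(1)

# Toy data: y today depends partly on x from yesterday, so x's past
# should help predict y -- exactly the pattern a Granger test looks for.
n = 500
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.normal()

# Column order matters: the test asks whether the SECOND column helps
# predict the FIRST beyond the first column's own past values.
data = np.column_stack([y, x])
results = grangercausalitytests(data, maxlag=2)

# A small p-value says x's past adds predictive power for y --
# suggestive of a causal link, but still not proof.
print(results[1][0]["ssr_ftest"])
```

Even a tiny p-value here only tells you that one series is useful for forecasting the other; a lurking third variable could still be pulling both strings.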

The medieval Europeans who trusted lice weren't stupid. They observed a genuine pattern and tried to explain it. Their mistake was stopping at the first explanation that seemed to fit, rather than asking whether the arrow might point the other direction.

We are still making the same mistake every day—in medicine, in economics, in education, in our personal lives. The difference is that now we know better. We know that our pattern-seeking brains will see causation where none exists. We know that correlation is evidence but not proof.

The question is whether we remember this in the moments when a tidy explanation presents itself, when the pattern seems so obvious, when the conclusion feels so right. That's when we need to hear the medieval lice whispering their warning from across the centuries: things are not always as they appear.

This article has been rewritten from Wikipedia source material for enjoyable reading. Content may have been condensed, restructured, or simplified.