The Good Judgment Project
Based on Wikipedia: The Good Judgment Project
The Amateurs Who Beat the CIA
Here's a remarkable fact: a group of ordinary people—retired government workers, software engineers, curious retirees—consistently outperformed professional intelligence analysts with access to classified information. By thirty percent. These volunteers had no security clearances, no spy satellites, no covert sources. They just made predictions about world events and tracked how often they were right.
This wasn't a fluke. It happened year after year in a government-sponsored tournament designed to find better ways to forecast geopolitical events. The group that pulled off this upset called themselves the Good Judgment Project, and what they discovered about human prediction has profound implications for how we think about expertise, intelligence, and the future.
The Intelligence Community's Prediction Problem
Governments spend billions on intelligence agencies. The Central Intelligence Agency, the National Security Agency, the Defense Intelligence Agency—these organizations employ thousands of analysts poring over satellite imagery, intercepted communications, and reports from agents in the field. Their job is to tell policymakers what's likely to happen next. Will North Korea test another nuclear weapon? Will the euro survive the Greek debt crisis? Will a particular country's government fall?
But there was a problem nobody wanted to talk about. Nobody was keeping score.
Intelligence assessments used slippery language. "There is a significant possibility that..." What does "significant" mean? Thirty percent? Sixty percent? When analysts predicted something "might" happen, they could claim vindication whether it happened or not. This made it impossible to know which analysts were actually good at their jobs and which were just confident-sounding.
In 2006, a psychologist named Philip Tetlock published a book called "Expert Political Judgment" that dropped a bomb on this cozy arrangement. He had spent twenty years collecting predictions from 284 experts—political scientists, economists, journalists, government officials—and tracking whether they came true. The results were devastating. The experts barely beat random chance. A dart-throwing chimpanzee would have done about as well.
But buried in Tetlock's data was something more interesting than expert failure. Some forecasters were substantially better than others. The question was: what made them different?
A Tournament to Find the Best Forecasters
The Intelligence Advanced Research Projects Activity—or IARPA, pronounced "eye-AR-puh"—is like DARPA's lesser-known sibling. Where DARPA funds advanced military technology, IARPA funds research to improve intelligence analysis. In 2011, they launched something called the Aggregative Contingent Estimation program, a mouthful that basically meant: "Let's have a forecasting tournament and see what works."
IARPA posed around a hundred to a hundred fifty questions each year about real geopolitical events. Not vague questions like "Will there be conflict in the Middle East?" but specific, verifiable ones with deadlines. "Will the president of Tunisia flee the country before March 1?" "Will the price of gold exceed two thousand dollars per ounce by December 31?" Questions you could score definitively as right or wrong.
Multiple research teams competed. Each team could use whatever methods they wanted—prediction markets, expert panels, statistical algorithms. The teams would be scored using something called the Brier score, a mathematical formula that rewards both accuracy and calibration. It's not enough to be right; you have to know how confident to be.
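To make the scoring concrete, here is a minimal sketch of a binary Brier score in Python. The function, the example forecasts, and the numbers are illustrative assumptions; the tournament reportedly used a multi-category variant averaged over every day a question stood open.

```python
def brier_score(forecast: float, outcome: int) -> float:
    """Squared gap between a probability forecast and what actually happened.

    forecast: probability assigned to the event occurring (0.0 to 1.0)
    outcome:  1 if the event occurred, 0 if it did not
    Lower is better: 0.0 is perfect, and a permanent "fifty-fifty"
    forecaster scores 0.25 on every question.
    """
    return (forecast - outcome) ** 2

# Saying "80 percent" and being right earns a low (good) score...
print(round(brier_score(0.80, 1), 2))   # 0.04
# ...while saying "80 percent" and being wrong is punished heavily.
print(round(brier_score(0.80, 0), 2))   # 0.64

# Averaging over many resolved questions gives a comparable track record.
resolved = [(0.7, 1), (0.2, 0), (0.9, 1), (0.6, 0)]
mean_brier = sum(brier_score(p, o) for p, o in resolved) / len(resolved)
print(round(mean_brier, 3))             # 0.125
```

The key property is that a confident forecast that turns out wrong costs far more than a cautious one, which is exactly the calibration pressure the tournament was designed to create.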
Tetlock entered the competition with two collaborators: Barbara Mellers, a decision scientist and his colleague at the University of Pennsylvania, and Don Moore, a professor at the University of California, Berkeley. They called their entry the Good Judgment Project.
The Wisdom of the Amateur Crowd
The Good Judgment Project took a counterintuitive approach. Instead of recruiting foreign policy experts—people with PhDs in international relations or decades at the State Department—they recruited talented amateurs. Curious, numerate people who enjoyed puzzles and didn't mind being proven wrong.
This might seem foolish. Wouldn't experts know more about, say, the political dynamics of the Syrian civil war? But Tetlock had learned from his earlier research that expertise often came bundled with overconfidence and ideological rigidity. Experts had reputations to protect and worldviews to defend. They made predictions that fit their theories rather than the evidence.
The amateurs had a different approach. They didn't have pet theories about how the world worked. They were willing to update their views when new information came in. And crucially, they treated forecasting as a skill to be improved, not a natural gift you either had or didn't.
The project gave these volunteers basic tutorials on forecasting best practices and cognitive biases. They learned about base rates—the background frequency of events. They learned about the planning fallacy—our tendency to underestimate how long things take. They learned to break big questions into smaller, more tractable ones.
Then the project let them loose on the IARPA questions.
Superforecasters Emerge
The Good Judgment Project won the first year of the tournament. They weren't just a little better than other teams—they were thirty-five to seventy-two percent more accurate. The second year, they won again. By the third year, IARPA stopped funding the other research teams entirely. The contest was over.
But something else emerged from those years of forecasting. The researchers noticed that some individual forecasters were dramatically better than others. Not just a little better—consistently, year after year, across many different types of questions. These people weren't famous experts. They were a pharmacist in Brooklyn, a filmmaker in Canada, a retired government worker in Virginia.
Tetlock called them "superforecasters."
What made them super? The researchers used personality tests and performance data to figure it out. Superforecasters shared certain traits. They were actively open-minded—willing to change their beliefs when evidence demanded it. They were numerate but not necessarily mathematical geniuses. They updated their forecasts frequently as new information came in, making many small adjustments rather than sticking stubbornly to their initial predictions.
Perhaps most importantly, they thought probabilistically. Where ordinary people might say "I think this will happen" or "I think this won't happen," superforecasters said "I think there's a sixty-three percent chance this will happen." That precision forced them to take their own beliefs seriously and made it possible to track whether their confidence was justified.
The Aggregation Algorithm
Individual superforecasters were impressive, but the real magic happened when their predictions were combined. The Good Judgment Project developed an aggregation algorithm that weighted each forecaster's prediction by their track record. Better forecasters got more influence. The pooled forecast was then "extremized," nudged closer to zero or one, to offset the way simple averaging washes out justified confidence. The result was a collective forecast that outperformed even the best individuals.
This is a variation on the "wisdom of crowds" phenomenon that the British scientist Francis Galton observed in 1906. At a county fair, he watched people guess the weight of an ox. Individual guesses were all over the map. But when Galton averaged all the guesses together, the crowd's answer was almost exactly right—better than any individual expert.
The same principle works for geopolitical forecasting, with a twist. You don't want to average everyone equally. You want to weight by track record. And you want to include forecasters who think differently from each other, because diversity of thought cancels out individual biases.
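A minimal sketch of what such an aggregation might look like, assuming a performance-weighted average followed by an extremizing transform; the weights, the exponent, and the forecaster numbers below are invented for illustration and are not the project's published algorithm.

```python
def aggregate(forecasts, weights, a=2.0):
    """Combine probability forecasts for a single yes/no question.

    forecasts: each forecaster's probability for the event (0..1)
    weights:   larger for forecasters with better track records,
               e.g. derived from their historical Brier scores
    a:         extremizing exponent; a > 1 pushes the pooled forecast
               away from 0.5 to offset the dilution caused by averaging
    """
    total = sum(weights)
    p = sum(w * f for w, f in zip(weights, forecasts)) / total  # weighted mean
    return p ** a / (p ** a + (1 - p) ** a)                     # extremize

# Three hypothetical forecasters; the third has the strongest record.
probs   = [0.70, 0.60, 0.85]
weights = [1.0, 0.8, 2.5]
print(round(aggregate(probs, weights), 3))   # about 0.917
```

The extremizing step exists because averaging many independent forecasts tends to drag the result toward 0.5 even when the group collectively holds strong evidence; pushing the pooled probability back toward the extremes recovers some of that lost confidence.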
By the final season of the tournament, the Good Judgment Project had distilled their methods down to a group of 260 superforecasters whose aggregated predictions were, according to reports, thirty percent more accurate than intelligence analysts with access to classified information.
Think about what that means. The intelligence community's advantage—secret sources, satellite imagery, intercepted communications—was worth less than disciplined probabilistic thinking and good cognitive hygiene.
The Superforecasting Book
In 2015, Tetlock and journalist Dan Gardner published "Superforecasting: The Art and Science of Prediction," which told the story of the Good Judgment Project and distilled its lessons. The Wall Street Journal called it "the most important book on decision making since Daniel Kahneman's Thinking, Fast and Slow."
That comparison is apt. Kahneman's book, published in 2011, explained the cognitive biases that distort human judgment—anchoring, availability, confirmation bias. Superforecasting offered a kind of antidote: here's how to actually make good predictions despite those biases.
The book became influential in circles that care about prediction: finance, intelligence, policy. But its lessons have been slow to penetrate everyday discourse, which still treats prediction as a matter of credentials or confidence rather than track record and calibration.
From Research Project to Commercial Enterprise
In July 2015, the research project spun off into a company called Good Judgment Inc. They now offer forecasting services to clients who want probabilistic predictions on questions that matter to them—geopolitical risks, market movements, technology trends.
They also run a public forecasting tournament called Good Judgment Open, where anyone can make predictions and build a track record. The questions range across geopolitics, finance, US politics, entertainment, and sports. It's like fantasy football for news junkies, except instead of picking players, you're picking probabilities.
The tournament serves a dual purpose. It identifies new forecasting talent—future superforecasters who might join the elite team. And it generates data for ongoing research into what makes some people better at prediction than others.
Why This Matters for AI Forecasting
If you're reading this because of an interest in AI forecasting—and given the connection to AI 2027, you probably are—the Good Judgment Project offers both hope and caution.
The hope: prediction is a skill, and it can be learned. The superforecasters weren't born with special abilities. They developed good habits and stuck to them. Anyone willing to track their predictions honestly and update their beliefs can get better at forecasting.
The caution: expertise is less valuable than you might think. The foreign policy experts in Tetlock's original study were often worse than amateurs because their knowledge came bundled with ideological commitments. AI experts may face the same trap. Those who have spent years developing a particular view of how AI will unfold may find it hardest to update when evidence contradicts their theories.
The Good Judgment Project also demonstrated that crowd wisdom—properly structured and weighted—outperforms individual genius. This suggests that when thinking about something as complex as AI development, we should be skeptical of any single narrative and pay attention to the aggregated views of people with strong forecasting track records.
Finally, and perhaps most importantly, the project showed that vague predictions are worthless. "AI might be dangerous" or "AI will be transformative" are not forecasts. A real forecast says: "I assign a twenty percent probability that an AI system will cause more than one billion dollars in damage before January 1, 2028." You can track that. You can score it. You can learn from being wrong.
The superforecasters didn't beat the CIA by being smarter or knowing more. They beat them by taking prediction seriously—by treating it as a practice that demands precision, humility, and constant improvement. That's a lesson worth remembering as we try to anticipate what artificial intelligence will do next.
The Advisory Board and Key Figures
The Good Judgment Project assembled an impressive collection of minds. The advisory board included Daniel Kahneman himself—the Nobel laureate whose work on cognitive biases provided the intellectual foundation for the whole enterprise. Robert Jervis, a political scientist who studied how leaders misperceive each other's intentions. J. Scott Armstrong, an expert on forecasting methodology. Michael Mauboussin, an investment strategist known for his work on decision-making under uncertainty.
The core research team included David Budescu, who studies how people communicate uncertainty. Lyle Ungar, a computer scientist who helped develop the aggregation algorithms. Jonathan Baron, who studies rationality and judgment. And Emile Servan-Schreiber, an entrepreneur who had worked on prediction markets.
This was not a group of amateurs studying amateurs. It was a rigorous scientific enterprise with some of the world's leading researchers on judgment and decision-making. The fact that they found ordinary people outperforming credentialed experts makes the result harder to dismiss.
The Opposite of Superforecasting
To understand what superforecasters do right, it helps to understand what bad forecasters do wrong.
Bad forecasters make vague predictions that can't be scored. "Tensions will rise in the Middle East." What does that mean? How would you know if it was wrong?
Bad forecasters are overconfident. They say things are certain when they're not. They're surprised by events that were actually quite probable.
Bad forecasters explain away their errors. When they're wrong, they find a reason it doesn't count: the fundamentals were right; something unexpected intervened. This prevents learning.
Bad forecasters update too little when evidence comes in. They anchor on their initial estimate and adjust insufficiently. Or they swing wildly between extremes, overreacting to the latest news.
Bad forecasters think in narratives rather than probabilities. They have a story about how the world works, and they interpret everything through that story. Evidence that fits the story is proof they're right. Evidence that contradicts it is explained away or ignored.
Bad forecasters are hedgehogs—they know one big thing and see everything through that lens. Superforecasters are foxes—they know many things and synthesize information from multiple sources and perspectives.
How to Become a Better Forecaster
The Good Judgment Project's research suggests several practices for improving your own forecasting ability.
First, make your predictions specific and time-bound. Not "I think AI will be transformative" but "I assign a forty percent probability that a large language model will pass the Turing test by January 2027." Write it down. Track it.
Second, think about base rates before considering specific details. Before predicting whether a particular startup will succeed, ask: what fraction of startups in this category succeed? Start with that number and adjust based on what makes this case different.
Third, break big questions into smaller ones. "Will AI cause an existential catastrophe?" is too big to answer directly. But you can estimate the probability that AI reaches certain capability thresholds, the probability of misalignment given those capabilities, the probability of catastrophe given misalignment. Then combine them, as in the sketch after this list.
Fourth, update frequently and incrementally. When new information comes in, don't ignore it, but don't overreact either. Make small adjustments. Track whether your updates are improving your accuracy.
Fifth, seek out disagreement. Find smart people who think differently and try to understand why. The goal isn't to win arguments but to discover what you might be missing.
Sixth, keep score honestly. Write down your predictions with probabilities. Check them when the events resolve. Calculate your Brier score over time. This is uncomfortable, but it's the only way to actually improve.
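To make the third and sixth practices concrete, here is a minimal sketch in Python. The decomposition, the probabilities, and the logged questions are all invented placeholders, not real forecasts; the point is only the mechanics of chaining conditional estimates and scoring yourself afterward.

```python
# Third practice: break a big question into smaller conditional estimates.
# Every number here is a placeholder, not a real forecast.
p_capability   = 0.40   # P(AI reaches a given capability threshold by some date)
p_misalignment = 0.30   # P(serious misalignment | that capability)
p_catastrophe  = 0.10   # P(catastrophe | misalignment)
p_overall = p_capability * p_misalignment * p_catastrophe
print(f"Chained estimate: {p_overall:.3f}")   # 0.012

# Sixth practice: keep score honestly. Log each forecast with a probability,
# resolve it later, and watch your average Brier score over time.
prediction_log = [
    # (question, forecast probability, outcome: 1 = happened, 0 = did not)
    ("Hypothetical model X released before 2026-01-01", 0.65, 1),
    ("Hypothetical benchmark Y saturated by mid-2025",  0.30, 0),
    ("Hypothetical lab Z announces a pause",            0.10, 0),
]
brier = sum((p - o) ** 2 for _, p, o in prediction_log) / len(prediction_log)
print(f"Average Brier score so far: {brier:.3f}")   # 0.074
```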
The Good Judgment Project showed that ordinary people can become extraordinary forecasters through practice and discipline. That's good news for anyone trying to navigate an uncertain future—whether the uncertainty is about elections, markets, or artificial intelligence.