g factor (psychometrics)

Based on Wikipedia: g factor (psychometrics)

The Most Replicated Finding in Psychology

Here's something strange: give a child a math test, a vocabulary quiz, and a puzzle involving rotating shapes in their head, and their scores will be correlated. Kids who do well on one tend to do well on the others. Kids who struggle with one tend to struggle across the board.

This isn't a small correlation, and it isn't a statistical artifact. It's been called "arguably the most replicated result in all of psychology."

In 1904, an English psychologist named Charles Spearman noticed this pattern while examining school records. Children's grades in subjects that seemed to have nothing in common—Latin, music, mathematics—moved together. A student strong in one area was likely to be strong in others. Spearman proposed that something must be causing this. Some underlying factor must be influencing performance across all these different mental tasks. He called it "g," for general intelligence.

That lowercase italic letter has been controversial ever since.

What g Actually Is (and Isn't)

Let's be precise about what we're talking about, because the concept is often misunderstood.

The g factor is not a thing in your brain. It's not a gene, or a region, or a quantity of some mental substance. It's a mathematical construct—a statistical summary of correlations between different cognitive tests. When psychologists measure g, they're measuring how much of the variation between people on different mental tasks can be explained by a single underlying factor.

Typically, g accounts for 40 to 50 percent of the differences between individuals on any given cognitive test. That's substantial but not overwhelming: there's a real common thread running through mental abilities, but specific skills also matter a lot.

Here's another way to think about it: if you give a large group of people a battery of cognitive tests, their scores will form a pattern. People who score high on one test tend to score high on others. When you use a statistical technique called factor analysis to find the simplest explanation for this pattern, you get g—the single factor that best explains why all the tests correlate with each other.
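To make the idea concrete, here is a minimal sketch that pulls a single dominant factor out of a correlation matrix. It assumes a made-up matrix for four tests and uses the first principal component (found by power iteration) as a stand-in for g. Proper factor analysis estimates loadings somewhat differently, so the numbers are illustrative only. The test names, the matrix values, and the function name `first_component` are all hypothetical.

```python
import math

# Hypothetical correlations among four tests (illustrative values, not real data).
TESTS = ["vocabulary", "matrices", "arithmetic", "digit_span"]
R = [
    [1.00, 0.56, 0.48, 0.32],
    [0.56, 1.00, 0.42, 0.28],
    [0.48, 0.42, 1.00, 0.24],
    [0.32, 0.28, 0.24, 1.00],
]


def first_component(corr, iterations=200):
    """Return (loadings, share of total variance) for the first principal component."""
    n = len(corr)
    v = [1.0 / math.sqrt(n)] * n  # start from a unit-length vector
    eigenvalue = 0.0
    for _ in range(iterations):
        # Multiply by the matrix, then rescale to unit length (power iteration).
        w = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigenvalue = math.sqrt(sum(x * x for x in w))
        v = [x / eigenvalue for x in w]
    # Component loadings are the eigenvector scaled by sqrt(eigenvalue).
    loadings = [x * math.sqrt(eigenvalue) for x in v]
    return loadings, eigenvalue / n


loadings, share = first_component(R)
for name, loading in zip(TESTS, loadings):
    print(f"{name:12s} loading {loading:.2f}")
print(f"share of total variance: {share:.0%}")
```

Every test ends up with a positive loading on the same component, which is what "all the tests correlate with each other" looks like after the statistics are done.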

The scores we casually call "IQ" are essentially estimates of where someone stands on this g factor compared to everyone else. Full-scale IQ scores from standard tests correlate with g factor scores at around 0.95—nearly perfect overlap.

Some Tests Measure g Better Than Others

Not all mental tasks are equally good at capturing this general factor. Psychologists talk about a test's "g loading"—how strongly it correlates with the underlying general factor. These loadings range from about 0.10 (barely related) to 0.90 (almost a direct measure).
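Under a simple one-factor reading, squaring a test's g loading gives the share of that test's score variance attributable to g. A small sketch of the arithmetic, using the 0.10 and 0.90 endpoints plus 0.80 as an example of a high loading (the function name is hypothetical):

```python
def variance_share(g_loading: float) -> float:
    """Share of a test's variance attributable to g, assuming a one-factor model."""
    if not -1.0 <= g_loading <= 1.0:
        raise ValueError("a loading is a correlation and must lie in [-1, 1]")
    return g_loading ** 2


for loading in (0.10, 0.80, 0.90):
    print(f"loading {loading:.2f} -> {variance_share(loading):.0%} of variance")
```

A loading of 0.10 corresponds to about 1 percent of the variance, 0.80 to about 64 percent, and 0.90 to about 81 percent, so the gap between a weak and a strong measure of g is much larger than the raw loadings suggest.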

Raven's Progressive Matrices, a test where you look at patterns and figure out what comes next, has one of the highest g loadings at around 0.80. Vocabulary tests and general knowledge tests also load heavily on g. This might seem surprising—pattern recognition and word knowledge feel like very different abilities—but that's precisely the point. The fact that such different skills correlate is what makes g interesting.

There's a pattern in which tests load most heavily on g: complexity matters. Consider two versions of a digit span test. In the forward version, someone reads you a string of numbers and you repeat them back in order. In the backward version, you have to reverse them—if you hear "3, 7, 2," you say "2, 7, 3." The backward version is more complex, requires more mental manipulation, and has a higher g loading.

Similarly, reading comprehension loads more heavily on g than simply reading words aloud. Solving word problems loads more heavily than doing arithmetic calculations. Composing text loads more heavily than spelling. The more a task requires you to actively manipulate information rather than just retrieve or repeat it, the more it taps into g.

But here's something crucial: difficulty and g loading are not the same thing. You can have two tests that the same proportion of people fail—equally difficult in that sense—but with very different g loadings. A hard test of rote memory, for instance, has a much lower g loading than an equally difficult test of reasoning. Difficulty is about whether you get the right answer; g loading is about what mental resources you're using to get there.

Spearman's Journey from Measurement to Theory

Charles Spearman came to psychology through an unusual path. He was initially interested in the philosophy of science and the problem of measurement. How do you measure something you can't see? How do you know your measurements are accurate?

He became fascinated by Francis Galton's attempts to measure intelligence. Galton, Charles Darwin's half-cousin, had tried to find correlations between various physical and mental measures—reaction times, sensory acuity, academic performance—and mostly failed. People who could detect small differences in weight, for instance, weren't obviously smarter in other ways.

Spearman had an insight: maybe the problem wasn't the theory but the measurements. Any single test contains substantial measurement error. The same person taking the same test twice might get different scores. This noise obscures real relationships. If you could correct for measurement error, Spearman reasoned, you might find the correlations Galton was looking for.

He developed mathematical procedures to estimate these "true" correlations, techniques that became the foundation of what's now called classical test theory. When he applied these corrections to data on intelligence tests and sensory discrimination, the correlations jumped dramatically—approaching 1.0, meaning almost perfect correlation.
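Spearman's correction for attenuation divides the observed correlation by the geometric mean of the two tests' reliabilities. A minimal sketch, with made-up numbers and a hypothetical function name:

```python
import math


def disattenuate(r_observed: float, reliability_x: float, reliability_y: float) -> float:
    """Estimate the correlation between true scores from an observed correlation.

    Each reliability is the correlation of a test with itself across
    repeated administrations, so it must lie in (0, 1].
    """
    for rel in (reliability_x, reliability_y):
        if not 0.0 < rel <= 1.0:
            raise ValueError("reliabilities must lie in (0, 1]")
    return r_observed / math.sqrt(reliability_x * reliability_y)


# Illustrative values only: two noisy tests that correlate 0.30 as observed.
corrected = disattenuate(0.30, reliability_x=0.50, reliability_y=0.50)
print(f"observed 0.30 -> corrected {corrected:.2f}")
```

With reliabilities of 0.50 each, an observed correlation of 0.30 becomes an estimated true correlation of 0.60. The noisier the tests, the bigger the jump, which is why corrected values can approach 1.0 and occasionally overshoot it when reliabilities are underestimated.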

This led to his famous 1904 paper, "'General Intelligence,' Objectively Determined and Measured." He proposed a two-factor theory: every mental test measures a combination of general intelligence (g), which is the same for all tests, and a specific ability (s), which is unique to that particular test. If you could measure g directly, Spearman believed, you would have an objective, undisputed measure of intelligence.
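The two-factor theory can be written out as a short derivation. The notation is chosen here for illustration and is not Spearman's own: z_i is a standardized score on test i, a_i is that test's g loading, and g and the specific factors s_i are assumed to have unit variance and to be mutually uncorrelated.

```latex
z_i = a_i\, g + \sqrt{1 - a_i^2}\; s_i,
\qquad \operatorname{Corr}(g, s_i) = 0,
\quad \operatorname{Corr}(s_i, s_j) = 0 \ \ (i \neq j)

r_{ij} = \operatorname{Corr}(z_i, z_j) = a_i\, a_j \qquad (i \neq j)

r_{13}\, r_{24} - r_{14}\, r_{23}
  = (a_1 a_3)(a_2 a_4) - (a_1 a_4)(a_2 a_3) = 0
```

The last line is what Spearman called a tetrad difference. If g is the only thing the tests share, every such difference should be zero apart from sampling error, which is what made the original theory testable.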

The Theory Runs Into Trouble

It's a beautiful theory, but reality proved messier.

The first problem came from Cyril Burt, who worked in Spearman's tradition and would later succeed him at University College London. Burt gathered more data and found that the simple two-factor model didn't fit. Some pairs of tests correlated more strongly than they should if their only common factor was g. Tests of verbal ability, for instance, correlated with each other more than their shared g could explain. There seemed to be additional "group factors" beyond general intelligence.

Spearman knew about these problems as early as 1906, just two years after his landmark paper. His response was to argue that the troublesome tests weren't really distinct—that the extra correlation was an artifact of the tests being too similar. This wasn't convincing. Over the following decades, psychologists kept finding cases where the two-factor theory broke down.

Meanwhile, other psychologists proposed entirely different interpretations of the same data. Godfrey Thomson accepted that cognitive tests were correlated but rejected the idea of a single underlying g factor. Instead, he suggested that the mind was composed of many, many independent units or "bonds," and that any test sampled some subset of these bonds. Two tests would correlate if they happened to sample some of the same bonds—not because of any general factor. Edward Thorndike made similar arguments. The positive correlations between tests, they claimed, didn't prove g existed; they could arise from many overlapping specific abilities.
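Thomson's sampling argument is easy to simulate. The sketch below is a toy version under assumed parameters (200 bonds, each test drawing on 100 of them at random, 500 simulated people), none of which come from Thomson. Every bond is generated independently, so no general factor is built in anywhere.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def simulate_bonds(n_people=500, n_bonds=200, bonds_per_test=100, n_tests=4, seed=0):
    """Return the test-by-test correlation matrix from a bond-sampling model."""
    rng = random.Random(seed)
    # Each test taps its own random subset of the available bonds.
    test_bonds = [rng.sample(range(n_bonds), bonds_per_test) for _ in range(n_tests)]
    scores = [[] for _ in range(n_tests)]
    for _ in range(n_people):
        # A person is a set of independent bond strengths: no shared factor.
        bonds = [rng.gauss(0.0, 1.0) for _ in range(n_bonds)]
        for t, subset in enumerate(test_bonds):
            scores[t].append(sum(bonds[b] for b in subset))
    return [[pearson(scores[i], scores[j]) for j in range(n_tests)]
            for i in range(n_tests)]


correlations = simulate_bonds()
for row in correlations:
    print("  ".join(f"{r:5.2f}" for r in row))
```

Because any two tests overlap in roughly half their bonds, every pair comes out positively correlated, and a factor analysis of this matrix would happily report a general factor that the simulation never contained.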

This is a crucial point that often gets lost in discussions of intelligence. The statistical existence of g—the fact that you can extract a common factor from test correlations—doesn't tell you what g is. Is it a single underlying capacity? Many overlapping abilities? A mathematical artifact? The statistics alone can't answer that question.

Spearman Adapts and Overreaches

In 1927, Spearman published a comprehensive book, "The Abilities of Man," an attempt to defend and extend his theory. He now proposed a physical basis for g: it was "mental energy" flowing through the brain, while specific factors arose from distinct neural "engines." He also grudgingly accepted that group factors existed alongside g and the specific factors.

But here's where things get philosophically interesting. When critics continued to show that his original mathematical predictions didn't always hold, Spearman shifted his argument. He now claimed that g was proven by the mere existence of positive correlations between tests—what he called "the indifference of the indicator." Any mental test, he argued, would measure g to some degree, and the positive correlations proved this.

Critics noticed that this transformed g from a testable hypothesis into something unfalsifiable. Originally, Spearman's theory made specific predictions about the mathematical relationships between test correlations. If those predictions failed, the theory was wrong. But now he was saying that any positive correlations proved his point. That's not science; that's circular reasoning.

There was another technical problem. Edwin Wilson pointed out that Spearman's theory was mathematically "indeterminate." Given the same correlation data, you could generate different sets of factor scores that all fit equally well. The g factor wasn't unique; it was one of many possible mathematical solutions.

The Fragmenters Take Over

In 1938, Louis Thurstone attacked the problem from a new direction. He developed "multiple factor analysis," a technique for identifying not one but many factors underlying test correlations. His analysis of a large battery of tests identified up to thirteen distinct factors. He believed this conclusively refuted Spearman.

Thurstone proposed that instead of general intelligence, humans have seven "primary mental abilities": verbal comprehension, word fluency, number facility, spatial visualization, associative memory, perceptual speed, and reasoning. These were the real units of cognition, he argued, not some vague general factor.

But here's a twist that Thurstone didn't fully appreciate at first. When he analyzed his data assuming that his factors were completely independent—uncorrelated with each other—he got clean, distinct abilities. But this was an arbitrary choice. When he later allowed the factors to correlate with each other, which is more realistic, they did correlate. And when factors correlate, you can extract a higher-order factor explaining their correlation.

That higher-order factor looks a lot like g.

After Thurstone, psychologists went factor-analysis crazy. Joy Paul Guilford proposed a "Structure of Intellect" model with three dimensions (contents, products, and operations) that could be combined to yield 150 different abilities. Later revisions pushed this to 180. Lloyd Humphreys, surveying this proliferation, complained that psychologists had "lost sight of the general factor in intelligence."

A Synthesis Emerges

Raymond Cattell, who had studied under Spearman, proposed a middle path in 1941. Instead of one g or seven primary abilities, he suggested two major factors: fluid intelligence and crystallized intelligence.

Fluid intelligence, or Gf, is the ability to reason and solve novel problems. It's what you use when facing something you've never seen before and have to figure out on the fly. Crystallized intelligence, or Gc, is accumulated knowledge and skills—vocabulary, facts, learned procedures. It reflects what you've absorbed from your culture and education.

These two factors are correlated but distinct. Fluid intelligence peaks relatively early in life, often in the twenties, and gradually declines. Crystallized intelligence keeps growing well into middle age and beyond. Injuries and diseases affect them differently. They even have somewhat different brain correlates.

Cattell's framework evolved further. Working with his student John Horn, he identified additional broad factors: Gs for visual inspection speed, Ga for auditory processing, Gv for visual-spatial reasoning, Gq for quantitative reasoning. This became the Cattell-Horn theory, which proposed multiple broad abilities without necessarily committing to a single overarching g.

What Does g Predict?

The theoretical debates are fascinating, but there's a practical question: does g matter for anything in the real world?

The answer appears to be yes, substantially.

General cognitive ability, as measured by g-loaded tests, is one of the best predictors we have for performance in education and employment. The correlation isn't perfect—personality, motivation, opportunity, and specific skills all matter—but it's consistently positive across thousands of studies.

Job performance correlations with cognitive ability tests typically range from about 0.2 to 0.5, depending on the job's complexity. For highly complex jobs, the relationship is stronger. This makes intuitive sense: being able to learn quickly, reason through novel problems, and hold complex information in mind helps more when the job demands more.

Educational outcomes show similar patterns. Students with higher g tend to advance further in education and earn higher grades, controlling for other factors. Again, this isn't destiny—plenty of high-g individuals underperform and plenty of lower-g individuals exceed expectations—but the statistical relationship is robust.

There are also biological correlates. Brain size shows a modest positive correlation with g, around 0.3 to 0.4. This doesn't mean bigger brains are always smarter—there's enormous variation—but on average, across large populations, there's a relationship. The white matter connections between brain regions also seem to matter; more efficient neural communication correlates with higher g.

Behavioral genetics research has found that g is substantially heritable. In twin studies, identical twins' IQ scores correlate more highly than fraternal twins' scores, and adopted children's scores correlate more with their biological parents' scores than with their adoptive parents'. Heritability estimates typically range from 50 to 80 percent in adults, though these numbers depend heavily on the population being studied. In environments where everyone has similar opportunities, genetic differences explain more of the variation; in environments with large disparities, environmental factors matter more.
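One classical back-of-the-envelope way to turn twin correlations into a heritability figure is Falconer's formula, which the text above does not name. The sketch below assumes additive genetic effects and equally similar environments for both twin types. The input correlations are invented for illustration, and modern studies fit fuller models than this.

```python
def falconer_estimates(r_identical: float, r_fraternal: float) -> dict:
    """Rough variance decomposition from twin correlations (Falconer's formula)."""
    return {
        # Identical twins share about twice as much segregating DNA as fraternal twins.
        "heritability": 2 * (r_identical - r_fraternal),
        # Similarity that genes cannot account for is credited to the shared home.
        "shared_environment": 2 * r_fraternal - r_identical,
        # Whatever makes identical twins differ: unique experience plus measurement error.
        "nonshared_environment": 1 - r_identical,
    }


# Hypothetical correlations, chosen only to make the arithmetic easy to follow.
example = falconer_estimates(r_identical=0.85, r_fraternal=0.60)
for component, value in example.items():
    print(f"{component}: {value:.2f}")
```

With these made-up inputs the estimate is 0.50 for heritability, 0.35 for shared environment, and 0.15 for everything else. The three shares always sum to 1, which is a reminder that the method divides up variation within one studied population rather than measuring anything fixed about an individual.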

The Critics Have Points

Not everyone thinks g deserves its central place in psychology.

Stephen Jay Gould, the famous evolutionary biologist, argued in "The Mismeasure of Man" that g was a "reified" construct—a mathematical abstraction that had been mistakenly treated as a real thing. Factor analysis, Gould pointed out, doesn't discover preexisting factors in nature. It's a mathematical technique that summarizes correlations. You could factor-analyze anything—basketball statistics, consumer preferences, political attitudes—and extract "general factors." That doesn't mean those factors correspond to real underlying causes.

Other critics argue that focusing on g devalues other important abilities. Howard Gardner proposed "multiple intelligences"—linguistic, logical-mathematical, spatial, musical, bodily-kinesthetic, interpersonal, intrapersonal, naturalistic—arguing that conventional IQ tests capture only a narrow slice of human cognitive capacity. Robert Sternberg proposed a "triarchic theory" distinguishing analytical, creative, and practical intelligence. These theories haven't held up as well empirically as their proponents hoped, but they reflect a real concern: that our measures of intelligence are too narrow.

There's also the question of cultural bias. IQ tests were developed in Western, educated, industrialized societies. The skills they measure—abstract reasoning, vocabulary, processing speed—are the skills those societies value and cultivate. It's not obvious that they would capture intelligence as it manifests in other cultural contexts. A subsistence farmer in the Amazon might have extraordinary spatial memory, botanical knowledge, and practical problem-solving abilities that wouldn't show up on a standard IQ test.

Perhaps most troubling is the history. Intelligence testing has been used to justify horrific policies—forced sterilization, immigration restrictions, educational segregation. The tests themselves may not be inherently biased, but they've been deployed by biased people for biased purposes. This history makes many people, understandably, suspicious of the entire enterprise.

What We Know and Don't Know

After more than a century of research and debate, where do we stand?

We know that cognitive tests consistently correlate positively with each other. This is one of the most robust findings in psychology. We know that a single factor—g—can explain a large portion of this correlation. We know that g-loaded tests predict real-world outcomes like educational attainment and job performance. We know that g is substantially heritable and has biological correlates.

What we don't know is what g actually is. Is it a single underlying capacity, some kind of neural efficiency that affects all mental operations? Is it many overlapping abilities that happen to cluster together? Is it an artifact of how we design and administer tests? The data doesn't distinguish between these possibilities.

We also don't know why the correlations exist. Why should verbal ability and spatial reasoning and processing speed all be related? There are theories—maybe they all depend on working memory capacity, or neural processing speed, or the quality of myelination in white matter tracts—but none has been conclusively proven.

And we don't know how to meaningfully improve g. Educational interventions can boost test scores temporarily, but gains often fade and don't transfer to other measures. The disappointing results of early childhood programs like Head Start, where IQ gains disappear within a few years of the program ending, suggest that whatever g is, it's stubbornly resistant to intervention. Or perhaps our interventions just haven't been good enough.

The Connection to AI Testing

There's an ironic parallel between measuring human intelligence and measuring artificial intelligence. In both cases, we use batteries of tests. In both cases, performance across tests tends to be correlated—AI systems that do well on one benchmark tend to do well on others. In both cases, we're not entirely sure what we're measuring.

When you give an AI a job interview, as it were, you're facing the same fundamental problem Spearman faced in 1904. You can measure performance on specific tasks, but you want to know something more general: how capable is this system? How will it perform on tasks you haven't thought to test?

The positive correlations between test performances—whether in humans or AIs—suggest there's something general going on. But the existence of correlations doesn't tell you what that something is. Is there an "AI g factor"? What would it mean if there were?

Just as with human intelligence, the tests we use shape what we find. A test battery designed to measure verbal abilities will find a verbal factor. A battery focused on reasoning will find a reasoning factor. The "general factor" that emerges depends on what you put in. This is as true for AI benchmarks as for human IQ tests.

After a century of studying human intelligence, we've learned that g is real as a statistical phenomenon and practically important as a predictor, but still mysterious as a scientific explanation. As we try to measure the intelligence of artificial minds, we might do well to remember that lesson.

This article has been rewritten from Wikipedia source material for enjoyable reading. Content may have been condensed, restructured, or simplified.