Wikipedia Deep Dive

Kolmogorov complexity


Based on Wikipedia: Kolmogorov complexity

Here is a puzzle that sounds almost childish: What makes one string of characters more complicated than another? Consider these two sequences, each exactly thirty-two characters long:

abababababababababababababababab

4c1j5b2p0cv4w1x8rx2y39umgw5q85s7

You already know the first one is simpler. You can feel it. But can you prove it?

In the 1960s, a Soviet mathematician named Andrey Kolmogorov figured out how to make this intuition mathematically precise, and in doing so, he stumbled onto one of the deepest ideas in all of computer science. The answer involves programs, paradoxes, and the fundamental limits of what any machine can know about the universe.

The Shortest Program That Prints Your Name

The key insight is this: the complexity of something is the length of the shortest computer program that can produce it.

Think about those two strings again. To produce the first one, you could write a tiny program: "write ab 16 times." That instruction is just seventeen characters long. But to produce the second string? The shortest program is essentially just: "write 4c1j5b2p0cv4w1x8rx2y39umgw5q85s7" — which is thirty-eight characters, longer than the string itself once you add the command around it.

This is what we now call Kolmogorov complexity: the length of the shortest possible description of something. A simple pattern has low Kolmogorov complexity because you can describe it briefly. A random jumble has high Kolmogorov complexity because there's no way to compress it — you just have to write the whole thing out.
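To make this concrete, here is a toy illustration in Python, using Python expressions as the description language (the real definition uses programs for a fixed universal machine, but the asymmetry is the same):

```python
# Two descriptions of the same 32-character string. Evaluating either
# expression yields identical output, but one description is far shorter.
patterned_short = '"ab" * 16'                             # 9 characters
patterned_literal = '"abababababababababababababababab"'  # 34 characters

assert eval(patterned_short) == eval(patterned_literal)

# For the scrambled string there is no known generating rule, so the
# literal is essentially the shortest description we have.
random_literal = '"4c1j5b2p0cv4w1x8rx2y39umgw5q85s7"'     # 34 characters

print(len(patterned_short), len(patterned_literal), len(random_literal))
```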

The idea seems almost too simple to be profound. But watch what happens when you follow it to its logical conclusions.

The Programming Language Doesn't Matter (Much)

You might object: surely the complexity of something depends on what programming language you use? A description that's short in Python might be long in assembly code.

Kolmogorov anticipated this objection with an elegant theorem. Yes, different languages give different complexity values. But the difference is always bounded by a constant. Here's why.

Suppose you have a description of something in Language A. You can always convert it to Language B by writing an interpreter for Language A in Language B, and then appending your original description. The interpreter is a fixed-size program — it doesn't depend on what you're describing. So the difference between complexity in Language A and complexity in Language B is at most the size of that interpreter.

This means that when we talk about Kolmogorov complexity, we're really talking about something inherent to the object itself, not an artifact of our notation. The absolute numbers might shift depending on your choice of language, but the relative complexity of different objects stays the same, up to that constant offset.

Mathematicians express this by saying Kolmogorov complexity is "invariant up to an additive constant." Every equation in this field carries an implicit "plus or minus some fixed number of bytes" — a tolerance that acknowledges the arbitrariness of our description language while preserving the fundamental relationships.

Most Things Are Incompressible

Here's a fact that surprises most people when they first encounter it: almost all strings are incompressible.

Think about all possible strings of length n. There are 2^n of them. Now think about all possible programs shorter than n bits. There are fewer than 2^n of those — exactly 2^n − 1, counting every bit string of length less than n. Since each program produces at most one output, there simply aren't enough short programs to go around. Most strings of length n cannot have a description shorter than n.
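The counting can be checked directly; a quick sketch:

```python
n = 16

strings_of_length_n = 2 ** n       # all bit strings of length n
# Bit strings shorter than n bits: 2^0 + 2^1 + ... + 2^(n-1) = 2^n - 1.
programs_shorter_than_n = 2 ** n - 1

# Each program outputs at most one string, so at least one string of
# length n has no description shorter than itself.
assert programs_shorter_than_n < strings_of_length_n

# More generally, fewer than a 2^-(c-1) fraction of length-n strings can
# be compressed by c or more bits: there aren't enough short programs.
c = 10
print(f"at most {2 ** -(c - 1):.4%} of strings save {c}+ bits")
```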

This is why random data doesn't compress. ZIP files work by finding patterns — repeated sequences, statistical regularities — and encoding them more efficiently. But a truly random string has no patterns to exploit. Its Kolmogorov complexity is approximately equal to its length.

Conversely, the strings we encounter in daily life — text, images, video — are highly compressible precisely because they're not random. English text has redundancy: if you see "q," you know "u" is coming. Images have spatial correlation: nearby pixels are usually similar colors. These patterns allow compression algorithms to find short descriptions.
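This asymmetry is easy to observe with an ordinary compressor. zlib here is only a crude, computable stand-in for the uncomputable ideal, but the pattern shows through:

```python
import os
import zlib

# Highly patterned data: a repeated English phrase.
text = b"the quick brown fox jumps over the lazy dog. " * 50
# Patternless data: bytes from the OS entropy source.
noise = os.urandom(len(text))

compressed_text = zlib.compress(text, level=9)
compressed_noise = zlib.compress(noise, level=9)

print(f"text:  {len(text)} -> {len(compressed_text)} bytes")
print(f"noise: {len(noise)} -> {len(compressed_noise)} bytes")
# The repetitive text shrinks dramatically; the random bytes do not
# (zlib's fixed overhead typically makes them slightly larger).
```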

The Impossible Measurement

Now comes the dark twist in this story. Kolmogorov complexity is perfectly well-defined mathematically. Every string has a precise complexity value. But we can never compute it.

Not "it's difficult to compute" or "we don't have fast enough computers." It's provably impossible. No algorithm can exist that takes a string as input and outputs its Kolmogorov complexity.

The proof is a beautiful exercise in self-reference. Suppose such an algorithm did exist. Then consider this program:

Search through all strings in order of length. For each string, compute its Kolmogorov complexity. When you find a string whose complexity exceeds one million bits, print it and halt.

This program is relatively short — maybe a few thousand bits at most. But it outputs a string with complexity greater than one million. That means we've found a short description of something that, by definition, has no short description. Contradiction.
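The hypothetical program can even be written down. Everything below is ordinary Python except the oracle, which is exactly the piece that cannot exist:

```python
from itertools import count, product

def first_string_above(threshold, complexity):
    """Enumerate bit strings in order of length and return the first
    one whose reported complexity exceeds the threshold."""
    for length in count(1):
        for bits in product("01", repeat=length):
            s = "".join(bits)
            if complexity(s) > threshold:
                return s

# If `complexity` were a true Kolmogorov-complexity oracle, then
# first_string_above(1_000_000, complexity) would return a string whose
# shortest description exceeds a million bits -- yet this whole program
# is only a few hundred bytes long. That contradiction is the proof.

# With a toy stand-in (string length as "complexity") the search runs:
print(first_string_above(3, len))
```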

The flaw must be in our assumption. The complexity-computing algorithm cannot exist.

Chaitin's Incompleteness Theorem

Gregory Chaitin, one of the co-discoverers of Kolmogorov complexity, pushed this impossibility result even further. He showed that any formal mathematical system can only prove complexity bounds up to a certain limit — a limit determined by the complexity of the system itself.

Picture a proof system as a very elaborate computer program. It has axioms, inference rules, and a procedure for checking whether proofs are valid. This entire apparatus can be encoded as a program of some finite length — call it L.

Chaitin proved that such a system cannot prove statements of the form "this string has Kolmogorov complexity greater than L + C," where C is a fixed constant. The proof is similar to the one above: if the system could prove such statements, we could use it to find strings of high complexity with a short program, creating a contradiction.

This is a kind of incompleteness theorem, like Gödel's famous result, but with a different flavor. Gödel showed there are true statements that can't be proven. Chaitin showed there are complexity facts that can't be proven. In both cases, the limitation arises from self-reference — from what happens when mathematical systems try to reason about themselves.

Plain Versus Prefix-Free

There's a technical subtlety that researchers in this field care about deeply, even though it might seem like nitpicking at first glance.

Suppose you want to describe two objects together. In the naive approach, you'd concatenate their descriptions. But here's the problem: when the decoder receives this concatenated string, how does it know where the first description ends and the second begins?

You could include the length of the first description, but that takes extra bits. This overhead creates annoyances in the mathematics — inequalities that should be clean and simple acquire logarithmic correction terms.

The elegant solution is to require that no valid program is a prefix of any other valid program. This is called a prefix-free code. Telephone numbers work the same way: no complete number is a prefix of another, which is how the network knows you've finished dialing without needing an "end" signal. Prefix-free complexity applies the same principle to program descriptions.

With prefix-free complexity, the mathematics becomes cleaner. You can concatenate descriptions freely. Inequalities that held "up to logarithmic terms" now hold "up to a constant." This is why most modern research uses prefix-free complexity, even though plain complexity is more intuitive.
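A miniature prefix-free code makes the benefit tangible. Because no codeword begins another, concatenated messages decode unambiguously with no separators (the code table below is made up for illustration):

```python
# Toy prefix-free code: no codeword is a prefix of any other.
CODE = {"a": "0", "b": "10", "c": "110", "d": "111"}

# Sanity check of the prefix-free property.
words = list(CODE.values())
assert all(not x.startswith(y) for x in words for y in words if x != y)

def decode(bits: str) -> str:
    """Decode a concatenation of codewords with no separators."""
    inverse = {v: k for k, v in CODE.items()}
    out, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in inverse:  # a codeword can only end here
            out.append(inverse[buffer])
            buffer = ""
    assert buffer == "", "input ended mid-codeword"
    return "".join(out)

# "abcd" encodes to 0 + 10 + 110 + 111, concatenated with no gaps:
print(decode("010110111"))
```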

The Three Inventors

The history of Kolmogorov complexity is a case study in simultaneous discovery. Three researchers, working independently in different countries, converged on essentially the same idea within a few years of each other.

Ray Solomonoff, an American researcher interested in artificial intelligence and inductive reasoning, published the first version in 1960. He was trying to formalize the problem of prediction: given the first part of a sequence, what comes next? His insight was that simpler explanations should be weighted more heavily, and he needed a mathematical definition of "simpler."

Andrey Kolmogorov, already one of the twentieth century's greatest mathematicians, published his version in 1965. His interest was in defining randomness precisely. Informally, we say a sequence is random if it has no pattern. Kolmogorov made this rigorous: a sequence is random if its shortest description is approximately as long as the sequence itself.

Gregory Chaitin, an American-Argentine mathematician, developed the theory independently while still a teenager. His 1966 paper emphasized the connections to Gödel's incompleteness theorems and the fundamental limits of formal systems.

When Kolmogorov learned of Solomonoff's earlier work, he acknowledged the priority. But history has a way of attaching names to ideas somewhat arbitrarily. The complexity measure became "Kolmogorov complexity," while Solomonoff's name became attached to the related concept of algorithmic probability — the idea of assigning probabilities to outcomes based on the length of programs that generate them.

What This Means for Consciousness

If you're reading this because of its connection to questions about consciousness and artificial intelligence, here's the link. Kolmogorov complexity gives us a rigorous way to ask: how much information is actually in something?

A system that produces highly compressible outputs — things easily described by short programs — might be simpler than it appears. A chatbot that generates text by recombining patterns from its training data is doing something with low Kolmogorov complexity, in some sense. The outputs are predictable, compressible, describable.

Human consciousness, whatever it is, seems to generate behavior that's harder to compress. Our responses to novel situations don't obviously reduce to simple rules. Whether this observation can be made mathematically precise, and whether it actually distinguishes conscious from unconscious systems, remains deeply contested.

But Kolmogorov complexity at least gives us the vocabulary to ask the question precisely. It transforms "how complicated is this?" from a vague intuition into a mathematical quantity, even if that quantity turns out to be uncomputable.

The Deeper Pattern

Kolmogorov complexity connects to some of the most profound results in mathematics and computer science. It's intimately related to Turing's proof that no algorithm can decide whether an arbitrary program halts. It echoes Gödel's demonstration that no sufficiently powerful formal system can prove all truths about itself. It connects to Cantor's diagonal arguments about the uncountability of real numbers.

All of these results share a common structure: they arise when systems try to describe themselves, when the observer and the observed are somehow entangled. The limits they reveal aren't practical limitations we might overcome with faster computers or cleverer algorithms. They're structural features of mathematics and computation themselves.

Perhaps the most striking thing about Kolmogorov complexity is that it defines something we can never actually measure. We can always prove upper bounds ("this string has complexity at most n") simply by exhibiting a program of that length which prints it. But beyond a fixed threshold we can never prove lower bounds: we can never certify that no shorter program exists, because there might always be one we haven't discovered. It's a perfect definition of a quantity that exists but remains forever partly out of reach.

And yet this uncomputable quantity turns out to be exactly the right concept for formalizing our intuitions about simplicity, randomness, and information. Sometimes the most useful ideas are the ones we can't fully grasp.

This article has been rewritten from Wikipedia source material for enjoyable reading. Content may have been condensed, restructured, or simplified.