In-memory processing
Based on Wikipedia: In-memory processing
The Bottleneck That Shaped Computing
Here's a peculiar fact about your computer: the processor, the brain of the machine, spends most of its time waiting. It's not thinking. It's not calculating. It's just sitting there, twiddling its metaphorical thumbs, while data crawls over from memory.
This isn't a flaw in any particular device. It's a fundamental tension built into how we've designed computers for decades. And increasingly, it's becoming intolerable.
The problem has a name: the von Neumann bottleneck, named after the Hungarian-American mathematician John von Neumann who, in 1945, laid out the architecture that still underlies almost every computer you've ever used. In this design, there's a clear separation between where you store data and where you process it. Data lives in memory. Processing happens in the central processing unit, the CPU. And between them? A narrow bridge that everything must cross, one piece at a time.
For most of computing history, this worked well enough. But we've reached an inflection point. The amounts of data we're trying to analyze have exploded. Artificial intelligence demands processing enormous datasets. And that narrow bridge between memory and processor? It's become a traffic jam of epic proportions.
In-memory processing is one answer to this problem. But here's where things get confusing: that term actually refers to two very different approaches, both trying to solve variations of the same fundamental issue.
Two Meanings, One Goal
When computer scientists talk about in-memory processing—sometimes called compute-in-memory or processing-in-memory—they mean something quite radical: what if we stopped shuttling data back and forth entirely? What if we could perform calculations right where the data already sits?
Think of it this way. In a traditional computer, if you want to add two numbers together, those numbers have to travel from memory to the CPU, get added in special holding areas called registers, and then the result travels back to memory. It's like having a chef who refuses to cook in the kitchen. Instead, every ingredient must be brought to a special cooking room, prepared there, and then the finished dish carried back. For a single meal, this is merely inconvenient. For a restaurant serving thousands of dishes per minute, it's chaos.
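To make that round trip concrete, here is a minimal sketch in Python. The dictionary stands in for memory, the local variables stand in for CPU registers, and the addresses and values are purely illustrative.

```python
# A toy model of the von Neumann round trip: operands travel from
# "memory" into "registers", the CPU does the arithmetic, and the
# result travels back. Addresses and values are made up.
memory = {0x10: 7, 0x14: 35, 0x18: None}

def cpu_add(dst, src_a, src_b):
    reg_a = memory[src_a]    # load: first trip across the bus
    reg_b = memory[src_b]    # load: second trip across the bus
    result = reg_a + reg_b   # compute: the only "real" work
    memory[dst] = result     # store: a third trip to write it back

cpu_add(0x18, 0x10, 0x14)
print(memory[0x18])  # 42, after three crossings of the bridge for one addition
```

Three trips across the narrow bridge for a single addition: that is the pattern the rest of this story is trying to escape.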
Computer scientists are exploring ways to do the cooking right in the kitchen—to add processing capabilities directly to memory itself. This might mean building simple calculation units right into the memory modules, or stacking layers of silicon with memory on some levels and processors on others, creating three-dimensional chips that blur the traditional boundaries.
Software engineers, meanwhile, use the same term to mean something quite different but philosophically related. When they talk about in-memory processing, they mean keeping an entire database in your computer's Random Access Memory—RAM—rather than on a traditional hard disk or solid-state drive.
RAM is volatile, meaning it forgets everything when you turn off the power. But it's also blazingly fast compared to permanent storage. Reading data from RAM can be orders of magnitude faster than reading from a hard disk. For certain applications, this speed difference changes everything.
The Speed of Thought
Let's put some numbers to this. Reading from a traditional spinning hard disk takes about ten milliseconds, which is a hundredth of a second. Sounds fast, right?
Reading from RAM takes about one hundred nanoseconds. That's one hundred billionths of a second.
The difference is a factor of roughly one hundred thousand. It's the difference between waiting one second and waiting a little over a day. Or, to use another analogy: if accessing RAM is like walking across your living room, accessing a hard disk is like driving from New York to Los Angeles.
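The arithmetic behind these comparisons is easy to check. The snippet below uses the same rough, order-of-magnitude latencies quoted above; real devices vary widely.

```python
# Back-of-the-envelope check of the latency gap described above.
disk_seek_s = 10e-3    # ~10 milliseconds for a spinning disk
ram_access_s = 100e-9  # ~100 nanoseconds for DRAM

ratio = disk_seek_s / ram_access_s
print(f"RAM is roughly {ratio:,.0f} times faster")             # ~100,000
print(f"1 second, scaled up: about {ratio / 3600:.0f} hours")  # ~28 hours
```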
This matters enormously for certain kinds of work. Imagine a call center where customer service representatives need to pull up complex customer histories while someone waits on the phone. Every second of delay compounds into frustrated customers and longer calls. Or consider a warehouse where inventory systems need to track thousands of items in real time, matching incoming orders against available stock, calculating optimal picking routes, updating quantities as workers move through the aisles.
With traditional disk-based databases, these queries might take seconds or even minutes. With in-memory databases, they can happen in milliseconds.
Why Disk Databases Dominated
If RAM is so much faster, why didn't we always keep databases in memory? The answer comes down to one word: cost.
For most of computing history, RAM was exorbitantly expensive compared to disk storage. A gigabyte of RAM might cost a hundred times more than a gigabyte of disk space. Since business databases can easily grow into the hundreds of gigabytes or terabytes—a terabyte being roughly a thousand gigabytes—keeping everything in memory simply wasn't economical.
There was also the volatility problem. RAM forgets everything when power is lost. Disks, whether spinning platters or solid-state drives, retain their data indefinitely. For a business that needs to preserve years of transaction records, customer histories, and financial data, this persistence was non-negotiable.
So engineers built their systems around disks. They developed sophisticated database systems—Oracle, MySQL, Microsoft's SQL Server, and many others—optimized for reading from and writing to permanent storage. They created clever caching strategies to keep frequently-accessed data in memory while letting rarely-used data sleep on disk. They invented techniques for organizing data to minimize the number of disk reads required for common queries.
But no matter how clever these optimizations became, they couldn't escape a fundamental truth: at some point, complex queries required reading from disk, and disk access was slow.
The OLAP Workaround
Business intelligence teams developed an elaborate workaround for this problem. They created what are called OLAP cubes—where OLAP stands for Online Analytical Processing.
The idea was clever. Instead of asking complex questions of raw data and waiting forever for answers, you would pre-calculate answers to common questions and store those summaries. Want to know total sales by region for each quarter? Don't query the millions of individual sales records. Instead, build a cube that already contains those aggregations, organized in a multi-dimensional structure that makes retrieval fast.
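A toy version of the idea, with invented sales records and a plain dictionary standing in for a real cube engine:

```python
from collections import defaultdict

# Hypothetical raw sales records; a real system would have millions.
sales = [
    {"region": "East", "quarter": "Q1", "amount": 1200.0},
    {"region": "East", "quarter": "Q2", "amount":  950.0},
    {"region": "West", "quarter": "Q1", "amount": 2100.0},
    {"region": "West", "quarter": "Q2", "amount": 1800.0},
]

# Build the "cube": pre-aggregate totals by (region, quarter) once.
cube = defaultdict(float)
for row in sales:
    cube[(row["region"], row["quarter"])] += row["amount"]

# Answering "total sales for West in Q1" is now a single lookup
# that never touches the raw records.
print(cube[("West", "Q1")])  # 2100.0
```

The catch, as the next paragraph explains, is that the cube only contains the answers someone thought to pre-compute.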
But cubes have their own problems. Designing a good cube is an elaborate process requiring specialized expertise. If business needs change and executives suddenly want to slice the data in a new way, someone has to redesign and rebuild the cube. This can take weeks or months. And cubes are really only good for the specific questions they were designed to answer. Ask something unexpected and you're back to querying the slow underlying data.
Information technology staff could spend enormous amounts of time on this kind of optimization work—building indexes, designing aggregations, constructing cubes, analyzing query performance. It was a constant battle against the fundamental slowness of disk access.
What Changed
Several forces converged to make in-memory processing practical. The most important was the inexorable march of Moore's Law.
Gordon Moore, one of the founders of Intel, observed in 1965 that the number of transistors on a chip was doubling at a remarkably steady pace, a rhythm later pegged at roughly every two years. This pattern held for decades, and it applied not just to processors but to memory as well. The price of RAM dropped precipitously. What cost a fortune in 1990 became affordable by 2000, cheap by 2010, and almost trivial by 2020.
The move to 64-bit computing was equally crucial. Earlier 32-bit systems could only address about four gigabytes of memory—a hard ceiling that made large in-memory databases impossible regardless of cost. When 64-bit systems became standard, that ceiling lifted to a theoretical maximum of sixteen exabytes, far more memory than anyone could practically install.
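Those ceilings fall straight out of the address width; a quick check (using binary prefixes, which is why the sixty-four-bit figure is usually quoted, a little loosely, as sixteen exabytes):

```python
# Maximum directly addressable memory for 32-bit and 64-bit addresses.
GIB = 2**30  # bytes in a gibibyte
EIB = 2**60  # bytes in an exbibyte

print(2**32 / GIB)  # 4.0  -> a 32-bit address space tops out at 4 GiB
print(2**64 / EIB)  # 16.0 -> a 64-bit address space tops out at 16 EiB
```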
Flash memory added another option. While not as fast as traditional RAM, flash memory is faster than spinning disks, and it retains data without power. For datasets too large to fit entirely in RAM, flash provided a middle ground—slower than memory but faster than disk, and more economical for very large datasets.
Column-oriented databases were a software innovation that complemented these hardware advances. Traditional databases organize data by rows—all the information about one customer, then all the information about the next customer. Column-oriented databases flip this, storing all the customer names together, then all the addresses together, and so on.
This might seem like a trivial difference, but it has profound implications for analytical queries. If you want to calculate the average order value across millions of customers, you only need to read the order value column. With row-based storage, you'd have to read past all the irrelevant data in each row—names, addresses, phone numbers, account creation dates—to get to the values you actually need. Column storage also compresses more efficiently, since similar data clusters together.
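A small, illustrative sketch of the two layouts (the records and field names are invented):

```python
# Row-oriented: each record keeps all of its fields together.
rows = [
    {"name": "Ada",   "city": "London",    "order_value": 120.0},
    {"name": "Linus", "city": "Helsinki",  "order_value":  80.0},
    {"name": "Grace", "city": "Arlington", "order_value": 200.0},
]

# Column-oriented: each field is stored as its own contiguous array.
columns = {
    "name":        ["Ada", "Linus", "Grace"],
    "city":        ["London", "Helsinki", "Arlington"],
    "order_value": [120.0, 80.0, 200.0],
}

# Average order value: the columnar layout reads one array, while the
# row layout walks past every other field in every record to get there.
print(sum(r["order_value"] for r in rows) / len(rows))            # 133.33...
print(sum(columns["order_value"]) / len(columns["order_value"]))  # 133.33...
```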
The Architecture in Practice
In an in-memory database system, data is loaded once from permanent storage into RAM when the system starts. After that, all queries run against the in-memory copy. The source database on disk only gets accessed during that initial load or when the data needs to be refreshed.
This is fundamentally different from caching, though the two concepts are often confused. A cache holds a subset of data—specific, frequently-accessed pieces chosen by the system to speed up common operations. An in-memory database holds everything, or at least everything needed for a particular analytical workload. Where a cache answers "what pieces of data are accessed most often?", an in-memory database answers "what if we just kept it all in memory?"
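Here is a minimal sketch of that load-once pattern using Python's built-in SQLite module; the file name, table, and numbers are illustrative stand-ins for a real source database.

```python
import sqlite3

# Illustrative setup: a small "durable" database on disk. In practice
# this file would already exist and be far larger.
disk_db = sqlite3.connect("inventory.db")
disk_db.execute("CREATE TABLE IF NOT EXISTS stock (item TEXT, warehouse TEXT, quantity INTEGER)")
disk_db.execute("DELETE FROM stock")
disk_db.execute("INSERT INTO stock VALUES ('widget', 'NYC', 250), ('gadget', 'NYC', 75)")
disk_db.commit()

# The load-once step: one bulk copy from disk into an in-memory database.
mem_db = sqlite3.connect(":memory:")
disk_db.backup(mem_db)
disk_db.close()

# From here on, every query runs against the copy in RAM; the disk file
# is only touched again when the data needs to be refreshed.
total = mem_db.execute(
    "SELECT SUM(quantity) FROM stock WHERE warehouse = ?", ("NYC",)
).fetchone()[0]
print(total)  # 325
```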
The benefits extend beyond raw speed. With disk-based systems, IT teams spend enormous effort on performance tuning. They create indexes to speed up common queries. They pre-aggregate data to avoid recalculating the same sums millions of times. They carefully design schemas to minimize disk access. All of this work becomes less critical—or unnecessary—when data lives in memory. The system is simply fast by default.
Many in-memory systems also offer visual, interactive dashboards. Business analysts can explore data, modify queries, drill down into details, all with near-instant response times. What previously required a ticket to IT and a multi-day wait can happen in a few clicks.
The Devices in Your Life
In-memory processing isn't just for enterprise data warehouses. It's embedded in many devices you use daily.
Your smartphone uses in-memory techniques constantly. When you switch between apps and they load instantly rather than taking several seconds, that's often because the app's working data stayed in memory. Game consoles like the PlayStation and Xbox rely heavily on in-memory processing to maintain the illusion of continuous, seamless worlds. The split-second responsiveness that modern games require would be impossible if every texture, every character model, every piece of level geometry had to be fetched from storage.
Fitness trackers and smartwatches process sensor data in memory to give you real-time feedback. The immediate display of your heart rate or step count requires fast in-memory processing of continuous data streams from the device's sensors.
Digital cameras, especially high-end ones, use in-memory processing for real-time image manipulation. When you see effects applied instantly in your viewfinder, or when the camera analyzes the scene to set exposure and focus, that's happening in memory. Smart TVs keep their interfaces snappy through similar techniques.
Voice assistants—Alexa, Siri, Google Assistant—benefit from in-memory processing when performing local operations, though the more complex understanding typically happens on distant servers.
The Hardware Revolution
Computer scientists are taking the concept even further, exploring ways to process data without moving it at all.
One approach, called processing-using-memory, adds limited computational capability directly to memory modules. Imagine a stick of RAM that can not only store numbers but also perform simple operations on them—multiplication, basic logical operations, copying data from one location to another—all without involving the CPU.
This isn't about replacing the processor. The CPU remains the sophisticated general-purpose engine that handles complex calculations. But for simple, repetitive operations on large amounts of data, having the memory do the work itself eliminates the bottleneck of moving data back and forth.
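The sketch below is a toy software model, not real hardware: it only illustrates the idea of a memory bank that accepts one command and does the repetitive work on its own contents, instead of streaming every element to the CPU and back.

```python
# Conceptual model of processing-using-memory. Real designs operate on
# DRAM rows and circuit-level tricks; this class just mimics the interface.
class ProcessingInMemoryBank:
    def __init__(self, data):
        self.cells = list(data)

    def bulk_add(self, constant):
        # The work happens "inside" the bank; the CPU only sends a command.
        self.cells = [value + constant for value in self.cells]

    def bulk_copy(self, src, dst, length):
        # Copy a block from one location to another without the data
        # ever leaving the module.
        self.cells[dst:dst + length] = self.cells[src:src + length]

bank = ProcessingInMemoryBank(range(8))
bank.bulk_add(10)        # one command instead of eight load/add/store trips
bank.bulk_copy(0, 4, 4)  # likewise for a block copy
print(bank.cells)        # [10, 11, 12, 13, 10, 11, 12, 13]
```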
Another approach, processing-near-memory, exploits advances in three-dimensional chip manufacturing. Instead of keeping processor and memory on separate chips, engineers are stacking thin layers of silicon on top of each other. Some layers hold memory. Others hold processing units. The layers connect through tiny vertical channels called through-silicon vias. Data travels vertically through the chip stack rather than horizontally across a circuit board, dramatically reducing distance and therefore latency.
These technologies are moving from research labs to real products. The AI and machine learning boom has accelerated this transition. Training large neural networks requires processing unfathomable amounts of data, and traditional architectures struggle with the data movement involved. In-memory and near-memory processing architectures are particularly well-suited to the massively parallel, data-intensive calculations that AI demands.
The Trade-offs
In-memory processing isn't universally superior to disk-based approaches. Like any engineering choice, it involves trade-offs.
RAM remains more expensive per gigabyte than disk storage, even after decades of price drops. For truly enormous datasets, measured in petabytes (a petabyte being roughly a million gigabytes), keeping everything in memory might be prohibitively expensive. Flash memory offers a middle ground, but even that costs more than traditional disk.
Security presents another consideration. When vast amounts of data sit readily accessible in memory, the attack surface expands. Anyone who gains access to the system potentially has access to everything, immediately. With disk-based systems, the sheer slowness of reading large amounts of data provides a kind of accidental protection—you'd notice someone trying to exfiltrate gigabytes of data because it would take a long time.
The volatility of RAM means in-memory systems need careful strategies for persistence. If the power fails, what happens to data that only exists in memory? Modern systems typically maintain copies on disk and implement recovery procedures, but this adds complexity.
And for data that changes rarely and is queried only occasionally, the speed benefits of in-memory processing might not justify the cost. A historical archive that gets accessed a few times per month doesn't need to live in expensive, power-hungry RAM.
The Direction of the Field
The fundamental physics favors in-memory approaches. Moving data costs energy. Moving data over longer distances costs more energy. And as chips have become more powerful, the energy cost of data movement has become an increasingly dominant factor in overall system power consumption.
This matters enormously for data centers, which already consume about one percent of global electricity. AI training runs, which can require weeks of computation on thousands of specialized processors, are particularly power-hungry. Any architecture that reduces data movement saves energy—and money.
The trend extends to edge computing, where processing happens close to where data is generated rather than in distant data centers. Autonomous vehicles, industrial sensors, medical devices—all benefit from processing data locally and quickly, which often means in memory.
We're also seeing the traditional boundaries between memory and storage blur further with technologies like Intel's now-discontinued Optane persistent memory, which combined the speed of memory with the persistence of storage. Though that particular product line ended, the concept of persistent memory—fast like RAM, permanent like disk—continues to develop.
John von Neumann designed the architecture that bears his name in 1945, when memory was a few thousand bits stored in glass tubes and processing happened in rooms full of vacuum tubes. The idea of separating storage and computation made perfect sense for that technology. Nearly eighty years later, we're finally reconsidering that fundamental choice—not abandoning it entirely, but finding clever ways to work around its limitations.
The processor that spends its time waiting may finally have less to wait for.