A Close Look at SRAM for Inference in the Age of HBM Supremacy
Recent news about static RAM (SRAM) based accelerators has set off a flurry of discussion about memory on social media. SRAM is uniquely attractive because it avoids the use of high bandwidth memory (HBM) and chip-on-wafer-on-substrate (CoWoS) packaging, both of which are heavily supply constrained.
However, there is a lot of misunderstanding about what SRAM actually is and how it differs from the incumbent HBM solution. There are also misguided fears that SRAM will reduce demand for HBM in future AI accelerators. Even Jensen Huang was asked about SRAM vs. HBM. I posted a quick clarification on X covering some basic SRAM facts to help people cut through the noise, and it became my most viral post ever.
In this article, we will discuss the pros and cons of SRAM compared to HBM and provide objective perspectives on the role of each kind of memory for AI inference. We will compare SRAM and HBM across five categories: structure, scaling, capacity, bandwidth, and cost.
For this piece, I am joined by a co-author who writes his own publication on Substack. He is an expert in memory interfaces for AI accelerators and works for d-Matrix, building their next generation Raptor inference architecture. Subscribe to his Substack for deep insights into chip design from someone who has worked in semiconductors for two decades.
Here is a post outline:
SRAM Overview
Unit Cell Structure: 6T SRAM vs 1T1C DRAM
Process and Density Scaling
Capacity
Bandwidth Performance
SRAM Scaling Limits
For paid subscribers:
SRAM vs. HBM cost comparisons
Note: This article has been updated since it was originally published and corrections have been propagated to all sources, including downloadable digital content. Here is the published errata.
SRAM Overview
SRAM is a form of memory often found in processors that holds digital information without the need for constant refreshing. Once a 0 or 1 is written to an SRAM unit cell, it holds its state as long as power is available, unless intentionally changed.
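The "no refresh" property is easier to see next to a memory that does need refreshing. The following is a minimal toy sketch, not a circuit model: the class names `SramCell` and `DramCell`, their methods, and the 64 ms retention figure are all illustrative assumptions chosen for this example. It only captures the behavior described above: an SRAM bit survives any amount of elapsed time while powered, whereas a DRAM bit is lost if it is not refreshed before its stored charge leaks away.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SramCell:
    """Toy SRAM bit: holds its value while powered, with no refresh needed."""
    value: int = 0
    powered: bool = True

    def write(self, bit: int) -> None:
        if self.powered:
            self.value = bit

    def tick(self, ms: float) -> None:
        # Elapsed time has no effect on a powered SRAM cell.
        pass

    def power_off(self) -> None:
        self.powered = False

    def read(self) -> Optional[int]:
        # None models "state lost" once power is removed.
        return self.value if self.powered else None


@dataclass
class DramCell:
    """Toy DRAM bit: the stored charge decays unless refreshed in time."""
    retention_ms: float = 64.0  # illustrative assumption, not a device spec
    value: int = 0
    age_ms: float = 0.0

    def write(self, bit: int) -> None:
        self.value = bit
        self.age_ms = 0.0

    def tick(self, ms: float) -> None:
        self.age_ms += ms

    def refresh(self) -> None:
        # A refresh only helps if the charge has not already decayed.
        if self.age_ms <= self.retention_ms:
            self.age_ms = 0.0

    def read(self) -> Optional[int]:
        # None models a bit whose charge leaked away before a refresh.
        return self.value if self.age_ms <= self.retention_ms else None


if __name__ == "__main__":
    sram, dram = SramCell(), DramCell()
    sram.write(1)
    dram.write(1)
    sram.tick(1000.0)
    dram.tick(1000.0)
    print("SRAM after 1 s, powered, no refresh:", sram.read())
    print("DRAM after 1 s, no refresh:", dram.read())
```

In this sketch the SRAM cell only loses its bit when `power_off()` is called, while the DRAM cell must have `refresh()` called at least once per retention window to keep returning the written value.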
The use of SRAM in computing is as old as computers themselves. It is most commonly used in registers and in low-level caches, which are labeled L1 through L3 according to how close they sit to the processor (L1 is closest, L3 is farthest). The ...
This excerpt is provided for preview purposes. Full article content is available on the original publication.