H100 vs GB200 NVL72 Training Benchmarks - Power, TCO, and Reliability Analysis, Software Improvement Over Time
Frontier model training has pushed GPUs and AI systems to their absolute limits, making cost, efficiency, power, performance per TCO, and reliability central to the discussion on effective training. The Hopper vs Blackwell comparisons are not as simple as Nvidia would have you believe.
In this report, we start by presenting the results of benchmark runs across more than 2,000 H100 GPUs, analyzing model FLOPs utilization (MFU), total cost of ownership (TCO), and cost per 1M tokens trained. We also discuss energy use, examining the utility energy in Joules consumed for each token trained and comparing it to the annual energy usage of an average US household, which reframes power efficiency in a societal context. We then show how these results change when scaling the GPU cluster from 128 H100s to 2,048 H100s and across different versions of Nvidia software.
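To make these metrics concrete, here is a minimal Python sketch of how MFU, cost per 1M tokens, and utility Joules per token can be derived from a measured throughput. It is not the methodology used for the benchmarks in this report: the function names are hypothetical, the 6 x parameters FLOPs-per-token rule is a common approximation (for MoE models, active parameters would be used), and every number in the usage example is an illustrative placeholder rather than a measured result.

```python
JOULES_PER_KWH = 3.6e6  # 1 kWh = 3.6 million Joules


def mfu(tokens_per_sec, active_params, num_gpus, peak_flops_per_gpu):
    """Model FLOPs utilization: achieved training FLOP/s over peak FLOP/s.

    Assumes ~6 FLOPs per active parameter per token (forward + backward),
    ignoring attention FLOPs. This is an approximation, not an exact count.
    """
    achieved_flops_per_sec = 6.0 * active_params * tokens_per_sec
    return achieved_flops_per_sec / (num_gpus * peak_flops_per_gpu)


def cost_per_million_tokens(tokens_per_sec, num_gpus, tco_per_gpu_hour):
    """Dollars of cluster TCO spent per 1M tokens trained."""
    cluster_cost_per_sec = num_gpus * tco_per_gpu_hour / 3600.0
    return cluster_cost_per_sec / tokens_per_sec * 1e6


def joules_per_token(tokens_per_sec, num_gpus, utility_watts_per_gpu):
    """Utility energy per token; watts should include cooling and other overhead."""
    return num_gpus * utility_watts_per_gpu / tokens_per_sec


def household_years(total_tokens, j_per_token, household_kwh_per_year):
    """Training energy expressed in years of one household's consumption."""
    total_joules = total_tokens * j_per_token
    return total_joules / (household_kwh_per_year * JOULES_PER_KWH)


if __name__ == "__main__":
    # All inputs below are placeholders chosen for illustration only.
    tps = 1_000_000          # cluster-wide tokens per second (hypothetical)
    gpus = 128
    peak = 989e12            # assumed dense BF16 peak per H100 SXM, FLOP/s
    print("MFU:", mfu(tps, 8e9, gpus, peak))
    print("$/1M tokens:", cost_per_million_tokens(tps, gpus, 2.0))
    jpt = joules_per_token(tps, gpus, 1400.0)
    print("J/token:", jpt)
    # ~10,500 kWh/year is a rough assumed figure for a US household.
    print("Household-years:", household_years(1e12, jpt, 10_500))
```

Keeping each metric as its own function makes it easy to rerun the same arithmetic at different cluster sizes or software versions, since only the measured throughput and the cost and power inputs change.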
Later in this report, we analyze GB200 NVL72 benchmark results across Llama4 400B MoE and DeepSeek 670B MoE and compare them to our earlier H100 results. We discuss whether the GB200 NVL72's performance per dollar advantage survives once reliability issues are factored in.
Downtime from poor reliability and lost engineering time is one of the main factors we capture in our perf per TCO calculations. No large-scale training runs have yet been completed on GB200 NVL72, as software continues to mature and reliability challenges are worked through. This means that Nvidia’s H100 and H200, as well as Google TPUs, remain the only accelerators being used today to successfully complete frontier-scale training. As it stands, even the most advanced operators at frontier labs and CSPs are not yet able to carry out mega training runs on the GB200 NVL72.
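One simple way to fold downtime into a perf per TCO comparison is to scale raw throughput by a goodput fraction. The Python sketch below is an assumption-laden illustration, not the model used in this report: the function names are hypothetical, and it assumes each failure costs a fixed recovery time plus, on average, half a checkpoint interval of lost work. Lost engineering time is not modeled.

```python
def goodput_fraction(mtbf_hours, recovery_hours, checkpoint_interval_hours):
    """Share of wall-clock time that yields retained training progress.

    Per failure cycle: the job runs for mtbf_hours, loses on average half a
    checkpoint interval of work, then spends recovery_hours restarting.
    """
    useful = max(mtbf_hours - checkpoint_interval_hours / 2.0, 0.0)
    wall_clock = mtbf_hours + recovery_hours
    return useful / wall_clock


def perf_per_tco(tokens_per_sec, goodput, tco_per_hour):
    """Retained tokens per second for each dollar of hourly cluster TCO."""
    return tokens_per_sec * goodput / tco_per_hour


def relative_perf_per_tco(new_system, old_system):
    """Ratio > 1 means the new system keeps its advantage after downtime.

    Each argument is a (tokens_per_sec, goodput, tco_per_hour) tuple.
    """
    return perf_per_tco(*new_system) / perf_per_tco(*old_system)
```

Under this sketch, a system with higher raw throughput keeps its advantage only if its speedup multiplied by its goodput ratio exceeds its TCO ratio relative to the baseline.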
That said, every new architecture requires time for the ecosystem to ramp software that uses it effectively. The GB200 NVL72 ramp is slightly slower than prior generations, but not by much, and we are confident that GB200 NVL72 software will have improved considerably before the end of the year. Combined with frontier model architectures being codesigned with the larger scale-up world size in mind, we expect significant efficiency gains from using the GB200 NVL72 by the end of the year.
On the reliability front, there will continue to be significant challenges that Nvidia must work even closer with its partners to ...