
The new Blackwell architecture promises up to a 25x reduction in cost and energy consumption for LLM inference while delivering unprecedented performance for trillion-parameter models.
NVIDIA has once again raised the bar for AI hardware with the introduction of the Blackwell B200 GPU. Designed specifically for the demands of generative AI, Blackwell packs 208 billion transistors and is built on a custom 4NP TSMC process. The architecture can support models with up to 10 trillion parameters, a scale previously considered computationally prohibitive.
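To put that parameter count in perspective, a back-of-the-envelope calculation of the weight memory alone (ignoring activations, optimizer state, and KV caches) shows why precision matters at this scale. The bits-per-parameter figures below are standard format sizes, not Blackwell-specific measurements:

```python
# Back-of-the-envelope: weight memory alone for a 10-trillion-parameter
# model at common precisions. Bits-per-parameter are standard format
# sizes; activations, optimizer state, and KV caches are ignored.
PARAMS = 10e12  # 10 trillion parameters

for fmt, bits in [("FP16", 16), ("FP8", 8), ("FP4", 4)]:
    terabytes = PARAMS * bits / 8 / 1e12
    print(f"{fmt}: {terabytes:.0f} TB of weights")

# Prints: FP16: 20 TB, FP8: 10 TB, FP4: 5 TB
```

Halving precision from FP8 to FP4 cuts the footprint in half again, which is the headroom behind the Transformer Engine claim discussed next.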
One of the most significant innovations in Blackwell is the second-generation Transformer Engine, which introduces 4-bit floating point (FP4) precision. Because each value occupies half the bits of FP8, FP4 doubles both the effective compute throughput and the model size that fits in a given memory budget while maintaining high accuracy. In addition, the fifth-generation NVLink interconnect provides 1.8 TB/s of bidirectional throughput per GPU, enabling fast, low-latency communication across massive server clusters.
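The FP4 grid itself is tiny. A 4-bit float format such as E2M1 (the layout used in the OCP microscaling specification) can represent only eight magnitudes per sign, so values are typically rescaled per small block before rounding. The sketch below is an illustrative round-to-nearest quantizer, not NVIDIA's Transformer Engine implementation; the block scaling and grid values are assumptions based on the public E2M1 format:

```python
import numpy as np

# Magnitudes representable by the E2M1 4-bit float format
# (sign bit, 2 exponent bits, 1 mantissa bit): eight values per sign.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_block_fp4(block: np.ndarray) -> np.ndarray:
    """Illustrative block-scaled FP4 quantization: scale the block so its
    largest magnitude maps to FP4's maximum (6.0), round each value to
    the nearest representable point, then scale back."""
    max_mag = np.abs(block).max()
    scale = max_mag / FP4_GRID[-1] if max_mag > 0 else 1.0
    scaled = block / scale
    # Distance from each scaled value to every signed grid point.
    candidates = np.sign(scaled)[:, None] * FP4_GRID
    idx = np.abs(scaled[:, None] - candidates).argmin(axis=1)
    return candidates[np.arange(len(block)), idx] * scale

weights = np.array([0.02, -0.37, 1.25, -2.8])
print(quantize_block_fp4(weights))  # values snapped to the scaled FP4 grid
```

In hardware, the Transformer Engine manages such scale factors automatically at fine granularity; the point of the sketch is simply that FP4 is workable only because per-block scaling keeps real values on that coarse grid.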
Sustainability has also been a focal point. Jensen Huang, CEO of NVIDIA, highlighted that Blackwell can cut the energy consumed by LLM inference by up to 25 times compared to the Hopper-based H100. As global data centers face mounting pressure to go green, this efficiency gain is as crucial as the raw performance boost.
