The GB200 NVL72 connects 72 Blackwell GPUs over NVLink into a single 130 TB/s domain that software can treat as one very large GPU. NVIDIA claims up to 30x faster real-time inference for trillion-parameter models compared to H100-based systems, with about 30 TB of fast memory (HBM3e plus LPDDR5X) per rack.
The Grace CPU connects coherently to both Blackwell GPUs over NVLink-C2C at 900 GB/s, so CPU-GPU data transfer does not go through PCIe. In this superchip architecture, preprocessing, tokenization, and inference can work on the same coherent memory, which avoids most explicit host-to-device copies during model serving.
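To put the interconnect claim in numbers, the following minimal sketch compares ideal CPU-to-GPU transfer times. The 900 GB/s figure comes from the spec table; the PCIe Gen5 x16 baseline of 128 GB/s (bidirectional) and the 10 GB payload are assumptions chosen for illustration. Both bandwidths are peak values, so real transfers will be slower.

```python
# Ideal transfer-time comparison. Bandwidths are peak bidirectional figures,
# not measured throughput.
NVLINK_C2C_GB_PER_S = 900.0     # from the spec table
PCIE_GEN5_X16_GB_PER_S = 128.0  # assumed baseline for comparison


def transfer_ms(payload_gb: float, bandwidth_gb_per_s: float) -> float:
    """Return the ideal time in milliseconds to move payload_gb."""
    if bandwidth_gb_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return payload_gb / bandwidth_gb_per_s * 1000.0


def speedup(fast_gb_per_s: float, slow_gb_per_s: float) -> float:
    """Return how many times faster the first link is than the second."""
    return fast_gb_per_s / slow_gb_per_s


payload_gb = 10.0  # hypothetical batch of preprocessed inputs
print(f"NVLink-C2C: {transfer_ms(payload_gb, NVLINK_C2C_GB_PER_S):.1f} ms")
print(f"PCIe Gen5:  {transfer_ms(payload_gb, PCIE_GEN5_X16_GB_PER_S):.1f} ms")
print(f"Ratio:      {speedup(NVLINK_C2C_GB_PER_S, PCIE_GEN5_X16_GB_PER_S):.1f}x")
```

Under these assumptions the link is roughly 7x faster than the PCIe baseline. The benefit in practice depends on how much of the pipeline is actually transfer-bound.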
Grace Blackwell's unified memory architecture lets the CPU and GPUs share a coherent memory pool. This suits hybrid HPC workloads that mix traditional simulation with AI-accelerated analysis, such as molecular dynamics and digital twin simulations.
With 384 GB of combined HBM3e per superchip, the GB200 supports training and serving large multimodal models that process text, images, video, and audio together. A model that fits in this memory does not have to be split across multiple nodes with tensor parallelism just to hold its weights.
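Whether a model fits depends mostly on parameter count and numeric precision. The sketch below estimates the weight footprint and the minimum number of superchips needed to hold it. The 384 GB figure comes from the spec table; the 80% usable fraction (leaving headroom for KV cache and activations) and the example model sizes are assumptions, not NVIDIA guidance.

```python
import math

HBM_PER_SUPERCHIP_GB = 384  # 2 x 192 GB HBM3e, from the spec table


def weight_footprint_gb(num_params: float, bits_per_param: int) -> float:
    """Return memory for the weights alone, in decimal GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


def superchips_needed(num_params: float, bits_per_param: int,
                      usable_fraction: float = 0.8) -> int:
    """Return the minimum superchips whose HBM can hold the weights.

    usable_fraction is an assumed share of HBM available for weights;
    the remainder is headroom for KV cache, activations, and buffers.
    """
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")
    usable_gb = HBM_PER_SUPERCHIP_GB * usable_fraction
    return math.ceil(weight_footprint_gb(num_params, bits_per_param) / usable_gb)


# Hypothetical models: (label, parameter count, bits per parameter)
for label, params, bits in [("70B @ FP16", 70e9, 16),
                            ("1T @ FP4", 1e12, 4),
                            ("1T @ FP16", 1e12, 16)]:
    gb = weight_footprint_gb(params, bits)
    print(f"{label}: {gb:.0f} GB weights, "
          f"{superchips_needed(params, bits)} superchip(s)")
```

By this estimate, a trillion-parameter model does not fit on one superchip even at 4-bit precision (500 GB of weights), which is why such models rely on the rack-scale NVLink domain.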
| Specification | GB200 Superchip |
| --- | --- |
| Configuration | 1x Grace CPU + 2x Blackwell GPUs |
| GPU Transistors | 208 billion per GPU (4NP process) |
| Grace CPU | 72-core Arm Neoverse V2 |
| CPU Memory | Up to 480 GB LPDDR5X |
| GPU Memory | 384 GB HBM3e (2x 192 GB) |
| GPU Memory Bandwidth | 16 TB/s combined |
| CPU↔GPU Interconnect | NVLink-C2C, 900 GB/s |
| NVLink GPU↔GPU | 1.8 TB/s per GPU |
| Form Factor | Liquid-cooled module (NVL72 rack) |
| Thermal Design Power | ~2,700 W (full superchip) |
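The rack-level figures quoted earlier follow from these per-superchip numbers. As a rough cross-check, this sketch scales the table to a 72-GPU NVL72 rack. The function name and dictionary keys are illustrative. The power total covers superchips only and leaves out NVLink switches and other rack components.

```python
# Per-superchip values taken from the spec table above.
GPUS_PER_RACK = 72
GPUS_PER_SUPERCHIP = 2
NVLINK_PER_GPU_TB_PER_S = 1.8
HBM_PER_SUPERCHIP_GB = 384
LPDDR_PER_SUPERCHIP_GB = 480   # "up to" figure
SUPERCHIP_TDP_W = 2700         # approximate


def rack_totals() -> dict:
    """Scale per-superchip specs to one NVL72 rack (decimal units)."""
    superchips = GPUS_PER_RACK // GPUS_PER_SUPERCHIP
    return {
        "superchips": superchips,
        "nvlink_domain_tb_per_s": GPUS_PER_RACK * NVLINK_PER_GPU_TB_PER_S,
        "hbm_tb": superchips * HBM_PER_SUPERCHIP_GB / 1000,
        "fast_memory_tb": superchips
        * (HBM_PER_SUPERCHIP_GB + LPDDR_PER_SUPERCHIP_GB) / 1000,
        "superchip_power_kw": superchips * SUPERCHIP_TDP_W / 1000,
    }


for name, value in rack_totals().items():
    print(f"{name}: {value:g}")
```

The NVLink total comes to 129.6 TB/s, which matches the rounded 130 TB/s domain figure. The combined HBM3e and LPDDR5X total lands near 31 TB using the maximum CPU memory configuration, roughly in line with the quoted 30 TB.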