Compute Performance (FP4 Tensor): ▲ 10x vs H100

40.0 PetaFLOPS

Combined FP4 performance from two Blackwell GPUs + Grace CPU superchip

GB200 Superchip: 40.0 PF

B200 (Single GPU): 20.0 PF

H100 Hopper: 2.0 PF
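The comparison above can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the three figures listed in this section; the dictionary and function names are illustrative. With these numbers, the "10x vs H100" headline matches the single-GPU B200 figure, while the full superchip works out to 20x.

```python
# FP4 throughput figures as listed in this section, in petaFLOPS.
fp4_pflops = {
    "GB200 Superchip": 40.0,
    "B200 (Single GPU)": 20.0,
    "H100 Hopper": 2.0,
}


def speedup_vs(baseline: str, table: dict) -> dict:
    """Return each entry's throughput as a multiple of the baseline entry."""
    base = table[baseline]
    return {name: value / base for name, value in table.items()}


ratios = speedup_vs("H100 Hopper", fp4_pflops)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.0f}x vs H100 Hopper")
```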

Memory System

384GB HBM3e

Combined across 2x 192GB GPUs; coherent with the Grace CPU over NVLink-C2C

16.0 TB/s Bandwidth

Interconnect & I/O

1.8 TB/s NVLink 5

Per-GPU bidirectional; 900 GB/s NVLink-C2C for CPU↔GPU

PCIe Gen 6.0 x16

Real-World Applications

Rack-Scale AI Training

The GB200 NVL72 connects 72 Blackwell GPUs via NVLink into a single 130 TB/s domain that behaves like one very large GPU. This enables up to 30x faster real-time inference for trillion-parameter models compared to H100-based systems, with 30 TB of fast memory per rack.
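The rack-level figures can be roughly reproduced from the per-GPU and per-CPU numbers on this page. The sketch below assumes 36 superchips per rack (72 GPUs at 2 GPUs per superchip) and assumes that "fast memory" counts both GPU HBM3e and the maximum Grace LPDDR5X capacity; neither assumption is stated explicitly in the text, and decimal units (1 TB = 1000 GB) are used throughout.

```python
# Per-device figures taken from this page.
GPUS_PER_RACK = 72
GPUS_PER_SUPERCHIP = 2
NVLINK_TBPS_PER_GPU = 1.8   # TB/s, bidirectional, per GPU
HBM_GB_PER_GPU = 192        # GB of HBM3e per GPU
LPDDR_GB_PER_CPU = 480      # GB of LPDDR5X per Grace CPU (listed as "up to")

# Assumption: one Grace CPU per superchip, so 36 CPUs per rack.
superchips = GPUS_PER_RACK // GPUS_PER_SUPERCHIP

nvlink_total_tbps = GPUS_PER_RACK * NVLINK_TBPS_PER_GPU
hbm_total_tb = GPUS_PER_RACK * HBM_GB_PER_GPU / 1000
lpddr_total_tb = superchips * LPDDR_GB_PER_CPU / 1000

# Assumption: "fast memory" means HBM3e plus LPDDR5X.
fast_memory_tb = hbm_total_tb + lpddr_total_tb

print(f"Aggregate NVLink bandwidth: {nvlink_total_tbps:.1f} TB/s")
print(f"HBM3e per rack: {hbm_total_tb:.1f} TB")
print(f"LPDDR5X per rack: {lpddr_total_tb:.1f} TB")
print(f"HBM3e + LPDDR5X: {fast_memory_tb:.1f} TB")
```

Under these assumptions the aggregate NVLink bandwidth comes to about 130 TB/s, and the combined memory lands near the quoted 30 TB.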

Unified CPU+GPU Inference

The Grace CPU's coherent NVLink-C2C connection to both Blackwell GPUs eliminates PCIe bottlenecks for CPU-GPU data transfer. This superchip architecture enables seamless model serving where preprocessing, tokenization, and inference happen without data copy overhead.
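To give a sense of scale for the CPU-GPU link, the sketch below estimates how long it takes to move a buffer at a given bandwidth. The 900 GB/s value is the NVLink-C2C figure from this page. The 256 GB/s comparison value is an assumption: the nominal bidirectional bandwidth of a PCIe 6.0 x16 link, not a figure from the source. The 10 GB payload is hypothetical, and real transfers will see lower effective throughput than these peak numbers.

```python
def transfer_ms(payload_gb: float, bandwidth_gb_per_s: float) -> float:
    """Idealized transfer time in milliseconds at peak bandwidth."""
    return payload_gb / bandwidth_gb_per_s * 1000.0


NVLINK_C2C_GB_PER_S = 900.0   # from this page
PCIE6_X16_GB_PER_S = 256.0    # assumption: nominal bidirectional PCIe 6.0 x16
PAYLOAD_GB = 10.0             # hypothetical batch of preprocessed inputs

c2c_ms = transfer_ms(PAYLOAD_GB, NVLINK_C2C_GB_PER_S)
pcie_ms = transfer_ms(PAYLOAD_GB, PCIE6_X16_GB_PER_S)

print(f"NVLink-C2C: {c2c_ms:.2f} ms")
print(f"PCIe (assumed): {pcie_ms:.2f} ms")
print(f"Ratio: {pcie_ms / c2c_ms:.2f}x")
```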

Large-Scale HPC Simulation

Grace Blackwell's unified memory architecture lets the CPU and GPUs share a coherent memory pool, which suits hybrid HPC workloads that mix traditional simulation with AI-accelerated analysis, from molecular dynamics to digital twin simulations.

Multi-Modal Foundation Models

With 384GB of combined HBM3e per superchip, the GB200 supports training and serving massive multi-modal models that process text, images, video, and audio simultaneously, reducing the need to split tensors across multiple nodes purely because of memory limits.
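As a rough guide to what 384GB holds, the sketch below estimates how many model parameters fit at different weight precisions. It counts weights only and ignores activations, KV cache, and optimizer state, so usable model sizes in practice are smaller. The bytes-per-parameter values follow from the bit widths (FP4 is 4 bits, FP8 is 8 bits, FP16 is 16 bits), and decimal gigabytes are assumed.

```python
HBM_BYTES = 384 * 10**9  # 384 GB of combined HBM3e, decimal units assumed

# Storage cost per parameter at each weight precision.
BYTES_PER_PARAM = {"FP4": 0.5, "FP8": 1.0, "FP16": 2.0}


def max_params_billions(hbm_bytes: int, bytes_per_param: float) -> float:
    """Upper bound on parameter count (in billions), weights only."""
    return hbm_bytes / bytes_per_param / 1e9


capacity = {
    fmt: max_params_billions(HBM_BYTES, size)
    for fmt, size in BYTES_PER_PARAM.items()
}

for fmt, billions in capacity.items():
    print(f"{fmt}: up to ~{billions:.0f}B parameters (weights only)")
```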

Full Technical Specifications

Configuration	1x Grace CPU + 2x Blackwell GPUs
GPU Transistors	208 Billion per GPU (4NP Process)
Grace CPU	72-Core Arm Neoverse V2
CPU Memory	Up to 480GB LPDDR5X
GPU Memory	384 GB HBM3e (2x 192GB)
GPU Memory Bandwidth	16.0 TB/s Combined
CPU↔GPU Interconnect	NVLink-C2C 900 GB/s
NVLink GPU↔GPU	1.8 TB/s per GPU
Form Factor	Liquid-Cooled Module (NVL72 Rack)
Thermal Design Power	~2700W (Full Superchip)

NVIDIA GB200 Grace Blackwell Superchip