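Compute Performance (FP8 Tensor)

3.9 PetaFLOPS

FP8 Tensor Core performance with Transformer Engine (with 2:4 structured sparsity; about 2.0 PF dense)

H200 Hopper: 3.9 PF

H100 Hopper (SXM): 3.9 PF

The H200 is built on the same Hopper GPU as the H100 SXM, so peak Tensor Core throughput is unchanged. Its gains come from the larger, faster memory system.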

Memory System

141GB HBM3e

8-Hi HBM3e stacks / 5120-bit interface

4.8 TB/s Bandwidth
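A quick roofline calculation shows why bandwidth matters as much as peak compute. NVIDIA's 3.9 PF FP8 figure counts 2:4 structured sparsity, so this sketch assumes dense throughput is half of it. The kernel intensity of 2 FLOP/byte is an illustrative assumption for a batch-1 matrix-vector product over FP8 weights, where each weight byte feeds one multiply and one add.

```python
# Roofline sketch using the H200 figures quoted above.
# Assumption: the 3.9 PF FP8 number includes 2:4 structured sparsity,
# so dense throughput is approximated as half of it.
PEAK_FP8_SPARSE_FLOPS = 3.9e15  # FLOP/s
PEAK_FP8_DENSE_FLOPS = PEAK_FP8_SPARSE_FLOPS / 2
HBM_BANDWIDTH = 4.8e12  # bytes/s


def ridge_point(peak_flops: float, bandwidth: float) -> float:
    """Arithmetic intensity (FLOP/byte) above which a kernel becomes compute-bound."""
    return peak_flops / bandwidth


def attainable_flops(intensity: float, peak_flops: float, bandwidth: float) -> float:
    """Classic roofline: performance is capped by compute or by bandwidth * intensity."""
    return min(peak_flops, bandwidth * intensity)


if __name__ == "__main__":
    ridge = ridge_point(PEAK_FP8_DENSE_FLOPS, HBM_BANDWIDTH)
    print(f"ridge point: {ridge:.0f} FLOP/byte")
    # Batch-1 GEMV over FP8 weights: roughly 2 FLOPs per weight byte read (assumed).
    gemv = attainable_flops(2.0, PEAK_FP8_DENSE_FLOPS, HBM_BANDWIDTH)
    print(f"attainable at 2 FLOP/byte: {gemv / 1e12:.1f} TFLOP/s")
```

A kernel far to the left of the ridge point (about 406 FLOP/byte here) runs at a speed set by memory bandwidth, not by Tensor Core peak.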

Interconnect & I/O

900 GB/s NVLink 4

Total bidirectional bandwidth

PCIe Gen 5.0 x16
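To put the 900 GB/s NVLink figure in context, here is a bandwidth-only estimate of a ring all-reduce, the collective commonly used to sum gradients across GPUs. It assumes the 900 GB/s total splits evenly into 450 GB/s per direction and ignores latency and protocol overhead, so real libraries will land below this bound. The 1 GB buffer and 8-GPU count are illustrative.

```python
# Bandwidth-only estimate of a ring all-reduce over NVLink.
# Assumptions: 900 GB/s total bidirectional = 450 GB/s each way per GPU,
# no latency, no protocol overhead, sends and receives fully overlapped.
NVLINK_PER_DIRECTION = 900e9 / 2  # bytes/s


def ring_allreduce_seconds(message_bytes: float, num_gpus: int,
                           per_direction_bw: float = NVLINK_PER_DIRECTION) -> float:
    """In a ring all-reduce each GPU sends 2*(N-1)/N of the buffer."""
    if num_gpus < 2:
        return 0.0
    bytes_sent = 2 * (num_gpus - 1) / num_gpus * message_bytes
    return bytes_sent / per_direction_bw


if __name__ == "__main__":
    # 1 GB of gradients (e.g. 500M FP16 values) reduced across 8 GPUs.
    t = ring_allreduce_seconds(1e9, 8)
    print(f"{t * 1e3:.2f} ms")
```

The 2*(N-1)/N factor approaches 2 as the GPU count grows, so per-GPU traffic stays nearly constant while each link is kept busy.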

Real-World Applications

Large Language Model Inference

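The H200's 141 GB of HBM3e lets larger models be served from a single GPU's memory. A 70B-parameter model quantized to FP8 needs about 70 GB for its weights, leaving roughly as much again for KV cache, so it can run without being split across GPUs with model parallelism. NVIDIA cites up to nearly 2x the H100's inference throughput on LLM workloads such as Llama 2 70B.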
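The fit-in-memory argument can be checked with simple arithmetic: weights take parameters times bytes per parameter, and whatever remains holds the KV cache. This sketch assumes decimal gigabytes, a Llama-2-70B-like shape (80 layers, 8 KV heads, head dimension 128) with FP16 KV cache, and ignores activations and runtime overhead.

```python
# Back-of-envelope memory budget for serving an LLM on one 141 GB GPU.
# Assumptions: decimal GB; Llama-2-70B-like shape (80 layers, 8 KV heads,
# head_dim 128); activations and framework overhead ignored.
GPU_MEMORY_BYTES = 141e9


def weight_bytes(num_params: float, bytes_per_param: float) -> float:
    return num_params * bytes_per_param


def kv_bytes_per_token(num_layers: int, num_kv_heads: int, head_dim: int,
                       bytes_per_value: int) -> int:
    # Factor 2: each layer stores one key and one value vector per KV head.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value


def max_cached_tokens(num_params: float, bytes_per_param: float,
                      num_layers: int, num_kv_heads: int, head_dim: int,
                      kv_bytes_per_value: int,
                      memory_bytes: float = GPU_MEMORY_BYTES) -> int:
    """Tokens of KV cache that fit after the weights are loaded (0 if none)."""
    free = memory_bytes - weight_bytes(num_params, bytes_per_param)
    per_token = kv_bytes_per_token(num_layers, num_kv_heads, head_dim, kv_bytes_per_value)
    return max(0, int(free // per_token))


if __name__ == "__main__":
    shape = dict(num_layers=80, num_kv_heads=8, head_dim=128, kv_bytes_per_value=2)
    print("FP16 weights:", max_cached_tokens(70e9, 2, **shape), "tokens")
    print("FP8 weights: ", max_cached_tokens(70e9, 1, **shape), "tokens")
```

With FP16 weights only about 3,000 tokens of cache remain, while FP8 weights leave room for over 200,000, which is what makes single-GPU serving practical under these assumptions.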

Recommendation Systems at Scale

Larger embedding tables for production recommendation models can be held entirely in HBM3e, cutting latency by avoiding round-trips to host memory. This suits real-time ad serving and content recommendation at hyperscale.
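Whether a set of embedding tables fits in HBM comes down to rows times embedding dimension times bytes per value. The table sizes below are hypothetical, and `reserve_fraction` is an assumed headroom for the model's other buffers rather than a documented figure.

```python
from dataclasses import dataclass


@dataclass
class EmbeddingTable:
    rows: int
    dim: int
    bytes_per_value: int = 2  # FP16 by default (assumption)

    @property
    def nbytes(self) -> int:
        return self.rows * self.dim * self.bytes_per_value


def fits_in_hbm(tables, hbm_bytes: float = 141 * 10**9,
                reserve_fraction: float = 0.1):
    """Return (fits, total_bytes), keeping reserve_fraction of HBM free (assumed headroom)."""
    budget = hbm_bytes * (1 - reserve_fraction)
    total = sum(t.nbytes for t in tables)
    return total <= budget, total


if __name__ == "__main__":
    # Hypothetical tables: 400M and 600M IDs, 128-dim FP16 vectors.
    print(fits_in_hbm([EmbeddingTable(400_000_000, 128)]))
    print(fits_in_hbm([EmbeddingTable(600_000_000, 128)]))
```

The first table (102.4 GB) fits within the budget; the second (153.6 GB) would need sharding or host-memory offload.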

High-Performance Computing

The 4.8 TB/s memory bandwidth accelerates memory-bound HPC workloads such as weather forecasting, seismic analysis, and computational chemistry simulations. NVIDIA cites up to 110x faster time to results than CPU-only systems.
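For a memory-bound kernel, a lower bound on runtime is simply bytes moved divided by bandwidth. This sketch uses a STREAM-triad-style update, assuming FP64 arrays, no write-allocate traffic, and full peak bandwidth, none of which real code fully achieves.

```python
# Lower bound on runtime for a bandwidth-bound kernel, STREAM-triad style:
#   a[i] = b[i] + s * c[i]
# Assumptions: FP64 arrays, b and c read once, a written once (no
# write-allocate traffic), and the full 4.8 TB/s peak is achieved.
HBM_BANDWIDTH = 4.8e12  # bytes/s


def triad_bytes(n: int, elem_bytes: int = 8) -> int:
    # Two reads (b, c) plus one write (a) per element.
    return 3 * n * elem_bytes


def min_runtime_seconds(bytes_moved: float, bandwidth: float = HBM_BANDWIDTH) -> float:
    return bytes_moved / bandwidth


if __name__ == "__main__":
    n = 10**9
    t = min_runtime_seconds(triad_bytes(n))
    print(f"triad over {n:,} doubles: >= {t * 1e3:.1f} ms")
```

Because the triad does only two FLOPs per 24 bytes, its runtime scales with memory bandwidth, not with compute peak.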

Multi-Modal AI Training

Train vision-language and diffusion models with larger batch sizes thanks to the expanded memory capacity. The H200 supports training runs on datasets combining text, images, and video with fewer memory constraints.
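How much larger the batch can get depends on what is left after the model state. This sketch assumes mixed-precision Adam at about 16 bytes per parameter (FP16 weights and gradients plus FP32 master weights and two moments) and takes activation memory per sample as a hypothetical, model-dependent input. It ignores activation checkpointing, sharding, and framework overhead.

```python
# Rough largest batch that fits for mixed-precision Adam training on one GPU.
# Assumptions: 16 bytes of model state per parameter; activation bytes per
# sample supplied by the caller (hypothetical); no checkpointing or sharding.
BYTES_PER_PARAM_ADAM_MIXED = 2 + 2 + 4 + 4 + 4  # weights, grads, master, m, v


def max_batch_size(num_params: float, activation_bytes_per_sample: float,
                   memory_bytes: float) -> int:
    free = memory_bytes - num_params * BYTES_PER_PARAM_ADAM_MIXED
    if free <= 0:
        return 0
    return int(free // activation_bytes_per_sample)


if __name__ == "__main__":
    # Hypothetical 2B-parameter model with 3 GB of activations per sample.
    for label, mem in (("80 GB", 80e9), ("141 GB", 141e9)):
        print(label, "->", max_batch_size(2e9, 3e9, mem))
```

Under these assumptions the per-GPU batch grows from 16 to 36 samples, since the fixed 32 GB of model state is paid once and every extra gigabyte goes to activations.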

Full Technical Specifications

GPU Architecture	NVIDIA Hopper
Transistor Count	80 Billion (4N Process)
CUDA Cores	16,896
Tensor Cores	4th Gen (528 cores)
Memory Capacity	141 GB HBM3e
Memory Interface	5120-bit
Memory Bandwidth	4.8 TB/s
L2 Cache	50 MB
Form Factor	SXM5
Thermal Design Power	Up to 700 W (configurable)
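The core counts in the table divide evenly across the GPU's streaming multiprocessors. The SM count is not listed above; this sketch assumes 132 SMs, the same configuration as the H100 SXM5.

```python
# Per-SM breakdown of the counts in the specification table.
# Assumption: 132 streaming multiprocessors (SMs), as on the H100 SXM5;
# the SM count itself is not listed in the table.
NUM_SMS = 132
CUDA_CORES = 16_896
TENSOR_CORES = 528


def per_sm(total: int, sms: int = NUM_SMS) -> int:
    """Units per SM; raises if the total does not split evenly."""
    if total % sms:
        raise ValueError(f"{total} is not evenly divisible by {sms} SMs")
    return total // sms


if __name__ == "__main__":
    print("CUDA cores per SM:  ", per_sm(CUDA_CORES))    # 128
    print("Tensor Cores per SM:", per_sm(TENSOR_CORES))  # 4
```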

NVIDIA Hopper H200