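The H200's 141 GB of HBM3e lets larger models run entirely in GPU memory, reducing the need for model parallelism for models up to 70B parameters. At 16-bit precision the weights of a 70B model alone occupy about 140 GB, so single-GPU serving at that size typically relies on 8-bit or lower precision to leave room for the KV cache. The H200 delivers up to nearly 2x the inference throughput of the H100 for LLM workloads.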
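As a rough check on the 70B figure, the weight footprint is simply parameter count times bytes per parameter. The sketch below is a back-of-the-envelope estimate, not a sizing tool: the function name is illustrative, it counts weights only, and it ignores the KV cache, activations, and framework overhead, all of which need additional memory.

```python
def weight_footprint_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in decimal GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


HBM_CAPACITY_GB = 141  # H200 memory capacity from the spec table

# 70B parameters at two common serving precisions.
for label, bytes_per_param in [("FP16/BF16", 2), ("FP8/INT8", 1)]:
    weights = weight_footprint_gb(70e9, bytes_per_param)
    headroom = HBM_CAPACITY_GB - weights
    print(f"{label}: weights {weights:.0f} GB, headroom {headroom:.0f} GB")
```

At 16-bit precision the weights leave almost no headroom on a single GPU, while 8-bit weights leave roughly half the memory free for the KV cache and batching.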
Embedding tables for many production recommendation models can fit entirely in HBM3e memory, which reduces latency by eliminating round-trips to host memory. This makes the H200 well suited to real-time ad serving and content recommendation at hyperscale.
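Whether a given embedding table fits depends on its row count, embedding dimension, and element width. The following is a minimal sketch of that arithmetic; the table shape (500 million rows, 64 dimensions, FP32) is a hypothetical example rather than a figure from any real workload, and the estimate ignores optimizer state and the dense part of the model.

```python
def embedding_table_gb(num_rows: int, dim: int, bytes_per_value: int) -> float:
    """Size of one dense embedding table in decimal GB (1 GB = 1e9 bytes)."""
    return num_rows * dim * bytes_per_value / 1e9


HBM_CAPACITY_GB = 141  # H200 memory capacity from the spec table

# Hypothetical table: 500M rows, 64-dimensional, FP32 (4 bytes per value).
size_gb = embedding_table_gb(500_000_000, 64, 4)
print(f"table size: {size_gb:.0f} GB, fits in HBM: {size_gb < HBM_CAPACITY_GB}")
```

Halving the element width (for example FP16 instead of FP32) halves the footprint, which is one common way to bring a larger table within a single GPU's memory.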
The 4.8 TB/s of memory bandwidth accelerates memory-bound HPC workloads such as weather forecasting, seismic analysis, and computational chemistry simulations, with speedups of up to 110x over CPUs.
Train vision-language models and diffusion models with larger batch sizes thanks to the expanded memory capacity. The H200 eases memory constraints for training runs on datasets that combine text, images, and video.
| Specification | Value |
| --- | --- |
| GPU Architecture | NVIDIA Hopper |
| Transistor Count | 80 Billion (4N Process) |
| CUDA Cores | 16,896 |
| Tensor Cores | 4th Gen (528 cores) |
| Memory Capacity | 141 GB HBM3e |
| Memory Interface | 6144-bit |
| Memory Bandwidth | 4.8 TB/s |
| L2 Cache | 50 MB |
| Form Factor | SXM5 |
| Thermal Design Power | Up to 700 W (configurable) |
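The bandwidth figure in the table gives a simple upper bound for memory-bound LLM decoding: at batch size 1, each generated token requires reading all of the weights once, so tokens per second cannot exceed bandwidth divided by weight bytes. The sketch below assumes exactly that simplified model; the function name is illustrative, and real throughput will be lower because of KV cache reads, kernel overheads, and imperfect bandwidth utilization.

```python
MEMORY_BANDWIDTH_BYTES_PER_S = 4.8e12  # 4.8 TB/s from the spec table


def max_decode_tokens_per_s(num_params: float, bytes_per_param: float) -> float:
    """Bandwidth-only ceiling on single-sequence decode speed.

    Assumes every token reads all weights from HBM exactly once.
    """
    weight_bytes = num_params * bytes_per_param
    return MEMORY_BANDWIDTH_BYTES_PER_S / weight_bytes


# 70B-parameter model with 8-bit weights (1 byte per parameter).
ceiling = max_decode_tokens_per_s(70e9, 1)
print(f"bandwidth-bound ceiling: {ceiling:.1f} tokens/s per sequence")
```

Batching raises aggregate throughput because one pass over the weights serves several sequences at once, which is where the additional memory capacity helps.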