Train frontier-scale models with 192GB of HBM3e per GPU. For models above 70B parameters, the B200's FP4 Tensor Core throughput lets training runs that previously required up to five times as many H100s run on a much smaller cluster, which reduces both cluster size and operational cost.
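To make the capacity figures concrete, here is a rough back-of-the-envelope sketch. It assumes a dense model and counts only weights, plus a common 16-bytes-per-parameter rule of thumb for mixed-precision Adam training state. It ignores activations, KV caches, and FP4 block-scale overhead, and the helper names are hypothetical.

```python
# Rough memory estimate for model weights and Adam training state.
# Assumptions (illustrative): dense model, no activations, no sharding,
# and FP4 block-scale overhead is ignored. Function names are hypothetical.

BYTES_PER_PARAM = {
    "fp32": 4.0,
    "bf16": 2.0,
    "fp8": 1.0,
    "fp4": 0.5,  # two 4-bit values packed per byte
}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """GB (1 GB = 1e9 bytes) needed just to hold the weights in `dtype`."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def adam_training_memory_gb(num_params: float) -> float:
    """Mixed-precision Adam rule of thumb: 16 bytes per parameter
    (bf16 weights + bf16 grads + fp32 master weights + two fp32 moments)."""
    return num_params * (2 + 2 + 4 + 4 + 4) / 1e9


if __name__ == "__main__":
    params = 70e9  # a 70B-parameter model
    for dtype in ("bf16", "fp8", "fp4"):
        print(f"{dtype}: {weight_memory_gb(params, dtype):.0f} GB of weights")
    print(f"Adam training state: {adam_training_memory_gb(params):.0f} GB")
```

For a 70B-parameter model this gives 140 GB of BF16 weights, 70 GB at FP8, and 35 GB at FP4, while full Adam state is roughly 1,120 GB. Training at that scale is therefore still sharded across several GPUs; larger per-GPU memory mainly reduces how many are needed.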
Serve production LLM workloads at scale with the second-generation Transformer Engine and native FP4 quantization. For latency-sensitive applications such as real-time chat and code completion, a single B200 node can deliver inference throughput comparable to an entire H100 rack.
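The following is a minimal sketch of what block-scaled FP4 quantization does numerically. It assumes the E2M1 element format (representable magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, 6) with one plain floating-point scale per block of 16 values. Hardware formats such as MXFP4 and NVFP4 encode the scale differently and use round-to-nearest-even, so this pure-Python version only simulates the rounding error and does not pack bits. The function names are hypothetical.

```python
# Simulated ("fake") block-scaled FP4 E2M1 quantization.
# Assumptions: block size 16 and a float scale per block are illustrative;
# ties round toward the smaller magnitude here, unlike hardware RNE.

# All non-negative magnitudes representable in FP4 E2M1.
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def _nearest_e2m1(magnitude: float) -> float:
    """Return the E2M1 grid value closest to `magnitude` (first one on ties)."""
    return min(E2M1_GRID, key=lambda g: abs(g - magnitude))


def quantize_fp4_blockwise(values: list[float], block_size: int = 16) -> list[float]:
    """Quantize each block to scaled E2M1 values and return the dequantized list."""
    out: list[float] = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        # Scale so the block's largest magnitude maps to 6.0, the E2M1 maximum.
        amax = max(abs(v) for v in block)
        scale = amax / E2M1_GRID[-1] if amax > 0 else 1.0
        for v in block:
            q = _nearest_e2m1(abs(v) / scale) * scale
            out.append(q if v >= 0 else -q)
    return out


if __name__ == "__main__":
    data = [0.03, -0.41, 0.77, 1.2, -2.5, 0.0, 3.3, -0.9]
    print(quantize_fp4_blockwise(data, block_size=4))
```

Because each block is rescaled independently, an outlier in one block does not cost precision in the others, which is the main reason block scaling is paired with such a narrow element format.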
Accelerate molecular dynamics, climate modeling, and computational fluid dynamics. With 8 TB/s of HBM3e memory bandwidth and the fifth-generation NVLink interconnect, multi-GPU simulation workloads run up to 4x faster than on the previous generation.
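Many stencil-style simulation kernels are memory-bound, so a useful lower bound on the time for one sweep over a grid is simply the bytes moved divided by the memory bandwidth. The sketch below assumes each cell is read and written exactly once per sweep and that peak bandwidth is fully achievable, which real kernels do not reach. The function name and the example bandwidths are assumed inputs, not measured results.

```python
# Ideal (lower-bound) time for one memory-bound sweep over a simulation grid.
# Assumptions: each cell is read and written exactly once per sweep, and the
# kernel sustains peak bandwidth. The function name is hypothetical.


def sweep_time_ms(cells: int, bytes_read_per_cell: int,
                  bytes_written_per_cell: int, bandwidth_tb_s: float) -> float:
    """Milliseconds needed to stream one sweep's data at `bandwidth_tb_s`."""
    total_bytes = cells * (bytes_read_per_cell + bytes_written_per_cell)
    return total_bytes / (bandwidth_tb_s * 1e12) * 1e3


if __name__ == "__main__":
    # Hypothetical 512^3 double-precision grid: one 8-byte read, one 8-byte write.
    cells = 512 ** 3
    for bw in (3.35, 8.0):  # example peak bandwidths in TB/s (assumed inputs)
        print(f"{bw} TB/s: {sweep_time_ms(cells, 8, 8, bw):.3f} ms per sweep")
```

Plugging in the aggregate bandwidth quoted above gives a floor on per-step time; if a measured kernel is far above that floor, it is likely limited by something other than memory bandwidth.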
Power video generation, 3D rendering, and multimodal AI pipelines. The B200's massive memory capacity supports models like Sora-class video generators and real-time neural radiance fields without the memory bottlenecks that limit H100-based deployments.
| Specification | Value |
|---|---|
| Transistor Count | 208 billion (TSMC 4NP process) |
| Die Configuration | Two reticle-limited dies, CoWoS-L packaging |
| Streaming Multiprocessors | 160 (estimated) |
| Tensor Cores | 5th Gen Tensor Core Architecture |
| Memory Interface | 8192-bit HBM3e |
| L2 Cache | 128MB Unified |
| Form Factor | SXM6 / PCIe Add-in Card |
| Thermal Design Power | 1000W (Configurable) |
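The 8192-bit memory interface in the table and the aggregate memory bandwidth quoted above are related by a simple identity: bandwidth = bus width × per-pin data rate / 8. The sketch below assumes the standard 1024-bit interface per HBM3e stack and treats 8 TB/s as an example input; the function names are hypothetical.

```python
# Relating HBM3e bus width, stack count, and aggregate bandwidth.
# Assumptions: 1024-bit interface per HBM3e stack; bandwidth in decimal TB/s.
# Function names are hypothetical.

HBM_STACK_WIDTH_BITS = 1024  # interface width of one HBM3e stack


def per_pin_rate_gbps(bandwidth_tb_s: float, bus_width_bits: int) -> float:
    """Per-pin data rate (Gb/s) implied by aggregate bandwidth and bus width."""
    bits_per_second = bandwidth_tb_s * 1e12 * 8
    return bits_per_second / bus_width_bits / 1e9


def hbm_stack_count(bus_width_bits: int) -> int:
    """Number of HBM stacks needed to form a bus of the given width."""
    return bus_width_bits // HBM_STACK_WIDTH_BITS


if __name__ == "__main__":
    width = 8192
    print(hbm_stack_count(width), "stacks")
    print(f"{per_pin_rate_gbps(8.0, width):.2f} Gb/s per pin at 8 TB/s (example input)")
```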