Moe Kaloub

Nvidia H100 GPU: Specs, VRAM, Price, and AI Performance

The Nvidia H100 is the company's most important data-center GPU, the chip that has defined the current AI compute era. Built on Hopper architecture and available in both SXM and PCIe form factors, the H100 delivers a generational leap over the A100 in memory bandwidth, AI compute throughput, and inference efficiency. It is the de facto standard for large-model training and high-throughput inference at scale.

This guide covers everything you need to know: full specs, VRAM capacity and bandwidth, SXM vs PCIe differences, current pricing, how the H100 compares to the A100 and H200, and how to access H100 instances on-demand through Runpod.

How Much VRAM Does the H100 Have?

The Nvidia H100 has 80 GB of HBM3 memory (SXM variant) with approximately 3.35 TB/s of memory bandwidth. The PCIe variant also offers 80 GB but uses HBM2e memory with ~2 TB/s bandwidth. Both variants are available in an 80 GB configuration; there is no 40 GB H100.

That 80 GB is the H100's decisive VRAM advantage over consumer GPUs. For context:

  • H100 SXM: 80 GB HBM3, ~3.35 TB/s bandwidth
  • H100 PCIe: 80 GB HBM2e, ~2 TB/s bandwidth
  • A100 80GB: 80 GB HBM2e, ~2 TB/s bandwidth
  • RTX 5090: 32 GB GDDR7, ~1.79 TB/s bandwidth
  • RTX 4090: 24 GB GDDR6X, ~1 TB/s bandwidth

The HBM3 memory in the H100 SXM is what separates it from everything else at scale. For workloads involving models over 30-40 GB, including 70B+ parameter LLMs, large vision transformers, and multimodal models, the 80 GB capacity and 3.35 TB/s bandwidth are the enabling factors. Consumer GPUs, including the RTX 5090, cannot substitute for this in large-model contexts.
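A quick back-of-envelope check makes these capacity numbers concrete. The sketch below (plain Python, with illustrative precision choices) estimates the VRAM needed just for a model's weights; KV cache, activations, and framework overhead come on top.

```python
# Approximate VRAM needed for model weights alone -- KV cache, activations,
# and framework overhead add more on top of these figures.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "fp8": 1}

def weight_vram_gb(params_billions: float, precision: str) -> float:
    """Rough weight memory in GB for a model of the given parameter count."""
    return params_billions * 1e9 * BYTES_PER_PARAM[precision] / 1e9

# A 70B-parameter LLM needs ~140 GB of weights at FP16 (two H100s),
# but ~70 GB at FP8, which fits on a single 80 GB card.
print(weight_vram_gb(70, "fp16"))  # 140.0
print(weight_vram_gb(70, "fp8"))   # 70.0
```

The same arithmetic explains why a 24-32 GB consumer card tops out around 10-15B parameters at FP16.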

Nvidia H100 Specs

The H100 is built on Nvidia's Hopper architecture (GH100 die) using TSMC's 4N process. It is available in two form factors with meaningfully different performance profiles:

H100 SXM5 (High-Performance Variant)

  • CUDA Cores: 16,896
  • Tensor Cores: 528 (4th generation) with FP8 Transformer Engine
  • VRAM: 80 GB HBM3
  • Memory Bandwidth: 3.35 TB/s
  • FP8 Throughput: 3,958 TFLOPS (with sparsity)
  • TF32 Throughput: 989 TFLOPS (with sparsity)
  • FP64 Throughput: 34 TFLOPS
  • TDP: 700W
  • NVLink: NVLink 4.0, 900 GB/s bidirectional bandwidth per GPU
  • Multi-Instance GPU (MIG): Up to 7 isolated GPU instances
  • Form Factor: SXM5 (requires compatible server platform: DGX H100, HGX H100)

H100 PCIe (Standard Variant)

  • CUDA Cores: 14,592
  • Tensor Cores: 456 (4th generation)
  • VRAM: 80 GB HBM2e
  • Memory Bandwidth: ~2 TB/s
  • FP8 Throughput: 3,026 TFLOPS (with sparsity)
  • TDP: 350W
  • NVLink: Optional NVLink bridge connecting card pairs (600 GB/s); no NVSwitch fabric support
  • MIG: Up to 7 instances
  • Form Factor: Standard PCIe, fits in conventional server slots

H100 SXM vs PCIe: Which Should You Use?

The SXM and PCIe variants of the H100 are substantially different products, not minor packaging variations:

  • Memory bandwidth: The SXM5's HBM3 delivers 3.35 TB/s vs ~2 TB/s on the PCIe variant, a 68% difference that directly impacts training throughput and inference speed for large models.
  • NVLink: SXM supports NVLink 4.0 with 900 GB/s bidirectional bandwidth per GPU, enabling high-efficiency multi-GPU training across all GPUs in a node. The PCIe variant is limited to standard PCIe interconnects (plus an optional NVLink bridge between card pairs), which are significantly slower for GPU-to-GPU communication.
  • Power and infrastructure: SXM requires a compatible server platform (DGX H100 or HGX H100) and a 700W power budget per card. PCIe fits in standard servers at 350W and is much easier to deploy.
  • Cost: SXM variants command a significant premium over PCIe. Complete DGX H100 systems (8x SXM H100) are priced at approximately $250,000-$400,000.

For large-scale distributed training, the SXM variant is the right choice, as the NVLink bandwidth and memory throughput advantages compound at scale. For inference serving or single-GPU fine-tuning workloads, the PCIe variant offers the same 80 GB VRAM at lower cost and infrastructure complexity.

H100 AI Performance

The H100's performance advantage over the A100 is substantial and comes from three main sources:

  • FP8 Transformer Engine: The H100 introduces FP8 precision with automatic mixed-precision switching via the Transformer Engine. For transformer-based models, this delivers roughly 3-4x the throughput of the A100 at FP16. The engine automatically adjusts precision layer-by-layer to maintain accuracy without manual tuning.
  • Memory bandwidth: The SXM variant's 3.35 TB/s is ~68% faster than the A100's 2 TB/s. For large-batch inference where memory bandwidth is the primary bottleneck, this translates directly to higher throughput.
  • NVLink 4.0: 900 GB/s bidirectional bandwidth per GPU (vs 600 GB/s on A100 NVLink 3.0) enables more efficient distributed training, with multi-GPU systems maintaining higher efficiency as cluster size scales.

In practice, H100 clusters have reduced large-model training times by 2-3x compared to equivalent A100 setups. For inference, throughput improvements depend heavily on model size and batch configuration, but gains of 2-4x are common on large transformer models.
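To see why bandwidth dominates inference, consider single-stream decoding: generating one token requires streaming roughly all model weights through the GPU once, so peak bandwidth sets an upper bound on tokens per second. A rough sketch (assumed model sizes; ignores KV-cache reads and kernel overhead):

```python
# Bandwidth-bound upper limit on single-stream decode throughput:
#   tokens/s <= memory_bandwidth / bytes_of_weights
# Real-world numbers land below this; batching raises aggregate throughput.

def max_decode_tokens_per_sec(bandwidth_tb_s: float, params_billions: float,
                              bytes_per_param: float) -> float:
    weight_bytes = params_billions * 1e9 * bytes_per_param
    return bandwidth_tb_s * 1e12 / weight_bytes

# 70B model at FP8 (1 byte per parameter):
print(round(max_decode_tokens_per_sec(3.35, 70, 1), 1))  # H100 SXM: ~47.9
print(round(max_decode_tokens_per_sec(2.0, 70, 1), 1))   # A100:     ~28.6
```

The ratio of the two results tracks the bandwidth ratio directly, which is why the SXM variant's HBM3 shows up so clearly in large-model serving benchmarks.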

H100 Price: What Does It Cost?

The H100 is priced as an enterprise data-center component. Standalone PCIe cards sell for approximately $25,000-$30,000. SXM variants are typically only available as part of complete server systems:

  • H100 PCIe (standalone): $25,000-$30,000
  • DGX H100 (8x SXM H100): $250,000-$400,000
  • HGX H100 (4x or 8x SXM H100 baseboard): Varies by OEM configuration

Secondary market pricing has moderated from peak 2023 levels (when H100s were selling for $80,000-$120,000 each) but remains above MSRP due to persistent demand from hyperscalers and AI companies. Lead times through authorized channels remain unpredictable.

For most AI developers, researchers, and teams, renting H100 instances on Runpod is significantly more practical than hardware procurement. Cloud access eliminates the upfront capital cost, infrastructure requirements, and availability risk entirely.
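A simple break-even calculation shows why. The figures below are illustrative assumptions (a mid-range street price and a placeholder hourly rate, not a quoted Runpod price); plug in current numbers from the pricing page:

```python
# Rent-vs-buy break-even sketch. Both inputs are assumptions for
# illustration; substitute current prices before relying on the result.
purchase_price_usd = 27_500   # mid-range H100 PCIe street price
hourly_rate_usd = 2.50        # placeholder on-demand rate

breakeven_hours = purchase_price_usd / hourly_rate_usd
breakeven_years = breakeven_hours / (24 * 365)
print(f"Break-even after ~{breakeven_hours:,.0f} GPU-hours "
      f"(~{breakeven_years:.1f} years of 24/7 utilization)")
```

Unless a GPU will run near-continuously for years, and that's before counting power, cooling, and hosting, hourly rental comes out ahead.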

H100 vs A100: How Do They Compare?

The H100 is a substantial generational upgrade over the A100, not an incremental refresh:

  • Compute throughput: The H100 delivers roughly 3-4x better performance on transformer workloads via FP8 and the Transformer Engine. A100 supports FP16 and BF16 but does not have FP8 or automatic precision switching.
  • Memory bandwidth: H100 SXM: 3.35 TB/s. A100: 2 TB/s. The 68% bandwidth advantage matters significantly for large-batch inference and large-model training.
  • NVLink: H100 uses NVLink 4.0 at 900 GB/s bidirectional vs A100's NVLink 3.0 at 600 GB/s, 50% more inter-GPU bandwidth.
  • VRAM: Both offer 80 GB maximum. A100 also has a 40 GB variant for lower-cost deployments.
  • MIG: Both support up to 7 MIG instances, but the H100's higher throughput per instance improves MIG utilization economics.
  • Cost: A100 PCIe has fallen to $10,000-$15,000 on the secondary market as H100 supply has improved. For workloads that fit within the A100's capabilities, it remains a compelling value.

The A100 remains relevant for inference workloads that don't require FP8 throughput, and for teams where the H100 price premium isn't justified by workload requirements.
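That trade-off can be framed as perf-per-dollar. The speedup factors and prices below are assumptions for illustration (actual speedups vary widely by workload):

```python
# Perf-per-dollar sketch: assumed secondary-market prices and assumed
# H100-over-A100 speedups. The crossover depends entirely on whether the
# workload exploits FP8 and the Transformer Engine.
H100_PRICE, A100_PRICE = 27_500, 12_500

for workload, h100_speedup in (("FP8 transformer training", 3.5),
                               ("non-FP8 inference", 1.3)):
    h100_perf_per_dollar = h100_speedup / H100_PRICE
    a100_perf_per_dollar = 1.0 / A100_PRICE
    winner = "H100" if h100_perf_per_dollar > a100_perf_per_dollar else "A100"
    print(f"{workload}: {winner} wins on perf-per-dollar")
```

Under these assumptions, the H100's premium pays for itself on FP8-heavy training, while modest-speedup workloads favor the cheaper A100.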

H100 vs H200: What's the Difference?

The H200 is Nvidia's updated Hopper GPU, announced in late 2023. It uses the same GH100 die as the H100 but upgrades the memory subsystem:

  • H200 VRAM: 141 GB HBM3e (vs 80 GB HBM3 on H100 SXM)
  • H200 memory bandwidth: ~4.8 TB/s (vs 3.35 TB/s on H100 SXM)
  • Compute: Identical die, same CUDA and Tensor Core counts
  • Price: H200 commands a premium over H100, and availability is primarily through major cloud providers and large enterprise buyers

For workloads that benefit from memory capacity beyond 80 GB (very large models, long context windows), the H200's 141 GB is meaningful. For most workloads that already fit in 80 GB, the H100 and H200 will perform similarly, as the bandwidth improvement primarily helps memory-bound inference at large batch sizes.
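A rough fit check illustrates when the extra capacity matters. The KV-cache math below uses assumed 70B-class shapes (80 layers, 8 KV heads via grouped-query attention, head dimension 128) purely as an example:

```python
# KV-cache size per token = 2 (K and V) * layers * kv_heads * head_dim * bytes.
# Shapes below are an assumed 70B-class configuration with grouped-query
# attention and an FP16 cache (2 bytes per element).

def kv_cache_gb(n_tokens: int, n_layers: int = 80, n_kv_heads: int = 8,
                head_dim: int = 128, bytes_per_el: int = 2) -> float:
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_el * n_tokens / 1e9

weights_gb = 70  # 70B params at FP8
for ctx in (8_192, 128_000):
    total = weights_gb + kv_cache_gb(ctx)
    print(f"{ctx:>7}-token context: ~{total:.0f} GB total")
# Short contexts fit in an H100's 80 GB; very long contexts push past it
# and into H200 (141 GB) or multi-GPU territory.
```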

Rent an H100 on Runpod

Runpod provides on-demand access to H100 GPU instances, both PCIe and SXM configurations, without hardware procurement, infrastructure setup, or long-term commitment. You get full root access to a containerized GPU environment and can be running within minutes of signing up.

  • On-demand H100 instances: Hourly billing with no upfront cost. Access the same hardware that powers production AI at hyperscalers without the $25,000+ purchase requirement.
  • SXM and PCIe variants: Choose the right H100 configuration for your workload. PCIe for single-GPU fine-tuning and inference, SXM for high-throughput multi-GPU training.
  • Serverless endpoints: Deploy models on H100 workers and pay per inference request. Scale to zero when idle with no idle GPU cost.
  • MIG instances: Access fractional H100 compute for lighter workloads at lower per-hour rates.
  • Network Volumes: Attach persistent storage to your H100 instances to keep model weights, datasets, and checkpoints available across sessions. Standard storage starts at $0.07/GB/mo (first TB), with high-performance storage at $0.14/GB/mo for faster data loading on large training runs. A single volume can be shared across multiple pods simultaneously.
  • Templates: Pre-built environments for PyTorch, vLLM, Jupyter, TGI, and other frameworks with no manual CUDA setup required.
  • Flexibility: Switch between H100, A100, H200, or consumer GPU instances based on workload requirements.

See current H100 availability and pricing on the Runpod pricing page.

H100 FAQs

How much VRAM does the H100 have?

The H100 has 80 GB of HBM memory in all variants. The SXM5 variant uses HBM3 with ~3.35 TB/s bandwidth; the PCIe variant uses HBM2e with ~2 TB/s bandwidth. Unlike the A100, there is no 40 GB H100 configuration.

What is the Nvidia H100?

The Nvidia H100 is a data-center GPU built on the Hopper architecture (GH100 die), released in 2022. It is designed for large-scale AI training and inference, featuring 80 GB of HBM memory, 4th-generation Tensor Cores with FP8 support, the Transformer Engine for automatic mixed-precision training, NVLink 4.0 for high-bandwidth multi-GPU scaling, and MIG support for workload partitioning.

How much does the H100 cost?

H100 PCIe cards sell for approximately $25,000-$30,000 through authorized resellers. SXM variants are primarily available as part of complete server systems (DGX H100 at $250,000-$400,000). On Runpod, you can access H100 instances on-demand at hourly rates, eliminating the upfront hardware cost entirely.

H100 vs A100: which is better for AI?

The H100 is substantially better for AI training workloads, roughly 3-4x faster on transformer models via FP8 and the Transformer Engine, with 68% more memory bandwidth on the SXM variant. For inference workloads where the A100's capabilities are sufficient, the A100 offers better cost-per-performance given its lower price. Both are available on Runpod.

What is the difference between H100 SXM and PCIe?

The SXM variant uses HBM3 memory with 3.35 TB/s bandwidth, supports NVLink 4.0 for multi-GPU training, and has a 700W TDP requiring a specialized server platform. The PCIe variant uses HBM2e with ~2 TB/s bandwidth, fits in standard servers at 350W, and supports only an optional NVLink bridge between card pairs rather than a full NVLink fabric. For distributed training at scale, SXM is the correct choice. For inference or single-GPU workloads, PCIe is more practical.

How does the H100 compare to the RTX 4090 and RTX 5090?

The H100 has 80 GB of HBM memory vs 24 GB (4090) or 32 GB (5090) on consumer cards. Its memory bandwidth (3.35 TB/s SXM) is roughly 3x the RTX 5090 and nearly 4x the RTX 4090. It supports NVLink for multi-GPU training and includes enterprise features like ECC memory and MIG. For workloads under 32 GB, consumer GPUs offer far better cost-per-FLOP. For large-model training, multi-GPU jobs, or workloads requiring 40-80 GB of VRAM, the H100 is the right choice, and both consumer and datacenter GPUs are available on Runpod.

Is the H100 good for LLM inference?

Yes. The H100 is the standard GPU for production LLM inference at scale. Its 80 GB of VRAM can serve models in the 30B-40B range at FP16 on a single card, and 70B-class models at FP8 or across a pair of GPUs. The Transformer Engine and FP8 support deliver high token throughput. For teams running inference-heavy workloads at production scale, H100 instances on Runpod Serverless provide cost-effective per-request pricing without idle GPU overhead.

Build what’s next.

The most cost-effective platform for building, training, and scaling machine learning models—ready when you are.