Emmett Fear

Rent H100 SXM in the Cloud – Deploy in Seconds on Runpod

Get instant access to NVIDIA H100 SXM GPUs, ideal for training large language models and high-performance computing, with hourly pricing, global availability, and fast deployment. Renting cloud GPUs for AI on Runpod gives you flexible scaling and cutting-edge performance without upfront investment. The H100 SXM's advanced Tensor Cores and Transformer Engine deliver up to 4x faster AI training, making it well suited to demanding AI applications.
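
As a concrete starting point, here is a minimal sketch of launching a pod programmatically with the runpod Python SDK (pip install runpod). The image name and GPU type ID are illustrative placeholders; list the exact identifiers available to your account with runpod.get_gpus() and check the Runpod docs for current image tags.

```python
import runpod

runpod.api_key = "YOUR_API_KEY"  # generated in the Runpod console

# Launch a single H100 SXM pod; identifiers below are examples, not canonical
pod = runpod.create_pod(
    name="h100-sxm-training",
    image_name="runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04",
    gpu_type_id="NVIDIA H100 80GB HBM3",
    gpu_count=1,
)
print(pod["id"])  # the returned dict describes the new instance
```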

Why Choose the NVIDIA H100 SXM

The NVIDIA H100 SXM GPU, built on the Hopper architecture, offers unparalleled performance for AI and machine learning workloads. With fourth-generation Tensor Cores and native FP8 precision, it dramatically accelerates AI training, making it one of the best GPUs for cutting-edge AI development and research.

Benefits

  • Unprecedented AI Performance
    Equipped with fourth-generation Tensor Cores and a dedicated Transformer Engine, the H100 SXM supports FP8 precision, enabling up to 4x faster AI training than previous generations (for a detailed comparison, see H100 NVL vs H100 SXM). This capability is crucial for training massive models such as large language models (LLMs) and vision transformers with minimal loss in accuracy; a short FP8 usage sketch follows this list.
  • High Memory Capacity and Bandwidth
    The H100 SXM features HBM3 memory with options for 80 GB and 96 GB capacities and delivers a massive bandwidth of 3.35–3.36 TB/s. This combination supports large-scale AI training and inference, handling extensive datasets and complex models with ease.
  • Enhanced Multi-GPU Communication
    The SXM form factor, when compared to the PCIe variant (H100 PCIe vs H100 SXM), provides superior GPU-to-GPU communication via NVLink, essential for distributed training and parallel workloads. This ensures near-linear scaling across multiple GPUs, maximizing performance efficiency in multi-GPU setups.
  • Flexible Resource Utilization
    With Multi-Instance GPU (MIG) technology, the H100 SXM can be partitioned into up to 7 instances, each with isolated compute and memory resources. This feature supports multi-tenant AI workloads and flexible deployment scenarios, optimizing GPU utilization.
  • Cost-Effective Access to Cutting-Edge Hardware
    Renting H100 SXM GPUs offers access to state-of-the-art hardware without the hefty capital investment. For current rental rates and instance options, refer to the Runpod pricing page.
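
To make the FP8 benefit concrete, here is a minimal sketch of FP8 training using NVIDIA's Transformer Engine library, assuming it is installed (pip install transformer-engine). The layer sizes are arbitrary placeholders.

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Hybrid E4M3/E5M2 is the library's standard FP8 training format
fp8_recipe = recipe.DelayedScaling(fp8_format=recipe.Format.HYBRID)

layer = te.Linear(768, 768, bias=True).cuda()  # placeholder layer sizes
inp = torch.randn(32, 768, device="cuda")

# Matmuls inside this context run in FP8 on the H100's Transformer Engine
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = layer(inp)
out.sum().backward()
```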

Specifications

Feature                                  Value
Architecture                             Hopper (GH100)
Process Technology                       TSMC 5 nm
Transistor Count                         80 billion
Die Size                                 814 mm²
Memory Capacity                          80 GB or 96 GB HBM3
Memory Bandwidth                         3.35–3.36 TB/s
FP64 Performance                         34 TFLOPS (67 TFLOPS with Tensor Cores)
FP32 Performance                         67 TFLOPS
TF32 Tensor Core Performance             989 TFLOPS (with sparsity)
BF16 / FP16 Tensor Core Performance      1,979 TFLOPS (with sparsity)
FP8 Tensor Core Performance              3,958 TFLOPS (with sparsity)
INT8 Tensor Core Performance             3,958 TOPS (with sparsity)
Base Clock Speed                         ~1,590–1,665 MHz
Boost Clock Speed                        ~1,837–1,980 MHz
Thermal Design Power (TDP)               Up to 700 W
Multi-Instance GPU (MIG) Support         Up to 7 instances per physical GPU
System Interface                         PCIe 5.0 x16
Decoders                                 7 NVDEC, 7 NVJPEG

For a comprehensive look at the H100's FLOPS performance and power consumption, refer to our detailed FAQs.
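
As a worked example of what these FLOPS figures mean in practice, the sketch below estimates training cost in GPU-hours using the common FLOPs ≈ 6 × parameters × tokens approximation. The model size, token count, and 40% utilization figure are all assumptions; real numbers depend on your stack.

```python
# FLOPs ~= 6 * parameters * tokens is a standard transformer training estimate
params = 7e9           # assumed 7B-parameter model
tokens = 1e12          # assumed 1T training tokens
peak_flops = 989e12    # H100 SXM dense BF16 Tensor Core peak
mfu = 0.40             # assumed model FLOPs utilization

gpu_seconds = (6 * params * tokens) / (peak_flops * mfu)
print(f"{gpu_seconds / 3600:,.0f} GPU-hours")  # ~29,500; divide by GPU count for wall-clock time
```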

FAQ

What is the minimum rental duration for H100 SXM GPUs?

Minimum rental durations vary by provider. Runpod offers per-second billing, meaning there is no minimum commitment—you pay only for what you use. For details on available instance types and billing, see the Runpod pricing page.
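
For illustration, per-second billing reduces cost estimation to simple arithmetic. The hourly rate below is a placeholder, not a quoted price; see the Runpod pricing page for actual rates.

```python
hourly_rate = 2.99          # hypothetical $/hr for an H100 SXM instance
runtime_seconds = 45 * 60   # a 45-minute fine-tuning run
cost = runtime_seconds * hourly_rate / 3600
print(f"${cost:.2f}")       # -> $2.24, billed only for the seconds used
```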

How are data handling and security procedures managed?

Cloud providers offering H100 SXM rentals typically implement robust security measures, including data encryption at rest and in transit. Look for providers with SOC 2, ISO 27001, or other relevant certifications. The H100 itself supports hardware-level confidential computing features, adding an extra layer of security for sensitive AI workloads.

What happens in case of hardware failure?

Reputable providers have redundancy and failover protocols in place. In the event of hardware failure, your workload should be automatically migrated to functional hardware. Always check the provider's SLA for specific guarantees and compensation policies related to downtime or hardware issues.

How do I choose between on-demand and reserved instances?

On-demand instances offer maximum flexibility. Reserved instances provide significant discounts for longer-term commitments. Choose on-demand for variable or short-term workloads, and reserved for predictable, ongoing projects. See the Runpod pricing page for current on-demand and reserved rates to compare options for your workload.

What storage options are available, and how do they impact performance?

H100 SXM rentals often come with high-performance storage options like NVMe SSDs to match the GPU's capabilities. Some providers offer tiered storage solutions, allowing you to balance cost and performance. For optimal performance, especially in distributed training scenarios, ensure your provider offers storage solutions with throughput matching the H100's data processing capabilities.

What are the networking capabilities and limitations?

H100 SXM configurations typically support high-bandwidth networking, often 350 Gbps or more. This is crucial for multi-GPU setups and distributed training. Verify that your provider's networking infrastructure can fully support the H100's capabilities, especially if you plan to run multi-node workloads.

How compatible are H100 SXMs with common AI frameworks and software?

H100 SXMs are highly compatible with popular AI frameworks like PyTorch and TensorFlow. Many providers bundle the NVIDIA AI Enterprise software suite, which includes optimized versions of these frameworks. Always ensure you're using the latest versions of your preferred frameworks to take full advantage of the H100's features, such as FP8 precision and the Transformer Engine.
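
A quick way to confirm compatibility once a pod is running is to query the device from PyTorch. This sketch assumes a recent PyTorch build with CUDA support.

```python
import torch

assert torch.cuda.is_available()
print(torch.cuda.get_device_name(0))            # e.g. "NVIDIA H100 80GB HBM3"
major, minor = torch.cuda.get_device_capability(0)
print(f"compute capability sm_{major}{minor}")  # Hopper reports sm_90
print(torch.cuda.is_bf16_supported())           # True on H100
```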

What multi-GPU configuration options are available?

Providers typically offer various multi-GPU configurations, from single nodes with multiple H100 SXMs to multi-node clusters. The SXM form factor enables high-speed GPU-to-GPU communication via NVLink, which is crucial for scaling performance in distributed training scenarios. Check the Runpod pricing page for specific multi-GPU options and their associated pricing.
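
As a minimal illustration of how a multi-GPU node is used in practice, the sketch below sets up PyTorch DistributedDataParallel with the NCCL backend, which transparently uses NVLink between H100 SXMs. The model and tensor shapes are placeholders.

```python
# Launch with: torchrun --nproc_per_node=8 train_sketch.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# torchrun sets RANK, WORLD_SIZE, and LOCAL_RANK for each process
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = torch.nn.Linear(1024, 1024).to(local_rank)  # placeholder model
model = DDP(model, device_ids=[local_rank])

out = model(torch.randn(32, 1024, device=local_rank))
out.sum().backward()  # gradients are all-reduced across GPUs over NVLink
dist.destroy_process_group()
```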

How many H100 SXM GPUs do I need for my workload?

The number of GPUs required depends on your specific use case: For large language model training (70B+ parameters), 8 or more GPUs are often recommended. For smaller models or fine-tuning tasks, 1–4 GPUs may suffice. Real-time inference workloads can often be handled by a single GPU, leveraging the H100's MIG technology to serve multiple models concurrently. Always benchmark your specific workload to determine the optimal configuration.
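
The arithmetic behind the 70B guideline is straightforward. The sketch below uses a common mixed-precision Adam approximation of roughly 16 bytes per parameter before activations; treat it as an estimate, not an exact figure.

```python
params = 70e9                   # 70B-parameter model
bytes_per_param = 2 + 2 + 12    # BF16 weights + grads, FP32 master copy + Adam state
total_gb = params * bytes_per_param / 1e9
print(f"{total_gb:.0f} GB of training state")   # ~1,120 GB before activations

per_gpu_gb = 80                 # one H100 SXM
print(f"{total_gb / per_gpu_gb:.0f}+ GPUs")     # ~14, which is why 8+ GPU nodes
                                                # with ZeRO/FSDP sharding are the norm
```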

What Runpod-specific features or limitations should I be aware of?

Runpod is one of the leading serverless GPU platforms, offering flexible GPU rental options, including H100 SXMs. Check Runpod's documentation for details on available regions and data centers, supported frameworks and software environments, persistent storage options, networking configurations, and support for custom containers or environments. Additionally, refer to the Runpod pricing page for current rates including any discounts for sustained usage or reserved instances.

How can I monitor usage and manage costs effectively?

Most providers, including Runpod, offer detailed monitoring and billing dashboards. Best practices include setting up alerts for usage thresholds, regularly reviewing utilization metrics to right-size your resources, leveraging auto-scaling features for dynamic workloads, considering reserved instances for long-term predictable usage, and using spot instances for fault-tolerant workloads to save costs.
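
As one possible implementation of a usage alert, the sketch below polls running pods through the runpod Python SDK. The budget threshold is hypothetical, and the costPerHr field name should be verified against the SDK docs for your version.

```python
import runpod

runpod.api_key = "YOUR_API_KEY"
COST_ALERT = 3.00  # hypothetical $/hr budget across all pods

pods = runpod.get_pods()
hourly_burn = sum(p.get("costPerHr", 0) for p in pods)
print(f"running pods: {len(pods)}, burn rate: ${hourly_burn:.2f}/hr")
if hourly_burn > COST_ALERT:
    # stop idle pods to get back under budget
    print("over budget: consider runpod.stop_pod(pod_id) for idle pods")
```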

Build what’s next.

The most cost-effective platform for building, training, and scaling machine learning models—ready when you are.