Instant Clusters

GPU clusters,
deployed instantly.

Launch high-performance multi-node GPU clusters for AI, ML, LLMs, and HPC workloads—fully optimized, rapidly deployed, and cost-effective.

Launch in minutes.

Get clusters up and running faster than traditional cloud providers.

Pay by the second.

Ultra-flexible, on-demand billing—no commitments.

Scale globally.

Spin up hundreds of GPUs with a single command.

Product

Deploy multi-node interconnected clusters.

Self-service provisioning, per-second billing, and complete flexibility for AI workloads.

Enterprise

Enterprise-grade.
From day one.

Built for scale, secured for trust, and designed to meet your most demanding needs.

99.9% uptime

Run critical workloads with confidence, backed by industry-leading reliability.

Secure by default

We are in the process of obtaining SOC 2 certification and establishing HIPAA and GDPR compliance.

Scale to thousands of GPUs

Adapt instantly to demand with infrastructure that grows with you.

Clients

Trusted by today's leaders, built for tomorrow's pioneers.

Engineered for teams building the future.

FAQs

Questions? Answers.

Curious about unlocking GPU power in the cloud? Get clear answers to accelerate your projects with on-demand high-performance compute.
What is the difference between a GPU pod and an Instant Cluster?
A GPU pod is a single instance with one or more GPUs within the same node. An Instant Cluster consists of multiple nodes interconnected with high-speed networking, allowing workloads to span multiple machines. Clusters are ideal for large model inference and for distributed training that exceeds the capacity of a single node.
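For example, a distributed training job that spans nodes is typically started with a launcher such as PyTorch's torchrun. This is an illustrative sketch, not a Runpod-specific command; the node count, hostname, and port are placeholders:

```shell
# Run the same command on every node; node 0 hosts the c10d rendezvous.
# --nnodes: total nodes in the cluster; --nproc_per_node: one worker per GPU.
torchrun --nnodes=2 --nproc_per_node=8 \
  --rdzv_backend=c10d --rdzv_endpoint=node0:29500 \
  train.py
```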
What is the minimum and maximum cluster size?
The minimum cluster size is 2 nodes (up to 16 GPUs), available on demand to anyone. For larger clusters of up to 8 nodes (64 GPUs), you'll need to request a spend limit increase.
How is billing handled for Instant Clusters?
Instant Clusters are billed by the second, just like our regular GPU pods. You're only charged for the compute time you actually use, with no minimum commitments or upfront costs. When you're done with your work, simply terminate the cluster to stop billing.
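As a rough illustration of how per-second billing works (the hourly rate below is a placeholder, not an actual Runpod price):

```python
def cluster_cost(seconds: int, gpus: int, hourly_rate_per_gpu: float) -> float:
    """Per-second billing: total cost for `seconds` of cluster runtime."""
    per_second = hourly_rate_per_gpu / 3600
    return round(seconds * gpus * per_second, 2)

# Hypothetical: 16 GPUs for 45 minutes at a placeholder $2.00/GPU-hour.
print(cluster_cost(45 * 60, 16, 2.00))  # 2700 s * 16 GPUs * ($2/3600) = $24.00
```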
What network bandwidth is available between nodes?
Our data centers provide robust network connectivity with WAN capacity ranging from 20Gbps to 400Gbps, and east-west bandwidth between servers ranging from 800Gbps to 3200Gbps, depending on your configuration.
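For intuition, the best-case time to move data between nodes is size divided by bandwidth. This idealized sketch ignores protocol overhead and assumes the link is dedicated to the transfer:

```python
def transfer_seconds(size_gb: float, bandwidth_gbps: float) -> float:
    """Idealized transfer time: convert gigabytes to gigabits, divide by link speed."""
    return size_gb * 8 / bandwidth_gbps

# A 140 GB checkpoint over an 800 Gbps east-west link:
print(round(transfer_seconds(140, 800), 2))  # 1.4 seconds, best case
```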
What storage solutions are available for large models?
Runpod offers native Network Storage integration where available, providing a shared filesystem layer that can be utilized across all nodes in your cluster. This is ideal for storing large models ranging from tens to hundreds of gigabytes close to your computing resources.
Can I connect my cluster to AWS?
Yes, you can establish connections between your Runpod cluster and AWS environment through application layer mTLS, enabling secure bridging of workloads between platforms.
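At the application layer, mutual TLS means each side verifies the other's certificate before traffic flows. A minimal client-side sketch using Python's standard ssl module; the certificate file paths are placeholders:

```python
import ssl

def make_mtls_client_context(cafile: str, certfile: str, keyfile: str) -> ssl.SSLContext:
    """Mutual TLS: verify the remote peer *and* present our own certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)  # peer verification is on by default
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.load_verify_locations(cafile=cafile)       # CA that signed the remote side's cert
    ctx.load_cert_chain(certfile=certfile, keyfile=keyfile)  # this cluster's identity
    return ctx

# Usage (placeholder paths and hostname):
# ctx = make_mtls_client_context("ca.pem", "client.pem", "client-key.pem")
# with socket.create_connection(("service.example.com", 443)) as sock:
#     tls = ctx.wrap_socket(sock, server_hostname="service.example.com")
```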
Do you support Kubernetes or other container orchestration tools?
Currently, Instant Clusters are not compatible with Kubernetes. The cluster environment is managed by Runpod's native orchestration system, eliminating the need for additional container orchestration tools or CNI configuration.
Can I run Slurm on Instant Clusters?
Yes, Instant Clusters fully support Slurm for workload management.
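As an illustrative sketch (not a Runpod-specific template), a typical Slurm batch script for a two-node, 16-GPU job looks like this; the job name, resource counts, and training script are placeholders:

```shell
#!/bin/bash
#SBATCH --job-name=train-llm
#SBATCH --nodes=2                 # span both nodes of the cluster
#SBATCH --ntasks-per-node=8       # one task per GPU
#SBATCH --gres=gpu:8              # request 8 GPUs on each node
#SBATCH --time=02:00:00

# srun launches one task per allocated slot across all nodes.
srun python train.py
```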
Are there any minimum lease terms or contract requirements?
No, there are absolutely no minimum lease terms for Instant Clusters. You have complete flexibility to deploy and terminate clusters as needed to support your workloads, with no long-term commitments or contract obligations.

Build what’s next.

The most cost-effective platform for building, training, and scaling machine learning models—ready when you are.