GPU Pricing

GPU Cloud Pricing

Runpod pricing depends on the GPU workload you run: Pods for dedicated GPU instances, Serverless for API inference, and Clusters for multi-node jobs. For enterprise capacity, review compliance resources or request enterprise support.

Pods

Thousands of GPUs across 30+ regions. Simple pricing plans for teams of all sizes, designed to scale with you.

Pods are offered in Community Cloud and Secure Cloud, with rates quoted per hour or per second. The hourly rates are listed below.

>80GB VRAM

B300: 288 GB HBM3e, 251 GB RAM, $7.39/hr
H200: 141 GB VRAM, 276 GB RAM, $4.39/hr
B200: 180 GB VRAM, 283 GB RAM, $5.89/hr
RTX Pro 6000: 96 GB VRAM, 188 GB RAM, $1.99/hr
H100 NVL: 94 GB VRAM, 94 GB RAM, $3.19/hr

80GB VRAM

H100 PCIe: 80 GB VRAM, 188 GB RAM, $2.89/hr
H100 SXM: 80 GB VRAM, 125 GB RAM, $2.99/hr
A100 PCIe: 80 GB VRAM, 117 GB RAM, $1.39/hr
A100 SXM: 80 GB VRAM, 125 GB RAM, $1.49/hr

48GB VRAM

L40S: 48 GB VRAM, 94 GB RAM, $0.99/hr
RTX 6000 Ada: 48 GB VRAM, 167 GB RAM, $0.84/hr
A40: 48 GB VRAM, 50 GB RAM, $0.44/hr
L40: 48 GB VRAM, 94 GB RAM, $0.82/hr
RTX A6000: 48 GB VRAM, 50 GB RAM, $0.53/hr

32GB VRAM

RTX 5090: 32 GB VRAM, 35 GB RAM, $0.99/hr

24GB VRAM

[GPU name not listed]: 24 GB VRAM, 50 GB RAM, $0.39/hr
RTX 3090: 24 GB VRAM, 125 GB RAM, $0.50/hr
RTX 4090: 24 GB VRAM, 41 GB RAM, $0.69/hr
RTX A5000: 24 GB VRAM, 25 GB RAM, $0.27/hr
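The listing above can be used to shortlist a GPU by VRAM requirement and estimate a job's cost. The following is a minimal sketch, not an official calculator: the `POD_RATES` table, `cheapest_pod`, and `job_cost` are hypothetical names, only a subset of the listed GPUs is included, and it assumes the listed hourly rate applies to each GPU with no additional charges.

```python
# Hourly Pod rates copied from the listing above: name -> (VRAM in GB, USD per hour).
# Only a subset is included; which cloud tier each rate belongs to is not stated.
POD_RATES = {
    "H200": (141, 4.39),
    "H100 PCIe": (80, 2.89),
    "A100 PCIe": (80, 1.39),
    "L40S": (48, 0.99),
    "A40": (48, 0.44),
    "RTX 4090": (24, 0.69),
    "RTX A5000": (24, 0.27),
}


def cheapest_pod(min_vram_gb):
    """Return (name, hourly_rate) for the lowest-priced GPU with enough VRAM."""
    candidates = [
        (rate, name)
        for name, (vram_gb, rate) in POD_RATES.items()
        if vram_gb >= min_vram_gb
    ]
    if not candidates:
        raise ValueError(f"no listed GPU has at least {min_vram_gb} GB of VRAM")
    rate, name = min(candidates)
    return name, rate


def job_cost(hourly_rate, hours, gpu_count=1):
    """Estimated cost in USD, assuming the hourly rate is charged per GPU."""
    return hourly_rate * hours * gpu_count


name, rate = cheapest_pod(80)
print(name, f"${job_cost(rate, hours=10, gpu_count=2):.2f}")
```

Price is only one input: GPUs with the same VRAM differ in throughput, so a cheaper hourly rate does not always mean a cheaper job.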

Serverless

Cost effective for every inference workload. Save 25% over other Serverless cloud providers on flex workers alone.

Worker rates are quoted per hour or per second. The hourly rates are listed below.

B300 (280 GB VRAM): Maximum throughput for big models. $9.98/hr
B200 (180 GB VRAM): Maximum throughput for big models. $8.64/hr
H200 (140 GB VRAM): Extreme throughput for big models. $5.93/hr
RTX Pro 6000 (PRO): High throughput for large model inference workloads. $3.49/hr
H100 (PRO): Extreme throughput for big models. $4.55/hr
A100: High throughput GPU, yet still very cost-effective. $2.72/hr
L40, L40S, 6000 Ada, MIG 48GB (PRO): Extreme inference throughput on LLMs like Llama 3 7B. $1.75/hr
A6000, A40: A cost-effective option for running big models. $1.22/hr
5090 (PRO): Extreme throughput for small-to-medium models. $1.58/hr
RTX PRO 4500 Blackwell: Cost-effective Blackwell inference for 32GB workloads. $1.15/hr
4090 (PRO): Extreme throughput for small-to-medium models. $1.10/hr
L4, A5000, 3090, MIG 24GB: Great for small-to-medium sized inference workloads. $0.69/hr
A4000, A4500, RTX 4000, RTX 2000: The most cost-effective for small models. $0.58/hr
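Because Serverless workers are billed on usage, the hourly rate is easier to reason about once converted to a per-second rate and multiplied by execution time. This is a rough sketch under stated assumptions: the per-second rate is taken to be the hourly rate divided by 3600, and cold starts, idle time, and any minimum billing increment are ignored. The function names are hypothetical.

```python
SECONDS_PER_HOUR = 3600


def per_second_rate(hourly_rate):
    """Convert a USD-per-hour rate to USD per second (assumes simple division)."""
    return hourly_rate / SECONDS_PER_HOUR


def request_cost(hourly_rate, seconds_per_request, requests):
    """Estimated USD cost of `requests` calls that each keep a worker busy
    for `seconds_per_request` seconds. Cold starts and idle time are ignored."""
    return per_second_rate(hourly_rate) * seconds_per_request * requests


# Example: a 4090 worker at $1.10/hr, 2 seconds per request, 10,000 requests.
print(f"${request_cost(1.10, 2.0, 10_000):.2f}")  # prints $6.11
```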

Clusters

Launch multi-GPU clusters in minutes with no commitments. Scale up to 64 GPUs, attach shared storage, and pay only for what you use.

Reserved Clusters

Dedicated GPU clusters with guaranteed availability, custom configurations, SLA-backed uptime, and discounted rates for enterprises scaling to 10,000+ GPUs.

Storage

Flexible and persistent storage options starting at $0.05/GB/mo with standard and high-performance tiers.
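As a quick illustration of the starting rate, monthly storage cost scales linearly with volume size. The sketch below assumes the $0.05/GB/mo starting rate applies to the whole volume; the higher-performance tier's rate is not given here, so it is left as a parameter. `monthly_storage_cost` is a hypothetical helper.

```python
def monthly_storage_cost(size_gb, usd_per_gb_month=0.05):
    """Estimated USD per month for a volume of `size_gb` gigabytes.

    The default is the starting rate quoted above; pass a different
    rate for other tiers, whose prices are not listed here.
    """
    if size_gb < 0:
        raise ValueError("size_gb must be non-negative")
    return size_gb * usd_per_gb_month


print(f"${monthly_storage_cost(500):.2f} per month for 500 GB")
```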

Public Endpoints

Instant access to pre-deployed AI models via API. No infrastructure setup required.

Audio

Pruna / Whisper V3 Large: $0.05 per 1000 characters
resembleai / Chatterbox Turbo: $0.00 per 1000 characters
minimax / Minimax Speech 02 HD: $0.05 per 1000 characters

Image

bytedance / Seedream 4.0 Edit: $0.0270 per request
bytedance / Seedream 4.0 T2I: $0.0270 per request
google / Nano Banana Edit: $0.0380 per request
google / Nano Banana Pro Edit: $0.14 per request
pruna / Pruna Image T2I: $0.0050 per request
pruna / Pruna Image Edit: $0.01 per request
alibaba / WAN 2.6 T2I: $0.03 per request
qwen / Qwen Image Edit 2511: $0.02 per request
qwen / Qwen Image Edit 2511 LoRA: $0.025 per request
Tongyi-MAI / Z Image Turbo: $0.0050 per request
black-forest-labs / FLUX.1 [dev]: $0.02 per megapixel
black-forest-labs / FLUX.1 Kontext [dev]: $0.0250 per request
black-forest-labs / FLUX.1 Schnell: $0.0024 per megapixel
Bytedance / Seedream 3.0: $0.0300 per request
qwen / Qwen Image Edit: $0.0200 per request
qwen / Qwen Image: $0.0200 per request
qwen / Qwen Image LoRA: $0.0250 per request

Language

deep-cogito / Deep Cogito v2 Llama 70B: $0.00001 per 1M tokens
qwen / Qwen3 32B AWQ: $10.00 per 1M tokens
ibm / IBM Granite 4.0 H Small: $1.00 per 1M tokens

Video

Bytedance / Seedance 1.0 pro: $0.12 per request (5s, 480p)
Alibaba / Wan 2.2 I2V 720p: $0.30 per request (5s)
Alibaba / Wan 2.2 T2V 720p: $0.30 per request (5s)
Alibaba / Wan 2.1 I2V 720p: $0.30 per request
Alibaba / Wan 2.1 T2V 720p: $0.30 per request
kwaivgi / Kling v2.6 Standard Motion Control: $0.21 per request (1-3s)
Alibaba / WAN 2.6 T2V: $0.50 per request (5s)
bytedance / Seedance V1.5 Pro I2V: $0.024 per second
kwaivgi / Kling Video O1 R2V: $0.112 per second
Alibaba / Wan 2.6 I2V: $0.50 per request (5s)
OpenAI / SORA 2 Pro I2V: $1.20 per request (4s)
MeiGen-AI / InfiniteTalk: $0.25 per request (720p)
OpenAI / SORA 2 I2V: $0.40 per request (4s)
Alibaba / Wan 2.5 I2V: $0.25 per request (5s)
kwaivgi / Kling v2.1 I2V Pro: $0.45 per request (5s)
Alibaba / Wan 2.2 I2V 720p LoRA: $0.35 per request (5s)
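Public Endpoints are priced in several different units (per request, per megapixel, per second of video, per 1M tokens), so comparing them takes a small conversion. The sketch below shows one way to do it. It assumes a megapixel is 1,000,000 pixels and that usage is billed fractionally with no rounding or minimum charge, neither of which is confirmed by the listing. All function names are hypothetical.

```python
def per_megapixel_cost(width, height, usd_per_megapixel, images=1):
    """USD cost for images billed per megapixel (assumes 1 MP = 1,000,000 px)."""
    megapixels = width * height / 1_000_000
    return megapixels * usd_per_megapixel * images


def per_second_video_cost(seconds, usd_per_second):
    """USD cost for video models billed per second of output."""
    return seconds * usd_per_second


def per_token_cost(tokens, usd_per_million_tokens):
    """USD cost for language models billed per 1M tokens."""
    return tokens / 1_000_000 * usd_per_million_tokens


# FLUX.1 [dev] at $0.02 per megapixel, one 1024x1024 image.
print(f"${per_megapixel_cost(1024, 1024, 0.02):.4f}")
# Seedance V1.5 Pro I2V at $0.024 per second, a 5-second clip.
print(f"${per_second_video_cost(5, 0.024):.2f}")
# IBM Granite 4.0 H Small at $1.00 per 1M tokens, 250,000 tokens.
print(f"${per_token_cost(250_000, 1.00):.2f}")
```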

Gain additional savings with reservations.

Save more with long-term commitments. Speak with our team to reserve discounted active and flex workers.

How Runpod pricing works

Runpod pricing is based on the type of GPU workload you run. Pods are dedicated GPU instances for development and long-running jobs, Serverless bills inference workers based on usage, and Clusters support multi-node workloads and reserved capacity.

Storage and deployment choices affect total cost, so teams should choose the model that matches workload duration, traffic pattern, and control needs. Updated July 27, 2026.
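One way to match the billing model to a traffic pattern is to compute the utilization at which an always-on Pod and a usage-billed Serverless worker cost the same. The sketch below is illustrative only: it assumes the Pod is billed for every hour regardless of load, the Serverless worker is billed only while busy, and both deliver the same throughput. The function names are hypothetical.

```python
def break_even_utilization(pod_hourly, serverless_hourly):
    """Fraction of each hour a Serverless worker must be busy before it
    costs as much as a Pod that runs the whole hour."""
    return pod_hourly / serverless_hourly


def cheaper_option(pod_hourly, serverless_hourly, busy_fraction):
    """Return 'pod' or 'serverless' for a workload busy `busy_fraction` of the time."""
    serverless_cost_per_hour = serverless_hourly * busy_fraction
    return "pod" if pod_hourly < serverless_cost_per_hour else "serverless"


# Example with listed rates: H100 PCIe Pod at $2.89/hr, Serverless H100 at $4.55/hr.
print(f"{break_even_utilization(2.89, 4.55):.0%}")  # prints 64%
print(cheaper_option(2.89, 4.55, busy_fraction=0.25))  # prints serverless
```

Under these assumptions, a workload that keeps the GPU busy well below the break-even fraction favors Serverless, while sustained load favors a Pod.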

Build what’s next.

Build, train, and scale AI workloads on Runpod with cloud GPUs, Serverless, and Clusters.
