Respond to user demand in real time with GPU workers that scale from zero to hundreds in seconds.
[Chart: flex and active worker counts over a day — e.g. 10 GPUs at 6:24 AM, scaling to 100 GPUs at 11:34 AM, back down to 20 GPUs at 1:34 PM.]
Active Workers
-40% DISCOUNT
Dedicated GPUs that handle consistent workloads 24/7. Get them at a lower cost so you don't break the bank for stable usage.
Flex Workers
Flexible GPUs that cost nothing when idle. Ready to scale up as soon as your launch goes viral.
Zero Cold-Starts with Active Workers
No cold-start time — because the workers are always running. Get instant execution when speed is all that matters.
<250ms Cold-Start with Flashboot
Flashboot is an optimization layer for our container system to manage deployments and scale up workers in real time.
Handle more consistent workloads like fine-tuning
Scale workers by Queue Delay or Request count
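As a rough illustration of what scaling by queue delay or request count means (this is a hypothetical policy sketch, not RunPod's actual autoscaler), a queue-delay rule adds workers while requests wait longer than a target and drains workers when the queue empties:

```python
def target_workers(queued_requests: int, avg_queue_delay_s: float,
                   current: int, max_workers: int,
                   delay_target_s: float = 4.0) -> int:
    """Hypothetical queue-delay scaling rule: add workers while requests
    wait longer than the target, scale toward zero when the queue drains."""
    if queued_requests == 0:
        return max(current - 1, 0)          # idle: step down toward zero
    if avg_queue_delay_s > delay_target_s:
        # burst up, at most one new worker per queued request
        return min(current + queued_requests, max_workers)
    return current                           # within target: hold steady
```

A request-count policy would instead divide the queue length by a per-worker concurrency figure; the delay-based variant reacts to how long users are actually waiting rather than how many of them there are.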
Monitor your endpoint with real-time analytics
Usage Analytics
Real-time usage analytics for your endpoint with metrics on completed and failed requests. Useful for endpoints that have fluctuating usage profiles throughout the day.
Execution Time Analytics
Debug your endpoints with detailed metrics on execution time. Useful for hosting models with varying execution times, like large language models. You can also monitor delay time, cold-start time, cold-start count, GPU utilization, and more.
Extremely performant GPUs, yet still very cost-effective for running any machine learning model.
Flex: $0.0013/s · Active: $0.00078/s

80 GB H100 PRO
Our most powerful GPUs. Most useful when maximizing inference throughput is critical.
Flex: $0.0025/s · Active: $0.0015/s

48 GB A6000
A cost-effective option for running diffusion models, LoRAs, Whisper, and many others. Less effective for large language models.
Flex: $0.00048/s · Active: $0.00029/s

48 GB L40 PRO
Useful when you need high inference throughput on LLMs like Llama 3 8B and medium-sized models like Yi 34B.
Flex: $0.00069/s · Active: $0.00041/s

24 GB A5000
Great for small-to-medium-sized models with consistent workloads; lower throughput than the 24 GB PRO.
Flex: $0.00026/s · Active: $0.00016/s

24 GB 4090 PRO
Extremely high throughput for small-to-medium-sized models. Great for running Llama 3 8B and Mistral 7B.
Flex: $0.00044/s · Active: $0.00026/s

16 GB A4000
The most cost-effective option for running inference on small models like LoRAs, diffusion models, and Whisper.
Flex: $0.0002/s · Active: $0.00012/s
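Per-second rates can be hard to compare at a glance. A small conversion using the rates listed above shows the hourly cost and the flex-to-active discount (the rate pairs below are copied from this page; the helper itself is just arithmetic):

```python
RATES = {  # GPU: (flex $/s, active $/s), from the pricing cards above
    "80 GB H100 PRO": (0.0025, 0.0015),
    "48 GB A6000":    (0.00048, 0.00029),
    "48 GB L40 PRO":  (0.00069, 0.00041),
    "24 GB A5000":    (0.00026, 0.00016),
    "24 GB 4090 PRO": (0.00044, 0.00026),
    "16 GB A4000":    (0.0002, 0.00012),
}

def hourly(rate_per_s: float) -> float:
    """Convert a per-second rate to a per-hour cost."""
    return rate_per_s * 3600

for gpu, (flex, active) in RATES.items():
    discount = 1 - active / flex
    print(f"{gpu}: flex ${hourly(flex):.2f}/hr, "
          f"active ${hourly(active):.2f}/hr ({discount:.0%} off)")
```

For the H100 PRO this works out to $9.00/hr flex and $5.40/hr active, i.e. the roughly 40% active-worker discount advertised above.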
Thousands of GPUs across 9 Regions
Update your endpoint's region in two clicks. Scale up to 9 regions at a time. Global automated failover is supported out-of-the-box, so you won't have to worry about GPU errors interrupting your ML inference.
Pending Certifications
Although many of our data center partners already hold these compliance certifications, RunPod is in the process of obtaining SOC 2, ISO 27001, and HIPAA certifications. We aim to have all three by early Q3 2024.
North America
US-OR-1
CA-MTL-1
CA-MTL-2
Europe
EUR-IS-1
EUR-IS-2
EUR-NO-1
European Union
EU-NL-1
EU-RO-1
EU-SE-1
Serverless Pricing Calculator
Example: 72,000 requests per month at 1 second each ≈ $55/mo.
1. Cost estimate assumes 50% of the requests use the active price and incur a 1 s cold-start.
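The footnote's assumptions can be turned into a simple estimate. The sketch below is not RunPod's exact calculator formula: the rates are illustrative (the L40 PRO rates from the table above), and billing cold-start time at the flex rate is an assumption, so the result will not match the calculator's figure exactly.

```python
def monthly_cost(requests: int, exec_s: float,
                 flex_rate: float, active_rate: float,
                 cold_start_s: float = 1.0,
                 active_share: float = 0.5) -> float:
    """Estimate monthly spend: a share of requests run on active workers
    at the discounted rate; the rest run on flex workers and also pay a
    cold start (assumed billed at the flex rate)."""
    active_cost = requests * active_share * exec_s * active_rate
    flex_cost = (requests * (1 - active_share)
                 * (exec_s + cold_start_s) * flex_rate)
    return active_cost + flex_cost

# 72,000 one-second requests at $0.00069/s flex, $0.00041/s active
print(f"${monthly_cost(72_000, 1.0, 0.00069, 0.00041):.2f}/mo")  # → $64.44/mo
```

Varying `active_share` shows why the active-worker discount matters for steady traffic: the larger the share of requests served by always-on workers, the lower both the blended rate and the number of cold starts.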
We're with you from seed to scale
Book a call with our sales team to learn more.
Gain Additional Savings with Reservations
Save more by committing to longer-term usage. Reserve discounted active and flex workers by speaking with our team.
"There are definitely providers who offer much cheaper pricing than Runpod. But they always have an inferior developer experience. If you're paying 50% less for a GPU elsewhere, that cost is coming out somewhere else, be it developer time or lack of reliability. For the value, Runpod provides competitive prices and we're willing to pay a premium to reduce the headache that normally comes with ML ops."
"The setup process was great! Very quick and easy. RunPod had the exact GPUs we needed for AI inference and the pricing was very fair based on what I saw out on the market. The main value proposition for us was the flexibility RunPod offered. We were able to scale up effortlessly to meet the demand at launch."
"The cost savings on RunPod have been incredible. Since switching, our team has been able to focus on building the product instead of the infrastructure.
We often have unpredictable demand from our users which makes it hard to manage our cloud costs. But with RunPod, we've been able to scale up and down quickly and painlessly.
Great reliability in multiple regions and great customer support is why we've been with them for over a year now."