We raised $20M to revolutionize AI/ML cloud computing
Learn more

Run machine learning
inference 
at scale.

Only pay for what you use — no idle costs, just unparalleled speed and scalability.

import runpod

def handler(job):
    job_input = job["input"]
    return "Running on Runpod!"

runpod.serverless.start({"handler": handler})
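The handler above can be smoke-tested locally before deploying. A minimal sketch, assuming the job envelope shape the serverless runtime delivers (`id` plus `input`); the local harness here is ours, not part of the SDK, and `runpod.serverless.start` is omitted so the snippet runs without the `runpod` package installed:

```python
def handler(job):
    # Same signature as the deployed handler: the runtime passes a
    # job dict whose "input" key carries the caller's payload.
    job_input = job["input"]
    return {"message": "Running on Runpod!", "echo": job_input}

# Simulate the envelope the serverless runtime would deliver.
job = {"id": "local-test", "input": {"prompt": "hello"}}
result = handler(job)
print(result)  # {'message': 'Running on Runpod!', 'echo': {'prompt': 'hello'}}
```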


Spend more time training your models.
Let us handle your inference.

For your expected load, provision active workers running 24/7 at a 40% discount, plus flex workers to handle any sudden traffic.
Try it now
Bring Your Container
Network Storage
9 Regions
Streaming
Webhooks
Autoscale in seconds
Respond to user demand in real time with GPU workers that
scale from 0 to 100s in seconds.
[Chart: flex vs. active worker autoscaling: 10 GPUs at 6:24AM, 100 GPUs at 11:34AM, 20 GPUs at 1:34PM]
Active Workers
-40% DISCOUNT
Dedicated GPUs that handle consistent workloads 24/7.
Get them at a lower cost so you don't break the bank for stable usage.
Flex Workers
Flexible GPUs that cost nothing when idle.
Ready to scale up as soon as your launch goes viral.
Zero Cold-Starts with Active Workers
No cold-start time, because the workers are always running. Get instant execution when speed is all that matters.
<250ms Cold-Start with Flashboot
Flashboot is an optimization layer for our container system to manage deployments and scale up workers in real time.
Handle more consistent workloads like fine-tuning
Scale workers by Queue Delay or Request count
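Scaling by queue delay can be pictured as a simple control loop: when requests wait longer than a target, add workers; when the queue empties, scale to zero. A hedged sketch of the idea only, not RunPod's actual algorithm; the target delay and per-worker throughput are made-up parameters:

```python
def desired_workers(queue_depth, avg_queue_delay_s,
                    current_workers, max_workers,
                    target_delay_s=4.0, requests_per_worker=2):
    """Hypothetical scale-by-queue-delay policy (illustrative only)."""
    if avg_queue_delay_s > target_delay_s:
        # Queue is backing up: add enough workers to drain it
        # (ceiling division of queued requests by per-worker capacity).
        needed = current_workers + -(-queue_depth // requests_per_worker)
    elif queue_depth == 0:
        # Idle: flex workers cost nothing, so scale to zero.
        needed = 0
    else:
        needed = current_workers
    return min(needed, max_workers)

print(desired_workers(20, 10.0, 2, 100))  # backlog of 20 -> scale 2 up to 12
```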
Monitor your endpoint with
real-time analytics
Usage Analytics
Real-time usage analytics for your endpoint with metrics on completed and failed requests. Useful for endpoints that have fluctuating usage profiles throughout the day.
See the console
Active Requests: Completed 2,277 · Retried 21 · Failed 9
Execution Time: Total 1,420s · P70 8s · P90 19s · P98 22s
Execution Time Analytics
Debug your endpoints with detailed metrics on execution time. Useful for hosting models that have varying execution times, like large language models. You can also monitor delay time, cold start time, cold start count, GPU utilization, and more.
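The P70/P90/P98 figures above are percentiles over recent execution times. A small sketch of the nearest-rank definition many dashboards use (whether the console computes exactly this variant is an assumption; the sample times are invented):

```python
import math

def percentile(samples, p):
    # Nearest-rank percentile: the smallest value with at least
    # p% of the samples at or below it.
    ranked = sorted(samples)
    k = max(0, math.ceil(p / 100 * len(ranked)) - 1)
    return ranked[k]

# Ten hypothetical execution times, in seconds.
times = [8, 3, 22, 19, 5, 8, 12, 7, 9, 6]
print(percentile(times, 90))  # -> 19
```

High percentiles like P98 surface the slow tail (e.g. cold starts or long prompts) that an average would hide.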
See the console
Real-Time Logs
Get descriptive, real-time logs to show you exactly what's happening across your active and flex GPU workers at all times.
See the console
2024-03-15T19:56:00.8264895Z INFO | Started job db7c79
2024-03-15T19:56:03.2667597Z
  0% |          | 0/28 [00:00<?, ?it/s]
 14% |█▍        | 4/28 [00:00<00:01, 12.06it/s]
 43% |████▎     | 12/28 [00:00<00:01, 12.14it/s]
 79% |███████▊  | 22/28 [00:01<00:00, 12.14it/s]
100% |██████████| 28/28 [00:02<00:00, 12.13it/s]
2024-03-15T19:56:04.7438407Z INFO | Completed job db7c79 in 2.9s
2024-03-15T19:57:00.8264895Z INFO | Started job ea1r14
2024-03-15T19:57:03.2667597Z
  0% |          | 0/28 [00:00<?, ?it/s]
 14% |█▍        | 4/28 [00:00<00:01, 12.06it/s]
 43% |████▎     | 12/28 [00:00<00:01, 12.14it/s]
 79% |███████▊  | 22/28 [00:01<00:00, 12.14it/s]
100% |██████████| 28/28 [00:02<00:00, 12.13it/s]
2024-03-15T19:57:04.7438407Z INFO | Completed job ea1r14 in 2.9s
2024-03-15T19:58:00.8264895Z INFO | Started job gn3a25
2024-03-15T19:58:03.2667597Z
  0% |          | 0/28 [00:00<?, ?it/s]
 14% |█▍        | 4/28 [00:00<00:01, 12.06it/s]
 43% |████▎     | 12/28 [00:00<00:01, 12.14it/s]
 79% |███████▊  | 22/28 [00:01<00:00, 12.14it/s]
100% |██████████| 28/28 [00:02<00:00, 12.13it/s]
2024-03-15T19:58:04.7438407Z INFO | Completed job gn3a25 in 2.9s

Cost effective
for every inference workload

Save 15% over other Serverless cloud providers on flex workers alone.
Create active workers and configure queue delay for even more savings.
80 GB
A100
Extremely performant GPUs, yet still very cost effective for running any machine learning model.
Flex
$0.0013/s
Active
$0.00078/s
80 GB
H100
PRO
Our most powerful GPUs. Most useful when maximizing inference throughput is critical.
Flex
$0.0025/s
Active
$0.0015/s
48 GB
A6000
A cost-effective option for running diffusion models, LoRAs, Whisper, and many others. Less effective for large language models.
Flex
$0.00048/s
Active
$0.00029/s
48 GB
L40
PRO
Useful when you need high inference throughput on LLMs like Llama 3 8B and medium-sized models like Yi 34B.
Flex
$0.00069/s
Active
$0.00041/s
24 GB
A5000
Great for small-to-medium models with consistent workloads; lower throughput than the 24 GB PRO tier.
Flex
$0.00026/s
Active
$0.00016/s
24 GB
4090
PRO
Extremely high throughput for small-to-medium models. Great for running Llama 3 8B and Mistral 7B.
Flex
$0.00044/s
Active
$0.00026/s
16 GB
A4000
The most cost-effective option for running inference on small models like LoRAs, diffusion models, and Whisper.
Flex
$0.0002/s
Active
$0.00012/s
Thousands of GPUs across 9 Regions
Update your endpoint's region in two clicks. Scale up to 9 regions at a time. Global automated failover is supported out-of-the-box, so you won't have to worry about GPU errors interrupting your ML inference.
Pending Certifications
Many of our data center partners already hold compliance certifications; RunPod itself is in the process of obtaining SOC 2, ISO 27001, and HIPAA. We aim to have all three by early Q3 2024.
North America
US-OR-1
CA-MTL-1
CA-MTL-2
European Union
EUR-IS-1
EUR-IS-2
EUR-NO-1
Europe
EU-NL-1
EU-RO-1
EU-SE-1
Serverless Pricing Calculator
$442/mo
72,000 requests per month
1. Cost estimation assumes 50% of the requests use the active price and incur a 1s cold-start.
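The calculator's logic can be sketched as simple arithmetic over the per-second rates above. A hedged illustration only: the execution time below is an assumed input (the calculator's own inputs aren't shown here), so this does not reproduce the $442 figure, and the rates are the A100 prices from the cards above:

```python
def monthly_cost(requests, exec_s, cold_start_s,
                 flex_rate, active_rate, active_fraction=0.5):
    """Estimate monthly serverless spend in dollars.

    Assumes active_fraction of requests run on warm active workers
    (no cold start) and the rest run on flex workers, paying the
    cold-start second as in the footnote. Rates are $/s.
    """
    active_reqs = requests * active_fraction
    flex_reqs = requests - active_reqs
    active_cost = active_reqs * exec_s * active_rate
    flex_cost = flex_reqs * (exec_s + cold_start_s) * flex_rate
    return active_cost + flex_cost

# Illustrative: 72,000 requests/month on an A100, assumed 3s execution.
print(round(monthly_cost(72_000, 3.0, 1.0, 0.0013, 0.00078), 2))  # -> 271.44
```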
We're with you from seed to scale
Book a call with our sales team to learn more.
Gain
Additional Savings
with Reservations
Save more by committing to longer-term usage. Reserve discounted active and flex workers by speaking with our team.
Book a call
Are you an early-stage startup or ML researcher?
Get up to $25K in free compute credits with RunPod. These can be used towards on-demand GPUs and Serverless endpoints.
Apply
"It really shows that RunPod is made by developers. They know exactly what engineers really want and they ship those features in order of importance."
Hara Kang - CTO, LOVO AI
4,477,410,398
requests & 100k+ developers since launch
Get started with RunPod 
today.
We handle millions of serverless requests a day. Scale your machine learning inference while keeping costs low.