Spot GPUs can cost 60–80% less than on-demand pricing. For a team spending $5,000/month on GPU compute, that's up to $4,000 in potential monthly savings. The tradeoff is simple: your instance can be terminated with little or no warning when demand rises. Whether that tradeoff makes sense depends entirely on what you're running.
How Spot Pricing Works
Spot instances — sometimes called preemptible or interruptible instances — represent unused GPU capacity that providers sell at a steep discount. When other customers need that capacity at full price, your spot instance gets terminated, typically with a 30-second to 2-minute warning depending on the provider.
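How that warning is delivered varies by provider: some expose a metadata endpoint you can poll, others deliver an OS-level shutdown signal to the VM. As a minimal sketch, assuming the warning reaches your process as a SIGTERM (and with `save_state` as a hypothetical stand-in for real persistence), a handler can flush state before the hard cutoff:

```python
import signal
import sys
import time

interrupted = False

def handle_term(signum, frame):
    # Flag the warning; the main loop finishes its current unit of
    # work, saves state, and exits before the hard termination lands.
    global interrupted
    interrupted = True

signal.signal(signal.SIGTERM, handle_term)

def save_state(step):
    # Placeholder: persist whatever the job needs to resume later,
    # on storage that outlives the instance (see the checkpointing
    # section below).
    print(f"saving state at step {step}")

for step in range(1_000_000):
    time.sleep(0.01)  # stand-in for one unit of real work
    if interrupted:
        save_state(step)
        sys.exit(0)
```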
The discounts are substantial, and the pattern is consistent across providers:

- A100 80GB: ~$2.00/hr on-demand → ~$0.85–$1.20/hr spot (40–60% off)
- RTX 4090: ~$0.55/hr on-demand → ~$0.18–$0.28/hr spot (50–70% off)
- H100 SXM: ~$3.50/hr on-demand → ~$1.80–$2.50/hr spot (30–50% off)

GPUs with higher general availability tend to offer deeper spot discounts.
When Spot GPUs Make Sense
Spot instances work for workloads that are either short enough that they'll likely finish before an interruption, or resilient enough to survive termination without losing meaningful progress.
- Batch inference — processing a dataset one chunk at a time, where each chunk is independent. If the instance terminates between chunks, you resume from where you left off (see the sketch after this list).
- Hyperparameter search — running many independent short experiments. If one gets interrupted, you re-queue it. The savings across hundreds of experiments vastly outweigh occasional restarts.
- Development and prototyping — interactive work where you're iterating on code and running quick experiments. An interruption loses minutes, not hours.
- Training with regular checkpointing — any training job that saves state every 30–60 minutes. You lose at most one checkpoint interval of compute per interruption.
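For the batch-inference case, resumability can be as simple as a persistent manifest of finished chunks. A minimal sketch, where `run_inference`, the chunk IDs, and the `progress.json` manifest are all illustrative (in practice the manifest should live in object storage, not on the instance's local disk):

```python
import json
import os

MANIFEST = "progress.json"  # in practice: object storage, not local disk

def load_done():
    if os.path.exists(MANIFEST):
        with open(MANIFEST) as f:
            return set(json.load(f))
    return set()

def mark_done(done, chunk_id):
    done.add(chunk_id)
    with open(MANIFEST, "w") as f:
        json.dump(sorted(done), f)

def run_inference(chunk_id):
    # Placeholder for the real per-chunk work (model forward passes, etc.)
    print(f"processing chunk {chunk_id}")

done = load_done()
for chunk_id in range(100):  # 100 independent chunks
    if chunk_id in done:
        continue             # already finished before an interruption
    run_inference(chunk_id)
    mark_done(done, chunk_id)  # record progress after every chunk
```

Because each chunk is recorded only after it completes, a termination at any point wastes at most one chunk of work.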
When You Need On-Demand
Some workloads can't tolerate interruption at any discount; the cost of a failure outweighs any hourly savings.
- Production inference endpoints — real-time serving with latency requirements. A terminated instance means dropped requests and broken user experiences.
- Multi-node distributed training — when 8 GPUs run a synchronized job, losing one node stops all 8, and the restart overhead for large distributed setups can be substantial; the sketch after this list shows how that risk compounds with node count.
- Time-critical runs — if you have a deadline (paper submission, model launch), the risk of delays from spot termination may exceed the savings.
- Jobs without checkpointing — if your training script doesn't save checkpoints, a spot termination means starting from scratch. Fix your training script first.
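To see why synchronized multi-node jobs are so much more exposed, assume (illustratively) that each node is interrupted independently with some fixed probability per hour; the whole job survives an hour only if every node does:

```python
# How interruption risk compounds across synchronized nodes, assuming
# each node is interrupted independently with probability p per hour.
p = 0.05  # assumed per-node hourly interruption rate, for illustration
for n in (1, 4, 8, 16):
    survive = (1 - p) ** n  # every node must stay up for the job to stay up
    print(f"{n:2d} nodes: {survive:.1%} chance of an uninterrupted hour")
```

At an assumed 5% per-node hourly rate, an 8-node job has only about a 66% chance of getting through any given hour uninterrupted.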
Making Spot Work: Checkpointing
The key to using spot GPUs effectively is making your workloads interruption-resilient. This is primarily a checkpointing problem.
- Checkpoint every 30–60 minutes. More frequent wastes I/O bandwidth; less frequent wastes compute on restart.
- Save to persistent storage. Checkpoints must survive instance termination — use cloud object storage, not just local disk.
- Automate resume logic. Your training script should detect and load the latest checkpoint on startup, turning interruptions into minor delays (sketched just after this list).
- Monitor spot availability. A100s and 4090s tend to have more stable spot markets than H100s. Check availability patterns before committing.
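Putting the save and resume pieces together, here's a minimal framework-agnostic sketch. The file names, step counts, and pickled state dict are placeholders; with PyTorch you would save model and optimizer state via torch.save/torch.load instead:

```python
import os
import pickle

CKPT = "checkpoint.pkl"    # in practice: a path on mounted cloud storage
SAVE_EVERY_STEPS = 500     # tune so the wall-clock interval lands in 30-60 min

def save_checkpoint(state):
    # Write to a temp file, then rename: os.replace is atomic, so a
    # termination mid-write can't corrupt an existing checkpoint.
    tmp = CKPT + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, CKPT)

def load_checkpoint():
    if os.path.exists(CKPT):
        with open(CKPT, "rb") as f:
            return pickle.load(f)
    return {"step": 0, "weights": None}  # no checkpoint found: fresh start

state = load_checkpoint()                # resume automatically on startup
for step in range(state["step"], 10_000):
    # Stand-in for one real training step updating model weights.
    state["weights"] = f"after step {step}"
    state["step"] = step + 1
    if state["step"] % SAVE_EVERY_STEPS == 0:
        save_checkpoint(state)
```

The temp-file-then-rename step matters: a spot termination that lands mid-save leaves the previous checkpoint intact.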
The goal isn't to avoid all interruptions; it's to make each one cost a small, bounded amount of progress. With a 30-minute checkpoint interval on a 48-hour training run, each interruption costs at most about 1% of total compute time (roughly half that on average) while the spot rate saves you 60%+ on every hour.
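As a rough worked example using the illustrative A100 rates from the table above (and ignoring the wait for replacement capacity, which varies by provider):

```python
# Rough cost comparison using the illustrative A100 rates from above.
on_demand_rate = 2.00         # $/hr on-demand
spot_rate = 0.85              # $/hr spot
run_hours = 48
interruptions = 4             # assumed for this run
lost_per_interruption = 0.25  # hours: ~half a 30-minute checkpoint interval

redone = interruptions * lost_per_interruption
print(f"on-demand: ${on_demand_rate * run_hours:.2f}")
print(f"spot:      ${spot_rate * (run_hours + redone):.2f} "
      f"(including {redone:.1f}h of redone work)")
```

Even with four interruptions and the redone work, the spot run comes in around $42 versus $96 on-demand, still more than half off.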