DAIRX Documentation — GPU Cloud Platform
What is DAIRX?
DAIRX is an AI Resource Exchange — the intelligent GPU compute layer. It connects a global multi-cloud network of 10+ GPU cloud providers, and Smart Routing automatically sends each deployment to the cheapest available GPU at that moment.
Up to 80% cost savings vs AWS on-demand. H100 80GB from $1.36/hr vs $9/hr on AWS. Average deployment time under 90 seconds from API call to SSH access. 10+ regions across the global compute network.
Why DAIRX?
- Multi-cloud — one platform connects 10+ datacenter regions worldwide
- Smart Routing — real-time pricing scanned every 30s, deployment lands on the cheapest available GPU automatically
- Browse and Compare — real-time GPU marketplace with filters, side-by-side comparison, and one-click deploy
- Spot Protection — automatic checkpointing means spot evictions do not lose your work
- Smart Defaults — CUDA version, storage, Docker image, and checkpoint intervals set for you based on GPU and workload
- JupyterLab — SSH-provisioned notebooks that work across every region
- DRX Intelligence — AI advisor that analyzes configs, costs, and optimization inline in the dashboard
- No lock-in — workloads move freely across the network based on price and availability
How Smart Routing Works
Smart Routing is the core engine behind DAIRX. When you request a GPU, the routing engine queries all providers simultaneously, scores them on price, availability, region latency, and reliability, then deploys to the optimal target.
Routing Pipeline
- Request — you specify GPU type, duration, and constraints
- Discovery — all providers queried in parallel for live inventory
- Scoring — Fibonacci-weighted: price 55%, availability 21%, region 13%, reliability 8%, features 3%
- Selection — highest-scoring option selected, circuit breakers prevent routing to unhealthy providers
- Deploy — instance provisioned, SSH keys injected, JupyterLab configured
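The scoring step above can be sketched in a few lines. This is an illustrative model, not the production engine: the weights are the Fibonacci percentages from the docs, but the provider metrics, the normalization to [0, 1], and the `healthy` flag are assumptions.

```python
# Sketch of the Smart Routing scoring step using the documented weights
# (price 55%, availability 21%, region 13%, reliability 8%, features 3%).
# Each metric is assumed to be pre-normalized to [0, 1], higher = better.
WEIGHTS = {"price": 0.55, "availability": 0.21, "region": 0.13,
           "reliability": 0.08, "features": 0.03}

def score(offer: dict) -> float:
    """Weighted sum of normalized metrics; result is in [0, 1]."""
    return sum(WEIGHTS[k] * offer[k] for k in WEIGHTS)

offers = [
    {"name": "A", "healthy": True, "price": 0.9, "availability": 1.0,
     "region": 0.8, "reliability": 0.95, "features": 0.5},
    {"name": "B", "healthy": True, "price": 0.7, "availability": 0.6,
     "region": 1.0, "reliability": 0.99, "features": 1.0},
]

# Circuit breaker: unhealthy providers are excluded before scoring.
healthy = [o for o in offers if o["healthy"]]
best = max(healthy, key=score)
print(best["name"])  # A
```

Because price carries 55% of the weight, a cheap offer wins unless its availability or reliability scores drag it well below the alternatives.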
Compute Network — Providers
DAIRX aggregates GPU compute across 10+ cloud providers in three tiers: Premium (Lambda, RunPod, DataCrunch), Performance (Vast.ai, TensorDock), and Cost-Optimized (Massed Compute, Shadeform).
- Lambda: H100 SXM, H200, A100 — enterprise-grade, NVLink, 99.9% uptime
- RunPod: H100, A100, RTX 4090
- DataCrunch: H100, A100, L40S — European datacenters, GDPR compliant
- Vast.ai: RTX 4090, RTX 3090, A100 — community marketplace
- TensorDock: RTX 4090, RTX 3090, T4
Getting Started — Launch Your First GPU
- Step 1: Create your DAIRX account — $25 free credits, no credit card required.
- Step 2: Go to the Exchange and select your GPU type. Use Smart Deploy for automatic cheapest routing, or browse and compare manually.
- Step 3: Click Deploy. Instance ready in under 90 seconds.
SSH and Connect
Every DAIRX instance includes SSH access. Generate an SSH key pair in dashboard settings or paste your existing public key. Private keys encrypted with AES-256-GCM. Connect via ssh root@host -p port -i key, or use the built-in Web Terminal.
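A minimal local key setup might look like the following. The key path and comment are arbitrary choices, and the host/port placeholders come from your instance page, not from this doc:

```shell
# Generate an Ed25519 key pair locally (path and comment are arbitrary).
keydir="$(mktemp -d)"
ssh-keygen -t ed25519 -f "$keydir/dairx_key" -N "" -C "dairx" -q

# Paste this public key into dashboard settings:
cat "$keydir/dairx_key.pub"

# Then connect, substituting the host and port shown for your instance:
# ssh root@<host> -p <port> -i "$keydir/dairx_key"
```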
Credits and Billing
All new accounts start with $25 free credits — no credit card required. Credits are consumed first; once depleted, usage is charged to your card on file. GPU pricing: RTX 4090 from $0.30/hr, A100 from $0.67/hr, L40S from $0.50/hr, H100 SXM from $1.36/hr, H200 from $1.94/hr, B200 from $2.65/hr.
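A quick back-of-the-envelope sketch of what the listed "from" prices buy. These are floor prices; actual rates vary by provider and availability:

```python
# "From" prices per the docs, in $/hr. Floors, not guaranteed rates.
PRICE_PER_HR = {"RTX 4090": 0.30, "A100": 0.67, "L40S": 0.50,
                "H100 SXM": 1.36, "H200": 1.94, "B200": 2.65}

def run_cost(gpu: str, hours: float) -> float:
    """Lower-bound cost estimate for a run at the listed floor price."""
    return PRICE_PER_HR[gpu] * hours

print(run_cost("H100 SXM", 10))   # roughly $13.60 — within the $25 free credits
print(25 / PRICE_PER_HR["A100"])  # roughly 37 hours of A100 on free credits
```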
Smart Deploy
Smart Deploy automatically finds the cheapest available instance across all providers. Select GPU type and duration, click Smart Deploy. DAIRX queries all providers, compares prices, selects cheapest, and provisions with optimal configuration.
Manual Deploy
Manual Deploy gives full control: provider, GPU model, CUDA version, Docker image, storage (20GB-2TB), environment variables, startup scripts, SSH keys. Review everything in Deploy Review before confirming.
Smart Defaults
Smart Defaults auto-configure deployments based on GPU and workload. CUDA version matched to GPU architecture (Hopper: 12.4, Ampere: 12.1, Ada Lovelace: 12.2). Storage sized by workload. Docker image selected for compatibility.
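The CUDA-matching rule can be sketched as a simple lookup. The architecture-to-CUDA table comes from the docs; the GPU-to-architecture mapping is standard NVIDIA naming, included here for illustration:

```python
# Which architecture each GPU belongs to (standard NVIDIA designations).
ARCH = {"H100 SXM": "Hopper", "H200": "Hopper", "A100": "Ampere",
        "L40S": "Ada Lovelace", "RTX 4090": "Ada Lovelace"}

# CUDA default per architecture, per the Smart Defaults docs.
CUDA_FOR_ARCH = {"Hopper": "12.4", "Ampere": "12.1", "Ada Lovelace": "12.2"}

def default_cuda(gpu: str) -> str:
    """Return the CUDA version Smart Defaults would pick for this GPU."""
    return CUDA_FOR_ARCH[ARCH[gpu]]

print(default_cuda("H100 SXM"))  # 12.4
print(default_cuda("RTX 4090"))  # 12.2
```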
Spot Instances
Spot instances offer 40-80% discounts vs on-demand. Can be interrupted when demand increases. DAIRX Spot Protection auto-saves checkpoints every 5 minutes. Resume instantly on a new instance with latest checkpoint.
Checkpoint Persistence
Save and restore training state across instances. Checkpoints stored on Cloudflare R2 (encrypted). Auto-save every 5 minutes for spot instances. Resume training on any GPU type from any provider.
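A minimal local sketch of the save/restore cycle. The real platform stores encrypted checkpoints on Cloudflare R2; this example uses pickle and local disk purely to show the shape of the operation:

```python
import os
import pickle
import tempfile

def save_checkpoint(state: dict, path: str) -> None:
    """Write state, then atomically swap into place so an eviction
    mid-write never leaves a torn checkpoint file."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, path)

def load_checkpoint(path: str) -> dict:
    with open(path, "rb") as f:
        return pickle.load(f)

# Simulate: save on one instance, resume on another.
path = os.path.join(tempfile.mkdtemp(), "ckpt.pkl")
save_checkpoint({"step": 1200, "loss": 0.42}, path)
print(load_checkpoint(path)["step"])  # 1200
```

The atomic-rename trick matters for spot instances: if the instance is evicted during a write, the previous checkpoint remains intact.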
Budget and Spend Alerts
Set spend limits to avoid unexpected charges. Alerts at 50%, 75%, 90%, 100% of budget. Channels: email, in-app, Slack, Discord, Microsoft Teams webhooks. Auto-shutdown can terminate instances when budget reached.
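The threshold logic can be sketched as follows — the thresholds are from the docs, while the idea of diffing two spend readings to find newly crossed alerts is an implementation assumption:

```python
# Budget alert thresholds per the docs: 50%, 75%, 90%, 100%.
THRESHOLDS = (0.50, 0.75, 0.90, 1.00)

def newly_crossed(prev_spend: float, spend: float, budget: float) -> list:
    """Thresholds crossed between two spend readings (fire each alert once)."""
    return [t for t in THRESHOLDS if prev_spend < t * budget <= spend]

print(newly_crossed(40.0, 80.0, 100.0))   # [0.5, 0.75]
print(newly_crossed(80.0, 100.0, 100.0))  # [0.9, 1.0]
```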
Auto Shutdown
Auto shutdown terminates idle GPU instances. If GPU utilization stays below 5% for 30+ minutes, the instance is gracefully shut down. Timeout and threshold are customizable. Enabled by default to prevent runaway costs.
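The idle check reduces to a window test over utilization samples. The sampling cadence (one reading per minute here) is an assumption; the 5% threshold and 30-minute window are the documented defaults:

```python
def should_shutdown(samples, threshold=5.0, window_minutes=30):
    """True if every utilization sample (%) in the trailing window is
    below the threshold. Assumes one sample per minute, newest last."""
    if len(samples) < window_minutes:
        return False  # not enough history to judge idleness yet
    return all(u < threshold for u in samples[-window_minutes:])

print(should_shutdown([2.0] * 30))          # True  — 30 fully idle minutes
print(should_shutdown([2.0] * 29 + [60.0])) # False — a busy sample resets idleness
```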
JupyterLab
Every instance includes pre-configured JupyterLab accessible via browser. GPU-accelerated kernels. Python kernel with PyTorch, TensorFlow, and common ML libraries pre-installed.
Web Terminal
Browser-based SSH access via xterm.js. Features: file upload/download, GPU status bar, command palette, search, copy/select, environment tracker, MOTD, auto-reconnect, keyboard shortcuts.
DRX Intelligence
AI-powered advisor analyzing deployment config, spending, and GPU utilization. Recommendations: cheaper GPU alternatives, spot savings, right-sizing storage, optimal training schedules.
Presets
Save deployment configurations for one-click reuse. Store GPU type, provider preferences, CUDA version, Docker image, storage, env vars, startup scripts. Built-in templates for PyTorch training, TensorFlow inference, LLM fine-tuning.
Choosing the Right GPU
- H100 SXM 80GB: LLM training, 3958 TFLOPS FP8, NVLink, from $1.36/hr
- H200 141GB: large batch training, HBM3e, from $1.94/hr
- A100 40/80GB: fine-tuning, research, 312 TFLOPS FP16, from $0.67/hr
- L40S 48GB: inference, Ada Lovelace, from $0.50/hr
- RTX 4090 24GB: experimentation, from $0.30/hr
ML Training Setup
Deploy GPU, SSH in or open JupyterLab, clone training repo, install dependencies, launch training with checkpointing, monitor via dashboard or DRX Intelligence.
Cost Optimization
Use Smart Deploy for cheapest price. Spot instances for 40-80% savings. Right-size GPU. Budget alerts. Auto-shutdown for idle instances. Presets for efficient configs.
API and SDKs
Python SDK: pip install dairx. REST API with JSON. Auth via API key. Launch instances, manage checkpoints, query pricing programmatically. Currently in early access.
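Since the API is in early access, its routes are not documented here. The following only sketches the general shape of an authenticated JSON request — the endpoint URL, payload fields, and auth scheme are all hypothetical placeholders:

```python
import json
import urllib.request

# Hypothetical values: the real DAIRX endpoint, payload schema, and auth
# scheme may differ. Replace with values from the official API docs.
API_KEY = "dairx_xxx"  # placeholder key
payload = {"gpu": "H100 SXM", "duration_hours": 4, "smart_deploy": True}

req = urllib.request.Request(
    "https://api.dairx.example/v1/instances",  # hypothetical route
    data=json.dumps(payload).encode(),
    headers={"Authorization": f"Bearer {API_KEY}",
             "Content-Type": "application/json"},
    method="POST",
)
# Once you have a real key and endpoint: urllib.request.urlopen(req)
print(req.method, req.full_url)
```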
Troubleshooting
- Deploy failures: try a different GPU type, or enable Smart Deploy for automatic fallback.
- SSH issues: verify your key, wait 60s for startup, or use the Web Terminal.
- JupyterLab: wait 2-3 minutes after deploy, then check instance status.
- Billing: allow 5 minutes for credits to appear; contact support with your payment reference.