Honest pricing. Receipts on every line.
All prices USD. Billing starts only after a verified ReadyEvent. Designed to start at Ready.
Rates are placeholders pending captain ruling F2 and Sprint 005 bench data. See the rate-card log for effective-date history. Data Pack add-ons from +30% (1 pack) — see FAQ for multi-pack rates.
Model Lane — T1 / T2 / T3
Curated Qwen3.5 + Gemma 4 + DeepSeek V4 Flash recipes with LoRA adapter composition. Billing is per 1K input tokens + per 1K output tokens (rates split — docs/41 §2).
| Tier | Description | Input / 1K tokens | Output / 1K tokens | Monthly base | |
|---|---|---|---|---|---|
| T3 — Prompt-only | Base model only; no adapter VRAM overhead. Best for dev, small-batch, and latency-tolerant workloads. Shared K8s resource key: nvidia.com/gpu.shared-<lane>. Not for production workloads requiring fault isolation. Designed to start at Ready. | TBD (doc ref: $0.003) | TBD (doc ref: $0.009) | TBD | Compare tiers → |
| T2 — Elastic | Per-use inference with shared adapter amortisation. No standing reservation fee. Optional NVMe warm-pool fee while adapter is NVMe-resident but not VRAM-resident (captain decides blended vs. separate line item). Isolated K8s resource key: gpodz.com/mig-<profile>. Designed to start at Ready. | TBD (doc ref: $0.006) | TBD (doc ref: $0.018) | TBD | Compare tiers → |
| T1 — Pinned | Dedicated GPU slot reserved 24/7. Composed adapter stack (up to 5 layers: Base + Toolkit-Agent + Geo + Archetype + Behavioral). Full hardware VRAM + SM isolation. Dedicated K8s resource key: nvidia.com/gpu: 1 + full-GPU pin. Idle-billing notice (LEGAL-8): Pinned reservation bills 24/7 while reserved. $X/hr base + $Y per 1K tokens served — accrues whether or not you send requests. To stop billing, release the reservation from your dashboard. Designed to start at Ready. | TBD (doc ref: $0.010) | TBD (doc ref: $0.030) | TBD | Compare tiers → |
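As a rough illustration of split input/output billing, the sketch below prices a request using the "doc ref" placeholder figures from the table. These are not final rates (all rates are TBD pending captain ruling F2). The function names, the pro-rata treatment of partial 1K blocks, and the use of the split table rates for the T1 token component are assumptions made for the example.

```python
from decimal import Decimal

# Placeholder "doc ref" rates from the table above, in USD per 1K tokens,
# as (input_rate, output_rate). NOT final prices: all rates are TBD.
DOC_REF_RATES = {
    "T3": (Decimal("0.003"), Decimal("0.009")),
    "T2": (Decimal("0.006"), Decimal("0.018")),
    "T1": (Decimal("0.010"), Decimal("0.030")),
}


def token_charge(tier: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Usage charge in USD; input and output tokens are priced separately.

    Assumption: partial 1K blocks are billed pro rata (no rounding up).
    """
    input_rate, output_rate = DOC_REF_RATES[tier]
    input_cost = Decimal(input_tokens) / 1000 * input_rate
    output_cost = Decimal(output_tokens) / 1000 * output_rate
    return input_cost + output_cost


def pinned_charge(hours_reserved: Decimal, hourly_base: Decimal,
                  input_tokens: int, output_tokens: int) -> Decimal:
    """T1 charge: the hourly base accrues for every reserved hour,
    even with zero requests; token usage is added on top.

    hourly_base stands in for the undisclosed $X/hr and must be
    supplied by the caller.
    """
    return hours_reserved * hourly_base + token_charge("T1", input_tokens, output_tokens)
```

With the placeholder figures, `token_charge("T2", 2000, 500)` works out to 2 × $0.006 + 0.5 × $0.018 = $0.021. `Decimal` is used instead of `float` so that small per-token amounts sum without binary rounding drift.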
Raw Compute — Shared / Isolated / Dedicated
Direct GPU access without a model recipe. Billing is per GPU-hour. The Disclosed Match card shows backend GPU, delivered VRAM, isolation mode, and cache state before any payment authorisation. Same Terms of Service as the Model Lane (LEGAL-7).
| Tier | Description | Hourly rate | Minimum | |
|---|---|---|---|---|
| Shared | Time-sliced GPU; no hardware memory or fault isolation. Best for development and best-effort batch workloads. K8s key: nvidia.com/gpu.shared-<lane> provisioned by gpodz-device-plugin. Not for workloads requiring hardware isolation. Designed to start at Ready. | TBD | 1 hr | Compare tiers → |
| Isolated | One tenant per discovered MIG profile; hardware memory + SM isolation. Best for production inference requiring memory guarantees. K8s key: gpodz.com/mig-<profile> provisioned by node-agent. Designed to start at Ready. | TBD | 1 hr | Compare tiers → |
| Dedicated | Full physical GPU; one-tenant-per-GPU. Manual sales path in Phase 1. Minimum commitment applies. K8s key: nvidia.com/gpu: 1 + full-GPU pin. Idle-billing notice (LEGAL-8): Dedicated sessions bill $X/hr regardless of utilisation while the GPU is reserved. Designed to start at Ready. | TBD | 4 hr | Compare tiers → |
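A minimal sketch of how per-GPU-hour billing with a minimum might be computed. It assumes the "Minimum" column means a minimum billed duration and that time beyond the minimum is billed pro rata. Neither is confirmed by this page. Hourly rates are TBD, so the caller supplies the rate, and the function name is illustrative.

```python
from decimal import Decimal

# Minimum billable hours per Raw Compute tier, from the table above.
MINIMUM_HOURS = {
    "Shared": Decimal("1"),
    "Isolated": Decimal("1"),
    "Dedicated": Decimal("4"),
}


def raw_compute_charge(tier: str, hours_reserved: Decimal, hourly_rate: Decimal) -> Decimal:
    """Charge in USD for a Raw Compute session.

    Assumption: sessions shorter than the tier minimum are billed at the
    minimum. Billing is on reserved time, not utilisation, matching the
    idle-billing notice for Dedicated.
    """
    billable_hours = max(hours_reserved, MINIMUM_HOURS[tier])
    return billable_hours * hourly_rate
```

Under these assumptions, a Dedicated session released after 1.5 hours is still billed for 4 hours, while a 2.5-hour Shared session is billed for 2.5 hours.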
All prices USD. Token rates carry explicit TODO comments pending captain ruling F2. Volume discounts apply at 1M / 10M / 100M tokens per month (docs/41 §2). Discounts do not stack with subscription packs.
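The volume thresholds can be read as bands. The sketch below only identifies which band a month's usage falls into, because the discount percentages are not published here. It assumes thresholds are inclusive and reads "do not stack" as meaning accounts on a subscription pack receive no volume discount. Both are interpretations, and the function name is illustrative.

```python
from typing import Optional

# Monthly token thresholds at which volume discounts apply (docs/41 §2).
VOLUME_THRESHOLDS = (1_000_000, 10_000_000, 100_000_000)


def volume_band(monthly_tokens: int, has_subscription_pack: bool) -> Optional[int]:
    """Return the highest threshold reached this month, or None.

    Assumptions: thresholds are inclusive, and subscription-pack accounts
    get no volume band because the two discounts do not stack.
    """
    if has_subscription_pack:
        return None
    reached = [t for t in VOLUME_THRESHOLDS if monthly_tokens >= t]
    return max(reached) if reached else None
```

For example, 12M tokens in a month without a subscription pack lands in the 10M band; the same usage with a pack returns `None`.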
Both lanes are subject to the same Terms of Service. Trust & transparency →