Menu

Honest pricing. Receipts on every line.

All prices USD. Billing starts only after a verified ReadyEvent. Designed to start at Ready.

Rates are placeholders pending captain ruling F2 and Sprint 005 bench data. See the rate-card log for effective-date history. Data Pack add-ons from +30% (1 pack) — see FAQ for multi-pack rates.

Model Lane — T1 / T2 / T3

Curated Qwen3.5 + Gemma 4 + DeepSeek V4 Flash recipes with LoRA adapter composition. Billing is per 1K input tokens + per 1K output tokens (rates split — docs/41 §2).

| Tier | Description | Input / 1K tokens | Output / 1K tokens | Monthly base |
|---|---|---|---|---|
| T3 — Prompt-only | Base model only; no adapter VRAM overhead. Best for dev, small-batch, and latency-tolerant workloads. Shared K8s resource key: nvidia.com/gpu.shared-<lane>. Not for production workloads requiring fault isolation. Designed to start at Ready. | TBD (doc ref: $0.003) | TBD (doc ref: $0.009) | TBD |
| T2 — Elastic | Per-use inference with shared adapter amortisation. No standing reservation fee. Optional NVMe warm-pool fee while adapter is NVMe-resident but not VRAM-resident (captain decides blended vs. separate line item). Isolated K8s resource key: gpodz.com/mig-<profile>. Designed to start at Ready. | TBD (doc ref: $0.006) | TBD (doc ref: $0.018) | TBD |
| T1 — Pinned | Dedicated GPU slot reserved 24/7. Composed adapter stack (up to 5 layers: Base + Toolkit-Agent + Geo + Archetype + Behavioral). Full hardware VRAM + SM isolation. Dedicated K8s resource key: nvidia.com/gpu: 1 + full-GPU pin. Idle-billing notice (LEGAL-8): Pinned reservation bills 24/7 while reserved. $X/hr base + $Y per 1K tokens served — accrues whether or not you send requests. To stop billing, release the reservation from your dashboard. Designed to start at Ready. | TBD (doc ref: $0.010) | TBD (doc ref: $0.030) | TBD |
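As a rough illustration of the split input/output billing described for the Model Lane, the sketch below computes a usage charge from token counts. It uses the placeholder "doc ref" figures from the table, which are not final rates. The function name `token_charge` is hypothetical, and the sketch assumes tokens are billed pro rata rather than rounded up to whole thousands. It leaves out the monthly base and the T1 hourly reservation fee, both of which are still TBD.

```python
from decimal import Decimal

# Placeholder "doc ref" rates from the Model Lane table, in USD per 1K tokens,
# stored as (input rate, output rate). Published rates are still TBD.
DOC_REF_RATES = {
    "T3": (Decimal("0.003"), Decimal("0.009")),
    "T2": (Decimal("0.006"), Decimal("0.018")),
    "T1": (Decimal("0.010"), Decimal("0.030")),
}


def token_charge(tier, input_tokens, output_tokens):
    """Return the token usage charge in USD for one request or billing window.

    Assumption: tokens are billed pro rata, not rounded up to the next 1K.
    """
    input_rate, output_rate = DOC_REF_RATES[tier]
    thousand = Decimal(1000)
    input_cost = Decimal(input_tokens) / thousand * input_rate
    output_cost = Decimal(output_tokens) / thousand * output_rate
    return input_cost + output_cost


# 12K input tokens and 3K output tokens on T2:
# 12 * 0.006 + 3 * 0.018 = 0.072 + 0.054
print(token_charge("T2", 12_000, 3_000))  # prints 0.126
```

Using `Decimal` rather than floats keeps fractions of a cent exact, which matters when per-token amounts are summed over many requests.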

Raw Compute — Shared / Isolated / Dedicated

Direct GPU access without a model recipe. Billing is per GPU-hour. The Disclosed Match card shows backend GPU, delivered VRAM, isolation mode, and cache state before any payment authorisation. Same Terms of Service as the Model Lane (LEGAL-7).

| Tier | Description | Hourly rate | Minimum |
|---|---|---|---|
| Shared | Time-sliced GPU; no hardware memory or fault isolation. Best for development and best-effort batch workloads. K8s key: nvidia.com/gpu.shared-<lane> provisioned by gpodz-device-plugin. Not for workloads requiring hardware isolation. Designed to start at Ready. | TBD | 1 hr |
| Isolated | One tenant per discovered MIG profile; hardware memory + SM isolation. Best for production inference requiring memory guarantees. K8s key: gpodz.com/mig-<profile> provisioned by node-agent. Designed to start at Ready. | TBD | 1 hr |
| Dedicated | Full physical GPU; one-tenant-per-GPU. Manual sales path in Phase 1. Minimum commitment applies. K8s key: nvidia.com/gpu: 1 + full-GPU pin. Idle-billing notice (LEGAL-8): Dedicated sessions bill $X/hr regardless of utilisation while the GPU is reserved. Designed to start at Ready. | TBD | 4 hr |
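The per-GPU-hour billing with a minimum can be sketched as follows. Hourly rates are still TBD, so the caller supplies one and the rates in the example are made up. The function name `raw_compute_charge` is hypothetical. The sketch assumes that a session shorter than the tier minimum is billed at the minimum, and that the charge follows reserved time rather than utilisation, in line with the idle-billing notice.

```python
from decimal import Decimal

# Minimum billable hours per tier, taken from the Raw Compute table.
MINIMUM_HOURS = {
    "Shared": Decimal(1),
    "Isolated": Decimal(1),
    "Dedicated": Decimal(4),
}


def raw_compute_charge(tier, hours_reserved, hourly_rate):
    """Return the charge in USD for one Raw Compute session.

    Assumptions: hourly_rate is a caller-supplied placeholder (rates are TBD),
    and sessions shorter than the tier minimum are billed at the minimum.
    """
    # Convert through str so float inputs such as 2.5 stay exact.
    reserved = Decimal(str(hours_reserved))
    billable_hours = max(reserved, MINIMUM_HOURS[tier])
    return billable_hours * Decimal(str(hourly_rate))


# A 2.5-hour Dedicated session at a made-up $2.00/hr is billed for 4 hours.
print(raw_compute_charge("Dedicated", 2.5, "2.00"))  # prints 8.00
# A 2.5-hour Isolated session at a made-up $1.50/hr exceeds the 1-hour minimum.
print(raw_compute_charge("Isolated", 2.5, "1.50"))  # prints 3.750
```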

All prices USD. Token rates carry explicit TODO comments pending captain ruling F2. Volume discounts apply at 1M / 10M / 100M tokens per month (docs/41 §2). Discounts do not stack with subscription packs.
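The volume thresholds can be read as a simple band lookup, sketched below. Only the thresholds (1M / 10M / 100M tokens per month) come from this page. The discount percentages are invented for illustration, and the function name `volume_discount` is hypothetical. The sketch also assumes that a threshold is reached at exactly that token count, and that "do not stack" means no volume discount is applied when a subscription pack is active.

```python
from bisect import bisect_right
from decimal import Decimal

# Monthly token thresholds from the pricing note.
THRESHOLDS = [1_000_000, 10_000_000, 100_000_000]

# HYPOTHETICAL discount fractions, one per band (below 1M, 1M+, 10M+, 100M+).
# The real percentages are not stated on this page.
DISCOUNTS = [Decimal("0"), Decimal("0.05"), Decimal("0.10"), Decimal("0.15")]


def volume_discount(monthly_tokens, has_subscription_pack=False):
    """Return the discount fraction for a month's token volume.

    Assumption: an active subscription pack disables the volume discount,
    which is one reading of "discounts do not stack".
    """
    if has_subscription_pack:
        return Decimal("0")
    # bisect_right counts how many thresholds have been reached.
    band = bisect_right(THRESHOLDS, monthly_tokens)
    return DISCOUNTS[band]


print(volume_discount(999_999))  # prints 0
print(volume_discount(25_000_000))  # prints 0.10
print(volume_discount(25_000_000, has_subscription_pack=True))  # prints 0
```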

Both lanes are subject to the same Terms of Service. Trust & transparency →