Intermediate · 7 min read · pricing.*

Cloud GPU Pricing Explained

How GPU cloud pricing works: on-demand, spot, and reserved. Billing granularity, hidden costs, and how to compare prices across providers.

The three pricing models

Every GPU cloud provider offers some combination of three pricing models:

On-demand: pay by the hour (or by the second or minute) with no commitment. Start and stop anytime. This is the most common and most expensive option. An H100 on-demand typically ranges from $2.00 to $4.00 per GPU per hour depending on the provider, though some hyperscaler instances work out to considerably more per GPU.

Spot / Preemptible: discounted instances that can be interrupted with little notice (typically 30 seconds to 2 minutes). Prices are 50-80% lower than on-demand. A good fit for training jobs that can checkpoint and resume.

Reserved: commit to 1 or 3 years and get 30-60% off on-demand pricing, with guaranteed capacity. Used by companies with predictable, steady GPU needs.
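
To make the trade-off between the three models concrete, here is a rough Python sketch that applies the discount ranges quoted above to a hypothetical $3.00/GPU/hr on-demand rate. The function names and the example rate are illustrative assumptions. The break-even figure also assumes a reservation is billed for every hour of the term, whether the GPU is busy or idle.

```python
def discounted_rate(on_demand_rate: float, discount: float) -> float:
    """Hourly rate after a fractional discount (0.5 means 50% off)."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return on_demand_rate * (1.0 - discount)


def reserved_break_even_utilization(discount: float) -> float:
    """Fraction of hours a GPU must be busy for reserved to beat on-demand.

    Assumption: the reservation is paid for every hour, used or not.
    """
    return 1.0 - discount


ON_DEMAND = 3.00  # hypothetical $/GPU/hr, inside the H100 range quoted above

# Spot: 50-80% lower than on-demand.
spot_low = discounted_rate(ON_DEMAND, 0.80)
spot_high = discounted_rate(ON_DEMAND, 0.50)

# Reserved: 30-60% off on-demand.
reserved_low = discounted_rate(ON_DEMAND, 0.60)
reserved_high = discounted_rate(ON_DEMAND, 0.30)

print(f"spot:     ${spot_low:.2f} to ${spot_high:.2f} per GPU-hour")
print(f"reserved: ${reserved_low:.2f} to ${reserved_high:.2f} per GPU-hour")
print(f"break-even utilization at 30% off: {reserved_break_even_utilization(0.30):.0%}")
```

Under these assumptions, a reservation at 30% off only pays for itself if the GPU is in use about 70% of the time, which is why reserved pricing suits steady workloads.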

Billing granularity matters

Providers bill at different granularities:

Granularity | Providers | Impact
Per-second | AWS, GCP, Lambda | Pay only for what you use. A 45-minute job costs 75% of an hour.
Per-minute | Some smaller providers | Rounded up to the next minute.
Per-hour | Some providers, reserved | A 1-minute job costs the same as a 59-minute job.

For short, iterative workloads (experimentation, hyperparameter tuning), per-second billing can save 20-40% compared to per-hour billing, and more when each job is much shorter than an hour.

Example
Running 10 experiments, each 20 minutes, on a $3.00/hr GPU:
• Per-second billing: 10 × 20min × $0.05/min = $10.00
• Per-hour billing: 10 × 1hr × $3.00 = $30.00
3× difference for the same work.
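
The arithmetic above can be reproduced with a small Python sketch that rounds each job up to the billing increment. The function name and the granularity labels are illustrative assumptions, and the sketch ignores any minimum charge a provider may apply.

```python
import math

# Billing increment in seconds for each granularity (labels are illustrative).
INCREMENT_SECONDS = {"per-second": 1, "per-minute": 60, "per-hour": 3600}


def billed_cost(job_seconds: int, hourly_rate: float, granularity: str) -> float:
    """Cost of one job when usage is rounded up to the billing increment."""
    increment = INCREMENT_SECONDS[granularity]
    billed_seconds = math.ceil(job_seconds / increment) * increment
    return billed_seconds * hourly_rate / 3600


# Ten 20-minute experiments on a $3.00/hr GPU, as in the example above.
for granularity in INCREMENT_SECONDS:
    total = 10 * billed_cost(20 * 60, 3.00, granularity)
    print(f"{granularity}: ${total:.2f}")
```

Because each job is rounded up separately, the penalty from coarse billing grows with the number of short jobs, not with total compute time.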

How to compare prices across providers

Raw instance prices are misleading. An AWS p5.48xlarge costs $98.32/hr, but that's for 8 GPUs. A Lambda H100 costs $2.49/hr for 1 GPU. You can't compare $98.32 to $2.49 — you need to normalize.

GIS solves this with the normalized section:

  • cost_per_gpu_hour — the price per individual GPU per hour. AWS: $98.32 ÷ 8 = $12.29/GPU/hr. Lambda: $2.49/GPU/hr.
  • cost_per_tflop_hour — price per TFLOP of compute. Accounts for different GPU performance levels.
  • vram_per_dollar — GB of VRAM per dollar per hour. Useful for memory-bound workloads.

Always compare cost_per_gpu_hour as the baseline. Use cost_per_tflop_hour when comparing different GPU models (e.g., A100 vs H100).
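
As a minimal sketch of how these three metrics could be derived, the Python below normalizes the two offerings discussed above. The formulas for cost_per_tflop_hour and vram_per_dollar are inferred from the field descriptions, not taken from a GIS specification. The Offering class, the 989 TFLOPS throughput figure, and the 80 GB of VRAM per H100 are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Offering:
    name: str
    instance_price_per_hour: float  # raw price for the whole instance, USD
    gpu_count: int
    tflops_per_gpu: float           # assumed peak throughput per GPU
    vram_gb_per_gpu: float          # assumed memory per GPU


def normalize(o: Offering) -> dict:
    """Compute per-GPU comparison metrics from a raw instance price."""
    cost_per_gpu_hour = o.instance_price_per_hour / o.gpu_count
    return {
        "cost_per_gpu_hour": round(cost_per_gpu_hour, 2),
        "cost_per_tflop_hour": round(cost_per_gpu_hour / o.tflops_per_gpu, 5),
        "vram_per_dollar": round(o.vram_gb_per_gpu / cost_per_gpu_hour, 2),
    }


# Prices come from the text; TFLOPS and VRAM values are assumptions.
offerings = [
    Offering("AWS p5.48xlarge", 98.32, 8, 989.0, 80.0),
    Offering("Lambda H100", 2.49, 1, 989.0, 80.0),
]

for o in offerings:
    print(o.name, normalize(o))
```

Dividing by gpu_count first is the step that makes an 8-GPU instance comparable to a single-GPU one. The other two metrics then build on the per-GPU price.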

How it appears in GIS

{
  "pricing": {
    "currency": "USD",
    "billing_unit": "per-hour",
    "billing_granularity": "per-second",
    "on_demand": 2.49,
    "spot": null,
    "reserved_1yr": null,
    "reserved_3yr": null
  },
  "normalized": {
    "cost_per_gpu_hour": 2.49,
    "cost_per_tflop_hour": 0.00252,
    "vram_per_dollar": 32.13
  }
}

The pricing section captures the raw price structure. The normalized section provides computed comparison metrics. Together, they give you everything needed to compare any two GPU offerings on equal footing.
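
A short Python sketch of how records shaped like the one above might be compared. It ranks two entries by normalized.cost_per_gpu_hour instead of by the raw on_demand price. The "label" field and the trimmed-down second record are hypothetical and added only for illustration. Only the pricing and normalized keys come from the example above.

```python
import json

# Two records in the shape shown above; "label" is a hypothetical field.
records_json = """
[
  {"label": "8-GPU p5.48xlarge",
   "pricing": {"currency": "USD", "on_demand": 98.32, "spot": null},
   "normalized": {"cost_per_gpu_hour": 12.29}},
  {"label": "single-GPU H100",
   "pricing": {"currency": "USD", "on_demand": 2.49, "spot": null},
   "normalized": {"cost_per_gpu_hour": 2.49}}
]
"""

records = json.loads(records_json)

# Rank by the normalized per-GPU price, not the raw instance price.
ranked = sorted(records, key=lambda r: r["normalized"]["cost_per_gpu_hour"])

for r in ranked:
    spot = r["pricing"]["spot"]  # JSON null becomes None: no spot price listed
    spot_text = f"${spot:.2f}" if spot is not None else "n/a"
    print(f'{r["label"]}: ${r["normalized"]["cost_per_gpu_hour"]:.2f}/GPU/hr '
          f'(instance ${r["pricing"]["on_demand"]:.2f}/hr, spot {spot_text})')
```

Null fields such as spot need an explicit check before any arithmetic or formatting, since a missing price is not the same as a price of zero.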

Key takeaways
  • Three pricing models: on-demand (flexible), spot (cheap, interruptible), reserved (committed, discounted)
  • Billing granularity varies (per-second, per-minute, per-hour), and it matters for short jobs
  • Always compare cost_per_gpu_hour, not raw instance price (multi-GPU instances are misleading)
  • Hidden costs: egress, storage, networking can add 10-30% to your bill
  • GIS normalizes all pricing into comparable metrics