Foundational · 5 min read · gpu.vram_gb

What is VRAM?

Video RAM is the GPU's own dedicated memory. It determines how large a model you can load, how big your batch sizes can be, and ultimately what workloads a GPU can handle.

What it is

VRAM (Video RAM) is the GPU's own dedicated memory. It sits physically on the GPU board, connected to the GPU chip via a very fast bus. Think of it as the GPU's personal workspace — everything the GPU is actively working on must fit in VRAM.

This is separate from your system RAM (the compute.ram_gb in GIS). System RAM is connected to the CPU. VRAM is connected to the GPU. Data must be copied from system RAM to VRAM before the GPU can process it.

The critical difference is bandwidth. An H100's VRAM delivers 3,350 GB/s. System DDR5 RAM delivers ~50 GB/s. That's a 67× difference. The GPU needs this speed because it processes data so fast that slower memory would starve it.

Why VRAM is the #1 spec for ML

For machine learning, VRAM is often the single most important specification. Here's why:

  • Model must fit in VRAM. A 7B parameter model in FP16 needs ~14 GB. A 70B model needs ~140 GB. If it doesn't fit, it doesn't run (without complex techniques like model parallelism).
  • Batch size is limited by VRAM. Larger batches = faster training. But each batch element consumes VRAM for activations and gradients.
  • Optimizer states live in VRAM. Adam keeps two extra state tensors for every parameter. A 7B model needs ~14 GB for weights + ~28 GB for optimizer states = ~42 GB minimum for training, before counting gradients and activations (see the sketch after the table below).

VRAM requirements by model size (FP16 inference)

  • 7B parameters → ~14 GB VRAM
  • 13B parameters → ~26 GB VRAM
  • 70B parameters → ~140 GB VRAM (needs 2× 80GB GPUs)
  • 405B parameters → ~810 GB VRAM (needs 11× 80GB GPUs)
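
These figures follow directly from the rules of thumb above: 2 bytes per parameter in FP16, plus two Adam state tensors of the same size as the weights for training. Here is a minimal Python sketch of that arithmetic; the function names are illustrative, and gradients and activations are not counted:

# Rule-of-thumb VRAM estimates: 2 bytes/parameter for FP16 weights,
# plus ~2x the weight size for Adam optimizer state when training.
# Gradients and activations are excluded, so these are lower bounds.

def inference_vram_gb(params_billion, bytes_per_param=2.0):
    # Weights only: parameters x bytes per parameter, expressed in GB
    return params_billion * bytes_per_param

def training_vram_gb(params_billion, bytes_per_param=2.0):
    # Weights + two Adam state tensors of the same size as the weights
    weights_gb = params_billion * bytes_per_param
    return weights_gb + 2 * weights_gb

for size in (7, 13, 70, 405):
    print(f"{size}B: ~{inference_vram_gb(size):.0f} GB inference, "
          f"~{training_vram_gb(size):.0f} GB training (minimum)")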

Types of VRAM

Not all VRAM is equal. The two main types in cloud GPUs:

  • GDDR6/GDDR6X — Used in consumer and some professional GPUs. Cheaper, lower bandwidth. The RTX 4090 has 24 GB GDDR6X at 1,008 GB/s.
  • HBM (High Bandwidth Memory) — Used in data center GPUs. Stacked memory chips on the GPU package. Much higher bandwidth. The H100 has 80 GB HBM3 at 3,350 GB/s.

HBM is a major reason data center GPUs cost 10-20× more than consumer GPUs: the memory technology alone accounts for a significant share of the total cost.

See VRAM Types: GDDR6 to HBM3e for the full breakdown.

How it appears in GIS

VRAM is captured in the gpu.vram_gb field:

{
  "gpu": {
    "model": "nvidia-h100",
    "vram_gb": 80,
    "memory_bandwidth_tbps": 3.35
  }
}

The vram_gb field is the total VRAM per GPU in gigabytes. For multi-GPU instances, this is per GPU — an 8×H100 instance has 80 GB per GPU, 640 GB total.

The memory_bandwidth_tbps field captures how fast that VRAM can be read/written. Both matter: capacity determines what fits, bandwidth determines how fast it runs.
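
To make the per-GPU versus total distinction concrete, here is a small Python sketch that reads the gpu block shown above; the gpu_count value and the fit check are hypothetical examples for illustration, not GIS fields:

import json

gis = json.loads("""
{
  "gpu": {
    "model": "nvidia-h100",
    "vram_gb": 80,
    "memory_bandwidth_tbps": 3.35
  }
}
""")

gpu_count = 8  # hypothetical: an 8xH100 instance
total_vram_gb = gis["gpu"]["vram_gb"] * gpu_count  # 80 GB per GPU -> 640 GB total

model_needs_gb = 140  # e.g. a 70B model in FP16
print(f"Total VRAM: {total_vram_gb} GB; fits a 70B FP16 model: {model_needs_gb <= total_vram_gb}")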

The normalized metric vram_per_dollar tells you how much VRAM you get per dollar per hour — useful for comparing value across providers.
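
The exact formula isn't spelled out here, but a reasonable reading is total VRAM divided by the hourly price. A sketch under that assumption, with a hypothetical price:

def vram_per_dollar(vram_gb, gpu_count, price_per_hour_usd):
    # Assumed formula: GB of VRAM per dollar of hourly cost
    return (vram_gb * gpu_count) / price_per_hour_usd

# Hypothetical example: 8x 80 GB GPUs at $25/hour
print(vram_per_dollar(80, 8, 25.0))  # -> 25.6 GB per dollar-hour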

Key takeaways
  • VRAM is the GPU's own memory — separate from system RAM
  • It determines the maximum model size you can load
  • More VRAM = larger models, bigger batches, faster training
  • Current range: 16 GB (entry) to 80 GB (data center) to 192 GB (B200)
  • In GIS: gpu.vram_gb — one of the most important fields for ML workloads