LLM VRAM / RAM Calculator

Estimate memory requirements for running large language models.

Preset

Model Architecture

Parameters (B)Layers (L)d_modelAttn heads (H)

KV heads (H_kv)

Context

Custom tokens

Quantization

KV Cache

Total Memory

6.13 GB

Q4_K_M weights · FP16 KV · 4K ctx

Breakdown

Weights4.84 GB

Overhead754.00 MB

KV Cache536.87 MB

GPU Fit · 6.1 GB needed

RTX 3060 ✓

RTX 3090 ✓

RTX 4090 ✓

A100 40G ✓

A100 80G ✓

H100 80G ✓

GGUF k-quants use effective bits. KV ratio accounts for GQA/MQA. Overhead = 512 MB + 5% proportional.