LLM VRAM / RAM Calculator
Estimate memory requirements for running large language models.
Inputs: Preset · Model Architecture · Context · Quantization · KV Cache
Total Memory: 6.13 GB
Q4_K_M weights · FP16 KV cache · 4K context
Breakdown
Weights: 4.84 GB
Overhead: 754.00 MB
KV Cache: 536.87 MB
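The three breakdown rows sum to the total. A quick arithmetic check, assuming the page reports decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes); under that assumption the displayed 536.87 MB is consistent with a KV cache of exactly 2^29 bytes (512 MiB), which is also an assumption here:

```python
# Assumption: decimal units, 1 MB = 1e6 bytes and 1 GB = 1e9 bytes.
MB = 10**6
GB = 10**9

breakdown_bytes = {
    "Weights": 4.84 * GB,
    "Overhead": 754.00 * MB,
    "KV Cache": 2**29,  # assumed exact value behind the displayed 536.87 MB
}

# Total memory is simply the sum of the three components.
total_bytes = sum(breakdown_bytes.values())

print(f"KV Cache: {breakdown_bytes['KV Cache'] / MB:.2f} MB")  # 536.87 MB
print(f"Total:    {total_bytes / GB:.2f} GB")                  # 6.13 GB
print(f"Total:    {total_bytes / 2**30:.2f} GiB")              # 5.71 GiB
```

If the tool does use decimal units, the same 6.13 GB total is about 5.71 GiB in binary units.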
GPU Fit · 6.1 GB needed
RTX 3060 ✓
RTX 3090 ✓
RTX 4090 ✓
A100 40G ✓
A100 80G ✓
H100 80G ✓
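Each ✓ means the estimated total fits in that card's memory. A minimal sketch of the check; the capacity table is supplied here for illustration (nominal VRAM, with the RTX 3060 assumed to be the 12 GB model), `gpu_fit` is a hypothetical name, and any safety margin the tool may apply is not modeled:

```python
# Hypothetical capacity table in GB (nominal VRAM, not taken from the tool).
GPU_VRAM_GB = {
    "RTX 3060": 12,
    "RTX 3090": 24,
    "RTX 4090": 24,
    "A100 40G": 40,
    "A100 80G": 80,
    "H100 80G": 80,
}


def gpu_fit(needed_gb: float, vram: dict[str, float] = GPU_VRAM_GB) -> dict[str, bool]:
    """Return True for every GPU whose VRAM covers the estimated total."""
    return {name: needed_gb <= cap for name, cap in vram.items()}


for name, ok in gpu_fit(6.13).items():
    print(f"{name} {'✓' if ok else '✗'}")
```

With 6.13 GB needed, every card in the table passes; a 13 GB estimate would fail only on the 12 GB RTX 3060.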
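GGUF k-quants are sized by their effective bits per weight rather than their nominal bit width. The KV ratio (KV heads ÷ attention heads) accounts for GQA/MQA. Overhead = 512 MB + 5% of the weight size.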
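The three sizing rules in the note can be combined into a small estimator. This is a sketch, not the tool's source: all function names are hypothetical, the model shape (32 layers, hidden size 4096, 32 attention heads, 8 KV heads, 4096-token context) is an assumed Llama-3-8B-like configuration chosen because it reproduces the 536.87 MB KV figure, and the 5% overhead is assumed to apply to the weight size (512 MB + 5% × 4,840 MB = 754 MB, matching the breakdown). The parameter count and effective bits are hypothetical values picked only to yield 4.84 GB, since the page does not show them.

```python
# A sketch of the three sizing rules; all names here are hypothetical.
# Assumption: decimal units (1 MB = 1e6 bytes), matching the breakdown.
MB = 10**6


def weight_bytes(n_params: float, effective_bpw: float) -> float:
    """Weights = parameters x effective bits per weight / 8.

    For GGUF k-quants the effective bpw averages in block scales and any
    higher-precision tensors, so it exceeds the nominal bit width.
    """
    return n_params * effective_bpw / 8


def kv_cache_bytes(n_layers: int, hidden_size: int, n_heads: int,
                   n_kv_heads: int, ctx_len: int, bytes_per_elem: int = 2) -> float:
    """K and V tensors for every layer and every cached token.

    kv_ratio = n_kv_heads / n_heads is 1 for standard multi-head attention,
    below 1 for GQA, and 1 / n_heads for MQA. Assumes head_dim equals
    hidden_size / n_heads. bytes_per_elem=2 corresponds to FP16.
    """
    kv_ratio = n_kv_heads / n_heads
    return 2 * n_layers * ctx_len * hidden_size * kv_ratio * bytes_per_elem


def overhead_bytes(weights: float) -> float:
    """Fixed 512 MB plus 5%, assumed here to be 5% of the weight size."""
    return 512 * MB + 0.05 * weights


# Hypothetical inputs: 8.0e9 params at 4.84 effective bpw (chosen to give 4.84 GB),
# and an assumed Llama-3-8B-like shape with an FP16 KV cache at 4096 tokens.
w = weight_bytes(8.0e9, 4.84)
kv = kv_cache_bytes(n_layers=32, hidden_size=4096, n_heads=32, n_kv_heads=8, ctx_len=4096)
oh = overhead_bytes(w)

print(f"Weights  {w / 1e9:.2f} GB")   # 4.84 GB
print(f"Overhead {oh / MB:.2f} MB")   # 754.00 MB
print(f"KV Cache {kv / MB:.2f} MB")   # 536.87 MB
```

With 8 KV heads out of 32, the KV ratio is 0.25, so the cache is a quarter of what full multi-head attention would need at the same context length.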