LLM VRAM / RAM Calculator

Estimate memory requirements for running large language models.

Inputs: Preset · Model Architecture · Context · Quantization · KV Cache

Total Memory

6.13 GB

Q4_K_M weights · FP16 KV · 4K ctx

Breakdown

Weights: 4.84 GB
Overhead: 754.00 MB
KV Cache: 536.87 MB

GPU Fit · 6.1 GB needed

RTX 3060
RTX 3090
RTX 4090
A100 40G
A100 80G
H100 80G

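Notes: GGUF k-quant weights are sized using effective bits per weight rather than the nominal bit width. The KV cache estimate accounts for GQA/MQA, where the model has fewer KV heads than attention heads. Overhead = 512 MB fixed + 5% proportional; the breakdown above is consistent with the 5% applying to the weights (5% of 4.84 GB = 242 MB, and 512 MB + 242 MB = 754 MB).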
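The estimate can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual code: the function name estimate_memory and all of its parameters are hypothetical, and the example values (8 billion parameters, 4.84 effective bits per weight, 32 layers, 8 KV heads, head dimension 128) are assumptions chosen because they reproduce the figures shown above. It also assumes decimal units (1 GB = 10^9 bytes) and that the 5% overhead term is taken on the weights only.

```python
def estimate_memory(n_params, bits_per_weight, n_layers, n_kv_heads,
                    head_dim, ctx_len, kv_bytes_per_elem=2.0):
    """Return an estimated memory breakdown in bytes (illustrative sketch)."""
    # Weights: parameter count times effective bits per weight, in bytes.
    weights = n_params * bits_per_weight / 8

    # KV cache: one K and one V tensor per layer, sized by the number of
    # KV heads (not attention heads), which is how GQA/MQA shrink it.
    kv_cache = 2 * n_layers * n_kv_heads * head_dim * ctx_len * kv_bytes_per_elem

    # Overhead: 512 MB fixed plus 5% of the weights (assumed interpretation
    # of "5% proportional").
    overhead = 512e6 + 0.05 * weights

    return {
        "weights": weights,
        "kv_cache": kv_cache,
        "overhead": overhead,
        "total": weights + kv_cache + overhead,
    }


# Assumed example configuration: Q4_K_M weights, FP16 KV cache, 4K context.
est = estimate_memory(n_params=8e9, bits_per_weight=4.84, n_layers=32,
                      n_kv_heads=8, head_dim=128, ctx_len=4096)
for name, value in est.items():
    print(f"{name:>9}: {value / 1e9:.2f} GB")
```

Under these assumed inputs the sketch gives roughly 4.84 GB of weights, 0.54 GB of KV cache, 0.75 GB of overhead, and a total of about 6.13 GB, matching the breakdown above. Doubling ctx_len doubles only the KV cache term, since weights and overhead do not depend on context length in this model.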

LLM VRAM / RAM Calculator — Tools — Syed Khalid