Local LLMs, ranked by VRAM fit and quality

Each score is the model’s result on its best public benchmark for that skill. We name the benchmark on every row and never blend scores into a single index. Scores compare cleanly when the benchmark matches; where it differs, the name tells you why. Pick a VRAM size to see only the models that fit.

Frontier models (Claude, GPT, Gemini, and the largest open models) are shown separately below as a ceiling. They’re measured on harder, contamination-resistant tests, so their numbers aren’t directly comparable to the local scores.

Showing models that fit in 12 GB of VRAM.

# · Model · Parameters · Size (Q4) · Released · General (best benchmark) · Fits 12 GB · Cost / run

1 · Gemma 4 12B · 12.0B · 7.3 GB · May 2026 · 79 GPQA · roomy · soon
2 · Gemma 2 9B · 9.2B · 5.6 GB · Jun 2024 · 71 MMLU · roomy · soon
3 · Mistral NeMo 12B Instruct · 12B · 7.3 GB · Jul 2024 · 68 MMLU · roomy · soon
4 · MiniCPM-SALA · 9.5B · 5.8 GB · Feb 2026 · 67 MMLU-Pro · roomy · soon
5 · DeepSeek R1 Distill Qwen 14B · 14.8B · 9.0 GB · Jan 2025 · 59 GPQA · roomy · soon
6 · Gemma 4 E4B · 8B · 4.8 GB · Apr 2026 · 59 GPQA · roomy · soon
7 · Granite 3.3 8B Base · 8.2B · 5.0 GB · Apr 2025 · 58 Arena Hard · roomy · soon
8 · Granite 3.3 8B Instruct · 8B · 4.8 GB · Apr 2025 · 58 Arena Hard · roomy · soon
9 · Phi-4 14B · 14.7B · 8.9 GB · Dec 2024 · 56 GPQA · roomy · soon
10 · Llama 3.1 Nemotron Nano 8B V1 · 8B · 4.8 GB · Mar 2025 · 54 GPQA · roomy · soon
11 · DeepSeek R1 Distill Qwen 7B · 7.6B · 4.6 GB · Jan 2025 · 49 GPQA · roomy · soon
12 · DeepSeek R1 Distill Llama 8B · 8.0B · 4.9 GB · Jan 2025 · 49 GPQA · roomy · soon
13 · Gemma 4 E2B · 5.1B · 3.1 GB · Apr 2026 · 43 GPQA · roomy · soon
14 · Gemma 3 12B · 12B · 7.3 GB · Mar 2025 · 41 GPQA · roomy · soon
15 · DeepSeek R1 Distill Qwen 1.5B · 1.8B · 1.1 GB · Jan 2025 · 34 GPQA · roomy · soon
16 · Llama 3.2 11B Instruct · 10.6B · 6.4 GB · Sep 2024 · 33 GPQA · roomy · soon
17 · Llama 3.2 3B Instruct · 3.2B · 1.9 GB · Sep 2024 · 33 GPQA · roomy · soon
18 · Gemma 3 4B · 4B · 2.4 GB · Mar 2025 · 31 GPQA · roomy · soon
19 · IBM Granite 4.0 Tiny Preview · 7B · 4.2 GB · May 2025 · 27 Arena Hard · roomy · soon
20 · Gemma 3n E2B Instructed LiteRT (Preview) · 1.9B · 1.2 GB · May 2025 · 25 GPQA · roomy · soon
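Because scores only compare cleanly when the benchmark matches, one way to read the table is to rank models within each benchmark rather than across all of them. The sketch below is only an illustration of that idea, not how the page itself ranks: it uses a handful of rows copied from the table, and the function name `rank_within_benchmark` is hypothetical.

```python
from collections import defaultdict

# A few rows copied from the table above: (model, score, benchmark).
ROWS = [
    ("Gemma 4 12B", 79, "GPQA"),
    ("Gemma 2 9B", 71, "MMLU"),
    ("Mistral NeMo 12B Instruct", 68, "MMLU"),
    ("DeepSeek R1 Distill Qwen 14B", 59, "GPQA"),
    ("Granite 3.3 8B Instruct", 58, "Arena Hard"),
    ("Phi-4 14B", 56, "GPQA"),
]


def rank_within_benchmark(rows):
    """Group rows by benchmark, then sort each group by score, highest first.

    Hypothetical helper: it only ever compares scores that share a benchmark.
    """
    groups = defaultdict(list)
    for model, score, benchmark in rows:
        groups[benchmark].append((model, score))
    return {
        benchmark: sorted(models, key=lambda m: m[1], reverse=True)
        for benchmark, models in groups.items()
    }


if __name__ == "__main__":
    for benchmark, models in rank_within_benchmark(ROWS).items():
        print(benchmark)
        for model, score in models:
            print(f"  {score:>3}  {model}")
```

Read this way, Gemma 2 9B (71 MMLU) and Gemma 4 12B (79 GPQA) sit in separate lists and are never ranked against each other on raw numbers.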

Frontier — the paid ceiling

Cloud-only flagships, for reference. Their general numbers come from harder, contamination-resistant tests (named below), so they’re not directly comparable to the local scores above. They show the ceiling, and what you trade in privacy and cost to reach it.

Claude Opus 4.8 · subscription · Cloud-only, rented per token · 94 GPQA · Rented
Gemini 3 Pro · subscription · Cloud-only, rented per token · 92 GPQA · Rented
Kimi K2.6 · open, cloud · Cloud-only, rented per token · 91 GPQA · Rented
GPT-5.1 · subscription · Cloud-only, rented per token · 88 GPQA · Rented

Each score is a single named benchmark (not a blend) — open a model to see it and its source. Frontier models are tested on harder benchmarks, so their numbers aren’t 1:1 comparable with the local scores. Fit is computed for the Q4 build against the selected VRAM; colour shows whether it sits comfortably or tightly in that memory.
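As a rough illustration of the fit check, the sketch below estimates a Q4 file size from the parameter count and compares it with the selected VRAM. Two things here are assumptions, not taken from the page: the figure of about 4.85 bits per weight, which is inferred from the listed sizes (for example 12.0B at 7.3 GB), and the 75% threshold that separates a comfortable fit from a tight one. The labels "tight" and "does not fit" are also hypothetical; only "roomy" appears in the table.

```python
# Assumption: roughly 4.85 bits per weight for a Q4 build. This is inferred
# from the sizes listed in the table, not stated by the page.
Q4_BITS_PER_WEIGHT = 4.85

# Hypothetical threshold: "roomy" means the weights use at most 75% of VRAM,
# leaving the rest for the KV cache and runtime overhead.
ROOMY_FRACTION = 0.75


def q4_size_gb(params_billion: float) -> float:
    """Estimate Q4 weight size in GB from a parameter count in billions."""
    return params_billion * Q4_BITS_PER_WEIGHT / 8


def fit_label(params_billion: float, vram_gb: float) -> str:
    """Classify how the estimated Q4 weights sit in the given VRAM."""
    size = q4_size_gb(params_billion)
    if size > vram_gb:
        return "does not fit"
    if size <= vram_gb * ROOMY_FRACTION:
        return "roomy"
    return "tight"


if __name__ == "__main__":
    for name, params in [("Gemma 4 12B", 12.0), ("DeepSeek R1 Distill Qwen 14B", 14.8)]:
        for vram in (8, 12):
            print(f"{name} on {vram} GB: {q4_size_gb(params):.1f} GB, {fit_label(params, vram)}")
```

The estimate covers weights only. Context length and the runtime add memory on top, so a model this sketch labels "tight" may not load in practice.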