Local LLMs, ranked by VRAM fit and quality
Each score is the model’s result on its best public benchmark for that skill, and every row names that benchmark (there is no blended index). Scores compare cleanly when the benchmark matches; where it differs, the name tells you why. Pick a VRAM size to see only the models that fit.
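As a rough illustration of why the benchmark name matters, the sketch below ranks models only against others scored on the same benchmark. The row layout, the model names, the benchmark names, and the scores are all made-up placeholders, not data from this page.

```python
from collections import defaultdict


def rank_within_benchmark(rows):
    """Group (model, benchmark, score) rows by benchmark, highest score first.

    Scores are only ordered against other scores from the same benchmark,
    so no ranking is ever produced across different tests.
    """
    groups = defaultdict(list)
    for model, benchmark, score in rows:
        groups[benchmark].append((model, score))
    return {
        benchmark: sorted(entries, key=lambda entry: entry[1], reverse=True)
        for benchmark, entries in groups.items()
    }


# Hypothetical rows: names and numbers are placeholders for illustration only.
rows = [
    ("model-a", "benchmark-x", 64.0),
    ("model-b", "benchmark-x", 71.0),
    ("model-c", "benchmark-y", 58.0),
]
ranked = rank_within_benchmark(rows)
```

Here `model-a` and `model-b` can be ordered against each other, while `model-c` sits in its own group because it was scored on a different test.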
Frontier models (Claude, GPT, Gemini, the largest open models) are shown separately below as a ceiling. They’re measured on harder, contamination-resistant tests, so their numbers aren’t directly comparable to the local ones above.
Cloud-only flagships, for reference. Their general numbers come from harder, contamination-resistant tests (named below), so they’re not directly comparable to the local scores above. They show the ceiling, and what you trade in privacy and cost to reach it.
Each score is a single named benchmark (not a blend) — open a model to see it and its source. Frontier models are tested on harder benchmarks, so their numbers aren’t 1:1 comparable with the local scores. Fit is computed for the Q4 build against the selected VRAM; colour shows whether it sits comfortably or tightly in that memory.
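For a sense of how a fit check like this could work, here is a minimal sketch. It is not the page's actual formula: the bits-per-weight figure, the runtime overhead, and the threshold separating a comfortable fit from a tight one are all assumptions chosen for illustration.

```python
# All constants are illustrative assumptions, not the values this page uses.
BITS_PER_WEIGHT_Q4 = 4.5   # assumed: 4-bit weights plus per-block scale overhead
RUNTIME_OVERHEAD_GB = 1.5  # assumed: KV cache, activations, runtime buffers
COMFORTABLE_RATIO = 0.85   # assumed: using at most 85% of VRAM counts as comfortable


def q4_size_gb(params_billion: float) -> float:
    """Approximate weight size of a Q4 build in GB (1 GB = 1e9 bytes)."""
    return params_billion * BITS_PER_WEIGHT_Q4 / 8


def classify_fit(params_billion: float, vram_gb: float) -> str:
    """Return 'comfortable', 'tight', or 'does not fit' for the selected VRAM."""
    needed_gb = q4_size_gb(params_billion) + RUNTIME_OVERHEAD_GB
    if needed_gb > vram_gb:
        return "does not fit"
    if needed_gb <= vram_gb * COMFORTABLE_RATIO:
        return "comfortable"
    return "tight"


print(classify_fit(7, 8))    # 7B model on an 8 GB card
print(classify_fit(13, 10))  # 13B model on a 10 GB card
print(classify_fit(13, 8))   # 13B model on an 8 GB card
```

Under these assumptions a 7B model needs about 5.4 GB and a 13B model about 8.8 GB, which is why the same model can move between categories as the selected VRAM changes. Real memory use also depends on context length and the runtime, so the overhead term is only a stand-in.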