LLM reviews · 20 models

Large language models, compared

Every major LLM that working professionals are evaluating: Claude (Opus, Sonnet, Haiku), GPT-5 / GPT-4o, Gemini 2.5, Llama 4, Mistral Large 2, Grok 4. Each model gets a profession-aware review.

How we evaluate LLMs

Most LLM comparisons are abstract: "GPT-5 scored 87.4 on benchmark X." That tells working professionals very little about whether a model fits their daily work. Our LLM reviews focus on professional fit: what kind of work each model handles best, where it falls short, and which profession pages on this site recommend it.

We re-review active LLMs every 60 days (vs. 90 days for general AI tools) because model behavior changes faster. A major version bump triggers an immediate re-review.
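
As a rough illustration of the cadence rule, here is a minimal Python sketch. The 60-day and 90-day intervals and the immediate re-review on a major version bump come from the text above; the function name, the category keys, and the `today` parameter are hypothetical and do not describe how this site actually schedules reviews.

```python
from datetime import date, timedelta
from typing import Optional

# Intervals stated in the text; the dictionary keys are hypothetical labels.
REVIEW_INTERVAL_DAYS = {"llm": 60, "general": 90}


def next_review_date(
    last_review: date,
    kind: str,
    major_version_bump: bool = False,
    today: Optional[date] = None,
) -> date:
    """Return the date on which the next re-review is due."""
    if major_version_bump:
        # A major version bump triggers an immediate re-review.
        return today if today is not None else date.today()
    # Otherwise the next review falls one full interval after the last one.
    return last_review + timedelta(days=REVIEW_INTERVAL_DAYS[kind])


print(next_review_date(date(2025, 1, 1), "llm"))      # 2025-03-02
print(next_review_date(date(2025, 1, 1), "general"))  # 2025-04-01
```

Passing `today` explicitly keeps the function deterministic, which makes the immediate re-review case easy to check.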

Comparisons coming in Phase 2

Common comparisons get their own pages: /compare/claude-opus-4-7-vs-gpt-5/, /compare/claude-opus-vs-claude-haiku/, /compare/gemini-2-5-pro-vs-claude-opus-4-7/. Both within-vendor and cross-vendor comparisons live under one flat /compare/ URL pattern so they match user search queries directly.
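
The flat URL pattern can be sketched in a few lines of Python. This is an assumption about how such slugs might be generated, inferred only from the example URLs above; `slugify` and `compare_url` are hypothetical helper names, not part of this site's codebase.

```python
import re


def slugify(name: str) -> str:
    """Lowercase a model name and collapse every non-alphanumeric run into one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def compare_url(model_a: str, model_b: str) -> str:
    """Build a flat /compare/ URL; the same pattern serves within-vendor and cross-vendor pairs."""
    return f"/compare/{slugify(model_a)}-vs-{slugify(model_b)}/"


print(compare_url("Claude Opus 4.7", "GPT-5"))
# /compare/claude-opus-4-7-vs-gpt-5/
print(compare_url("Gemini 2.5 Pro", "Claude Opus 4.7"))
# /compare/gemini-2-5-pro-vs-claude-opus-4-7/
```

The sketch keeps the models in the order given rather than sorting them, since the example URLs above do not follow an alphabetical order.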