How we evaluate LLMs
Most LLM comparisons are abstract: "GPT-5 scored 87.4 on benchmark X." That tells working professionals very little about whether a model fits their daily work. Our LLM reviews focus on professional fit: what kind of work each model handles best, where it falls short, and which profession pages on this site recommend it.
We re-review active LLMs every 60 days (vs. 90 days for general AI tools) because model behavior changes faster. A major version bump triggers an immediate re-review.
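As a rough illustration of the cadence rule above, here is a minimal Python sketch. The function name, parameters, and constants are hypothetical and chosen for illustration only; they are not taken from our actual tooling. The one assumption beyond the stated rule is that "immediate" means the review is due today.

```python
from datetime import date, timedelta
from typing import Optional

# Cadences stated in the text above; the constant names are illustrative.
LLM_CADENCE_DAYS = 60
GENERAL_TOOL_CADENCE_DAYS = 90


def next_review_date(
    last_review: date,
    is_llm: bool,
    major_version_bump: bool = False,
    today: Optional[date] = None,
) -> date:
    """Return the date on which the next re-review is due (hypothetical helper)."""
    if today is None:
        today = date.today()
    # A major version bump overrides the regular cadence:
    # the review is due immediately (assumed here to mean "today").
    if major_version_bump:
        return today
    days = LLM_CADENCE_DAYS if is_llm else GENERAL_TOOL_CADENCE_DAYS
    return last_review + timedelta(days=days)
```

For example, an LLM last reviewed on 1 January 2025 would be due again on 2 March 2025, while a general AI tool reviewed the same day would be due on 1 April 2025.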
Comparisons coming in Phase 2
Common comparisons will get their own pages, for example /compare/claude-opus-4-7-vs-gpt-5/, /compare/claude-opus-vs-claude-haiku/, and /compare/gemini-2-5-pro-vs-claude-opus-4-7/. Both within-vendor and cross-vendor comparisons live under one flat /compare/ URL pattern so that the URLs match user search queries directly.
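The flat URL pattern can be sketched as a small Python helper. This is an assumption-laden illustration, not our routing code: the names slugify and compare_path are hypothetical, the slug rule (lowercase, with every run of non-alphanumeric characters collapsed to a hyphen) is inferred from the example URLs, and the models are kept in the order given because no ordering rule is stated here.

```python
import re


def slugify(name: str) -> str:
    """Lowercase a model name and collapse non-alphanumeric runs to hyphens.

    Hypothetical rule inferred from the example URLs,
    e.g. "Gemini 2.5 Pro" becomes "gemini-2-5-pro".
    """
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def compare_path(model_a: str, model_b: str) -> str:
    """Build a flat /compare/ path; models stay in the order given."""
    return f"/compare/{slugify(model_a)}-vs-{slugify(model_b)}/"


print(compare_path("Claude Opus 4.7", "GPT-5"))
# /compare/claude-opus-4-7-vs-gpt-5/
```

Because the vendor never appears in the path, a within-vendor pair such as Claude Opus vs Claude Haiku and a cross-vendor pair go through the same function.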