Why monthly
AI tool capabilities change from month to month, so a static "best of 2026" listicle is stale by mid-year. The monthly benchmark dataset is our AI-resistance moat: AI Overviews can summarize our scores, but they cannot replace the commitment to rerun the benchmark every month.
What we measure
- Accuracy — does the output get the facts right?
- Completeness — does it address every part of the prompt?
- Tone — does it match the professional register the role needs?
- Cost — per-output token-budget cost across the eight tools
- Latency — time-to-first-token and total response time
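The two latency figures can be illustrated with a minimal Python sketch. It assumes the tool exposes its response as an iterable of text chunks; `measure_latency` and `fake_stream` are hypothetical names used only for illustration, not our actual harness, and the first chunk is treated as a stand-in for the first token.

```python
import time
from typing import Iterable, Iterator, Optional, Tuple


def measure_latency(chunks: Iterable[str]) -> Tuple[Optional[float], float, str]:
    """Consume a streamed response and time it.

    Returns (time_to_first_token, total_time, full_text), times in seconds.
    time_to_first_token is None if no non-empty chunk ever arrived.
    """
    start = time.perf_counter()
    first: Optional[float] = None
    parts = []
    for chunk in chunks:
        # Record the arrival of the first non-empty chunk only once.
        if first is None and chunk:
            first = time.perf_counter() - start
        parts.append(chunk)
    total = time.perf_counter() - start
    return first, total, "".join(parts)


def fake_stream() -> Iterator[str]:
    """Hypothetical stand-in for a tool's streaming API."""
    time.sleep(0.05)  # simulated delay before the first chunk
    yield "Hello"
    time.sleep(0.05)  # simulated delay before the final chunk
    yield ", world"


if __name__ == "__main__":
    ttft, total, text = measure_latency(fake_stream())
    print(f"time to first token: {ttft:.3f}s, total: {total:.3f}s, text: {text!r}")
```

The clock starts before the stream is consumed, so any delay before the first chunk counts toward both figures. A real run would replace `fake_stream` with each tool's own streaming client.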
Open dataset
Every monthly run is published as a CSV under CC-BY-4.0. Researchers, journalists, and other AI-evaluation publishers can cite our methodology and dataset with attribution. The full methodology document at /benchmarks/methodology/ is versioned, so every methodology change is recorded.
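As a rough sketch of how a reader might work with a monthly CSV, the Python below averages one score column per tool. The column names (`month`, `tool`, `accuracy`) and the sample rows are assumptions made up for illustration; the published files may use a different schema, so check the methodology document before relying on any of them.

```python
import csv
import io
from collections import defaultdict
from statistics import mean
from typing import Dict

# Placeholder rows with invented values; NOT real benchmark results.
# The column names are hypothetical as well.
SAMPLE_CSV = """month,tool,accuracy
2026-01,tool_a,4.0
2026-01,tool_b,3.0
2026-02,tool_a,5.0
2026-02,tool_b,3.5
"""


def mean_score_by_tool(csv_text: str, column: str) -> Dict[str, float]:
    """Average one numeric column per tool across all rows."""
    scores = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        scores[row["tool"]].append(float(row[column]))
    return {tool: mean(values) for tool, values in scores.items()}


if __name__ == "__main__":
    print(mean_score_by_tool(SAMPLE_CSV, "accuracy"))
```

To use a downloaded file instead of the inline sample, read its contents into a string and pass that as `csv_text`.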