AI tool benchmarks · monthly dataset

Why monthly

AI tool capabilities change from month to month, so a static "best of 2026" listicle is stale by mid-year. The monthly benchmark dataset is our AI-resistance moat: AI Overviews can summarize our scores, but they cannot replace the commitment to rerun the benchmark every month.

What we measure

Accuracy — does the output get the facts right?
Completeness — does it address every part of the prompt?
Tone — does it match the professional register the role needs?
Cost — per-output token-budget cost across the eight tools
Latency — time-to-first-token and total response time
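
To make the latency metric concrete, here is a minimal sketch of how time-to-first-token and total response time could be taken from a streamed response. It is not the benchmark's actual harness: the function name measure_latency, the injectable clock parameter, and the fake_stream stand-in are all assumptions made for illustration.

```python
import time
from typing import Callable, Iterable, Tuple


def measure_latency(
    chunks: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, float, str]:
    """Return (time_to_first_token, total_time, full_text) for a streamed reply.

    `chunks` is any iterable that yields text pieces as they arrive.
    `clock` is injectable so the timing logic can be tested deterministically.
    """
    start = clock()
    first_token_at = None
    parts = []
    for chunk in chunks:
        # Record the moment the first non-empty chunk arrives.
        if first_token_at is None and chunk:
            first_token_at = clock()
        parts.append(chunk)
    end = clock()
    if first_token_at is None:
        # Nothing was produced: treat time-to-first-token as the total time.
        first_token_at = end
    return first_token_at - start, end - start, "".join(parts)


def fake_stream():
    """Hypothetical stand-in for a tool's streaming response."""
    time.sleep(0.02)
    yield "Hello"
    time.sleep(0.01)
    yield ", world"


if __name__ == "__main__":
    ttft, total, text = measure_latency(fake_stream())
    print(f"ttft={ttft:.3f}s total={total:.3f}s text={text!r}")
```

Passing a lazy generator matters here: the work of producing the response starts only once iteration begins, so the start timestamp is taken before the first chunk is requested.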

Open dataset

Every monthly run is published as a CSV under CC-BY-4.0. Researchers, journalists, and other AI-evaluation publishers can cite our methodology and dataset with attribution. The full methodology document at /benchmarks/methodology/ records every methodology change as a new version.
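
As a rough illustration of working with a monthly CSV, the sketch below averages one metric per tool. The published column layout is not described here, so the column names (tool, prompt_id, accuracy, latency_total_s), the 0 to 1 accuracy scale, and the sample rows are all placeholders rather than real benchmark data.

```python
import csv
import io
from collections import defaultdict
from statistics import mean
from typing import Dict

# Placeholder rows with hypothetical column names; not real benchmark results.
SAMPLE_CSV = """tool,prompt_id,accuracy,latency_total_s
tool_a,p1,0.5,2.0
tool_a,p2,1.0,4.0
tool_b,p1,0.25,1.0
"""


def mean_by_tool(csv_text: str, column: str) -> Dict[str, float]:
    """Average a numeric column per tool from CSV text with a header row."""
    values = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        values[row["tool"]].append(float(row[column]))
    return {tool: mean(vals) for tool, vals in values.items()}


if __name__ == "__main__":
    print(mean_by_tool(SAMPLE_CSV, "accuracy"))
    print(mean_by_tool(SAMPLE_CSV, "latency_total_s"))
```

For a downloaded file, the same function can be fed the file's contents, for example the result of reading it with open(path, newline="", encoding="utf-8").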

Benchmark scores

Apr 2026
