Monthly benchmark dataset · BENCH

Benchmark scores

Starting in Month 11, we run 30 profession-specific tasks against the major AI tools every month and publish the scores. Open dataset · CC-BY-4.0 · CSV download per run.

Why monthly

AI tool capabilities change from month to month, so a static "best of 2026" listicle is stale by mid-year. The monthly benchmark dataset is our AI-resistance moat: AI Overviews can summarize our scores, but they cannot replace the commitment to rerun the benchmark every month.

What we measure

  • Accuracy — does the output get the facts right?
  • Completeness — does it address every part of the prompt?
  • Tone — does it match the professional register the role needs?
  • Cost — per-output token-budget cost across the eight tools
  • Latency — time-to-first-token and total response time
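As an illustration of the latency metric, here is a minimal sketch of how time-to-first-token and total response time could be taken from a streamed response. The function name `measure_latency`, the `clock` parameter, and the assumption that the request is sent when iteration over `chunks` begins are all hypothetical; this is not a description of the benchmark's actual harness.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def measure_latency(
    chunks: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[Optional[float], float]:
    """Return (time_to_first_token, total_time) in seconds for one streamed response.

    `chunks` is assumed to be a lazy stream of text pieces from a tool
    (hypothetical interface). time_to_first_token is None if no non-empty
    chunk ever arrives.
    """
    start = clock()
    first: Optional[float] = None
    for chunk in chunks:
        # The first non-empty chunk marks time-to-first-token.
        if first is None and chunk:
            first = clock() - start
    # Total response time is measured once the stream is exhausted.
    total = clock() - start
    return first, total
```

Passing the clock as a parameter keeps the function deterministic under test; in a real run the default `time.perf_counter` would be used.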

Open dataset

Every monthly run is published as a CSV under CC-BY-4.0. Researchers, journalists, and other AI-evaluation publishers can cite our methodology and dataset with attribution. The full methodology document at /benchmarks/methodology/ is versioned, so every methodology change is recorded.
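For readers who want to work with a downloaded run, the sketch below averages accuracy per tool using only the Python standard library. The column names `tool` and `accuracy` are assumptions made for illustration; check the methodology document for the actual CSV schema.

```python
import csv
from collections import defaultdict
from statistics import mean
from typing import Dict, TextIO


def mean_accuracy_by_tool(csv_file: TextIO) -> Dict[str, float]:
    """Average the accuracy score per tool for one monthly run.

    Assumes (hypothetically) one row per tool/task pair with a `tool`
    column and a numeric `accuracy` column.
    """
    scores = defaultdict(list)
    for row in csv.DictReader(csv_file):
        scores[row["tool"]].append(float(row["accuracy"]))
    return {tool: mean(values) for tool, values in scores.items()}
```

To use it on a file, open the download with `open(path, newline="")` as the `csv` module recommends and pass the file object in.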