Run · 2026-04 · methodology 1.0

April 2026 benchmark run

Profession-specific tasks across the major AI tools, run from 15 to 21 Apr 2026 against current public model versions. Each score is the mean of 3 runs per task. The dataset is open under CC-BY-4.0.
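As a rough illustration of the averaging described above, the sketch below takes the mean of 3 runs per task and then rolls task means up into one score per profession and tool. The record layout, the task and tool names, and the second roll-up step are assumptions made for this example; they are not taken from the published CSV or the methodology document.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-run records: (profession, task, tool, run, score).
# These rows are illustrative placeholders, not values from the dataset.
RUNS = [
    ("Lawyer", "task-a", "Tool X", 1, 8.0),
    ("Lawyer", "task-a", "Tool X", 2, 9.0),
    ("Lawyer", "task-a", "Tool X", 3, 8.5),
    ("Lawyer", "task-b", "Tool X", 1, 7.0),
    ("Lawyer", "task-b", "Tool X", 2, 7.5),
    ("Lawyer", "task-b", "Tool X", 3, 8.0),
]


def task_means(runs):
    """Mean score per (profession, task, tool) across its runs."""
    grouped = defaultdict(list)
    for profession, task, tool, _run, score in runs:
        grouped[(profession, task, tool)].append(score)
    return {key: mean(scores) for key, scores in grouped.items()}


def profession_means(per_task):
    """Assumed roll-up: average task means into one value per (profession, tool)."""
    grouped = defaultdict(list)
    for (profession, _task, tool), score in per_task.items():
        grouped[(profession, tool)].append(score)
    return {key: round(mean(scores), 1) for key, scores in grouped.items()}


per_task = task_means(RUNS)
per_profession = profession_means(per_task)
print(per_task)
print(per_profession)
```

Keeping the two steps separate means a task-level score and a profession-level score can be reported side by side without recomputing from raw runs.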


Scores by profession × tool

Profession          | Claude Opus 4.7 | Gemini 2.5 Pro | GPT-5 | Mistral Large 2 | Perplexity
Nurse               | 8.4             | 7.2            | 7.8   | 6.3             | 6.9
Accountant          | 8.2             | 7.2            | 7.6   | 6.7             | 6.3
Lawyer              | 8.5             | 6.6            | 7.8   | 6.4             | 7.4
Software Developer  | 8.5             | 7.1            | 7.9   | 6.2             | 6.8
Graphic Designer    | 7.2             | 8.5            | 7.8   | 6.7             | 6.3
Journalist          | 7.2             | 7.1            | 8.3   | 6.3             | 7.9
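One way to read the table is to ask which tool scored highest for each profession. The sketch below copies the values from the table into a plain dictionary and looks up the row maximum; the variable and function names are illustrative only and do not correspond to anything in the published dataset.

```python
# Column order matches the table header.
TOOLS = ["Claude Opus 4.7", "Gemini 2.5 Pro", "GPT-5", "Mistral Large 2", "Perplexity"]

# Scores copied from the table above, one row per profession.
SCORES = {
    "Nurse": [8.4, 7.2, 7.8, 6.3, 6.9],
    "Accountant": [8.2, 7.2, 7.6, 6.7, 6.3],
    "Lawyer": [8.5, 6.6, 7.8, 6.4, 7.4],
    "Software Developer": [8.5, 7.1, 7.9, 6.2, 6.8],
    "Graphic Designer": [7.2, 8.5, 7.8, 6.7, 6.3],
    "Journalist": [7.2, 7.1, 8.3, 6.3, 7.9],
}


def best_tool(profession):
    """Return (tool, score) for the highest-scoring tool in one profession's row."""
    row = SCORES[profession]
    top = max(row)
    return TOOLS[row.index(top)], top


for profession in SCORES:
    tool, score = best_tool(profession)
    print(f"{profession}: {tool} ({score})")
```

No row in this run has a tie for first place, so taking the first maximum is unambiguous here; a run with ties would need an explicit tie-breaking rule.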

What changed since last run

First publication run after methodology lock.
Tools tested: Claude Opus 4.7, GPT-5, Gemini 2.5 Pro, Perplexity Pro Search, Mistral Large 2.
Most improved since last run: Claude Opus 4.7 on legal-research tasks (long-context).
Top score this run: Claude Opus 4.7 on the lawyer/case-summary task (8.6/10).