MMLU-Pro: 2026 AI Leaderboard

MMLU's harder successor: 10 answer choices and more reasoning.

What it tests

MMLU-Pro is a successor to MMLU that expands each question to 10 answer choices (up from 4) and rewrites prompts to require multi-step reasoning rather than pure recall.
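One consequence of moving from 4 to 10 answer choices is a lower random-guess floor, so a given score reflects less luck. The arithmetic can be sketched as follows; the function name `chance_accuracy` is illustrative and not part of any benchmark tooling, and the sketch assumes uniform guessing over all options.

```python
def chance_accuracy(num_choices: int) -> float:
    """Expected accuracy (in %) of uniform random guessing on one multiple-choice question."""
    if num_choices < 1:
        raise ValueError("num_choices must be at least 1")
    return 100.0 / num_choices


# Base MMLU uses 4 options per question; MMLU-Pro uses 10.
print(chance_accuracy(4))   # 25.0
print(chance_accuracy(10))  # 10.0
```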

How it is scored

MMLU-Pro uses the same accuracy metric as MMLU (the percentage of questions answered correctly), applied to the harder, reformulated question bank. Frontier models score roughly 10-20 points lower here than on base MMLU.
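As a rough illustration of the accuracy metric, here is a minimal sketch. It assumes each prediction and each gold answer is a single option letter (A through J for 10 choices) and that matching ignores case and surrounding whitespace; the function name `accuracy_percent` and the sample data are hypothetical, not taken from the official evaluation harness.

```python
from typing import Sequence


def accuracy_percent(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Return the percentage of predictions that match the gold answers."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("cannot score an empty question set")
    # Assumption: answers are option letters; compare case-insensitively.
    correct = sum(
        p.strip().upper() == a.strip().upper()
        for p, a in zip(predictions, answers)
    )
    return 100.0 * correct / len(answers)


# Hypothetical data: 3 of 4 answers match.
preds = ["A", "J", "c", "B"]
gold = ["A", "J", "C", "D"]
print(accuracy_percent(preds, gold))  # 75.0
```

Real harnesses also have to extract the chosen letter from free-form model output, which this sketch leaves out.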

Why it matters

MMLU-Pro is worth watching because base MMLU has saturated. It still has headroom, which makes it a better discriminator for top-tier models in 2026.

Leaderboard (5 models)

Sorted by MMLU-Pro score. The Tier column shows the tool's overall AIToolTier rank, which blends this benchmark with pricing, features, and real-world usability.

| # | Family | Model | Tier | MMLU-Pro score |
|---|--------|-------|------|----------------|
| 1 | DeepSeek | DeepSeek V4-Pro* | A | 85% |
| 2 | Qwen (Alibaba) | Qwen3.5-397B MoE | A | 83.5% |
| 3 | GLM / Z.ai (Zhipu AI) | GLM-5.1 (744B MoE / 40B active) | A | 81.2% |
| 4 | Llama 4 (Meta) | Llama 4 Maverick (17B/400B MoE) | B | 80.5% |
| 5 | Nemotron (Nvidia) | Nemotron 3 Ultra (253B) | B | 79.8% |

* DeepSeek V4-Pro launched 2026-04-24; the score shown is the V3.2 baseline pending third-party V4 verification, which typically lands 3-7 days post-launch.

About MMLU-Pro

Creator: TIGER-Lab, 2024
Unit: % (max 100)
