MMLU-Pro: 2026 AI Leaderboard

MMLU's harder successor: 10 answer choices and more reasoning.

What it tests

MMLU-Pro is a successor to MMLU that expands each question to 10 answer choices (up from 4) and rewrites prompts to require multi-step reasoning rather than pure recall.

How it is scored

Scoring uses the same accuracy metric as MMLU (the percentage of questions answered correctly), applied to the harder, reformulated question bank. Frontier models score roughly 10-20 points lower here than on base MMLU.
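
As a rough illustration of the metric, the sketch below computes accuracy as a percentage and compares the random-guess baseline for 10 answer choices against 4. The function names and the toy answer key are hypothetical and are not taken from the official MMLU-Pro evaluation code.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Return the percentage of predictions that exactly match the answer key."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("need at least one question")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)


def chance_baseline(num_choices: int) -> float:
    """Expected accuracy (in %) of uniform random guessing."""
    return 100.0 / num_choices


# Hypothetical toy data: answer letters run A-J when there are 10 choices.
answers = ["A", "J", "C", "F"]
predictions = ["A", "J", "D", "F"]

score = accuracy(predictions, answers)   # 3 of 4 correct
mmlu_pro_chance = chance_baseline(10)    # 10 answer choices
mmlu_chance = chance_baseline(4)         # 4 answer choices

print(f"accuracy: {score:.1f}%")
print(f"chance baseline, 10 choices: {mmlu_pro_chance:.1f}%")
print(f"chance baseline, 4 choices: {mmlu_chance:.1f}%")
```

With 10 choices, random guessing drops from 25% to 10%, which is one reason the same accuracy figure is harder to reach on this benchmark.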

Why it matters

Base MMLU has saturated, so it no longer separates the strongest models well. MMLU-Pro still has headroom, making it a better discriminator for top-tier models in 2026.

Leaderboard (5 models)

Sorted by MMLU-Pro score. The Tier column shows the tool's overall AIToolTier rank, which blends this benchmark with pricing, features, and real-world usability.

#	Model	Tier	MMLU-Pro score	Variant	Overall
1	DeepSeek DeepSeek V4-Pro (launched 2026-04-24; scores below are the V3.2 baseline pending third-party V4 verification, which typically lands 3-7 days post-launch)	A	85%	MMLU-Pro	8.0/10
2	Qwen (Alibaba) Qwen3.5-397B MoE	A	83.5%	MMLU-Pro	8.8/10
3	GLM / Z.ai (Zhipu AI) GLM-5.1 (744B MoE / 40B active)	A	81.2%	MMLU-Pro	8.0/10
4	Llama 4 (Meta) Llama 4 Maverick (17B/400B MoE)	B	80.5%	MMLU-Pro	7.9/10
5	Nemotron (Nvidia) Nemotron 3 Ultra (253B)	B	79.8%	MMLU-Pro	7.8/10

About MMLU-Pro

Creator: TIGER-Lab, 2024
Unit: % (max 100)
Official source: https://arxiv.org/abs/2406.01574
