Kimi K2.6 (Moonshot)
A Tier · 8.1/10
Moonshot's 1T-parameter MoE open-weights flagship -- Kimi K2.6 (GA 2026-04-20) is #1 open-weights on Artificial Analysis Intelligence Index v4.0 (score 54, ranked #4 overall). Native video input, 256K context, Modified MIT license
Score Breakdown
Benchmark Scores
Benchmarks for Kimi K2.6 (1T total / 32B active MoE) -- Artificial Analysis Intelligence Index v4.0 score 54 (#1 open-weights, #4 overall as of 2026-04-27). Rows marked "K2.5 baseline" below (MMLU-Pro, GPQA Diamond, AIME 2025, LiveCodeBench) are K2.5 numbers, retained until K2.6-specific third-party runs are published
| Benchmark | Score |
|---|---|
| SWE-Bench Pro | 58.6% |
| MMLU-Pro (K2.5 baseline) | 84.8% |
| GPQA Diamond (K2.5 baseline) | 80.5% |
| AIME 2025 (K2.5 baseline) | 91.2% |
| LiveCodeBench (K2.5 baseline) | 74.1% |
Last updated: 2026-04-27
Personality & Tone
The long-context note-taker
Tone: Careful and document-focused. Kimi K2.6 shines when you dump a long document in -- replies read as summary-and-citation rather than open chat, leaning on the source material rather than the model's opinions.
Quirks: Context handling is the whole pitch. Without a document to anchor to, replies feel plainer than Qwen or DeepSeek. Native Chinese quality is very strong; English is decent but not class-leading.
The Good and the Bad
What we like
- Frontier-tier performance -- Elo 1309 on GDPval-AA, behind only OpenAI and Anthropic flagships
- Beats Claude Opus 4.5 on several coding benchmarks per community testing
- Unified thinking + non-thinking modes in one model (no need to swap)
- 256K context window handles large codebases for agentic coding
- Modified MIT license permits commercial use of weights
- Native tool-use and agentic planning trained in -- not bolted on
What could be better
- 1T parameter model is impractical to self-host without a multi-GPU server (8x H200 or 16x H100 for production-grade serving)
- Moonshot is a smaller lab than DeepSeek/Alibaba -- less Western infrastructure support
- API pricing ($0.60 in / $2.50 out per 1M tokens, Moonshot direct) is higher than DeepSeek V3.2 ($0.28 in / $0.42 out)
- PRC content filters apply (Tiananmen, Taiwan, etc.)
- Documentation is heavily Chinese-first -- English docs trail releases
Pricing
Self-hosted (Free)
- Modified MIT license -- commercial use allowed
- Weights on Hugging Face
- Fine-tuning permitted
API (Moonshot direct, K2.6)
- K2.6: $0.60 in / $2.50 out per 1M tokens (Moonshot direct)
- 256K context
- Native video input (mp4/mov/avi/webm)
API (OpenRouter, K2.6 blended)
- K2.6: ~$0.95 in / ~$4.00 out per 1M tokens via OpenRouter
- Useful when you don't want a Moonshot account directly
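To make the price gap between the two API routes concrete, here is a minimal cost sketch using the per-1M-token list prices quoted above. The workload figures (200M input and 40M output tokens per month) and the names `PRICES` and `monthly_cost` are hypothetical, chosen only for illustration; the OpenRouter figures are approximate in the source, and the sketch ignores caching discounts, rate tiers, and any provider fees.

```python
# Per-1M-token list prices in USD, copied from the pricing lists above.
# The OpenRouter numbers are approximate ("~") in the source.
PRICES = {
    "moonshot_direct": {"in": 0.60, "out": 2.50},
    "openrouter": {"in": 0.95, "out": 4.00},
}


def monthly_cost(provider: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a token volume at the provider's list price."""
    price = PRICES[provider]
    return (input_tokens * price["in"] + output_tokens * price["out"]) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 200M input tokens and 40M output tokens per month.
    for name in PRICES:
        cost = monthly_cost(name, 200_000_000, 40_000_000)
        print(f"{name}: ${cost:,.2f}")
```

Swapping in your own token volumes gives a quick read on whether skipping a direct Moonshot account is worth the OpenRouter markup.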
System Requirements
Hardware needed to self-host. Min = smallest viable setup (usually heavy quantization). Max = full-precision / production-grade.
| Model variant | Min | Max |
|---|---|---|
| Kimi K2.5 / K2.6 (1T total, 32B active MoE). Practically a hosted-only model for most users -- self-hosting requires enterprise hardware | 256 GB unified RAM Mac Studio M3 Ultra (Q2, ~3 tok/s) | 8× H200 141 GB FP8 or 16× H100 (production-grade) |
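The hardware figures in the table above follow from simple arithmetic on the weight footprint. The sketch below is a rough estimate under stated assumptions: it counts weights only (no KV cache, activations, or runtime overhead), treats Q2 as exactly 2 bits per parameter (real quantization formats mix bit widths), and assumes 80 GB per H100, which the table does not state. The function names are hypothetical.

```python
TOTAL_PARAMS = 1_000_000_000_000  # 1T total parameters, per the table above


def weight_gb(bits_per_param: float, params: int = TOTAL_PARAMS) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes), weights only."""
    return params * bits_per_param / 8 / 1e9


def generation_seconds(tokens: int, tok_per_s: float) -> float:
    """Time to generate `tokens` output tokens at a steady decode rate."""
    return tokens / tok_per_s


if __name__ == "__main__":
    print(weight_gb(2))  # 250.0 GB at an idealized 2 bits/param
    print(weight_gb(8))  # 1000.0 GB at FP8
    print(8 * 141)       # 1128 GB across 8x H200 (141 GB each, per the table)
    print(16 * 80)       # 1280 GB across 16x H100 (80 GB each is an assumption)
    # A 1,000-token reply at ~3 tok/s takes several minutes.
    print(generation_seconds(1000, 3.0))
```

Under these assumptions the 256 GB Mac Studio leaves only a few GB of headroom over the idealized Q2 weights, which is consistent with the table treating it as the bare minimum, while both GPU configurations leave room above the roughly 1 TB of FP8 weights for cache and activations.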
Known Issues
- WATCHLIST (verified 2026-05-13, Day 4 of ship window): Kimi K3 has NOT shipped. moonshotai HuggingFace org shows K2.6 as the latest model (last update 2 days ago); no Kimi-K3 repository exists. kimi.com/blog latest post remains 'Kimi K2.6 -- Advancing Open-Source Coding' (2026-04-20). Manifold market priced ~74% probability of K3 ship before end of May 2026; today is Day 4 of that window with no observable on-platform signal. If K3 lands before 2026-05-31 it likely beats Manifold's implied timeline; if it slips past 5/31 the market resolves NO. Watch: kimi.com/blog, huggingface.co/moonshotai, GitHub MoonshotAI/Kimi-K* releases. Source: kimi.com/blog (no new post since K2.6), huggingface.co/moonshotai (no K3 repo) · 2026-05-13
- Kimi K2.6 (GA 2026-04-20) supersedes K2.5 -- 1T total / 32B active MoE, 256K context, adds native video input (mp4/mov/avi/webm). Scores 54 on Artificial Analysis Intelligence Index v4.0, ranked #1 open-weights and #4 overall (three points behind Claude Opus 4.7 / Gemini 3.1 Pro / OpenAI flagships at 57). SWE-Bench Pro 58.6%. Modified MIT license unchanged. Moonshot direct API: $0.60 in / $2.50 out per 1M tokens. OpenRouter blended: ~$0.95 in / ~$4.00 out. If you were on K2.5, the upgrade is non-breaking on the API side -- Moonshot routes the K2.6 model under the same endpoint family. Source: Moonshot Kimi blog (kimi.com/blog/kimi-k2-6), HuggingFace moonshotai/Kimi-K2.6, Artificial Analysis, OpenRouter, SiliconANGLE · 2026-04-20
- Self-hosting K2.5 / K2.6 at usable speed requires $30K+ in enterprise GPU hardware (8x H200 FP8 or 16x H100 production-grade) -- realistically this is a hosted-API model. Mac Studio M3 Ultra 256 GB unified RAM at Q2 quantization runs the model but at ~3 tok/s. Source: Reddit r/LocalLLaMA, llm-stats.com · 2026-03
- Early K2.5 releases had inconsistent tool-calling when quantized below Q4 -- community fixes landed March 2026; K2.6 inherits the same tool-use stack so quant guidance carries forward. Source: Hugging Face discussions · 2026-03
Best for
Agentic coding workflows, tool-use agents, and teams willing to pay hosted-API prices for frontier-tier quality with open-weights licensing protection.
Not for
Solo developers or hobbyists who want to run models locally -- the 1T parameter size makes that impractical. Use Qwen3-Coder-Next or DeepSeek instead for self-hosting.
Our Verdict
Kimi K2.6 is, in our assessment, the strongest open-weights model currently available for agentic coding. It rivals Claude Opus 4.5 and Gemini 3.1 Pro on practical coding tasks while being nominally 'open.' The catch is that the 1T parameter size makes it hosted-only for nearly all users. If you're picking between hosted APIs and you want maximum quality with open-weights safety, Kimi K2.6 is the top pick. If you need a model that actually runs on your hardware, look at Qwen3-Coder-Next or DeepSeek V3.2 instead.
Sources
- Moonshot Kimi K2.6 blog (GA 2026-04-20) (accessed 2026-04-27)
- HuggingFace moonshotai/Kimi-K2.6 (accessed 2026-04-27)
- Artificial Analysis: Kimi K2.6 leading open weights (accessed 2026-04-27)
- SiliconANGLE: Kimi K2.6 release (accessed 2026-04-27)
- OpenRouter Kimi K2.6 pricing (accessed 2026-04-27)
- llm-stats.com (accessed 2026-04-13)
- Reddit r/singularity, r/LocalLLaMA (accessed 2026-04-13)
Alternatives to Kimi K2.6 (Moonshot)
Llama 4 (Meta)
Meta's open-weights flagship family -- Scout (10M context), Maverick (multimodal 400B MoE), Behemoth in preview
Mistral AI
European AI lab with open and commercial models -- Mistral Medium 3.5 SHIPPED 2026-04-29 (128B dense, 256k context, 77.6% SWE-Bench Verified) plus Vibe Remote Agents + Le Chat Work Mode. Earlier 2026 line: Small 4 (Mar 2026 119B MoE Apache 2.0 unified), Medium 3 (Apr 9 2026), Voxtral TTS (Mar 2026 open-source speech)
DeepSeek
DeepSeek V4 shipped 2026-04-24: V4-Pro (1.6T/49B active MoE) + V4-Flash (284B/13B active), 1M native context, Hybrid Attention Architecture, open-source on HF. Trails only Gemini 3.1 Pro on world knowledge
Gemma 4 (Google)
Google DeepMind's open-weights model family -- multimodal, 256K context, runs on edge devices
Qwen (Alibaba)
Alibaba's open-weights + API family -- Qwen3.6-27B dense (Apr 22 2026 Apache 2.0, beats the 397B MoE flagship on coding from a single consumer GPU), Qwen 3.6-Max-Preview (Apr 20 2026 closed-weights #1 on SWE-bench Pro/Terminal-Bench 2.0/SciCode), Qwen3.6-35B-A3B (Apr 16 open-weights MoE), plus Qwen 3.6-Plus API flagship
GLM / Z.ai (Zhipu AI)
Zhipu AI's open-weights family -- GLM-5.1 (launched 2026-04-07) is 744B MoE / 40B active, topped SWE-Bench Pro at 58.4 (beating GPT-5.4 and Claude Opus 4.6), MIT licensed, 200K context. Trained entirely on 100K Huawei Ascend 910B chips -- first frontier model with zero Nvidia in the training stack
Nemotron (Nvidia)
Nvidia's open-weights family -- hybrid Mamba-Transformer MoE architecture, optimized for efficient reasoning on Nvidia hardware
MiniMax M2.7
MiniMax's open-weights self-evolving agent flagship -- M2.7 (released 2026-03-18) scores 56.22% SWE-Pro and 57.0% Terminal Bench 2 from a 229B/10B-active MoE
Falcon (TII)
UAE's Technology Innovation Institute open-weights family -- Falcon 3 optimized for efficient sub-10B deployment on consumer hardware
gpt-oss (OpenAI)
OpenAI's FIRST open-weight models -- gpt-oss-120b (single 80GB GPU, near parity with o4-mini on reasoning) and gpt-oss-20b (runs on 16GB edge devices). Apache 2.0. Launched 2025-08-05. gpt-oss-safeguard ships in 2026 as the safety-tuned variant
IBM Granite 4.0
IBM's enterprise-focused open-weight family -- Granite 4.0 hybrid Mamba-2 + transformer architecture (70-80% memory reduction vs pure transformer), 3B to 32B sizes, Apache 2.0. First open model family to secure ISO 42001 certification. Nano 350M runs on CPU with 8-16GB RAM. 3B Vision variant landed 2026-04-01
Arcee Trinity-Large-Thinking
Arcee AI's US-made open-weight frontier reasoning model -- launched 2026-04-01. 398B total params, ~13B active. Sparse MoE (256 experts, 4 active = 1.56% routing). Apache 2.0, trained from scratch. #2 on PinchBench trailing only Claude 3.5 Opus. ~96% cheaper than Opus-4.6 on agentic tasks
Olmo 3 (AI2)
Allen Institute for AI's fully-open frontier reasoning models -- Olmo 3 family (2025-11-20) includes 7B and 32B sizes, four variants (Base, Think, Instruct, RLZero). Apache 2.0 with fully open data + checkpoints + training logs. Olmo 3-Think 32B matches Qwen3-32B-Thinking at 6x fewer training tokens
AI21 Jamba2
AI21 Labs' hybrid SSM-Transformer (Mamba-style) open-weight family -- Jamba2 launched 2026-01-08. Two sizes: 3B dense (runs on phones / laptops) and Jamba2 Mini MoE (12B active / 52B total). Apache 2.0, 256K context, mid-trained on 500B tokens
StepFun Step 3.5 Flash
StepFun's (China) agent-focused open-weight model -- Step 3.5 Flash launched 2026-02-01. 196B sparse MoE, ~11B active. Benchmarks slightly ahead of DeepSeek V3.2 at over 3x smaller total size. Step 3 (321B / 38B active, Apache 2.0) and Step3-VL-10B multimodal also in the family
Cohere Command A
Cohere's enterprise-multilingual flagship -- 111B params, 256K context, runs on 2x H100. 23 languages. CC-BY-NC 4.0 on weights (research / non-commercial), commercial requires Cohere enterprise contract. Follow-ups: Command A Reasoning + Command A Vision