Kimi K2.6 (Moonshot)

A Tier · 8.1/10

Moonshot's 1T-parameter MoE open-weights flagship. Kimi K2.6 (GA 2026-04-20) is the #1 open-weights model on the Artificial Analysis Intelligence Index v4.0 (score 54, ranked #4 overall). Native video input, 256K context, Modified MIT license.

Last updated: 2026-05-13 · Free tier available

Score Breakdown

Ease of Use: 6.0
Output Quality: 9.0
Value: 8.5
Features: 9.0

Benchmark Scores

Benchmarks for Kimi K2.6 (1T/32B active MoE). Artificial Analysis Intelligence Index v4.0 score: 54 (#1 open-weights, #4 overall as of 2026-04-27). The MMLU-Pro, GPQA, AIME, and LiveCodeBench figures below are K2.5-baseline numbers, retained until K2.6-specific third-party runs are published.

Benchmark | Score
SWE-Bench Pro | 58.6%
MMLU-Pro (K2.5 baseline) | 84.8%
GPQA Diamond (K2.5 baseline) | 80.5%
AIME 2025 (K2.5 baseline) | 91.2%
LiveCodeBench (K2.5 baseline) | 74.1%

Last updated: 2026-04-27

Personality & Tone

The long-context note-taker

Tone: Careful and document-focused. Kimi K2.6 shines when you give it a long document: replies read as summary-and-citation rather than open chat, leaning on the source material rather than the model's opinions.

Quirks: Context handling is the whole pitch. Without a document to anchor to, replies feel plainer than Qwen or DeepSeek. Native Chinese quality is very strong; English is decent but not class-leading.

The Good and the Bad

What we like

  • Frontier-tier performance -- Elo 1309 on GDPval-AA, behind only OpenAI and Anthropic flagships
  • Beats Claude Opus 4.5 on several coding benchmarks per community testing
  • Unified thinking + non-thinking modes in one model (no need to swap)
  • 256K context window handles large codebases for agentic coding
  • Modified MIT license permits commercial use of weights
  • Native tool-use and agentic planning trained in -- not bolted on

What could be better

  • 1T-parameter model is impractical to self-host without enterprise hardware (8× H200 or 16× H100 for production-grade serving)
  • Moonshot is a smaller lab than DeepSeek/Alibaba -- less Western infrastructure support
  • API pricing ($0.60 in / $2.50 out per 1M tokens, Moonshot direct) is higher than DeepSeek V3.2 ($0.28 in / $0.42 out)
  • PRC content filters apply (Tiananmen, Taiwan, etc.)
  • Documentation is heavily Chinese-first -- English docs trail releases

Pricing

Self-hosted (Free)

$0
  • Modified MIT license -- commercial use allowed
  • Weights on Hugging Face
  • Fine-tuning permitted

API (Moonshot direct, K2.6)

$0.60 per 1M input tokens
  • K2.6: $0.60 in / $2.50 out (Moonshot direct)
  • 256K context
  • Native video input (mp4/mov/avi/webm)

API (OpenRouter, K2.6 blended)

~$0.95 per 1M input tokens
  • K2.6: ~$0.95 in / ~$4.00 out via OpenRouter
  • Useful when you don't want a Moonshot account directly
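
To see how the two hosted price points compare in practice, here is a minimal sketch that applies the per-1M-token rates listed above to a monthly token volume. The workload figures (200M input, 40M output tokens) and the names `Price`, `PRICES`, and `monthly_cost` are hypothetical and only for illustration; the OpenRouter rates are the approximate ones quoted on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    input_per_m: float   # USD per 1M input tokens
    output_per_m: float  # USD per 1M output tokens


# Rates as listed on this page; OpenRouter figures are approximate.
PRICES = {
    "moonshot-direct": Price(input_per_m=0.60, output_per_m=2.50),
    "openrouter": Price(input_per_m=0.95, output_per_m=4.00),
}


def monthly_cost(price: Price, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a workload at the given per-1M-token rates."""
    return (input_tokens / 1e6) * price.input_per_m + (
        output_tokens / 1e6
    ) * price.output_per_m


if __name__ == "__main__":
    # Hypothetical workload: 200M input tokens and 40M output tokens per month.
    for name, price in PRICES.items():
        cost = monthly_cost(price, 200_000_000, 40_000_000)
        print(f"{name}: ${cost:,.2f}")
```

Under these assumed volumes the direct API comes to about $220 per month and the OpenRouter route to about $350, so the convenience of skipping a Moonshot account costs roughly 60% more.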

System Requirements

Hardware needed to self-host. Min = smallest viable setup (usually heavy quantization). Max = full-precision / production-grade.

Model variant | Min | Max
Kimi K2.5 (1T total, 32B active MoE) | 256 GB unified RAM Mac Studio M3 Ultra (Q2, ~3 tok/s) | 8× H200 141 GB FP8 or 16× H100 (production-grade)
Note: Practically a hosted-only model for most users -- self-hosting requires enterprise hardware
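
The Min and Max figures follow from simple weight-size arithmetic, which the sketch below reproduces. It is a rough estimate under stated assumptions: it counts weights only (no KV cache, activations, or runtime overhead), it treats Q2 as a nominal 2 bits per weight (real Q2 formats usually carry somewhat more), and it assumes 80 GB per H100, a figure this page does not state. The helper names are hypothetical.

```python
def weight_memory_gb(total_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in decimal GB."""
    return total_params * bits_per_weight / 8 / 1e9


def fits(required_gb: float, devices: int, gb_per_device: float) -> bool:
    """True if the pooled device memory can hold the weights (no overhead)."""
    return devices * gb_per_device >= required_gb


TOTAL_PARAMS = 1e12  # 1T total parameters, per this page

fp8_gb = weight_memory_gb(TOTAL_PARAMS, 8)  # 8 bits per weight
q2_gb = weight_memory_gb(TOTAL_PARAMS, 2)   # nominal 2 bits (assumption)

if __name__ == "__main__":
    print(f"FP8 weights: ~{fp8_gb:.0f} GB")
    print(f"Q2 weights:  ~{q2_gb:.0f} GB")
    print("8x H200 (141 GB) holds FP8:", fits(fp8_gb, 8, 141))
    # 80 GB per H100 is an assumption, not a figure from this page.
    print("16x H100 (80 GB) holds FP8:", fits(fp8_gb, 16, 80))
    print("256 GB Mac Studio holds Q2:", fits(q2_gb, 1, 256))
```

At FP8 the weights alone come to roughly 1,000 GB, which is why the production-grade row needs a multi-GPU node, while the nominal Q2 estimate of about 250 GB only just fits in 256 GB of unified memory.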

Known Issues

  • WATCHLIST (verified 2026-05-13, Day 4 of ship window): Kimi K3 has NOT shipped. The moonshotai HuggingFace org shows K2.6 as the latest model (last update 2 days ago); no Kimi-K3 repository exists. The latest kimi.com/blog post remains 'Kimi K2.6 -- Advancing Open-Source Coding' (2026-04-20). A Manifold market priced ~74% probability of K3 shipping before the end of May 2026; today is Day 4 of that window with no observable on-platform signal. If K3 lands before 2026-05-31 it likely beats Manifold's implied timeline; if it slips past 5/31 the market resolves NO. Watch: kimi.com/blog, huggingface.co/moonshotai, GitHub MoonshotAI/Kimi-K* releases. Source: kimi.com/blog (no new post since K2.6), huggingface.co/moonshotai (no K3 repo) · 2026-05-13
  • Kimi K2.6 (GA 2026-04-20) supersedes K2.5 -- 1T total / 32B active MoE, 256K context, adds native video input (mp4/mov/avi/webm). Scores 54 on Artificial Analysis Intelligence Index v4.0, ranked #1 open-weights and #4 overall (three points behind Claude Opus 4.7 / Gemini 3.1 Pro / OpenAI flagships at 57). SWE-Bench Pro 58.6%. Modified MIT license unchanged. Moonshot direct API: $0.60 in / $2.50 out per 1M tokens. OpenRouter blended: ~$0.95 in / $4.00 out. If you were on K2.5, the upgrade is non-breaking on the API side -- Moonshot routes the K2.6 model under the same endpoint family. Source: Moonshot Kimi blog (kimi.com/blog/kimi-k2-6), HuggingFace moonshotai/Kimi-K2.6, Artificial Analysis, OpenRouter, SiliconANGLE · 2026-04-20
  • Self-hosting K2.5 / K2.6 at usable speed requires $30K+ in enterprise GPU hardware (8× H200 FP8 or 16× H100 production-grade) -- realistically this is a hosted-API model. Mac Studio M3 Ultra 256 GB unified RAM at Q2 quantization runs the model but at ~3 tok/s. Source: Reddit r/LocalLLaMA, llm-stats.com · 2026-03
  • Early K2.5 releases had inconsistent tool-calling when quantized below Q4 -- community fixes landed March 2026; K2.6 inherits the same tool-use stack so quant guidance carries forward. Source: Hugging Face discussions · 2026-03

Best for

Agentic coding workflows, tool-use agents, and teams willing to pay hosted-API prices for frontier-tier quality with open-weights licensing protection.

Not for

Solo developers or hobbyists who want to run models locally -- the 1T parameter size makes that impractical. Use Qwen3-Coder-Next or DeepSeek instead for self-hosting.

Our Verdict

Kimi K2.6 is the best open-weights model available right now for agentic coding. It rivals Claude Opus 4.5 and Gemini 3.1 Pro on practical coding tasks while being nominally 'open.' The catch is that the 1T parameter size makes it hosted-only for 99% of users. If you're picking between hosted APIs and you want maximum quality with open-weights safety, Kimi K2.6 is a top-tier pick. If you need a model that actually runs on your hardware, look at Qwen3-Coder-Next or DeepSeek V3.2 instead.

Sources

  • Moonshot Kimi K2.6 blog (GA 2026-04-20) (accessed 2026-04-27)
  • HuggingFace moonshotai/Kimi-K2.6 (accessed 2026-04-27)
  • Artificial Analysis: Kimi K2.6 leading open weights (accessed 2026-04-27)
  • SiliconANGLE: Kimi K2.6 release (accessed 2026-04-27)
  • OpenRouter Kimi K2.6 pricing (accessed 2026-04-27)
  • llm-stats.com (accessed 2026-04-13)
  • Reddit r/singularity, r/LocalLLaMA (accessed 2026-04-13)

Alternatives to Kimi K2.6 (Moonshot)

Llama 4 (Meta)

Meta's open-weights flagship family -- Scout (10M context), Maverick (multimodal 400B MoE), Behemoth in preview

B Tier · 7.9/10
Free tier · From $0
  • Llama 4 Scout has a 10M token context wi...
  • Llama 4 Maverick is natively multimodal ...
Updated 2026-04-13

Mistral AI

European AI lab with open and commercial models -- Mistral Medium 3.5 SHIPPED 2026-04-29 (128B dense, 256k context, 77.6% SWE-Bench Verified) plus Vibe Remote Agents + Le Chat Work Mode. Earlier 2026 line: Small 4 (Mar 2026 119B MoE Apache 2.0 unified), Medium 3 (Apr 9 2026), Voxtral TTS (Mar 2026 open-source speech)

B Tier · 7.5/10
Free tier · From $0
  • Mistral Medium 3.5 (April 29 2026) is Mi...
  • Vibe Remote Agents (also 4/29) lets you ...
Updated 2026-05-04

DeepSeek

DeepSeek V4 shipped 2026-04-24: V4-Pro (1.6T/49B active MoE) + V4-Flash (284B/13B active), 1M native context, Hybrid Attention Architecture, open-source on HF. Trails only Gemini 3.1 Pro on world knowledge

A Tier · 8.0/10
Free tier · From $0
  • Pricing is absurdly cheap compared to GP...
  • DeepSeek-R1 reasoning model genuinely co...
Updated 2026-04-28

Gemma 4 (Google)

Google DeepMind's open-weights model family -- multimodal, 256K context, runs on edge devices

A Tier · 8.3/10
Free tier · From $0
  • Apache 2.0 license -- truly permissive, ...
  • Multimodal: handles text + image input (...
Updated 2026-04-19

Qwen (Alibaba)

Alibaba's open-weights + API family -- Qwen3.6-27B dense (Apr 22 2026 Apache 2.0, beats the 397B MoE flagship on coding from a single consumer GPU), Qwen 3.6-Max-Preview (Apr 20 2026 closed-weights #1 on SWE-bench Pro/Terminal-Bench 2.0/SciCode), Qwen3.6-35B-A3B (Apr 16 open-weights MoE), plus Qwen 3.6-Plus API flagship

A Tier · 8.8/10
Free tier · From $0
  • Qwen 3.6-Plus (launched Mar 30 2026) is ...
  • Qwen3.5 Small (0.8B / 2B / 4B / 9B) is t...
Updated 2026-04-27

GLM / Z.ai (Zhipu AI)

Zhipu AI's open-weights family -- GLM-5.1 (launched 2026-04-07) is 744B MoE / 40B active, topped SWE-Bench Pro at 58.4 (beating GPT-5.4 and Claude Opus 4.6), MIT licensed, 200K context. Trained entirely on 100K Huawei Ascend 910B chips -- first frontier model with zero Nvidia in the training stack

A Tier · 8.0/10
Free tier · From $0
  • GLM-5.1 (2026-04-07) topped SWE-Bench Pr...
  • First frontier model trained entirely on...
Updated 2026-04-17

Nemotron (Nvidia)

Nvidia's open-weights family -- hybrid Mamba-Transformer MoE architecture, optimized for efficient reasoning on Nvidia hardware

B Tier · 7.8/10
Free tier · From $0
  • Hybrid Mamba-Transformer architecture dr...
  • Nemotron 3 Super activates only 3.6B par...
Updated 2026-04-19

MiniMax M2.7

MiniMax's open-weights self-evolving agent flagship -- M2.7 (released 2026-03-18) scores 56.22% SWE-Pro and 57.0% Terminal Bench 2 from a 229B/10B-active MoE

A Tier · 8.4/10
Free tier · From $0
  • 229B/10B-active MoE delivers Tier-1 agen...
  • Sparse MoE design: ~10B active params du...
Updated 2026-04-27

Falcon (TII)

UAE's Technology Innovation Institute open-weights family -- Falcon 3 optimized for efficient sub-10B deployment on consumer hardware

B Tier · 7.1/10
Free tier · From $0
  • Apache 2.0 license -- fully permissive f...
  • Sub-10B sizes run on any consumer GPU or...
Updated 2026-04-13

gpt-oss (OpenAI)

OpenAI's FIRST open-weight models -- gpt-oss-120b (single 80GB GPU, near parity with o4-mini on reasoning) and gpt-oss-20b (runs on 16GB edge devices). Apache 2.0. Launched 2025-08-05. gpt-oss-safeguard ships in 2026 as the safety-tuned variant

A Tier · 8.1/10
Free tier · From $0
  • First-ever OpenAI open-weight release --...
  • gpt-oss-120b approaches o4-mini on reaso...
Updated 2026-04-17

IBM Granite 4.0

IBM's enterprise-focused open-weight family -- Granite 4.0 hybrid Mamba-2 + transformer architecture (70-80% memory reduction vs pure transformer), 3B to 32B sizes, Apache 2.0. First open model family to secure ISO 42001 certification. Nano 350M runs on CPU with 8-16GB RAM. 3B Vision variant landed 2026-04-01

A Tier · 8.2/10
Free tier · From $0
  • Hybrid Mamba-2 + transformer architectur...
  • Granite 4.0 Nano (350M and 1.5B) is genu...
Updated 2026-04-17

Arcee Trinity-Large-Thinking

Arcee AI's US-made open-weight frontier reasoning model -- launched 2026-04-01. 398B total params, ~13B active. Sparse MoE (256 experts, 4 active = 1.56% routing). Apache 2.0, trained from scratch. #2 on PinchBench trailing only Claude 3.5 Opus. ~96% cheaper than Opus-4.6 on agentic tasks

A Tier · 8.1/10
Free tier · From $0
  • Rare US-made frontier-tier open-weight r...
  • Trained from scratch (not a fine-tune) a...
Updated 2026-04-17

Olmo 3 (AI2)

Allen Institute for AI's fully-open frontier reasoning models -- Olmo 3 family (2025-11-20) includes 7B and 32B sizes, four variants (Base, Think, Instruct, RLZero). Apache 2.0 with fully open data + checkpoints + training logs. Olmo 3-Think 32B matches Qwen3-32B-Thinking at 6x fewer training tokens

B Tier · 7.9/10
Free tier · From $0
  • FULLY OPEN is a different category than ...
  • Olmo 3-Think 32B matches Qwen3-32B-Think...
Updated 2026-04-17

AI21 Jamba2

AI21 Labs' hybrid SSM-Transformer (Mamba-style) open-weight family -- Jamba2 launched 2026-01-08. Two sizes: 3B dense (runs on phones / laptops) and Jamba2 Mini MoE (12B active / 52B total). Apache 2.0, 256K context, mid-trained on 500B tokens

A Tier · 8.0/10
Free tier · From $0
  • Hybrid SSM-Transformer (Mamba-style) arc...
  • Jamba2 3B dense runs realistically on iP...
Updated 2026-04-17

StepFun Step 3.5 Flash

StepFun's (China) agent-focused open-weight model -- Step 3.5 Flash launched 2026-02-01. 196B sparse MoE, ~11B active. Benchmarks slightly ahead of DeepSeek V3.2 at over 3x smaller total size. Step 3 (321B / 38B active, Apache 2.0) and Step3-VL-10B multimodal also in the family

B Tier · 7.8/10
Free tier · From $0
  • Step 3.5 Flash at 196B total / 11B activ...
  • Agent-focused tuning explicitly -- tool ...
Updated 2026-04-17

Cohere Command A

Cohere's enterprise-multilingual flagship -- 111B params, 256K context, runs on 2x H100. 23 languages. CC-BY-NC 4.0 on weights (research / non-commercial), commercial requires Cohere enterprise contract. Follow-ups: Command A Reasoning + Command A Vision

B Tier · 7.5/10
Free tier · From $0
  • Best-in-class multilingual open-weight m...
  • Runs on just 2x H100 at FP16 for the ful...
Updated 2026-04-17