DeepSeek
A Tier · 8.0/10
DeepSeek V4 shipped 2026-04-24: V4-Pro (1.6T/49B active MoE) + V4-Flash (284B/13B active), 1M native context, Hybrid Attention Architecture, open-source on HF. Trails only Gemini 3.1 Pro on world knowledge
Score Breakdown
Benchmark Scores
Benchmarks for DeepSeek V4-Pro (launched 2026-04-24; scores below are the V3.2 baseline pending third-party V4 verification, which typically lands 3-7 days post-launch)
| Benchmark | Description | Score | |
|---|---|---|---|
| MMLU | Knowledge across 57 subjects | 90.8% | |
| MMLU-Pro | Harder multi-subject reasoning | 85% | |
| GPQA Diamond | Graduate-level science questions | 79.9% | |
| HumanEval | Python code generation | 91.5% | |
| SWE-bench | Real GitHub issue fixing | 67.8% |
Last updated: 2026-04-24
Personality & Tone
The open-source reasoning specialist
Tone: Direct and technical. DeepSeek's chat models give compact, math- and code-first answers and are noticeably less chatty than Claude or ChatGPT. When asked to reason, they expose a lot of visible thinking.
Quirks: Refusal patterns differ from Western models -- more permissive on many technical and gray-area prompts, more cautious on China-specific political questions. Community-tuned variants exist with different system prompts and guardrails.
The Good and the Bad
What we like
- +Pricing is absurdly cheap compared to GPT-4 or Claude -- we're talking 90%+ savings on API calls
- +DeepSeek-R1 reasoning model genuinely competes with o1 and o3 on math and coding benchmarks
- +Fully open-source weights mean you can run it locally or fine-tune for your own use case
- +130M+ users and growing fast, so the ecosystem and community support are solid
What could be better
- −Censorship on politically sensitive topics is real and unavoidable -- it's a Chinese company subject to PRC regulations
- −English output quality is good but noticeably behind Claude or GPT-4 for nuanced writing tasks
- −Hallucinations on niche or domain-specific topics happen more often than with top-tier Western models
- −Service reliability has been spotty during high-demand periods -- the free tier especially suffers from rate limiting
Pricing
Free
- ✓Web chat access at chat.deepseek.com
- ✓V4-Flash by default (as of 2026-04-24 launch)
- ✓Basic usage limits
API -- V4-Flash
- ✓284B total / 13B active MoE
- ✓1M native context
- ✓Cheapest frontier-class API on market
- ✓Pay-as-you-go, no minimum
API -- V4-Pro (75% PROMO active through 2026-05-31)
- ✓1.6T total / 49B active MoE
- ✓1M native context
- ✓Trails only Gemini 3.1 Pro on world knowledge benchmarks
- ✓PROMO PRICING active through 2026-05-31 15:59 UTC -- 75% off list ($1.74/$3.48). Cache-hit input drops to $0.003625/M during promo
- ✓Post-promo pricing reverts to $1.74/$3.48 per 1M -- still 3-10x cheaper than GPT-5.5 or Claude Opus 4.7
Self-hosted (open-source)
- ✓MIT license, open weights on HuggingFace
- ✓V4-Flash is feasible on consumer hardware with quantization
- ✓V4-Pro needs multi-GPU production infrastructure
System Requirements
Hardware needed to self-host. Min = smallest viable setup (usually heavy quantization). Max = full-precision / production-grade.
| Model variant | Min | Max |
|---|---|---|
| DeepSeek V4-Flash (284B total, 13B active MoE)MIT license, open weights on HuggingFace. Flash is the accessible entry point -- feasible on enthusiast / workstation hardware | 96 GB RAM + 1× RTX 3090/4090 (Q4 quantization, ~3-5 tok/s) | 2× H100 FP8 or 1× H200 (FP8 production, fast) |
| DeepSeek V4-Pro (1.6T total, 49B active MoE)MIT license, open weights. Pro is production multi-GPU territory -- not feasible for individuals | 512 GB RAM + 4× RTX 4090 (severe quantization, experimental) | 16× H100 FP8 or 8× H200 (full 1.6T production) |
| DeepSeek V3.2 (671B total, 37B active MoE) -- prior version, still availableMIT license -- commercial use OK | 192 GB RAM + 1× RTX 3090/4090 (IQ2_XXS offload, ~2 tok/s) | 8× H100 FP8 or 4× H200 (full 671B, production) |
Known Issues
- Regional availability restrictions: EU, Canada, South Korea, Australia, and India issued formal restrictions or bans on deployment of DeepSeek-V3 and the enterprise API in Q1 2026 over data-residency concerns (traffic routing through mainland China). Germany's BSI confirmed classified metadata leak from a parliamentary pilot. If you're deploying DeepSeek in any of these jurisdictions, check local compliance guidance before shipping; self-hosted open-weights deployment is often the workaround but changes the operational pictureSource: National CSIRT/BSI statements (aggregated), Alibaba policy analysis · 2026-Q1
- DeepSeek V4 SHIPPED 2026-04-24. Two-model family released simultaneously: V4-Pro (1.6T total / 49B active MoE) and V4-Flash (284B / 13B active MoE). Both default to 1M context natively, use DeepSeek's new Hybrid Attention Architecture, and are open-sourced on HuggingFace under MIT license. V4-Pro trails only Gemini 3.1 Pro on world-knowledge benchmarks per early third-party runs. API pricing: Flash $0.14/$0.28, Pro $1.74/$3.48 per 1M tokens -- still 3-10x cheaper than Western frontier models. Tier-1 coverage: Bloomberg, CNBC, TechCrunch, Simon Willison blog. This closes out the 'V4 imminent' watchlist item that was open since 2026-04-03 Reuters pre-reportSource: DeepSeek API docs, Bloomberg, CNBC, TechCrunch, Simon Willison · 2026-04-24
- PROMO: DeepSeek V4-Pro is 75% off through 2026-05-31 15:59 UTC per the official pricing page (api-docs.deepseek.com/quick_start/pricing). Effective rates during promo: $0.435 input / $0.87 output per 1M tokens (vs $1.74 / $3.48 list); cache-hit input drops to $0.003625/M. After 2026-05-31 reverts to standard pricing. Bloomberg framed the move as a 'Chinese price war' against frontier-model rates from OpenAI / Anthropic / Google. Worth locking in agentic-coding workloads now if you're cost-sensitiveSource: DeepSeek pricing docs (api-docs.deepseek.com/quick_start/pricing), Bloomberg · 2026-04-27
- Third-party verification (T+3 days post-launch): Artificial Analysis Intelligence Index pegs V4-Pro at 52 (#2 open-weight, behind Kimi K2.6) and V4-Flash at 47. Vals AI: V4 is #1 open-weight on Vibe Code Bench 'and it's not close', plus #1 open-weight on SWE-bench. SWE-bench Verified 80.6% (effectively tied with Claude Opus 4.6's 80.8%). Codeforces 3206 surpasses GPT-5.4 (3168) -- highest competitive-programming score at release. GDPval-AA agentic 1554 leads all open-weight models. BUT LMSYS Chatbot Arena Elo around 1220 places V4-Pro alongside GPT-4o and Claude 4 Sonnet, not at the Opus-class frontier (1280+). Simon Willison's pelican-SVG community test produced visibly weak output from V4-Pro (one wing, oversized body) and concluded V4-Pro is 3-6 months behind US frontier labs at a fraction of the cost. Practical verdict: best-in-class open-weight for code/agents/math, mid-pack for general chat quality, weakest for creative/visual generation. Hallucination rate 94%/96% (Pro/Flash) per AA-Omniscience -- caveat for fact-sensitive workloadsSource: Artificial Analysis, Vals AI, Simon Willison, LMSYS Chatbot Arena, Codeforces · 2026-04-27
- Refuses to engage with questions about Tiananmen Square, Taiwan sovereignty, and other politically sensitive topics per Chinese regulationsSource: Reddit r/LocalLLaMA · 2026-01
- API latency spikes during peak hours, sometimes timing out entirely on longer reasoning chainsSource: GitHub Issues · 2026-02
Best for
Developers and teams who need strong reasoning and coding capabilities on a budget. If you're building AI features and can't justify GPT-4 API costs, DeepSeek is the obvious first stop.
Not for
Anyone working on content that touches geopolitical topics, or teams that need guaranteed uptime and enterprise SLAs. Also not ideal if your primary use case is creative English writing.
Our Verdict
DeepSeek is the real deal when it comes to bang-for-your-buck AI. The reasoning capabilities are legitimately impressive, and the open-source angle gives it a flexibility that closed models can't match. The censorship limitations are a dealbreaker for some use cases, and the writing quality trails behind Claude and GPT-4. But for coding, math, and analytical tasks? It's hard to argue with near-frontier performance at a fraction of the cost.
Sources
- DeepSeek V4 API launch announcement (2026-04-24) (accessed 2026-04-24)
- Bloomberg: DeepSeek unveils newest flagship (2026-04-24) (accessed 2026-04-24)
- CNBC: DeepSeek V4 LLM preview (2026-04-24) (accessed 2026-04-24)
- TechCrunch: DeepSeek V4 closes gap with frontier (2026-04-24) (accessed 2026-04-24)
- Simon Willison: DeepSeek V4 (accessed 2026-04-24)
- Artificial Analysis: DeepSeek V4 Pro + Flash leading open weights (accessed 2026-04-27)
- Vals AI: DeepSeek V4-Pro model card (accessed 2026-04-27)
- DeepSeek pricing docs (75% V4-Pro promo through 2026-05-31) (accessed 2026-04-28)
- DeepSeek official site (accessed 2026-04-24)
- Artificial Analysis benchmarks (accessed 2026-04-24)
Explore more DeepSeek rankings
Deeper leaderboards, benchmarks, task-specific tier lists, and status/pricing pages for DeepSeek.
The Tier List Tuesday
Weekly newsletter: tier movers, new entrants, and the VS of the week. Built from our daily AI-tool sweeps. No spam, unsubscribe anytime.
Alternatives to DeepSeek
Llama 4 (Meta)
Meta's open-weights flagship family -- Scout (10M context), Maverick (multimodal 400B MoE), Behemoth in preview
Mistral AI
European AI lab with open and commercial models -- Mistral Medium 3.5 SHIPPED 2026-04-29 (128B dense, 256k context, 77.6% SWE-Bench Verified) plus Vibe Remote Agents + Le Chat Work Mode. Earlier 2026 line: Small 4 (Mar 2026 119B MoE Apache 2.0 unified), Medium 3 (Apr 9 2026), Voxtral TTS (Mar 2026 open-source speech)
Gemma 4 (Google)
Google DeepMind's open-weights model family -- multimodal, 256K context, runs on edge devices
Qwen (Alibaba)
Alibaba's open-weights + API family -- Qwen3.6-27B dense (Apr 22 2026 Apache 2.0, beats the 397B MoE flagship on coding from a single consumer GPU), Qwen 3.6-Max-Preview (Apr 20 2026 closed-weights #1 on SWE-bench Pro/Terminal-Bench 2.0/SciCode), Qwen3.6-35B-A3B (Apr 16 open-weights MoE), plus Qwen 3.6-Plus API flagship
GLM / Z.ai (Zhipu AI)
Zhipu AI's open-weights family -- GLM-5.1 (launched 2026-04-07) is 744B MoE / 40B active, topped SWE-Bench Pro at 58.4 (beating GPT-5.4 and Claude Opus 4.6), MIT licensed, 200K context. Trained entirely on 100K Huawei Ascend 910B chips -- first frontier model with zero Nvidia in the training stack
Kimi K2.6 (Moonshot)
Moonshot's 1T-parameter MoE open-weights flagship -- Kimi K2.6 (GA 2026-04-20) is #1 open-weights on Artificial Analysis Intelligence Index v4.0 (score 54, ranked #4 overall). Native video input, 256K context, Modified MIT license
Nemotron (Nvidia)
Nvidia's open-weights family -- hybrid Mamba-Transformer MoE architecture, optimized for efficient reasoning on Nvidia hardware
MiniMax M2.7
MiniMax's open-weights self-evolving agent flagship -- M2.7 (released 2026-03-18) scores 56.22% SWE-Pro and 57.0% Terminal Bench 2 from a 229B/10B-active MoE
Falcon (TII)
UAE's Technology Innovation Institute open-weights family -- Falcon 3 optimized for efficient sub-10B deployment on consumer hardware
gpt-oss (OpenAI)
OpenAI's FIRST open-weight models -- gpt-oss-120b (single 80GB GPU, near parity with o4-mini on reasoning) and gpt-oss-20b (runs on 16GB edge devices). Apache 2.0. Launched 2025-08-05. gpt-oss-safeguard ships in 2026 as the safety-tuned variant
IBM Granite 4.0
IBM's enterprise-focused open-weight family -- Granite 4.0 hybrid Mamba-2 + transformer architecture (70-80% memory reduction vs pure transformer), 3B to 32B sizes, Apache 2.0. First open model family to secure ISO 42001 certification. Nano 350M runs on CPU with 8-16GB RAM. 3B Vision variant landed 2026-04-01
Arcee Trinity-Large-Thinking
Arcee AI's US-made open-weight frontier reasoning model -- launched 2026-04-01. 398B total params, ~13B active. Sparse MoE (256 experts, 4 active = 1.56% routing). Apache 2.0, trained from scratch. #2 on PinchBench trailing only Claude 3.5 Opus. ~96% cheaper than Opus-4.6 on agentic tasks
Olmo 3 (AI2)
Allen Institute for AI's fully-open frontier reasoning models -- Olmo 3 family (2025-11-20) includes 7B and 32B sizes, four variants (Base, Think, Instruct, RLZero). Apache 2.0 with fully open data + checkpoints + training logs. Olmo 3-Think 32B matches Qwen3-32B-Thinking at 6x fewer training tokens
AI21 Jamba2
AI21 Labs' hybrid SSM-Transformer (Mamba-style) open-weight family -- Jamba2 launched 2026-01-08. Two sizes: 3B dense (runs on phones / laptops) and Jamba2 Mini MoE (12B active / 52B total). Apache 2.0, 256K context, mid-trained on 500B tokens
StepFun Step 3.5 Flash
StepFun's (China) agent-focused open-weight model -- Step 3.5 Flash launched 2026-02-01. 196B sparse MoE, ~11B active. Benchmarks slightly ahead of DeepSeek V3.2 at over 3x smaller total size. Step 3 (321B / 38B active, Apache 2.0) and Step3-VL-10B multimodal also in the family
Cohere Command A
Cohere's enterprise-multilingual flagship -- 111B params, 256K context, runs on 2x H100. 23 languages. CC-BY-NC 4.0 on weights (research / non-commercial), commercial requires Cohere enterprise contract. Follow-ups: Command A Reasoning + Command A Vision