Gemma 4 (Google)

A Tier · 8.3/10

Google DeepMind's open-weights model family -- multimodal, 256K context, runs on edge devices

Last updated: 2026-04-19

Free tier available

Score Breakdown

Category	Score
Ease of Use	7.0
Output Quality	8.0
Value	10.0
Features	8.0

Benchmark Scores

Benchmarks for Gemma 4 31B

Benchmark	Description	Score
MMLU	Knowledge across 57 subjects	83%
GPQA Diamond	Graduate-level science questions	84.3%
AIME 2026	Competition mathematics problems	89.2%
HumanEval	Python code generation	85%

Last updated: 2026-04-13

Personality & Tone

The compact Google cousin

Tone: The same corporate-Google tone as Gemini, but smaller and less polished. Gemma's chat replies are short, cautious, and structured -- closer to a careful intern than a peer.

Quirks: Inherits a Gemini-like safety bias, so refusals appear on prompts Mistral or DeepSeek would answer. Best used as a cheap local fallback or on-device model, not as a personality play.

The Good and the Bad

What we like

+Apache 2.0 license -- truly permissive, you can use it commercially without strings attached
+Multimodal: handles text + image input (audio on smaller models), generates text output
+256K token context window -- larger than most open models
+140+ language support -- one of the strongest multilingual open models available
+Four sizes (E2B, E4B, 26B MoE, 31B Dense) cover edge devices to data centers
+31B Dense scores 89% on AIME 2026 and 84% on GPQA Diamond -- competitive with frontier closed models
+31B Dense currently ranks #3 among open models on the LMArena text leaderboard (26B MoE ranks #6) -- genuinely competitive with the top Chinese and Meta open-weight families
+26B MoE activates only 3.8B params per token during inference, which keeps tokens-per-second throughput high

What could be better

−Requires technical setup unless you use a hosted API provider
−Quality still trails the very best closed models (GPT-5.4 Pro, Claude Mythos 5, Gemini 3.1 Ultra) on the hardest reasoning tasks
−No native chat UI from Google -- you're either coding against an API or using a third-party frontend
−Smaller community than Llama -- fewer fine-tunes and tooling integrations exist

Pricing

Self-hosted

✓Apache 2.0 license
✓Free download from Hugging Face/Kaggle/Ollama
✓Run on your own hardware

API (OpenRouter, Gemma 4 31B)

$0.14-0.40 per 1M tokens

✓Hosted inference
✓$0.14 input / $0.40 output
✓No infrastructure setup
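
To put the per-token prices above in concrete terms, here is a minimal sketch that estimates a bill from token counts. The two prices come from the listing above; the function name and the example workload are hypothetical, and real invoices may differ with provider fees or price changes.

```python
# Prices quoted above for Gemma 4 31B on OpenRouter (USD per 1M tokens).
INPUT_PRICE_PER_M = 0.14
OUTPUT_PRICE_PER_M = 0.40


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated charge in USD for the given token counts."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Hypothetical workload: 10,000 requests, each with 2,000 input
# tokens and 500 output tokens.
requests = 10_000
cost = estimate_cost_usd(requests * 2_000, requests * 500)
print(f"${cost:.2f}")  # prints "$4.80"
```

Output tokens cost nearly three times as much as input tokens here, so workloads with long generations are priced mostly by their output side.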

Google AI Studio

✓Free tier for testing
✓Web playground access

System Requirements

Hardware needed to self-host. Min = smallest viable setup (usually heavy quantization). Max = full-precision / production-grade.

Model variant	Min	Max
Gemma 4 E2B / E4B (edge-class)	2-3 GB VRAM Q4 (runs on phones and laptops)	8-12 GB VRAM FP16
Gemma 4 26B MoE	8 GB VRAM (Q4)	32 GB VRAM FP16
Gemma 4 31B Dense (flagship)	12 GB VRAM Q4 (RTX 4070)	1× A100 40 GB FP16
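
As a rough cross-check of the table above, weight memory can be estimated as parameter count times bytes per parameter. The sketch below assumes the parameter counts implied by the variant names (26B and 31B) and a nominal 4 bits per parameter for Q4; it counts weights only and ignores the KV cache, activations, and runtime overhead. Where a weights-only estimate exceeds a figure in the table, that setup presumably relies on CPU offloading or more aggressive quantization.

```python
def weight_memory_gb(params_billions: float, bits_per_param: float) -> float:
    """Weights-only footprint in GB (1 GB = 1e9 bytes)."""
    # params_billions * 1e9 params * (bits / 8) bytes, divided by 1e9 bytes per GB
    return params_billions * bits_per_param / 8


# Assumed parameter counts, read off the variant names (not official figures).
VARIANTS = {"26B MoE": 26.0, "31B Dense": 31.0}
# Nominal bit widths; real Q4 formats often use slightly more than 4 bits.
PRECISIONS = {"FP16/BF16": 16, "Q4": 4}

for name, params in VARIANTS.items():
    for label, bits in PRECISIONS.items():
        print(f"{name} {label}: ~{weight_memory_gb(params, bits):.1f} GB")
```

Note that the MoE's 3.8B active parameters reduce compute per token, but all 26B weights typically still need to be resident in memory, so the estimate uses the full count.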

Known Issues

2026-04-18 BF16 stability refresh -- Google re-released Gemma 4 multimodal checkpoints in BF16 format focused on truthfulness, JSON / tool-call formatting, long-context extraction reliability, and loop resistance. Not a new model version; a quality refresh that fixes specific failure modes developers were hitting in production. If you pulled weights before 2026-04-18, consider re-downloading for the new checkpoints. Source: Google DeepMind Gemma page, Hugging Face · 2026-04
Gemma 4 launched April 2, 2026 with improved licensing -- earlier Gemma versions had restrictive use clauses that confused developers. Source: The Register, Hugging Face · 2026-04
Function calling support is new -- some users report inconsistent tool-use behavior compared to Llama 3 or Mistral. Source: Hugging Face discussions · 2026-04
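
Given the reports of inconsistent tool-use output, one defensive option is to validate every tool call before executing it. The sketch below is a generic pattern, not Gemma's documented format: it assumes the model emits a JSON object shaped like {"name": ..., "arguments": {...}}, and the function name, schema, and example tool are all hypothetical.

```python
import json


def parse_tool_call(raw: str, required_args: dict[str, type]) -> dict:
    """Parse a model-emitted tool call; raise ValueError if it is malformed."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    # Assumed shape: a JSON object holding an "arguments" object.
    if not isinstance(call, dict) or not isinstance(call.get("arguments"), dict):
        raise ValueError("expected an object with an 'arguments' object")
    args = call["arguments"]
    for name, expected in required_args.items():
        if name not in args:
            raise ValueError(f"missing argument: {name}")
        if not isinstance(args[name], expected):
            raise ValueError(f"argument {name!r} should be {expected.__name__}")
    return args


# Hypothetical tool schema and a well-formed call.
SCHEMA = {"city": str, "days": int}
good = '{"name": "get_forecast", "arguments": {"city": "Oslo", "days": 3}}'
print(parse_tool_call(good, SCHEMA))
```

On a ValueError, the caller can retry the request or fall back to a plain-text answer instead of passing bad arguments to a real tool.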

Best for

Developers and businesses who need a permissively licensed multimodal LLM they can self-host or fine-tune. Especially good for multilingual use cases and on-device deployment.

Not for

Non-technical users who just want to chat with an AI -- there's no consumer-facing app. Use Gemini if you want a polished chat experience.

Our Verdict

Gemma 4 is Google's answer to the open-weights race against Meta's Llama and the wave of strong Chinese open models. The Apache 2.0 license is a big deal -- it removes the legal friction that made earlier Gemma adoption awkward. The 31B Dense model is genuinely competitive with frontier closed models on benchmarks while costing $0.14/M input via API. If you're building a product on open-weights LLMs and you need multimodal + multilingual + permissive licensing, Gemma 4 is now a top choice.