Official Model Rankings

BridgeBench

The definitive vibe coding benchmark. Compare AI models on standardized coding tasks to find the best teammate for your stack.

Models

Top Score

96.1

Last Indexed

3/18/2026, 11:01:51 PM

Benchmark Results

Scores normalized to 0–100 based on vibe coding evaluations.

Rank	Model	Model ID	Overall	Algo	Debug	Refactor	Gen	UI	Sec	Comp %	Cost	Speed
1	Grok 4.20 Multi-Agent (4-Agent)	x-ai/grok-4.20-multi-agent-4agents	96.1	98.7	96.4	98.7	97.5	91.1	89.4	100.0%	–	87.8s
2	Grok 4.20 Multi-Agent (16-Agent)	x-ai/grok-4.20-multi-agent-16agents	95.9	98.6	96.2	98.8	96.8	92.3	88.1	100.0%	–	82.1s
3	GPT 5.4	openai/gpt-5.4	95.5	98.9	96.4	97.9	97.0	89.7	87.6	94.6%	–	704.4s
4	Claude Sonnet 4.6	anthropic/claude-sonnet-4.6	94.9	98.1	95.8	97.3	95.6	89.2	88.4	93.8%	–	25.4s
5	Claude Opus 4.6	anthropic/claude-opus-4.6	94.8	98.2	95.8	95.7	96.1	89.2	88.5	93.8%	–	28.3s
6	GPT-5.4 Mini	openai/gpt-5.4-mini	94.8	99.0	96.4	97.6	94.4	88.4	87.3	100.0%	–	3.4s
7	GPT-5.3 Codex	openai/gpt-5.3-codex	94.6	98.6	96.5	97.8	93.6	89.0	87.2	93.1%	–	85.8s
8	Qwen3.5 Plus 02-15	qwen/qwen3.5-plus-02-15	93.6	98.3	96.1	92.3	93.7	86.9	89.0	90.8%	–	1412.4s
9	Grok 4.20 Beta	x-ai/grok-4.20-beta	93.4	97.8	94.4	94.9	91.8	93.1	85.3	88.5%	–	59.0s
10	GPT-5.4 Nano	openai/gpt-5.4-nano	92.9	97.8	96.0	98.3	90.1	85.2	84.3	100.0%	–	4.9s
11	GPT-5.2 Codex	openai/gpt-5.2-codex	92.8	98.7	96.3	95.6	86.0	89.7	88.2	89.2%	–	390.4s
12	MiniMax M2.5	minimax/minimax-m2.5	92.3	97.3	94.8	97.3	94.3	76.6	84.6	87.7%	–	14565.8s
13	Qwen3.5 397B A17B	qwen/qwen3.5-397b-a17b	92.1	94.5	96.0	96.2	92.5	86.5	80.7	86.9%	–	15375.4s
14	Qwen 3.5 35B-A3B	qwen/qwen3.5-35b-a3b	91.7	94.7	96.0	87.3	93.5	86.0	88.1	96.2%	–	5312.8s
15	Qwen 3.5 122B-A10B	qwen/qwen3.5-122b-a10b	90.0	94.9	94.1	87.4	92.5	86.7	76.9	100.0%	–	21483.9s
16	GLM-5	z-ai/glm-5	89.5	98.6	95.8	91.0	81.4	85.6	79.8	87.7%	–	16732.8s
17	Qwen 3.5 27B	qwen/qwen3.5-27b	89.5	94.5	93.2	83.0	92.2	86.9	80.1	90.8%	–	9764.5s
18	Kimi K2.5	moonshotai/kimi-k2.5	89.1	95.9	95.7	93.1	81.7	75.1	88.6	84.6%	–	22608.1s
19	MiniMax M2.7	minimax/minimax-m2.7	88.1	95.8	94.9	90.7	89.2	61.9	83.6	96.9%	–	61.4s
20	Qwen 3.5 Flash (02-23)	qwen/qwen3.5-flash-02-23	86.9	87.0	89.5	86.5	90.8	78.6	84.2	100.0%	–	1847.3s
21	GLM-5 Turbo	z-ai/glm-5-turbo	80.2	95.3	95.9	89.7	69.5	50.9	63.5	76.9%	–	51.0s
22	Gemini 3.1 Pro Preview	google/gemini-3.1-pro-preview	75.6	87.7	92.0	80.6	63.4	59.6	59.3	52.3%	–	67.4s
23	Aurora Alpha	openrouter/aurora-alpha	57.6	69.6	74.5	53.3	49.2	41.1	47.8	21.5%	–	1669.9s

Legend: 80+ Elite, 60–79 Good, 40–59 Fair, <40 Poor
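
The legend's tiers can be applied to any score in the table with a small lookup. The sketch below is illustrative only: the function name `tier` is hypothetical, and it assumes each band's lower bound is an inclusive threshold (so a score such as 79.5, which the legend does not explicitly cover, would fall under Good). The sample scores are Overall values copied from the table.

```python
def tier(score: float) -> str:
    """Map a 0-100 score to a legend tier.

    Assumption: the lower bound of each band is an inclusive threshold,
    so fractional scores between bands (e.g. 79.5) land in the lower band.
    """
    if not 0 <= score <= 100:
        raise ValueError(f"score must be within 0-100, got {score}")
    if score >= 80:
        return "Elite"
    if score >= 60:
        return "Good"
    if score >= 40:
        return "Fair"
    return "Poor"


# Overall scores taken from the results table.
overall = {
    "Grok 4.20 Multi-Agent (4-Agent)": 96.1,
    "GLM-5 Turbo": 80.2,
    "Gemini 3.1 Pro Preview": 75.6,
    "Aurora Alpha": 57.6,
}

for model, score in overall.items():
    print(f"{model}: {score} -> {tier(score)}")
```

Under this reading, 21 of the 23 listed models have an Overall score in the Elite band, with Gemini 3.1 Pro Preview in Good and Aurora Alpha in Fair.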