Official Model Rankings
BridgeBench
The definitive vibe coding benchmark. Compare AI models on standardized coding tasks to find the best teammate for your stack.
Models: 23
Top Score: 96.1
Last Indexed: 3/18/2026, 11:01:51 PM
Benchmark Results
Scores are normalized to a 0–100 scale based on vibe coding evaluations.
| Rank | Model | Overall | Algo | Debug | Refactor | UI | Sec | Comp % |
|---|---|---|---|---|---|---|---|---|
| 1 | Grok 4.20 Multi-Agent (4-Agent) `x-ai/grok-4.20-multi-agent-4agents` | 96.1 | 98.7 | 96.4 | 98.7 | 91.1 | 89.4 | 100.0% |
| 2 | Grok 4.20 Multi-Agent (16-Agent) `x-ai/grok-4.20-multi-agent-16agents` | 95.9 | 98.6 | 96.2 | 98.8 | 92.3 | 88.1 | 100.0% |
| 3 | GPT 5.4 `openai/gpt-5.4` | 95.5 | 98.9 | 96.4 | 97.9 | 89.7 | 87.6 | 94.6% |
| 4 | Claude Sonnet 4.6 `anthropic/claude-sonnet-4.6` | 94.9 | 98.1 | 95.8 | 97.3 | 89.2 | 88.4 | 93.8% |
| 5 | Claude Opus 4.6 `anthropic/claude-opus-4.6` | 94.8 | 98.2 | 95.8 | 95.7 | 89.2 | 88.5 | 93.8% |
| 6 | GPT-5.4 Mini `openai/gpt-5.4-mini` | 94.8 | 99.0 | 96.4 | 97.6 | 88.4 | 87.3 | 100.0% |
| 7 | GPT-5.3 Codex `openai/gpt-5.3-codex` | 94.6 | 98.6 | 96.5 | 97.8 | 89.0 | 87.2 | 93.1% |
| 8 | Qwen3.5 Plus 02-15 `qwen/qwen3.5-plus-02-15` | 93.6 | 98.3 | 96.1 | 92.3 | 86.9 | 89.0 | 90.8% |
| 9 | Grok 4.20 Beta `x-ai/grok-4.20-beta` | 93.4 | 97.8 | 94.4 | 94.9 | 93.1 | 85.3 | 88.5% |
| 10 | GPT-5.4 Nano `openai/gpt-5.4-nano` | 92.9 | 97.8 | 96.0 | 98.3 | 85.2 | 84.3 | 100.0% |
| 11 | GPT-5.2 Codex `openai/gpt-5.2-codex` | 92.8 | 98.7 | 96.3 | 95.6 | 89.7 | 88.2 | 89.2% |
| 12 | MiniMax M2.5 `minimax/minimax-m2.5` | 92.3 | 97.3 | 94.8 | 97.3 | 76.6 | 84.6 | 87.7% |
| 13 | Qwen3.5 397B A17B `qwen/qwen3.5-397b-a17b` | 92.1 | 94.5 | 96.0 | 96.2 | 86.5 | 80.7 | 86.9% |
| 14 | Qwen 3.5 35B-A3B `qwen/qwen3.5-35b-a3b` | 91.7 | 94.7 | 96.0 | 87.3 | 86.0 | 88.1 | 96.2% |
| 15 | Qwen 3.5 122B-A10B `qwen/qwen3.5-122b-a10b` | 90.0 | 94.9 | 94.1 | 87.4 | 86.7 | 76.9 | 100.0% |
| 16 | GLM-5 `z-ai/glm-5` | 89.5 | 98.6 | 95.8 | 91.0 | 85.6 | 79.8 | 87.7% |
| 17 | Qwen 3.5 27B `qwen/qwen3.5-27b` | 89.5 | 94.5 | 93.2 | 83.0 | 86.9 | 80.1 | 90.8% |
| 18 | Kimi K2.5 `moonshotai/kimi-k2.5` | 89.1 | 95.9 | 95.7 | 93.1 | 75.1 | 88.6 | 84.6% |
| 19 | MiniMax M2.7 `minimax/minimax-m2.7` | 88.1 | 95.8 | 94.9 | 90.7 | 61.9 | 83.6 | 96.9% |
| 20 | Qwen 3.5 Flash (02-23) `qwen/qwen3.5-flash-02-23` | 86.9 | 87.0 | 89.5 | 86.5 | 78.6 | 84.2 | 100.0% |
| 21 | GLM-5 Turbo `z-ai/glm-5-turbo` | 80.2 | 95.3 | 95.9 | 89.7 | 50.9 | 63.5 | 76.9% |
| 22 | Gemini 3.1 Pro Preview `google/gemini-3.1-pro-preview` | 75.6 | 87.7 | 92.0 | 80.6 | 59.6 | 59.3 | 52.3% |
| 23 | Aurora Alpha `openrouter/aurora-alpha` | 57.6 | 69.6 | 74.5 | 53.3 | 41.1 | 47.8 | 21.5% |
Legend: 80+ Elite, 60–79 Good, 40–59 Fair, <40 Poor
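
The legend's bands can be applied to any score in the table with a small helper. The sketch below is illustrative only: the function name `tier` is hypothetical, and it assumes the band boundaries sit at exactly 80, 60, and 40, so that a fractional score such as 79.5 counts as Good. The page does not say how scores between the listed ranges are handled.

```python
def tier(score: float) -> str:
    """Map a 0-100 score to a legend label.

    Assumption (not stated on the page): bands are half-open,
    with lower bounds at 80, 60, and 40.
    """
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    if score >= 80:
        return "Elite"
    if score >= 60:
        return "Good"
    if score >= 40:
        return "Fair"
    return "Poor"


# Overall scores copied from three rows of the table above.
overall = {
    "Grok 4.20 Multi-Agent (4-Agent)": 96.1,
    "Gemini 3.1 Pro Preview": 75.6,
    "Aurora Alpha": 57.6,
}

for model, score in overall.items():
    print(f"{model}: {score} ({tier(score)})")
```

The same helper works on per-category columns, for example the UI score of 50.9 for GLM-5 Turbo falls in the Fair band under these assumed boundaries.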