Leaderboard Overview
See how leading AI coding models stack up across security, hallucination, speed, algorithms, debugging, refactoring, and generation. Each card shows the top 10 performers in that category.
Security
Apr 2 · 2h ago
| Rank | Model | Score |
|---|---|---|
| 1 | Claude Sonnet 4.6 | 85.3 |
| 2 | Gemini 3.1 Pro | 85.2 |
| 3 | GPT-5.4 | 84.8 |
| 4 | GPT-5.4 Mini | 83.2 |
| 5 | Qwen 3.6 Plus Preview (Free) | 82.4 |
| 6 | GPT-5.4 Nano | 81.9 |
| 7 | Claude Opus 4.6 | 81.6 |
| 8 | GLM 5.1 | 80.8 |
| 9 | Qwen3.5 Plus 2026-02-15 | 80.3 |
| 10 | Grok 4.20 Reasoning | 78.9 |
Hallucination
Apr 2 · 2h ago
| Rank | Model | Score | Fabrication % |
|---|---|---|---|
| 1 | Grok 4.20 Reasoning | 91.8 | 10.0% |
| 2 | Claude Opus 4.6 | 87.6 | 16.7% |
| 3 | GPT-5.4 | 86.1 | 16.7% |
| 4 | Qwen 3.6 Plus Preview (Free) | 79.7 | 26.5% |
| 5 | Gemini 3.1 Pro | 79.1 | 26.7% |
| 6 | Qwen3.5 Plus 2026-02-15 | 77.3 | 29.0% |
| 7 | Claude Sonnet 4.6 | 76.6 | 28.9% |
| 8 | Grok 4.20 (Non-Reasoning) | 76.1 | 29.7% |
| 9 | Gemini 3 Pro | 75.9 | 30.0% |
| 10 | Claude Haiku 4.5 | 73.0 | 34.2% |
Speed
Apr 2 · 2h ago
| Rank | Model | tok/s | TTFT |
|---|---|---|---|
| 1 | Grok 4.20 (Non-Reasoning) | 243.3 | 1999ms |
| 2 | Grok 4.20 Reasoning | 237.7 | 1497ms |
| 3 | GPT-5.4 Mini | 236.4 | 233ms |
| 4 | GPT-5.4 Nano | 227.8 | 941ms |
| 5 | GLM 5V Turbo | 221.2 | 5444ms |
| 6 | Qwen 3.6 Plus Preview (Free) | 158 | 11520ms |
| 7 | Gemini 3.1 Pro | 122.2 | 7608ms |
| 8 | Claude Sonnet 4.6 | 95.3 | 1207ms |
| 9 | Qwen3.5 Plus 2026-02-15 | 94.6 | 14952ms |
| 10 | Claude Opus 4.6 | 92.2 | 1922ms |
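The Speed card ranks by tok/s, but TTFT also shapes how long a response feels. A rough sketch of how the two columns combine, assuming TTFT means time to first token and that tok/s is the streaming rate after the first token arrives (the page does not define either, so treat both as assumptions). The `SpeedRow` class, the helper names, and the 500-token response length are illustrative, not part of the leaderboard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeedRow:
    model: str
    tokens_per_second: float  # "tok/s" column
    ttft_ms: float            # "TTFT" column, in milliseconds


def estimated_latency_s(row: SpeedRow, output_tokens: int) -> float:
    """Rough end-to-end time: wait for the first token, then stream the rest.

    Assumption: tok/s is a steady rate that applies once output starts.
    """
    return row.ttft_ms / 1000.0 + output_tokens / row.tokens_per_second


def rank_by_latency(rows, output_tokens):
    """Sort rows from fastest to slowest estimated end-to-end time."""
    return sorted(rows, key=lambda r: estimated_latency_s(r, output_tokens))


# Three rows copied from the Speed table above.
ROWS = [
    SpeedRow("Grok 4.20 (Non-Reasoning)", 243.3, 1999),
    SpeedRow("GPT-5.4 Mini", 236.4, 233),
    SpeedRow("Qwen 3.6 Plus Preview (Free)", 158.0, 11520),
]

if __name__ == "__main__":
    # Hypothetical response length of 500 output tokens.
    for row in rank_by_latency(ROWS, 500):
        print(f"{row.model}: {estimated_latency_s(row, 500):.2f} s")
```

Under these assumptions, a model with slightly lower tok/s but a much shorter TTFT can finish a short response sooner than the card's top-ranked model, so the ordering by tok/s alone is not an ordering by total wait time.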
Coming Soon
Overall
| Rank | Model | Score |
|---|---|---|
| 1 | GPT-5.4 | 95.5 |
| 2 | GPT-5.4 Mini | 94.8 |
| 3 | GPT-5.4 Nano | 92.9 |
| 4 | GPT-4.1 | 91.8 |
| 5 | Qwen 3.5 35B-A3B | 91.7 |
| 6 | Claude Sonnet 4.5 | 90.7 |
| 7 | Qwen 3.5 122B-A10B | 90.0 |
| 8 | o3-mini | 89.6 |
| 9 | Qwen 3.5 27B | 89.5 |
| 10 | Gemini 2.5 Pro | 88.9 |
Coming Soon
Algorithms
| Rank | Model | Score |
|---|---|---|
| 1 | GPT-5.4 Mini | 99.0 |
| 2 | GPT-5.4 | 98.9 |
| 3 | GPT-5.4 Nano | 97.8 |
| 4 | Qwen 3.5 122B-A10B | 94.9 |
| 5 | Qwen 3.5 35B-A3B | 94.7 |
| 6 | Qwen 3.5 27B | 94.5 |
| 7 | GPT-4.1 | 92.7 |
| 8 | o3-mini | 90.3 |
| 9 | Gemini 2.5 Pro | 89.8 |
| 10 | Claude Sonnet 4.5 | 89.6 |
Coming Soon
Debugging
| Rank | Model | Score |
|---|---|---|
| 1 | GPT-5.4 | 96.4 |
| 2 | GPT-5.4 Mini | 96.4 |
| 3 | GPT-5.4 Nano | 96.0 |
| 4 | Qwen 3.5 35B-A3B | 96.0 |
| 5 | Qwen 3.5 122B-A10B | 94.1 |
| 6 | GPT-4.1 | 93.8 |
| 7 | Qwen 3.5 27B | 93.2 |
| 8 | Claude Sonnet 4.5 | 92.5 |
| 9 | o3-mini | 91.4 |
| 10 | Gemini 2.5 Pro | 90.6 |
Coming Soon
Refactoring
| Rank | Model | Score |
|---|---|---|
| 1 | GPT-5.4 Nano | 98.3 |
| 2 | GPT-5.4 | 97.9 |
| 3 | GPT-5.4 Mini | 97.6 |
| 4 | Claude Sonnet 4.5 | 93.1 |
| 5 | GPT-4.1 | 91.9 |
| 6 | o3-mini | 89.8 |
| 7 | Gemini 2.5 Pro | 88.4 |
| 8 | Qwen 3.5 122B-A10B | 87.4 |
| 9 | Qwen 3.5 35B-A3B | 87.3 |
| 10 | Qwen 3.5 Flash (02-23) | 86.5 |
Coming Soon
Generation
| Rank | Model | Score |
|---|---|---|
| 1 | GPT-5.4 | 97.0 |
| 2 | GPT-5.4 Mini | 94.4 |
| 3 | Qwen 3.5 35B-A3B | 93.5 |
| 4 | Qwen 3.5 122B-A10B | 92.5 |
| 5 | GPT-4.1 | 92.4 |
| 6 | Qwen 3.5 27B | 92.2 |
| 7 | Qwen 3.5 Flash (02-23) | 90.8 |
| 8 | Claude Sonnet 4.5 | 90.4 |
| 9 | GPT-5.4 Nano | 90.1 |
| 10 | Gemini 2.5 Pro | 89.3 |