ARC-AGI has evolved from its first version (ARC-AGI-1), which measured basic fluid intelligence, to ARC-AGI-2, which challenges systems to demonstrate both high adaptability and high efficiency.
The scatter plot above visualizes the relationship between cost per task and performance, a key measure of intelligence efficiency. True intelligence isn't just about solving problems, but about solving them efficiently with minimal resources.
For more information on our reporting process, see our testing policy.
| AI System | Organization | System Type | ARC-AGI-1 | ARC-AGI-2 | Cost/Task | Code / Paper |
|---|---|---|---|---|---|---|
| Human Panel | Human | N/A | 98.0% | 100.0% | $17.00 | — |
| Grok 4 (Thinking) | xAI | CoT | 66.7% | 16.0% | $2.17 | 📄 |
| GPT-5 (High) | OpenAI | CoT | 65.7% | 9.9% | $0.730 | — |
| Claude Opus 4 (Thinking 16K) | Anthropic | CoT | 35.7% | 8.6% | $1.93 | 📄 |
| GPT-5 (Medium) | OpenAI | CoT | 56.2% | 7.5% | $0.449 | — |
| o3 (High) | OpenAI | CoT | 60.8% | 6.5% | $0.834 | — |
| o4-mini (High) | OpenAI | CoT | 58.7% | 6.1% | $0.856 | — |
| Claude Sonnet 4 (Thinking 16K) | Anthropic | CoT | 40.0% | 5.9% | $0.486 | 📄 |
| o3-Pro (High) | OpenAI | CoT + Synthesis | 59.3% | 4.9% | $7.55 | 📄 |
| Gemini 2.5 Pro (Thinking 32K) | Google | CoT | 37.0% | 4.9% | $0.757 | 📄 |
| Claude Opus 4 (Thinking 8K) | Anthropic | CoT | 30.7% | 4.5% | $1.16 | 📄 |
| GPT-5 Mini (High) | OpenAI | CoT | 54.3% | 4.4% | $0.198 | — |
| Gemini 2.5 Pro (Thinking 16K) | Google | CoT | 41.0% | 4.0% | $0.715 | 📄 |
| GPT-5 Mini (Medium) | OpenAI | CoT | 37.3% | 4.0% | $0.063 | — |
| o3-preview (Low)* | OpenAI | CoT + Synthesis | 75.7% | 4.0% | $200.00 | 📄 |
| Gemini 2.5 Pro (Preview) | Google | CoT | 33.0% | 3.8% | $0.813 | 📄 |
| Gemini 2.5 Pro (Preview, Thinking 1K) | Google | CoT | 31.3% | 3.4% | $0.804 | 📄 |
| o3-mini (High) | OpenAI | CoT | 34.5% | 3.0% | $0.547 | — |
| o3 (Medium) | OpenAI | CoT | 53.8% | 3.0% | $0.479 | — |
| Gemini 2.5 Pro (Thinking 8K) | Google | CoT | 29.5% | 2.9% | $0.444 | 📄 |
| GPT-5 Nano (High) | OpenAI | CoT | 16.7% | 2.6% | $0.029 | — |
| Gemini 2.5 Flash (Preview) (Thinking 24K) | Google | CoT | 32.3% | 2.5% | $0.319 | 📄💻 |
| ARChitects | ARC Prize 2024 | Custom | 56.0% | 2.5% | $0.200 | 📄💻 |
| o4-mini (Medium) | OpenAI | CoT | 41.8% | 2.4% | $0.231 | — |
| Gemini 2.5 Flash (Preview) (Thinking 1K) | Google | CoT | 16.0% | 2.2% | $0.030 | 📄💻 |
| Gemini 2.5 Flash (Preview) (Thinking 8K) | Google | CoT | 25.8% | 2.1% | $0.199 | 📄💻 |
| Claude Sonnet 4 (Thinking 8K) | Anthropic | CoT | 29.0% | 2.1% | $0.265 | 📄 |
| o3-mini (Medium) | OpenAI | CoT | 22.3% | 2.1% | $0.284 | — |
| o3-Pro (Low) | OpenAI | CoT + Synthesis | 44.3% | 2.1% | $2.23 | 📄 |
| o3 (Low) | OpenAI | CoT | 41.5% | 2.0% | $0.234 | — |
| Gemini 2.5 Flash (Preview) (Thinking 16K) | Google | CoT | 33.3% | 2.0% | $0.317 | 📄💻 |
| o3-Pro (Medium) | OpenAI | CoT + Synthesis | 57.0% | 1.9% | $4.74 | 📄 |
| GPT-5 (Low) | OpenAI | CoT | 44.0% | 1.9% | $0.190 | — |
| Gemini 2.5 Flash (Preview) | Google | CoT | 33.3% | 1.7% | $0.057 | 📄💻 |
| o4-mini (Low) | OpenAI | CoT | 21.3% | 1.7% | $0.050 | — |
| GPT-5 Mini (Minimal) | OpenAI | CoT | 5.3% | 1.7% | $0.009 | — |
| Icecuber | ARC Prize 2024 | Custom | 17.0% | 1.6% | $0.130 | 💻 |
| Gemini 2.0 Flash | Google | Base LLM | N/A | 1.3% | $0.004 | 💻 |
| Deepseek R1 | Deepseek | CoT | 15.8% | 1.3% | $0.080 | 💻 |
| Codex Mini (Latest) | OpenAI | CoT | 27.3% | 1.3% | $0.230 | 📄 |
| Claude Sonnet 4 | Anthropic | Base LLM | 23.8% | 1.3% | $0.127 | 📄 |
| Claude Opus 4 | Anthropic | CoT | 22.5% | 1.3% | $0.639 | 📄 |
| o1 (Medium) | OpenAI | CoT | 30.7% | 1.3% | $2.61 | 📄💻 |
| Qwen3-235b-a22b Instruct (25/07) | Alibaba | Base LLM | 11.0% | 1.3% | $0.004 | 📄 |
| Deepseek R1 (05/28) | Deepseek | CoT | 21.2% | 1.1% | $0.053 | 📄 |
| o1-pro (Low) | OpenAI | CoT + Synthesis | 23.3% | 0.9% | $13.95 | 📄💻 |
| Claude 3.7 (8K) | Anthropic | CoT | 21.2% | 0.9% | $0.360 | 💻 |
| GPT-5 Nano (Medium) | OpenAI | CoT | 20.7% | 0.9% | $0.014 | — |
| Claude Sonnet 4 (Thinking 1K) | Anthropic | CoT | 28.0% | 0.9% | $0.142 | 📄 |
| o1-mini | OpenAI | CoT | 14.0% | 0.8% | $0.191 | 📄💻 |
| o1 (Low) | OpenAI | CoT | 27.2% | 0.8% | $1.47 | 📄💻 |
| GPT-5 Mini (Low) | OpenAI | CoT | 26.3% | 0.8% | $0.019 | — |
| Gemini 1.5 Pro | Google | Base LLM | N/A | 0.8% | $0.040 | 💻 |
| GPT-4.5 | OpenAI | Base LLM | 10.3% | 0.8% | $2.10 | 💻 |
| Claude 3.7 (16K) | Anthropic | CoT | 28.6% | 0.7% | $0.510 | 💻 |
| GPT-4.1 | OpenAI | Base LLM | 5.5% | 0.4% | $0.069 | 📄💻 |
| Grok 3 Mini (Low) | xAI | CoT | 16.5% | 0.4% | $0.013 | 📄 |
| Claude 3.7 (1K) | Anthropic | CoT | 11.6% | 0.4% | $0.140 | 💻 |
| Claude 3.7 | Anthropic | Base LLM | 13.6% | 0.0% | $0.120 | 💻 |
| GPT-4o | OpenAI | Base LLM | 4.5% | 0.0% | $0.080 | 💻 |
| GPT-4o-mini | OpenAI | Base LLM | N/A | 0.0% | $0.010 | 💻 |
| Avg. Mturker | Human | N/A | 77.0% | N/A | $3.00 | — |
| Stem Grad | Human | N/A | 98.0% | N/A | $10.00 | — |
| Llama 4 Maverick | Meta | Base LLM | 4.4% | 0.0% | $0.012 | 📄💻 |
| Llama 4 Scout | Meta | Base LLM | 0.5% | 0.0% | $0.006 | 📄💻 |
| GPT-4.1-Nano | OpenAI | Base LLM | 0.0% | 0.0% | $0.004 | 📄💻 |
| GPT-4.1-Mini | OpenAI | Base LLM | 3.5% | 0.0% | $0.014 | 📄💻 |
| o3-mini (Low) | OpenAI | CoT | 14.5% | 0.0% | $0.062 | — |
| o1-preview | OpenAI | CoT | 18.0% | N/A | $1.64 | 📄 |
| Claude Opus 4 (Thinking 1K) | Anthropic | CoT | 27.0% | 0.0% | $0.750 | 📄 |
| Grok 3 | xAI | Base LLM | 5.5% | 0.0% | $0.142 | 📄 |
| Magistral Small | Mistral | CoT | 5.0% | 0.0% | $0.049 | 📄 |
| Magistral Medium | Mistral | CoT | 5.9% | 0.0% | $0.108 | 📄 |
| Magistral Medium (Thinking) | Mistral | CoT | 6.1% | 0.0% | $0.123 | 📄 |
| Gemini 2.5 Pro (Thinking 1K) | Google | CoT | 16.0% | 0.0% | $0.088 | 📄 |
| GPT-5 (Minimal) | OpenAI | Base LLM | 6.0% | 0.0% | $0.056 | — |
| GPT-5 Nano (Low) | OpenAI | CoT | 4.0% | 0.0% | $0.003 | — |
| GPT-5 Nano (Minimal) | OpenAI | CoT | 1.5% | 0.0% | $0.003 | — |
* ARC-AGI-2 score estimate based on partial testing results and o1-pro pricing.
** Preview results: Results marked as preview are unofficial and may be based on incomplete testing. Models without available pricing information will not be shown on the efficiency chart. Results become official after complete testing is finished.
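
One way to read the cost/performance trade-off described above is to look for the systems that no cheaper system outscores, i.e. a Pareto frontier over cost per task and ARC-AGI-2 score. The sketch below is illustrative only: the `Entry` type and the `pareto_frontier` helper are hypothetical names, not part of any ARC Prize tooling, and the data is a hand-picked subset of rows copied from the table, so the resulting frontier applies to that subset rather than to the full leaderboard.

```python
from typing import List, NamedTuple


class Entry(NamedTuple):
    """One leaderboard row (hypothetical structure for this sketch)."""
    system: str
    arc_agi_2: float      # ARC-AGI-2 score, in percent
    cost_per_task: float  # cost per task, in USD


# A subset of rows copied from the table above (not the full leaderboard).
ENTRIES = [
    Entry("Grok 4 (Thinking)", 16.0, 2.17),
    Entry("GPT-5 (High)", 9.9, 0.730),
    Entry("Claude Opus 4 (Thinking 16K)", 8.6, 1.93),
    Entry("GPT-5 (Medium)", 7.5, 0.449),
    Entry("o3 (High)", 6.5, 0.834),
    Entry("o3-Pro (High)", 4.9, 7.55),
    Entry("GPT-5 Mini (High)", 4.4, 0.198),
    Entry("GPT-5 Mini (Medium)", 4.0, 0.063),
    Entry("GPT-5 Nano (High)", 2.6, 0.029),
]


def pareto_frontier(entries: List[Entry]) -> List[Entry]:
    """Keep entries whose score beats every entry that costs the same or less."""
    frontier = []
    best_score = float("-inf")
    # Walk from cheapest to most expensive; on equal cost, higher score first.
    for entry in sorted(entries, key=lambda e: (e.cost_per_task, -e.arc_agi_2)):
        if entry.arc_agi_2 > best_score:
            frontier.append(entry)
            best_score = entry.arc_agi_2
    return frontier


if __name__ == "__main__":
    for entry in pareto_frontier(ENTRIES):
        print(f"{entry.system}: {entry.arc_agi_2}% at ${entry.cost_per_task}/task")
```

Within this subset, o3 (High), Claude Opus 4 (Thinking 16K), and o3-Pro (High) fall off the frontier because a cheaper listed system already scores higher on ARC-AGI-2. Sorting by cost first keeps the scan to a single pass; a different definition of efficiency, such as score per dollar, would rank the same rows differently.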