AGI remains unsolved.
New ideas still needed.

ARC-AGI Leaderboard

[Figure: scatter plot of SCORE (%) from 0% to 100% against COST PER TASK ($) on a logarithmic axis from $0.001 to $1K. Each point is one of the systems listed in the Leaderboard Breakdown below; an "ARC Prize - Grand Prize" marker is also shown.]
Only systems that required less than $10,000 to run are shown. Notably missing from this chart is o3 (high compute).
For models that were not able to produce full test outputs, the remaining tasks were marked as incorrect.
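The rule for incomplete outputs can be illustrated with a minimal sketch. The function name `score_percent`, the use of `None` for a missing output, and the exact-match comparison are all assumptions made for illustration; the official scoring harness may differ (for example, in how many attempts per task it allows).

```python
from typing import Optional, Sequence

# A grid is a list of rows, each row a list of integer color codes.
Grid = list[list[int]]


def score_percent(
    predictions: Sequence[Optional[Grid]],
    solutions: Sequence[Grid],
) -> float:
    """Return the percentage of tasks solved.

    Hypothetical helper: a prediction of None stands for a task where the
    model produced no usable output, and it is counted as incorrect.
    """
    if len(predictions) != len(solutions):
        raise ValueError("predictions and solutions must have the same length")
    if not solutions:
        return 0.0
    correct = sum(
        1
        for predicted, expected in zip(predictions, solutions)
        if predicted is not None and predicted == expected
    )
    return 100.0 * correct / len(solutions)
```

Under this sketch, a model that answers two of four tasks correctly, skips one, and gets one wrong scores 50%: the skipped task lowers the score exactly as a wrong answer would.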

Understanding the Leaderboard

ARC-AGI has evolved from its first version (ARC-AGI-1), which measured basic fluid intelligence, to ARC-AGI-2, which challenges systems to demonstrate both high adaptability and high efficiency.

The scatter plot above shows the relationship between cost per task and performance, a key measure of intelligence efficiency. True intelligence isn't just about solving problems, but about solving them efficiently, with minimal resources.
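One way to read the cost/performance trade-off is to look for systems that no other system beats on both axes at once (a Pareto frontier). The sketch below is an illustration only, not part of the leaderboard's methodology: it uses a handful of ARC-AGI-2 rows copied from the Leaderboard Breakdown, and the names `Entry` and `pareto_frontier` are hypothetical.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    name: str
    score: float  # ARC-AGI-2 score in percent
    cost: float   # cost per task in dollars


# A small sample of rows from the Leaderboard Breakdown table.
SAMPLE = [
    Entry("Grok 4 (Thinking)", 16.0, 2.17),
    Entry("GPT-5 (High)", 9.9, 0.730),
    Entry("Claude Opus 4 (Thinking 16K)", 8.6, 1.93),
    Entry("GPT-5 (Medium)", 7.5, 0.449),
    Entry("o3 (High)", 6.5, 0.834),
    Entry("GPT-5 Mini (High)", 4.4, 0.198),
    Entry("GPT-5 Mini (Medium)", 4.0, 0.063),
    Entry("GPT-5 Nano (High)", 2.6, 0.029),
]


def pareto_frontier(entries: list[Entry]) -> list[Entry]:
    """Keep entries that are not beaten on both cost and score.

    Walk the entries from cheapest to most expensive and keep one only if
    it scores strictly higher than everything cheaper than it.
    """
    frontier: list[Entry] = []
    best_score = float("-inf")
    # On equal cost, consider the higher score first.
    for entry in sorted(entries, key=lambda e: (e.cost, -e.score)):
        if entry.score > best_score:
            frontier.append(entry)
            best_score = entry.score
    return frontier


if __name__ == "__main__":
    for entry in pareto_frontier(SAMPLE):
        print(f"{entry.name}: {entry.score}% at ${entry.cost}/task")
```

Within this sample, o3 (High) and Claude Opus 4 (Thinking 16K) fall off the frontier because GPT-5 (High) scores higher at a lower cost per task. The result describes only the sampled rows, not the full leaderboard.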

Interpreting the data

For more information on our reporting process, see our testing policy.

Leaderboard Breakdown

| AI System | Organization | System Type | ARC-AGI-1 | ARC-AGI-2 | Cost/Task | Code / Paper |
|---|---|---|---|---|---|---|
| Human Panel | Human | N/A | 98.0% | 100.0% | $17.00 | |
| Grok 4 (Thinking) | xAI | CoT | 66.7% | 16.0% | $2.17 | 📄 |
| GPT-5 (High) | OpenAI | CoT | 65.7% | 9.9% | $0.730 | |
| Claude Opus 4 (Thinking 16K) | Anthropic | CoT | 35.7% | 8.6% | $1.93 | 📄 |
| GPT-5 (Medium) | OpenAI | CoT | 56.2% | 7.5% | $0.449 | |
| o3 (High) | OpenAI | CoT | 60.8% | 6.5% | $0.834 | |
| o4-mini (High) | OpenAI | CoT | 58.7% | 6.1% | $0.856 | |
| Claude Sonnet 4 (Thinking 16K) | Anthropic | CoT | 40.0% | 5.9% | $0.486 | 📄 |
| o3-Pro (High) | OpenAI | CoT + Synthesis | 59.3% | 4.9% | $7.55 | 📄 |
| Gemini 2.5 Pro (Thinking 32K) | Google | CoT | 37.0% | 4.9% | $0.757 | 📄 |
| Claude Opus 4 (Thinking 8K) | Anthropic | CoT | 30.7% | 4.5% | $1.16 | 📄 |
| GPT-5 Mini (High) | OpenAI | CoT | 54.3% | 4.4% | $0.198 | |
| Gemini 2.5 Pro (Thinking 16K) | Google | CoT | 41.0% | 4.0% | $0.715 | 📄 |
| GPT-5 Mini (Medium) | OpenAI | CoT | 37.3% | 4.0% | $0.063 | |
| o3-preview (Low)* | OpenAI | CoT + Synthesis | 75.7% | 4.0% | $200.00 | 📄 |
| Gemini 2.5 Pro (Preview) | Google | CoT | 33.0% | 3.8% | $0.813 | 📄 |
| Gemini 2.5 Pro (Preview, Thinking 1K) | Google | CoT | 31.3% | 3.4% | $0.804 | 📄 |
| o3-mini (High) | OpenAI | CoT | 34.5% | 3.0% | $0.547 | |
| o3 (Medium) | OpenAI | CoT | 53.8% | 3.0% | $0.479 | |
| Gemini 2.5 Pro (Thinking 8K) | Google | CoT | 29.5% | 2.9% | $0.444 | 📄 |
| GPT-5 Nano (High) | OpenAI | CoT | 16.7% | 2.6% | $0.029 | |
| Gemini 2.5 Flash (Preview) (Thinking 24K) | Google | CoT | 32.3% | 2.5% | $0.319 | 📄💻 |
| ARChitects | ARC Prize 2024 | Custom | 56.0% | 2.5% | $0.200 | 📄💻 |
| o4-mini (Medium) | OpenAI | CoT | 41.8% | 2.4% | $0.231 | |
| Gemini 2.5 Flash (Preview) (Thinking 1K) | Google | CoT | 16.0% | 2.2% | $0.030 | 📄💻 |
| Gemini 2.5 Flash (Preview) (Thinking 8K) | Google | CoT | 25.8% | 2.1% | $0.199 | 📄💻 |
| Claude Sonnet 4 (Thinking 8K) | Anthropic | CoT | 29.0% | 2.1% | $0.265 | 📄 |
| o3-mini (Medium) | OpenAI | CoT | 22.3% | 2.1% | $0.284 | |
| o3-Pro (Low) | OpenAI | CoT + Synthesis | 44.3% | 2.1% | $2.23 | 📄 |
| o3 (Low) | OpenAI | CoT | 41.5% | 2.0% | $0.234 | |
| Gemini 2.5 Flash (Preview) (Thinking 16K) | Google | CoT | 33.3% | 2.0% | $0.317 | 📄💻 |
| o3-Pro (Medium) | OpenAI | CoT + Synthesis | 57.0% | 1.9% | $4.74 | 📄 |
| GPT-5 (Low) | OpenAI | CoT | 44.0% | 1.9% | $0.190 | |
| Gemini 2.5 Flash (Preview) | Google | CoT | 33.3% | 1.7% | $0.057 | 📄💻 |
| o4-mini (Low) | OpenAI | CoT | 21.3% | 1.7% | $0.050 | |
| GPT-5 Mini (Minimal) | OpenAI | CoT | 5.3% | 1.7% | $0.009 | |
| Icecuber | ARC Prize 2024 | Custom | 17.0% | 1.6% | $0.130 | 💻 |
| Gemini 2.0 Flash | Google | Base LLM | N/A | 1.3% | $0.004 | 💻 |
| Deepseek R1 | Deepseek | CoT | 15.8% | 1.3% | $0.080 | 💻 |
| Codex Mini (Latest) | OpenAI | CoT | 27.3% | 1.3% | $0.230 | 📄 |
| Claude Sonnet 4 | Anthropic | Base LLM | 23.8% | 1.3% | $0.127 | 📄 |
| Claude Opus 4 | Anthropic | CoT | 22.5% | 1.3% | $0.639 | 📄 |
| o1 (Medium) | OpenAI | CoT | 30.7% | 1.3% | $2.61 | 📄💻 |
| Qwen3-235b-a22b Instruct (25/07) | Alibaba | Base LLM | 11.0% | 1.3% | $0.004 | 📄 |
| Deepseek R1 (05/28) | Deepseek | CoT | 21.2% | 1.1% | $0.053 | 📄 |
| o1-pro (Low) | OpenAI | CoT + Synthesis | 23.3% | 0.9% | $13.95 | 📄💻 |
| Claude 3.7 (8K) | Anthropic | CoT | 21.2% | 0.9% | $0.360 | 💻 |
| GPT-5 Nano (Medium) | OpenAI | CoT | 20.7% | 0.9% | $0.014 | |
| Claude Sonnet 4 (Thinking 1K) | Anthropic | CoT | 28.0% | 0.9% | $0.142 | 📄 |
| o1-mini | OpenAI | CoT | 14.0% | 0.8% | $0.191 | 📄💻 |
| o1 (Low) | OpenAI | CoT | 27.2% | 0.8% | $1.47 | 📄💻 |
| GPT-5 Mini (Low) | OpenAI | CoT | 26.3% | 0.8% | $0.019 | |
| Gemini 1.5 Pro | Google | Base LLM | N/A | 0.8% | $0.040 | 💻 |
| GPT-4.5 | OpenAI | Base LLM | 10.3% | 0.8% | $2.10 | 💻 |
| Claude 3.7 (16K) | Anthropic | CoT | 28.6% | 0.7% | $0.510 | 💻 |
| GPT-4.1 | OpenAI | Base LLM | 5.5% | 0.4% | $0.069 | 📄💻 |
| Grok 3 Mini (Low) | xAI | CoT | 16.5% | 0.4% | $0.013 | 📄 |
| Claude 3.7 (1K) | Anthropic | CoT | 11.6% | 0.4% | $0.140 | 💻 |
| Claude 3.7 | Anthropic | Base LLM | 13.6% | 0.0% | $0.120 | 💻 |
| GPT-4o | OpenAI | Base LLM | 4.5% | 0.0% | $0.080 | 💻 |
| GPT-4o-mini | OpenAI | Base LLM | N/A | 0.0% | $0.010 | 💻 |
| Avg. Mturker | Human | N/A | 77.0% | N/A | $3.00 | |
| Stem Grad | Human | N/A | 98.0% | N/A | $10.00 | |
| Llama 4 Maverick | Meta | Base LLM | 4.4% | 0.0% | $0.012 | 📄💻 |
| Llama 4 Scout | Meta | Base LLM | 0.5% | 0.0% | $0.006 | 📄💻 |
| GPT-4.1-Nano | OpenAI | Base LLM | 0.0% | 0.0% | $0.004 | 📄💻 |
| GPT-4.1-Mini | OpenAI | Base LLM | 3.5% | 0.0% | $0.014 | 📄💻 |
| o3-mini (Low) | OpenAI | CoT | 14.5% | 0.0% | $0.062 | |
| o1-preview | OpenAI | CoT | 18.0% | N/A | $1.64 | 📄 |
| Claude Opus 4 (Thinking 1K) | Anthropic | CoT | 27.0% | 0.0% | $0.750 | 📄 |
| Grok 3 | xAI | Base LLM | 5.5% | 0.0% | $0.142 | 📄 |
| Magistral Small | Mistral | CoT | 5.0% | 0.0% | $0.049 | 📄 |
| Magistral Medium | Mistral | CoT | 5.9% | 0.0% | $0.108 | 📄 |
| Magistral Medium (Thinking) | Mistral | CoT | 6.1% | 0.0% | $0.123 | 📄 |
| Gemini 2.5 Pro (Thinking 1K) | Google | CoT | 16.0% | 0.0% | $0.088 | 📄 |
| GPT-5 (Minimal) | OpenAI | Base LLM | 6.0% | 0.0% | $0.056 | |
| GPT-5 Nano (Low) | OpenAI | CoT | 4.0% | 0.0% | $0.003 | |
| GPT-5 Nano (Minimal) | OpenAI | CoT | 1.5% | 0.0% | $0.003 | |

* ARC-AGI-2 score estimate based on partial testing results and o1-pro pricing.

* * Preview results: Results marked as preview are unofficial and may be based on incomplete testing. Models without available pricing information will not be shown on the efficiency chart. Results become official after complete testing is finished.
