WebDev Arena Leaderboard
WebDev Arena is an open-source benchmark evaluating AI capabilities in web development, developed by LMArena.
Leaderboard
| Organization | Arena Score | License | 95% CI | Votes |
|---|---|---|---|---|
| Anthropic | 1252.97 | Proprietary | +5.67 / -5.83 | 15,727 |
| DeepSeek | 1210.88 | MIT | +9.32 / -11.26 | 3,539 |
| OpenAI | 1161.38 | Proprietary | +32.58 / -35.18 | 342 |
| Anthropic | 1139.00 | Proprietary | +5.66 / -6.27 | 10,172 |
| OpenAI | 1110.15 | Proprietary | +8.54 / -9.42 | 3,310 |
| | 1109.54 | Proprietary | +8.61 / -10.75 | 4,543 |
| OpenAI | 1053.81 | Proprietary | +7.09 / -6.05 | 8,376 |
| OpenAI | 1053.69 | Proprietary | +5.48 / -6.47 | 12,871 |
| | 1038.53 | Proprietary | +16.84 / -17.27 | 1,064 |
| | 1026.87 | Proprietary | +6.85 / -5.82 | 8,010 |
| | 1025.29 | Proprietary | +4.69 / -5.56 | 12,099 |
| | 986.97 | Proprietary | +6.83 / -5.59 | 14,482 |
| Alibaba | 981.29 | Proprietary | +11.67 / -12.81 | 2,702 |
| DeepSeek | 966.28 | DeepSeek | +7.92 / -5.88 | 7,478 |
| OpenAI | 964.00 | Proprietary | +4.91 / -5.70 | 13,838 |
| Alibaba | 904.14 | Apache 2.0 | +6.04 / -5.58 | 12,497 |
| | 894.65 | Proprietary | +7.08 / -5.73 | 12,155 |
| | 813.69 | Llama 3.1 | +19.07 / -14.97 | 1,117 |
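Arena Scores are Bradley-Terry ratings reported on an Elo-like scale, so a gap between two scores maps to an expected head-to-head win probability. The sketch below illustrates that mapping, assuming the usual base-10, 400-point logistic convention that LMArena uses for its chat leaderboards; treat it as illustrative, not an official calculation.

```python
# Minimal sketch: convert an Arena Score gap into an expected win
# probability. ASSUMPTION: WebDev Arena follows the standard
# Bradley-Terry / Elo logistic with base 10 and a 400-point scale.

def expected_win_prob(score_a: float, score_b: float) -> float:
    """P(model A beats model B) under the base-10, scale-400 logistic."""
    return 1.0 / (1.0 + 10 ** ((score_b - score_a) / 400))

# Example: the top entry (1252.97) vs. the runner-up (1210.88).
print(f"{expected_win_prob(1252.97, 1210.88):.3f}")  # ~0.560
```

Under this convention a roughly 42-point lead, as between the top two entries, corresponds to winning about 56% of non-tied battles.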
More Statistics for WebDev Arena (Overall)
Figure 1: Confidence Interval for Model Strength
Figure 2: Average Win Rate Against All Other Models (Assuming Uniform Sampling and No Ties)
Figure 3: Fraction of Model A Wins for All Non-tied A vs. B Battles
Figure 4: Battle Count for Each Combination of Models (without Ties)
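The quantity in Figure 2 can be approximated directly from the scores: for each model, average its expected win probability against every other model, weighting all pairings equally and ignoring ties. A sketch, reusing the Arena Scores from the table above and the same assumed base-10 / 400-point logistic:

```python
# Sketch of the Figure 2 quantity: each model's average win rate
# against all other models, assuming uniform sampling over pairings
# and no ties. Scores are the Arena Scores from the leaderboard table;
# the logistic convention is an assumption, as noted earlier.

scores = [1252.97, 1210.88, 1161.38, 1139.00, 1110.15, 1109.54,
          1053.81, 1053.69, 1038.53, 1026.87, 1025.29, 986.97,
          981.29, 966.28, 964.00, 904.14, 894.65, 813.69]

def win_prob(a: float, b: float) -> float:
    return 1.0 / (1.0 + 10 ** ((b - a) / 400))

for i, s in enumerate(scores):
    # Average over all opponents except the model itself.
    avg = sum(win_prob(s, t) for j, t in enumerate(scores) if j != i)
    avg /= len(scores) - 1
    print(f"{s:8.2f} -> average win rate {avg:.3f}")
```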