WebDev Arena Leaderboard

WebDev Arena is an open-source benchmark, developed by LMArena, that evaluates AI capabilities in web development.

Leaderboard

Rank (UB)	Model	Arena Score	95% CI	Votes	Organization	License
1	Claude 3.5 Sonnet (20241022)	1252.97	+5.67 / -5.83	15,727	Anthropic	Proprietary
2	DeepSeek-R1	1210.88	+9.32 / -11.26	3,539	DeepSeek	MIT
3	o3-mini-high (20250131)	1161.38	+32.58 / -35.18	342	OpenAI	Proprietary
3	Claude 3.5 Haiku (20241022)	1139.00	+5.66 / -6.27	10,172	Anthropic	Proprietary
5	o3-mini (20250131)	1110.15	+8.54 / -9.42	3,310	OpenAI	Proprietary
5	Gemini-2.0-Pro-Exp-02-05	1109.54	+8.61 / -10.75	4,543	Google	Proprietary
7	o1 (20241217)	1053.81	+7.09 / -6.05	8,376	OpenAI	Proprietary
7	o1-mini (20240912)	1053.69	+5.48 / -6.47	12,871	OpenAI	Proprietary
7	Gemini-2.0-Flash-Thinking-01-21	1038.53	+16.84 / -17.27	1,064	Google	Proprietary
9	Gemini-2.0-Flash-Thinking-1219	1026.87	+6.85 / -5.82	8,010	Google	Proprietary
9	Gemini-Exp-1206	1025.29	+4.69 / -5.56	12,099	Google	Proprietary
12	Gemini-2.0-Flash-Exp	986.97	+6.83 / -5.59	14,482	Google	Proprietary
12	Qwen2.5-Max	981.29	+11.67 / -12.81	2,702	Alibaba	Proprietary
13	DeepSeek-V3	966.28	+7.92 / -5.88	7,478	DeepSeek	DeepSeek
13	GPT-4o-2024-11-20	964.00	+4.91 / -5.70	13,838	OpenAI	Proprietary
16	Qwen2.5-Coder-32B-Instruct	904.14	+6.04 / -5.58	12,497	Alibaba	Apache 2.0
16	Gemini-1.5-Pro-002	894.65	+7.08 / -5.73	12,155	Google	Proprietary
18	Llama-3.1-405B-Instruct	813.69	+19.07 / -14.97	1,117	Meta	Llama 3.1
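
The Rank (UB) column is not a plain sort by Arena Score, which is why several models share a rank. The values in the table are consistent with the following rule: a model's rank is one plus the number of models whose lower confidence bound lies above that model's upper confidence bound. The sketch below applies that rule to the rows above. The function name `rank_ub` and the tuple layout are illustrative assumptions, not part of any LMArena code.

```python
# Rows copied from the leaderboard table:
# (model, arena_score, ci_plus, ci_minus)
ROWS = [
    ("Claude 3.5 Sonnet (20241022)", 1252.97, 5.67, 5.83),
    ("DeepSeek-R1", 1210.88, 9.32, 11.26),
    ("o3-mini-high (20250131)", 1161.38, 32.58, 35.18),
    ("Claude 3.5 Haiku (20241022)", 1139.00, 5.66, 6.27),
    ("o3-mini (20250131)", 1110.15, 8.54, 9.42),
    ("Gemini-2.0-Pro-Exp-02-05", 1109.54, 8.61, 10.75),
    ("o1 (20241217)", 1053.81, 7.09, 6.05),
    ("o1-mini (20240912)", 1053.69, 5.48, 6.47),
    ("Gemini-2.0-Flash-Thinking-01-21", 1038.53, 16.84, 17.27),
    ("Gemini-2.0-Flash-Thinking-1219", 1026.87, 6.85, 5.82),
    ("Gemini-Exp-1206", 1025.29, 4.69, 5.56),
    ("Gemini-2.0-Flash-Exp", 986.97, 6.83, 5.59),
    ("Qwen2.5-Max", 981.29, 11.67, 12.81),
    ("DeepSeek-V3", 966.28, 7.92, 5.88),
    ("GPT-4o-2024-11-20", 964.00, 4.91, 5.70),
    ("Qwen2.5-Coder-32B-Instruct", 904.14, 6.04, 5.58),
    ("Gemini-1.5-Pro-002", 894.65, 7.08, 5.73),
    ("Llama-3.1-405B-Instruct", 813.69, 19.07, 14.97),
]


def rank_ub(rows):
    """Hypothetical helper: rank = 1 + number of models whose lower
    95% bound is strictly above this model's upper 95% bound."""
    bounds = {
        name: (score - minus, score + plus)  # (lower, upper)
        for name, score, plus, minus in rows
    }
    ranks = {}
    for name, (_, upper) in bounds.items():
        clearly_better = sum(
            1
            for other, (lower, _) in bounds.items()
            if other != name and lower > upper
        )
        ranks[name] = 1 + clearly_better
    return ranks


if __name__ == "__main__":
    for name, rank in rank_ub(ROWS).items():
        print(rank, name)
```

Under this rule, a model with few votes and a wide interval, such as o3-mini-high, can share a rank with a lower-scoring model because its interval overlaps more neighbours.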

More Statistics for WebDev Arena (Overall)

Confidence Interval for Model Strength

Figure 1

Average Win Rate Against All Other Models (Assuming Uniform Sampling and No Ties)

Figure 2
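
As a rough illustration of how an average win rate can be derived from scores, the sketch below converts score differences into predicted win probabilities and averages them with equal weight per opponent, with ties ignored. It assumes the Arena Score is on an Elo-style scale (base 10, divisor 400); this page does not state the scale, so treat that as an assumption. The function names are hypothetical, and only three models from the table are included, so the printed values will not match Figure 2.

```python
def win_probability(score_a, score_b, scale=400.0):
    """Predicted probability that A beats B, assuming an Elo-style
    logistic curve (base 10, divisor `scale`). The scale is an assumption."""
    return 1.0 / (1.0 + 10 ** ((score_b - score_a) / scale))


def average_win_rate(scores):
    """Mean predicted win probability of each model against every other
    model, weighting all opponents equally (uniform sampling, no ties)."""
    result = {}
    for name, score in scores.items():
        probs = [
            win_probability(score, other_score)
            for other, other_score in scores.items()
            if other != name
        ]
        result[name] = sum(probs) / len(probs)
    return result


# Subset of Arena Scores copied from the leaderboard table.
SCORES = {
    "Claude 3.5 Sonnet (20241022)": 1252.97,
    "DeepSeek-R1": 1210.88,
    "Llama-3.1-405B-Instruct": 813.69,
}

if __name__ == "__main__":
    for name, rate in average_win_rate(SCORES).items():
        print(f"{name}: {rate:.3f}")
```

Equal weighting matters here: the vote counts in the table differ by more than an order of magnitude, so an average over the battles actually played would be dominated by the most frequently sampled pairings.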

Fraction of Model A Wins for All Non-tied A vs. B Battles

Figure 3

Battle Count for Each Combination of Models (without Ties)

Figure 4
