lmarena.ai

1,855 posts
@lmarena_ai
LMArena: Open Platform for Community-driven AI Benchmarking. Graduated from UC Berkeley / . We’re hiring: lmarena.ai/jobs
US · lmarena.ai · Joined March 2023

lmarena.ai’s posts

Pinned
The NEW LMArena is officially live! 🎉 ✨ New Logo! ⚡️ Better, faster UI/UX for chat and leaderboard 📱 Mobile optimized 💬 Chat history 🧭 Clearer leaderboard navigation 🤖 Many modalities in one place: vision, image, and more coming soon Try it now at lmarena dot ai! (Link in
✨ NEW: Feature update: 🖼️ Image edit models now support multi-turn editing! Instead of trying to fit every edit into one mega-prompt, you can now refine your image step by step. Like a natural back-and-forth conversation. Do it in Battle mode, Side by Side or Direct. Just
Create and iterate image generations with the community's favorite models like Gemini-2.5-Flash-Image-Preview aka "nano banana" 🍌! You can now help answer the question: which image gen AI handles multi-turn editing best? Jump in to test, compare, and vote ▶️
🚨 Leaderboard Disrupted! Two new models have entered the Top 10 Text leaderboard: 🔸#6 Qwen3-max-preview (Proprietary) by 🔸#8 Kimi-K2-0905-preview (Modified MIT) by tied with 7 others. Note that this puts Kimi-K2-0905-preview in a tight race for
Learn more about Qwen3-max-preview here:
Quote
Qwen
@Alibaba_Qwen
Big news: Introducing Qwen3-Max-Preview (Instruct) — our biggest model yet, with over 1 trillion parameters! 🚀 Now available via Qwen Chat & Alibaba Cloud API. Benchmarks show it beats our previous best, Qwen3-235B-A22B-2507. Internal tests + early user feedback confirm:
🚨🍌Breaking News: Gemini-2.5-Flash-Image-Preview (“nano-banana”) by now ranks #1 in Image Edit Arena. In just two weeks: 🟡“nano-banana” has driven over 5 million community votes in the Arena 🟡Record-breaking 2.5M+ votes cast for this model alone 🟡It has
A table ranking AI models in Image Edit Arena. Gemini-2.5-Flash-Image-Preview, labeled "nano-banana," is listed first with an Elo score of 1392. Other models like gpt-4-turbo and qwen-image-v1 are ranked below. The table includes columns for rank, model name, score, votes, organization, and license.
Quote
Google DeepMind
@GoogleDeepMind
Image generation with Gemini just got a bananas upgrade and is the new state-of-the-art image generation and editing model. 🤯 From photorealistic masterpieces to mind-bending fantasy worlds, you can now natively produce, edit and refine visuals with new levels of reasoning,
🚨🍌Big Reveal: who was "Nano Banana?" The anonymous model, “nano-banana,” that caught the world's attention with its ability to follow complex instructions, preserve character identity, and maintain contextual details was: Gemini-2.5-Flash-Image-Preview by 🍌✨
Quote
Google DeepMind
@GoogleDeepMind
Image generation with Gemini just got a bananas upgrade and is the new state-of-the-art image generation and editing model. 🤯 From photorealistic masterpieces to mind-bending fantasy worlds, you can now natively produce, edit and refine visuals with new levels of reasoning,
🌐 Our first open model has landed on the Search leaderboard! Diffbot-small-xl by debuts at #9 (Apache 2.0) We look forward to more models with search capabilities contributing to ecosystem progress!
A leaderboard graphic displaying rankings and scores for various AI models. Columns include rank, model name, score, and license type. Diffbot-small-xl is listed at rank 9 with a score of 2,560 and Apache 2.0 license. The lmarena.ai logo is at the top left.
Quick reminder that we have turned off style control by default in Search Arena. The effects of response style are different for search than they are for ordinary chat, so we'll be defaulting this leaderboard to ordinary Bradley-Terry while we study search-specific style
🚨 Top 10 Leaderboard Disrupted ⚡ DeepSeek V3.1 and DeepSeek V3.1 thinking by have landed in the Arena, both ranked at #8. A few highlights: 💠 DeepSeek V3.1 is in the Top 3 for Math, Creative Writing & Longer Query 💠 DeepSeek V3.1 thinking comes in #3 for
A table listing a leaderboard with model names, scores, and ranks. DeepSeek V3.1 and DeepSeek V3.1 thinking are highlighted at rank #8. Columns include rank, model, score, wins, losses, ties, organization, and license. A watermark from lmarena.ai is present.
Quote
DeepSeek
@deepseek_ai
Introducing DeepSeek-V3.1: our first step toward the agent era! 🚀 🧠 Hybrid inference: Think & Non-Think — one model, two modes ⚡️ Faster thinking: DeepSeek-V3.1-Think reaches answers in less time vs. DeepSeek-R1-0528 🛠️ Stronger agent skills: Post-training boosts tool use and
The new open models by (MIT license) are tied with Grok-4, Kimi K2, Claude Opus 4, Qwen 3-235b-a22b-Instruct, and even their sibling, DeepSeek R1, making this an incredibly competitive race. 🏎️ 💨
hello, world!
Quote
lmarena.ai
@lmarena_ai
🚨Text Leaderboard Update: A new model provider, @MicrosoftAI has broken into the Top 15 this week! 💠MAI-1-preview by @MicrosoftAI debuts at #13. Congrats to the Microsoft AI team! As the Text Arena is one of the most competitive races, breaking into the Top 15 is no small x.com/mustafasuleyma…
🚨Text Leaderboard Update: A new model provider, @MicrosoftAI, has broken into the Top 15 this week! 💠MAI-1-preview by @MicrosoftAI debuts at #13. Congrats to the Microsoft AI team! As the Text Arena is one of the most competitive races, breaking into the Top 15 is no small
A table listing AI model rankings with columns for rank, model, score, win rate, and license. Microsoft AI's MAI-1-preview is highlighted at rank 13 with a score of 1402, win rate of 40%, and proprietary license. The lmarena.ai logo is at the top left.
Quote
Mustafa Suleyman
@mustafasuleyman
Replying to @mustafasuleyman
Introducing MAI-1-preview - our first foundation model trained end to end in house - in public testing on LMArena - we’re excited to be actively spinning the flywheel to deliver improved models
🌐GPT-5 by and Claude Opus 4.1 by have just debuted on the Search Leaderboard. Both came into the Arena strong across categories in the Text Arena, and we've been waiting to see what the community thinks about their search capabilities. Here's how they've
A leaderboard table ranking AI models by search capabilities. The table lists ranks, model names, scores, votes, organizations, and licenses. GPT-5-search by OpenAI is ranked #1, tying with o3-search. Claude Opus 4.1-search by Anthropic is ranked #2, tying with gemini-2.5-pro-grounding and grok-4-search. The lmarena.ai logo is at the top left.