A provider-level view of who is strongest across public benchmark coverage, best-category wins, open/proprietary mix, fastest matched models, and value leaders.
- Labs covered: 23 (from public benchmark rows)
- Top lab score: Anthropic (avg rank 29.7)
- Most models: OpenAI (largest tracked model set)
- Open-weight depth: DeepSeek (13 open models)
Provider view
Average rank (lower is better) is computed only across available Arena scores, so results for labs with fewer public rows should be read with coverage in mind.
| Rank | Lab | Coverage | Models | Open-weight | Avg rank | Best rank | Best model | Best value | Fastest |
|---|---|---|---|---|---|---|---|---|---|
| #1 | Anthropic | 35% | 27 | 0 | 29.7 | #1 | Claude Opus 4.6 Thinking | Claude Opus 4.6 Thinking | Claude Sonnet 4.6 |
| #2 | Xiaomi | 53% | 5 | 3 | 44.1 | #15 | MiMo V2.5 Pro | MiMo V2.5 Pro | MiMo V2 Flash |
| #3 | Moonshot AI | 43% | 6 | 6 | 44.2 | #8 | Kimi K2.6 | Kimi K2.6 | Kimi K2.5 Thinking |
| #4 | Z.ai | 50% | 8 | 7 | 47.2 | #6 | GLM 5.1 | GLM 5.1 | GLM 4.7 |
| #5 | ByteDance | 22% | 1 | 0 | 26 | #18 | Dola Seed 2.0 Pro | No price match | No latency match |
| #6 | MiniMax | 53% | 4 | 4 | 55 | #34 | Minimax M2.7 | Minimax M2.1 Preview | Minimax M2 |
| #7 | Google | 32% | 22 | 2 | 41.3 | #4 | Gemini 3.1 Pro Grounding | Gemini 3 Pro | Gemma 4 26b A4B |
| #8 | OpenAI | 30% | 45 | 0 | 39.4 | #1 | GPT-5.5 Search | GPT-5.5 Instant | GPT 4o 2024 08 06 |
| #9 | Perplexity | 11% | 2 | 0 | 25 | #24 | Ppl Sonar Pro High | No price match | No latency match |
| #10 | Alibaba | 35% | 22 | 9 | 54 | #4 | Qwen3.7 Max 20260517 | Qwen3.7 Plus Preview | Qwen3.5 122b A10B |
| #11 | xAI | 23% | 15 | 0 | 40.6 | #5 | Grok 4.20 Multi Agent Beta 0309 | Grok 4.3 | Grok 4.3 |
| #12 | Meta | 18% | 3 | 0 | 37 | #4 | Muse Spark | No price match | No latency match |
| #13 | Diffbot | 11% | 1 | 1 | 29 | #29 | Diffbot Small Xl | No price match | No latency match |
| #14 | IBM | 44% | 1 | 1 | 76 | #76 | Granite 4.1 8b | Granite 4.1 8b | Granite 4.1 8b |
| #15 | Inception AI | 44% | 1 | 0 | 78 | #78 | Mercury 2 | Mercury 2 | Mercury 2 |
| #16 | Baidu | 13% | 5 | 0 | 41.2 | #13 | ERNIE 5.1 | No price match | No latency match |
| #17 | DeepSeek | 28% | 13 | 13 | 65.3 | #17 | DeepSeek V4 Pro Thinking | DeepSeek V4 Pro Thinking | DeepSeek V4 Flash |
| #18 | Meituan | 11% | 1 | 0 | 59 | #59 | Longcat Flash Chat 2602 Exp | No price match | No latency match |
| #19 | Mistral | 24% | 8 | 4 | 82.3 | #73 | Mistral Large 3 | Mistral Large 3 | Mistral Large 3 |
| #20 | Tencent | 15% | 3 | 0 | 72.5 | #50 | Hunyuan Hy3 Preview | No price match | No latency match |
| #21 | Amazon | 11% | 2 | 0 | 80 | #70 | Amazon Nova Experimental Chat 26 02 10 | No price match | No latency match |
| #22 | Stepfun | 11% | 3 | 1 | 82 | #75 | Step 1o Turbo 202506 | No price match | No latency match |
| #23 | Ai2 | 11% | 1 | 1 | 95 | #95 | Molmo 2 8b | No price match | No latency match |
- Anthropic: 27 models, 0 open; avg rank 29.7; coverage 35%
- Xiaomi: 5 models, 3 open; avg rank 44.1; coverage 53%
- Moonshot AI: 6 models, 6 open; avg rank 44.2; coverage 43%
- Z.ai: 8 models, 7 open; avg rank 47.2; coverage 50%
- ByteDance: 1 model, 0 open; avg rank 26; coverage 22%
- MiniMax: 4 models, 4 open; avg rank 55; coverage 53%
- Google: 22 models, 2 open; avg rank 41.3; coverage 32%
- OpenAI: 45 models, 0 open; avg rank 39.4; coverage 30%
- Perplexity: 2 models, 0 open; avg rank 25; coverage 11%
- Alibaba: 22 models, 9 open; avg rank 54; coverage 35%
- xAI: 15 models, 0 open; avg rank 40.6; coverage 23%
- Meta: 3 models, 0 open; avg rank 37; coverage 18%
- Diffbot: 1 model, 1 open; avg rank 29; coverage 11%
- IBM: 1 model, 1 open; avg rank 76; coverage 44%
- Inception AI: 1 model, 0 open; avg rank 78; coverage 44%
- Baidu: 5 models, 0 open; avg rank 41.2; coverage 13%
- DeepSeek: 13 models, 13 open; avg rank 65.3; coverage 28%
- Meituan: 1 model, 0 open; avg rank 59; coverage 11%
- Mistral: 8 models, 4 open; avg rank 82.3; coverage 24%
- Tencent: 3 models, 0 open; avg rank 72.5; coverage 15%
- Amazon: 2 models, 0 open; avg rank 80; coverage 11%
- Stepfun: 3 models, 1 open; avg rank 82; coverage 11%
- Ai2: 1 model, 1 open; avg rank 95; coverage 11%
Benchmark guide
A quick reading key for provider-level comparisons, coverage, average rank, and open-weight portfolio signals.
Labs are ranked on available public Arena performance with a coverage adjustment, so broad benchmark coverage matters. A lab with one excellent score should not automatically outrank a lab with many strong model rows.
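The page does not publish its coverage adjustment, so the sketch below shows only one hypothetical way such an adjustment could work: each lab's average rank is blended with a pessimistic prior rank, weighted by coverage. The `prior_rank` of 100, the blending rule, and the example labs are all assumptions, and this rule is not claimed to reproduce the ordering in the table.

```python
from dataclasses import dataclass


@dataclass
class LabSummary:
    name: str
    avg_rank: float  # mean Arena rank over available rows (lower is better)
    coverage: float  # assumed: fraction of benchmark slots with a public row, 0..1


def adjusted_score(lab: LabSummary, prior_rank: float = 100.0) -> float:
    """Hypothetical coverage adjustment; lower is better.

    With full coverage the lab's own average counts fully; with low
    coverage the score is pulled toward the pessimistic `prior_rank`.
    """
    c = min(max(lab.coverage, 0.0), 1.0)  # clamp coverage into [0, 1]
    return c * lab.avg_rank + (1.0 - c) * prior_rank


def rank_labs(labs: list[LabSummary], prior_rank: float = 100.0) -> list[str]:
    """Return lab names ordered from best (lowest adjusted score) to worst."""
    ordered = sorted(labs, key=lambda lab: adjusted_score(lab, prior_rank))
    return [lab.name for lab in ordered]


# Made-up labs: one excellent but thinly covered, one broad and solid.
labs = [
    LabSummary("NarrowLab", avg_rank=10.0, coverage=0.10),  # adjusted: 91.0
    LabSummary("BroadLab", avg_rank=30.0, coverage=0.60),   # adjusted: 58.0
]
print(rank_labs(labs))  # ['BroadLab', 'NarrowLab']
```

Under this rule the single excellent score does not carry NarrowLab past BroadLab, which matches the intent described above; a different prior or weighting would shift where that crossover happens.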
Average rank is computed across the Arena rows available for that lab's tracked models. It is useful for comparing portfolio strength, but it should be read alongside model count and coverage.
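A minimal sketch of the average-rank calculation, assuming each lab's tracked slots arrive as a list where `None` marks a slot with no public Arena row. The function names are hypothetical, and defining coverage as the share of slots that have a row is an assumption about how the page computes it.

```python
from typing import Optional


def average_rank(rows: list[Optional[int]]) -> Optional[float]:
    """Mean Arena rank over the rows a lab actually has (lower is better).

    Slots marked None are skipped rather than scored as a bad rank
    (an assumption). Returns None when no rows are available at all.
    """
    available = [r for r in rows if r is not None]
    if not available:
        return None
    return sum(available) / len(available)


def coverage(rows: list[Optional[int]]) -> float:
    """Hypothetical coverage: share of tracked slots that have a public row."""
    if not rows:
        return 0.0
    return sum(r is not None for r in rows) / len(rows)


# Made-up example: four tracked slots, two of which have public Arena rows.
rows = [12, None, 40, None]
print(average_rank(rows), coverage(rows))  # 26.0 0.5
```

Because missing rows are skipped, the average alone cannot tell a fully covered lab from a half-covered one, which is why the two numbers are reported side by side.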
Open-weight counts show how much of a lab's tracked portfolio can plausibly be self-hosted or inspected outside a closed API. It is a portfolio signal, not a quality score by itself.
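As a quick illustration, the open-weight share is open models divided by tracked models. The helper name is hypothetical; the counts are the per-lab figures listed on this page.

```python
def open_weight_share(open_models: int, total_models: int) -> float:
    """Fraction of a lab's tracked models that are open-weight (0.0 to 1.0)."""
    if total_models <= 0:
        raise ValueError("total_models must be positive")
    if not 0 <= open_models <= total_models:
        raise ValueError("open_models must be between 0 and total_models")
    return open_models / total_models


# (open, total) counts from the per-lab figures on this page.
portfolio = {"DeepSeek": (13, 13), "Google": (2, 22), "Anthropic": (0, 27)}
for lab, (open_n, total_n) in portfolio.items():
    print(f"{lab}: {open_weight_share(open_n, total_n):.0%} open-weight")
# DeepSeek: 100% open-weight
# Google: 9% open-weight
# Anthropic: 0% open-weight
```

A 100% share for DeepSeek and 0% for Anthropic says nothing about which lab's models score better; it only describes how each portfolio can be deployed.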
Small portfolios are more sensitive to one strong or weak model. The page keeps model count and coverage visible so lab comparisons are not reduced to a single rank number.
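The sensitivity is plain arithmetic: if one of n ranked rows changes by Δ places, the average moves by Δ/n. The sketch below uses made-up ranks to compare a one-model portfolio with a thirteen-model one; the function names are illustrative.

```python
def avg(ranks: list[float]) -> float:
    """Arithmetic mean of a list of ranks."""
    return sum(ranks) / len(ranks)


def shift_from_one_model(ranks: list[float], index: int, new_rank: float) -> float:
    """How far the average rank moves when one model's rank changes."""
    changed = list(ranks)  # copy so the caller's list is untouched
    changed[index] = new_rank
    return avg(changed) - avg(ranks)


# Made-up portfolios: one model vs. thirteen models, all starting at rank 40.
small = [40.0]
large = [40.0] * 13
# The same 26-place drop for one model moves the small portfolio's average
# by 26 places but the large portfolio's average by only 26 / 13 = 2 places.
print(shift_from_one_model(small, 0, 66.0))  # 26.0
print(shift_from_one_model(large, 0, 66.0))  # 2.0
```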