
© 2026. All rights reserved.

Benchmark suite. Updated May 27, 2026.

Reasoning and knowledge rankings.

A reasoning and knowledge leaderboard that currently ranks models on available Arena signals. LiveBench, HELM Capabilities, MMLU-Pro, GPQA, and AIME-style math benchmarks are shown as planned source integrations.


  • Ranked models: 152 (models with a Text, Docs, or Vision Arena signal)
  • Reasoning variants: 41 (tagged from public model names)
  • Top proxy: Claude Opus 4.6 Thinking, reasoning proxy 99
  • LiveBench: planned (fresh reasoning feed)


Reasoning formula

A proxy until dedicated reasoning feeds are integrated.

The current index blends Text, Docs, and Vision Arena scores. It is useful for broad comparison, but dedicated LiveBench, GPQA, AIME, MMLU-Pro, and HELM signals are the planned next data layer.
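The page does not publish how the three Arena scores are blended, so the following is only a minimal sketch of one plausible approach, under stated assumptions: each source is min-max normalized across the listed models, each model's normalized scores are averaged over the sources it appears on ("Not listed" is skipped), and the result is scaled to 0-100. The names `min_max`, `proxy_index`, and `SOURCES` are hypothetical.

```python
from typing import Optional

# Hypothetical blend: the page does not publish its weights or normalization.
SOURCES = ("text", "docs", "vision")


def min_max(values: list[Optional[float]]) -> list[Optional[float]]:
    """Rescale listed scores to [0, 1] across models; unlisted (None) stays None."""
    listed = [v for v in values if v is not None]
    if not listed:
        return [None] * len(values)
    lo, hi = min(listed), max(listed)
    span = hi - lo
    return [None if v is None else (1.0 if span == 0 else (v - lo) / span) for v in values]


def proxy_index(rows: dict[str, dict[str, Optional[float]]]) -> dict[str, int]:
    """Average each model's normalized Arena scores over the sources it is listed on,
    then scale to 0-100. A model listed on no source gets 0."""
    names = list(rows)
    per_model: dict[str, list[Optional[float]]] = {n: [] for n in names}
    for src in SOURCES:
        column = min_max([rows[n].get(src) for n in names])
        for n, v in zip(names, column):
            per_model[n].append(v)
    index = {}
    for n in names:
        listed = [v for v in per_model[n] if v is not None]
        index[n] = round(100 * sum(listed) / len(listed)) if listed else 0
    return index


# Three rows taken from the table on this page (Docs Arena is "Not listed" for all).
rows = {
    "claude-opus-4-6-thinking": {"text": 1502, "docs": None, "vision": 1300},
    "gpt-5.4": {"text": 1469, "docs": None, "vision": 1269},
    "qwen3.7-plus-preview": {"text": None, "docs": None, "vision": 1260},
}
print(proxy_index(rows))
# {'claude-opus-4-6-thinking': 100, 'gpt-5.4': 11, 'qwen3.7-plus-preview': 0}
```

With only three models the normalization range is narrow, so these values spread far more than the page's own 99, 71, and 79 for the same models. The real index clearly uses a different, unpublished method; the sketch only shows the general shape of a multi-source blend that tolerates missing sources.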

Reasoning and knowledge proxy

A higher proxy index is better. Dedicated reasoning benchmarks are ready to be added as sources but have not been mixed into the index yet.

Top 30 of 152

| Rank | Model | Model ID | Lab | Weights | Proxy index | Text Arena | Docs Arena | Vision Arena | Mode signal |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Claude Opus 4.6 Thinking | claude-opus-4-6-thinking | Anthropic | Proprietary | 99 | 1,502 (#1) | Not listed | 1,300 (#3) | Reasoning |
| 2 | Claude Opus 4.7 Thinking | claude-opus-4-7-thinking | Anthropic | Proprietary | 99 | 1,500 (#2) | Not listed | 1,306 (#1) | Reasoning |
| 3 | Claude Opus 4.6 | claude-opus-4-6 | Anthropic | Proprietary | 95 | 1,498 (#3) | Not listed | 1,293 (#5) | Standard |
| 4 | Claude Opus 4.7 | claude-opus-4-7 | Anthropic | Proprietary | 94 | 1,494 (#4) | Not listed | 1,304 (#2) | Standard |
| 5 | Muse Spark | muse-spark | Meta | Proprietary | 89 | 1,489 (#5) | Not listed | 1,296 (#4) | Standard |
| 6 | Gemini 3 Pro | gemini-3-pro | Google | Proprietary | 86 | 1,486 (#7) | Not listed | 1,289 (#6) | Standard |
| 7 | Gemini 3.1 Pro Preview | gemini-3.1-pro-preview | Google | Proprietary | 85 | 1,487 (#6) | Not listed | 1,277 (#10) | Standard |
| 8 | GPT-5.5 High | gpt-5.5-high | OpenAI | Proprietary | 81 | 1,482 (#8) | Not listed | 1,278 (#9) | Reasoning |
| 9 | GPT-5.4 High | gpt-5.4-high | OpenAI | Proprietary | 80 | 1,480 (#9) | Not listed | 1,277 (#11) | Reasoning |
| 10 | Qwen3.7 Plus Preview | qwen3.7-plus-preview | Alibaba | Proprietary | 79 | Not listed | Not listed | 1,260 (#16) | Standard |
| 11 | GPT-5.5 | gpt-5.5 | OpenAI | Proprietary | 79 | 1,476 (#11) | Not listed | 1,288 (#7) | Standard |
| 12 | GPT-5.2 Chat Latest 20260210 | gpt-5.2-chat-latest-20260210 | OpenAI | Proprietary | 77 | 1,476 (#12) | Not listed | 1,280 (#8) | Standard |
| 13 | GPT-5.5 Instant | gpt-5.5-instant | OpenAI | Proprietary | 75 | 1,474 (#17) | Not listed | 1,275 (#13) | Standard |
| 14 | Gemini 3.5 Flash | gemini-3.5-flash | Google | Proprietary | 75 | 1,479 (#10) | Not listed | Not listed | Standard |
| 15 | Claude Sonnet 4.6 | claude-sonnet-4-6 | Anthropic | Proprietary | 72 | 1,470 (#22) | Not listed | 1,275 (#12) | Standard |
| 16 | Grok 4.20 Beta 0309 Reasoning | grok-4.20-beta-0309-reasoning | xAI | Proprietary | 72 | 1,475 (#14) | Not listed | 1,251 (#21) | Reasoning |
| 17 | Grok 4.20 Beta 1 | grok-4.20-beta1 | xAI | Proprietary | 71 | 1,476 (#13) | Not listed | Not listed | Standard |
| 18 | GPT-5.4 | gpt-5.4 | OpenAI | Proprietary | 71 | 1,469 (#23) | Not listed | 1,269 (#15) | Standard |
| 19 | Qwen3.7 Max Preview | qwen3.7-max-preview | Alibaba | Proprietary | 70 | 1,475 (#15) | Not listed | Not listed | Standard |
| 20 | GLM 5.1 | glm-5.1 | Z.ai | Open weights | 69 | 1,474 (#16) | Not listed | Not listed | Standard |
| 21 | Grok 4.20 Multi Agent Beta 0309 | grok-4.20-multi-agent-beta-0309 | xAI | Proprietary | 69 | 1,472 (#20) | Not listed | 1,247 (#26) | Standard |
| 22 | Claude Opus 4.5 20251101 Thinking 32K | claude-opus-4-5-20251101-thinking-32k | Anthropic | Proprietary | 68 | 1,473 (#19) | Not listed | Not listed | Reasoning |
| 23 | ERNIE 5.1 | ernie-5.1 | Baidu | Proprietary | 65 | 1,470 (#21) | Not listed | Not listed | Standard |
| 24 | GLM 5v Turbo | glm-5v-turbo | Z.ai | Proprietary | 64 | Not listed | Not listed | 1,227 (#37) | Standard |
| 25 | Kimi K2.6 | kimi-k2.6 | Moonshot AI | Open weights | 64 | 1,462 (#28) | Not listed | 1,259 (#19) | Standard |
| 26 | Claude Opus 4.5 20251101 | claude-opus-4-5-20251101 | Anthropic | Proprietary | 64 | 1,469 (#24) | Not listed | Not listed | Standard |
| 27 | Gemini 2.5 Flash Preview 09 2025 | gemini-2.5-flash-preview-09-2025 | Google | Proprietary | 64 | Not listed | Not listed | 1,226 (#39) | Standard |
| 28 | Gemini 3 Flash | gemini-3-flash (thinking-minimal) | Google | Proprietary | 63 | 1,461 (#29) | Not listed | 1,260 (#17) | Reasoning |
| 29 | Qwen3.5 27b | qwen3.5-27b | Alibaba | Open weights | 61 | Not listed | Not listed | 1,221 (#42) | Standard |
| 30 | ERNIE 5.0 Preview 1220 | ernie-5.0-preview-1220 | Baidu | Proprietary | 60 | Not listed | Not listed | 1,219 (#43) | Standard |

Arena columns show the score followed by the model's rank on that Arena in parentheses. Sources for every row: Text Arena, Docs Arena, Vision Arena.

Benchmark guide

What the scores mean.

A quick reading key for reasoning and knowledge rankings while dedicated reasoning benchmark feeds are being integrated.

Higher is better for the reasoning proxy. LiveBench: planned.
What does the reasoning proxy measure?

The reasoning proxy currently blends Text, Document, and Vision Arena scores. It gives a rough orientation on broad model capability and is not a replacement for dedicated reasoning benchmarks like LiveBench, GPQA, AIME, or MMLU-Pro.

Why use Arena scores for reasoning?

Arena scores are live public signals with broad model coverage. They are useful while dedicated reasoning feeds are being integrated, but they should be read as preference-based capability signals rather than exam-style accuracy scores.
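To make "preference-based" concrete, here is a small sketch that converts a gap between two Arena scores into an expected head-to-head preference rate. It assumes the published scores follow the common Elo convention (logistic curve, base 10, scale 400); that convention is an assumption here, not something this page states.

```python
def expected_preference(r_a: float, r_b: float, scale: float = 400.0) -> float:
    """Expected share of head-to-head votes in which model A is preferred over
    model B, under an Elo-style logistic model (base 10, scale 400 assumed)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / scale))


# Text Arena scores from the table: #1 vs #3, and #1 vs #29.
print(round(expected_preference(1502, 1498), 3))  # 0.506
print(round(expected_preference(1502, 1461), 3))  # 0.559
```

Under this assumption, even the gap between the #1 and #29 Text Arena models implies the leader is preferred in only about 56% of pairwise votes. That is a statement about voter preference, not about how many exam questions each model answers correctly.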

What will LiveBench add?

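LiveBench is designed to be refreshed over time with new questions, which reduces the risk that test items have leaked into a model's training data (benchmark contamination). Once integrated, it can add more direct reasoning, math, data analysis, and coding task signals to this page.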
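A minimal sketch of the general idea behind contamination-resistant evaluation: score a model only on questions published after its training cutoff. The `Question` type, its `released` field, and the dates are hypothetical; this is not LiveBench's code or data format.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Question:
    qid: str        # hypothetical question identifier
    released: date  # hypothetical: date the question was first published


def post_cutoff(questions: list[Question], training_cutoff: date) -> list[Question]:
    """Keep only questions released after the model's (assumed) training cutoff,
    so the model is less likely to have seen them during training."""
    return [q for q in questions if q.released > training_cutoff]


pool = [
    Question("math-001", date(2025, 3, 1)),
    Question("math-002", date(2025, 11, 15)),
    Question("data-003", date(2026, 2, 2)),
]
fresh = post_cutoff(pool, training_cutoff=date(2025, 9, 30))
print([q.qid for q in fresh])  # ['math-002', 'data-003']
```

Regularly adding new questions is what keeps this filtered set non-empty as newer models with later cutoffs appear.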

Why are reasoning modes only a signal?

A model name or tag like Thinking, Reasoning, or High can indicate an inference mode, but it is not itself a benchmark score. The page treats those labels as context, not as proof of better reasoning.
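The "Reasoning variants" count and the Mode column come from public model names. The page does not publish its tagging rules, so this is a hypothetical sketch: it splits a model ID into tokens and checks them against an assumed tag set built from the labels mentioned above (Thinking, Reasoning, High). Against the 30 rows in the table, it happens to agree with every Mode value.

```python
import re

# Hypothetical tag set, taken from the labels this page mentions.
REASONING_TAGS = {"thinking", "reasoning", "high"}


def mode_signal(model_id: str) -> str:
    """Return 'Reasoning' if any token of the public model ID is a reasoning tag,
    otherwise 'Standard'. Whole-token matching avoids false hits like 'highlight'."""
    tokens = set(re.split(r"[^a-z0-9.]+", model_id.lower()))
    return "Reasoning" if tokens & REASONING_TAGS else "Standard"


for mid in ["gpt-5.5-high", "gemini-3-flash (thinking-minimal)",
            "gpt-5.5-instant", "grok-4.20-multi-agent-beta-0309"]:
    print(mid, "->", mode_signal(mid))
```

Because the result is derived only from a name, it says which inference mode a vendor advertises, not how well the model reasons.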