Benchmark suite, updated May 27, 2026

Coding model rankings.

A focused coding view built on Code Arena and SWE-bench today, with source slots reserved for Terminal-Bench and Aider-style coding benchmarks until those feeds are wired.

  • Ranked models: 102 (models with a Code Arena or SWE-bench signal)
  • SWE-bench matches: 22 (matched to model rows)
  • Top coding index: Claude Opus 4.7 Thinking (100 coding index)
  • Best SWE-bench: 76.8% (Claude Opus 4.5 20251101)


Coding formula

Code generation plus real issue resolution.

The coding index blends Code Arena and SWE-bench where available, with output speed as a small tie-breaker. Terminal-Bench and Aider remain visible as planned feed slots until we have a reliable ingestion path.
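The page does not publish the exact formula, so the following is only a minimal sketch of how such a blend could work. The weights, the min-max normalization, and the names `blended_score` and `min_max` are all assumptions made for illustration, and the sketch is not expected to reproduce the index values in the table. It averages whichever of the two signals is available (so a missing SWE-bench row is skipped rather than counted as zero) and adds output speed as a small bonus that can only break near-ties.

```python
from typing import Optional, Tuple

# Hypothetical weights: the page does not state its real formula.
ARENA_WEIGHT = 0.5
SWE_WEIGHT = 0.5
SPEED_TIEBREAK_WEIGHT = 0.01  # small enough to act only as a tie-breaker

Range = Tuple[float, float]


def min_max(value: float, low: float, high: float) -> float:
    """Scale value into [0, 1]; a degenerate range maps to 0.0."""
    if high <= low:
        return 0.0
    return (value - low) / (high - low)


def blended_score(
    arena: Optional[float],
    swe: Optional[float],
    speed: Optional[float],
    arena_range: Range,
    swe_range: Range,
    speed_range: Range,
) -> Optional[float]:
    """Blend the available signals; return None when neither is present."""
    signals = []
    if arena is not None:
        signals.append((ARENA_WEIGHT, min_max(arena, *arena_range)))
    if swe is not None:
        signals.append((SWE_WEIGHT, min_max(swe, *swe_range)))
    if not signals:
        return None
    # Renormalize over the signals that exist, so missing data is not a zero.
    total_weight = sum(weight for weight, _ in signals)
    score = sum(weight * value for weight, value in signals) / total_weight
    if speed is not None:
        score += SPEED_TIEBREAK_WEIGHT * min_max(speed, *speed_range)
    return score
```

Renormalizing over the available signals is one design choice among several. It keeps models without a SWE-bench match comparable, at the cost of letting a single strong signal carry the whole score.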

Coding performance

Higher coding index is better. SWE-bench cells show verified model matches only.

Top 30 of 102

Code Arena and SWE-bench cells give the score with the benchmark rank in parentheses. "Not listed" in the SWE-bench column means no confident match. "Planned" in the Run mode column marks the planned Terminal-Bench + Aider feed.

| Rank | Model | Model ID | Provider | License | Index | Code Arena | SWE-bench | Output speed (tok/s) | Run mode | Sources |
|---|---|---|---|---|---|---|---|---|---|---|
| #1 | Claude Opus 4.7 Thinking | claude-opus-4-7-thinking | Anthropic | Proprietary | 100 | 1,567 (#1) | Not listed | Not listed | Planned | Code Arena, SWE-bench |
| #2 | Qwen3.7 Max 20260517 | qwen3.7-max-20260517 | Alibaba | Proprietary | 95 | 1,541 (#4) | Not listed | Not listed | Planned | Code Arena, SWE-bench |
| #3 | Muse Spark | muse-spark | Meta | Proprietary | 88 | 1,508 (#9) | Not listed | Not listed | Planned | Code Arena, SWE-bench, Artificial Analysis |
| #4 | GPT-5.5 xHigh | gpt-5.5-xhigh (codex-harness) | OpenAI | Proprietary | 87 | 1,505 (#11) | Not listed | Not listed | Planned | Code Arena, SWE-bench |
| #5 | Claude Opus 4.6 | claude-opus-4-6 | Anthropic | Proprietary | 86 | 1,538 (#5) | 75.6% (#4) | 42 | bash-only, mini 2.0.0 | Code Arena, SWE-bench, Artificial Analysis |
| #6 | Claude Opus 4.5 20251101 Thinking 32K | claude-opus-4-5-20251101-thinking-32k | Anthropic | Proprietary | 84 | 1,490 (#12) | Not listed | Not listed | Planned | Code Arena, SWE-bench |
| #7 | Gemini 3 Pro | gemini-3-pro | Google | Proprietary | 83 | 1,438 (#24) | 74.2% (#6) | Not listed | bash-only, mini 1.15.0 | Code Arena, SWE-bench, Artificial Analysis |
| #8 | Claude Opus 4.7 | claude-opus-4-7 | Anthropic | Proprietary | 82 | 1,562 (#2) | Not listed | 42 | Planned | Code Arena, SWE-bench, Artificial Analysis |
| #9 | Claude Opus 4.5 20251101 | claude-opus-4-5-20251101 | Anthropic | Proprietary | 80 | 1,467 (#16) | 76.8% (#1) | 54 | bash-only, mini 2.0.0 | Code Arena, SWE-bench, Artificial Analysis |
| #10 | Claude Opus 4.6 Thinking | claude-opus-4-6-thinking | Anthropic | Proprietary | 79 | 1,542 (#3) | Not listed | 47 | Planned | Code Arena, SWE-bench, Artificial Analysis |
| #11 | GPT-5.2 | gpt-5.2 | OpenAI | Proprietary | 78 | 1,404 (#32) | 72.8% (#9) | Not listed | bash-only, mini 2.0.0 | Code Arena, SWE-bench, Artificial Analysis |
| #12 | GLM 5.1 | glm-5.1 | Z.ai | Open weights | 77 | 1,533 (#6) | Not listed | 49 | Planned | Code Arena, SWE-bench, Artificial Analysis |
| #13 | GPT-5.4 High | gpt-5.4-high | OpenAI | Proprietary | 77 | 1,457 (#19) | Not listed | Not listed | Planned | Code Arena, SWE-bench |
| #14 | Claude Sonnet 4.6 | claude-sonnet-4-6 | Anthropic | Proprietary | 76 | 1,523 (#7) | Not listed | 47 | Planned | Code Arena, SWE-bench, Artificial Analysis |
| #15 | Gemini 3.5 Flash | gemini-3.5-flash | Google | Proprietary | 75 | 1,506 (#10) | Not listed | 198 | Planned | Code Arena, SWE-bench, Artificial Analysis |
| #16 | Kimi K2.6 | kimi-k2.6 | Moonshot AI | Open weights | 75 | 1,518 (#8) | Not listed | 42 | Planned | Code Arena, SWE-bench, Artificial Analysis |
| #17 | GLM 5 | glm-5 | Z.ai | Open weights | 74 | 1,436 (#27) | 72.8% (#8) | 80 | bash-only, mini 2.0.0 | Code Arena, SWE-bench, Artificial Analysis |
| #18 | Gemini 3 Flash | gemini-3-flash (thinking-minimal) | Google | Proprietary | 73 | 1,387 (#42) | 75.8% (#2) | 184 | bash-only, mini 2.0.0 | Code Arena, SWE-bench, Artificial Analysis |
| #19 | GPT-5.4 Medium | gpt-5.4-medium (codex-harness) | OpenAI | Proprietary | 73 | 1,437 (#25) | Not listed | Not listed | Planned | Code Arena, SWE-bench |
| #20 | Minimax M2.5 | minimax-m2.5 | MiniMax | Open weights | 72 | 1,382 (#45) | 75.8% (#3) | 202 | bash-only, mini 2.0.0 | Code Arena, SWE-bench, Artificial Analysis |
| #21 | Kimi K2.5 Thinking | kimi-k2.5-thinking | Moonshot AI | Open weights | 71 | 1,431 (#29) | 70.8% (#13) | 33 | bash-only, mini 2.0.0 | Code Arena, SWE-bench, Artificial Analysis |
| #22 | Qwen3.6 Max Preview | qwen3.6-max-preview | Alibaba | Proprietary | 69 | 1,486 (#13) | Not listed | 41 | Planned | Code Arena, SWE-bench, Artificial Analysis |
| #23 | GPT-5.5 High | gpt-5.5-high | OpenAI | Proprietary | 68 | 1,479 (#14) | Not listed | 61 | Planned | Code Arena, SWE-bench, Artificial Analysis |
| #24 | Claude Sonnet 4.5 20250929 | claude-sonnet-4-5-20250929 | Anthropic | Proprietary | 67 | 1,386 (#43) | 71.4% (#12) | 43 | bash-only, mini 2.0.0 | Code Arena, SWE-bench, Artificial Analysis |
| #25 | DeepSeek V3.2 | deepseek-v3.2 | DeepSeek | Open weights | 67 | 1,332 (#58) | 70% (#15) | Not listed | bash-only, mini 2.0.0 | Code Arena, SWE-bench, Artificial Analysis |
| #26 | Kimi K2.5 Instant | kimi-k2.5-instant | Moonshot AI | Open weights | 67 | 1,408 (#30) | Not listed | Not listed | Planned | Code Arena, SWE-bench |
| #27 | MiMo V2.5 Pro | mimo-v2.5-pro | Xiaomi | Open weights | 66 | 1,471 (#15) | Not listed | 46 | Planned | Code Arena, SWE-bench, Artificial Analysis |
| #28 | Claude Opus 4 20250514 | claude-opus-4-20250514 | Anthropic | Proprietary | 66 | Not listed | 67.6% (#18) | 37 | bash-only, mini 1.0.0 | Code Arena, SWE-bench, Artificial Analysis |
| #29 | DeepSeek V4 Pro Thinking | deepseek-v4-pro-thinking | DeepSeek | Open weights | 65 | 1,464 (#17) | Not listed | 45 | Planned | Code Arena, SWE-bench, Artificial Analysis |
| #30 | Qwen3.6 Plus | qwen3.6-plus | Alibaba | Proprietary | 65 | 1,460 (#18) | Not listed | 53 | Planned | Code Arena, SWE-bench, Artificial Analysis |

Benchmark guide

What the scores mean.

A quick reading key for comparing coding models without confusing source coverage, preference scores, and real issue-resolution benchmarks.

Higher coding index is better. Missing rows are not zero.
What does the coding index measure?

The coding index combines Code Arena and SWE-bench signals where available, with output speed used as a small tie-breaker. It is a practical coding comparison, not a guarantee that the model will solve every repository task.

How should I read Code Arena versus SWE-bench?

Code Arena is a public preference-style benchmark for coding outputs, while SWE-bench measures real software issue resolution. Code Arena covers more models; SWE-bench is more task-specific, and its rows are harder to match to model names.
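To get a feel for what a gap between two preference-style scores might imply, here is a small sketch. It assumes the Code Arena scores behave like Elo-style ratings on the conventional 400-point logistic scale, which this page does not state; the function name `expected_win_rate` is hypothetical.

```python
def expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Elo-style expected share of head-to-head preferences won by A over B.

    Assumes a base-10 logistic curve with a 400-point scale. That is the
    usual Elo convention, not something confirmed for Code Arena.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# Example with two scores from the table: 1,567 (rank #1) vs 1,541 (rank #4).
top_vs_fourth = expected_win_rate(1567, 1541)
```

Under that assumption a 26-point gap corresponds to a win rate only a little above 50%, which is one reason a small difference in rating should not be read as a decisive difference in coding ability.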

Why does Terminal-Bench show as planned?

Terminal-Bench is relevant for agentic coding workflows, but this page does not mix it into rankings until a reliable source feed is wired. Planned benchmarks are called out so the table does not imply hidden or invented scores.

Why are some SWE-bench cells empty?

Empty SWE-bench cells mean there is no public model row or no confident match to the model name in this table. They should be read as missing source coverage, not as a zero percent result.
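The "missing, not zero" rule can be sketched as follows. This is a hypothetical illustration rather than the page's actual matching logic: `normalize`, `match_swe_bench`, and `mean_of_present` are invented names, and treating only an exact match after normalization as "confident" is an assumption. The sample scores come from the table above.

```python
from typing import Dict, Iterable, Optional


def normalize(name: str) -> str:
    """Lowercase and drop separators: 'Claude Opus 4.6' -> 'claudeopus46'."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def match_swe_bench(model_id: str, swe_rows: Dict[str, float]) -> Optional[float]:
    """Return the SWE-bench score only on an exact normalized match.

    A missing match returns None (no source coverage), never 0.0.
    """
    lookup = {normalize(name): score for name, score in swe_rows.items()}
    return lookup.get(normalize(model_id))


def mean_of_present(scores: Iterable[Optional[float]]) -> Optional[float]:
    """Average only the scores that exist; None if every score is missing."""
    present = [score for score in scores if score is not None]
    if not present:
        return None
    return sum(present) / len(present)


# Two SWE-bench rows taken from the table above.
swe_rows = {"Claude Opus 4.6": 75.6, "Gemini 3 Pro": 74.2}

matched = match_swe_bench("claude-opus-4-6", swe_rows)
# The thinking variant is a different model row, so it stays unmatched.
unmatched = match_swe_bench("claude-opus-4-6-thinking", swe_rows)
```

Keeping unmatched rows as `None` means any later aggregate has to decide explicitly how to handle them, instead of silently averaging in a zero percent result.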