Benchmark suite, updated May 27, 2026

Agent and tool-use rankings.

A pragmatic agent-readiness view using the source-backed signals we have today: search, document work, code performance, and latency. BFCL and Terminal-Bench are called out as planned feed integrations.

Ranked models: 128 (models with a search, docs, code, or latency signal)

Search Arena: 29 (models with public search scores)

Docs Arena: 0 (models with document scores)

Top proxy: Claude Opus 4.7 Thinking (100 agent proxy)

Agent formula

Workflow signals without overstating tool calling.

This page is intentionally labeled as a proxy until BFCL and Terminal-Bench are wired in. It ranks models on the search, docs, coding, and latency signals that matter for agent workflows, without claiming to be a full agent benchmark.

Agent workflow proxy

Higher proxy index is better. BFCL remains a planned feed, not a silent placeholder.

Top 30 of 128

| Rank | Model | Model ID | Provider | License | Index (agent proxy) | Search (Search Arena) | Docs (Docs Arena) | Latency | Tool use (tool calling feed) |
|---|---|---|---|---|---|---|---|---|---|
| #1 | Claude Opus 4.7 Thinking | claude-opus-4-7-thinking | Anthropic | Proprietary | 100 | Not listed | Not listed | Not listed | BFCL pending |
| #2 | GPT-5.5 Search | gpt-5.5-search | OpenAI | Proprietary | 100 | 1,223 (rank #1) | Not listed | Not listed | BFCL pending |
| #3 | GPT 4o 2024 05 13 | gpt-4o-2024-05-13 | OpenAI | Proprietary | 100 | Not listed | Not listed | 0.55s | BFCL pending |
| #4 | GPT 4o 2024 08 06 | gpt-4o-2024-08-06 | OpenAI | Proprietary | 100 | Not listed | Not listed | 0.55s | BFCL pending |
| #5 | GPT-5 High | gpt-5-high | OpenAI | Proprietary | 100 | Not listed | Not listed | 0.56s | BFCL pending |
| #6 | Pixtral Large 2411 | pixtral-large-2411 | Mistral | Proprietary | 100 | Not listed | Not listed | 0.72s | BFCL pending |
| #7 | DeepSeek V4 Flash | deepseek-v4-flash | DeepSeek | Open weights | 100 | Not listed | Not listed | 0.96s | BFCL pending |
| #8 | DeepSeek V4 Pro | deepseek-v4-pro | DeepSeek | Open weights | 100 | Not listed | Not listed | 1.16s | BFCL pending |
| #9 | GLM 4.5v | glm-4.5v | Z.ai | Open weights | 100 | Not listed | Not listed | 1.18s | BFCL pending |
| #10 | Qwen3 VL 235b A22B Instruct | qwen3-vl-235b-a22b-instruct | Alibaba | Open weights | 100 | Not listed | Not listed | 1.20s | BFCL pending |
| #11 | Qwen3 235b A22B Instruct 2507 | qwen3-235b-a22b-instruct-2507 | Alibaba | Open weights | 100 | Not listed | Not listed | 1.28s | BFCL pending |
| #12 | GLM 4.5 | glm-4.5 | Z.ai | Open weights | 100 | Not listed | Not listed | 1.28s | BFCL pending |
| #13 | Qwen3.7 Plus Preview | qwen3.7-plus-preview | Alibaba | Proprietary | 100 | Not listed | Not listed | 1.33s | BFCL pending |
| #14 | GLM 4.6v | glm-4.6v | Z.ai | Open weights | 99 | Not listed | Not listed | 1.43s | BFCL pending |
| #15 | Kimi K2 0905 Preview | kimi-k2-0905-preview | Moonshot AI | Open weights | 99 | Not listed | Not listed | 1.56s | BFCL pending |
| #16 | Qwen3.7 Max Preview | qwen3.7-max-preview | Alibaba | Proprietary | 99 | Not listed | Not listed | 1.75s | BFCL pending |
| #17 | Qwen3 Max Preview | qwen3-max-preview | Alibaba | Proprietary | 99 | Not listed | Not listed | 1.93s | BFCL pending |
| #18 | MiMo V2 Omni | mimo-v2-omni | Xiaomi | Proprietary | 99 | Not listed | Not listed | 2.26s | BFCL pending |
| #19 | Qwen3 VL 235b A22B Thinking | qwen3-vl-235b-a22b-thinking | Alibaba | Open weights | 99 | Not listed | Not listed | 2.73s | BFCL pending |
| #20 | GPT-5.4 Nano High | gpt-5.4-nano-high | OpenAI | Proprietary | 98 | Not listed | Not listed | 3.50s | BFCL pending |
| #21 | Claude Opus 4.6 Search | claude-opus-4-6-search | Anthropic | Proprietary | 98 | 1,219 (rank #2) | Not listed | Not listed | BFCL pending |
| #22 | Claude Opus 4.7 | claude-opus-4-7 | Anthropic | Proprietary | 97 | 1,214 (rank #3) | Not listed | 1.14s | BFCL pending |
| #23 | Claude Opus 4 20250514 | claude-opus-4-20250514 | Anthropic | Proprietary | 96 | Not listed | Not listed | 7.78s | BFCL pending |
| #24 | Claude Opus 4.6 | claude-opus-4-6 | Anthropic | Proprietary | 96 | Not listed | Not listed | 1.49s | BFCL pending |
| #25 | GLM 5.1 | glm-5.1 | Z.ai | Open weights | 96 | Not listed | Not listed | 1.12s | BFCL pending |
| #26 | Claude Sonnet 4 20250514 | claude-sonnet-4-20250514 | Anthropic | Proprietary | 96 | Not listed | Not listed | 9.41s | BFCL pending |
| #27 | Claude Opus 4.6 Thinking | claude-opus-4-6-thinking | Anthropic | Proprietary | 95 | Not listed | Not listed | 11.0s | BFCL pending |
| #28 | Gemini 2.5 Flash | gemini-2.5-flash | Google | Proprietary | 95 | Not listed | Not listed | 11.3s | BFCL pending |
| #29 | Claude Sonnet 4.6 | claude-sonnet-4-6 | Anthropic | Proprietary | 95 | Not listed | Not listed | 1.13s | BFCL pending |
| #30 | Qwen3.7 Max 20260517 | qwen3.7-max-20260517 | Alibaba | Proprietary | 95 | Not listed | Not listed | Not listed | BFCL pending |

Sources for every row: Search Arena, Docs Arena, Code Arena.

Benchmark guide

What the scores mean.

A quick reading key for agent-oriented benchmark signals, especially where tool-use benchmarks have been identified as sources but are not yet live in the table.

Higher is better: agent proxy. Planned: BFCL + Terminal-Bench.

What does the agent proxy measure?

The agent proxy blends available search, document, coding, and latency signals that matter for multi-step workflows. It is useful for orientation, but it is not a full tool-use benchmark until BFCL and Terminal-Bench are wired into the data layer.
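
One way to picture such a blend is a weighted average over whichever signals a model actually has, rescaled to 0-100. The sketch below is illustrative only and is not the page's published formula: the `agent_proxy` function, the `WEIGHTS` values, and the assumption that each signal arrives already normalized to the 0..1 range are all hypothetical.

```python
from typing import Optional

# Hypothetical weights; the page does not publish its actual formula.
WEIGHTS = {"search": 0.3, "docs": 0.2, "code": 0.3, "latency": 0.2}


def agent_proxy(signals: dict[str, Optional[float]]) -> Optional[float]:
    """Blend the available 0..1 signals into a 0..100 index.

    Missing signals (None or absent) are skipped and the remaining
    weights are renormalized, so a model is not scored as zero on a
    feed that simply does not list it.
    """
    available = {
        name: value
        for name, value in signals.items()
        if name in WEIGHTS and value is not None
    }
    if not available:
        return None  # no signal at all: the model cannot be ranked
    total_weight = sum(WEIGHTS[name] for name in available)
    blended = sum(WEIGHTS[name] * value for name, value in available.items())
    return round(100 * blended / total_weight, 1)


# A model with only a code signal of 0.5 and no latency reading.
print(agent_proxy({"code": 0.5, "latency": None}))
```

Under this renormalization, a model with a single strong signal can score as high as a model with several. Whether the page handles missing signals this way is not stated.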

Why is BFCL not part of the score yet?

BFCL is specifically relevant to function calling and tool use, but the page only includes benchmark families once we have a reliable, auditable feed. Until then, BFCL is treated as a planned source rather than an implied score.
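
The difference between a planned source and an implied score can be made explicit in the data model. The following is a minimal sketch under assumed names (`FeedStatus`, `FeedValue`, `label`, and `scoring_value` are hypothetical, not the site's code): a planned feed renders as "pending" and never contributes a number to the index, not even zero.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FeedStatus(Enum):
    LIVE = "live"        # feed is wired in and auditable
    PLANNED = "planned"  # feed is named but not yet ingested


@dataclass(frozen=True)
class FeedValue:
    feed: str
    status: FeedStatus
    score: Optional[float] = None

    def label(self) -> str:
        """Text shown in the table cell."""
        if self.status is FeedStatus.PLANNED:
            return f"{self.feed} pending"
        if self.score is None:
            return "Not listed"
        return f"{self.score:g}"

    def scoring_value(self) -> Optional[float]:
        """Value passed to the index; planned feeds contribute nothing."""
        return self.score if self.status is FeedStatus.LIVE else None


bfcl = FeedValue("BFCL", FeedStatus.PLANNED)
print(bfcl.label())          # BFCL pending
print(bfcl.scoring_value())  # None
```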

Why do search and document scores matter for agents?

Agents often need to retrieve information, inspect documents, and act on context before producing an answer. Search Arena and Docs Arena are proxy signals for those parts of the workflow.

How should latency be interpreted here?

Latency is time to first token. Lower latency can make iterative agent loops feel faster, but it should be balanced against the model's task quality and tool reliability.
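
Time to first token can be measured against any streaming response by timing the gap between starting to read the stream and receiving the first non-empty chunk. The sketch below assumes the stream is a plain Python iterable of text chunks; `time_to_first_token` and `fake_stream` are hypothetical helpers for illustration, not part of any provider SDK or of this page's measurement pipeline.

```python
import time
from typing import Iterable, Iterator, Optional


def time_to_first_token(stream: Iterable[str]) -> Optional[float]:
    """Return seconds until the first non-empty chunk, or None if none arrives."""
    start = time.perf_counter()
    for chunk in stream:
        if chunk:  # skip empty keep-alive chunks
            return time.perf_counter() - start
    return None


def fake_stream(delay_s: float, chunks: list[str]) -> Iterator[str]:
    """Stand-in for a model response: waits, then yields chunks."""
    time.sleep(delay_s)  # runs on the first read, so it is included in the timing
    for chunk in chunks:
        yield chunk


ttft = time_to_first_token(fake_stream(0.05, ["Hello", ", world"]))
print(f"TTFT: {ttft:.2f}s")
```

In an agent loop, this delay is paid on every model call, which is why it adds up faster than it would in a single chat turn.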