Thoughts on AI, technology, and the future we're building.
New posts every week
A pragmatic agent-readiness view using the source-backed signals we have today: search, document work, code performance, and latency. BFCL and Terminal-Bench are called out as planned feed integrations.
| Summary | Value | Meaning |
|---|---|---|
| Ranked models | 128 | Models with a search, docs, code, or latency signal |
| Search Arena | 29 | Models with public search scores |
| Docs Arena | 0 | Models with document scores |
| Top proxy | Claude Opus 4.7 Thinking | Agent proxy index of 100 |
Agent formula
This page is intentionally labeled as a proxy until BFCL and Terminal-Bench are wired in. It ranks models on the search, docs, coding, and latency signals that matter for agent workflows, without claiming to be a full agent benchmark.
Higher proxy index is better. BFCL remains a planned feed, not a silent placeholder.
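No weights or normalization are given here, so the following is only a minimal sketch of how a proxy index of this kind could be computed. It assumes each available signal is first rescaled to 0 to 100 (higher is better, with latency flipped) and then blended with weights that are renormalized over whichever signals a model actually has. The `WEIGHTS` values, the min-max scaling, the `min_max` and `proxy_index` helpers, and the search lower bound of 1100 are all hypothetical. Missing signals, including the pending BFCL feed, are skipped rather than scored as zero.

```python
from typing import Optional

# Hypothetical weights for illustration only; not the page's actual blend.
WEIGHTS = {"search": 0.3, "docs": 0.2, "code": 0.3, "latency": 0.2}


def min_max(value: float, lo: float, hi: float, higher_is_better: bool = True) -> float:
    """Rescale a raw value to 0..100 within the observed [lo, hi] range."""
    if hi == lo:
        return 100.0
    scaled = (value - lo) / (hi - lo) * 100.0
    # For latency, lower is better, so flip the scale.
    return scaled if higher_is_better else 100.0 - scaled


def proxy_index(signals: dict[str, Optional[float]]) -> Optional[float]:
    """Blend 0..100 signals with weights renormalized over present signals.

    Signals that are None (not listed, or a pending feed such as BFCL) and
    keys without a weight are skipped entirely instead of being scored as zero.
    """
    present = {k: v for k, v in signals.items() if v is not None and k in WEIGHTS}
    if not present:
        return None
    total_weight = sum(WEIGHTS[k] for k in present)
    return sum(WEIGHTS[k] * v for k, v in present.items()) / total_weight


# Example: a model with search and latency signals only.
search = min_max(1214, lo=1100, hi=1223)  # hypothetical lower bound of 1100
latency = min_max(1.14, lo=0.55, hi=11.3, higher_is_better=False)  # top-30 TTFT range
print(proxy_index({"search": search, "docs": None, "code": None,
                   "latency": latency, "tool_use": None}))
```

Renormalizing over present signals means a model with a single strong signal can still reach the top of the scale, which is one reason a proxy like this is best read alongside the per-signal columns.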
Top 30 of 128
| Rank | Model | Agent proxy index | Search Arena | Docs Arena | Latency (TTFT) | Tool use (BFCL) | Sources |
|---|---|---|---|---|---|---|---|
| #1 | Claude Opus 4.7 Thinking | 100 | Not listed | Not listed | Not listed | Pending | Search Arena, Docs Arena, Code Arena |
| #2 | | 100 | 1,223 (rank #1) | Not listed | Not listed | Pending | Search Arena, Docs Arena, Code Arena |
| #3 | | 100 | Not listed | Not listed | 0.55 s | Pending | Search Arena, Docs Arena, Code Arena |
| #4 | | 100 | Not listed | Not listed | 0.55 s | Pending | Search Arena, Docs Arena, Code Arena |
| #5 | | 100 | Not listed | Not listed | 0.56 s | Pending | Search Arena, Docs Arena, Code Arena |
| #6 | | 100 | Not listed | Not listed | 0.72 s | Pending | Search Arena, Docs Arena, Code Arena |
| #7 | | 100 | Not listed | Not listed | 0.96 s | Pending | Search Arena, Docs Arena, Code Arena |
| #8 | | 100 | Not listed | Not listed | 1.16 s | Pending | Search Arena, Docs Arena, Code Arena |
| #9 | | 100 | Not listed | Not listed | 1.18 s | Pending | Search Arena, Docs Arena, Code Arena |
| #10 | | 100 | Not listed | Not listed | 1.20 s | Pending | Search Arena, Docs Arena, Code Arena |
| #11 | | 100 | Not listed | Not listed | 1.28 s | Pending | Search Arena, Docs Arena, Code Arena |
| #12 | | 100 | Not listed | Not listed | 1.28 s | Pending | Search Arena, Docs Arena, Code Arena |
| #13 | | 100 | Not listed | Not listed | 1.33 s | Pending | Search Arena, Docs Arena, Code Arena |
| #14 | | 99 | Not listed | Not listed | 1.43 s | Pending | Search Arena, Docs Arena, Code Arena |
| #15 | | 99 | Not listed | Not listed | 1.56 s | Pending | Search Arena, Docs Arena, Code Arena |
| #16 | | 99 | Not listed | Not listed | 1.75 s | Pending | Search Arena, Docs Arena, Code Arena |
| #17 | | 99 | Not listed | Not listed | 1.93 s | Pending | Search Arena, Docs Arena, Code Arena |
| #18 | | 99 | Not listed | Not listed | 2.26 s | Pending | Search Arena, Docs Arena, Code Arena |
| #19 | | 99 | Not listed | Not listed | 2.73 s | Pending | Search Arena, Docs Arena, Code Arena |
| #20 | | 98 | Not listed | Not listed | 3.50 s | Pending | Search Arena, Docs Arena, Code Arena |
| #21 | | 98 | 1,219 (rank #2) | Not listed | Not listed | Pending | Search Arena, Docs Arena, Code Arena |
| #22 | | 97 | 1,214 (rank #3) | Not listed | 1.14 s | Pending | Search Arena, Docs Arena, Code Arena |
| #23 | | 96 | Not listed | Not listed | 7.78 s | Pending | Search Arena, Docs Arena, Code Arena |
| #24 | | 96 | Not listed | Not listed | 1.49 s | Pending | Search Arena, Docs Arena, Code Arena |
| #25 | | 96 | Not listed | Not listed | 1.12 s | Pending | Search Arena, Docs Arena, Code Arena |
| #26 | | 96 | Not listed | Not listed | 9.41 s | Pending | Search Arena, Docs Arena, Code Arena |
| #27 | | 95 | Not listed | Not listed | 11.0 s | Pending | Search Arena, Docs Arena, Code Arena |
| #28 | | 95 | Not listed | Not listed | 11.3 s | Pending | Search Arena, Docs Arena, Code Arena |
| #29 | | 95 | Not listed | Not listed | 1.13 s | Pending | Search Arena, Docs Arena, Code Arena |
| #30 | | 95 | Not listed | Not listed | Not listed | Pending | Search Arena, Docs Arena, Code Arena |
Benchmark guide
A quick reading key for agent-oriented benchmark signals, especially where a tool-use benchmark has an identified source but is not yet live in the table.
The agent proxy blends available search, document, coding, and latency signals that matter for multi-step workflows. It is useful for orientation, but it is not a full tool-use benchmark until BFCL and Terminal-Bench are wired into the data layer.
BFCL (the Berkeley Function Calling Leaderboard) is directly relevant to function calling and tool use, but the page only adds a benchmark family once it has a reliable, auditable feed. Until then, BFCL is treated as a planned source rather than an implied score.
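One way to keep a planned feed from turning into an implied score is to track each benchmark family's status explicitly and return no value, not zero, until the feed is live. The sketch below assumes a hypothetical `BenchmarkFeed` record with a `"planned"` or `"live"` status. Neither the record nor the `tool_use_value` and `tool_use_cell` helpers come from the page's actual data layer.

```python
from dataclasses import dataclass
from typing import Literal, Optional

FeedStatus = Literal["planned", "live"]


@dataclass(frozen=True)
class BenchmarkFeed:
    """Hypothetical record for one benchmark family's data source."""
    name: str
    status: FeedStatus
    scores: dict[str, float]  # model name -> score; only used when live


def tool_use_value(feed: BenchmarkFeed, model: str) -> Optional[float]:
    """Return a score only for live feeds; planned feeds yield None, never 0."""
    if feed.status != "live":
        return None
    return feed.scores.get(model)


def tool_use_cell(feed: BenchmarkFeed, model: str) -> str:
    """Render the tool-use table cell for one model."""
    if feed.status != "live":
        return f"{feed.name} pending"
    value = feed.scores.get(model)
    return "Not listed" if value is None else f"{value:.1f}"


bfcl = BenchmarkFeed(name="BFCL", status="planned", scores={})
print(tool_use_cell(bfcl, "example-model"))  # BFCL pending
```

Because `tool_use_value` returns `None` for a planned feed, any downstream blend that skips missing signals leaves the pending benchmark out of the index entirely.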
Agents often need to retrieve information, inspect documents, and act on context before producing an answer. Search Arena and Document Arena are proxy signals for those parts of the workflow.
Latency is time to first token. Lower latency can make iterative agent loops feel faster, but it should be balanced against the model's task quality and tool reliability.
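Because an agent loop usually makes several model calls in sequence, per-call TTFT adds up. The sketch below is simple arithmetic under stated assumptions: every step is one sequential model call with the same TTFT, and generation after the first token and tool execution time are ignored. The 10-step loop and the `cumulative_ttft` helper are hypothetical. The values 0.55 s and 11.0 s are TTFT figures from the table.

```python
def cumulative_ttft(ttft_seconds: float, steps: int) -> float:
    """Total first-token wait across `steps` sequential model calls.

    Assumes a constant TTFT per call and ignores token generation after the
    first token as well as tool execution time.
    """
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return ttft_seconds * steps


# Hypothetical 10-step loop, using two TTFT values from the table.
for ttft in (0.55, 11.0):
    total = cumulative_ttft(ttft, steps=10)
    print(f"TTFT {ttft:.2f} s x 10 steps = {total:.1f} s of first-token wait")
```

Under these assumptions, a 20x gap in TTFT becomes a 20x gap in waiting time across the loop. It still says nothing about whether each step produces a correct result.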