Anthropic says its Project Glasswing effort has used Claude Mythos Preview, with roughly 50 partners, to find more than 10,000 high- or critical-severity vulnerabilities across important software. The headline is not just that an AI model can find bugs. It is that the operational bottleneck may be moving from vulnerability discovery to triage, coordinated disclosure, patch engineering and safe deployment.
That makes Project Glasswing a cybersecurity story, a frontier-model story and a software-maintenance story at the same time. The strongest immediate reading is defensive: organisations with critical code now have a brief window to use these tools before similar capabilities become easier for attackers to access. The uncomfortable part is that validation and patching capacity may not scale as quickly as model capability.
What Anthropic reported
Anthropic published its initial Project Glasswing update on 22 May 2026. The company said it launched Glasswing the previous month as a collaborative effort to secure critical software before increasingly capable AI models can be used against it. Anthropic says Claude Mythos Preview and approximately 50 partners have found more than 10,000 high- or critical-severity vulnerabilities.
The company also added a crucial caveat: many details cannot be disclosed immediately because vulnerability disclosure is intentionally delayed to protect end users. Anthropic points to the common 90-day disclosure convention, or roughly 45 days after a patch becomes available, and says detailed technical write-ups will come later once patches are widely deployed.
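To make the disclosure timing concrete, here is a minimal Python sketch of the two windows mentioned above. The function name `earliest_writeup_date` and the rule for combining the windows (use the post-patch window once a patch exists, otherwise fall back to 90 days from the report) are assumptions for illustration, not Anthropic's stated policy.

```python
from datetime import date, timedelta
from typing import Optional

DISCLOSURE_WINDOW_DAYS = 90  # the common 90-day convention
POST_PATCH_WINDOW_DAYS = 45  # "roughly 45 days" after a patch is available


def earliest_writeup_date(reported: date, patched: Optional[date] = None) -> date:
    """Hypothetical rule: patch date + 45 days if patched, else report date + 90 days."""
    if patched is not None:
        return patched + timedelta(days=POST_PATCH_WINDOW_DAYS)
    return reported + timedelta(days=DISCLOSURE_WINDOW_DAYS)


# Illustrative dates only; they do not describe any real finding.
unpatched = earliest_writeup_date(date(2026, 5, 22))
patched = earliest_writeup_date(date(2026, 5, 22), patched=date(2026, 6, 1))
```

Under these assumptions, a finding reported on 22 May 2026 with no patch would not be written up before 20 August 2026, while one patched on 1 June 2026 could be written up from 16 July 2026.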
| Claim | Reported figure | Source status | What it means |
|---|---|---|---|
| Project partners | Approximately 50 | Anthropic-reported | Glasswing is not a single internal scan. It is a multi-organisation defensive effort. |
| Partner findings | More than 10,000 high- or critical-severity vulnerabilities | Anthropic-reported | The scale is large enough to stress normal triage and patching workflows. |
| Open-source projects scanned | More than 1,000 | Anthropic-reported | Anthropic is scanning widely used open-source code, not only partner-owned code. |
| Open-source total findings | 23,019 estimated vulnerabilities | Anthropic-reported | Includes estimated low, medium, high and critical findings before full validation. |
| Open-source high or critical estimates | 6,202 | Anthropic-reported | The most important subset for urgent security work. |
| Independently assessed high or critical estimates | 1,752 | Anthropic-reported | Findings assessed by six independent security research firms or, in a small number of cases, by Anthropic. |
| Valid true positives among assessed findings | 1,587, or 90.6% | Anthropic-reported | Suggests the signal is high, though it applies to the assessed subset. |
| Confirmed high or critical among assessed findings | 1,094, or 62.4% | Anthropic-reported | Indicates many findings remained severe after review, but not all did. |
The most important sentence in Anthropic's update is not the biggest number. It is the observation that progress used to be limited by finding vulnerabilities, while now it is limited by how quickly humans can verify, disclose and patch the vulnerabilities AI finds.
Why this is different from another bug bounty story
Bug bounties, fuzzing, static analysis and elite manual review have all found important vulnerabilities for years. Glasswing is different because the claims sit at the intersection of scale, reasoning and exploit validation.
Cloudflare, one of the partner organisations, described Mythos Preview as a clear step forward rather than a small refinement. Cloudflare said the model stood out in two areas: constructing exploit chains and generating proofs. In practical terms, that means the model was not merely pointing at suspicious code. It could connect lower-level primitives into higher-severity exploit paths and test whether a suspected bug behaved as expected.
Mozilla's public account is also unusually concrete. It says Firefox 150 included fixes for 271 vulnerabilities identified during its initial Claude Mythos Preview evaluation. Mozilla had previously used Claude Opus 4.6 for Firefox 148 and found 22 security-sensitive bugs, making its Mythos result more than ten times larger by Mozilla's own comparison.
| Organisation | Publicly reported result | Practical signal | Caveat |
|---|---|---|---|
| Cloudflare | 2,000 bugs found, including 400 high or critical, according to Anthropic's summary of Cloudflare's report | Mythos could produce useful findings across critical-path systems | Cloudflare's detailed false-positive rate is described qualitatively, not as a public percentage. |
| Mozilla | 271 Firefox vulnerabilities fixed in Firefox 150 after initial Mythos evaluation | Strong evidence that AI-assisted review can surface real browser bugs | Mozilla's own posts note the work required harnessing, steering and filtering models. |
| wolfSSL | 8 CVEs and wolfSSL 5.9.1 triggered by Mythos findings | A concrete cryptography-library example with shipped fixes | The public wolfSSL post is vendor-reported. |
| Palo Alto Networks | 26 CVEs representing 75 issues in one May advisory, versus a usual volume of fewer than 5 CVEs a month | AI-assisted review may increase security-advisory volume sharply | Palo Alto says this reflects frontier AI testing and broader process changes. |
| Microsoft | Larger Patch Tuesday releases expected for some time | Large vendors are preparing for a higher-volume disclosure environment | Microsoft does not attribute the change to one model. |
| Oracle | Monthly Critical Security Patch Updates starting in May 2026 | Patch cadence is changing in response to faster discovery | Oracle frames this as AI plus existing security operations. |
The open-source numbers deserve careful reading
The open-source part of Glasswing is especially important because it points at supply-chain risk. Anthropic says it scanned more than 1,000 open-source projects that underpin much of the internet and its own infrastructure. The model estimated 23,019 total vulnerabilities, with 6,202 in the high- or critical-severity category.
Those are not all confirmed vulnerabilities. Anthropic says 1,752 high- or critical-rated findings have been assessed so far. Of those, 1,587 were valid true positives and 1,094 were confirmed as high or critical. Anthropic projects that, even if the model found no further vulnerabilities, current post-triage true-positive rates would put Mythos on track to surface nearly 3,900 high- or critical-severity open-source vulnerabilities, in addition to partner findings.
| Metric | Count | Percentage | Interpretation |
|---|---|---|---|
| Total estimated open-source findings | 23,019 | 100% | Full estimated pool across severity levels. |
| Estimated high or critical open-source findings | 6,202 | 26.9% of total estimated findings | Priority pool for urgent review. |
| Assessed high or critical-rated findings | 1,752 | 28.2% of the estimated high or critical pool | The reviewed subset so far. |
| Valid true positives in assessed subset | 1,587 | 90.6% of assessed findings | Strong signal if the assessed subset is representative. |
| Confirmed high or critical in assessed subset | 1,094 | 62.4% of assessed findings | Severe findings remain a majority after review. |
| Projected high or critical open-source findings at current rates | Nearly 3,900 | About 62.4% of 6,202 | Anthropic's forward-looking estimate based on current triage rates. |
This is the most defensible way to write the story: Anthropic is not saying every estimated finding is already confirmed. It is saying the confirmed subset is large and accurate enough to change the defender workload.
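The percentages in the table can be reproduced from the reported counts. The short Python sketch below does that arithmetic; the variable names are mine, and the final projection simply applies the confirmed high-or-critical rate from the assessed subset to the full estimated pool. That is one way to arrive at "nearly 3,900" and not necessarily Anthropic's exact method.

```python
# Counts as reported by Anthropic for the open-source scan.
total_estimated = 23_019
high_critical_estimated = 6_202
assessed = 1_752
true_positives = 1_587
confirmed_high_critical = 1_094


def pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, to one decimal place."""
    return round(100 * part / whole, 1)


share_high_critical = pct(high_critical_estimated, total_estimated)  # 26.9
share_assessed = pct(assessed, high_critical_estimated)              # 28.2
true_positive_rate = pct(true_positives, assessed)                   # 90.6
confirmed_rate = pct(confirmed_high_critical, assessed)              # 62.4

# Assumption: the assessed subset is representative of the whole pool.
projected_confirmed = round(
    high_critical_estimated * confirmed_high_critical / assessed
)  # 3873
```

The projection is only as good as the representativeness assumption: if the assessed findings were chosen because they looked most credible, the rate for the remaining pool could be lower.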
The wolfSSL case shows why the stakes are high
Anthropic highlighted one open-source example in wolfSSL, a widely used cryptography library. wolfSSL says Mythos findings generated 8 CVEs and triggered the release of wolfSSL 5.9.1. Anthropic says one now-patched issue, assigned CVE-2026-5194, involved certificate forgery risk that could allow an attacker to make a fake website appear legitimate to an end user.
That example matters because cryptography libraries are systemic. A single library can sit inside devices, applications and infrastructure far removed from the original maintainers. If AI models can uncover bugs in code that already goes through reviews, fuzzing, static analysis and external audit, defenders need a way to feed AI-assisted discovery into existing release and dependency management systems without overwhelming maintainers.
Benchmarks back the concern, but benchmarks are not the whole story
The UK AI Security Institute says Claude Mythos Preview and GPT-5.5 have substantially exceeded previous cyber time-horizon trends in its narrow cyber task suite. AISI also says a newer Mythos Preview checkpoint was the first model to complete both of its cyber ranges end to end.
The ExploitGym paper adds another piece. It defines exploitation as turning a vulnerability into concrete security impact, such as unauthorised file access or code execution. The benchmark includes 898 instances from real-world vulnerabilities across userspace programs, Google's V8 JavaScript engine and the Linux kernel. The paper reports that Claude Mythos Preview and GPT-5.5 were the strongest configurations, producing working exploits for 157 and 120 instances respectively.
| Evidence type | What it supports | What it does not prove on its own |
|---|---|---|
| Partner reports | AI-assisted scanning is producing real patches in real codebases. | It does not provide a universal false-positive rate across all code. |
| Open-source triage data | Anthropic has a large reviewed subset with high true-positive rates. | It does not mean every estimated finding is already validated. |
| AISI cyber ranges | Frontier models are improving on autonomous cyber tasks. | Lab ranges do not equal every real-world attack path. |
| ExploitGym | AI agents can turn some real vulnerabilities into working exploits. | Benchmark success does not directly translate to unrestricted real-world capability. |
| Vendor patching changes | Major vendors expect larger or more frequent patch work. | Not every increase is caused by Mythos alone. |
The responsible conclusion is not panic. It is that model-assisted vulnerability discovery is now credible enough that security teams should treat it as an operational planning issue.
The new bottleneck: triage, disclosure and patch delivery
Finding a vulnerability is only the first step. Someone still needs to decide whether it is real, how severe it is, whether it is exploitable in realistic conditions, which versions are affected, how to fix it, how to test the fix, how to disclose it, how to ship it and how to get users to update.
That is where Glasswing becomes a systems problem. A model can generate thousands of leads faster than a response team can safely process them. If maintainers receive low-quality reports, they drown. If high-quality reports arrive faster than patch teams can act, attackers may gain a map of what to study after patches ship. If disclosure is rushed, users may be exposed. If disclosure is too slow, risk remains buried.
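A toy model shows why this is a capacity problem rather than a quality problem. The Python sketch below tracks a validation backlog week by week; the function `backlog_over_time` and every number passed to it are hypothetical and chosen only to illustrate the dynamic.

```python
from typing import List


def backlog_over_time(
    arrivals_per_week: int,
    capacity_per_week: int,
    weeks: int,
    start: int = 0,
) -> List[int]:
    """Return the unvalidated-report backlog at the end of each week."""
    backlog = start
    history = []
    for _ in range(weeks):
        # Reports arrive, the team clears what it can; backlog never goes negative.
        backlog = max(0, backlog + arrivals_per_week - capacity_per_week)
        history.append(backlog)
    return history


# Hypothetical figures: 120 credible reports a week against capacity for 80.
growing = backlog_over_time(arrivals_per_week=120, capacity_per_week=80, weeks=4)
# Same team once capacity exceeds arrivals, starting from a backlog of 100.
draining = backlog_over_time(arrivals_per_week=50, capacity_per_week=80, weeks=3, start=100)
```

When arrivals exceed capacity the backlog grows linearly without limit, which is why the options are to raise validation throughput or to filter harder before reports reach maintainers.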
| Workflow stage | Old constraint | New constraint if Glasswing scales | What leaders should change |
|---|---|---|---|
| Discovery | Scarce elite researchers and uneven scanner coverage | AI can generate many more plausible findings | Build model-assisted scanning into secure development lifecycles. |
| Validation | Confirming a small flow of reports | Confirming a flood of high-signal reports | Add repeatable reproduction, severity and exploitability workflows. |
| Disclosure | Coordinating a manageable set of vulnerabilities | Coordinating many simultaneous reports across dependencies | Strengthen coordinated vulnerability disclosure capacity. |
| Patching | Scheduled release trains and emergency fixes | Larger routine security releases | Prepare teams and customers for higher patch volume. |
| Prioritisation | CVSS score plus local context | CVSS plus exploitability, exposure, KEV status and asset criticality | Use risk-based patch queues rather than raw severity alone. |
| Customer adoption | Getting users to install updates | Getting users to handle more frequent updates | Invest in automated updates, exposure reduction and detection. |
What defenders should do now
The near-term defensive playbook is not complicated, but it is operationally demanding. Security teams should assume discovery volume will rise, and that competitors, researchers and eventually attackers will have better tools for code reasoning and exploit development.
Start with the code that matters most: internet-facing services, authentication, parsers, sandbox boundaries, cryptography, update mechanisms, administrative interfaces and heavily used open-source dependencies. Apply AI-assisted review there first, but do not treat model output as ground truth. The Mozilla and Cloudflare examples both point to the importance of harnesses, context and human review.
Second, widen patch capacity before the queue arrives. That means clear ownership for security fixes, faster test pipelines, release processes for emergency and routine patches, and an agreed standard for when a finding is credible enough to move.
Third, join up vulnerability management with exposure management. A high CVSS score is not always the same thing as immediate organisational risk, and a lower-scored bug in an exposed critical asset can be urgent. CISA's Known Exploited Vulnerabilities catalogue remains useful because exploitation in the wild is a strong prioritisation signal. FIRST's CVSS framework is also useful, but the CVSS specification itself says consumers should enrich base metrics with threat and environmental context.
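As a rough illustration of a risk-based patch queue, the Python sketch below orders findings by CVSS base score adjusted for KEV status, internet exposure and asset criticality. The `Finding` fields, the 1-to-3 criticality scale and all of the weights are assumptions invented for this example; they are not part of CVSS or of any CISA guidance, and a real programme would tune them to its own environment.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Finding:
    ident: str
    cvss_base: float        # CVSS base score, 0.0 to 10.0
    in_kev: bool            # listed in CISA's Known Exploited Vulnerabilities catalogue
    internet_exposed: bool  # affected asset is reachable from the internet
    asset_criticality: int  # hypothetical scale: 1 (low) to 3 (high)


def priority(f: Finding) -> float:
    """Illustrative score: base severity plus hypothetical context weights."""
    score = f.cvss_base
    if f.in_kev:
        score += 4.0  # known exploitation is treated as the strongest signal
    if f.internet_exposed:
        score += 2.0
    score += 1.0 * (f.asset_criticality - 1)
    return score


def patch_queue(findings: Iterable[Finding]) -> List[Finding]:
    """Highest-priority findings first."""
    return sorted(findings, key=priority, reverse=True)


# Made-up findings: a severe bug on an internal low-value host versus
# a medium bug that is exploited in the wild on an exposed critical asset.
queue = patch_queue([
    Finding("internal-rce", 9.8, in_kev=False, internet_exposed=False, asset_criticality=1),
    Finding("edge-auth-bypass", 6.5, in_kev=True, internet_exposed=True, asset_criticality=3),
])
```

With these weights the medium-severity but exploited and exposed finding is patched first, which is the behaviour raw severity ordering would miss.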
The strategic read
Project Glasswing is a preview of a world where software security moves faster on both sides. Defenders may finally get tools that can find classes of bugs before attackers do. Attackers may eventually get similar capabilities. The difference will come down to whether serious software organisations can turn discovery into disciplined remediation faster than adversaries can turn discovery into exploitation.
Anthropic's strongest claim is that the security bottleneck has moved. The evidence so far supports taking that claim seriously, with caveats. The numbers are large, partner reports are concrete and independent evaluations point in the same direction. But the evidence should not be overstated. Many findings remain undisclosed, some figures are company-reported and the detailed technical evidence will arrive slowly because responsible disclosure exists for a reason.
For now, the practical lesson is clear: AI-assisted vulnerability discovery is no longer a research curiosity. It is becoming a patch-management, software-supply-chain and executive-risk problem.

About the author
Hi, I'm Jason Futrill.
I'm a tech professional and commentator exploring how intelligent systems are reshaping work, creativity, and society.