Anthropic Glasswing: Claude Mythos and the AI bug flood

Anthropic says its Project Glasswing effort has used Claude Mythos Preview, with roughly 50 partners, to find more than 10,000 high- or critical-severity vulnerabilities across important software. The headline is not just that an AI model can find bugs. It is that the operational bottleneck may be moving from vulnerability discovery to triage, coordinated disclosure, patch engineering and safe deployment.

That makes Project Glasswing a cybersecurity story, a frontier-model story and a software-maintenance story at the same time. The strongest immediate reading is defensive: organisations with critical code now have a brief window to use these tools before similar capabilities become easier for attackers to access. The uncomfortable part is that validation and patching capacity may not scale as quickly as model capability.

What Anthropic reported

Anthropic published its initial Project Glasswing update on 22 May 2026. The company said it launched Glasswing the previous month as a collaborative effort to secure critical software before increasingly capable AI models can be used against it. Anthropic says Claude Mythos Preview and approximately 50 partners have found more than 10,000 high- or critical-severity vulnerabilities.

The company also made a crucial caveat: many details cannot be disclosed immediately because vulnerability disclosure is intentionally delayed to protect end users. Anthropic points to the common 90-day disclosure convention, or roughly 45 days after a patch becomes available, and says detailed technical write-ups will come later once patches are widely deployed.
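
The two timing conventions mentioned above can be turned into a date calculation. The Python sketch below is one possible reading, not Anthropic's stated policy: it assumes details may be published 45 days after a patch ships if a patch exists, and otherwise 90 days after the report. The function name and the example dates are hypothetical.

```python
from datetime import date, timedelta
from typing import Optional

REPORT_WINDOW = timedelta(days=90)      # common 90-day disclosure convention
POST_PATCH_WINDOW = timedelta(days=45)  # "roughly 45 days" after a patch

def earliest_publication(reported: date, patched: Optional[date] = None) -> date:
    """Assumed rule: patch date + 45 days if patched, else report date + 90 days."""
    if patched is not None:
        return patched + POST_PATCH_WINDOW
    return reported + REPORT_WINDOW

# Hypothetical dates, for illustration only.
print(earliest_publication(date(2026, 4, 10)))                    # no patch yet
print(earliest_publication(date(2026, 4, 10), date(2026, 5, 1)))  # patch shipped
```

Real coordinated-disclosure policies add grace periods and exceptions, so the result is best treated as a planning estimate rather than a deadline.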

Claim	Reported figure	Source status	What it means
Project partners	Approximately 50	Anthropic-reported	Glasswing is not a single internal scan. It is a multi-organisation defensive effort.
Partner findings	More than 10,000 high- or critical-severity vulnerabilities	Anthropic-reported	The scale is large enough to stress normal triage and patching workflows.
Open-source projects scanned	More than 1,000	Anthropic-reported	Anthropic is scanning widely used open-source code, not only partner-owned code.
Open-source total findings	23,019 estimated vulnerabilities	Anthropic-reported	Includes estimated low, medium, high and critical findings before full validation.
Open-source high or critical estimates	6,202	Anthropic-reported	The most important subset for urgent security work.
Independently assessed high or critical estimates	1,752	Anthropic-reported	Findings assessed by six independent security research firms or, in a small number of cases, by Anthropic.
Valid true positives among assessed findings	1,587, or 90.6%	Anthropic-reported	Suggests the signal is high, though it applies only to the assessed subset.
Confirmed high or critical among assessed findings	1,094, or 62.4%	Anthropic-reported	Indicates many findings remained severe after review, but not all did.

The most important sentence in Anthropic's update is not the biggest number. It is the observation that progress used to be limited by finding vulnerabilities, while now it is limited by how quickly humans can verify, disclose and patch the vulnerabilities AI finds.

Why this is different from another bug bounty story

Bug bounties, fuzzing, static analysis and elite manual review have all found important vulnerabilities for years. Glasswing is different because the claims sit at the intersection of scale, reasoning and exploit validation.

Cloudflare, one of the partner organisations, described Mythos Preview as a clear step forward rather than a small refinement. Cloudflare said the model stood out in two areas: constructing exploit chains and generating proofs. In practical terms, that means the model was not merely pointing at suspicious code. It could connect lower-level primitives into higher-severity exploit paths and test whether a suspected bug behaved as expected.

Mozilla's public account is also unusually concrete. It says Firefox 150 included fixes for 271 vulnerabilities identified during its initial Claude Mythos Preview evaluation. Mozilla had previously used Claude Opus 4.6 for Firefox 148 and found 22 security-sensitive bugs, making its Mythos result more than ten times larger by Mozilla's own comparison.

Organisation	Publicly reported result	Practical signal	Caveat
Cloudflare	2,000 bugs found, including 400 high or critical, according to Anthropic's summary of Cloudflare's report	Mythos could produce useful findings across critical-path systems	Cloudflare's detailed false-positive rate is described qualitatively, not as a public percentage.
Mozilla	271 Firefox vulnerabilities fixed in Firefox 150 after initial Mythos evaluation	Strong evidence that AI-assisted review can surface real browser bugs	Mozilla's own posts note the work required harnessing, steering and filtering models.
wolfSSL	8 CVEs and wolfSSL 5.9.1 triggered by Mythos findings	A concrete cryptography-library example with shipped fixes	The public wolfSSL post is vendor-reported.
Palo Alto Networks	26 CVEs representing 75 issues in one May advisory, versus a usual volume of fewer than 5 CVEs a month	AI-assisted review may increase security-advisory volume sharply	Palo Alto says this reflects frontier AI testing and broader process changes.
Microsoft	Larger Patch Tuesday releases expected for some time	Large vendors are preparing for a higher-volume disclosure environment	Microsoft does not attribute the change to one model.
Oracle	Monthly Critical Security Patch Updates starting in May 2026	Patch cadence is changing in response to faster discovery	Oracle frames this as AI plus existing security operations.

The open-source numbers deserve careful reading

The open-source part of Glasswing is especially important because it points at supply-chain risk. Anthropic says it scanned more than 1,000 open-source projects that underpin much of the internet and its own infrastructure. The model estimated 23,019 total vulnerabilities, with 6,202 in the high- or critical-severity category.

Those are not all confirmed vulnerabilities. Anthropic says 1,752 high- or critical-rated findings have been assessed so far. Of those, 1,587 were valid true positives and 1,094 were confirmed as high or critical. Anthropic projects that, if no further vulnerabilities were found, current post-triage true-positive rates would put Mythos on track to surface nearly 3,900 high- or critical-severity open-source vulnerabilities, in addition to partner findings.

Metric	Count	Percentage	Interpretation
Total estimated open-source findings	23,019	100%	Full estimated pool across severity levels.
Estimated high or critical open-source findings	6,202	26.9% of total estimated findings	Priority pool for urgent review.
Assessed high or critical-rated findings	1,752	28.2% of the estimated high or critical pool	The reviewed subset so far.
Valid true positives in assessed subset	1,587	90.6% of assessed findings	Strong signal if the assessed subset is representative.
Confirmed high or critical in assessed subset	1,094	62.4% of assessed findings	Severe findings remain a majority after review.
Projected high or critical open-source findings at current rates	Nearly 3,900	About 62.4% of 6,202	Anthropic's forward-looking estimate based on current triage rates.
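
The percentages and the projection in the table can be reproduced from the reported counts. The short Python sketch below does the arithmetic. The variable names are illustrative, and the projection simply applies the confirmed rate from the assessed subset to the whole estimated high or critical pool, which assumes that subset is representative.

```python
# Counts as reported by Anthropic (see the table above).
total_estimated = 23_019
high_critical_estimated = 6_202
assessed = 1_752
true_positives = 1_587
confirmed_high_critical = 1_094

def pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, to one decimal place."""
    return round(100 * part / whole, 1)

share_high_critical = pct(high_critical_estimated, total_estimated)
share_assessed = pct(assessed, high_critical_estimated)
true_positive_rate = pct(true_positives, assessed)
confirmed_rate = pct(confirmed_high_critical, assessed)

# Extrapolate the confirmed rate to the full estimated high or critical pool.
projected = high_critical_estimated * confirmed_high_critical / assessed

print(share_high_critical, share_assessed, true_positive_rate, confirmed_rate)
print(round(projected))
```

The projection lands a little under 3,900, consistent with Anthropic's "nearly 3,900" figure.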

The most defensible reading is that Anthropic is not saying every estimated finding is already confirmed. It is saying the confirmed subset is large and accurate enough to change the defender workload.

The wolfSSL case shows why the stakes are high

Anthropic highlighted one open-source example in wolfSSL, a widely used cryptography library. wolfSSL says Mythos findings generated 8 CVEs and triggered the release of wolfSSL 5.9.1. Anthropic says one now-patched issue, assigned CVE-2026-5194, involved certificate forgery risk that could allow an attacker to make a fake website appear legitimate to an end user.

That example matters because cryptography libraries are systemic. A single library can sit inside devices, applications and infrastructure far removed from the original maintainers. If AI models can uncover bugs in code that already goes through reviews, fuzzing, static analysis and external audit, defenders need a way to feed AI-assisted discovery into existing release and dependency management systems without overwhelming maintainers.

Benchmarks back the concern, but benchmarks are not the whole story

The UK AI Security Institute says Claude Mythos Preview and GPT-5.5 have substantially exceeded previous cyber time-horizon trends in its narrow cyber task suite. AISI also says a newer Mythos Preview checkpoint was the first model to complete both of its cyber ranges end to end.

The ExploitGym paper adds another piece. It defines exploitation as turning a vulnerability into concrete security impact, such as unauthorised file access or code execution. The benchmark includes 898 instances from real-world vulnerabilities across userspace programs, Google's V8 JavaScript engine and the Linux kernel. The paper reports that Claude Mythos Preview and GPT-5.5 were the strongest configurations, producing working exploits for 157 and 120 instances respectively.
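
To put those counts in proportion, the Python sketch below converts them to success rates. It assumes both counts are measured against the full set of 898 instances, which the summary above implies but does not state outright.

```python
# Figures as cited above; the shared denominator is an assumption.
total_instances = 898
working_exploits = {"Claude Mythos Preview": 157, "GPT-5.5": 120}

# Success rate per model, as a percentage to one decimal place.
success_rates = {
    model: round(100 * count / total_instances, 1)
    for model, count in working_exploits.items()
}

for model, rate in success_rates.items():
    print(f"{model}: {rate}% of instances")
```

On that assumption, both models exploited well under a fifth of the benchmark, which is why the results read as a trend signal rather than proof of general capability.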

Evidence type	What it supports	What it does not prove on its own
Partner reports	AI-assisted scanning is producing real patches in real codebases.	It does not provide a universal false-positive rate across all code.
Open-source triage data	Anthropic has a large reviewed subset with high true-positive rates.	It does not mean every estimated finding is already validated.
AISI cyber ranges	Frontier models are improving on autonomous cyber tasks.	Lab ranges do not equal every real-world attack path.
ExploitGym	AI agents can turn some real vulnerabilities into working exploits.	Benchmark success does not directly translate to unrestricted real-world capability.
Vendor patching changes	Major vendors expect larger or more frequent patch work.	Not every increase is caused by Mythos alone.

The responsible conclusion is not panic. It is that model-assisted vulnerability discovery is now credible enough that security teams should treat it as an operational planning issue.

The new bottleneck: triage, disclosure and patch delivery

Finding a vulnerability is only the first step. Someone still needs to decide whether it is real, how severe it is, whether it is exploitable in realistic conditions, which versions are affected, how to fix it, how to test the fix, how to disclose it, how to ship it and how to get users to update.

That is where Glasswing becomes a systems problem. A model can generate thousands of leads faster than a response team can safely process them. If maintainers receive low-quality reports, they drown. If high-quality reports arrive faster than patch teams can act, attackers may gain a map of what to study after patches ship. If disclosure is rushed, users may be exposed. If disclosure is too slow, risk remains buried.
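
The capacity mismatch can be made concrete with a toy queue model. The Python sketch below assumes a constant weekly arrival rate of findings and a constant weekly validation capacity. Both numbers are hypothetical and are not drawn from any Glasswing data.

```python
def simulate_backlog(arrivals_per_week: int, capacity_per_week: int, weeks: int) -> list[int]:
    """Return the unprocessed backlog at the end of each week."""
    backlog = 0
    history = []
    for _ in range(weeks):
        # New findings arrive, the team clears what it can, backlog never goes negative.
        backlog = max(0, backlog + arrivals_per_week - capacity_per_week)
        history.append(backlog)
    return history

# Hypothetical rates: 120 findings a week against capacity for 80.
print(simulate_backlog(120, 80, 4))  # [40, 80, 120, 160]
# If capacity exceeds arrivals, the queue stays empty.
print(simulate_backlog(60, 80, 4))   # [0, 0, 0, 0]
```

Even this crude model shows the core problem: once arrivals exceed capacity, the backlog grows every week without limit until capacity is added or intake is filtered.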

Workflow stage	Old constraint	New constraint if Glasswing scales	What leaders should change
Discovery	Scarce elite researchers and uneven scanner coverage	AI can generate many more plausible findings	Build model-assisted scanning into secure development lifecycles.
Validation	Confirming a small flow of reports	Confirming a flood of high-signal reports	Add repeatable reproduction, severity and exploitability workflows.
Disclosure	Coordinating a manageable set of vulnerabilities	Coordinating many simultaneous reports across dependencies	Strengthen coordinated vulnerability disclosure capacity.
Patching	Scheduled release trains and emergency fixes	Larger routine security releases	Prepare teams and customers for higher patch volume.
Prioritisation	CVSS score plus local context	CVSS plus exploitability, exposure, KEV status and asset criticality	Use risk-based patch queues rather than raw severity alone.
Customer adoption	Getting users to install updates	Getting users to handle more frequent updates	Invest in automated updates, exposure reduction and detection.

What defenders should do now

The near-term defensive playbook is not complicated, but it is operationally demanding. Security teams should assume discovery volume will rise, and that competitors, researchers and eventually attackers will have better tools for code reasoning and exploit development.

Start with the code that matters most: internet-facing services, authentication, parsers, sandbox boundaries, cryptography, update mechanisms, administrative interfaces and heavily used open-source dependencies. Apply AI-assisted review there first, but do not treat model output as ground truth. The Mozilla and Cloudflare examples both point to the importance of harnesses, context and human review.

Second, widen patch capacity before the queue arrives. That means clear ownership for security fixes, faster test pipelines, release processes for emergency and routine patches, and an agreed standard for when a finding is credible enough to move.

Third, join up vulnerability management with exposure management. A high CVSS score is not always the same thing as immediate organisational risk, and a lower-scored bug in an exposed critical asset can be urgent. CISA's Known Exploited Vulnerabilities catalogue remains useful because exploitation in the wild is a strong prioritisation signal. FIRST's CVSS framework is also useful, but CVSS itself says consumers should enrich base metrics with threat and environmental context.
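
A risk-based patch queue of the kind described above might look like the following Python sketch. The `Finding` fields, the additive scoring and every weight are hypothetical choices made for illustration. They are not part of CVSS, the KEV catalogue or any vendor's method, and a real programme would calibrate them against its own environment.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    ident: str
    cvss_base: float        # CVSS base score, 0.0 to 10.0
    in_kev: bool            # listed in CISA's Known Exploited Vulnerabilities catalogue
    internet_exposed: bool  # affected asset is reachable from the internet
    asset_criticality: int  # local scale: 1 (low) to 3 (high)

def priority(f: Finding) -> float:
    """Illustrative score: base severity enriched with threat and environment context."""
    score = f.cvss_base
    if f.in_kev:
        score += 4.0        # assumed weight for known exploitation in the wild
    if f.internet_exposed:
        score += 2.0        # assumed weight for exposure
    score += f.asset_criticality - 1
    return score

# Hypothetical findings.
findings = [
    Finding("A", 9.1, False, False, 1),
    Finding("B", 6.5, True, True, 3),
    Finding("C", 7.8, False, True, 2),
]

queue = sorted(findings, key=priority, reverse=True)
print([f.ident for f in queue])  # ['B', 'C', 'A']
```

Finding B has the lowest base score but tops the queue because it is known to be exploited, exposed and on a critical asset, which is the pattern the paragraph above describes.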

The strategic read

Project Glasswing is a preview of a world where software security moves faster on both sides. Defenders may finally get tools that can find classes of bugs before attackers do. Attackers may eventually get similar capabilities. The difference will come down to whether serious software organisations can turn discovery into disciplined remediation faster than adversaries can turn discovery into exploitation.

Anthropic's strongest claim is that the security bottleneck has moved. The evidence so far supports taking that claim seriously, with caveats. The numbers are large, partner reports are concrete and independent evaluations point in the same direction. But certainty should not be overstated. Many findings remain undisclosed, some figures are company-reported and the detailed technical evidence will arrive slowly because responsible disclosure exists for a reason.

For now, the practical lesson is clear: AI-assisted vulnerability discovery is no longer a research curiosity. It is becoming a patch-management, software-supply-chain and executive-risk problem.
