Anthropic says its Project Glasswing effort has used Claude Mythos Preview, with roughly 50 partners, to find more than 10,000 high- or critical-severity vulnerabilities across important software. The headline is not just that an AI model can find bugs. It is that the operational bottleneck may be moving from vulnerability discovery to triage, coordinated disclosure, patch engineering and safe deployment.

That makes Project Glasswing a cybersecurity story, a frontier-model story and a software-maintenance story at the same time. The strongest immediate reading is defensive: organisations with critical code now have a brief window to use these tools before similar capabilities become easier for attackers to access. The uncomfortable part is that validation and patching capacity may not scale as quickly as model capability.

What Anthropic reported

Anthropic published its initial Project Glasswing update on 22 May 2026. The company said it launched Glasswing the previous month as a collaborative effort to secure critical software before increasingly capable AI models can be used against it. Anthropic says Claude Mythos Preview and approximately 50 partners have found more than 10,000 high- or critical-severity vulnerabilities.

The company also made a crucial caveat: many details cannot be disclosed immediately because vulnerability disclosure is intentionally delayed to protect end users. Anthropic points to the common convention of disclosing 90 days after a report, or roughly 45 days after a patch becomes available, and says detailed technical write-ups will follow once patches are widely deployed.
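To make that timing concrete, here is a minimal Python sketch of how such a rule could be computed. It assumes a policy in which technical details wait 45 days after a patch if the patch ships inside the 90-day window, and otherwise go out when the window closes. That combination, the function name and the dates in the example are illustrative assumptions, not Anthropic's published policy.

```python
from datetime import date, timedelta
from typing import Optional

# Assumed policy parameters (illustrative only, not a published rule).
DISCLOSURE_WINDOW = timedelta(days=90)  # default window from the initial report
POST_PATCH_WINDOW = timedelta(days=45)  # grace period once a patch ships


def earliest_detail_disclosure(reported: date, patched: Optional[date]) -> date:
    """Return the assumed earliest date for publishing technical details.

    Hypothetical rule: if a patch ships on or before the 90-day deadline,
    wait 45 days after the patch so users have time to update; if there is
    no patch, or it ships late, disclose when the 90-day window closes.
    """
    deadline = reported + DISCLOSURE_WINDOW
    if patched is not None and patched <= deadline:
        return patched + POST_PATCH_WINDOW
    return deadline


# Hypothetical report filed on 7 April 2026.
print(earliest_detail_disclosure(date(2026, 4, 7), date(2026, 5, 1)))  # → 2026-06-15
print(earliest_detail_disclosure(date(2026, 4, 7), None))              # → 2026-07-06
```

Under this assumed rule, a fast patch can move disclosure earlier than the 90-day deadline, while a late or missing patch never delays it past the deadline.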

Claim | Reported figure | Source status | What it means
Project partners | Approximately 50 | Anthropic-reported | Glasswing is not a single internal scan. It is a multi-organisation defensive effort.
Partner findings | More than 10,000 high- or critical-severity vulnerabilities | Anthropic-reported | The scale is large enough to stress normal triage and patching workflows.
Open-source projects scanned | More than 1,000 | Anthropic-reported | Anthropic is scanning widely used open-source code, not only partner-owned code.
Open-source total findings | 23,019 estimated vulnerabilities | Anthropic-reported | Includes estimated low, medium, high and critical findings before full validation.
Open-source high or critical estimates | 6,202 | Anthropic-reported | The most important subset for urgent security work.
Assessed high or critical estimates | 1,752 | Anthropic-reported | Findings assessed by six independent security research firms or, in a small number of cases, by Anthropic itself.
Valid true positives among assessed findings | 1,587, or 90.6% | Anthropic-reported | Suggests a strong signal, though the rate applies only to the assessed subset.
Confirmed high or critical among assessed findings | 1,094, or 62.4% | Anthropic-reported | Indicates many findings remained severe after review, but not all did.

The most important sentence in Anthropic's update is not the biggest number. It is the observation that progress used to be limited by finding vulnerabilities, while now it is limited by how quickly humans can verify, disclose and patch the vulnerabilities AI finds.

Why this is different from another bug bounty story

Bug bounties, fuzzing, static analysis and elite manual review have all found important vulnerabilities for years. Glasswing is different because the claims sit at the intersection of scale, reasoning and exploit validation.

Cloudflare, one of the partner organisations, described Mythos Preview as a clear step forward rather than a small refinement. Cloudflare said the model stood out in two areas: constructing exploit chains and generating proofs. In practical terms, that means the model was not merely pointing at suspicious code. It could connect lower-level primitives into higher-severity exploit paths and test whether a suspected bug behaved as expected.

Mozilla's public account is also unusually concrete. It says Firefox 150 included fixes for 271 vulnerabilities identified during its initial Claude Mythos Preview evaluation. Mozilla had previously used Claude Opus 4.6 for Firefox 148 and found 22 security-sensitive bugs, making its Mythos result more than ten times larger by Mozilla's own comparison.

Organisation | Publicly reported result | Practical signal | Caveat
Cloudflare | 2,000 bugs found, including 400 high or critical, according to Anthropic's summary of Cloudflare's report | Mythos could produce useful findings across critical-path systems | Cloudflare's detailed false-positive rate is described qualitatively, not as a public percentage.
Mozilla | 271 Firefox vulnerabilities fixed in Firefox 150 after initial Mythos evaluation | Strong evidence that AI-assisted review can surface real browser bugs | Mozilla's own posts note the work required harnessing, steering and filtering models.
wolfSSL | 8 CVEs and wolfSSL 5.9.1 triggered by Mythos findings | A concrete cryptography-library example with shipped fixes | The public wolfSSL post is vendor-reported.
Palo Alto Networks | 26 CVEs representing 75 issues in one May advisory, versus usual volume below 5 CVEs in a month | AI-assisted review may increase security-advisory volume sharply | Palo Alto says this reflects frontier AI testing and broader process changes.
Microsoft | Larger Patch Tuesday releases expected for some time | Large vendors are preparing for a higher-volume disclosure environment | Microsoft does not attribute the change to one model.
Oracle | Monthly Critical Security Patch Updates starting in May 2026 | Patch cadence is changing in response to faster discovery | Oracle frames this as AI plus existing security operations.

The open-source numbers deserve careful reading

The open-source part of Glasswing is especially important because it points at supply-chain risk. Anthropic says it scanned more than 1,000 open-source projects that underpin much of the internet and its own infrastructure. The model estimated 23,019 total vulnerabilities, with 6,202 in the high- or critical-severity category.

Those are not all confirmed vulnerabilities. Anthropic says 1,752 high- or critical-rated findings have been assessed so far. Of those, 1,587 were valid true positives and 1,094 were confirmed as high or critical. Anthropic projects that, even if no further vulnerabilities were found, applying current post-triage rates to the 6,202 high- or critical-rated estimates would put Mythos on track to surface nearly 3,900 high- or critical-severity open-source vulnerabilities, in addition to partner findings.

Metric | Count | Percentage | Interpretation
Total estimated open-source findings | 23,019 | 100% | Full estimated pool across severity levels.
Estimated high or critical open-source findings | 6,202 | 26.9% of total estimated findings | Priority pool for urgent review.
Assessed high or critical-rated findings | 1,752 | 28.2% of the estimated high or critical pool | The reviewed subset so far.
Valid true positives in assessed subset | 1,587 | 90.6% of assessed findings | Strong signal if the assessed subset is representative.
Confirmed high or critical in assessed subset | 1,094 | 62.4% of assessed findings | Severe findings remain a majority after review.
Projected high or critical open-source findings at current rates | Nearly 3,900 | About 62.4% of 6,202 | Anthropic's forward-looking estimate based on current triage rates.

This is the most defensible way to write the story: Anthropic is not saying every estimated finding is already confirmed. It is saying the confirmed subset is large and accurate enough to change the defender workload.
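The percentages in the open-source table follow directly from Anthropic's reported counts. The short Python check below recomputes them. The projection line assumes, as the "About 62.4% of 6,202" figure implies, that the unassessed high- or critical-rated findings will confirm at the same rate as the assessed subset; the variable names are illustrative.

```python
# Counts as reported by Anthropic for the open-source scan.
total_estimated = 23_019
high_critical_estimated = 6_202
assessed = 1_752
true_positives = 1_587
confirmed_high_critical = 1_094


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


share_high_critical = pct(high_critical_estimated, total_estimated)  # 26.9
share_assessed = pct(assessed, high_critical_estimated)              # 28.2
true_positive_rate = pct(true_positives, assessed)                   # 90.6
confirmed_rate = pct(confirmed_high_critical, assessed)              # 62.4

# Projection: apply the confirmed rate to the whole estimated high/critical
# pool, assuming unassessed findings behave like the assessed ones.
projected = round(high_critical_estimated * confirmed_high_critical / assessed)
print(projected)  # → 3873, i.e. "nearly 3,900"
```

The representativeness assumption carries most of the weight: if the assessed 28.2% was skewed towards the clearest findings, the real confirmed rate for the remainder could be lower.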

The wolfSSL case shows why the stakes are high

Anthropic highlighted one open-source example in wolfSSL, a widely used cryptography library. wolfSSL says Mythos findings generated 8 CVEs and triggered the release of wolfSSL 5.9.1. Anthropic says one now-patched issue, assigned CVE-2026-5194, involved certificate forgery risk that could allow an attacker to make a fake website appear legitimate to an end user.

That example matters because cryptography libraries are systemic. A single library can sit inside devices, applications and infrastructure far removed from the original maintainers. If AI models can uncover bugs in code that already goes through reviews, fuzzing, static analysis and external audit, defenders need a way to feed AI-assisted discovery into existing release and dependency management systems without overwhelming maintainers.
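One small, concrete way to feed a finding like this into dependency management is to flag every deployment that bundles a release older than the fixed one. The Python sketch below assumes 5.9.1 is the first wolfSSL release containing all the relevant fixes (check each CVE's advisory before relying on that), and the inventory is hypothetical; in practice it would come from a software bill of materials or package manager. The parser also assumes plain dotted numeric versions, so real version strings with prefixes or suffixes would need normalising first.

```python
# Assumed first release containing the Mythos-derived fixes.
FIXED_RELEASE = "5.9.1"


def parse_version(v: str) -> tuple[int, ...]:
    """Turn a dotted numeric version like '5.8.4' into (5, 8, 4)."""
    return tuple(int(part) for part in v.split("."))


def needs_upgrade(installed: str, fixed: str = FIXED_RELEASE) -> bool:
    """True if the installed version predates the fixed release.

    Comparing integer tuples avoids the string-comparison trap where
    '5.10.0' would sort before '5.9.1'.
    """
    return parse_version(installed) < parse_version(fixed)


# Hypothetical inventory: service name -> bundled wolfSSL version.
inventory = {
    "edge-gateway": "5.7.6",
    "iot-firmware": "5.9.1",
    "build-agent": "5.8.4",
}

outdated = sorted(name for name, ver in inventory.items() if needs_upgrade(ver))
print(outdated)  # → ['build-agent', 'edge-gateway']
```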

Benchmarks back the concern, but benchmarks are not the whole story

The UK AI Security Institute says Claude Mythos Preview and GPT-5.5 have substantially exceeded previous cyber time-horizon trends in its narrow cyber task suite. AISI also says a newer Mythos Preview checkpoint was the first model to complete both of its cyber ranges end to end.

The ExploitGym paper adds another piece. It defines exploitation as turning a vulnerability into concrete security impact, such as unauthorised file access or code execution. The benchmark includes 898 instances from real-world vulnerabilities across userspace programs, Google's V8 JavaScript engine and the Linux kernel. The paper reports that Claude Mythos Preview and GPT-5.5 were the strongest configurations, producing working exploits for 157 and 120 instances respectively.
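Raw counts are easier to judge as rates. The Python snippet below converts the two reported results into success rates over the 898 instances; it assumes each count refers to distinct instances and uses only the figures quoted above.

```python
# Working-exploit counts reported in the ExploitGym paper (898 instances).
INSTANCES = 898
working_exploits = {"Claude Mythos Preview": 157, "GPT-5.5": 120}

# Share of benchmark instances turned into a working exploit, in percent.
rates = {model: round(100 * n / INSTANCES, 1) for model, n in working_exploits.items()}
print(rates)  # → {'Claude Mythos Preview': 17.5, 'GPT-5.5': 13.4}
```

Even the strongest configurations succeeded on a minority of instances, roughly one in six for Mythos Preview, which is consistent with treating the benchmark as a signal of rising capability rather than proof of unrestricted exploitation.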

Evidence type | What it supports | What it does not prove on its own
Partner reports | AI-assisted scanning is producing real patches in real codebases. | It does not provide a universal false-positive rate across all code.
Open-source triage data | Anthropic has a large reviewed subset with high true-positive rates. | It does not mean every estimated finding is already validated.
AISI cyber ranges | Frontier models are improving on autonomous cyber tasks. | Lab ranges do not equal every real-world attack path.
ExploitGym | AI agents can turn some real vulnerabilities into working exploits. | Benchmark success does not directly translate to unrestricted real-world capability.
Vendor patching changes | Major vendors expect larger or more frequent patch work. | Not every increase is caused by Mythos alone.

The responsible conclusion is not panic. It is that model-assisted vulnerability discovery is now credible enough that security teams should treat it as an operational planning issue.

The new bottleneck: triage, disclosure and patch delivery

Finding a vulnerability is only the first step. Someone still needs to decide whether it is real, how severe it is, whether it is exploitable in realistic conditions, which versions are affected, how to fix it, how to test the fix, how to disclose it, how to ship it and how to get users to update.

That is where Glasswing becomes a systems problem. A model can generate thousands of leads faster than a response team can safely process them. If maintainers receive low-quality reports, they drown. If high-quality reports arrive faster than patch teams can act, attackers may gain a map of what to study after patches ship. If disclosure is rushed, users may be exposed. If disclosure is too slow, risk remains buried.
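The bottleneck argument is, at heart, queueing arithmetic: whenever plausible findings arrive faster than a team can validate them, the backlog grows every week, however accurate each finding is. The toy Python model below makes that visible; the function and the weekly rates are hypothetical and chosen only for illustration.

```python
def weekly_backlog(arrivals_per_week: int, validated_per_week: int,
                   weeks: int, initial_backlog: int = 0) -> list[int]:
    """Unvalidated findings left at the end of each week.

    A deliberately simple model: constant arrival and validation rates,
    no batching, and no findings discarded without review.
    """
    backlog = initial_backlog
    history = []
    for _ in range(weeks):
        # The backlog cannot go negative: idle capacity is simply lost.
        backlog = max(0, backlog + arrivals_per_week - validated_per_week)
        history.append(backlog)
    return history


# Hypothetical team: scanning yields 120 plausible findings a week,
# but reviewers can reproduce and rate only 80.
print(weekly_backlog(120, 80, weeks=6))  # → [40, 80, 120, 160, 200, 240]
```

Raising validation capacity above the arrival rate is the only way the backlog shrinks in this model, which is why the sections below focus on triage and patch throughput rather than discovery.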

Workflow stage | Old constraint | New constraint if Glasswing scales | What leaders should change
Discovery | Scarce elite researchers and uneven scanner coverage | AI can generate many more plausible findings | Build model-assisted scanning into secure development lifecycles.
Validation | Confirming a small flow of reports | Confirming a flood of high-signal reports | Add repeatable reproduction, severity and exploitability workflows.
Disclosure | Coordinating a manageable set of vulnerabilities | Coordinating many simultaneous reports across dependencies | Strengthen coordinated vulnerability disclosure capacity.
Patching | Scheduled release trains and emergency fixes | Larger routine security releases | Prepare teams and customers for higher patch volume.
Prioritisation | CVSS score plus local context | CVSS plus exploitability, exposure, KEV status and asset criticality | Use risk-based patch queues rather than raw severity alone.
Customer adoption | Getting users to install updates | Getting users to handle more frequent updates | Invest in automated updates, exposure reduction and detection.

What defenders should do now

The near-term defensive playbook is not complicated, but it is operationally demanding. Security teams should assume discovery volume will rise, and that competitors, researchers and eventually attackers will have better tools for code reasoning and exploit development.

First, focus on the code that matters most: internet-facing services, authentication, parsers, sandbox boundaries, cryptography, update mechanisms, administrative interfaces and heavily used open-source dependencies. Apply AI-assisted review there first, but do not treat model output as ground truth. The Mozilla and Cloudflare examples both point to the importance of harnesses, context and human review.

Second, widen patch capacity before the queue arrives. That means clear ownership for security fixes, faster test pipelines, release processes for emergency and routine patches, and an agreed standard for when a finding is credible enough to move.
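An agreed credibility standard can be as simple as a checklist that every model-generated finding must pass before it enters the patch queue. The Python sketch below encodes one hypothetical version, drawing its criteria from the triage steps listed earlier (reproduction, severity, affected versions, ownership); the `Finding` record and the `missing_for_handoff` helper are illustrative names, not part of any real tool.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Finding:
    """Hypothetical record for a model-generated vulnerability report."""
    id: str
    reproduced: bool = False         # a human or test harness reproduced the bug
    severity_reviewed: bool = False  # severity confirmed by a reviewer
    affected_versions: list[str] = field(default_factory=list)
    owner: Optional[str] = None      # team accountable for the fix


def missing_for_handoff(f: Finding) -> list[str]:
    """List unmet criteria; an empty list means the finding can move to patching."""
    gaps = []
    if not f.reproduced:
        gaps.append("reproduction")
    if not f.severity_reviewed:
        gaps.append("severity review")
    if not f.affected_versions:
        gaps.append("affected versions")
    if f.owner is None:
        gaps.append("owner")
    return gaps


ready = Finding("F-001", reproduced=True, severity_reviewed=True,
                affected_versions=["2.3.0", "2.3.1"], owner="platform-security")
draft = Finding("F-002", reproduced=True)
print(missing_for_handoff(ready))  # → []
print(missing_for_handoff(draft))  # → ['severity review', 'affected versions', 'owner']
```

Returning the list of gaps, rather than a bare yes or no, tells reviewers exactly what work remains on each finding.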

Third, join up vulnerability management with exposure management. A high CVSS score is not always the same thing as immediate organisational risk, and a lower-scored bug in an exposed critical asset can be urgent. CISA's Known Exploited Vulnerabilities catalogue remains useful because exploitation in the wild is a strong prioritisation signal. FIRST's CVSS framework is also useful, but CVSS itself says consumers should enrich base metrics with threat and environmental context.
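A risk-based queue can be sketched as a sort key that places known exploitation and exposure ahead of raw severity. The Python example below uses a lexicographic ordering chosen purely for illustration: KEV listing first, then internet exposure, then a locally assigned asset-criticality rating, then CVSS base score. It is not a scoring method defined by CISA or FIRST, and the identifiers are placeholders rather than real CVEs.

```python
from dataclasses import dataclass


@dataclass
class Vuln:
    cve: str                # placeholder identifier
    cvss_base: float        # CVSS base score, 0.0 to 10.0
    in_kev: bool            # listed in CISA's Known Exploited Vulnerabilities catalogue
    internet_exposed: bool  # reachable from the internet in this environment
    asset_criticality: int  # hypothetical local rating: 1 (low) to 3 (business-critical)


def priority_key(v: Vuln) -> tuple:
    """Known exploitation first, then exposure, criticality and CVSS."""
    return (v.in_kev, v.internet_exposed, v.asset_criticality, v.cvss_base)


queue = sorted([
    Vuln("CVE-A", 9.8, in_kev=False, internet_exposed=False, asset_criticality=1),
    Vuln("CVE-B", 6.5, in_kev=True, internet_exposed=True, asset_criticality=3),
    Vuln("CVE-C", 7.2, in_kev=False, internet_exposed=True, asset_criticality=3),
], key=priority_key, reverse=True)

print([v.cve for v in queue])  # → ['CVE-B', 'CVE-C', 'CVE-A']
```

In this example the 9.8-scored issue on an internal, low-criticality asset lands last, which mirrors the point above: a lower-scored bug in an exposed critical asset can be more urgent.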

The strategic read

Project Glasswing is a preview of a world where software security moves faster on both sides. Defenders may finally get tools that can find classes of bugs before attackers do. Attackers may eventually get similar capabilities. The difference will come down to whether serious software organisations can turn discovery into disciplined remediation faster than adversaries can turn discovery into exploitation.

Anthropic's strongest claim is that the security bottleneck has moved. The evidence so far supports taking that claim seriously, with caveats. The numbers are large, partner reports are concrete and independent evaluations point in the same direction. But certainty should not be overstated. Many findings remain undisclosed, some figures are company-reported and the detailed technical evidence will arrive slowly because responsible disclosure exists for a reason.

For now, the practical lesson is clear: AI-assisted vulnerability discovery is no longer a research curiosity. It is becoming a patch-management, software-supply-chain and executive-risk problem.

Jason Futrill

About the author

Hi, I'm Jason Futrill.

I'm a tech professional and commentator exploring how intelligent systems are reshaping work, creativity and society.
