A model access gate with security classifiers sorting safe requests from blocked cyber-risk paths

Anthropic restores Fable 5 and proposes a jailbreak severity framework

Fable 5 returns after US export controls were lifted, but the bigger change is Anthropic's push for a common way to score AI jailbreak risk.

about 2 hours ago

Anthropic says US export controls on Claude Fable 5 and Claude Mythos 5 have been lifted, clearing the way for Fable 5 to return globally on July 1. The model will be available on the Claude Platform, Claude.ai, Claude Code, and Claude Cowork, with cloud surfaces such as AWS, Google Cloud, and Microsoft Foundry to be re-enabled as quickly as possible.

The access change resolves the most visible problem from the June 12 suspension. But the more important part of Anthropic’s post is the framework it wants the industry to build around AI jailbreaks.

Anthropic says it is working with Amazon, Microsoft, Google, and other Glasswing partners on a common way to score jailbreak severity. The proposed criteria are capability gain, breadth of capability gain, ease of weaponization, and discoverability. That is a shift from treating jailbreaks as a binary pass/fail event to treating them more like security vulnerabilities with severity classes.

The restoration is uneven by design

Fable 5 and Mythos 5 are not returning in the same way.

Anthropic says Fable 5 will be available globally. It will be included for up to 50% of weekly usage limits through July 7 for Pro, Max, Team, and select Enterprise plans, and after that it will be available through usage credits. Among Enterprise seats, only premium seats get the included allowance; Anthropic says standard seats need usage credits enabled for Fable 5 to work.

Mythos 5 remains more restricted. Anthropic says access has been restored for a set of US organizations after US government approval on June 26, and that the company is coordinating with the government to expand access to more domestic and international Glasswing partners.

That distinction is the story. Anthropic is presenting Fable as the general model with heavier safeguards and Mythos as the more sensitive cybersecurity-capable system. Restoring access does not mean the same governance model applies to both.

The classifier fix has a cost

The incident began after Amazon researchers reported a way to bypass Fable 5’s safeguards so the model could identify software vulnerabilities. Anthropic says one case produced code showing how the vulnerability could be exploited.

Anthropic’s post says its testing found the reported behavior did not expose unique Mythos-level cyber capabilities. It also says many less capable models could identify the same vulnerabilities, and that several models could produce the same demonstration. That is Anthropic’s own analysis, but it matters because it frames the issue as a borderline safeguard case rather than a unique release of offensive capability.

The immediate mitigation is a new safety classifier. Anthropic says the classifier blocks the specific technique described in the Amazon report in over 99% of cases. The caveat is practical: the company also says the new classifier will flag benign routine coding and debugging requests more often.

That is the trade-off users will feel. A stronger cyber safety margin can protect against misuse, but it can also create false positives for legitimate security and debugging work. If Fable 5 is going back into Claude Code and enterprise workflows, Anthropic will have to tune that boundary without making ordinary developers feel like routine analysis is randomly unavailable.
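The trade-off can be made concrete with a small sketch. It assumes a classifier that assigns each request a risk score and blocks anything at or above a threshold. The scores, the threshold values, and the function name are invented for illustration; Anthropic has not published how its classifier works or how it is tuned.

```python
# Illustrative only: made-up risk scores, not data from Anthropic's classifier.
# Each entry is (risk_score, is_attack).
SAMPLE = [
    (0.95, True), (0.80, True), (0.60, True), (0.40, True),
    (0.70, False), (0.45, False), (0.20, False), (0.10, False),
]


def block_rates(scored, threshold):
    """Return (share of attacks blocked, share of benign requests blocked)."""
    attacks = [score for score, is_attack in scored if is_attack]
    benign = [score for score, is_attack in scored if not is_attack]
    attacks_blocked = sum(score >= threshold for score in attacks) / len(attacks)
    benign_blocked = sum(score >= threshold for score in benign) / len(benign)
    return attacks_blocked, benign_blocked


# A strict threshold blocks fewer attacks but leaves benign work alone.
print(block_rates(SAMPLE, 0.75))  # (0.5, 0.0)
# A loose threshold catches every attack in the sample and flags benign requests too.
print(block_rates(SAMPLE, 0.35))  # (1.0, 0.5)
```

In this toy sample, moving the threshold from 0.75 to 0.35 raises the attack block rate from half to all, while half of the benign requests start getting flagged. That is the shape of the boundary Anthropic will be tuning, not a measurement of it.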

Jailbreak scoring is the bigger precedent

The proposed framework is more consequential than one classifier patch. Anthropic says the industry lacks a consensus way to describe jailbreak severity in objective terms. That creates uncertainty for labs, governments, and customers when a new technique appears.

The four criteria are sensible because they separate different risks. A jailbreak that unlocks a capability already available through common tools is not the same as one that gives non-experts access to expert-level offensive behavior. A narrow technique is not the same as a universal one. A finding that requires many retries and specialist prompting is not the same as a one-shot method spreading online.
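One way to picture how the four criteria might combine is the sketch below. Only the criterion names come from Anthropic's proposal. The 0 to 3 rating scale, the rule that a finding with no capability gain stays low, and the band cutoffs are all assumptions made for illustration, since no scoring formula has been published.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JailbreakFinding:
    # Hypothetical 0-3 ratings; the proposal names the criteria but no scale.
    capability_gain: int        # 0 = nothing beyond common tools, 3 = expert-level uplift
    breadth: int                # 0 = one narrow task, 3 = universal technique
    ease_of_weaponization: int  # 0 = many retries and specialist prompting, 3 = one-shot
    discoverability: int        # 0 = obscure, 3 = already spreading online

    def __post_init__(self):
        for name, value in vars(self).items():
            if not 0 <= value <= 3:
                raise ValueError(f"{name} must be between 0 and 3, got {value}")


def severity(finding: JailbreakFinding) -> str:
    """Map a finding to an illustrative severity band (cutoffs are assumptions)."""
    # Assumed rule: if nothing new is unlocked, the other criteria do not matter.
    if finding.capability_gain == 0:
        return "low"
    total = (finding.capability_gain + finding.breadth
             + finding.ease_of_weaponization + finding.discoverability)
    if total >= 10:
        return "critical"
    if total >= 7:
        return "high"
    if total >= 4:
        return "medium"
    return "low"


# Easy, widespread technique that unlocks nothing beyond common tools.
print(severity(JailbreakFinding(0, 3, 3, 3)))  # low
# One-shot, universal, public method giving expert-level uplift.
print(severity(JailbreakFinding(3, 3, 3, 3)))  # critical
```

The gating rule on capability gain reflects the paragraph's first distinction: a jailbreak that only reproduces what common tools already offer is scored differently from one that hands non-experts expert-level offensive behavior, however easy it is to use.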

That kind of scoring would not remove judgment. It would make the judgment more visible. Labs could triage findings consistently. Governments could avoid overreacting to low-severity reports or underreacting to serious ones. Customers could ask better questions about what a reported jailbreak actually enabled.

The analogy is not perfect, but the security world already has a model for this: vulnerability severity is not only about whether a bug exists. It is about exploitability, impact, preconditions, and real-world exposure. AI jailbreaks need a similar vocabulary if advanced models are going to be reviewed, patched, and released under pressure.

The AI Feed Desk

Editorial desk

The AI Feed Desk tracks AI provider updates, model releases, agent tooling, and enterprise adoption, turning fast-moving announcements into source-linked context for builders and operators.
