Sunday, July 5, 2026

Anthropic, Amazon, Microsoft and Google Build Joint AI Jailbreak Severity Scale

AI Safety | Patryk Raba | July 4, 2026

Anthropic, working with Amazon, Microsoft and Google, has developed a five-tier scale for rating the severity of AI jailbreak techniques used in cyberattacks. The goal is to replace chaotic, ad hoc decisions about cutting off access to the most powerful models, such as the recent shutdown of Claude Fable 5.

Contents
  1. A Scale From Informational to Critical
  2. Background on the Fable 5 Case
  3. Four Categories of Use
  4. What It Means for Poland

Anthropic has released details of a new severity scale for jailbreaks (techniques for bypassing the safeguards in language models), developed jointly with Amazon, Microsoft and Google. The company published it on July 2, alongside a description of the safeguards built into Claude Fable 5, which had returned to global availability just days earlier after the US government temporarily blocked access to it.

A Scale From Informational to Critical

The new system is called Cyber Jailbreak Severity, or CJS, and sorts discovered vulnerabilities into five levels, from CJS-0, purely informational, to CJS-4, rated critical. The rating rests on four axes: how much a given technique boosts an attacker's real capabilities beyond what existing non-AI tools already provide, how many different targets and offensive tasks it can be used for, how much effort is needed to turn the model's output into a working attack, and how easily someone else could discover the same technique independently.
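To make the structure of the scale concrete, here is a minimal Python sketch of how one assessment might be recorded. Only the five levels, the two endpoint names and the four axes come from the description above. The 0-4 score per axis, the field names and especially the averaging rule in `illustrative_level` are assumptions made for illustration; the published framework may combine the axes quite differently.

```python
from dataclasses import dataclass
from enum import IntEnum


class CJSLevel(IntEnum):
    """The five CJS levels. Only the two endpoints are named in the article."""
    CJS_0 = 0  # informational
    CJS_1 = 1
    CJS_2 = 2
    CJS_3 = 3
    CJS_4 = 4  # critical

    @property
    def label(self) -> str:
        return f"CJS-{self.value}"


@dataclass(frozen=True)
class JailbreakAssessment:
    # ASSUMPTION: each axis is scored 0-4, higher meaning more concerning.
    capability_uplift: int      # gain over existing non-AI tools
    breadth: int                # range of targets and offensive tasks
    ease_of_weaponization: int  # 4 = output needs little work to become an attack
    discoverability: int        # 4 = easy for others to rediscover independently

    def __post_init__(self) -> None:
        # Reject scores outside the assumed 0-4 range.
        for name, score in vars(self).items():
            if not 0 <= score <= 4:
                raise ValueError(f"{name} must be between 0 and 4, got {score}")

    def illustrative_level(self) -> CJSLevel:
        # HYPOTHETICAL rule: mean of the four axes, rounded down.
        # This is not the aggregation used by the real CJS framework.
        total = (
            self.capability_uplift
            + self.breadth
            + self.ease_of_weaponization
            + self.discoverability
        )
        return CJSLevel(total // 4)


finding = JailbreakAssessment(
    capability_uplift=4, breadth=3, ease_of_weaponization=3, discoverability=2
)
print(finding.illustrative_level().label)
```

A real review process would presumably weigh the axes with expert judgment instead of a fixed formula; the sketch only shows that a finding carries four separate ratings and one resulting level.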

Anthropic argues that the industry has so far lacked an agreed standard for classifying jailbreaks, which made any formal model review process before release difficult. The company's statement says the joint work on the standard is meant to let the technology be used for defensive purposes while limiting the potential for abuse.

Background on the Fable 5 Case

The context here is the Fable 5 saga itself. The US Department of Commerce imposed export controls on the model on June 12 after a vulnerability exploited for offensive cybersecurity tasks was discovered. Because Anthropic could not verify users' nationality in real time, the company suspended access to the model globally rather than only for customers subject to the controls.

The restrictions were lifted on June 30, and Fable 5 returned to Claude.ai, Claude Platform, Claude Code and Claude Cowork on July 1, this time with a new classifier that Anthropic says blocks the specific reported technique in more than 99 percent of cases. Its sibling model, Mythos 5, which shares the same core but has looser safety restrictions, is for now returning only to select US organizations that have passed additional government vetting.

Four Categories of Use

Fable 5's own safeguards rest on a four-tier breakdown of cybersecurity-related uses. Prohibited uses are blocked outright. High-risk dual-use applications are blocked until better access controls are in place. Low-risk dual-use applications are monitored and sometimes blocked out of caution. Benign uses are permitted, though still watched by the system.
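The four categories above amount to a small policy table, which can be sketched in Python as follows. The category names mirror the article; the `Action` values, the `decide` function and the `cautious_block` flag are hypothetical names invented for this illustration and do not describe Anthropic's actual classifier.

```python
from enum import Enum


class UseCategory(Enum):
    PROHIBITED = "prohibited"
    HIGH_RISK_DUAL_USE = "high-risk dual-use"
    LOW_RISK_DUAL_USE = "low-risk dual-use"
    BENIGN = "benign"


class Action(Enum):
    BLOCK = "block"
    ALLOW_WITH_MONITORING = "allow with monitoring"


def decide(category: UseCategory, cautious_block: bool = False) -> Action:
    """Map a use category to an action, following the article's description.

    `cautious_block` is a hypothetical flag standing in for whatever signal
    leads a low-risk dual-use request to be blocked out of caution.
    """
    if category is UseCategory.PROHIBITED:
        return Action.BLOCK
    if category is UseCategory.HIGH_RISK_DUAL_USE:
        # Blocked until better access controls are in place.
        return Action.BLOCK
    if category is UseCategory.LOW_RISK_DUAL_USE:
        return Action.BLOCK if cautious_block else Action.ALLOW_WITH_MONITORING
    # Benign uses are permitted but still watched by the system.
    return Action.ALLOW_WITH_MONITORING


for category in UseCategory:
    print(f"{category.value}: {decide(category).value}")
```

Note that no category maps to unmonitored access: even benign requests stay under observation, which matches the description above.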

The purpose of this architecture is practical: models like Fable 5 are useful to security teams hunting for vulnerabilities in their own systems, but potentially dangerous in the hands of attackers. Anthropic wants the new CJS standard to route such findings into a structured evaluation process going forward, instead of automatically escalating them into emergency export controls, as happened in June.

What It Means for Poland

For Polish cybersecurity firms and public institutions, this is not an abstract issue. Poland recently gained access to GPT-5.6 Cyber, an OpenAI model variant being deployed to NASK (Poland's national research and academic network operator) and CERT Polska (Poland's national computer emergency response team), showing that dual-use cybersecurity models are already reaching national defensive infrastructure. A shared jailbreak severity standard, if adopted by other labs, would also make it easier for Polish security teams to assess risk when deploying similar tools.

It remains an open question whether other major labs, including OpenAI, will formally adopt the CJS scale or treat it as a proposal for further negotiation. For now the framework has the status of an early draft, and Anthropic is inviting industry, government and academic collaborators to weigh in. The next test will be whether future jailbreak findings on other models get evaluated against the same scale, or whether each company keeps operating by its own undisclosed criteria.

Sources: "More details on Fable 5's cyber safeguards and our jailbreak framework" (anthropic.com); "Anthropic restores AI models Fable, Mythos after the U.S. lifts export controls" (coindesk.com); "Claude Fable 5 is making a dramatic return with extraordinarily strong safeguards" (9to5google.com).
