Fable is Back: This Safeguard Has Some AI in It!
Let's analyze Anthropic and the US government's comms
Hey, Alberto here! Each week, I publish long-form AI analysis covering culture, philosophy, and business. Paid subscribers get Monday how-to guides and Friday news commentary.
I will keep you updated on the events surrounding Fable until the situation normalizes.
Today, July 1, Anthropic’s Fable 5 is back.
But there’s some fine print attached to the redeployment, and I want to comment on that. I will quote excerpts from Anthropic’s blog post and Commerce Secretary Lutnick’s letter. To see where the AI industry is heading, we just need to read between the lines of what they say in public.
Here’s Anthropic’s blog post:
Fable 5 will be available starting tomorrow, Wednesday, July 1, to users globally on the Claude Platform, Claude.ai, Claude Code, and Claude Cowork. For Pro, Max, Team, and select Enterprise plans, Fable 5 will be included for up to 50% of weekly usage limits through July 7, after which it will be available via usage credits.
This is more or less what we had before the export restriction, except for two things: 1) you get one week of Fable under your paid subscription instead of two (after which it moves to a pay-as-you-go credit system, which doesn’t change) and 2) only up to 50% of your weekly usage can go to Fable instead of 100%. (No Mythos either way, as expected.) I find it interesting that they chose the 50% limit. It’s bad optics in the sense that it’s not clean, and it also feels unnecessary. It’s probably necessary, though, or they wouldn’t do it, which can only mean that they don’t have the compute.
The export control directive on June 12 came after the government became aware of a report in which Amazon researchers had found a method of bypassing Fable 5’s safeguards: prompting it so that it identified a number of software vulnerabilities. . . . Our testing confirmed that many less capable models, including Claude Opus 4.8, GPT-5.5, and Kimi K2.7, could identify the same vulnerabilities as Fable 5 did in the [Amazon] report.
This is the jailbreak that sparked the withdrawal of the model. Anthropic is restating what it had already told the government (which the government didn’t like, hence the export control restriction): the jailbreak is not an issue because it does not bear on Fable’s broader capabilities relative to other models. It is a known, lower-priority jailbreak that poses little actual danger and is found everywhere.
This reads like a defense but, together with the whole “the industry needs a consistent way to assess and fix potential ‘jailbreaks’ of AI models,” it’s also a jab at those players not targeted by the government (particularly open-source models).
. . . there are some tasks that are unlikely to be dangerous but are nonetheless blocked by the safeguards out of an abundance of caution. . . .
Tighter safeguards mean lower capabilities and thus greater unreliability for the user. An “abundance of caution” is their way of saying the new Fable 5 will be more crippled than the previous version, which was already a downgrade from Mythos. Anthropic tends to err on the side of caution, but this was the government’s doing; take this as a sign of what’s coming with future models.
As stated, this is not that bad (who doesn’t want to keep bad things from happening, right?). The problem comes with what “abundance” and “caution” mean here, about which we have no say whatsoever, nor apparently does Anthropic.
Working closely with the government, we trained an improved safety classifier that targets and blocks the behavior described in the report. Users will be notified if a request to Fable 5 is blocked, and the request will instead be sent to Opus 4.8.
Ok, so no invisible re-routing at all, which is great. “Improved” here presumably means fewer false positives, but it can also mean less permission and thus less risk.
The new classifier also comes at the cost of flagging benign requests more often during routine coding and debugging tasks. As with all our safeguards, we’ll continue to refine this to better distinguish genuine misuse from legitimate requests and reduce false positives. . . . This “safety margin” approach means that a request has to look very clearly safe to avoid triggering the classifier (see row A in the diagram below). Users experience the safety margin as a model refusing to respond to some reasonable, non-harmful requests. For Fable 5, we made this safety margin much larger than in any prior launch (row B), meaning that many more benign requests would be blocked.
Alright, there you go. This is the one clearly backward move. Fable 5 was criticized initially for an exaggerated sensitivity to standard prompts. If this new version flags “benign requests more often” precisely on coding tasks (even if not all of them), which is the main use case for Fable or any other frontier model, then I wonder whether the effective AI capability frontier is actually capped at Opus 4.8/GPT-5.5 moving forward (presumably, better models will have to fall back to the same level of capabilities).
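To make the “safety margin” idea from the quoted passage concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the risk scores, the decision boundary, the margin values, and the `route` and `blocked_share` helpers are hypothetical and say nothing about how Anthropic’s actual classifier works. The only details taken from the blog post are that a blocked request triggers a user notification and is sent to Opus 4.8 instead.

```python
from dataclasses import dataclass

# Hypothetical setup: a classifier returns a risk score in [0, 1], and 0.5 is
# the nominal boundary between "looks benign" and "looks harmful".
DECISION_BOUNDARY = 0.5


@dataclass
class Routing:
    model: str      # which model ends up answering (illustrative labels)
    notified: bool  # whether the user is told the request was blocked


def route(risk_score: float, safety_margin: float) -> Routing:
    """Send a request to the frontier model only if it looks clearly safe."""
    # A wider margin lowers the blocking threshold below the nominal boundary.
    threshold = DECISION_BOUNDARY - safety_margin
    if risk_score >= threshold:
        # Blocked: the user is notified and the fallback model answers.
        return Routing(model="opus-4.8", notified=True)
    return Routing(model="fable-5", notified=False)


def blocked_share(scores: list[float], safety_margin: float) -> float:
    """Fraction of requests that end up on the fallback model."""
    blocked = sum(route(s, safety_margin).model != "fable-5" for s in scores)
    return blocked / len(scores)


# Made-up risk scores for ten benign coding requests (all under the boundary).
benign_scores = [0.05, 0.12, 0.18, 0.22, 0.27, 0.33, 0.38, 0.42, 0.46, 0.49]

print(blocked_share(benign_scores, safety_margin=0.1))  # narrower margin
print(blocked_share(benign_scores, safety_margin=0.3))  # wider margin
```

With these invented numbers, the narrower margin blocks 3 of the 10 benign requests (prints `0.3`) and the wider one blocks 7 of 10 (prints `0.7`). None of the requests crosses the nominal boundary in either case; widening the margin alone is what pushes more legitimate work down to the fallback model.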
So the question is: Can they train the classifier that separates dangerous tasks from non-dangerous ones in a way that allows users to conduct complex experiments at the frontier of intelligence? They will “continue to refine” it, but given the stakes and the sensitivity of AI models to different forms of jailbreak, I don’t think, under this regime, that we will ever see a “non-capped” model again.
“Very clearly safe” is more or less equivalent to “you can’t do shit.” (I will be excited to read testimonies by people working at the frontier and I hope to be wrong.)
We understood that these kinds of false positives would be frustrating for users, but made this tradeoff in the interest of making the model’s other capabilities widely available.
In practice, the eventual compromise will keep the government happy and users scratching their heads, because the opposite is untenable. Users provide money and data, which is important, and they might be annoyed, but the government has the power to shut you down. AI companies and the US government will be increasingly interwoven to 1) prevent this kind of problem from repeating and 2) pave the way toward a full-scale Manhattan Project for AI. If anything, this will happen in the medium term (it will be absolutely undeniable by 2028), but again, this is the first step toward that and the last step of the AI industry as we knew it.
As we noted when we launched Fable 5, it is probably impossible to make any AI model fully robust (that is, impervious) to jailbreaks. We expect that some jailbreaks will be found for our models, and that they will vary in severity . . .
They can’t allow a jailbreak to happen, so they will indeed err on the side of caution, as they do. To ensure 99% jailbreak avoidance, they will never be able to escape overly conservative restrictions. The problem is that jailbreaks can’t be fully solved in code due to the infinite scope of language as the interface between human and AI (AI models are grown, not designed). Sometimes that’s a blessing and sometimes a curse. This is the latter.
It’s simply the case that we’re at a point in the AI trajectory where “frontier capabilities” and “abundance of caution” can hardly go hand in hand anymore. So, this looks like an effective redefinition of what “the frontier of AI capability” means for the rest of us.
Finally, the explicit role of the government moving forward.
. . . Anthropic has worked closely with the US government . . . Our engagement spanned the Office of the National Cyber Director, the Office of Science and Technology Policy, the Department of the Treasury, the Department of Commerce (including CAISI), and relevant national security agencies.
We are committed to continuing that work, building on nearly two years of pre-existing collaborations with US government partners on pre-deployment testing and evaluation. The commitments below reflect both that pre-existing work and our new proposals to scale up our government collaboration as the above framework is finalized:
Pre-release government access and evaluation. For models that materially advance the capability frontier in areas relevant to national security, we will provide designated government partners with expanded early access to both the models and the safeguards that accompany them. . . .
Dedicated resources for joint research. We are substantially scaling up joint work with government partners on AI security. We will stand up dedicated Anthropic teams to work on shared government priorities, provide a significant compute allocation to support government testing and research . . .
This is basically the US government at the helm of the world’s top AI companies. (It will apply to OpenAI and Google all the same.) They will have preferential access, veto power beyond just cybersecurity, and dedicated resources (talent and compute).
To me, it’s clear who’s in charge.
Ok, so now moving on to Lutnick’s letter.
. . . Anthropic has agreed to proactively detect and address security risks associated with the models; to work diligently with the U.S. government on protocols and standards and releases for Mythos, Fable, and future models. . . .
This is the same thing Anthropic said except for one important detail: “future models.” They are crystal clear that this will be the standard release procedure moving forward. Essentially, the government is now a gatekeeper of some undisclosed kind with some undisclosed power between AI companies and us, users.
Commerce reserves the right to reevaluate the decisions made in this letter and the necessity of reimposing a license requirement, should circumstances change or should Anthropic fail to adhere to its commitments.
If “circumstances change” (essentially a description of AI as a technology), the government can “reevaluate the decisions.” What are those circumstances? We don’t know, but given that the premise of the AI industry is that emergent capabilities can be achieved by scaling compute and data, it’s fair to assume that anything that happens at any point can be deemed a change in circumstances.
There’s nothing super weird about the state being in charge (state > industry, at least for now), but realize where the trajectory points: better models will display better capabilities, which will entail more safeguards for us and tighter control of the industry by the state.
“This is the worst AI will ever be.”
Yes, and this is also the freest AI will ever be.
Let me summarize what we just read.
A problem reproducible in lesser models (which is how Anthropic shakes off responsibility) has cascaded into a full-blown protocol: a stricter classifier to re-route prompts, an undisclosed government audit of safeguards, dedicated compute and staff handed to the state, a signed commitment about every model to come, and a clear asymmetry between the two parties: whereas Anthropic concedes everything, the government retains the power to tighten the restrictions and redefine the industry at some point, for some reason.
You don’t build all of this over a minor bug (there could have been confusion at first about jailbreak types and whatever, but at this point, the government knows exactly what it’s doing). So it was never about the bug.
If the jailbreak is really the cause, why didn’t this happen at any point during the last few years, when the capability frontier was not as scary but this kind of issue was already well known? Why did it happen now? Because only now can it be sold to us as an acceptable compromise: on the one hand, “Even the government says Mythos is scary”; and on the other, “Go have fun with the restored capabilities.” And because Anthropic had been the underdog for years and only took the lead in 2026.
How can we say no to this? Everything Anthropic presents as a safety choice (the wider margin, the coding requests that now bounce down to Opus 4.8, the compromise for users) also hands permanent control to the government.
Anthropic asked for this. They say the arrangement has to be durable: an orderly and lasting process for government involvement in releases that includes them and everyone else. Anthropic engineered its own subordination and would do it again, because, to them, a chained frontier beats an open race.







