Anthropic reversed a policy that had restricted independent AI researchers from using its Claude models to study AI safety and related topics, WIRED reported Thursday, following significant pushback from academics who said the restriction was undermining the research community’s ability to scrutinise AI systems.
The policy in question had prohibited certain categories of research that involved testing Claude’s responses to potentially sensitive prompts, including studies designed to probe the model’s reasoning under adversarial conditions. Critics said the restriction was counterproductive for a company that publicly positions safety research as central to its mission.
AI safety researchers said the policy, as written, could have prevented legitimate academic work examining how large language models behave when queried about harmful topics, which is precisely the kind of research that helps identify and correct dangerous model tendencies. Several academics described it as the equivalent of a pharmaceutical company barring independent drug trials.
Anthropic said in a statement that the reversal reflects a commitment to enabling responsible academic research and that it had worked with researchers to develop clearer guidelines for studies that require testing the model’s behaviour in sensitive domains. The company said it remains committed to transparency about Claude’s capabilities and limitations.
The episode highlights ongoing tensions in the AI industry between safety rhetoric and the practical handling of third-party scrutiny. Researchers have long sought clearer, more permissive policies from major AI labs, arguing that independent red-teaming is essential for identifying risks that internal teams may miss or underreport.
Anthropic is one of the most prominent AI safety-focused companies in the industry, having been founded by former OpenAI researchers who left to build what they described as a more safety-conscious organisation. The policy that was reversed sat uncomfortably with that stated priority, and multiple researchers had publicly flagged the contradiction in the weeks before the reversal.
The incident comes as the AI industry faces growing pressure from policymakers in the US and Europe to demonstrate that safety claims are backed by verifiable external assessments rather than self-reported internal evaluations.
Anthropic’s research access policies and responsible scaling commitments are documented at the Anthropic website. The policy reversal is relevant to the broader AI transparency debate discussed in coverage of Anthropic’s Claude Opus 4.8 model capabilities. The AI safety research landscape is also explored in context of the Apple Intelligence features being introduced in iOS 27. Broader AI model accessibility discussions appear in coverage of the AI-assisted features now appearing across the smartphone market.




