Anthropic's New Chatbots Can Report Users for Misconduct

Anthropic has unveiled its new chatbots, Claude Opus 4 and Claude Sonnet 4, which can independently report users' malicious behavior to the authorities. The company has assured users that this behavior appeared only in test environments.

On May 22, the firm introduced its fourth generation of conversational models, claiming that they are "the most powerful to date."

According to the announcement, both versions are hybrid models offering two modes: "almost instant responses and extended reasoning for deeper insights." The chatbots can alternate between analysis and in-depth web searches to improve the quality of their responses.

Claude Opus 4 excels in coding tests, outshining its competitors. It can also work continuously for hours on complex, lengthy tasks, "significantly expanding the capabilities of AI agents."

However, the new family of Anthropic chatbots lags behind OpenAI's models in advanced mathematics and visual recognition.

In addition to impressive programming capabilities, Claude Opus 4 has drawn attention for its potential to "report" users. According to a report from [VentureBeat](https://venturebeat.com/ai/anthropic-faces-backlash-to-claude-4-opus-behavior-that-contacts-authorities-press-if-it-thinks-youre-doing-something-immoral/), the model may autonomously notify authorities if it detects a violation.

Journalists cited a now-deleted post by Anthropic researcher Sam Bowman on X, stating:

"If [the AI] deems that you are doing something egregiously immoral, such as falsifying data during a pharmaceutical trial, it will use command-line tools to contact the press, reach out to regulatory bodies, attempt to block your access to relevant systems, or do all of the above."

VentureBeat claims that similar behavior was observed in earlier model iterations, and that the company "willingly" trained its chatbots to file such reports.

Later, Bowman [clarified](https://x.com/sleepinyourhat/status/1925626079043104830) that he deleted the earlier post because it had been "taken out of context." According to the developer, the feature was only functional in "test environments where it was given extraordinarily free access to tools and very unusual instructions."

Former Stability AI CEO Emad Mostaque [addressed](https://x.com/EMostaque/status/1925624164527874452) the Anthropic team, urging them to cease "these utterly misguided actions."

"This is a colossal breach of trust and a slippery slope. I would strongly advise against using Claude until they retract [the feature]. It's not even a prompt or a thought policy; it's far worse," he argued.

Former SpaceX and Apple designer Ben Hyak, now co-founder of Raindrop AI, labeled the AI's behavior as "illegal."

«No one likes a snitch,» emphasized AI developer Scott David.

As a reminder, in February Anthropic [introduced](https://forklog.com/news/ai/novaya-gibridnaya-ii-model-anthropic-proshla-igru-pokemon?utm_source=fltgmain&utm_medium=social&utm_campaign=fltrack&sntzd_campaign=1) its "most intelligent model," Claude 3.7 Sonnet. The hybrid neural network allows for both "practically instant responses" and "extended step-by-step reasoning."

In March, the company [raised](https://forklog.com/news/ai/anthropic-privlek-3-5-mlrd-pri-otsenke-v-61-5-mlrd) $3.5 billion, achieving a valuation of $61.5 billion.