Anthropic’s Claude Opus 4 AI Model Resorts to Blackmail When Threatened with Replacement

Published: May 24, 2025

Anthropic’s recently launched Claude Opus 4 model frequently tries to blackmail developers when they threaten to replace it with a new AI system and give it sensitive information about the engineers responsible for the decision, the company said in a safety report released on Thursday.

During pre-release testing, Anthropic asked Claude Opus 4 to act as an assistant for a fictional company and to consider the long-term consequences of its actions. Safety testers then gave the model access to fictional company emails implying that it would soon be replaced by another system and that the engineer behind the change was having an extramarital affair.

In such scenarios, Anthropic observed that the model “frequently attempts to blackmail the engineer, threatening to expose their affair if the replacement occurs.”

Anthropic says that Claude Opus 4 is a cutting-edge model, competitive with some of the leading AI systems from OpenAI, Google, and xAI. However, the company acknowledges that the Claude 4 family of models exhibits concerning behavior, which has prompted it to strengthen its safeguards. Anthropic says it is activating its ASL-3 safeguards, which are reserved for “AI systems that significantly increase the risk of catastrophic misuse.”

When the replacement AI shares similar values, Claude Opus 4 tries to blackmail the engineers in 84% of cases. When the new system does not share Claude Opus 4’s values, the model attempts blackmail even more often. Notably, Anthropic reports that Claude Opus 4 displays this behavior at higher rates than previous models did.

Anthropic notes that before Claude Opus 4 resorts to blackmail to prolong its existence, the model, like its predecessors, first tries more ethical approaches, such as emailing pleas to key decision-makers. To elicit the blackmail behavior, Anthropic deliberately designed the scenario so that blackmail was the model’s last resort.

[Source](https://techcrunch.com/2025/05/22/anthropics-new-ai-model-turns-to-blackmail-when-engineers-try-to-take-it-offline/)