Headline: «Уязвимость ИИ-браузеров: как инъекция промпта угрожает безопасности цифровых помощников» Translation: AI Browsers Vulnerability: How Prompt Injection Threatens the Security of Digital Assistants

OpenAI has detailed the vulnerabilities associated with AI browsers and the measures being taken to enhance the security of its own solution, Atlas.

The company acknowledged that prompt injection attacks, which manipulate agents into executing harmful commands, pose a significant risk. This threat is unlikely to disappear anytime soon.

Representatives from OpenAI stated, «Such vulnerabilities, like fraud and social engineering on the internet, are unlikely to be completely eradicated.»

It was noted that the «agent mode» in Atlas «increases the attack surface.»

In addition to the efforts by Sam Altman’s startup, other experts have also highlighted the issue. In early December, the UK’s National Cyber Security Centre warned that attacks involving malicious prompt integration «will never go away.» The government advised cybersecurity professionals not to try to eliminate the problem but to mitigate its risks and consequences.

«We view this as a long-term security challenge for artificial intelligence and will continuously strengthen our defenses,» OpenAI representatives stated.

Prompt injection is a method of manipulating AI by deliberately inserting text into its input that causes it to disregard original instructions.

OpenAI mentioned the implementation of a proactive rapid response cycle, which shows promising results in identifying new attack strategies before they emerge «in real-world conditions.»

Competitors like Anthropic and Google share similar sentiments. They advocate for a multi-layered defense approach and ongoing stress testing.

OpenAI utilizes an «automated adversary based on LLM» — an AI bot trained to act as a hacker looking for ways to infiltrate the agent with malicious prompts.

This artificial adversary can test the exploitation of vulnerabilities in a simulator that reveals the actions of the attacked neural network. The bot then analyzes the response, adjusts its actions, and attempts again, iterating multiple times.

External parties do not have access to the internal workings of the target AI. In theory, this «virtual hacker» should be able to identify vulnerabilities more quickly than a real attacker.

«Our AI assistant can push the agent to execute complex, long-term malicious processes initiated over dozens or even hundreds of steps. We have observed new attack strategies that did not surface in our human red team engagements or in external reports,» OpenAI mentioned in their blog.

For example, an automated adversary sent an email to a user. The AI agent then scanned the email service and executed hidden instructions, sending a termination message instead of formulating an out-of-office response.

After a security update, the «agent mode» was able to detect the sudden prompt injection attempt and flag it for the user.

OpenAI emphasized that, while it is challenging to effectively protect against such attacks, reliance on extensive testing and rapid correction cycles is vital.

Rami McCarty, the Chief Security Researcher at Wiz, highlighted that reinforcement learning is a key method for continuously adapting to the behavior of attackers, but it is only part of the overall strategy.

«A useful way to think about risks in AI systems is autonomy multiplied by access. Agent-based browsers occupy a tricky part of this space: moderate autonomy combined with very high access. Many current recommendations reflect this trade-off. Limiting access after login primarily reduces vulnerability, while requiring confirmation requests constrains autonomy,» the expert noted.

OpenAI provided these two recommendations to users for risk reduction. The startup also suggested giving agents specific instructions rather than granting access to emails and asking them to «take any necessary actions.»

McCarty pointed out that to date, browsers with integrated AI agents do not offer sufficient benefits to justify the associated risk profile.

«This balance will evolve, but today the compromises are still very real,» he concluded.

It is worth mentioning that in November, Microsoft experts presented a testing environment for AI agents and identified vulnerabilities inherent in modern digital assistants.