Anthropic’s Claude 3.7 Sonnet: A Game-Changer in AI Tested on Pokémon Red

Anthropic used Pokémon to evaluate its latest AI model.

In a blog post released on Monday, Anthropic announced that it tested its newest model, Claude 3.7 Sonnet, on the classic Game Boy title Pokémon Red. The model was equipped with basic memory, screen pixel input, and function calls to press buttons and navigate around the screen, allowing it to play Pokémon continuously.
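Anthropic has not published the harness itself, so the following is only a minimal sketch of how such a loop could be wired together, assuming a memory of short text notes, a screen frame as pixel input, and a single button-press function. Every name here (StubEmulator, Agent, scripted_policy, run) is hypothetical, and the scripted policy merely stands in for the model call.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List

# Game Boy inputs exposed to the agent as "function calls".
BUTTONS = ("up", "down", "left", "right", "a", "b", "start", "select")

Frame = List[List[int]]  # hypothetical format: rows of grayscale pixel values


class StubEmulator:
    """Stand-in for a real emulator; this interface is an assumption."""

    def __init__(self) -> None:
        self.pressed: List[str] = []

    def screenshot(self) -> Frame:
        # The Game Boy screen is 160x144 pixels; the stub returns a blank frame.
        return [[0] * 160 for _ in range(144)]

    def press(self, button: str) -> None:
        if button not in BUTTONS:
            raise ValueError(f"unknown button: {button}")
        self.pressed.append(button)


@dataclass
class Agent:
    # choose_button stands in for the model: (frame, notes) -> button name.
    choose_button: Callable[[Frame, List[str]], str]
    # "Basic memory": a bounded log of recent notes (size chosen arbitrarily).
    memory: Deque[str] = field(default_factory=lambda: deque(maxlen=50))

    def step(self, emulator: StubEmulator) -> str:
        frame = emulator.screenshot()                        # pixel input
        button = self.choose_button(frame, list(self.memory))
        emulator.press(button)                               # function call
        self.memory.append(f"pressed {button}")              # update memory
        return button


def run(agent: Agent, emulator: StubEmulator, max_actions: int) -> int:
    """Play for a fixed number of actions and return how many were taken."""
    for _ in range(max_actions):
        agent.step(emulator)
    return len(emulator.pressed)


def scripted_policy(frame: Frame, notes: List[str]) -> str:
    # Placeholder for a model call: picks a button from the number of notes.
    return BUTTONS[len(notes) % len(BUTTONS)]
```

Calling run(Agent(choose_button=scripted_policy), StubEmulator(), 10) takes ten actions. In a real setup, the policy would send the frame and notes to the model and parse the button it asks for.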

A standout feature of Claude 3.7 Sonnet is its capacity for "extended thinking." Similar to OpenAI’s o3-mini and DeepSeek’s R1, it can "reason" through complex tasks by using more computing power and taking more time.

This capability proved beneficial in Pokémon Red.

Unlike its predecessor, Claude 3.7 Sonnet defeated three Pokémon Gym Leaders and collected their badges.

However, it’s unclear how much computing power and time Claude 3.7 Sonnet needed to reach these milestones. Anthropic only disclosed that the model performed 35,000 actions to reach the last of those Gym Leaders, Lt. Surge.

It likely won’t be long before an enterprising developer works that out.

While Pokémon Red is more of a toy benchmark than a serious one, there’s a long history of using games to test AI. Recently, several new applications and platforms have emerged for assessing AI capabilities in games ranging from Street Fighter to Pictionary.
