Anthropic uses Pokémon to benchmark updated AI

In a blog post published on Monday, Anthropic announced that it has tested its latest model, Claude 3.7 Sonnet, on the classic Game Boy game Pokémon Red. The company equipped the model with basic memory, screen pixel input, and function calls for pressing buttons and navigating the screen, allowing it to play Pokémon continuously.
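To make that setup concrete, here is a minimal sketch of how such a loop could be wired together. It is not Anthropic's harness: `StubEmulator`, `stub_model`, `run_agent`, and the `press_button` tool name are all hypothetical stand-ins, and the "screen" is a single number rather than real pixels.

```python
from collections import deque
from dataclasses import dataclass

BUTTONS = ("up", "down", "left", "right", "a", "b", "start", "select")


@dataclass
class StubEmulator:
    """Hypothetical stand-in for a Game Boy emulator (not a real library)."""
    x: int = 0
    presses: int = 0

    def screenshot(self):
        # A real harness would return screen pixels; this 1x1 "frame"
        # simply encodes the player's position.
        return [[self.x]]

    def press(self, button):
        if button not in BUTTONS:
            raise ValueError(f"unknown button: {button}")
        self.presses += 1
        if button == "right":
            self.x += 1
        elif button == "left":
            self.x -= 1


def stub_model(frame, memory):
    """Placeholder for the model call; always asks to press 'right'."""
    return {"tool": "press_button", "button": "right"}


def run_agent(emulator, model, max_actions, memory_size=5):
    """Observe the screen, ask the model for a tool call, apply it, remember it."""
    memory = deque(maxlen=memory_size)       # basic memory: the last few steps
    for _ in range(max_actions):
        frame = emulator.screenshot()        # pixel input
        call = model(frame, list(memory))    # model chooses the next action
        if call["tool"] == "press_button":   # function call for a button press
            emulator.press(call["button"])
        memory.append((frame[0][0], call["button"]))
    return memory


if __name__ == "__main__":
    emu = StubEmulator()
    run_agent(emu, stub_model, max_actions=8, memory_size=3)
    print(emu.x, emu.presses)
```

In a real setup the stub model would be replaced by a call to the model with the screenshot and the memory notes as input, and the action budget would be far larger than the eight steps used here.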

A distinguishing feature of Claude 3.7 Sonnet is its ability to engage in “extended thinking”. Similar to OpenAI’s o3-mini and DeepSeek’s R1, Claude 3.7 Sonnet can “reason” through complex problems by applying more computation and spending more time.

Evidently, this came in handy in Pokémon Red.

An earlier version of Claude, Claude 3.0 Sonnet, was unable to leave the house in Pallet Town where the story begins. Claude 3.7 Sonnet, by contrast, successfully battled three Pokémon Gym leaders and won their badges.

It is unclear how much computation Claude 3.7 Sonnet needed to reach these milestones, or how long each one took. Anthropic only reported that the model performed 35,000 actions to reach the last of the three gym leaders, Lt. Surge.

Surely it won’t be long before some enterprising developer works out the answers.
