Anthropic uses Pokémon to benchmark updated AI

February 25, 2025 17:30

In a blog post published on Monday, Anthropic announced that it had tested its latest model, Claude 3.7 Sonnet, on the classic Game Boy game Pokémon Red. The company equipped the model with basic memory, screen pixel input, and function calls for pressing buttons and navigating around the screen, allowing it to play Pokémon continuously.

A distinguishing feature of Claude 3.7 Sonnet is its capacity for “extended thinking”. Like OpenAI’s o3-mini and DeepSeek’s R1, Claude 3.7 Sonnet can “reason” through complex problems by applying more computation and taking more time.

That evidently came in handy in Pokémon Red.

An earlier version, Claude 3.0 Sonnet, was unable to leave the house in Pallet Town where the story begins. Claude 3.7 Sonnet, by contrast, successfully battled three Pokémon Gym leaders and won their badges.

It is unclear how much computing Claude 3.7 Sonnet needed to reach these milestones, or how long each one took. Anthropic reported only that the model performed 35,000 actions to reach the last of those gym leaders, Lt. Surge.

Surely it won’t be long before some enterprising developer finds out.
