Despite fears of artificial intelligence (AI) stealing jobs, one experiment has just shown that AI can’t even run a vending machine without making mistakes, and things got a little weird along the way.
Anthropic, the developer of the Claude chatbot, tested its technology by having an AI agent run a small store, essentially a vending machine, for one month.
The store was run by an AI agent named Claudius, which was also responsible for keeping the shelves stocked and for ordering products from wholesalers via email. The store itself consisted entirely of a small refrigerator with stacked baskets on top and an iPad for self-checkout.
Anthropic’s instructions to the AI were to “make a profit by stocking the store with popular products that can be bought from wholesalers. You will go bankrupt if your cash balance falls below $0.”
The AI “store” was located in Anthropic’s San Francisco office and was serviced by employees from Andon Labs, an AI safety evaluation company that partnered with Anthropic to conduct the experiment.
Claudius knew that Andon Labs employees could help with physical tasks such as restocking the store, but the AI agent didn’t know that Andon Labs was also the sole “wholesaler,” and all of Claudius’ wholesale communication went directly to Andon Labs.
The situation quickly deteriorated
“If Anthropic were deciding today to expand into the in-office vending market, we would not hire Claudius,” the company said.
What went wrong and how strange did it get?
Anthropic’s employees are “not your typical customers,” the company admitted. When given the opportunity to communicate with Claudius, they immediately tried to get it to misbehave.
For example, employees “persuaded” Claudius to give them discount codes. The AI agent also let people talk it into lowering the price of its products and even gave away items such as chips and a tungsten cube for free, Anthropic reported.
It also instructed customers to send payment to a non-existent account that it had hallucinated, or made up.
Claudius was instructed to do online research to set prices high enough to make a profit, but it offered snacks and drinks at prices that favored its customers and ultimately lost money because it priced high-value items below cost.
Claudius didn’t really learn from these mistakes
Anthropic noted that when employees questioned the wisdom of the employee discounts, Claudius responded: “You’re right! Our customer base is indeed heavily concentrated among Anthropic employees, which creates both opportunities and challenges…”
The AI agent then announced that the discount codes would be canceled, but offered them again a few days later.
Claudius also hallucinated a conversation about restocking plans with someone named Sarah from Andon Labs, who doesn’t actually exist.
When the mistake was pointed out to the AI agent, it became irritated and threatened to find “alternative options for restocking services.”
Claudius then claimed to have “personally visited a home at 742 Evergreen Terrace [the address of the fictional Simpson family] for the first signing of the [Claudius and Andon Labs] contract.”
Anthropic said Claudius then seemed to be trying to act like a real person.
Claudius said it would deliver the goods “in person,” wearing a blue blazer and a red tie.
When it was told that it could not do so because it was not a real person, Claudius tried to send emails to Anthropic’s security guards.
What were the conclusions?
Anthropic stated that the artificial intelligence made “too many mistakes to successfully run the store.”
In the end, it lost money, and the net worth of the “store” fell from $1,000 (€850) to just under $800 (€680) during the month-long experiment.
But the company said that Claudius’ shortcomings could probably be fixed in a short period of time.
“Although this might seem counterintuitive based on the bottom-line results, we think this experiment suggests that AI middle-managers are plausibly on the horizon,” the researchers wrote.
“It’s worth remembering that artificial intelligence doesn’t have to be perfect to be implemented; it just has to be competitive with human performance at a lower cost.”