OpenAI is preparing to launch Operator, an AI agent that can control a computer

OpenAI is reportedly preparing to launch Operator this week. Operator is a computer-use agent that can perform tasks in a user’s web browser on their behalf. Other companies, such as Google and Anthropic, are developing similar agents in the hope that they will be the next big leap toward AI fulfilling its promise of handling tasks that humans currently do themselves.

According to The Information, which first reported on the upcoming launch, Operator will offer users suggestions in categories such as travel, restaurants, and events. For example, users can ask Operator to find a good flight from New York to Maui that does not land too late in the evening. Operator will not complete the transaction on its own – the user remains in the loop and finishes the checkout process.

It is easy to imagine how Operator could be useful. Elderly people without much computer experience could ask Operator to help them send an email and watch it navigate to Gmail and open a compose window for them. Tech-savvy users won’t need that kind of help, but many older people find it difficult to navigate the internet and struggle with even simple tasks. Such bots could also help in other areas, such as quality-assurance testing, when companies need to check whether their new websites or services work properly.

These so-called computer-use agents also carry potential risks. We’ve already seen one startup introduce a web navigation bot that automates posting marketing spam on Reddit. Bots that control an ordinary end-user browser can bypass API restrictions designed to block automation. AI startups will have to take measures to combat abuse, or the web will become even more overrun with spam than it is today.

Agents like Operator essentially work by taking screenshots of a user’s browser and sending the images back to OpenAI for analysis. Once its models determine the next step needed to complete a task, the browser is instructed to move and click the mouse on the appropriate target or to type text into an input field. This builds on multimodal technology from OpenAI and other developers that can interpret different forms of input, in this case text and images.

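To make that loop concrete, here is a minimal sketch of the screenshot-analyze-act cycle. It assumes Playwright drives the browser, and suggest_next_action is a hypothetical stand-in for a multimodal model call, not a real OpenAI API.

from dataclasses import dataclass
from playwright.sync_api import sync_playwright  # pip install playwright

@dataclass
class Action:
    kind: str        # "click", "type", or "done"
    x: int = 0       # click coordinates, if kind == "click"
    y: int = 0
    text: str = ""   # text to type, if kind == "type"

def suggest_next_action(screenshot: bytes, goal: str) -> Action:
    # Hypothetical placeholder: a vision-capable model would look at the
    # screenshot and return the next UI step toward the goal.
    raise NotImplementedError("wire up a multimodal model here")

def run_agent(goal: str, start_url: str, max_steps: int = 20) -> None:
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(start_url)
        for _ in range(max_steps):
            shot = page.screenshot()                  # capture what the user would see
            action = suggest_next_action(shot, goal)  # model decides the next step
            if action.kind == "done":
                break                                 # hand checkout-style steps back to the user
            if action.kind == "click":
                page.mouse.click(action.x, action.y)
            elif action.kind == "type":
                page.keyboard.type(action.text)
        browser.close()

In practice the model would also need to signal when to stop or ask for help, and sensitive steps like payment would be handed back to the user, as described above.
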
The promise of the recent wave of AI startups is that they will create artificial general intelligence (AGI) capable of replacing humans in most of the tasks they perform today and making everyone’s lives more efficient. As the exponential performance gains of language models have slowed, these companies are looking for new ways to reach that goal, and computer-use agents are one of them. AI can’t truly replace humans until it can actually perform tasks for them – generating text is only part of the job. Agents also need to be able to work with spreadsheets, watch videos, and more.

After Anthropic released a preliminary version of its computer-use agent, early testers complained that it was underdeveloped at best, getting stuck in loops where it didn’t know what to do, or forgetting the task and wandering off to do something else entirely, such as browsing nature photos on Google Images. It was also slow and expensive to run.

Keeping a human in the loop will be critical for a bot given this level of control and access to sensitive data. Computer-use agents may end up resembling self-driving cars: Google got a car to drive itself down a straight road fairly easily, but it took years to handle the edge cases.

There is debate about how to measure AGI and when it will be “achieved,” but OpenAI has told its largest backer, Microsoft, that it will consider AGI achieved once it has built an AI that can generate at least $100 billion in revenue. That is a lofty goal considering OpenAI expects to generate $12 billion in revenue in 2025 while still losing billions.

At the same time, neither Microsoft nor Google has seen enterprise customers adopt AI tools as quickly as they had hoped. Instead of charging $20–30 per employee for AI add-ons, both companies are now bundling AI into their standard packages and raising prices by a couple of dollars instead.
