OpenAI launches a universal agent in ChatGPT

OpenAI is launching a new general-purpose AI agent in ChatGPT that the company says can complete a wide range of computer-based tasks on behalf of users. OpenAI claims that the agent can automatically navigate a user’s calendar, create editable presentations and slideshows, and run code.

The tool, called the ChatGPT agent, combines several features from OpenAI’s previous agent tools, including Operator’s ability to navigate websites and Deep Research’s ability to synthesize information from dozens of websites into a concise research report. OpenAI says that users will be able to interact with the agent simply by prompting ChatGPT in natural language.

The ChatGPT agent will be available on Thursday to subscribers of OpenAI’s Pro, Plus, and Team plans. To activate the tool, users can select “agent mode” in ChatGPT’s tools drop-down menu.

The launch of the ChatGPT agent is OpenAI’s boldest attempt yet to turn ChatGPT into an agentic product that can take actions and offload tasks for users, rather than just answer questions. In recent years, Silicon Valley companies including OpenAI, Google, and Perplexity have introduced dozens of AI agents that promised to do just that. However, these early agents have struggled with complex tasks, and as products they have fallen short of the vision touted by the executives of the companies developing them.

Nevertheless, OpenAI claims that the ChatGPT agent is much more powerful than its previous offerings.

The company’s new agent has access to ChatGPT connectors, which allow users to connect applications such as Gmail and GitHub so that the agent can find information relevant to their prompts. OpenAI claims that the ChatGPT agent also has terminal access and can use APIs to access specific applications.

OpenAI suggests that users can use the ChatGPT agent to “plan and buy ingredients for a Japanese breakfast for four” and “analyze three competitors and create a slideshow.” Such capabilities require the ChatGPT agent to analyze websites, plan a course of action, and use tools – much more complex tasks than OpenAI has previously attempted to solve with agents.

The model behind the ChatGPT agent offers state-of-the-art performance across several benchmarks, according to OpenAI.

The company claims that the ChatGPT agent model scored 41.6% on Humanity’s Last Exam (pass@1), a challenging test consisting of thousands of questions on more than a hundred subjects. That is roughly double the scores of OpenAI’s o3 and o4-mini on the same test.

On FrontierMath, one of the toughest math tests known, OpenAI says the ChatGPT agent scored 27.4% when given access to tools such as a terminal to execute code. The previous state-of-the-art result came from o4-mini, which scored only 6.3%.

OpenAI notes that it developed the ChatGPT agent with safety in mind, mainly because the product has new capabilities that could make it more dangerous in the hands of a bad actor. OpenAI has previously warned that agentic models may have more dangerous capabilities.

In the ChatGPT agent safety report, OpenAI notes that it has designated this model as “high capability” in the domain of biological and chemical weapons, which is defined in the OpenAI Preparedness Framework as a model that can “enhance existing pathways to cause serious harm.” OpenAI notes that it has no direct evidence of this, but has decided to take a precautionary approach and activate new safeguards to reduce these risks.

The new safeguards for the ChatGPT agent include a monitor that runs in real time as users interact with the product. OpenAI says it runs a classifier on every query entered into the ChatGPT agent to determine whether the query is biology-related. If so, OpenAI passes the ChatGPT agent’s response through a second monitor that determines whether the content could be used to create a biological threat.
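
The two-stage check described above can be pictured roughly as follows. This is only an illustrative sketch, not OpenAI's implementation: the function names, the keyword lists, and the `Verdict` type are all hypothetical, and simple keyword matching stands in for whatever trained classifiers the company actually uses. The article also does not say what happens to a flagged response, so the sketch only records the flag.

```python
from dataclasses import dataclass

# Hypothetical keyword lists standing in for trained classifiers.
BIO_TERMS = ("pathogen", "virus", "toxin")
HAZARD_TERMS = ("synthesis route", "culturing protocol")


@dataclass
class Verdict:
    second_monitor_ran: bool  # was the response sent to the second monitor?
    flagged: bool             # did the second monitor flag the response?


def is_bio_related(query: str) -> bool:
    """Stage 1 (assumed): cheap check applied to every incoming query."""
    text = query.lower()
    return any(term in text for term in BIO_TERMS)


def response_is_hazardous(response: str) -> bool:
    """Stage 2 (assumed): closer look at the agent's response."""
    text = response.lower()
    return any(term in text for term in HAZARD_TERMS)


def monitor(query: str, response: str) -> Verdict:
    """Run stage 2 only when stage 1 marks the query as bio-related."""
    if not is_bio_related(query):
        return Verdict(second_monitor_ran=False, flagged=False)
    return Verdict(second_monitor_ran=True,
                   flagged=response_is_hazardous(response))
```

Under this layout, most queries pass through only the first, inexpensive check, and the second monitor is reserved for the small share of conversations that touch on biology.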

OpenAI also states that it has disabled the ChatGPT memory feature for this agent to prevent abuse. In other parts of ChatGPT, the memory feature allows the chatbot to refer to information from a user’s previous chats. However, OpenAI claims that attackers could exploit this feature in the ChatGPT agent to leak sensitive data through prompt injection attacks. The company says it may revisit adding this feature in the future.

While the ChatGPT agent sounds impressive, it remains to be seen how capable it really is in practice. So far, agent technology has proven to be relatively fragile when interacting with the real world. Nevertheless, OpenAI claims to have developed a more efficient model that can fulfill the promise of AI agents.
