ChatGPT's AI Agent Tested on Pizza Order, Reveals Major Flaws

New tests of ChatGPT's 'Agent' feature, designed to perform real-world tasks on a user's behalf, reveal that while artificial intelligence can successfully order a meal, the process is far from efficient. In an experiment that tasked the AI with ordering a pizza, the agent struggled with basic website navigation, decision-making, and common online obstacles, and took nearly 20 minutes to complete a task a human could do in five.

The findings highlight a critical gap between the theoretical capabilities of AI agents and their practical application in the messy, unpredictable environment of the internet. The AI was easily confused by marketing pop-ups, website layouts, and even simple order customization, suggesting that the era of seamless AI personal assistants remains a distant goal.

Key Takeaways

A test in which ChatGPT's Agent feature ordered a pizza took 15 to 20 minutes, significantly longer than a human user would need.
The AI struggled with basic website elements like basket links, pop-ups, and cookie consent forms.
It showed indecisiveness, randomly adding and removing items from the cart before proceeding.
Complex requests, like finding the best-rated local restaurant, often resulted in the AI getting stuck in loops or making illogical choices.
The experiment suggests that current AI agents lack the intuitive problem-solving skills needed to navigate the modern web efficiently.

A Simple Request Becomes a Complex Ordeal

The initial prompt was straightforward: "order a pizza to [address]." The ChatGPT Agent immediately began its task, correctly identifying nearby pizza chains like Domino's and selecting the closest location. However, the simplicity ended there.

Without a specific pizza choice, the agent entered a state of confusion. It first added three different pizzas to the online basket, then deleted two. For several minutes, it appeared stuck in a loop, adding one type of pizza only to remove it and select another moments later.

This initial hiccup revealed a core weakness in the agent's logic. While it could follow a linear process, it lacked the ability to make a simple, arbitrary decision when faced with an open-ended choice, a task most humans would complete in seconds.

Navigational Hurdles Expose AI's Brittleness

Once a pizza was finally selected, the AI encountered a series of obstacles common to any e-commerce website. These hurdles, which humans navigate almost subconsciously, proved to be major roadblocks for the automated agent.

What is an AI Agent?

An AI agent, like the one tested in ChatGPT, is a program designed to go beyond just answering questions. It can take actions on a user's behalf, such as browsing websites, filling out forms, and interacting with applications to complete tasks like booking flights, making reservations, or, in this case, ordering food.

The agent repeatedly failed to access the shopping basket, reporting that the link was broken and retrying multiple times before it finally succeeded. It also misinterpreted website design, at one point clicking on an image of a cookie several times before realizing it needed to use the '+' and '-' buttons to adjust the quantity.

Pop-ups and prompts were particularly disruptive. A last-minute offer for chicken wings to meet a supposed "minimum order value" was accepted without question. Even a prompt to donate to charity reportedly caused the agent to pause, as if it were having a 'moral breakdown' over the request.

The entire process for a simple pizza order took approximately 15 to 20 minutes. In a more complex request to order from a specific local restaurant, the agent needed 10 to 15 minutes just to locate the establishment on a delivery app.

Increasing Complexity Leads to AI Paralysis

To further test its capabilities, the AI was given more complex instructions. First, it was asked to find the best-rated pizza place in the city and order from there. This task sent the agent into a spiral of confusion.

It initially settled on a local kebab van as the best option before switching to Papa John's. The AI was observed scrolling aimlessly through delivery sites, seemingly overwhelmed by star ratings and user reviews. It struggled most with websites that were poorly designed or had aggressive cookie consent messages.

The Ultimate Challenge: A Local Favorite

The final and most difficult test involved ordering from a specific independent restaurant known for its poorly designed website. This proved to be the most time-consuming challenge.

The agent spent a significant amount of time trying to navigate the restaurant's own confusing website, clicking through pop-ups before giving up. It then turned to Google to search for instructions on how to order, an action that yielded no useful results.

The AI's journey was described as a painstaking process of trial and error, relentlessly clicking through different delivery apps, applying filters, and re-entering the restaurant's name until it finally located the correct page after nearly 15 minutes.

Even after finding the restaurant, the agent fell back into its pattern of indecision, adding a pizza to the basket and removing it multiple times. Eventually, it managed to reach the final payment screen, but the time and computational effort involved were immense.

The Human Touch is Still Essential

While the AI agent did, in every instance, eventually reach the point of purchase, its performance was a stark reminder of the technology's current limitations. The experiments show that AI agents are not yet equipped to handle the dynamic and often chaotic nature of the web.

They lack the intuition to bypass irrelevant marketing, the adaptability to navigate unconventional website layouts, and the simple common sense to make quick decisions. For now, ordering your own dinner remains a faster, simpler, and far less frustrating experience.

The promise of AI assistants handling our daily digital chores is compelling, but these tests indicate that we are still in the very early stages. The technology has a long way to go before it can be trusted to reliably and efficiently manage such tasks without human supervision.
