Despite widespread concern about artificial intelligence replacing human workers, a new study finds that current AI systems can successfully complete only 2.5% of real-world professional tasks on their own. The findings suggest that fears of mass job automation may be premature, as AI struggles with quality, completeness, and basic technical execution.
Researchers from Scale AI and the Center for AI Safety tested leading AI models like OpenAI’s ChatGPT and Google’s Gemini against hundreds of jobs sourced from freelance platforms. The results provide a stark reality check on the current state of AI's autonomous capabilities in the professional world.
Key Takeaways
- A comprehensive study found AI systems successfully completed only 2.5% of real-world freelance assignments without human help.
- Nearly half of AI attempts failed due to poor quality, while over a third were left incomplete.
- AI models struggled significantly with tasks requiring visual understanding, such as graphic design and 3D modeling.
- While AI is not yet ready to automate entire jobs, it shows incremental improvement with each new version.
A Reality Check on AI Capabilities
The research, known as the Remote Labor Index, was designed to measure how well AI performs on actual work assignments that humans are paid to complete. The tasks were diverse, ranging from creating 3D product animations and coding web games to transcribing music and formatting academic papers.
The study's central finding is clear: AI is not yet capable of autonomously handling the vast majority of skilled remote work. The best-performing AI model successfully completed only one out of every 40 tasks.
"Current models are not close to being able to automate real jobs in the economy," said Jason Hausenloy, one of the researchers involved in the project. The team's goal was to provide policymakers with a clear-eyed view of AI's true abilities, moving beyond theoretical discussions to practical application.
This data contrasts sharply with public perception. A recent survey by Bentley University and Gallup found that approximately 75% of Americans believe AI will reduce the number of jobs in the United States over the next decade. However, this study suggests that for now, AI functions more as a tool than a replacement.
Where AI Fails Most Often
The study cataloged the specific reasons for the high failure rate. The results show that the problems are often fundamental rather than minor glitches that can be easily fixed.
Common Points of Failure
An analysis of the unsuccessful projects revealed a pattern of recurring issues. Nearly half of all AI attempts were deemed failures because they produced poor-quality work. More than a third of the tasks were left incomplete, and almost one in five suffered from basic technical problems, such as generating corrupt or unusable files.
"A lot of the failures were kind of prosaic," Hausenloy explained, noting that many issues stemmed from two core limitations of today's AI systems.
First, AI models lack long-term memory, meaning they cannot learn from past mistakes or retain feedback over extended periods. Second, they have significant trouble with visual understanding, which is critical for tasks in graphic design or engineering.
AI Failure Breakdown
- Poor Quality: Nearly 50% of projects failed due to low-quality output.
- Incomplete Work: Over 33% of tasks were not finished by the AI.
- Technical Errors: Almost 20% of attempts resulted in corrupt files or other technical issues.
Real-World Examples of AI Shortcomings
To illustrate the gap between human and AI performance, the study highlighted several specific assignments. In one task, an AI was asked to create a professional digital floor plan from a hand-drawn sketch. While the human freelancer produced a detailed and accurate plan, the AI's version was simplistic and, more importantly, completely incorrect.
Another project involved creating a 3D product video for a new set of earbuds. The AI models struggled immensely. Some produced low-quality 3D models, while one failed to create a model at all, instead generating video clips where the earbuds inexplicably changed their appearance from one shot to the next.
Even in areas where AI is considered strong, like coding, limitations were apparent. When tasked with building a web-based video game with a specific theme, the AI produced a playable game, an impressive feat on its own. However, it completely ignored the core instruction about the game's theme, delivering a generic product that did not meet the client's specifications.
The Path Forward for AI in the Workplace
Graham Neubig, a professor at Carnegie Mellon University, suggested that one reason for these failures is that AI models do not use the same specialized tools as human experts. A human designer uses visual 3D modeling software, whereas an AI chatbot often tries to accomplish the same task by writing code, a less direct and often less effective method for visual tasks.
Despite the high failure rate, the study did note a clear trend of improvement. Google’s newer Gemini 3 Pro model, released in November, successfully completed 1.3% of tasks, an improvement over its predecessor's 0.8% success rate. "The trend lines are there," Hausenloy acknowledged.
This suggests that while AI may not be taking over entire jobs soon, its capabilities are steadily growing. The immediate disruption may come not from full automation but from increased productivity, with companies needing fewer employees if each one is assisted by AI. The economic implications remain significant, especially on cost: the web game that a human freelancer was paid $1,485 to create was produced by an AI for less than $30.





