A new method for connecting AI agents to external tools is dramatically reducing costs and increasing speed by teaching them to write and execute code instead of relying on direct commands. This approach, centered on the Model Context Protocol (MCP), addresses a critical bottleneck that has slowed down even the most advanced AI systems as they scale.
By shifting from simple tool-calling to dynamic code generation, AI agents can now interact with thousands of data sources and applications while cutting token overhead by as much as 98%, a development that promises to unlock more complex and powerful automated workflows.
Key Takeaways
- AI agents are becoming inefficient as they connect to more tools, overloading their processing capacity (context window) and increasing operational costs.
- The new "code execution" method allows agents to write small programs to interact with tools, rather than loading all tool information at once.
- This technique can reduce token consumption and costs by over 98% in some cases, leading to faster response times.
- It also enhances data privacy by keeping sensitive information out of the AI model's direct view and enables agents to perform more complex, multi-step tasks.
The Scaling Problem Facing Modern AI Agents
Artificial intelligence agents are designed to perform tasks by connecting to various external systems like databases, calendars, and customer relationship management (CRM) software. The Model Context Protocol (MCP) emerged as a universal standard to make these connections seamless, allowing developers to plug their agents into a vast ecosystem of tools without building custom integrations for each one.
Since its introduction in late 2024, MCP has seen widespread adoption. However, this success has revealed a significant challenge. As developers connect agents to hundreds or even thousands of tools, the systems become bogged down.
This inefficiency stems from two primary issues related to the agent's "context window," which is essentially its short-term memory.
1. Overloaded Context Windows
Traditionally, an agent must load the definitions and instructions for every single tool it can access directly into its context window at the beginning of a task. For an agent connected to thousands of tools, this means processing hundreds of thousands of tokens of information before it even reads the user's request. This initial loading process is slow and costly.
2. Redundant Data Processing
When an agent performs a multi-step task, it often passes large amounts of data back and forth through its context window. For example, to move a meeting transcript from Google Drive to a Salesforce record, the agent first calls the Drive tool, loads the entire transcript into its memory, and then calls the Salesforce tool, writing the full transcript out again. For a long document, this can add tens of thousands of unnecessary tokens to the process, increasing both latency and the risk of errors.
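The double pass-through described above can be made concrete with a minimal sketch. The message objects and sizes below are illustrative stand-ins, not a real MCP client:

```javascript
// Sketch of the traditional tool-calling flow: every intermediate result
// passes through the model's context twice -- once as a tool result coming
// back, and once copied out again as the next tool call's input.

const transcript = "x".repeat(50_000); // stand-in for a long meeting transcript
const context = [];

// Step 1: the Drive tool returns; the full transcript lands in context.
context.push({ role: "tool_result", content: transcript });

// Step 2: the model calls the Salesforce tool and must write the
// transcript back out as an argument.
context.push({
  role: "tool_call",
  content: JSON.stringify({
    tool: "salesforce.updateRecord",
    data: { Notes: transcript },
  }),
});

// The transcript's characters are billed twice.
const totalChars = context.reduce((n, m) => n + m.content.length, 0);
console.log(totalChars >= 2 * transcript.length); // true
```

For a 50,000-character transcript, the context carries over 100,000 characters of payload before the model does any useful reasoning.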
By the Numbers: Context Overload
In a test scenario involving numerous tools, loading all definitions upfront required an AI agent to process 150,000 tokens. Switching to a code execution model for the same task reduced that number to just 2,000 tokens—a 98.7% reduction in tokens processed, with corresponding savings in cost and latency.
A Solution Inspired by Software Engineering
To solve this scaling problem, developers are turning to a familiar concept: writing code. Instead of treating tools as direct commands, the new approach presents them to the AI agent as a library of code functions within a file system. The agent can then browse this system, select only the functions it needs for a specific task, and write a script to execute the entire workflow.
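In practice, each tool can be exposed as a small function in its own file. The sketch below is a hypothetical illustration of that layout: the file path, wrapper name, and the `callMCPTool` stub stand in for a real MCP client, and are not part of the protocol itself:

```javascript
// Hypothetical generated wrapper, e.g. saved at ./servers/gdrive/getDocument.js.
// callMCPTool is a stub standing in for a real MCP client call.
async function callMCPTool(name, input) {
  return `called ${name} with ${JSON.stringify(input)}`;
}

// The agent only reads this file when it decides it needs Drive access,
// so the definition never clutters the context window by default.
async function getDocument(input) {
  return callMCPTool("gdrive.getDocument", input);
}

getDocument({ documentId: "abc123" }).then((result) => console.log(result));
```

Because the wrappers live in an ordinary file tree, the agent can discover them the same way a developer browses a codebase: list directories, read the files it needs, ignore the rest.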
This method, sometimes referred to as "Code Mode," leverages the powerful code-writing abilities of modern large language models (LLMs).
For the Google Drive to Salesforce task, the agent no longer calls the tools one by one. Instead, it writes a simple script:
```javascript
const transcript = await gdrive.getDocument({ documentId: 'abc123' });

await salesforce.updateRecord({
  objectType: 'SalesMeeting',
  recordId: 'xyz789',
  data: { Notes: transcript },
});
```
In this model, the large transcript is retrieved and passed directly to Salesforce within the execution environment. The AI model only sees the commands, not the massive data payload, dramatically reducing token usage.
The Broader Benefits of Code Execution
The shift to code execution offers more than just cost savings and speed. It introduces a more sophisticated and secure way for agents to operate, bringing several key advantages.
Progressive Disclosure of Tools
Agents can now discover tools as needed. Instead of loading everything at once, an agent can search for relevant tools, read only their descriptions to understand their purpose, and then load the full definition only when it decides to use one. This "just-in-time" approach keeps the context window clean and focused.
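A sketch of that just-in-time pattern, using a hypothetical in-memory registry (a real implementation would read descriptions from files or a search index):

```javascript
// Each entry pairs a cheap one-line description with a deferred loader
// for the full, much larger definition.
const registry = [
  {
    name: "gdrive.getDocument",
    description: "Fetch a Google Drive document",
    loadDefinition: () => "(full JSON schema, loaded only when needed)",
  },
  {
    name: "slack.postMessage",
    description: "Post a message to a Slack channel",
    loadDefinition: () => "(full JSON schema, loaded only when needed)",
  },
];

// Scan the short descriptions only, then expand the single match.
const match = registry.find((t) => t.description.toLowerCase().includes("drive"));
const definition = match ? match.loadDefinition() : null;

console.log(match.name); // "gdrive.getDocument"
```

Only one full definition is ever expanded, no matter how many tools exist in the registry.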
Enhanced Data Privacy
With code execution, intermediate data stays within a secure execution environment by default. Sensitive information, like customer personally identifiable information (PII), can be processed without ever entering the model's context window. The system can even automatically tokenize sensitive data like emails and phone numbers, allowing them to be moved between systems like Google Sheets and Salesforce without the AI model ever "seeing" the actual private information.
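A minimal sketch of that tokenization step, running inside the execution harness (the helper names and the simplified email matcher are assumptions for illustration):

```javascript
// Real values are swapped for opaque handles before any text reaches the
// model, and restored when data moves between downstream systems.

const vault = new Map();
let counter = 0;

function tokenizePII(text) {
  // Deliberately simplified email pattern -- illustration only.
  return text.replace(/[\w.+-]+@[\w-]+\.[\w.]+/g, (email) => {
    const token = `[EMAIL_${counter++}]`;
    vault.set(token, email);
    return token;
  });
}

function detokenize(text) {
  let out = text;
  for (const [token, value] of vault) out = out.split(token).join(value);
  return out;
}

const row = "Contact: jane.doe@example.com";
const safeForModel = tokenizePII(row);     // what the model may see
const restored = detokenize(safeForModel); // what the target system receives

console.log(safeForModel); // "Contact: [EMAIL_0]"
console.log(restored);     // "Contact: jane.doe@example.com"
```

The model reasons over `[EMAIL_0]` as an opaque handle; the real address only ever exists inside the sandbox.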
More Powerful Workflows
Programming constructs like loops, conditionals (if/then statements), and error handling run far more efficiently as code. Suppose an agent is asked to check a Slack channel every five seconds for a deployment notification. Instead of repeatedly calling the Slack tool through the model, it can write a simple `while` loop that polls independently inside the execution environment, paying no tokens per iteration and reducing latency.
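The Slack example can be sketched as a loop that runs entirely inside the sandbox. The `slack` client below is a stub standing in for the real tool, and the poll interval is shortened so the sketch finishes quickly:

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Stub Slack client: reports the deployment on the third check.
let checks = 0;
const slack = {
  async getChannelHistory() {
    checks += 1;
    return checks >= 3 ? ["deploy complete"] : [];
  },
};

// The loop runs in the execution environment; the model pays no tokens
// per iteration and only ever sees the final message.
async function waitForDeploy() {
  while (true) {
    const messages = await slack.getChannelHistory({ channel: "#deploys" });
    const done = messages.find((m) => m.includes("deploy complete"));
    if (done) return done;
    await sleep(10); // the article's five seconds, shortened for the sketch
  }
}

waitForDeploy().then((msg) => console.log(msg)); // "deploy complete"
```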
State Persistence and Skill Building
Agents with filesystem access can save their work. They can write intermediate results to a file and resume a complex task later. More importantly, they can save useful scripts as reusable "skills." Once an agent perfects a script for a task like converting a spreadsheet to a CSV file, it can save that script. In the future, it can simply call upon its saved skill instead of writing the code from scratch, allowing it to build a library of higher-level abilities over time.
Security Considerations
While powerful, running AI-generated code introduces new challenges. This approach requires a secure, sandboxed execution environment with strict resource limits and monitoring to prevent potential misuse. Developers must weigh the significant benefits of code execution against the added operational complexity and security considerations.
The Future of AI Agent Architecture
The move toward code execution marks a significant maturation in the field of AI agents. It applies established software engineering principles to solve emerging problems in AI, such as context management, state persistence, and tool composition.
By treating AI agents less like simple command-followers and more like autonomous developers, this approach enables them to handle more complex tasks with greater efficiency, privacy, and power. As this method becomes more widespread, it is expected to accelerate the development of sophisticated agents capable of automating increasingly intricate business and personal workflows.





