OpenAI AgentKit: New Tools for AI Agent Development

OpenAI has released AgentKit, a suite of tools that helps developers and businesses build, deploy, and improve AI agents. The platform aims to consolidate a previously fragmented development process by providing integrated tools for workflow design, user interface creation, and performance evaluation.

The launch addresses common challenges in agent development, such as complex workflow orchestration, the need for custom data connectors, and extensive front-end coding. AgentKit introduces a visual builder, a centralized data connector registry, and a pre-built chat interface toolkit to accelerate deployment.

Key Takeaways

New Platform Launched: OpenAI has introduced AgentKit, a unified toolkit for building, deploying, and optimizing AI agents.
Core Components: The suite includes Agent Builder for visual workflow creation, ChatKit for embedding chat UIs, and a Connector Registry for managing data sources.
Enhanced Evaluation: New features in OpenAI's Evals platform, such as datasets and automated prompt optimization, are now available to measure agent performance.
Early Adopter Success: Companies like Ramp, Canva, and Carlyle report significant reductions in development time and improvements in agent accuracy using the new tools.

A Unified Solution for Agent Creation

Developing sophisticated AI agents has traditionally required developers to combine multiple, separate tools. This often involved writing complex code for orchestration, building custom connections to data sources, and spending weeks on user interface development before a product could even launch. AgentKit is designed to eliminate these hurdles by offering a single, integrated environment.

The platform is built upon the foundation of the Responses API and Agents SDK, which were released in March. Early adopters of that technology, such as Klarna and Clay, demonstrated the potential of agentic workflows. Klarna developed a support agent that now handles two-thirds of its customer service tickets, while Clay reported a tenfold increase in growth attributed to its sales agent.

Background: The Rise of Agentic AI

Agentic AI refers to systems that can autonomously plan and execute a series of steps to achieve a goal. Unlike simple chatbots, these agents can use tools, access data, and reason through complex problems, making them suitable for tasks like in-depth research, customer support resolution, and sales outreach.

Visual Workflow Design with Agent Builder

A central feature of the new platform is the Agent Builder, a tool that provides a visual canvas for designing complex agent workflows. Instead of writing extensive code, developers can use a drag-and-drop interface to connect logical nodes, integrate tools, and set up custom safety rules. This visual approach is intended to make the development process more intuitive and accessible, particularly for teams with members from different departments like product, legal, and engineering.

The Agent Builder supports full versioning, allowing teams to track changes and iterate quickly. It also includes features for previewing runs and configuring performance evaluations directly within the interface. Users can start with pre-built templates or a blank canvas to construct their agents.

"Agent Builder transformed what once took months of complex orchestration, custom code, and manual optimizations into just a couple of hours. The visual canvas keeps product, legal, and engineering on the same page, slashing iteration cycles by 70% and getting an agent live in two sprints rather than two quarters."

— Ramp

Similarly, LY Corporation, a major Japanese technology company, used the tool to create a work assistant agent in less than two hours. The company highlighted the collaborative benefits, enabling engineers and subject matter experts to work together in one interface.
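
Conceptually, a workflow of the kind assembled on the visual canvas is a chain of nodes, each of which takes the current state and hands an updated state to the next node. The sketch below models that idea in plain Python. It is not Agent Builder's export format or the Agents SDK; the `Node` type, the node functions, and the routing rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, str]


@dataclass
class Node:
    name: str
    run: Callable[[State], State]


def classify(state: State) -> State:
    # Toy routing rule standing in for a model-based classifier (assumption).
    topic = "billing" if "invoice" in state["input"].lower() else "general"
    return {**state, "topic": topic}


def lookup(state: State) -> State:
    # Stand-in for a tool call or data connector (assumption).
    docs = {
        "billing": "Invoices are issued monthly.",
        "general": "See the help center.",
    }
    return {**state, "context": docs[state["topic"]]}


def respond(state: State) -> State:
    return {**state, "output": f"[{state['topic']}] {state['context']}"}


def run_workflow(nodes: List[Node], user_input: str) -> State:
    """Run nodes in order and record which ones executed."""
    state: State = {"input": user_input, "path": ""}
    for node in nodes:
        state = node.run(state)
        state["path"] = (state["path"] + ">" + node.name).lstrip(">")
    return state


workflow = [Node("classify", classify), Node("lookup", lookup), Node("respond", respond)]
result = run_workflow(workflow, "Where is my invoice?")
print(result["path"], "|", result["output"])
```

Keeping each node a small function over a shared state is one way to make a workflow easy to preview and version, since every step can be run and inspected on its own.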

Centralized Data and Safety Management

To address the challenge of managing data connections, AgentKit introduces the Connector Registry. This feature gives enterprise administrators a single dashboard to govern how data sources connect to OpenAI products, including both ChatGPT and the API. It consolidates pre-built connectors for popular services like Google Drive, Dropbox, and Microsoft Teams, streamlining data access management across an organization.

Security and reliability are also addressed through Guardrails, an open-source safety layer that can be integrated within the Agent Builder. Guardrails helps protect agents from producing unintended outputs or being manipulated by malicious inputs. Its functions include:

Masking or flagging personally identifiable information (PII).
Detecting attempts to bypass safety restrictions (jailbreaks).
Applying other custom safeguards to ensure reliable agent behavior.

Guardrails can be deployed as a standalone service or integrated using libraries available for both Python and JavaScript.
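
As a rough illustration of the first two functions listed above, the sketch below masks email addresses and phone-like numbers and flags a few jailbreak-style phrases using plain regular expressions. It does not use the Guardrails library. The function names (`mask_pii`, `looks_like_jailbreak`, `apply_guardrails`), the patterns, and the phrase list are hypothetical, and a production guardrail would need far more robust detection.

```python
import re
from dataclasses import dataclass
from typing import Tuple

# Hypothetical patterns for illustration only; real PII detection is much broader.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

# Hypothetical phrases that might appear in a jailbreak attempt.
JAILBREAK_PHRASES = (
    "ignore previous instructions",
    "ignore all previous instructions",
    "disregard your rules",
)


@dataclass
class GuardrailResult:
    text: str        # input with PII replaced by placeholders
    pii_found: bool  # True if anything was masked
    jailbreak: bool  # True if a suspicious phrase was seen


def mask_pii(text: str) -> Tuple[str, bool]:
    """Replace email addresses and phone-like numbers with placeholders."""
    masked, n_email = EMAIL_RE.subn("[EMAIL]", text)
    masked, n_phone = PHONE_RE.subn("[PHONE]", masked)
    return masked, (n_email + n_phone) > 0


def looks_like_jailbreak(text: str) -> bool:
    """Flag input containing any of the known suspicious phrases."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in JAILBREAK_PHRASES)


def apply_guardrails(text: str) -> GuardrailResult:
    masked, pii_found = mask_pii(text)
    return GuardrailResult(masked, pii_found, looks_like_jailbreak(text))


checked = apply_guardrails("Contact me at jane@example.com or +1 415-555-0100.")
print(checked.text)
```

Returning a result object rather than raising immediately leaves the caller free to decide whether to mask, flag, or block a given input.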

Simplifying User Interface Deployment

Creating a polished chat interface for an agent can be a time-consuming part of the development cycle. It involves handling real-time streaming responses, managing conversation threads, and visualizing the model's thought process. ChatKit is a new toolkit designed to simplify this process significantly.

ChatKit allows developers to embed a customizable, chat-based agent experience directly into their websites or applications. According to OpenAI, this component can be customized to match a company's branding and feel like a native part of the product.

Rapid Integration

Canva reported saving over two weeks of development time by using ChatKit for its developer support agent. The company stated that the integration was completed in less than one hour.

The tool is already being used for various applications, including internal knowledge assistants, employee onboarding guides, and customer support agents, such as the one deployed by HubSpot.

Advanced Performance Measurement and Tuning

Building production-ready agents requires rigorous testing and evaluation. OpenAI is expanding its existing Evals platform with four new capabilities to help developers measure and improve agent performance systematically.

The new evaluation features include:

Datasets: Tools to quickly build evaluation datasets from scratch, which can be expanded over time using automated graders and human feedback.
Trace Grading: A system for conducting end-to-end assessments of agent workflows to identify specific points of failure.
Automated Prompt Optimization: The ability to automatically generate improved prompts based on evaluation results and human annotations.
Third-Party Model Support: The platform now allows for the evaluation of models from other providers, enabling direct performance comparisons.
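
To make the dataset and trace-grading ideas more concrete, the sketch below runs a small evaluation dataset through an automated grader and then walks a recorded trace to find the first failing step. It is plain Python and does not call the Evals platform. The `EvalCase` and `TraceStep` types, the exact-match grader, and the stand-in agent are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class EvalCase:
    prompt: str
    expected: str


@dataclass
class TraceStep:
    name: str  # e.g. "retrieve", "call_tool", "answer"
    ok: bool   # whether this step passed its own check


def exact_match_grader(output: str, expected: str) -> bool:
    """Automated grader: case- and whitespace-insensitive exact match."""
    return output.strip().lower() == expected.strip().lower()


def run_eval(agent: Callable[[str], str], dataset: List[EvalCase]) -> float:
    """Return the fraction of cases the grader marks as passing."""
    if not dataset:
        return 0.0
    passed = sum(exact_match_grader(agent(c.prompt), c.expected) for c in dataset)
    return passed / len(dataset)


def first_failing_step(trace: List[TraceStep]) -> Optional[str]:
    """Trace grading: name of the first failed step, or None if all passed."""
    for step in trace:
        if not step.ok:
            return step.name
    return None


def toy_agent(prompt: str) -> str:
    # Stand-in agent used only for this example; one answer is deliberately wrong.
    answers = {"capital of france?": "Paris", "2 + 2?": "5"}
    return answers.get(prompt.lower(), "")


dataset = [
    EvalCase("Capital of France?", "paris"),
    EvalCase("2 + 2?", "4"),
]
accuracy = run_eval(toy_agent, dataset)

trace = [
    TraceStep("retrieve", True),
    TraceStep("call_tool", False),
    TraceStep("answer", True),
]
failed_at = first_failing_step(trace)
print(accuracy, failed_at)
```

An aggregate score shows whether an agent is improving overall, while the per-step view points to where in the workflow a failure originates.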

The investment firm Carlyle reported that the evaluation platform reduced development time for a due diligence framework by over 50% and improved agent accuracy by 30%.

Pushing Performance with Fine-Tuning

For more advanced customization, OpenAI is enhancing its Reinforcement Fine-Tuning (RFT) capabilities. RFT allows developers to customize the reasoning behavior of models. It is now generally available for the o4-mini model and is in private beta for GPT-5.

Two new features have been added to the RFT beta to improve agent performance: custom tool calls, which train models to select the right tool at the right time, and custom graders, which let developers define evaluation criteria specific to their use case.

Availability and Pricing

Starting today, ChatKit and the new Evals features are generally available to all developers. The Agent Builder is available in beta, while the Connector Registry is beginning a phased beta rollout to select API, ChatGPT Enterprise, and Edu customers who use the Global Admin Console.

OpenAI has stated that access to these new tools is included with its standard API model pricing. The company also announced plans to release a standalone Workflows API and provide agent deployment options directly within ChatGPT in the near future.
