
Google Unveils Gemini Robotics 1.5 for Advanced AI Agents

Google has introduced Gemini Robotics 1.5, a new suite of AI models that enable robots to plan, reason, and execute complex, multi-step physical tasks.

By Chloe Bennett

Chloe Bennett is a technology correspondent for Neurozzio, specializing in robotics, artificial intelligence, and automation. She reports on the latest advancements in AI agents and their application in the physical world.

Google has introduced a new generation of artificial intelligence models designed to significantly advance the capabilities of robots. The new models, Gemini Robotics 1.5 and Gemini Robotics-ER 1.5, enable robots to perceive their environment, plan complex actions, and execute multi-step tasks with greater autonomy and transparency.

Key Takeaways

  • Google announced two specialized AI models: Gemini Robotics 1.5 for physical action and Gemini Robotics-ER 1.5 for reasoning and planning.
  • The models work in tandem, allowing robots to break down complex requests into manageable steps and execute them in the physical world.
  • A key feature is the ability for the action model to "think before acting," generating an internal reasoning process to guide its movements.
  • The system can learn skills on one type of robot and transfer them to another without specialized retraining, a concept known as learning across embodiments.
  • Gemini Robotics-ER 1.5 is now available for developers through the Gemini API in Google AI Studio, while Gemini Robotics 1.5 is accessible to select partners.

A Two-Model System for Physical Intelligence

The latest development in robotics from Google is not a single model but a coordinated system of two specialized AIs. This framework is designed to mimic how humans approach complex tasks by separating high-level planning from low-level physical execution.

This approach aims to create what Google calls "physical agents"—robots that can operate with a higher degree of general intelligence and adaptability in real-world settings.
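To make the division of labor concrete, here is a minimal sketch of the planner/actor loop in Python. Everything in it (the class names, method signatures, and the canned plan) is illustrative of the architecture described in this article, not Google's actual interfaces.

```python
# Minimal sketch of the planner/actor split described above.
# Class names, signatures, and the canned plan are illustrative only.

class Planner:
    """High-level reasoner, standing in for Gemini Robotics-ER 1.5."""

    def make_plan(self, goal: str) -> list[str]:
        # The real planner is a VLM call that returns natural-language
        # steps; a canned decomposition stands in for it here.
        return [f"locate items for: {goal}", f"carry out: {goal}"]


class Actor:
    """Low-level executor, standing in for Gemini Robotics 1.5."""

    def execute(self, instruction: str) -> None:
        # The real actor translates the instruction into motor commands.
        print(f"executing: {instruction}")


def run_task(goal: str) -> None:
    planner, actor = Planner(), Actor()
    for step in planner.make_plan(goal):   # plan once...
        actor.execute(step)                # ...then act step by step


run_task("sort the recycling")
```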

The Planner: Gemini Robotics-ER 1.5

The first component, Gemini Robotics-ER 1.5, functions as the system's strategic brain. This Vision-Language Model (VLM) is optimized for embodied reasoning, which means it can understand and make logical decisions about physical environments.

Its primary role is to interpret a complex command, such as sorting waste according to local regulations. To do this, it can access external tools, like using Google Search to find the specific recycling rules for a given location. It then formulates a detailed, step-by-step plan to complete the mission.
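Because Gemini Robotics-ER 1.5 is served through the standard Gemini API (see Key Takeaways), a planning call with Google Search grounding can be sketched with the google-genai Python SDK. The model ID string below is an assumption; confirm the exact identifier in Google AI Studio.

```python
# Sketch: asking the planner model for a step-by-step plan, with
# Google Search available as a grounding tool. Requires the
# google-genai package and a GEMINI_API_KEY in the environment.
from google import genai
from google.genai import types

client = genai.Client()  # picks up the API key from the environment

response = client.models.generate_content(
    # Model ID is an assumption; confirm the exact string in AI Studio.
    model="gemini-robotics-er-1.5-preview",
    contents="Plan the steps to sort this waste bin per San Francisco "
             "recycling rules.",
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)
print(response.text)  # a numbered, natural-language plan
```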

State-of-the-Art Performance

According to Google's technical report, Gemini Robotics-ER 1.5 achieves state-of-the-art performance across 15 different academic benchmarks for embodied reasoning, including ERQA and Point-Bench, which measure spatial understanding and question answering.

The Actor: Gemini Robotics 1.5

Once a plan is created, Gemini Robotics-ER 1.5 sends instructions in natural language to the second model, Gemini Robotics 1.5. This Vision-Language-Action (VLA) model is responsible for translating those instructions into precise motor commands for the robot.

This model handles the physical execution, using its visual understanding to interact with objects and perform the required actions. It bridges the gap between a high-level plan and the physical movements needed to carry it out.
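The hand-off can be pictured as a simple input/output contract: a camera frame plus a natural-language step in, low-level motor targets out. The sketch below only illustrates that contract; the field names and the dummy 7-joint target are invented, not Google's interface.

```python
# Illustrative input/output contract for one VLA step. The field names
# and the dummy 7-joint target are invented, not Google's interface.
from dataclasses import dataclass


@dataclass
class Observation:
    rgb_frame: bytes   # current camera image
    instruction: str   # natural-language step from the planner


@dataclass
class ActionChunk:
    joint_targets: list[float]  # low-level motor commands


def vla_step(obs: Observation) -> ActionChunk:
    # The real model maps (image, instruction) to motion; a fixed
    # dummy target stands in for that mapping here.
    return ActionChunk(joint_targets=[0.0] * 7)
```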

Advancing Robotic Capabilities

The combination of these two models unlocks several capabilities that address long-standing challenges in robotics. These advances move robots from simple command-followers toward more proactive and intelligent systems.

Thinking Before Acting

A notable feature of Gemini Robotics 1.5 is its ability to generate an internal thought process before taking action. When given a task, the model produces a sequence of reasoning in natural language that outlines its strategy.

For a command like "Sort my laundry by color," the system first establishes the goal, such as putting white clothes in one bin and colored clothes in another. It then breaks this down into smaller steps, like picking up a specific item and placing it in the correct bin, making its decision-making process more transparent and robust.

This internal monologue allows the robot to handle semantically complex tasks and adapt to unexpected changes in its environment. It can also turn a long, complex task into a series of shorter, more manageable actions that are easier to execute successfully.
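One way to picture this interleaving is as a trace of alternating "think" and "act" entries. The strings below are invented for illustration; the real model emits its own natural-language reasoning before each motion.

```python
# Invented trace for the laundry example, showing how reasoning
# ("think") interleaves with motion ("act") in a long-horizon task.
trace = [
    ("think", "Goal: whites in the white bin, colors in the dark bin."),
    ("think", "Nearest item is a white sock; it goes in the white bin."),
    ("act", "pick up the white sock"),
    ("act", "place it in the white bin"),
    ("think", "Next item is a red shirt; it goes in the dark bin."),
    ("act", "pick up the red shirt"),
    ("act", "place it in the dark bin"),
]

for kind, content in trace:
    print(f"[{kind}] {content}")
```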

Learning Across Different Robot Forms

Robots are built in many different shapes and sizes, with varying sensors and movement capabilities. Historically, transferring a skill learned by one robot to another has been a difficult and time-consuming process.

Gemini Robotics 1.5 demonstrates a strong ability to learn across different embodiments. This means that a skill trained on one type of robot, such as the dual-arm ALOHA 2, can be successfully performed by a completely different robot, like the humanoid Apollo from Apptronik, without needing to retrain the model for the new hardware.

What is an Embodiment?

In robotics, "embodiment" refers to the physical form of a robot, including its size, shape, sensors, and degrees of freedom (how it can move). The ability to generalize skills across different embodiments is a major step toward creating truly general-purpose AI for robotics.

This breakthrough has the potential to accelerate the development of new robotic behaviors, since skills can be shared and deployed across a wide range of platforms instead of being retrained for each one.
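A toy way to see why this matters: if a skill is written against an abstract embodiment description rather than one robot's hardware, the same skill can in principle be invoked on very different platforms. The sketch below assumes exactly that; the class, the joint counts, and the skill are all made up for illustration.

```python
# Toy illustration of cross-embodiment reuse: one skill written against
# an abstract embodiment description. Joint counts are made up.
from dataclasses import dataclass


@dataclass
class Embodiment:
    name: str
    num_joints: int  # illustrative, not exact hardware specs


def sort_laundry(robot: Embodiment) -> None:
    # One skill definition, invoked unchanged on different hardware.
    print(f"{robot.name} ({robot.num_joints} joints): sorting laundry")


for robot in (Embodiment("ALOHA 2", 14), Embodiment("Apollo", 30)):
    sort_laundry(robot)
```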

Commitment to Responsible Development

As AI models grant robots greater autonomy, ensuring their safe operation in human-centric environments becomes critical. Google has stated that the development of these models is guided by its AI Principles and overseen by its Responsibility & Safety Council.

The safety framework for Gemini Robotics 1.5 includes several layers (see the sketch after this list):

  • Semantic Reasoning: The model is designed to think about safety before acting.
  • Policy Alignment: It adheres to existing Gemini Safety Policies to ensure respectful interaction with humans.
  • Low-Level Systems: It can trigger on-board safety protocols, such as collision avoidance systems, when necessary.
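Here is a minimal sketch of how such layers might compose in code, assuming the semantic and policy checks run before any command reaches the controller. Every function and rule in it is an invented stand-in, not Google's implementation.

```python
# Invented stand-ins for the three layers above; not Google's code.
def semantically_safe(instruction: str) -> bool:
    # Layer 1: reason about safety before acting (toy rule).
    return "near the hot stove" not in instruction


def policy_compliant(instruction: str) -> bool:
    # Layer 2: stand-in for a Gemini Safety Policy check.
    return True


def send_to_controller(instruction: str) -> None:
    # Layer 3 sits below this call: the on-board controller can still
    # trigger collision avoidance or an emergency stop on its own.
    print(f"executing: {instruction}")


def execute_with_guards(instruction: str) -> None:
    if not (semantically_safe(instruction) and policy_compliant(instruction)):
        raise RuntimeError(f"refused unsafe instruction: {instruction!r}")
    send_to_controller(instruction)


execute_with_guards("place the cup on the shelf")
```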

To further advance safety research, Google is also releasing an upgraded version of its ASIMOV benchmark. This collection of datasets is used to evaluate and improve the semantic safety of AI systems. According to the company, Gemini Robotics-ER 1.5 shows leading performance on this benchmark, with its reasoning ability contributing to a better understanding of physical safety constraints.

A Step Toward General-Purpose Robots

The introduction of the Gemini Robotics 1.5 models marks a significant milestone in the pursuit of artificial general intelligence (AGI) within the physical world. By equipping robots with agentic capabilities—the ability to reason, plan, and use tools—these systems move beyond simply reacting to commands.

This foundational step is aimed at building robots that can navigate the complexities of the real world with greater intelligence and dexterity. The ultimate goal is to create machines that can be more helpful and seamlessly integrated into daily human life. The availability of these models to the broader developer community is expected to spur further innovation in the field of robotics.