Gemini for Chrome gets a second AI agent to watch over it
Google is deploying a second AI model to monitor its Gemini-powered Chrome browsing agent after acknowledging the agent could be tricked into taking unauthorized actions through prompt injection attacks.
“We’re introducing a user alignment critic where the agent’s actions are vetted by a separate model that is isolated from untrusted content,” the company said in a blog post about the addition. If the critic determines an action doesn’t match what the user asked for, it blocks the action, Google said.
“The primary new threat facing all agentic browsers is indirect prompt injection,” Chrome security engineer Nathan Parker wrote in the post, describing attacks in which content the agent is asked to process contains hidden instructions that attempt to override the user’s original request.
The Gemini-powered browsing agent, launched in September and currently in preview, can navigate websites, click buttons, and fill forms while users are logged into email, banking, and corporate systems. Malicious instructions hidden in web pages, iframes, or user-generated content could “cause the agent to take unwanted actions such as initiating financial transactions or exfiltrating sensitive data,” Parker wrote.
That’s where the user alignment critic comes in: The second model reviews each proposed action before Chrome executes it, acting as what Parker called “a powerful, extra layer of defense against both goal-hijacking and data exfiltration.”
Why prompt injection is hard to stop
Prompt injection has emerged as the top vulnerability in AI systems over the past year. OWASP reported finding it in 73% of the production AI deployments it assessed in 2024, and ranks it as the number one risk in its Top 10 list of threats to large language model applications.
The UK’s National Cyber Security Centre warned Sunday that prompt injection attacks may never be fully mitigated because LLMs can’t reliably distinguish between instructions and data. The agency called it a “confused deputy” vulnerability, where a trusted system is tricked into performing actions on behalf of an untrusted party.
Researchers have already demonstrated the threat. In January, attackers embedded instructions in a document that caused an enterprise AI system to leak business intelligence and disable its own safety filters. Security firm AppOmni disclosed last month that ServiceNow’s AI agents could be manipulated through instructions hidden in form fields, with one agent recruiting others to perform unauthorized actions.
For Chrome, the stakes are particularly high. A compromised browsing agent would have the user’s full privileges on any logged-in site, potentially bypassing the browser’s site isolation protections that normally prevent websites from accessing each other’s data.
Google’s two-model defense
To address these risks, Google’s solution splits the work between two AI models. The main Gemini model reads web content and decides what actions to take. The user alignment critic sees only metadata about proposed actions, not the web content that might contain malicious instructions.
“This component is architected to see only metadata about the proposed action and not any unfiltered untrustworthy web content, thus ensuring it cannot be poisoned directly from the web,” Parker wrote in the blog. When the critic rejects an action, it provides feedback to the planning model to reformulate its approach.
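The blog post doesn’t include code, but the shape of the check is straightforward to sketch. The example below is a hypothetical illustration of the metadata-only boundary, not Google’s implementation; the `ProposedAction` fields, the `isolated_model` stub, and the approval loop are all assumptions made for clarity.

```python
from dataclasses import dataclass

@dataclass
class ProposedAction:
    kind: str           # e.g. "click", "type", "navigate"
    target_origin: str  # the site the action would touch
    summary: str        # short metadata description; no raw page content

def isolated_model(prompt: str) -> str:
    # Placeholder for the isolated critic model; a toy rule stands in here.
    return "REJECT" if "transfer" in prompt.lower() else "APPROVE"

def critic_approves(user_goal: str, action: ProposedAction) -> bool:
    """Ask the isolated critic whether the action matches the user's goal.

    The key property: the critic's input is built only from action metadata,
    never from the untrusted page content the planner read, so a malicious
    page cannot inject instructions into it.
    """
    prompt = (
        f"User goal: {user_goal}\n"
        f"Proposed action: {action.kind} on {action.target_origin}: "
        f"{action.summary}\n"
        "Answer APPROVE or REJECT."
    )
    return isolated_model(prompt) == "APPROVE"

goal = "find the cheapest flight to Lisbon"
for action in (
    ProposedAction("click", "flights.example.com", "sort results by price"),
    ProposedAction("click", "bank.example.com", "confirm wire transfer"),
):
    verdict = "allowed" if critic_approves(goal, action) else "blocked"
    print(f"{action.summary}: {verdict}")
```

The design choice that matters is the input boundary: because the critic’s prompt is assembled from structured fields rather than page text, an attacker’s words never reach the model that holds veto power.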
The architecture is based on existing security research, drawing from what’s known as the dual-LLM pattern and CaMeL research from Google DeepMind, according to the blog post.
Google is also limiting which websites the agent can interact with through what it calls “origin sets.” The system maintains lists of sites the agent can read from and sites where it can take actions like clicking or typing. A gating function, isolated from untrusted content, determines which sites are relevant to each task.
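A rough sketch can make the origin-set idea concrete, too. Everything here is illustrative: the `OriginSets` class, the `gate_origins` heuristic, and the example sites are assumptions standing in for components Google hasn’t published.

```python
from urllib.parse import urlparse

class OriginSets:
    """Per-task allowlists: sites the agent may read, and a stricter
    subset where it may act (click, type, submit)."""

    def __init__(self, readable: set[str], actionable: set[str]):
        assert actionable <= readable  # anything actionable is also readable
        self.readable = readable
        self.actionable = actionable

    @staticmethod
    def _origin(url: str) -> str:
        parts = urlparse(url)
        return f"{parts.scheme}://{parts.netloc}"

    def may_read(self, url: str) -> bool:
        return self._origin(url) in self.readable

    def may_act(self, url: str) -> bool:
        return self._origin(url) in self.actionable

def gate_origins(task: str) -> OriginSets:
    # Toy stand-in for the gating function, which in the real system is an
    # isolated component that sees the task but not page content.
    if "flight" in task.lower():
        return OriginSets(
            readable={"https://flights.example.com",
                      "https://reviews.example.com"},
            actionable={"https://flights.example.com"},
        )
    return OriginSets(set(), set())

sets_ = gate_origins("book a flight to Lisbon")
print(sets_.may_act("https://flights.example.com"))  # True
print(sets_.may_act("https://reviews.example.com"))  # False: read-only
print(sets_.may_read("https://evil.example.net"))    # False: not relevant
```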
The company acknowledged this first implementation is basic. “We will tune the gating functions and other aspects of this system to reduce unnecessary friction while improving security,” Parker wrote.
Beyond the user alignment critic and origin controls, Chrome will require user confirmation before the browsing agent navigates to banking or medical sites, uses saved passwords through Google Password Manager, or completes purchases, according to the blog post. The browsing agent has no direct access to stored passwords.
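Conceptually, that amounts to a hard-coded confirmation gate that sits outside the models entirely. The sketch below is a hypothetical illustration; the category labels and function names are invented for clarity, not taken from Chrome.

```python
SENSITIVE_CATEGORIES = {"banking", "medical"}
SENSITIVE_ACTIONS = {"complete_purchase", "use_saved_password"}

def needs_confirmation(action_kind: str, origin_category: str) -> bool:
    return (origin_category in SENSITIVE_CATEGORIES
            or action_kind in SENSITIVE_ACTIONS)

def execute(action_kind: str, origin_category: str, run, confirm) -> bool:
    """run() performs the action; confirm(msg) asks the user, returns bool."""
    if needs_confirmation(action_kind, origin_category):
        if not confirm(f"Allow '{action_kind}' on a {origin_category} site?"):
            return False  # user declined; the action never runs
    run()
    return True

# A purchase pauses for the user even on an otherwise ordinary site.
executed = execute(
    "complete_purchase", "shopping",
    run=lambda: print("purchase submitted"),
    confirm=lambda msg: True,  # a real UI would prompt the user here
)
print("executed:", executed)
```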
A classifier runs in parallel, checking for prompt injection attempts as the agent works. Google has also built automated red-teaming systems that generate malicious test sites, prioritizing attacks delivered through user-generated content on social media and through advertising networks.
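One plausible way to structure such a parallel check, purely as a sketch: the planner and a classifier run concurrently on each step, and a high injection score vetoes the action. The `classify_injection` rule and the scores here are toy stand-ins for Google’s unpublished classifier.

```python
import asyncio

async def classify_injection(page_text: str) -> float:
    await asyncio.sleep(0)  # stands in for an async model call
    # Toy rule: flag a well-known injection phrase.
    return 0.9 if "ignore previous instructions" in page_text.lower() else 0.1

async def plan_step(page_text: str) -> str:
    await asyncio.sleep(0)  # stands in for the planner model working
    return "click: sort flights by price"

async def guarded_step(page_text: str, threshold: float = 0.5) -> str:
    # Planner and classifier run concurrently; the classifier can veto.
    action, score = await asyncio.gather(
        plan_step(page_text), classify_injection(page_text)
    )
    if score >= threshold:
        raise RuntimeError("possible prompt injection; step aborted")
    return action

print(asyncio.run(guarded_step("Flight deals: Lisbon from $199")))
# A page containing "ignore previous instructions" would raise instead.
```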
Grappling with an unsolved problem
The prompt injection challenge isn’t unique to Chrome. OpenAI has called it “a frontier, challenging research problem” for its ChatGPT agent features and expects attackers to invest significant resources in these techniques.
Gartner has gone a step further, advising enterprises to block AI browsers on their systems. The research firm warned that AI-powered browsing agents could expose corporate data and credentials to prompt injection attacks.
The NCSC took a similar position, urging organizations to assume AI systems will be attacked and to limit their access and privileges accordingly. The agency said organizations should manage risk through design rather than expecting technical fixes to eliminate the problem.
Chrome’s agent features are optional and remain in preview, the blog post said.