A Gemini-colored cursor moves through a browser window inside a guarded sandbox

Gemini 3.5 Flash gets a Computer Use tool for agent workflows

Google's Gemini API now previews Computer Use with browser, mobile, and desktop environments, making execution safety and logging part of the developer workflow.

Google added public-preview Computer Use support to the Gemini API on June 24. The release notes say the feature works with Gemini 3.5 Flash and includes simplified actions with intents, built-in support for browser, mobile, and desktop environments, configurable safety policies, and advanced prompt injection detection.

The important detail is where execution happens. The model does not take over a browser on its own. Google’s docs describe a loop with three steps: the application sends the model a prompt, configuration, and screenshot; the model returns a function call with an action; and the developer’s client executes that action in the target environment.
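The loop the docs describe can be sketched in a few lines. This is a minimal illustration, not Google's client code: `Action`, `run_loop`, and the three callables (`capture`, `propose`, `execute`) are hypothetical names, and the real Gemini API request and response shapes are not shown.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Action:
    """A model-proposed step. The field names here are assumptions."""
    name: str
    args: dict = field(default_factory=dict)


def run_loop(
    goal: str,
    capture: Callable[[], bytes],
    propose: Callable[[str, bytes], Optional[Action]],
    execute: Callable[[Action], None],
    max_steps: int = 10,
) -> List[Action]:
    """Alternate screenshot, proposal, and execution until the model stops.

    `propose` stands in for the model call and returns None when it has
    no further action. `execute` is the client-side code that actually
    touches the environment.
    """
    executed: List[Action] = []
    for _ in range(max_steps):
        screenshot = capture()               # client captures current state
        action = propose(goal, screenshot)   # model suggests the next step
        if action is None:
            break
        execute(action)                      # client, not model, acts
        executed.append(action)
    return executed
```

The `max_steps` cap is a design choice in this sketch: it keeps a confused agent from looping indefinitely.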

That split makes Computer Use a developer-infrastructure story as much as a model story.

The model suggests actions; the client owns execution

Computer Use turns a model response into a proposed interaction with a graphical environment. In practice, that can mean moving through a website, filling a form, clicking controls, or using an app workflow. The model reads the screen and the instruction, then suggests the next action.

Google’s docs put responsibility for execution on the client. That is the right architecture for a risky capability. A model can propose a click, but the application decides whether to carry it out, where it can navigate, what can be typed, and what gets logged.
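One way to picture that decision point is a small policy function that the client calls before executing anything. The sketch below is an assumption-laden example: the action names, the `Verdict` enum, and the length limit are all hypothetical and do not come from the Gemini API.

```python
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    CONFIRM = "confirm"   # pause and ask a human before executing
    DENY = "deny"


# Hypothetical action names; the real API's action set may differ.
LOW_RISK_ACTIONS = {"click", "scroll", "wait"}
REVIEWED_ACTIONS = {"type_text", "navigate"}
MAX_TYPED_CHARS = 200  # arbitrary illustrative limit


def review(action_name: str, args: dict) -> Verdict:
    """Decide what the client does with a proposed action."""
    if action_name in LOW_RISK_ACTIONS:
        return Verdict.ALLOW
    if action_name in REVIEWED_ACTIONS:
        text = str(args.get("text", ""))
        if len(text) > MAX_TYPED_CHARS:
            return Verdict.DENY
        return Verdict.CONFIRM
    # Default-deny anything the policy does not recognize.
    return Verdict.DENY
```

The default-deny fallthrough is the part worth keeping in any real version: an action the policy has never seen should not run just because the model proposed it.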

For builders, this changes the work from “call a model” to “operate a controlled agent loop.” The loop needs screenshots, state handling, tool execution, failure recovery, and a policy layer around what the agent can do.

The safety guidance is not optional

Google’s docs list several practices that should be treated as baseline engineering, not launch-page fine print. They recommend running the agent in a secure execution environment, sanitizing user-generated prompt text, using guardrails and safety APIs, applying allowlists or blocklists, keeping detailed logs, and starting from a consistent environment.
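Two of those practices, allowlists and detailed logs, are simple to illustrate. The following is a rough sketch using only the Python standard library; `ALLOWED_HOSTS`, `is_allowed`, and `audit_record` are hypothetical names, and the hosts are placeholders.

```python
import json
from urllib.parse import urlsplit

# Hypothetical allowlist; a real deployment would load this from config.
ALLOWED_HOSTS = {"example.com", "docs.example.com"}


def is_allowed(url: str) -> bool:
    """Permit only HTTPS URLs whose exact hostname is on the allowlist."""
    try:
        parts = urlsplit(url)
        host = (parts.hostname or "").lower()
    except ValueError:
        # Malformed URLs are rejected rather than guessed at.
        return False
    return parts.scheme == "https" and host in ALLOWED_HOSTS


def audit_record(step: int, url: str) -> str:
    """Build one JSON log line recording the navigation decision."""
    return json.dumps(
        {"step": step, "url": url, "allowed": is_allowed(url)},
        sort_keys=True,
    )
```

Matching on the parsed hostname rather than a substring means a URL such as `https://example.com.evil.test/` is rejected even though it contains an allowed name.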

Those recommendations map directly to the failure modes of computer-use agents. A hidden prompt injection in a webpage can try to redirect the agent. A logged-in browser can expose private data. A pop-up can cause the model to misread the task. A broad navigation scope can turn a routine workflow into an uncontrolled action path.

The new Gemini feature includes prompt injection detection, but Google is careful not to present that as a replacement for sandboxing and execution controls. That is the practical read: detection helps, but the deployment boundary matters more.

The Computer Use page lists Gemini 3.5 Flash as the recommended model for the feature. The docs say it supports browser, mobile, and desktop environments and includes streamlined actions with intents, configurable safety policies, and prompt injection detection.

The “intent” detail is useful. If the model can explain the reasoning behind each step, the client and the human reviewer have more context for whether an action makes sense. That can help with debugging, auditing, and deciding when to pause for confirmation.

The model list also includes Gemini 3 Flash Preview and a legacy Gemini 2.5 Computer Use preview. That suggests Google is moving the feature from a narrow experimental model into the current Gemini line.

The first use cases are controlled workflows

The best early use cases are not open-ended browsing sessions. They are bounded workflows: filling repetitive forms, testing web application flows, comparing product pages, or collecting structured information from known sites.

Those tasks have enough structure for a computer-use agent to help, and enough risk that the environment should still be fenced. A browser agent that can click anywhere is much less trustworthy than one that operates inside a known site, with logs and guardrails, on data that can be checked afterward.

That is the product consequence of Google’s preview. Computer Use is becoming a normal model tool, but the useful implementations will look more like controlled automation systems than autonomous web workers.

The next checkpoint is how developers wire this into real evals. A good Computer Use deployment will measure not only task completion but also wrong clicks, prompt-injection hits, blocked actions, retries, and human interventions. That is where the feature moves from demo to operational tool.
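As a rough sketch of what such an eval record might look like, the example below tracks those counters per run and averages them across runs. `RunMetrics` and `summarize` are hypothetical names, and how each counter gets incremented is left to the surrounding harness.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RunMetrics:
    """Counters for one agent run; the field names are illustrative."""
    completed: bool = False
    wrong_clicks: int = 0
    injection_hits: int = 0
    blocked_actions: int = 0
    retries: int = 0
    human_interventions: int = 0


def summarize(runs: List[RunMetrics]) -> Dict[str, float]:
    """Return the completion rate and per-run averages for each counter."""
    if not runs:
        return {}
    n = len(runs)
    return {
        "completion_rate": sum(r.completed for r in runs) / n,
        "wrong_clicks_per_run": sum(r.wrong_clicks for r in runs) / n,
        "injection_hits_per_run": sum(r.injection_hits for r in runs) / n,
        "blocked_actions_per_run": sum(r.blocked_actions for r in runs) / n,
        "retries_per_run": sum(r.retries for r in runs) / n,
        "human_interventions_per_run": sum(r.human_interventions for r in runs) / n,
    }
```

Reporting the safety counters next to the completion rate makes it harder for a run that finished the task through blocked or corrected actions to look like a clean success.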

The AI Feed Desk

Editorial desk

The AI Feed Desk tracks AI provider updates, model releases, agent tooling, and enterprise adoption, turning fast-moving announcements into source-linked context for builders and operators.
