Google announced computer use for Gemini 3.5 Flash on its official blog, enabling the model to interact directly with a computer screen through mouse movements, clicks and keyboard input. The feature surfaced in a Hacker News (HN) thread that received 40 points and 12 comments.
What Computer Use Enables
Gemini 3.5 Flash can now observe a desktop or browser window and execute actions such as moving the cursor, clicking buttons, typing text and scrolling. The capability runs through a controlled environment where the model receives screenshots and returns structured action commands.
This matches the pattern Anthropic introduced in October 2024 with Claude 3.5 Sonnet, but targets the lighter Gemini 3.5 Flash checkpoint rather than a larger flagship model.
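The screenshot-in, action-out cycle described above can be sketched as a small loop. This is a minimal illustration, not Google's implementation: the `Action` fields, the `"done"` sentinel and the three callables (`take_screenshot`, `ask_model`, `execute`) are hypothetical names chosen for this example, and the real API's action schema may differ.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Action:
    """Hypothetical structured command returned by the model."""
    kind: str                  # e.g. "click", "type", "scroll", "move", "done"
    x: Optional[int] = None    # screen coordinates, when relevant
    y: Optional[int] = None
    text: Optional[str] = None # text to type, when relevant


def run_agent_loop(
    take_screenshot: Callable[[], bytes],
    ask_model: Callable[[bytes], Action],
    execute: Callable[[Action], None],
    max_steps: int = 20,
) -> List[Action]:
    """Alternate between observing the screen and acting on it.

    Stops when the model returns a "done" action or after max_steps,
    and returns the list of actions that were actually executed.
    """
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = take_screenshot()   # observe the current screen state
        action = ask_model(screenshot)   # model proposes the next step
        if action.kind == "done":
            break
        execute(action)                  # apply it inside the sandbox
        history.append(action)
    return history
```

Passing the model call and the executor in as plain callables keeps the loop testable with stubs before it is wired to a real endpoint, and the `max_steps` cap bounds a session that never reports completion.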
Access and Integration Paths
Developers can test the feature through the Gemini API by enabling the computer-use tool parameter in requests. Google provides a sandbox environment and sample code for both web and desktop scenarios.
Early HN comments note that the API currently requires explicit user approval for each session and logs every action taken.
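The per-session approval and per-action logging that commenters describe can be mirrored on the client side. The sketch below is an assumption about how one might wrap a session locally; `AuditedSession`, `ApprovalRequired` and the log entry layout are hypothetical and are not part of the Gemini API.

```python
import json
import time
from typing import Callable, Dict, List


class ApprovalRequired(Exception):
    """Raised when the user declines to approve a session."""


class AuditedSession:
    """Hypothetical wrapper: one approval per session, one log entry per action."""

    def __init__(self, approve: Callable[[], bool]):
        # Ask once, up front; refuse to create the session without consent.
        if not approve():
            raise ApprovalRequired("user declined the computer-use session")
        self.log: List[Dict] = []

    def perform(self, action: Dict, execute: Callable[[Dict], None]) -> None:
        # Record the action before running it, so a crash mid-action
        # still leaves a trace of what was attempted.
        self.log.append({"ts": time.time(), "action": action})
        execute(action)

    def dump(self) -> str:
        """Return the log as JSON Lines, one action per line."""
        return "\n".join(json.dumps(entry) for entry in self.log)
```

In an interactive tool, `approve` might be something like `lambda: input("Allow session? [y/N] ").lower() == "y"`; in tests it can be a stub that returns a fixed value.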
Comparison with Existing Tools
| Feature | Gemini 3.5 Flash | Claude 3.5 Sonnet Computer Use | OpenAI Operator (preview) |
|---|---|---|---|
| Base model | Flash (light) | Sonnet (mid) | CUA (GPT-4o-based at launch) |
| Screen resolution | Up to 1080p | Up to 1440p | 1080p |
| Latency per action | Not disclosed | ~1.5–3 s | ~2 s |
| API availability | Public | Public | Waitlist |
| Sandbox provided | Yes | Yes | Limited |
Gemini 3.5 Flash offers a lower per-token cost than Claude Sonnet while exposing the same core interaction primitives.
Trade-offs Reported So Far
- Positive: Lower cost per action and faster context handling on long desktop sessions.
- Positive: Native support for both browser tabs and native desktop applications in one endpoint.
- Negative: Fewer safety guardrails than Claude in early tests; some users reported unintended clicks on system dialogs.
- Negative: No public benchmark numbers released yet, making direct speed comparisons difficult.
Recommended Use Cases
Teams building internal automation scripts or research agents gain the most immediate value. Production customer-facing agents should wait for more detailed safety documentation.
Skip this release if your workload requires pixel-perfect accuracy on high-resolution displays or strict audit trails beyond basic logging.
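To see why high-resolution displays can be a weak spot, consider what happens if screenshots are downscaled to the model's maximum resolution before being sent. That downscaling is an assumption made for illustration; the source only lists a 1080p limit. Under that assumption, every coordinate the model returns has to be scaled back up to the native display, as the hypothetical `to_native` helper below does.

```python
from typing import Tuple

Size = Tuple[int, int]    # (width, height) in pixels
Point = Tuple[int, int]   # (x, y) in pixels


def to_native(point: Point, model_size: Size, native_size: Size) -> Point:
    """Map a coordinate from the downscaled screenshot to the native display.

    Each axis is scaled independently and rounded to the nearest pixel.
    """
    x, y = point
    model_w, model_h = model_size
    native_w, native_h = native_size
    return (round(x * native_w / model_w), round(y * native_h / model_h))


# A click at the center of a 1080p screenshot, mapped onto a 4K display.
center = to_native((960, 540), (1920, 1080), (3840, 2160))
```

On a 4K display each screenshot pixel covers a 2×2 block of native pixels, so the model cannot distinguish targets that are closer together than that block, which is one plausible source of the misclicks on small controls.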
Current Verdict
Gemini 3.5 Flash computer use delivers the same interface-control capability as Claude at a lower cost, but it lacks published benchmarks and mature safety tooling. Early adopters can start testing via the Gemini API today while monitoring the next round of safety updates.
