Claude 3.5 Sonnet Computer Use, Glossary, Textbook of AI

Computer Use is a capability Anthropic introduced for Claude 3.5 Sonnet (new) on 22 October 2024. It allows the model to drive a desktop environment by repeatedly receiving a screenshot and emitting structured tool calls for cursor movement, clicking, typing, and keyboard shortcuts.

Mechanism. The developer hosts a sandboxed virtual machine and exposes three tools: a computer tool for screenshots and input events, a bash tool for shell commands, and a text_editor tool for file edits. On each turn Claude receives the current screenshot, reasons about what to do next, and returns a tool call. The host executes it, captures a fresh screenshot, and feeds it back. This forms a perception-action loop driven entirely by vision plus language.

Capabilities. In the launch demos Claude filled web forms, navigated spreadsheets, looked up information across multiple browser tabs, and wrote and ran code in a terminal. On the OSWorld benchmark for computer agents Claude 3.5 Sonnet scored 14.9% in screenshot-only mode, compared to roughly 7-8% for prior systems and over 70% for humans, signalling that the capability was real but immature.

Significance. Computer Use was the first general-purpose agentic affordance shipped by a frontier lab as a public API. It bypassed the need for bespoke integrations: any application with a graphical interface became automatable, so long as Claude could see and click. This unlocked workflow automation, accessibility tools, and end-to-end software testing without per-app engineering.

Foundational role. Through 2025 the pattern was widely copied. OpenAI shipped Operator in January 2025 with a similar screenshot-plus-actions design, Google added comparable capabilities to Gemini, and open-source projects such as Open Interpreter and browser-use followed. Computer Use also became the backbone of more specialised agent products: Anthropic's own Claude Code uses related tool-call patterns for software engineering, and the Model Context Protocol generalised the idea to arbitrary structured tools.

Limits and risks. Anthropic's launch documentation emphasised that the capability was research-grade: Claude was prone to misclicks, struggled with dynamic interfaces, and could be manipulated by prompt injection embedded in pages it viewed. The recommended deployment was a tightly sandboxed VM with no access to credentials or sensitive data. These concerns drove parallel work on agent safety, including capability gating, action confirmation, and the constitutional AI updates that landed in Claude 4.

Computer Use marked the operational transition from chatbots to agents and is the single most influential capability release in the agentic-AI wave of 2024-2025.

Discussed in:

Chapter 16: Ethics & Safety, Agents

AI tools used: Claude (research, coding, text), ChatGPT (diagrams, images), Grammarly (editing).