Glossary

OpenAI Codex (2025 generation)

OpenAI Codex (2025 generation) is OpenAI's family of agentic software-engineering products, launched over the course of 2025. The name reuses an earlier brand: the original Codex (2021) was a fine-tuned GPT-3 model trained on public code that powered the first version of GitHub Copilot. The 2025 Codex is a different kind of product: an agent built on top of o3-class reasoning models, with dedicated infrastructure for long-horizon coding tasks.

Components. The 2025 Codex line comprises several deployment surfaces:

  • Codex (cloud): a hosted environment in chatgpt.com/codex where the user describes a task, Codex spins up a sandboxed container with the repository checked out, and the agent works for minutes to hours, returning a pull request.
  • Codex CLI: a terminal-based local agent, comparable to Anthropic's Claude Code, that runs against the user's own machine.
  • Codex IDE extensions: plug-ins for VS Code and JetBrains IDEs that delegate longer tasks to the cloud agent while handling short edits locally.

Underlying models. Codex agents run on reasoning-trained models in the o3, GPT-5, and successor families, often a Codex-specialised variant fine-tuned on software-engineering trajectories. The agent issues tool calls for shell commands, file editing, web browsing and Git operations, and reasons explicitly between steps using thinking tokens.
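
The reason-then-act cycle described above can be sketched in a few lines. This is an illustrative toy, not OpenAI's implementation: the Step, Sandbox, scripted_model and run_agent names are all hypothetical, the "model" simply replays a fixed plan, and the sandbox is an in-memory dictionary rather than a container.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Step:
    """One model turn: a reasoning note plus an optional tool call."""
    thought: str
    tool: Optional[str] = None  # None means the agent has finished
    args: Dict[str, str] = field(default_factory=dict)


class Sandbox:
    """Hypothetical in-memory stand-in for a container file system."""

    def __init__(self) -> None:
        self.files: Dict[str, str] = {}

    def write_file(self, path: str, content: str) -> str:
        self.files[path] = content
        return f"wrote {len(content)} chars to {path}"

    def read_file(self, path: str) -> str:
        return self.files.get(path, f"error: {path} not found")


def scripted_model(history: List[str]) -> Step:
    """Placeholder for a reasoning model: replays a fixed plan.

    A real agent would send `history` to a model and parse its reply.
    """
    plan = [
        Step("Create the module first.", "write_file",
             {"path": "hello.py", "content": "print('hi')\n"}),
        Step("Check what was written.", "read_file", {"path": "hello.py"}),
        Step("File looks right; stop."),
    ]
    # The number of observations so far selects the next planned step.
    done = sum(1 for entry in history if entry.startswith("observation:"))
    return plan[min(done, len(plan) - 1)]


def run_agent(task: str, model: Callable[[List[str]], Step],
              sandbox: Sandbox, max_steps: int = 10) -> List[str]:
    """Alternate between model turns and tool execution until done."""
    tools: Dict[str, Callable[..., str]] = {
        "write_file": sandbox.write_file,
        "read_file": sandbox.read_file,
    }
    history = [f"task: {task}"]
    for _ in range(max_steps):
        step = model(history)
        history.append(f"thought: {step.thought}")
        if step.tool is None:
            break  # the model chose to stop
        handler = tools.get(step.tool)
        if handler is None:
            result = f"error: unknown tool {step.tool}"
        else:
            result = handler(**step.args)
        # Feed the tool result back so the next turn can react to it.
        history.append(f"observation: {result}")
    return history


if __name__ == "__main__":
    box = Sandbox()
    for line in run_agent("add a hello script", scripted_model, box):
        print(line)
```

The point of the sketch is the shape of the loop: each turn produces reasoning and at most one tool call, the harness executes the call, and the observation is appended to the history the model sees next. The max_steps cap is one simple way to bound a long-running task.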

Performance. On SWE-Bench Verified, Codex (cloud) is among the leaders, posting scores in the 70%+ band through 2025. On OpenAI's internal "engineering tasks" benchmark and on customer-defined tasks, OpenAI positions Codex as competitive with Claude Code and Devin.

Distinction from 2021 Codex. The earlier model was a single-shot autoregressive code completer with no notion of tools, agency or planning. It was retired in March 2023 in favour of GPT-3.5 / GPT-4. The 2025 Codex resurrected the name to mark OpenAI's return to a software-engineering-focused product, but the underlying technology shares almost nothing with its namesake: it is a reasoning-model-driven agent, not a code-completion model.

Position in the field. As of early 2026, the competitive landscape for cloud coding agents has three major frontier-lab entrants: Anthropic Claude Code, OpenAI Codex (2025) and Google Jules, plus Devin and a long tail of open-source agents (OpenHands, Aider, Plandex). Each has converged on a similar architecture: reasoning model, sandbox, file/Git/shell tools, optional browser, MCP for extensibility, and pull-request-shaped output. Differentiation is mostly in reliability on multi-hour tasks, integration with company workflows, and pricing.

The Codex story illustrates a broader 2025 trend: products are increasingly defined by the scaffolding and harness around the model rather than the model alone, even though the model remains the primary determinant of capability.

Related terms: SWE-Bench, Devin / AI Software Engineer, OpenAI o3, Reasoning Model Training, Model Context Protocol

This site is currently in Beta. Contact: Chris Paton

AI tools used: Claude (research, coding, text), ChatGPT (diagrams, images), Grammarly (editing).