Multi-agent orchestration is the architectural pattern in which $N \ge 2$ LLM agents collaborate. Each agent has its own context window, system prompt, tool set, and (often) model. Coordination is achieved through messages, shared state, or a supervising agent.
Why multiple agents
- Specialisation: a "Researcher" agent with web-search tools, a "Coder" agent with a Python interpreter, a "Critic" agent with no tools but a strong system prompt.
- Context segmentation: long tasks overflow a single context window; sub-agents work in their own windows and report back summaries.
- Parallelism: independent sub-tasks run concurrently.
- Adversarial dynamics: debate between agents (Du et al., 2023) improves factuality.
- Role-play emergent behaviour: Park et al.'s Generative Agents (2023) demonstrated believable simulated societies.
Topologies
| Topology | Description | Frameworks |
|---|---|---|
| Pipeline | Linear chain: A → B → C | LangChain Sequential |
| Supervisor | One leader spawns workers and gathers their results | OpenAI Swarm, CrewAI, Claude Code's sub-agents |
| Group chat | All agents see a shared transcript; a router picks the next speaker | AutoGen GroupChat |
| Hierarchical | Tree of supervisors and workers | MetaGPT, AutoGen |
| Debate | Two or more agents argue, judge picks winner | Du et al. 2023 |
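The simplest topology above, the pipeline, can be sketched in a few lines. Here `call_llm` is a hypothetical stand-in for a real model call, not any framework's API:

```python
# Minimal pipeline topology: each agent transforms the previous agent's output.
# `call_llm` is a hypothetical stub standing in for a real LLM call.
def call_llm(role: str, text: str) -> str:
    return f"[{role}] {text}"

def pipeline(task: str, roles: list[str]) -> str:
    out = task
    for role in roles:  # A -> B -> C: strictly linear hand-off
        out = call_llm(role, out)
    return out

result = pipeline("summarise the report", ["researcher", "writer", "critic"])
```

Note that each agent sees only its predecessor's output, not the full transcript — the defining constraint of the pipeline topology.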
Pseudocode (supervisor pattern)
```python
def supervisor(task):
    plan = llm("plan", task)               # decompose the task into steps
    results = []
    for step in plan:
        agent = pick_specialist(step)      # route each step to a worker agent
        results.append(agent.run(step))
    return llm("synthesize", task, results)  # merge worker outputs into one answer
```
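A self-contained version of the same pattern, with the planner, synthesiser, and specialists stubbed out — `llm`, `Agent`, and the routing table are illustrative, not a real API:

```python
# Runnable sketch of the supervisor pattern; all names are hypothetical stubs.
class Agent:
    def __init__(self, name):
        self.name = name

    def run(self, step):
        return f"{self.name} did: {step}"

SPECIALISTS = {"search": Agent("researcher"), "code": Agent("coder")}

def llm(mode, *args):
    if mode == "plan":  # stub planner: decompose the task into (kind, step) pairs
        return [("search", "find sources"), ("code", "write script")]
    return " | ".join(args[1])  # stub synthesiser: concatenate worker results

def supervisor(task):
    results = [SPECIALISTS[kind].run(step) for kind, step in llm("plan", task)]
    return llm("synthesize", task, results)
```

In a real system the planner and synthesiser would each be model calls, and the routing rule would likely be model-chosen rather than a fixed table.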
Frameworks
- AutoGen (Microsoft, 2023): group chat + tool use.
- CrewAI: role-based, production-focused.
- MetaGPT: software-team simulation, SOP-driven.
- OpenAI Swarm (2024): minimal handoff-based orchestration.
- LangGraph (LangChain, 2024): stateful directed graph of agents.
When not to use multi-agent
An empirical lesson from 2024–2025 production deployments: multi-agent setups often hurt. Anthropic's 2024 post "Building effective agents" and Cognition's "Don't Build Multi-Agents" both argue that:
- Communication overhead dominates.
- Errors compound across agents.
- A single capable model with good memory management and tool use usually outperforms a pipeline of specialised agents.
The remaining use cases are true parallelism (e.g. searching 100 documents simultaneously) and strict role separation for safety.
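The true-parallelism case can be sketched with a thread pool fanning out over documents. The `summarise` function here is a hypothetical stand-in for an LLM-backed sub-agent call:

```python
from concurrent.futures import ThreadPoolExecutor

# Fan-out: independent sub-agents each summarise one document in parallel,
# keeping each document out of the others' context windows.
def summarise(doc: str) -> str:
    return doc[:10]  # stub: a real sub-agent would call a model here

def parallel_search(docs: list[str]) -> list[str]:
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(summarise, docs))  # results come back in input order
```

This works precisely because the sub-tasks share no state; the moment sub-agents must coordinate mid-task, the communication-overhead problem above returns.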
Related terms: AutoGen, CrewAI, MetaGPT, Tool Use, Memory and Context Management
Discussed in:
- Chapter 15: Modern AI