Memory and Context Management, Glossary, Textbook of AI

Memory management (sometimes "context engineering") is one of the central engineering challenges of agentic AI. A frontier LLM in 2025 has 200k–2M tokens of context, but long-running agents quickly exceed even those bounds. Production systems borrow the memory hierarchy metaphor from operating systems.

Three tiers

Tier	Substrate	Lifespan	Capacity
Working / short-term	LLM context window	One inference call	200k–2M tokens
Long-term semantic	Vector database	Permanent	Unbounded
Episodic	Summarised past conversations	Session-scoped or permanent	Hundreds–thousands of summaries

Working memory tactics

System prompt + recent turns, the canonical chat layout.
Scratchpad, <thinking>...</thinking> blocks (e.g. chain-of-thought or reasoning models).
Tool result truncation, drop verbose tool outputs after they are summarised.
Dynamic prompting, re-inject critical state at the bottom of context (recency bias).

Long-term memory

Implemented as a vector DB keyed by embeddings:

def remember(text):
    emb = embed(text)
    vector_store.upsert(id=uuid(), embedding=emb, metadata={"text": text})

def recall(query, k=5):
    return vector_store.search(embed(query), top_k=k)

Triggers for storage are typically:

User states a fact about themselves.
Agent learns a new tool or skill.
Conversation crosses a summary boundary.

Episodic memory

For long conversations, agents periodically summarise older turns:

[System] [Summary of turns 1-50] [Verbatim turns 51-100] [Current user message]

Implementations include:

MemGPT / Letta (Packer et al. 2023), virtual context paging à la OS virtual memory.
Mem0, managed long-term memory service.
Zep, temporal knowledge graphs of conversation entities.

Anthropic's Memory Tool (2025)

Claude 4.5+ ships a structured memory tool: a file-system-like store with create_file, read_file, update_file, delete_file operations. The agent decides what to write and when to read; persistence is across sessions. This effectively turns the file system into the long-term memory.

Compaction

When context fills up, compaction rewrites the history into a shorter summary. Claude Code, Codex, and Cursor all implement automatic compaction at ~80% context usage. Naïve compaction loses information; better systems keep recent verbatim plus a structured summary of older turns.

Open problems

What to remember, agents store too much (noise) or too little (forgetting).
Retrieval drift , embedding similarity ≠ relevance.
Memory poisoning, an adversary plants false "memories" via prompt injection.
Cross-session identity, should the assistant remember you between accounts? Privacy-vs-utility tradeoff.

Citation

Packer, C. et al. (2023). MemGPT: Towards LLMs as Operating Systems. arXiv:2310.08560.

Discussed in:

Chapter 15: Modern AI, Modern AI

AI tools used: Claude (research, coding, text), ChatGPT (diagrams, images), Grammarly (editing).