
From Context Window to Multi-Context Workflow: Building Reliable AI Agent Memory and State Management

A multi-context workflow lets an AI agent span several independent context windows (picking up work across sessions and across days). It does this by externalizing memory, progress artifacts, and verification scripts into files that the agent rebuilds its state from on each run. Even with Claude Sonnet 4.5's 200K-token context window (docs.claude.com), long-running agents still need this pattern. A single session can still overflow and get truncated, state is lost between runs, and each new run risks re-exploring work that's already done.

As AI agents (built with the Claude Agent SDK, LangChain, AutoGen, the OpenAI Agents SDK, and friends) show up more in everyday engineering work, the single-conversation context window keeps getting in the way. One token window can't hold a long task, and nothing carries state across separate runs. Long-running agents need multi-context workflows and explicit state management.


1. Why Multi-Context Workflows?

In traditional LLM usage:

  • Models can only work within a single context window;
  • If limits are exceeded, the system will either truncate tokens or return an error;
  • State and progress cannot be automatically preserved across multiple sessions.

The core goal of a multi-context workflow is:

To enable agents to span multiple independent context windows (e.g., working multiple times per day, or completing complex tasks in stages), while being able to remember, restore, and continue making progress.

Anthropic's docs walk through this for long-running agents: each session has to leave clear "progress artifacts" so the next one doesn't re-explore work that's already done. (Anthropic)

These workflows typically have two parts:

  1. Initialization (initializer agent): set up the architecture, scripts, logs, and memory files for the multi-stage workflow;
  2. Incremental progress: each run should make small changes → verify → write state → prepare for next use. (Anthropic)
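The split between an initializer session and incremental sessions can be decided mechanically at startup. Below is a minimal Python sketch. It assumes that the presence of a progress file (here the hypothetical `progress.json`) is what marks a workspace as already initialized.

```python
from pathlib import Path

# Hypothetical file name: the text only says "progress files",
# so progress.json is an assumption for this sketch.
PROGRESS_FILE = "progress.json"


def choose_phase(workdir: Path) -> str:
    """Pick which kind of session to run in this context window.

    The first run finds no progress artifact, so it sets up the workflow
    (initializer phase). Every later run finds the artifact and continues
    from it (incremental phase).
    """
    if (workdir / PROGRESS_FILE).exists():
        return "incremental"
    return "initialize"
```

A harness would call `choose_phase` before building the prompt. It would then send the initializer instructions or the incremental-session instructions accordingly.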

2. State and Context Management in Agent SDK (Using Claude Agent SDK as Example)

Claude Agent SDK (formerly Claude Code SDK) is Anthropic's official agent toolkit. You use it to build agents that can:

  • Read and write files;
  • Call tools;
  • Run commands;
  • Work through tasks iteratively.

From the docs:

Claude Agent SDK provides the same tools as Claude Code (such as reading files, running bash, etc.), and supports context management-related features like memory and sessions. (Claude)

Out of the box, the SDK already covers a good part of a multi-window workflow:

Feature | Description
Sessions | Session lifecycle corresponding to multiple execution phases
Memory & Skills | Support for storing and referencing rebuildable long-term state
Subagents | Support for launching sub-agents for specific tasks
MCP (Model Context Protocol) | Standardized context exchange between the LLM and tools (files, command results) (Claude)

MCP (Model Context Protocol) is an open protocol, built on JSON-RPC, for connecting LLM applications to external context such as files, tool outputs, and data sources. (Wikipedia) For a concrete OpenClaw plugin built with these patterns, see building a reminder plugin for OpenClaw.
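Under the hood, an MCP client asks a server to run a tool by sending a JSON-RPC 2.0 request with the `tools/call` method. The sketch below only builds that message. It assumes a server that exposes a tool named `read_file` (hypothetical). It also leaves out two things a real client needs: the transport (stdio or HTTP) and the initialization handshake that happens before any tool call.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids must be unique within a connection


def tools_call_request(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# "read_file" and its "path" argument are hypothetical; real tool names
# come from the server's response to a `tools/list` request.
print(tools_call_request("read_file", {"path": "progress.json"}))
```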


3. Multi-Context Window Workflow: Core Design Principles

(1) Initial Prompt and Architecture Setup

Don't have the first context window jump straight into work. Set things up first:

  • Initialization scripts (e.g., init.sh);
  • Initial progress files like progress.txt or memory.md;
  • Test scripts and verification patterns. (Anthropic)

This is the "give the agent scripts and a memory layout it can rebuild state from" pattern.
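Here is a minimal sketch of an initializer. It uses the file names from the list above with placeholder contents. The `./run_tests.sh` call inside `init.sh` is hypothetical and would be replaced by the project's real setup and test commands. The function is idempotent, so accidentally running it in a later session doesn't wipe existing memory.

```python
from pathlib import Path

# Placeholder contents; ./run_tests.sh is a hypothetical project test script.
INIT_SCRIPT = """#!/usr/bin/env bash
set -euo pipefail
./run_tests.sh
"""


def initialize_workspace(root: Path) -> list[str]:
    """Create the artifacts that later sessions rebuild state from.

    Existing files are left untouched, so re-running this is safe.
    Returns the names of the files that were actually created.
    """
    root.mkdir(parents=True, exist_ok=True)
    artifacts = {
        "init.sh": INIT_SCRIPT,
        "progress.txt": "Session log (newest last)\n",
        "memory.md": "# Memory\n\n## Decisions\n\n## Open risks\n",
    }
    created = []
    for name, content in artifacts.items():
        path = root / name
        if not path.exists():
            path.write_text(content)
            created.append(name)
    (root / "init.sh").chmod(0o755)  # make the setup script executable
    return created
```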


(2) Maintaining Incremental Progress

Each session should make progress, not reinvent the wheel:

Clearly record each change → fully execute verification → write changes and state to state management files. (Anthropic)

Next startup:

  1. Read the state stored in files;
  2. Use tools to rebuild context;
  3. Push forward incrementally.

This keeps the workflow sustainable and avoids both context overflow and stale guesses about what was already done.
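Steps 1 and 3 of the startup routine can be sketched as follows. Step 2, rebuilding context with tools such as `git log` or the test runner, is left out. The `done`/`todo` keys are an assumed layout, not a fixed format; they should match whatever the initializer wrote.

```python
import json
from pathlib import Path


def resume_session(state_path: Path) -> dict:
    """Rebuild a minimal working context from the previous session's state file.

    Assumes a hypothetical JSON layout like {"done": [...], "todo": [...]}.
    """
    state = json.loads(state_path.read_text())
    todo = state.get("todo", [])
    return {
        "completed": len(state.get("done", [])),
        # Push forward incrementally: pick exactly one item for this window.
        "next_task": todo[0] if todo else None,
        "remaining": len(todo),
    }
```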


(3) Sub-Agent and Context Partitioning Design

For more complex, multi-role tasks:

  • Launch different agents (sub-agents) for different subtasks;
  • Each sub-agent works within its own context window;
  • Coordinate progress through shared external state (shared memory, shared files, message queues). (Vellum AI)

Sub-agents fit when you want each agent focused on one subtask, a tighter scope per context window, and parallel execution.
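Here is a toy sketch of context partitioning, using threads as stand-ins for sub-agents. Each one keeps a private context list, and only a short summary is written to a shared store guarded by a lock. `SharedState`, `run_subagent`, and the canned summaries are hypothetical. In a real system, each sub-agent would call a model, and the shared store would be a file, database, or queue.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SharedState:
    """In-memory stand-in for shared external state (file, DB, or queue)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.results: dict[str, str] = {}

    def publish(self, subtask: str, summary: str) -> None:
        with self._lock:  # serialize writes from concurrent sub-agents
            self.results[subtask] = summary


def run_subagent(subtask: str, shared: SharedState) -> int:
    # Private context for this sub-agent only; a real one would send it to a model.
    context = [f"You are responsible only for: {subtask}"]
    context.append(f"Finished {subtask}")
    # Only a compact summary leaves the sub-agent, not its whole context.
    shared.publish(subtask, context[-1])
    return len(context)


def run_all(subtasks: list[str]) -> SharedState:
    shared = SharedState()
    with ThreadPoolExecutor(max_workers=max(1, len(subtasks))) as pool:
        list(pool.map(lambda task: run_subagent(task, shared), subtasks))
    return shared
```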


4. State Management Best Practices

In multi-context workflows, state management matters more than in single-context use. A few patterns from industry and research:


① Use Structured Formats to Record State (Structured State)

For things like task status, test results, and TODOs, structured formats (JSON, YAML) beat free text because:

✔ Easy for programs to read and verify
✔ Convenient for version comparison (diff)
✔ Can be directly parsed by agents using MCP or tools
✔ Reduces LLM parsing ambiguity

For example:

{
  "tests": {
    "unit": "pass",
    "integration": "fail"
  },
  "todo": [
    "Correctly implement API",
    "Re-run integration tests"
  ]
}

Most serious agent memory work pushes you toward this kind of structured state. (Medium)
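To show why the structured form pays off, this sketch loads the example state above, validates it, and finds the failing suites without any free-text interpretation. Writing with sorted keys and fixed indentation keeps diffs between sessions small. The validation rules (only `pass`/`fail` results, string TODOs) are assumptions for this example.

```python
import json

# The example state from above, as the agent would read it from disk.
STATE_JSON = """
{
  "tests": {"unit": "pass", "integration": "fail"},
  "todo": ["Correctly implement API", "Re-run integration tests"]
}
"""

ALLOWED_RESULTS = {"pass", "fail"}  # assumed vocabulary for test results


def load_state(text: str) -> dict:
    """Parse and validate the state so a malformed file fails loudly."""
    state = json.loads(text)
    for suite, result in state["tests"].items():
        if result not in ALLOWED_RESULTS:
            raise ValueError(f"unexpected result {result!r} for {suite}")
    if not all(isinstance(item, str) for item in state["todo"]):
        raise ValueError("todo items must be strings")
    return state


def dump_state(state: dict) -> str:
    # Sorted keys and fixed indentation keep `git diff` output stable.
    return json.dumps(state, indent=2, sort_keys=True) + "\n"


state = load_state(STATE_JSON)
failing = [suite for suite, result in state["tests"].items() if result == "fail"]
print(failing)  # ['integration']
```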


② Use Unstructured Text to Supplement Progress Notes

Structured formats are great for state variables, but design decisions, downstream risks, and the path you took to a choice still belong in plain prose (Markdown). It helps the next agent understand the why, not just the what.

Use both. (Medium)
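Here is one way to combine both, sketched under the assumption of two hypothetical files. `state.json` is overwritten with the current structured state each session. `notes.md` is an append-only Markdown log of the reasoning behind each session's changes.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def record_session(root: Path, state: dict, note: str) -> None:
    """Write machine-readable state and append a human-readable note.

    state.json holds the "what" (replaced each session); notes.md keeps
    the "why" as an append-only log. Both file names are assumptions.
    """
    (root / "state.json").write_text(json.dumps(state, indent=2, sort_keys=True) + "\n")
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    with (root / "notes.md").open("a", encoding="utf-8") as f:
        f.write(f"\n## Session {stamp}\n\n{note.strip()}\n")
```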


③ Use Version Control to Track State Changes (Git as State Tracking)

Instead of asking the model to reconstruct history from context fragments, lean on Git for:

✔ Checkpoints
✔ Diff (difference comparison)
✔ Rollback
✔ Blame (responsibility tracking)

And file-level state is cheaper than padding prompts with more tokens. Anthropic's agent docs call out version control as one of the things that keeps long-running agents reliable. (Anthropic)
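Here is a minimal checkpoint helper that shells out to the `git` CLI through `subprocess`. It assumes the workspace is already a Git repository with a configured user identity. Skipping clean trees avoids empty commits. `git diff`, `git revert`, and `git blame` then work on these checkpoints as usual.

```python
import subprocess
from pathlib import Path
from typing import Optional


def git(repo: Path, *args: str) -> str:
    """Run a git command inside `repo` and return stdout (raises on failure)."""
    result = subprocess.run(
        ["git", *args], cwd=repo, check=True, capture_output=True, text=True
    )
    return result.stdout


def checkpoint(repo: Path, message: str) -> Optional[str]:
    """Commit all current changes as a checkpoint and return the commit hash.

    Returns None when there is nothing to commit, so sessions that
    changed nothing don't create empty commits.
    """
    git(repo, "add", "-A")
    if not git(repo, "status", "--porcelain"):
        return None
    git(repo, "commit", "-m", message)
    return git(repo, "rev-parse", "HEAD").strip()
```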


④ Emphasize Incremental Progress

Don't expect an agent to finish a long task in one shot. Always:

  1. Make small changes
  2. Verify
  3. Write state
  4. Prepare for next session

Incremental progress is the core principle for reliable multi-window workflows. (Anthropic)
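The four-step loop can be sketched as a session runner that applies a bounded number of small steps. It verifies each step and records it only after verification passes. `Step`, `Progress`, and the `budget` parameter are hypothetical names. Persisting `Progress` to disk between sessions is left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Step:
    name: str
    apply: Callable[[], None]   # make one small change
    verify: Callable[[], bool]  # e.g. run the relevant tests


@dataclass
class Progress:
    done: list[str] = field(default_factory=list)
    failed: Optional[str] = None


def run_session(steps: list[Step], progress: Progress, budget: int) -> Progress:
    """Apply at most `budget` pending steps, verifying each before recording it.

    Stops at the first failed verification, so the next session starts from
    a known state instead of stacking changes on a broken one.
    """
    pending = [s for s in steps if s.name not in progress.done]
    for step in pending[:budget]:
        step.apply()
        if not step.verify():
            progress.failed = step.name
            break
        progress.done.append(step.name)  # write state only after verification
    return progress
```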


5. Memory Engineering in Multi-Agent Systems

A recent piece on multi-agent memory engineering puts it bluntly:

Multi-agent system failures are often not about communication problems, but about memory/state management failures. Agents that cannot reliably share consistent state will lead to duplicate work, version conflicts, and low efficiency. (MongoDB)

To address this, they propose five key memory engineering principles for developing multi-agent systems:

  1. Persistent storage and state management
  2. Retrieval intelligence
  3. Atomic consistency updates
  4. Conflict resolution
  5. Cost-performance optimization
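Principles 3 and 4 (atomic consistency updates and conflict resolution) can be illustrated with optimistic concurrency on a shared JSON file. Every write carries the version it was based on, and stale writes are rejected. The new file is swapped in with `os.replace`, so readers never see a half-written file. This is one possible technique chosen for illustration, not the implementation from the MongoDB article.

```python
import json
import os
import tempfile
from pathlib import Path


class ConflictError(Exception):
    """Raised when another agent updated the state since we read it."""


def read_state(path: Path) -> dict:
    return json.loads(path.read_text())


def write_state(path: Path, new_state: dict, expected_version: int) -> dict:
    """Optimistic-concurrency write: succeed only if nobody else wrote first.

    The version check and the rename are two separate steps, so this is a
    single-host sketch; a shared database would do the compare-and-set itself.
    """
    current = read_state(path)
    if current["version"] != expected_version:
        raise ConflictError(
            f"expected version {expected_version}, found {current['version']}"
        )
    new_state = {**new_state, "version": expected_version + 1}
    # Write to a temp file in the same directory, then atomically replace.
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(new_state, f)
    os.replace(tmp, path)
    return new_state
```

An agent whose write is rejected would re-read the state, merge or redo its change, and write again against the new version.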

These overlap with the structured-state, Git, and incremental-progress points above.


Multi-context window workflows aren't a prompt-tuning problem. They need:

✔ Structured state
✔ Sustainably rebuildable artifacts (memory files, test results, git commits)
✔ Clear incremental workflow
✔ Tool chain (filesystem, shell, git, testing) integrated with models
✔ Shared memory / coordination patterns (for multi-agent)

These practices show up across papers, industry write-ups, and frameworks like the Claude Agent SDK, LangChain, and AutoGen.

As MCP and context orchestration mature, truly executable, interruptible, restartable agent workflows will become standard engineering practice. That beats leaning on ever-bigger context windows and re-sending the same history in every prompt.


References

  1. Claude Agent SDK Overview: official Anthropic Agent SDK documentation. (Claude)
  2. Effective context engineering for AI agents: Anthropic's design philosophy on context management and context awareness. (Anthropic)
  3. The Ultimate Guide to LLM Memory: conceptual memory best practices. (Medium)
  4. Ultimate LLM Agent Build Guide: multi-agent workflow and context engineering recommendations. (Vellum AI)
  5. Effective harnesses for long-running agents: Anthropic's experience with cross-context session workflows. (Anthropic)
  6. How and when to build multi-agent systems: LangChain on multi-agent workflows and context retrieval. (LangChain Blog)
  7. Memory engineering for multi-agent systems: a deep dive into multi-agent memory architecture. (MongoDB)
  8. Model Context Protocol (MCP): an interaction standard that uses JSON-RPC for state and context exchange. (Wikipedia)