Design a Context Management System for LLM Agents — Practice

Design a context management system for an LLM-powered agent that needs to operate within a limited context window while drawing on a large and diverse set of information sources: system prompts, user conversation history, retrieved documents, tool outputs, agent scratchpad notes, and task instructions. The system must decide what to include, what to compress, and what to exclude — dynamically, for each interaction.

Constraints

LLM context window: 100K tokens (model performance degrades for information placed in the middle of the window)
System prompt: 2K tokens (fixed, always included)
Conversation history: can grow to 200K+ tokens in long sessions
Retrieved documents: 10 to 50 documents per query, averaging 1K tokens each
Tool outputs: can range from 50 tokens to 10K tokens per call
Target: assemble context in under 200ms per agent step
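
To make the scale of the problem concrete, the arithmetic implied by the constraints above can be sketched as follows. The numbers come straight from the list; the dictionary keys and the function name are illustrative assumptions, and the history figure is a floor because the constraint reads "200K+".

```python
# Worst-case token demand per source, taken from the constraints above.
WINDOW = 100_000
SYSTEM_PROMPT = 2_000  # fixed, always included

# (low, high) token demand per source. Key names are illustrative only.
demand = {
    "conversation_history": (0, 200_000),            # "200K+", so the true upper bound is open-ended
    "retrieved_documents": (10 * 1_000, 50 * 1_000),  # 10 to 50 docs at ~1K tokens each
    "single_tool_output": (50, 10_000),               # one tool call; several calls multiply this
}

def worst_case_total(demand, fixed):
    """Sum the upper bound of every source plus the fixed system prompt."""
    return fixed + sum(high for _, high in demand.values())

total = worst_case_total(demand, SYSTEM_PROMPT)
overflow_ratio = total / WINDOW
print(total, round(overflow_ratio, 2))
```

Even with a single tool call, the worst case asks for well over twice the window, which is why the design has to select and compress rather than simply concatenate.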

Design Requirements

Design the context budget allocation — how much of the window goes to each source?
Design the dynamic context assembly pipeline that runs before each LLM call.
Explain your strategy for compressing conversation history without losing critical information.
Design the tool output integration — how to decide which tool outputs are relevant enough to include.
Address what happens when the context overflows — graceful degradation strategies.
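
As one possible starting point for the assembly and overflow requirements above, here is a minimal sketch of a greedy, priority-ordered assembler. It is not a reference answer. It assumes token counts and summary sizes are computed elsewhere, and the `ContextItem` type, the priority scale, and the item names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ContextItem:
    name: str
    tokens: int                               # precomputed token count (assumption)
    priority: int                             # higher = more important (hypothetical scale)
    compressed_tokens: Optional[int] = None   # size of a summary, if one exists

def assemble(items, budget):
    """Greedy assembly: include full text if it fits, else a summary, else drop."""
    plan, remaining = [], budget
    for item in sorted(items, key=lambda i: i.priority, reverse=True):
        if item.tokens <= remaining:
            plan.append((item.name, "full"))
            remaining -= item.tokens
        elif item.compressed_tokens is not None and item.compressed_tokens <= remaining:
            plan.append((item.name, "compressed"))
            remaining -= item.compressed_tokens
        else:
            plan.append((item.name, "dropped"))
    return plan, remaining

# Small budget chosen so that all three outcomes appear.
items = [
    ContextItem("system_prompt", 2_000, priority=100),
    ContextItem("task_instructions", 500, priority=90),
    ContextItem("recent_turns", 3_000, priority=80),
    ContextItem("tool_output", 10_000, priority=60, compressed_tokens=2_000),
    ContextItem("old_history", 20_000, priority=50, compressed_tokens=1_500),
    ContextItem("doc_a", 1_000, priority=40),
    ContextItem("doc_b", 1_000, priority=30),
]
plan, remaining = assemble(items, budget=10_000)
for name, decision in plan:
    print(name, decision)
```

A greedy pass like this is cheap enough to fit a tight latency target, but it ignores placement within the window. A fuller design would also decide where each included item goes, given the degradation in the middle of the window noted in the constraints.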