Beyond the Context Window
The context window is working memory. Everything the model uses for a single inference pass lives there. But real tasks span multiple inference passes, multiple sessions, days or weeks of work.
Memory systems extend agent capabilities beyond immediate context. They store information that might be relevant later, retrieve it when needed, and evolve as the agent learns. Without memory, every session starts from zero.
This isn't about storing everything forever. It's about building systems that know what to remember, how to find it, and when to forget it.
The Four-Layer Architecture Revisited
Chapter 04 introduced the four-layer memory architecture. Let's dig deeper into each layer:
Layer 1: Working Context
The tokens actually sent to the model. This layer is:
- Assembled fresh for each inference
- Optimized for the immediate task
- Subject to strict capacity limits
The critical insight: working context is computed, not accumulated. You don't grow it by appending--you compile it by selecting what's relevant now.
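The compile-don't-accumulate idea can be sketched as a selection step under a token budget. This is a minimal illustration, not a prescribed implementation: the relevance scores, the 4-characters-per-token estimate, and the greedy packing are all assumptions.

```python
# Sketch of context compilation: select the most relevant items that fit
# a token budget, rather than appending history indefinitely.

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token.
    return max(1, len(text) // 4)

def compile_context(candidates: list[dict], budget: int) -> list[dict]:
    """Greedily pack the highest-relevance items into the token budget."""
    chosen, used = [], 0
    for item in sorted(candidates, key=lambda c: c["relevance"], reverse=True):
        cost = estimate_tokens(item["text"])
        if used + cost <= budget:
            chosen.append(item)
            used += cost
    return chosen

candidates = [
    {"text": "User prefers concise answers.", "relevance": 0.9},
    {"text": "Full transcript of last week's session...", "relevance": 0.2},
    {"text": "Current task: refactor the auth module.", "relevance": 0.95},
]
print([c["text"] for c in compile_context(candidates, budget=20)])
```

Note that the old transcript is simply not selected; nothing is truncated mid-item, and the same candidate pool can be recompiled with a different budget for a different model.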
Layer 2: Sessions
Structured logs of everything that happened in the current interaction. User messages, agent responses, tool calls, results, errors. Sessions provide:
- Complete audit trail
- Ability to resume if interrupted
- Raw material for summarization
Sessions are model-agnostic. The same session can drive different models on different days. This portability matters for maintenance and experimentation.
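A session can be as simple as an append-only list of typed events. The sketch below is illustrative: the event kinds, field names, and JSONL serialization are assumptions, not a fixed schema, but nothing in it is tied to a particular model.

```python
# A minimal session log: an append-only list of typed events. Nothing
# here is model-specific, so the same log can be replayed against any model.
import json
import time

class Session:
    def __init__(self, session_id: str):
        self.session_id = session_id
        self.events: list[dict] = []

    def append(self, kind: str, payload: dict) -> None:
        # Events are only ever appended, never mutated in place.
        self.events.append({"kind": kind, "ts": time.time(), "payload": payload})

    def to_jsonl(self) -> str:
        # One JSON object per line: easy to stream, diff, and resume from.
        return "\n".join(json.dumps(e) for e in self.events)

s = Session("demo-001")
s.append("user_message", {"text": "Refactor the auth module."})
s.append("tool_call", {"tool": "read_file", "args": {"path": "auth.py"}})
s.append("tool_result", {"ok": True, "bytes": 2048})
print(len(s.events))
```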
Layer 3: Memory
Searchable knowledge that persists across sessions. This includes:
- Insights extracted from past interactions
- Learned patterns and strategies
- User preferences and constraints
Memory is retrieved, not loaded. You don't copy it into context by default--you query it when relevant topics arise.
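Retrieval-on-demand can be shown with even a trivial keyword index: nothing enters the working context until the current turn raises a matching topic. The memory contents and matching rule here are purely illustrative.

```python
# Memory is queried on demand, not preloaded. A minimal keyword lookup:
# retrieval runs only when the current turn mentions a matching topic.

MEMORIES = [
    "User prefers tabs over spaces.",
    "The staging database is refreshed nightly.",
    "Deploys to production require two approvals.",
]

def query_memory(topic: str, memories: list[str]) -> list[str]:
    """Return memories mentioning the topic; empty if nothing matches."""
    needle = topic.lower()
    return [m for m in memories if needle in m.lower()]

# Nothing is loaded until "deploy" actually comes up in conversation:
print(query_memory("deploy", MEMORIES))
```

Real systems replace the substring match with the search strategies discussed below, but the control flow is the same: query at the moment of relevance, not at session start.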
Layer 4: Artifacts
Large objects stored by reference. Codebases, documents, database contents. These are:
- Too large to fit in context
- Accessed through pointers and queries
- Often shared across multiple agents
Artifacts are the external world the agent acts upon.
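By-reference access can be sketched with a small handle type: the working context holds only the handle, and the agent fetches the slice it needs. The in-memory `STORE` dict stands in for a real backend (object store, database, filesystem), and the field names are assumptions.

```python
# Sketch of by-reference artifact access: context holds a small handle,
# and the agent fetches only the window it needs, never the whole object.
from dataclasses import dataclass

# Stand-in for a real backend; a ~10KB "file" that should never be inlined.
STORE = {"repo/auth.py": "def login(user, password):\n    ..." + "x" * 10_000}

@dataclass(frozen=True)
class ArtifactRef:
    key: str
    size: int  # known up front, so the agent can plan without fetching

def make_ref(key: str) -> ArtifactRef:
    return ArtifactRef(key=key, size=len(STORE[key]))

def read_slice(ref: ArtifactRef, start: int, length: int) -> str:
    """Fetch only a window of the artifact."""
    return STORE[ref.key][start:start + length]

ref = make_ref("repo/auth.py")
print(ref.size, read_slice(ref, 0, 9))
```

Because the handle is tiny and immutable, it is also cheap to share across multiple agents, which is exactly the access pattern artifacts need.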
Session Design Patterns
Sessions are the primary unit of interaction state. How you design them affects everything downstream.
Event sourcing: Store every state change as an event. The current state is derived by replaying events. This provides complete history and enables time-travel debugging.
Checkpoint + delta: Store periodic full snapshots plus changes since the last snapshot. Faster state reconstruction but incomplete history.
Summary + recent: Keep compressed summaries of older content plus full detail for recent turns. Balances context preservation with capacity constraints.
Branching: Support multiple paths from the same point. User says "let's try approach B instead"--branch from where we diverged rather than starting over.
The choice depends on your use case. Long-running coding assistants benefit from event sourcing--you want to trace exactly how the code evolved. Quick Q&A interactions might need only summary + recent.
Memory Search Strategies
When does past experience become relevant? Memory search determines which stored information surfaces.
Semantic search: Embed the current context and find similar past content. Works for conceptual relevance but can miss exact keyword matches.
Keyword search: Full-text search for specific terms. Finds exact mentions but misses conceptually related content.
Hybrid search: Combine semantic and keyword scores. More robust coverage but more complex to tune.
Recency weighting: More recent memories get priority. Useful when recent context is more likely to be relevant.
Graph traversal: Follow relationships between entities. "What do we know about X's dependencies?"
The best systems combine strategies. Semantic search finds the neighborhood; keyword search pinpoints within it; recency breaks ties; graph traversal expands context.
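One way to blend these signals is a weighted score with a recency factor as a soft tie-breaker. Everything here is an illustrative assumption: the 0.6/0.4 weights, the 30-day half-life, and the `semantic` scores standing in for embedding similarity would all be tuned per application.

```python
# Hybrid scoring sketch: blend normalized semantic and keyword scores,
# then scale slightly by recency so fresher memories break near-ties.

def hybrid_score(semantic: float, keyword: float, age_days: float,
                 w_sem: float = 0.6, w_kw: float = 0.4,
                 half_life_days: float = 30.0) -> float:
    recency = 0.5 ** (age_days / half_life_days)  # halves every 30 days
    # Recency modulates the blend by at most 20% rather than dominating it.
    return (w_sem * semantic + w_kw * keyword) * (0.8 + 0.2 * recency)

results = [
    {"id": "m1", "semantic": 0.9, "keyword": 0.0, "age_days": 2},
    {"id": "m2", "semantic": 0.4, "keyword": 1.0, "age_days": 90},
    {"id": "m3", "semantic": 0.7, "keyword": 0.5, "age_days": 10},
]
ranked = sorted(
    results,
    key=lambda r: hybrid_score(r["semantic"], r["keyword"], r["age_days"]),
    reverse=True,
)
print([r["id"] for r in ranked])
```

Note how the blend behaves: m3 wins on balanced signals, and recent-but-keyword-free m1 edges out stale-but-exact m2.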
What to Remember
Not everything is worth storing. Indiscriminate retention lets noise overwhelm signal. The art is recognizing what matters.
Worth remembering:
- User preferences that apply across sessions
- Domain knowledge learned through interaction
- Strategies that worked for similar problems
- Errors and their root causes
Not worth remembering:
- Exact conversation transcripts (summarize instead)
- Temporary context that won't recur
- Information easily re-retrieved from tools
- Failed approaches without generalizable lessons
The filter matters. A memory system that stores everything becomes a search problem. A memory system that stores nothing fails to learn. The goal is selective retention of generalizable knowledge.
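The filter can start as a simple admission predicate. The heuristics below are illustrative assumptions (real systems often delegate this judgment to a model), but they encode the lists above: reject what is re-retrievable or session-scoped, keep what generalizes.

```python
# Admission filter sketch: a candidate memory must not be trivially
# re-derivable, must not be session-scoped, and must generalize.

def worth_remembering(candidate: dict) -> bool:
    if candidate.get("re_retrievable"):      # e.g. available via a tool call
        return False
    if candidate.get("scope") == "session":  # temporary context, won't recur
        return False
    return candidate.get("generalizable", False)

candidates = [
    {"text": "User always wants type hints.", "scope": "global",
     "generalizable": True},
    {"text": "Today's branch name is fix-123.", "scope": "session"},
    {"text": "Current weather in Berlin.", "re_retrievable": True},
]
kept = [c["text"] for c in candidates if worth_remembering(c)]
print(kept)
```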
State Synchronization
When multiple agents share state, synchronization becomes necessary. Without it, agents act on stale information or conflict with each other.
Lock-based: Agents acquire locks before modifying shared state. Prevents conflicts but creates bottlenecks.
Optimistic concurrency: Agents proceed without locks but check for conflicts before committing. More throughput but requires conflict resolution.
Event-driven: Agents publish changes as events. Others subscribe and update their views. Eventual consistency rather than immediate.
Leader-follower: One agent owns state; others read but request changes through the leader. Simplifies coordination at cost of bottleneck.
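Of these, optimistic concurrency is the easiest to sketch: each write carries the version it read, and the store rejects the write if the version has moved on. The class and field names are illustrative.

```python
# Optimistic concurrency in miniature: writes carry the version they read;
# a stale version means another agent committed first, and the caller
# must re-read and retry.

class VersionedStore:
    def __init__(self):
        self.value = None
        self.version = 0

    def read(self):
        return self.value, self.version

    def commit(self, new_value, expected_version: int) -> bool:
        if expected_version != self.version:
            return False  # conflict detected at commit time, not via a lock
        self.value = new_value
        self.version += 1
        return True

store = VersionedStore()
_, v = store.read()
print(store.commit("agent-a-result", v))   # first writer succeeds
print(store.commit("agent-b-result", v))   # second writer's version is stale
```

No agent ever blocks waiting for a lock; the cost is that the losing agent must handle the failed commit, typically by re-reading and merging.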
No approach is universally best. The right choice depends on access patterns, conflict frequency, and tolerance for staleness.
Forgetting
Memory systems that never forget eventually fail. Storage fills up. Search quality degrades as noise accumulates. Old information contradicts current reality.
Intentional forgetting is a feature:
Expiration: Information becomes stale after a period. Remove or archive it automatically.
Relevance decay: Information accessed frequently stays; neglected information fades.
Contradiction resolution: When new information conflicts with old, update the old rather than storing both.
Consolidation: Merge multiple similar memories into one generalized representation.
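Expiration and relevance decay compose naturally into a periodic sweep: score each memory by how recently and how often it was accessed, and archive what falls below a threshold. The half-life, threshold, and scoring formula are illustrative knobs, not a recommendation.

```python
# Forgetting sweep sketch: score = recency decay x (log-damped) access
# count; memories below the threshold are archived rather than deleted.
import math

def retention_score(last_access_days: float, access_count: int,
                    half_life_days: float = 30.0) -> float:
    decay = 0.5 ** (last_access_days / half_life_days)
    return decay * math.log1p(access_count)

def sweep(memories: list[dict], threshold: float = 0.1):
    keep, archive = [], []
    for m in memories:
        score = retention_score(m["last_access_days"], m["access_count"])
        (keep if score >= threshold else archive).append(m["id"])
    return keep, archive

memories = [
    {"id": "hot",  "last_access_days": 1,   "access_count": 12},
    {"id": "cold", "last_access_days": 180, "access_count": 1},
]
print(sweep(memories))
```

Archiving rather than deleting keeps the option of contradiction resolution later: an old memory that conflicts with new information can still be inspected, then updated or dropped.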
The brain doesn't store everything forever. Neither should memory systems. The goal is maintaining an accurate, useful model of what matters--which requires letting go of what doesn't.
The ADHD Parallel Deepened
Memory management for AI agents mirrors compensation strategies for ADHD brains:
External state representation: Writing things down because working memory can't hold them. Agents use artifact storage for the same reason.
Retrieval cues: Environmental triggers that bring relevant information to mind. Semantic search serves the same function.
Forgetting the unimportant: Not a bug--necessary for managing capacity. Both systems need to filter rather than accumulate.
Structured storage: Organizing information so it can be found later. File systems, knowledge graphs, note-taking systems all serve this function.
The difference: ADHD brains evolved these compensations through trial and error over years. AI systems can implement them by design from the start.
Implementation Considerations
Building memory systems involves practical trade-offs:
Storage backend: Databases, vector stores, file systems, or hybrid approaches. Each has performance characteristics for different access patterns.
Indexing cost: Rich indexing enables fast retrieval but adds overhead on writes. Balance based on read/write ratio.
Consistency model: Strong consistency simplifies reasoning but limits scale. Eventual consistency scales but requires careful design.
Privacy and security: Memories may contain sensitive information. Access control, encryption, and retention policies matter.
Observability: You need to understand what's in memory and how it's being used. Dashboards, search interfaces, and audit logs help.
Memory systems are infrastructure. They require the same care as any production system: monitoring, backup, capacity planning, performance tuning.
Related: D2 explores this theme from the ADHD perspective. A3 covers how tools interact with memory systems. B6 discusses knowledge graph evolution.