
Context Window as Working Memory

Part of ADHD Patterns

The constraint that defines AI context windows is the same constraint that defines ADHD working memory--limited capacity that must be actively managed--and the solutions that work for AI systems are the solutions ADHD brains have been inventing for decades.

The Constraint That Unites

When I first understood how AI context windows work, I felt a shock of recognition. This wasn't new information--it was familiar information in a new domain.

Context windows are working memory.

The constraint that defines AI system architecture--limited capacity that must be actively managed--is the same constraint that defines ADHD experience. The solutions that work for AI systems are the solutions I'd been inventing for myself for decades.

This isn't metaphor. It's structural isomorphism. The parallel runs deep enough to be genuinely useful.


The ADHD Working Memory Experience

For those without ADHD, working memory often feels transparent. You think about something, it stays available, you use it. The system works invisibly.

For ADHD brains, working memory is conspicuously limited:

Capacity constraints: Hold fewer items simultaneously. Complex multi-part instructions overflow.

Decay rate: Information fades faster. Turn away from the task for a moment, lose the thread.

Interference sensitivity: New information pushes out old. Someone interrupts, and the previous context evaporates.

Load variability: Capacity fluctuates with state. Tired or stressed, even less is available.

These aren't character flaws. They're architectural characteristics. Understanding them as constraints--like engineering constraints--enables systematic compensation.


The AI Context Window Parallel

AI context windows exhibit the same structural properties:

Capacity constraints: Fixed token limits. Long conversations or complex tasks overflow.

Attention decay: Information in the middle of long contexts gets less attention. The "lost in the middle" phenomenon.

Interference patterns: New content can dilute or override earlier content. Recent tokens dominate.

Variable effective capacity: How much of the window is genuinely usable varies with the content filling it.

The parallel isn't perfect, but the structure matches. Both systems face limited capacity for active processing. Both must manage what occupies that capacity. Both benefit from similar compensations.
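The shared structure can be made concrete with a toy model (the class, names, and capacity here are illustrative inventions, not any real system): a fixed-capacity buffer where attending to something new silently evicts the oldest item, reproducing both the overflow and the interference described above.

```python
from collections import deque


class BoundedWorkingMemory:
    """Toy model of a fixed-capacity working buffer.

    New items push out the oldest (interference); whatever is
    evicted is simply gone (loss on overflow).
    """

    def __init__(self, capacity: int):
        # deque with maxlen discards from the opposite end on append
        self.items: deque[str] = deque(maxlen=capacity)

    def attend(self, item: str) -> None:
        self.items.append(item)  # may silently evict the oldest item

    def recall(self) -> list[str]:
        return list(self.items)


memory = BoundedWorkingMemory(capacity=3)
for task in ["reply to email", "buy milk", "call dentist", "submit report"]:
    memory.attend(task)

# Four items attended, capacity three: the first is gone.
print(memory.recall())  # ['buy milk', 'call dentist', 'submit report']
```

The point of the sketch is that nothing "goes wrong" inside the buffer; the loss is a direct consequence of the capacity, which is exactly the reframe the rest of this chapter builds on.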


Shared Failure Modes

Chapter 04 catalogued nine failure modes for AI agents. Each one maps to ADHD experience; here are six of the clearest parallels:

Context overflow: Trying to hold too much leads to losing track of everything. ADHD: The feeling when someone gives you five things to remember and you forget all of them.

Lost in the middle: Information present but not attended to. ADHD: The instruction that was definitely given but somehow didn't register.

Instruction decay: Original directions fade as recent activity dominates. ADHD: Starting with clear intentions that blur into "wait, what was I doing?"

Loop traps: Repeating ineffective approaches without trying alternatives. ADHD: The hour spent trying the same thing that isn't working.

Goal drift: Losing sight of the original objective while pursuing tangents. ADHD: The classic "I came into this room for... something."

Recovery failures: Struggling to restart after interruption. ADHD: The lost momentum after any break.

The experience is identical. The mechanisms are parallel. The solutions apply in both directions.
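Some of these failure modes are detectable mechanically. As a toy illustration of one (a hypothetical sketch, not drawn from Chapter 04), a loop trap can be flagged by counting how often the same action repeats in a recent window:

```python
from collections import Counter


def detect_loop_trap(recent_actions: list[str], threshold: int = 3) -> bool:
    """Return True when any single action repeats `threshold` or more
    times, i.e. the same ineffective approach keeps being retried."""
    counts = Counter(recent_actions)
    return max(counts.values(), default=0) >= threshold


print(detect_loop_trap(["retry build", "retry build", "retry build"]))  # True
print(detect_loop_trap(["plan", "draft", "revise"]))                    # False
```

The same check works for a human log as for an agent trace: an hour of identical entries is the signal to stop and try something different.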


Why the Parallel Matters

Understanding ADHD as a context window constraint changes how you approach it:

From deficit to architecture: Not "something wrong with me" but "this is how this system works."

From shame to engineering: Not "why can't I just remember" but "how do I design around this constraint."

From struggle to strategy: Not fighting the architecture but building complementary systems.

This reframe isn't denial. The constraint is real. Life is harder. But seeing it as a design problem makes solutions tractable.


Lessons from AI to ADHD

AI system design offers strategies for ADHD compensation:

Aggressive summarization: Don't try to hold raw details. Compress to essentials. Summary notes instead of complete transcripts.

Strategic positioning: Put critical information where it can't be missed. Visual reminders, persistent lists, environmental cues.

Pinned context: Some information should always be present. Core priorities visible at all times.

Computed context: Don't maintain running context. Reconstruct as needed from external systems.

Checkpoint and restore: Save state frequently. Enable recovery after interruptions.

These aren't just metaphors. They're directly applicable techniques.
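The five compensations above can be sketched together in one small class. Everything here is illustrative, assumed for the example rather than taken from any real agent framework: pinned items are always included, older detail is folded into a running summary when the budget is exceeded, the assembled context puts critical items first, and state is checkpointed to disk so it survives interruption.

```python
import json


class ContextManager:
    """Sketch of context-window compensations: pinning, aggressive
    summarization, strategic positioning, checkpoint and restore."""

    def __init__(self, budget: int):
        self.budget = budget          # max items in the assembled context
        self.pinned: list[str] = []   # core priorities, always present
        self.summary: str = ""        # compressed stand-in for old detail
        self.recent: list[str] = []   # raw recent items

    def pin(self, item: str) -> None:
        self.pinned.append(item)

    def add(self, item: str) -> None:
        self.recent.append(item)
        # Aggressive summarization: when over budget, fold the oldest
        # raw item into the summary instead of keeping it verbatim.
        # (The +1 reserves a slot for the summary line itself.)
        while self.recent and len(self.pinned) + len(self.recent) + 1 > self.budget:
            oldest = self.recent.pop(0)
            self.summary = f"{self.summary}; {oldest}" if self.summary else oldest

    def assemble(self) -> list[str]:
        # Strategic positioning: pinned items first, compressed
        # summary next, raw recent detail last.
        parts = list(self.pinned)
        if self.summary:
            parts.append(f"summary: {self.summary}")
        return parts + self.recent

    def checkpoint(self, path: str) -> None:
        # Checkpoint and restore: save state so work survives interruption.
        with open(path, "w") as f:
            json.dump({"pinned": self.pinned, "summary": self.summary,
                       "recent": self.recent}, f)

    def restore(self, path: str) -> None:
        with open(path) as f:
            state = json.load(f)
        self.pinned = state["pinned"]
        self.summary = state["summary"]
        self.recent = state["recent"]


ctx = ContextManager(budget=4)
ctx.pin("goal: finish chapter draft")
for note in ["outline intro", "draft section 1",
             "revise section 1", "draft section 2"]:
    ctx.add(note)
print(ctx.assemble())
```

The ADHD translation is direct: the pinned list is the sticky note that never leaves the monitor, the summary is the compressed notes that replace a full transcript, and the checkpoint file is whatever external system lets you pick up exactly where you left off.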


Lessons from ADHD to AI

The traffic runs both ways. ADHD adaptations inform AI system design:

Environmental scaffolding: Structured environments reduce cognitive load. Structured prompts reduce agent confusion.

Habit automation: Routines that run without conscious attention. Agent behaviors that don't require explicit instruction each time.

External memory systems: Notes, calendars, knowledge bases. The memory and artifact layers in AI architecture.

Transition rituals: Explicit practices for context switches. Structured handoffs between agents.

Accountability partners: External observers who provide reality checks. Evaluation and monitoring systems for agents.

People who've adapted to ADHD have been solving these problems for life. The solutions generalize.


The Validation of the Parallel

There's something validating in seeing multi-billion-dollar AI systems face the same constraints I face.

It's not that ADHD is good or that struggle is imaginary. The constraint is real and costly.

But the constraint isn't unique or anomalous. It's a fundamental property of systems operating with limited working state. The most advanced AI systems face it. The solutions are engineering solutions, not moral improvements.

If I'd been told at 15 that the fundamental architecture of my brain worked the same way trillion-parameter language models would one day work--that my challenges were structural, with compensations that could be engineered--it might have changed how I understood myself.


Moving Forward

Understanding the parallel changes the question.

Old question: Why can't I just remember things / pay attention / stay focused like everyone else?

New question: Given working memory constraints, what infrastructure compensates effectively?

The new question has answers. Good answers. Systematic answers that compound over time.

The next chapters explore those answers: external memory as compensation, pattern recognition as strength, hyperfocus as optimal batch processing. The parallel continues through each.