Designing Episodic Memory for Agents
Episodic memory is the agent's memory of specific events. The conversation last Tuesday. The decision made in March. The document drafted three weeks ago. Distinct from semantic memory (general facts about the user) and from working memory (current context).
Episodic memory is hard. The naive approach ("store every conversation; retrieve when similar") produces noisy retrieval. The agent confuses what it knows now with what it knew then, and stale past state contaminates current responses.
This skill covers the patterns that make episodic memory useful rather than noisy.
What's an Episode
An episode is a coherent unit of past activity:
- A complete conversation turn or session.
- A specific user decision.
- A discrete project milestone.
- A meaningful agent action with its result.
Episodes have:
- Timestamp. When did this happen?
- Participants. Who was involved?
- Content. What happened?
- Outcome. What was the result?
- Tags. Categories for easier retrieval.
Not everything in a conversation becomes an episode; the agent should be selective. Storing every turn produces a large, noisy archive that is hard to retrieve from; storing only the moments worth remembering produces useful recall.
Event Extraction
When does an event get extracted into episodic memory?
Manual Triggers
The user explicitly asks to remember:
- "Note that we decided to use PostgreSQL."
- "Remember that I prefer Tuesday meetings."
Easiest case. The user signals what's important; the agent stores it.
LLM-Mediated Extraction
After each session (or each significant turn), an LLM evaluates: is anything in this conversation worth remembering as an event?
```
"Read the following conversation. Extract any events worth
remembering: decisions, preferences expressed, milestones reached,
specific information the user shared. Return as a list of events."
```
The output is structured episodes ready to store.
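A minimal sketch of the extraction step, assuming the model client is passed in as a plain function `complete(prompt) -> str` (a hypothetical stand-in for a real client) and that the prompt is extended to ask for JSON so the result can be parsed. Malformed output yields no episodes rather than partial garbage.

```python
import json
from typing import Callable

EXTRACTION_PROMPT = (
    "Read the following conversation. Extract any events worth\n"
    "remembering: decisions, preferences expressed, milestones reached,\n"
    "specific information the user shared. Return as a list of events.\n"
    # Assumption: request machine-readable output so it can be parsed.
    'Respond only with a JSON array of objects with string keys "type" and "content".\n\n'
    "Conversation:\n"
)


def extract_episodes(transcript: str, complete: Callable[[str], str]) -> list[dict]:
    """Ask the model for episodes and keep only well-formed entries."""
    raw = complete(EXTRACTION_PROMPT + transcript)
    try:
        events = json.loads(raw)
    except json.JSONDecodeError:
        return []  # unparseable output: store nothing
    if not isinstance(events, list):
        return []
    return [
        e for e in events
        if isinstance(e, dict)
        and isinstance(e.get("type"), str)
        and isinstance(e.get("content"), str)
    ]


def fake_complete(prompt: str) -> str:
    """Stub model for this demo only; it ignores the prompt."""
    return '[{"type": "decision", "content": "User chose PostgreSQL."}, {"type": 7}]'
```

Rejecting malformed entries keeps a single bad model response from writing junk into long-term memory.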
Heuristic Triggers
Specific patterns trigger storage:
- The user uses past-tense success language ("we shipped X").
- The user states a preference ("I prefer X").
- An agent action completed (a meeting was scheduled, an email was sent).
Heuristics catch the obvious cases; LLM-mediated extraction catches the nuanced ones.
Hybrid is best: heuristics handle the volume; LLM extraction handles the residual.
Storage Format
Two main approaches:
Structured
Store episodes as records in a database with explicit fields:
```
{
"id": "ep-123",
"type": "decision",
"timestamp": "2026-04-15T14:30:00Z",
"participants": ["user", "agent"],
"content": "User decided to use PostgreSQL for the database.",
"outcome": "Decision recorded; will inform future architecture queries.",
"tags": ["database", "decision", "architecture"],
"related_episodes": ["ep-118", "ep-104"]
}
```
Pros:
- Easy to query (filter by type, time, tag).
- Structured for downstream use (analytics, exports).
- The agent can reason about specific fields.
Cons:
- Requires schema definition; new event types require schema changes.
- Misses nuanced detail that doesn't fit the schema.
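To show the kind of structural query this approach makes easy, here is a sketch using Python's built-in `sqlite3` with an in-memory database. The two-table schema (tags in their own table) is an assumption; timestamps are stored as ISO 8601 UTC strings in one fixed format, so plain string comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Timestamps are ISO 8601 UTC strings in a single fixed format.
conn.execute(
    """CREATE TABLE episodes (
           id TEXT PRIMARY KEY,
           type TEXT NOT NULL,
           timestamp TEXT NOT NULL,
           content TEXT NOT NULL,
           outcome TEXT
       )"""
)
conn.execute("CREATE TABLE episode_tags (episode_id TEXT, tag TEXT)")

conn.executemany(
    "INSERT INTO episodes VALUES (?, ?, ?, ?, ?)",
    [
        ("ep-104", "milestone", "2026-03-02T09:00:00Z", "Schema draft finished.", None),
        ("ep-123", "decision", "2026-04-15T14:30:00Z",
         "User decided to use PostgreSQL for the database.", "Decision recorded."),
    ],
)
conn.executemany(
    "INSERT INTO episode_tags VALUES (?, ?)",
    [("ep-104", "architecture"), ("ep-123", "database"), ("ep-123", "decision")],
)

# "All decisions tagged 'database' in April 2026": filter by type, tag, and time.
rows = conn.execute(
    """SELECT e.id, e.content FROM episodes e
       JOIN episode_tags t ON t.episode_id = e.id
       WHERE e.type = ? AND t.tag = ?
         AND e.timestamp >= ? AND e.timestamp < ?
       ORDER BY e.timestamp""",
    ("decision", "database", "2026-04-01", "2026-05-01"),
).fetchall()
```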
Unstructured Vector
Store episodes as natural-language descriptions, embedded for vector search:
```
"On April 15, 2026, the user decided to use PostgreSQL for their
project's database, citing familiarity and tooling support over
MongoDB."
```
Pros:
- Flexible; any event can be represented.
- Vector retrieval finds semantically related episodes naturally.
Cons:
- Harder to query structurally ("show all decisions in March").
- Retrieval can miss exact matches.
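A minimal sketch of unstructured retrieval follows. The `embed` function is a toy bag-of-words counter standing in for a real embedding model, so it only matches shared words; a real model would also match paraphrases. Ranking by cosine similarity is the part that carries over.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().replace(",", " ").replace(".", " ").split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


episodes = [
    "On April 15, 2026, the user decided to use PostgreSQL for their "
    "project's database, citing familiarity and tooling support over MongoDB.",
    "On March 3, 2026, the user asked for shorter meeting summaries.",
]
index = [(text, embed(text)) for text in episodes]


def search(query: str, k: int = 1) -> list[str]:
    """Return the k episodes most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```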
Hybrid
Most production systems do both: structured fields for the basics plus unstructured prose for the nuance, both indexed for retrieval.
```
{
"id": "ep-123",
"type": "decision",
"timestamp": "2026-04-15T14:30:00Z",
"tags": ["database"],
"summary": "User chose PostgreSQL over MongoDB for the database.",
"details": "After a 30-minute discussion, user decided to go with PostgreSQL...",
"embedding": [0.1, -0.4, ...]
}
```
Filter on the structured fields, run vector search over the embedding of the details, then combine the results.
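The filter-then-rank combination might look like the following sketch. The `StoredEpisode` fields mirror the hybrid record above, the 3-dimensional embeddings are made-up numbers, and `hybrid_search` is a hypothetical helper; many vector databases support metadata filters, so both steps may run in a single query there.

```python
import math
from dataclasses import dataclass


@dataclass
class StoredEpisode:
    id: str
    type: str
    timestamp: str            # ISO 8601, UTC
    tags: list[str]
    summary: str
    embedding: list[float]    # made-up vectors for illustration


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def hybrid_search(store, query_vec, *, type_=None, tag=None, since=None, k=3):
    # Step 1: exact structured filter (cheap).
    candidates = [
        e for e in store
        if (type_ is None or e.type == type_)
        and (tag is None or tag in e.tags)
        and (since is None or e.timestamp >= since)
    ]
    # Step 2: rank the survivors by vector similarity.
    candidates.sort(key=lambda e: cosine(query_vec, e.embedding), reverse=True)
    return candidates[:k]


store = [
    StoredEpisode("ep-123", "decision", "2026-04-15T14:30:00Z", ["database"],
                  "User chose PostgreSQL over MongoDB.", [0.9, 0.1, 0.0]),
    StoredEpisode("ep-130", "preference", "2026-04-20T10:00:00Z", ["style"],
                  "User prefers brief responses.", [0.0, 0.2, 0.9]),
    StoredEpisode("ep-090", "decision", "2026-02-01T08:00:00Z", ["hosting"],
                  "User picked a managed host.", [0.5, 0.5, 0.1]),
]

hits = hybrid_search(store, [1.0, 0.0, 0.0], type_="decision", since="2026-03-01")
```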
Retrieval Patterns
When should episodic memory be queried?
On Topic Reference
The user mentions something that might have a past episode. "Remember when we discussed the database?" — query for past episodes about databases.
On Implicit Reference
Less explicit; the user's question might benefit from past context. "What was that thing we tried?" — search for recent activities.
The agent decides. An LLM-mediated query analyzer can determine "is this query likely to need episodic memory?"
As a Tool
The agent has a search_episodes(query) tool. The agent calls it when it thinks past episodes might help.
Most flexible; most aligned with the agent loop pattern.
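A sketch of exposing `search_episodes(query)` to the agent loop. The tool description is a generic JSON-Schema-style dict, not any particular provider's format, and the keyword-match backend is a placeholder for the real retrieval layer.

```python
import json

# Generic tool description; your model provider's expected format may differ.
SEARCH_EPISODES_TOOL = {
    "name": "search_episodes",
    "description": "Search the user's past episodes (decisions, preferences, milestones).",
    "parameters": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}

EPISODES = [
    "2026-04-15: user decided to use PostgreSQL for the database.",
    "2026-04-20: user prefers brief responses.",
]


def search_episodes(query: str) -> list[str]:
    """Naive keyword match, standing in for the real retrieval backend."""
    terms = query.lower().split()
    return [e for e in EPISODES if any(t in e.lower() for t in terms)]


TOOLS = {"search_episodes": search_episodes}


def handle_tool_call(name: str, arguments_json: str) -> str:
    """Run a tool call the model requested and return the result as JSON text."""
    if name not in TOOLS:
        return json.dumps({"error": f"unknown tool {name}"})
    args = json.loads(arguments_json)
    return json.dumps(TOOLS[name](**args))
```

The model decides when to call the tool; the loop only executes the call and feeds the JSON result back as the next message.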
Avoiding Stale Memory Contamination
The hardest problem in episodic memory: avoiding stale episodes that contaminate current responses.
The user said "I prefer brief responses" three months ago. They've since adjusted their preference. The episodic memory still says "brief." If the agent retrieves and acts on the old episode, it acts wrong.
Mitigations:
Recency Weighting
Recent episodes outweigh old ones. Retrieval scores include a recency factor; older episodes are surfaced only when no recent equivalent exists.
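One way to implement recency weighting, sketched here as an assumption rather than a standard, is to multiply similarity by an exponential decay with a configurable half-life. With a 30-day half-life, a 90-day-old episode keeps one eighth of its score, so a slightly less similar but recent episode outranks it.

```python
from datetime import datetime, timezone


def recency_weighted_score(similarity: float, episode_time: datetime, now: datetime,
                           half_life_days: float = 30.0) -> float:
    """Multiply similarity by 0.5 ** (age / half-life).

    An episode half_life_days old keeps half its score. The 30-day default
    is an arbitrary placeholder.
    """
    age_days = max((now - episode_time).total_seconds() / 86400, 0.0)
    return similarity * 0.5 ** (age_days / half_life_days)


now = datetime(2026, 7, 15, tzinfo=timezone.utc)
old = recency_weighted_score(0.90, datetime(2026, 4, 16, tzinfo=timezone.utc), now)  # 90 days old
new = recency_weighted_score(0.80, datetime(2026, 7, 5, tzinfo=timezone.utc), now)   # 10 days old
```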
Supersession
When a new episode is recorded that contradicts or updates an old one, mark the old one as superseded. Retrieval skips superseded episodes by default.
```
{
"id": "ep-123",
"type": "preference",
"content": "User prefers brief responses.",
"superseded_by": null,
"active": true
}
// Later:
{
"id": "ep-456",
"type": "preference",
"content": "User prefers detailed responses with examples.",
"supersedes": "ep-123",
"active": true
}
// ep-123 is now { ..., active: false, superseded_by: "ep-456" }
```
Confidence
Each episode has a confidence score. Old, unsupported episodes have lower confidence. The agent treats low-confidence memories as suggestions, not facts.
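A sketch of one possible confidence model, with made-up constants: confidence falls linearly with age, rises with each user confirmation, and is clamped to [0, 1]. Below a threshold, the memory is phrased as a suggestion rather than a fact.

```python
def confidence(base: float, age_days: float, confirmations: int,
               decay_per_day: float = 0.005, boost: float = 0.1) -> float:
    """Hypothetical model: linear decay with age, boost per confirmation, clamped."""
    value = base - decay_per_day * age_days + boost * confirmations
    return min(max(value, 0.0), 1.0)


def render_memory(content: str, conf: float, threshold: float = 0.5) -> str:
    """Phrase low-confidence memories as suggestions, not facts."""
    if conf >= threshold:
        return f"Known: {content}"
    return f"Possibly outdated (confidence {conf:.2f}): {content}"
```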
Refresh
Periodically, the agent confirms still-relevant memories with the user. "I noticed you mentioned preferring brief responses a while ago — is that still your preference?"
Annoying if done too often; helpful when done sparingly. Calibrate.
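Calibration can start with a simple rate limit: ask about at most one low-confidence memory, and only if enough time has passed since the last check. The 14-day gap, the 0.5 threshold, and the dict keys below are placeholder assumptions.

```python
from datetime import datetime, timedelta


def pick_refresh_candidate(memories, last_prompt_at, now,
                           min_gap=timedelta(days=14), threshold=0.5):
    """Return at most one low-confidence memory to confirm, or None.

    `memories` is a list of dicts with hypothetical 'content' and 'confidence' keys.
    """
    if last_prompt_at is not None and now - last_prompt_at < min_gap:
        return None  # asked too recently; don't nag
    low = [m for m in memories if m["confidence"] < threshold]
    if not low:
        return None
    return min(low, key=lambda m: m["confidence"])  # least certain first
```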
Decay and Expiration
Some episodes should expire. A discussion about a now-finished project may not be relevant six months later.
Decay strategies:
- Time-based. Episodes older than N days are deprioritized; episodes older than M days (M > N) are deleted.
- Activity-based. Episodes that haven't been retrieved in N weeks are deprioritized.
- Importance-based. High-importance episodes (decisions, milestones) persist longer; routine episodes (small interactions) expire faster.
Don't delete aggressively; user surprise at "why don't you remember that?" is real. Default to long retention; expose deletion to the user.
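The three strategies can be combined into one policy function, sketched below with placeholder thresholds chosen to favor long retention. Important episode types get windows twice as long; the type names, the doubling, and the eight-week activity window are assumptions.

```python
from datetime import datetime, timedelta

# Placeholder thresholds, deliberately generous.
DEPRIORITIZE_AFTER = timedelta(days=90)    # N
DELETE_AFTER = timedelta(days=730)         # M
STALE_RETRIEVAL = timedelta(weeks=8)
IMPORTANT_TYPES = {"decision", "milestone"}


def decay_action(episode_type, created_at, last_retrieved_at, now):
    """Return 'keep', 'deprioritize', or 'delete' for one episode."""
    # Importance-based: important episodes get twice the time-based windows.
    scale = 2 if episode_type in IMPORTANT_TYPES else 1
    age = now - created_at
    # Time-based.
    if age > DELETE_AFTER * scale:
        return "delete"
    if age > DEPRIORITIZE_AFTER * scale:
        return "deprioritize"
    # Activity-based: not retrieved for a while.
    last_used = last_retrieved_at or created_at
    if now - last_used > STALE_RETRIEVAL:
        return "deprioritize"
    return "keep"
```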
User-Visible Memory
For trust, expose episodic memory to the user:
- A "memories" page showing what the agent remembers.
- The ability to delete specific memories.
- The ability to mark memories as outdated.
Users who can see and curate their own memory profile trust the system more. Black-box memory feels surveillant; transparent memory feels useful.
Cross-User Isolation
Each user's episodic memory is isolated. The agent doesn't transfer episodes from User A to User B's context.
For multi-user scenarios (team workspaces), episodes can be shared explicitly. The "team" has its own episodic store; team members access it. Individual user episodes stay private.
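A sketch of scope-based isolation, assuming each episode is stored under exactly one owner scope (`user:<id>` or `team:<id>`, a hypothetical naming scheme). Retrieval only ever reads the caller's own scope plus the scopes of teams they belong to.

```python
from collections import defaultdict


class EpisodeStore:
    """Episodes partitioned by owner scope (illustrative in-memory store)."""

    def __init__(self) -> None:
        self._by_scope: dict[str, list[str]] = defaultdict(list)
        self._team_members: dict[str, set[str]] = {}

    def add_team(self, team_id: str, members: set[str]) -> None:
        self._team_members[team_id] = set(members)

    def add(self, scope: str, episode: str) -> None:
        self._by_scope[scope].append(episode)

    def visible_to(self, user_id: str) -> list[str]:
        """The user's own episodes plus their teams' episodes; nothing else."""
        scopes = [f"user:{user_id}"] + [
            f"team:{team}" for team, members in self._team_members.items()
            if user_id in members
        ]
        return [ep for scope in scopes for ep in self._by_scope.get(scope, [])]
```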
Cost
Episodic memory accumulates. Without limits, storage grows unboundedly.
Budget per user:
- Storage size limit (e.g., 100 MB of episodic data per user).
- Retrieval cost limit (queries per day).
When limits approach, decay or expire. Don't let runaway storage become a financial problem.
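When the storage limit is reached, eviction can follow the decay priorities. This sketch assumes each episode carries hypothetical `size_bytes` and `priority` keys and drops the lowest-priority episodes until the total fits; for the 100 MB example, `max_bytes` would be roughly `100_000_000`.

```python
def enforce_budget(episodes: list[dict], max_bytes: int) -> list[dict]:
    """Drop lowest-priority episodes until the user's total size fits the budget."""
    kept = sorted(episodes, key=lambda e: e["priority"], reverse=True)
    total = sum(e["size_bytes"] for e in kept)
    while kept and total > max_bytes:
        evicted = kept.pop()  # lowest priority sits at the end
        total -= evicted["size_bytes"]
    return kept
```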
Anti-Patterns
Storing every turn. Storage grows unboundedly; retrieval is noisy. Be selective.
No supersession. Contradictory memories accumulate; agent acts on stale state. Implement update logic.
Pure vector storage without structure. Hard to query; metadata filtering impossible. Use hybrid.
No recency weighting. Old memories compete with new on equal footing. Newer is usually more relevant.
Memory invisible to user. Trust suffers. Provide a way to inspect and edit.
No expiration. Storage grows forever. Decay or expire on time/importance.
Cross-user contamination. User A's preferences appear for User B. Always isolate by user.