
Designing Episodic Memory for Agents

Build the episodic memory layer that stores specific past events and


Episodic memory is the agent's memory of specific events. The conversation last Tuesday. The decision made in March. The document drafted three weeks ago. Distinct from semantic memory (general facts about the user) and from working memory (current context).

Episodic memory is hard. The naive approach ("store every conversation; retrieve when similar") produces noisy retrieval: the agent confuses what it knows now with what it knew then, and stale past state contaminates current responses.

This skill covers the patterns that make episodic memory useful rather than noisy.

What's an Episode

An episode is a coherent unit of past activity:

  • A complete conversation turn or session.
  • A specific user decision.
  • A discrete project milestone.
  • A meaningful agent action with its result.

Episodes have:

  • Timestamp. When did this happen?
  • Participants. Who was involved?
  • Content. What happened?
  • Outcome. What was the result?
  • Tags. Categories for easier retrieval.
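The fields above map naturally onto a small record type. A minimal sketch, assuming Python dataclasses; the class name `Episode` and the defaults are illustrative choices, not something the skill prescribes:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Episode:
    """One remembered event; field names mirror the list above."""
    timestamp: datetime                 # when it happened
    participants: list[str]             # who was involved
    content: str                        # what happened
    outcome: str = ""                   # what the result was
    tags: list[str] = field(default_factory=list)  # categories for retrieval


# Example values are made up for illustration.
episode = Episode(
    timestamp=datetime(2026, 4, 15, 14, 30, tzinfo=timezone.utc),
    participants=["user", "agent"],
    content="User decided to use PostgreSQL for the database.",
    outcome="Decision recorded.",
    tags=["database", "decision"],
)
```

Storing the timestamp as a timezone-aware value keeps later recency and expiry arithmetic unambiguous.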

Not everything in a conversation becomes an episode. The agent should be selective. Storing every turn produces a useless mass; storing the moments worth remembering produces useful recall.

Event Extraction

When does an event get extracted into episodic memory?

Manual Triggers

The user explicitly asks to remember:

  • "Note that we decided to use PostgreSQL."
  • "Remember that I prefer Tuesday meetings."

Easiest case. The user signals what's important; the agent stores it.

LLM-Mediated Extraction

After each session (or each significant turn), an LLM evaluates: is anything in this conversation worth remembering as an event?

"Read the following conversation. Extract any events worth
remembering: decisions, preferences expressed, milestones reached,
specific information the user shared. Return as a list of events."

The output is structured episodes ready to store.
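One way to wire this up, sketched under assumptions: the model call is passed in as a plain `complete(prompt) -> str` function (hypothetical; substitute whatever client you use), and the prompt is extended to ask for JSON so the reply can be parsed. Malformed output is dropped rather than guessed at.

```python
import json
from typing import Callable

# The prompt from above, with an added request for JSON (an assumption of
# this sketch so the reply is machine-readable).
EXTRACTION_PROMPT = (
    "Read the following conversation. Extract any events worth "
    "remembering: decisions, preferences expressed, milestones reached, "
    "specific information the user shared. Return as a JSON list of "
    'objects with "type" and "content" keys.\n\n'
)


def extract_episodes(transcript: str, complete: Callable[[str], str]) -> list[dict]:
    """Ask the model for events and keep only the well-formed ones."""
    raw = complete(EXTRACTION_PROMPT + transcript)
    try:
        events = json.loads(raw)
    except json.JSONDecodeError:
        return []  # malformed model output: store nothing rather than guess
    if not isinstance(events, list):
        return []
    return [
        e for e in events
        if isinstance(e, dict) and e.get("type") and e.get("content")
    ]


# Stand-in for a real model call, used only to show the flow.
def fake_complete(prompt: str) -> str:
    return '[{"type": "decision", "content": "User chose PostgreSQL."}, {"note": "ignored"}]'
```

Calling `extract_episodes(transcript, fake_complete)` keeps the decision and discards the entry that lacks the required keys.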

Heuristic Triggers

Specific patterns trigger storage:

  • The user uses past-tense success language ("we shipped X").
  • The user states a preference ("I prefer X").
  • An agent action completed (a meeting was scheduled, an email was sent).

Heuristics catch the obvious cases; LLM-mediated extraction catches the nuanced ones.

Hybrid is best: cheap heuristics handle the bulk of events, and LLM extraction runs on whatever they miss.
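A heuristic pass can be as small as a few regular expressions. The patterns below are hypothetical and deliberately narrow; a real deployment would tune them on its own conversations. Messages that match nothing fall through to the LLM pass.

```python
import re
from typing import Optional

# Hypothetical patterns. Manual triggers are checked first so that
# "Remember that I prefer ..." is filed as an explicit request.
HEURISTICS = [
    ("manual", re.compile(r"^\s*(note that|remember that)\b", re.IGNORECASE)),
    ("milestone", re.compile(r"\bwe (shipped|launched|released|finished)\b", re.IGNORECASE)),
    ("preference", re.compile(r"\bI (prefer|would rather)\b", re.IGNORECASE)),
]


def heuristic_type(message: str) -> Optional[str]:
    """Return the first matching episode type, or None to defer to the LLM."""
    for episode_type, pattern in HEURISTICS:
        if pattern.search(message):
            return episode_type
    return None
```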

Storage Format

Two main approaches, plus a hybrid that combines them:

Structured

Store episodes as records in a database with explicit fields:

{
  "id": "ep-123",
  "type": "decision",
  "timestamp": "2026-04-15T14:30:00Z",
  "participants": ["user", "agent"],
  "content": "User decided to use PostgreSQL for the database.",
  "outcome": "Decision recorded; will inform future architecture queries.",
  "tags": ["database", "decision", "architecture"],
  "related_episodes": ["ep-118", "ep-104"]
}

Pros:

  • Easy to query (filter by type, time, tag).
  • Structured for downstream use (analytics, exports).
  • The agent can reason about specific fields.

Cons:

  • Requires schema definition; new event types require schema changes.
  • Misses nuanced detail that doesn't fit the schema.

Unstructured Vector

Store episodes as natural-language descriptions, embedded for vector search:

"On April 15, 2026, the user decided to use PostgreSQL for their
project's database, citing familiarity and tooling support over
MongoDB."

Pros:

  • Flexible; any event can be represented.
  • Vector retrieval finds semantically related episodes naturally.

Cons:

  • Harder to query structurally ("show all decisions in March").
  • Retrieval can miss exact matches.

Hybrid

Most production systems do both: structured fields for the basics + unstructured prose for the nuance, both indexed for retrieval.

{
  "id": "ep-123",
  "type": "decision",
  "timestamp": "2026-04-15T14:30:00Z",
  "tags": ["database"],
  "summary": "User chose PostgreSQL over MongoDB for the database.",
  "details": "After a 30-minute discussion, user decided to go with PostgreSQL...",
  "embedding": [0.1, -0.4, ...]
}

Filter on the structured fields, run vector search over the embedded details, and combine the results.
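A sketch of that filter-then-rank flow over in-memory records. It assumes each episode is a dict with `type`, `tags`, and `embedding` keys; the two-dimensional embeddings are toy values, and a production system would delegate both steps to a database or vector index.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(episodes, query_embedding, *, episode_type=None, tag=None, top_k=3):
    """Filter on structured fields first, then rank survivors by similarity."""
    candidates = [
        e for e in episodes
        if (episode_type is None or e["type"] == episode_type)
        and (tag is None or tag in e["tags"])
    ]
    candidates.sort(key=lambda e: cosine(e["embedding"], query_embedding), reverse=True)
    return candidates[:top_k]


# Toy data for illustration only.
episodes = [
    {"id": "ep-1", "type": "decision", "tags": ["database"], "embedding": [1.0, 0.0]},
    {"id": "ep-2", "type": "decision", "tags": ["database"], "embedding": [0.0, 1.0]},
    {"id": "ep-3", "type": "preference", "tags": ["style"], "embedding": [1.0, 0.0]},
]
hits = hybrid_search(episodes, [0.9, 0.1], episode_type="decision")
```

Here `ep-3` is excluded by the type filter even though its embedding matches the query well, which is the point of filtering before ranking.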

Retrieval Patterns

When should episodic memory be queried?

On Topic Reference

The user mentions something that might have a past episode. "Remember when we discussed the database?" — query for past episodes about databases.

On Implicit Reference

The reference is less explicit, but the user's question might still benefit from past context. "What was that thing we tried?" calls for a search of recent activities.

The agent decides. An LLM-mediated query analyzer can determine "is this query likely to need episodic memory?"

As a Tool

The agent has a search_episodes(query) tool. The agent calls it when it thinks past episodes might help.

Most flexible; most aligned with the agent loop pattern.
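A sketch of what exposing `search_episodes` as a tool might look like. The tool description uses a generic JSON-Schema-style shape; the exact envelope differs between model providers, so treat it as an assumption. The keyword match is a toy stand-in for real retrieval, and `handle_tool_call` is a hypothetical dispatcher name.

```python
import json

# Generic tool description; adapt the envelope to your provider's format.
SEARCH_EPISODES_TOOL = {
    "name": "search_episodes",
    "description": "Search the user's past episodes (decisions, preferences, milestones).",
    "parameters": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}


def search_episodes(query: str, store: list[dict]) -> list[dict]:
    """Toy keyword match standing in for real retrieval."""
    words = query.lower().split()
    return [e for e in store if any(w in e["content"].lower() for w in words)]


def handle_tool_call(name: str, arguments_json: str, store: list[dict]) -> str:
    """Dispatch a tool call emitted by the model and return a JSON result."""
    if name != "search_episodes":
        raise ValueError(f"unknown tool: {name}")
    args = json.loads(arguments_json)
    return json.dumps(search_episodes(args["query"], store))
```

The JSON string returned by `handle_tool_call` is what gets fed back to the model as the tool result.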

Avoiding Stale Memory Contamination

The hardest problem in episodic memory: avoiding stale episodes that contaminate current responses.

The user said "I prefer brief responses" three months ago. They've since changed their preference, but the episodic memory still says "brief." If the agent retrieves the old episode and acts on it, it acts on outdated information.

Mitigations:

Recency Weighting

Recent episodes outweigh old ones. Retrieval scores include a recency factor; older episodes are surfaced only when no recent equivalent exists.
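One common way to fold recency into the score is exponential decay with a half-life. This is a sketch of that one option, not the only valid formula; the 30-day half-life is an arbitrary assumption to tune.

```python
from datetime import datetime, timedelta, timezone


def recency_weighted_score(similarity, episode_time, now, half_life_days=30.0):
    """Multiply similarity by a decay factor that halves every half_life_days."""
    age_days = max((now - episode_time).total_seconds() / 86400.0, 0.0)
    return similarity * 0.5 ** (age_days / half_life_days)


now = datetime(2026, 4, 15, tzinfo=timezone.utc)
# A slightly weaker but fresh match versus a stronger match from 90 days ago.
fresh = recency_weighted_score(0.80, now - timedelta(days=1), now)
stale = recency_weighted_score(0.90, now - timedelta(days=90), now)
```

With these numbers the fresh episode outranks the stale one, yet the stale one still surfaces when nothing recent competes with it.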

Supersession

When a new episode is recorded that contradicts or updates an old one, mark the old one as superseded. Retrieval skips superseded episodes by default.

{
  "id": "ep-123",
  "type": "preference",
  "content": "User prefers brief responses.",
  "superseded_by": null,
  "active": true
}

// Later:
{
  "id": "ep-456",
  "type": "preference",
  "content": "User prefers detailed responses with examples.",
  "supersedes": "ep-123",
  "active": true
}

// ep-123 is now { ..., active: false, superseded_by: "ep-456" }
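The update shown above can be sketched as a pair of helpers over an in-memory dict keyed by episode id. The function names are hypothetical; the field names follow the records above.

```python
def record_superseding(store: dict[str, dict], new_episode: dict, old_id: str) -> None:
    """Store new_episode and retire the episode it replaces."""
    old = store[old_id]
    old["active"] = False
    old["superseded_by"] = new_episode["id"]
    store[new_episode["id"]] = {
        **new_episode,
        "supersedes": old_id,
        "superseded_by": None,
        "active": True,
    }


def active_episodes(store: dict[str, dict]) -> list[dict]:
    """Default retrieval view: superseded episodes are skipped."""
    return [e for e in store.values() if e["active"]]


store = {
    "ep-123": {"id": "ep-123", "type": "preference",
               "content": "User prefers brief responses.",
               "superseded_by": None, "active": True},
}
record_superseding(
    store,
    {"id": "ep-456", "type": "preference",
     "content": "User prefers detailed responses with examples."},
    old_id="ep-123",
)
```

The old record is kept rather than deleted, so the history of the preference remains available when someone asks for it explicitly.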

Confidence

Each episode has a confidence score. Old episodes that nothing recent corroborates have lower confidence. The agent treats low-confidence memories as suggestions, not facts.
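One way to make that distinction visible to the model is to phrase the memory differently depending on its score. A minimal sketch; the 0.6 threshold and the wording are assumptions, not recommended values.

```python
def frame_memory(content: str, confidence: float, threshold: float = 0.6) -> str:
    """Phrase a memory for the prompt according to how much it is trusted."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= threshold:
        return f"Known: {content}"
    return f"Possibly outdated, verify before relying on it: {content}"
```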

Refresh

Periodically, the agent confirms still-relevant memories with the user. "I noticed you mentioned preferring brief responses a while ago — is that still your preference?"

Annoying if done too often; helpful when done sparingly. Calibrate.

Decay and Expiration

Some episodes should expire. A discussion about a now-finished project may not be relevant six months later.

Decay strategies:

  • Time-based. Episodes older than N days are deprioritized; older than M days, deleted.
  • Activity-based. Episodes that haven't been retrieved in N weeks are deprioritized.
  • Importance-based. High-importance episodes (decisions, milestones) persist longer; routine episodes (small interactions) expire faster.

Don't delete aggressively; user surprise at "why don't you remember that?" is real. Default to long retention; expose deletion to the user.
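The three strategies can be combined in one classification step. In this sketch the retention windows, the idle cutoff, and the `last_retrieved` field are all hypothetical; the defaults err on the side of long retention, as advised above.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention windows in days, keyed by episode type.
RETENTION_DAYS = {"decision": 730, "milestone": 730, "routine": 90}
DEFAULT_RETENTION_DAYS = 365
DEPRIORITIZE_AFTER_IDLE = timedelta(weeks=8)


def decay_status(episode: dict, now: datetime) -> str:
    """Classify an episode as 'active', 'deprioritized', or 'expired'."""
    max_age = timedelta(days=RETENTION_DAYS.get(episode["type"], DEFAULT_RETENTION_DAYS))
    if now - episode["timestamp"] > max_age:
        return "expired"          # time-based, scaled by importance of the type
    if now - episode["last_retrieved"] > DEPRIORITIZE_AFTER_IDLE:
        return "deprioritized"    # activity-based
    return "active"
```

An "expired" status is better treated as a candidate for deletion than as an instruction to delete, so the user can still review it first.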

User-Visible Memory

For trust, expose episodic memory to the user:

  • A "memories" page showing what the agent remembers.
  • The ability to delete specific memories.
  • The ability to mark memories as outdated.

Users who can see and curate their own memory profile trust the system more. Black-box memory feels surveillant; transparent memory feels useful.

Cross-User Isolation

Each user's episodic memory is isolated. The agent doesn't transfer episodes from User A to User B's context.

For multi-user scenarios (team workspaces), episodes can be shared explicitly. The "team" has its own episodic store; team members access it. Individual user episodes stay private.
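Isolation is easiest to enforce when the store's interface makes an unscoped query impossible. A sketch with a hypothetical `EpisodeStore` class and a `user:` / `team:` scope-naming convention invented for this example:

```python
from collections import defaultdict


class EpisodeStore:
    """Every read and write names a scope, so no query can span users."""

    def __init__(self) -> None:
        self._by_scope: dict[str, list[dict]] = defaultdict(list)

    def add(self, scope: str, episode: dict) -> None:
        """Store an episode under a scope such as 'user:alice' or 'team:eng'."""
        self._by_scope[scope].append(episode)

    def visible_to(self, user_id: str, team_ids: tuple[str, ...] = ()) -> list[dict]:
        """Return the user's private episodes plus explicitly shared team ones."""
        scopes = [f"user:{user_id}"] + [f"team:{t}" for t in team_ids]
        return [e for s in scopes for e in self._by_scope.get(s, [])]
```

Team membership is passed in by the caller here; in practice it would come from whatever authorization layer the application already has.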

Cost

Episodic memory accumulates. Without limits, storage grows without bound.

Budget per user:

  • Storage size limit (e.g., 100 MB of episodic data per user).
  • Retrieval cost limit (queries per day).

When limits approach, decay or expire. Don't let runaway storage become a financial problem.
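A storage budget can be enforced by keeping episodes in priority order while they fit. This sketch assumes each episode carries a numeric `importance` and an ISO 8601 UTC `timestamp` string (both assumptions), and measures size as the length of the JSON encoding.

```python
import json


def enforce_budget(episodes: list[dict], max_bytes: int) -> list[dict]:
    """Keep the most important, newest episodes that fit within max_bytes."""
    def size(e: dict) -> int:
        return len(json.dumps(e).encode("utf-8"))

    # Most important first; among equals, newest first. ISO timestamps in the
    # same format and timezone sort correctly as plain strings.
    ranked = sorted(episodes, key=lambda e: (e["importance"], e["timestamp"]), reverse=True)
    kept, used = [], 0
    for e in ranked:
        if used + size(e) <= max_bytes:
            kept.append(e)
            used += size(e)
    return kept
```

Episodes that do not make the cut are the natural candidates for the decay or expiry step rather than silent deletion.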

Anti-Patterns

Storing every turn. Storage grows unboundedly; retrieval is noisy. Be selective.

No supersession. Contradictory memories accumulate; the agent acts on stale state. Implement update logic.

Pure vector storage without structure. Hard to query; metadata filtering impossible. Use hybrid.

No recency weighting. Old memories compete with new on equal footing. Newer is usually more relevant.

Memory invisible to user. Trust suffers. Provide a way to inspect and edit.

No expiration. Storage grows forever. Decay or expire on time/importance.

Cross-user contamination. User A's preferences appear for User B. Always isolate by user.
