Short-Term vs Long-Term Agent Memory
Stateful agents need memory. The single-turn agent processes a request and forgets; useful but limited. The agent that remembers prior conversations, learned preferences, and accumulated context becomes more capable over time.
But memory is hard. The naive approach — "save everything; retrieve everything" — drowns the model in noise. Production memory architectures separate concerns: what's in immediate context, what's retrieved on demand, what's stored as concrete facts, what's distilled into general knowledge.
The Layers
Modern agent memory has multiple layers:
1. Working Memory (In-Context)
What's currently in the model's prompt. Limited to the model's context window. Refreshed each turn.
Includes:
- The current user message.
- Recent conversation turns.
- Tool calls and results from the current turn.
- Active task state.
Working memory is the fast layer. Everything in it is immediately available; nothing in it requires retrieval. But it's small (~200K tokens for current models), and putting too much in it slows the model.
Manage carefully. Truncate old turns. Compress prior context into summaries. Don't just append forever.
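The truncate-and-compress policy can be sketched in a few lines. This is a minimal sketch, not a production design; `summarize` stands in for whatever LLM summarization call you use, and the class and field names are illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class WorkingMemory:
    """Keeps recent turns verbatim; folds older turns into a rolling summary."""
    max_turns: int = 10
    summary: str = ""
    turns: list = field(default_factory=list)

    def add_turn(self, role: str, text: str, summarize=None) -> None:
        self.turns.append((role, text))
        # When the verbatim window overflows, compress the oldest turns
        # into the rolling summary instead of appending forever.
        while len(self.turns) > self.max_turns:
            old_role, old_text = self.turns.pop(0)
            if summarize:  # normally an LLM call; any callable works here
                self.summary = summarize(self.summary, f"{old_role}: {old_text}")
            else:
                self.summary += f"\n{old_role}: {old_text[:80]}"

    def to_prompt(self) -> str:
        parts = []
        if self.summary:
            parts.append(f"[Summary of earlier turns]{self.summary}")
        parts += [f"{r}: {t}" for r, t in self.turns]
        return "\n".join(parts)
```

The key property: the prompt stays bounded no matter how long the session runs, because old turns exit the verbatim window and survive only as a summary.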
2. Short-Term Episodic Memory
Recent events the agent should remember within the current session but doesn't need in every turn's context.
Examples:
- "The user mentioned they live in Vancouver" — relevant when location-related queries come up.
- "We tried option A and it didn't work" — relevant when the user revisits the topic.
- "The last document we discussed was X" — relevant for "let's go back to that document."
Stored in a session-scoped store. Retrieved by relevance to the current turn. Cleared at session end (or kept for cross-session reference; depends on your architecture).
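A session-scoped store can be sketched as follows. Word-overlap scoring stands in for real embedding similarity, and the class and method names are illustrative:

```python
class SessionMemory:
    """Session-scoped episodic store: write notes during the session,
    retrieve by relevance to the current turn, clear at session end."""

    def __init__(self):
        self.notes: list[str] = []

    def remember(self, note: str) -> None:
        self.notes.append(note)

    def relevant(self, query: str, k: int = 3) -> list[str]:
        # Word-overlap scoring stands in for embedding similarity.
        q = set(query.lower().split())
        scored = sorted(self.notes,
                        key=lambda n: len(q & set(n.lower().split())),
                        reverse=True)
        return scored[:k]

    def end_session(self) -> None:
        self.notes.clear()
```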
3. Long-Term Episodic Memory
Memorable events from prior sessions. Specific things that happened.
Examples:
- "User asked about building a cabin in February 2026."
- "Agent generated a draft of contract X on date Y."
- "User reported satisfaction with response Z."
Stored in a database (often a vector store with metadata). Retrieved by query relevance. Persists indefinitely unless purged.
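One way to sketch such a store, with a plain list standing in for the vector database; `Episode`, `EpisodicStore`, and the tag names are illustrative assumptions:

```python
import time
from dataclasses import dataclass, field

@dataclass
class Episode:
    text: str
    ts: float = field(default_factory=time.time)
    tags: dict = field(default_factory=dict)  # e.g. {"user": "u1", "kind": "draft"}

class EpisodicStore:
    def __init__(self):
        self.events: list[Episode] = []

    def write(self, text: str, **tags) -> Episode:
        ep = Episode(text=text, tags=tags)
        self.events.append(ep)
        return ep

    def query(self, query: str, **tags) -> list[Episode]:
        # Metadata filter first, then crude keyword relevance;
        # a production store would use vector similarity here.
        q = set(query.lower().split())
        hits = [e for e in self.events
                if all(e.tags.get(k) == v for k, v in tags.items())]
        return sorted(hits,
                      key=lambda e: len(q & set(e.text.lower().split())),
                      reverse=True)
```

The metadata-then-relevance order matters: filtering by user (or project, or date range) first keeps one user's memories from surfacing for another.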
4. Long-Term Semantic Memory
General knowledge about the user, accumulated across sessions. The agent's "model" of the user.
Examples:
- "User prefers concise responses."
- "User's main project is a SaaS for restaurants."
- "User is intermediate at Python; expert at JavaScript."
- "User is in the Pacific time zone."
Stored as structured data (a profile) or as natural-language facts in a vector store. Updated incrementally as new information emerges.
The distinction from episodic: semantic memory is a generalized statement, not a specific event.
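A structured profile can enforce that distinction by keying each fact, so that a new value supersedes the old one rather than coexisting with it. A minimal sketch with illustrative names:

```python
import time

class UserProfile:
    """Semantic memory as keyed facts: writing a new value for a key
    replaces the old one instead of accumulating a contradiction."""

    def __init__(self):
        self.facts: dict[str, dict] = {}

    def learn(self, key: str, value: str, source: str = "conversation") -> None:
        self.facts[key] = {"value": value, "source": source,
                           "updated": time.time()}

    def get(self, key: str):
        entry = self.facts.get(key)
        return entry["value"] if entry else None
```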
5. Tool Memory
What the agent has access to via tools — files, databases, APIs. Not "memory" in the traditional sense, but contextually similar: information the agent can retrieve.
Different from episodic/semantic memory because it's external, queryable, and the agent doesn't decide to "remember" it.
Designing the Layers
The art of memory architecture is deciding what goes where.
Principles:
- Working memory: what the model needs right now. Conversation flow, active task, recent context.
- Short-term episodic: things from this session that might come up. Decisions, attempts, references.
- Long-term episodic: events worth recalling later. Important meetings, specific projects, previous solutions.
- Long-term semantic: general truths about the user. Preferences, expertise, recurring patterns.
- Tool memory: information that lives elsewhere. Files, profiles, system state.
Mismatches are common:
- Putting everything in working memory: context blows out, slow, expensive.
- Putting everything in long-term storage: retrieval is noisy; relevant information competes with old, irrelevant memories.
- Treating episodic and semantic as the same: episodic accumulates noise; semantic doesn't update.
Memory Operations
Memory needs operations beyond store/retrieve:
Write
When new information emerges, decide which layer to write to.
A user mentioning their preference once: maybe write to semantic memory ("user prefers brief responses").
A user asking a one-off question: write to short-term episodic, possibly let it expire.
A user completing a project milestone: write to long-term episodic ("user shipped feature X on date Y").
The decision is often LLM-mediated: a "memory writer" agent that decides what to remember and where.
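The routing decision might be sketched like this; keyword rules stand in for the LLM-mediated memory writer, and the trigger words are illustrative:

```python
def route_memory(note: str) -> str:
    """Decide which layer a new piece of information belongs to.
    In production this is usually an LLM call; keyword rules stand in here."""
    lowered = note.lower()
    if "prefers" in lowered or "always" in lowered:
        return "semantic"             # general truth about the user
    if "shipped" in lowered or "completed" in lowered:
        return "long_term_episodic"   # milestone worth recalling later
    return "short_term_episodic"      # default: session-scoped, may expire
```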
Retrieve
When formulating a response, decide which memories to retrieve.
For a query, retrieve:
- Most recent conversation turns from working memory (already there).
- Relevant facts from semantic memory (what do we know about the user?).
- Relevant past events from episodic memory (have we done this before?).
Retrieval is typically vector-search-based, with hybrid keyword + semantic ranking.
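A hybrid ranker can be sketched as a weighted blend, assuming a `semantic_score(query, doc)` callable that wraps your embedding model; the `alpha` weighting is an illustrative assumption:

```python
def hybrid_rank(query: str, docs: list[str], semantic_score, alpha: float = 0.5):
    """Blend keyword overlap (Jaccard) with a semantic similarity score.
    `semantic_score(query, doc) -> float in [0, 1]` is assumed to wrap
    an embedding model; any callable works for the sketch."""
    q = set(query.lower().split())

    def keyword(doc: str) -> float:
        d = set(doc.lower().split())
        return len(q & d) / len(q | d) if q | d else 0.0

    scored = [(alpha * keyword(d) + (1 - alpha) * semantic_score(query, d), d)
              for d in docs]
    return [d for _score, d in sorted(scored, reverse=True)]
```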
Update
Existing memories may need updating. The user changed their preference; their project shifted; their stated location is wrong.
Naive memory systems append; they accumulate contradictions. ("User prefers brief" + "User prefers detailed.") Mature systems update or supersede:
- A new preference replaces the old.
- An old fact is marked stale.
- A semantic memory is merged with the new evidence.
This requires the memory system to know that two facts are about the same thing — challenging in unstructured text.
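With an explicit subject key the supersede step itself is simple; the sketch below sidesteps the hard part (deciding that two unstructured facts are about the same thing) by requiring the key up front:

```python
import time

class FactStore:
    """Append-only log with supersede: asserting a new fact about a subject
    marks the old one stale instead of letting the two silently coexist."""

    def __init__(self):
        self.facts: list[dict] = []

    def assert_fact(self, subject: str, value: str) -> None:
        for fact in self.facts:
            if fact["subject"] == subject and not fact["stale"]:
                fact["stale"] = True  # supersede the old belief
        self.facts.append({"subject": subject, "value": value,
                           "stale": False, "ts": time.time()})

    def current(self, subject: str):
        live = [f for f in self.facts
                if f["subject"] == subject and not f["stale"]]
        return live[-1]["value"] if live else None
```

Keeping the stale entries around (rather than deleting them) preserves history for auditing and for surfacing conflicts to the user.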
Forget
Some memories should expire. Conversation turns from a year ago may not be relevant. Users may request forgetting (privacy).
Forget operations:
- Time-based expiry (auto-forget after N days).
- User-triggered forgetting ("forget that").
- Compaction (older episodic memories distill into semantic, and the episodic versions are removed).
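Time-based expiry and compaction can share one pass over the store. A minimal sketch; `distill` stands in for an LLM summarization call, and the record shape is illustrative:

```python
import time

DAY = 86400  # seconds

def compact(episodic: list[dict], max_age_days: float, distill, now=None):
    """Split memories at an age cutoff: recent ones survive verbatim,
    older ones are distilled into one summary and removed.
    `distill(texts) -> str` stands in for an LLM summarization call."""
    now = now if now is not None else time.time()
    cutoff = now - max_age_days * DAY
    fresh = [m for m in episodic if m["ts"] >= cutoff]
    old = [m for m in episodic if m["ts"] < cutoff]
    summary = distill([m["text"] for m in old]) if old else None
    return fresh, summary
```

The returned summary is what gets written into semantic memory; the old episodic records are simply dropped.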
Privacy and Consent
Memory systems handle user data. Privacy matters:
- Disclosure. The user knows the agent has memory and what kind.
- Inspection. The user can see what's stored about them.
- Deletion. The user can request deletion. Comply.
- Scope. Memory doesn't cross users (no learning from user A applied to user B without explicit consent).
- Sensitive data. PII, financial, medical information — handle with extra care or don't store.
Build the inspection and deletion UI from day one. It's much harder to retrofit.
Memory and Hallucination
A poorly architected memory system causes hallucinations. The agent retrieves a fact that's no longer accurate (or never was) and treats it as ground truth.
Mitigations:
- Source tagging. Each memory has a source (when it was learned, from what conversation). The agent can weight reliability by source.
- Confidence scoring. Memories have confidence; retrieved low-confidence ones are flagged in the prompt.
- Recency bias. More recent memories outweigh older ones for facts about the user (preferences change).
- Cross-reference. When two memories conflict, the agent surfaces the conflict to the user.
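Recency bias and confidence scoring combine naturally into a single retrieval weight. A minimal sketch; the exponential-decay form and the half-life value are illustrative assumptions:

```python
def memory_weight(confidence: float, age_days: float,
                  half_life_days: float = 90.0) -> float:
    """Combine stored confidence with exponential recency decay.
    Conflicting memories can be compared by this weight; a large gap
    favors the newer fact, a small gap should surface the conflict."""
    recency = 0.5 ** (age_days / half_life_days)
    return confidence * recency
```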
Cost
Memory costs:
- Storage. Vector-store costs scale with the number of memories held per user.
- Retrieval per turn. Multiple lookups; each costs.
- Memory writes. LLM-mediated decisions about what to remember.
- Compression. Periodic summarization is its own LLM cost.
Track:
- Memory store size per user.
- Retrieval calls per agent turn.
- LLM calls for memory operations (write, compress).
Cost can dominate the agent's overall expense. Cap memory size per user; expire aggressively when costs are problematic.
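The tracking list above can start as simple per-user counters before any billing integration exists. A minimal sketch; the operation names and the cap are illustrative:

```python
from collections import Counter

class MemoryMeter:
    """Per-user counters for memory operations, plus a store-size cap."""

    def __init__(self, max_memories_per_user: int = 10_000):
        self.counts = Counter()
        self.max_memories = max_memories_per_user

    def record(self, op: str, n: int = 1) -> None:
        self.counts[op] += n  # e.g. "retrieval", "write", "compress"

    def over_cap(self, store_size: int) -> bool:
        return store_size > self.max_memories
```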
Anti-Patterns
Single bag of memory. Everything in one vector store. No separation between recent, important, ephemeral. Retrieval is noisy.
Append-only. Every memory is added; none is updated or removed. Contradictions accumulate. Implement update/supersede.
No expiration. Memories from years ago compete with current context. Time-based decay or compression.
No semantic vs. episodic distinction. Specific events and general facts in the same store. Retrieval surfaces episodes when general truths are wanted.
Privacy as afterthought. No deletion path. Compliance fail; user trust loss. Build deletion from day one.
Memory writes without judgment. Everything gets written. Memory becomes noise. Use a memory-writer that decides what's worth keeping.