agent-memory
Memory systems for AI agents: conversation history management, summarization strategies, vector-based long-term memory, entity memory, episodic memory, and memory retrieval patterns. Covers practical implementations for giving agents persistent, searchable memory across sessions and within long-running tasks.
Give agents persistent, searchable memory that works across turns, sessions, and tasks.
Conversation History Management
The simplest memory: the messages array. The challenge is keeping it within context limits.
Sliding Window
def sliding_window(messages: list[dict], max_turns: int = 20) -> list[dict]:
    """Keep the first message (the task) and the last N turn pairs."""
    if len(messages) <= max_turns * 2 + 1:
        return messages
    first = messages[0]
    recent = messages[-(max_turns * 2):]
    return [first] + recent
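One pitfall with a naive sliding window: the cut can land between an assistant `tool_use` block and the user message carrying its `tool_result`, leaving an orphaned result the Messages API will reject. A hedged sketch (assuming Anthropic-style messages, where tool results arrive as user messages containing `tool_result` blocks; function names are illustrative) that advances the window start past any dangling results:

```python
def safe_window_start(messages: list[dict], start: int) -> int:
    """Advance a proposed window start so the window doesn't open on a
    tool_result whose matching tool_use would be cut off."""
    while start < len(messages):
        msg = messages[start]
        content = msg.get("content")
        is_tool_result = (
            msg["role"] == "user"
            and isinstance(content, list)
            and any(block.get("type") == "tool_result" for block in content)
        )
        if not is_tool_result:
            return start
        start += 1
    return len(messages)

def sliding_window_toolsafe(messages: list[dict], max_turns: int = 20) -> list[dict]:
    """Sliding window that keeps the first message and snaps the cut
    point past any orphaned tool_result."""
    if len(messages) <= max_turns * 2 + 1:
        return messages
    start = safe_window_start(messages, len(messages) - max_turns * 2)
    return [messages[0]] + messages[start:]
```

The same guard applies to the token-budget truncation below: whenever you drop messages from the middle of a tool-using conversation, re-check the boundary.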
Token-Aware Truncation
def estimate_tokens(text: str) -> int:
    """Rough token estimate: about 1 token per 4 characters."""
    return len(str(text)) // 4

def truncate_to_budget(messages: list[dict], token_budget: int = 80000) -> list[dict]:
    """Remove oldest messages (except the first) to fit within the budget."""
    total = sum(estimate_tokens(str(m)) for m in messages)
    if total <= token_budget:
        return messages
    # Always keep the first message; trim from the oldest end of the rest,
    # never dropping below the last two messages
    first = messages[0]
    trimmed = list(messages[1:])
    while total > token_budget and len(trimmed) > 2:
        removed = trimmed.pop(0)
        total -= estimate_tokens(str(removed))
    return [first] + trimmed
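A quick self-contained check of the truncation behavior (the two helpers are repeated here so the snippet runs on its own): build an oversized history, truncate, and confirm the first message survives while the estimate fits the budget.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: about 1 token per 4 characters."""
    return len(str(text)) // 4

def truncate_to_budget(messages, token_budget=80000):
    total = sum(estimate_tokens(str(m)) for m in messages)
    if total <= token_budget:
        return messages
    first = messages[0]
    trimmed = list(messages[1:])
    while total > token_budget and len(trimmed) > 2:
        removed = trimmed.pop(0)
        total -= estimate_tokens(str(removed))
    return [first] + trimmed

# Synthetic history: one task message plus 200 chunky turns
history = [{"role": "user", "content": "Refactor the billing module"}]
for i in range(100):
    history.append({"role": "assistant", "content": "step " + "x" * 400})
    history.append({"role": "user", "content": "continue " + "y" * 400})

kept = truncate_to_budget(history, token_budget=5000)
```

Note that the estimate runs over `str(m)`, the whole message dict, so role names and punctuation count toward the budget too; that keeps the estimate conservative.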
Summarization Memory
When the conversation grows long, summarize older messages and inject the summary.
import anthropic

client = anthropic.Anthropic()

def summarize_conversation(messages: list[dict]) -> str:
    """Ask the model to summarize a conversation segment."""
    conversation_text = ""
    for msg in messages:
        role = msg["role"]
        content = msg["content"] if isinstance(msg["content"], str) else str(msg["content"])
        conversation_text += f"{role}: {content[:500]}\n"
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": f"Summarize this conversation in 3-5 bullet points. "
                       f"Focus on: decisions made, facts learned, actions taken, "
                       f"and current status.\n\n{conversation_text}",
        }],
    )
    return response.content[0].text
class SummarizingMemory:
    """Memory that auto-summarizes when the conversation gets too long."""

    def __init__(self, max_turns: int = 30):
        self.messages: list[dict] = []
        self.summaries: list[str] = []
        self.max_turns = max_turns

    def add(self, message: dict):
        self.messages.append(message)
        if len(self.messages) > self.max_turns * 2:
            self._compress()

    def _compress(self):
        # Summarize the oldest half of the conversation
        split = len(self.messages) // 2
        old_messages = self.messages[:split]
        summary = summarize_conversation(old_messages)
        self.summaries.append(summary)
        self.messages = self.messages[split:]

    def get_messages(self) -> list[dict]:
        if not self.summaries:
            return self.messages
        summary_text = "Previous conversation summary:\n"
        for i, s in enumerate(self.summaries):
            summary_text += f"\n--- Segment {i+1} ---\n{s}"
        return [
            {"role": "user", "content": summary_text},
            {"role": "assistant", "content": "Understood. I have the context from our previous conversation."},
        ] + self.messages
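`SummarizingMemory` calls the API inside `_compress`, which makes the compression logic slow and awkward to test. One option, sketched here with an illustrative class name, is to inject the summarizer as a callable so a stub can stand in for the model during tests:

```python
from typing import Callable

class InjectableSummarizingMemory:
    """Variant of SummarizingMemory that takes the summarizer as a
    callable; pass summarize_conversation in production, a stub in tests.
    (Illustrative design, not part of the skill's API.)"""

    def __init__(self, summarizer: Callable[[list[dict]], str], max_turns: int = 30):
        self.summarizer = summarizer
        self.messages: list[dict] = []
        self.summaries: list[str] = []
        self.max_turns = max_turns

    def add(self, message: dict):
        self.messages.append(message)
        if len(self.messages) > self.max_turns * 2:
            # Summarize the oldest half and drop it
            split = len(self.messages) // 2
            self.summaries.append(self.summarizer(self.messages[:split]))
            self.messages = self.messages[split:]
```

With a `lambda msgs: f"{len(msgs)} messages summarized"` stub you can verify the compression threshold and retention behavior without a single API call.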
Vector-Based Long-Term Memory
Use embeddings and a vector store for semantic retrieval across sessions.
import json
import time
import numpy as np
from pathlib import Path

def get_embedding(text: str) -> list[float]:
    """Get an embedding from OpenAI (or swap in any embedding provider)."""
    from openai import OpenAI
    oai = OpenAI()
    response = oai.embeddings.create(
        model="text-embedding-3-small",
        input=text,
    )
    return response.data[0].embedding

def cosine_similarity(a: list[float], b: list[float]) -> float:
    a_arr = np.array(a)
    b_arr = np.array(b)
    return float(np.dot(a_arr, b_arr) / (np.linalg.norm(a_arr) * np.linalg.norm(b_arr)))

class VectorMemory:
    """Simple file-backed vector memory. For production, use a proper vector DB."""

    def __init__(self, path: str = ".memory/vectors.json"):
        self.path = Path(path)
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.entries: list[dict] = []
        if self.path.exists():
            self.entries = json.loads(self.path.read_text())

    def store(self, text: str, metadata: dict | None = None):
        embedding = get_embedding(text)
        entry = {
            "text": text,
            "embedding": embedding,
            "metadata": metadata or {},
            "timestamp": time.time(),
        }
        self.entries.append(entry)
        self._save()

    def search(self, query: str, top_k: int = 5) -> list[dict]:
        if not self.entries:
            return []
        query_embedding = get_embedding(query)
        scored = []
        for entry in self.entries:
            sim = cosine_similarity(query_embedding, entry["embedding"])
            scored.append({"text": entry["text"], "score": sim,
                           "metadata": entry["metadata"]})
        scored.sort(key=lambda x: x["score"], reverse=True)
        return scored[:top_k]

    def _save(self):
        self.path.write_text(json.dumps(self.entries))
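`VectorMemory` stores whatever `get_embedding` returns, so the embedder is swappable. For offline tests or a quick sanity check of the cosine-ranking path, any deterministic embedder will do; this hashing bag-of-words sketch is illustrative only and captures word overlap, not meaning:

```python
import hashlib
import math

def toy_embedding(text: str, dims: int = 256) -> list[float]:
    """Deterministic bag-of-words hashing embedder, for offline tests
    only -- it measures shared vocabulary, not semantics."""
    vec = [0.0] * dims
    for word in text.lower().split():
        # Hash each word into a fixed bucket and count occurrences
        idx = int(hashlib.md5(word.encode()).hexdigest(), 16) % dims
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def toy_cosine(a: list[float], b: list[float]) -> float:
    # Inputs are unit-normalized, so the dot product is the cosine
    return sum(x * y for x, y in zip(a, b))
```

Monkeypatching `get_embedding` with `toy_embedding` in a test lets you exercise `store`/`search` end to end without network access.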
Using Vector Memory as Agent Tools
def make_memory_tools(memory: VectorMemory) -> list[dict]:
    return [
        {
            "name": "remember",
            "description": "Store an important fact or observation for future reference.",
            "input_schema": {
                "type": "object",
                "properties": {
                    "fact": {"type": "string", "description": "The fact to remember."},
                    "category": {"type": "string", "description": "Category: preference, fact, decision, instruction."},
                },
                "required": ["fact"],
            },
        },
        {
            "name": "recall",
            "description": "Search memory for relevant past information.",
            "input_schema": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "What to search for."},
                },
                "required": ["query"],
            },
        },
    ]
def execute_memory_tool(name: str, inputs: dict, memory: VectorMemory) -> str:
    if name == "remember":
        memory.store(inputs["fact"], {"category": inputs.get("category", "general")})
        return f"Stored: {inputs['fact'][:80]}..."
    elif name == "recall":
        results = memory.search(inputs["query"])
        if not results:
            return "No relevant memories found."
        lines = []
        for r in results:
            lines.append(f"- [{r['score']:.2f}] {r['text']}")
        return "Relevant memories:\n" + "\n".join(lines)
    return f"Unknown memory tool: {name}"
Entity Memory
Track structured information about specific entities (people, projects, systems).
import time

class EntityMemory:
    """Track facts about named entities."""

    def __init__(self, path: str = ".memory/entities.json"):
        self.path = Path(path)
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.entities: dict[str, dict] = {}
        if self.path.exists():
            self.entities = json.loads(self.path.read_text())

    def update_entity(self, name: str, facts: dict):
        """Add or update facts about an entity."""
        name = name.lower().strip()
        if name not in self.entities:
            self.entities[name] = {"name": name, "facts": {}, "history": []}
        for key, value in facts.items():
            old = self.entities[name]["facts"].get(key)
            self.entities[name]["facts"][key] = value
            self.entities[name]["history"].append({
                "key": key, "old": old, "new": value,
                "timestamp": time.time(),
            })
        self._save()

    def get_entity(self, name: str) -> dict:
        return self.entities.get(name.lower().strip(), {})

    def search_entities(self, query: str) -> list[dict]:
        """Find entities matching a keyword."""
        results = []
        query_lower = query.lower()
        for name, data in self.entities.items():
            if query_lower in name:
                results.append(data)
                continue
            for v in data["facts"].values():
                if query_lower in str(v).lower():
                    results.append(data)
                    break
        return results

    def format_for_prompt(self, entity_names: list[str]) -> str:
        """Format entity info for injection into a prompt."""
        parts = []
        for name in entity_names:
            entity = self.get_entity(name)
            if entity:
                facts = entity.get("facts", {})
                fact_lines = "\n".join(f"  - {k}: {v}" for k, v in facts.items())
                parts.append(f"Entity: {name}\n{fact_lines}")
        return "\n\n".join(parts) if parts else ""

    def _save(self):
        self.path.write_text(json.dumps(self.entities, indent=2))
# Usage
em = EntityMemory()
em.update_entity("Alice Chen", {
    "role": "Engineering Lead",
    "team": "Platform",
    "preference": "Prefers async communication",
    "timezone": "PST",
})
em.update_entity("Project Atlas", {
    "status": "In progress",
    "deadline": "2026-06-01",
    "owner": "Alice Chen",
})
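Entity facts have to come from somewhere. A common pattern is to ask the model to emit proposed updates as JSON and validate them before writing to memory; the schema below (`{"entity": ..., "facts": {...}}`) is an assumption for illustration, not something the skill prescribes:

```python
import json

def parse_entity_updates(model_output: str) -> list[tuple[str, dict]]:
    """Parse model-proposed entity updates, e.g.
    [{"entity": "Alice Chen", "facts": {"timezone": "PST"}}].
    Malformed output yields an empty list rather than an exception."""
    try:
        data = json.loads(model_output)
    except json.JSONDecodeError:
        return []
    updates = []
    for item in data if isinstance(data, list) else []:
        if not isinstance(item, dict):
            continue
        name = item.get("entity")
        facts = item.get("facts")
        # Require a non-empty facts dict keyed by a string entity name
        if isinstance(name, str) and isinstance(facts, dict) and facts:
            updates.append((name, facts))
    return updates
```

Each validated `(name, facts)` pair then feeds `em.update_entity(name, facts)`, so a hallucinated or truncated JSON blob degrades to a no-op instead of corrupting the store.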
Episodic Memory
Store complete episodes (task attempts) for the agent to learn from.
import time

class EpisodicMemory:
    """Store and retrieve task episodes for agent learning."""

    def __init__(self, path: str = ".memory/episodes.json"):
        self.path = Path(path)
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.episodes: list[dict] = []
        if self.path.exists():
            self.episodes = json.loads(self.path.read_text())

    def record_episode(self, task: str, steps: list[str],
                       outcome: str, success: bool):
        episode = {
            "task": task,
            "steps": steps,
            "outcome": outcome,
            "success": success,
            "timestamp": time.time(),
        }
        self.episodes.append(episode)
        self._save()

    def find_similar_episodes(self, task: str, top_k: int = 3) -> list[dict]:
        """Find episodes with similar tasks (keyword overlap)."""
        task_words = set(task.lower().split())
        scored = []
        for ep in self.episodes:
            ep_words = set(ep["task"].lower().split())
            overlap = len(task_words & ep_words)
            if overlap > 0:
                scored.append((overlap, ep))
        scored.sort(reverse=True, key=lambda x: x[0])
        return [ep for _, ep in scored[:top_k]]

    def get_lessons(self, task: str) -> str:
        """Format past episodes as lessons for the agent."""
        similar = self.find_similar_episodes(task)
        if not similar:
            return ""
        lines = ["Lessons from past similar tasks:"]
        for ep in similar:
            status = "SUCCESS" if ep["success"] else "FAILED"
            lines.append(f"\n[{status}] Task: {ep['task'][:100]}")
            lines.append(f"  Steps: {', '.join(ep['steps'][:5])}")
            lines.append(f"  Outcome: {ep['outcome'][:150]}")
        return "\n".join(lines)

    def _save(self):
        self.path.write_text(json.dumps(self.episodes, indent=2))
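Raw overlap counts in `find_similar_episodes` favor long task descriptions, which share more words by chance. A small variant using Jaccard similarity normalizes for length:

```python
def task_similarity(a: str, b: str) -> float:
    """Jaccard similarity over word sets -- normalizes for task length,
    unlike a raw overlap count."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)
```

Swapping this in for the `overlap` score keeps short, tightly matching past tasks from being outranked by sprawling ones.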
Memory Retrieval Strategies
Recency + Relevance Scoring
import time

def hybrid_recall(entries: list[dict], query: str,
                  recency_weight: float = 0.3,
                  relevance_weight: float = 0.7,
                  top_k: int = 5) -> list[dict]:
    """Combine semantic relevance with recency for better recall."""
    query_embedding = get_embedding(query)
    now = time.time()
    scored = []
    for entry in entries:
        # Relevance score (cosine similarity)
        relevance = cosine_similarity(query_embedding, entry["embedding"])
        # Recency score (exponential decay, half-life = 1 day)
        age_hours = (now - entry["timestamp"]) / 3600
        recency = 2 ** (-age_hours / 24)
        combined = relevance_weight * relevance + recency_weight * recency
        scored.append({**entry, "score": combined})
    scored.sort(key=lambda x: x["score"], reverse=True)
    return scored[:top_k]
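The recency term is worth checking in isolation: with a 24-hour half-life, a memory's recency score halves each day regardless of relevance. A minimal sketch of just that term, factored out here for illustration:

```python
import time

def recency_score(timestamp: float, now: float, half_life_hours: float = 24.0) -> float:
    """The exponential-decay term from hybrid_recall: starts at 1.0 for a
    brand-new memory and halves every half_life_hours."""
    age_hours = (now - timestamp) / 3600
    return 2 ** (-age_hours / half_life_hours)
```

Raising `half_life_hours` makes the agent "forget" more slowly; with the default weights above, a week-old but highly relevant memory can still outrank a fresh, weakly relevant one.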
Injecting Memory into the Agent Prompt
def build_agent_context(task: str, vector_mem: VectorMemory,
                        entity_mem: EntityMemory,
                        episodic_mem: EpisodicMemory) -> str:
    """Assemble relevant memory context for an agent prompt."""
    sections = []
    # Vector recall
    relevant = vector_mem.search(task, top_k=5)
    if relevant:
        facts = "\n".join(f"- {r['text']}" for r in relevant if r["score"] > 0.3)
        if facts:
            sections.append(f"## Relevant Facts\n{facts}")
    # Entity context
    entities = entity_mem.search_entities(task)
    if entities:
        for ent in entities[:3]:
            info = "\n".join(f"  - {k}: {v}" for k, v in ent["facts"].items())
            sections.append(f"## Entity: {ent['name']}\n{info}")
    # Past lessons
    lessons = episodic_mem.get_lessons(task)
    if lessons:
        sections.append(f"## Past Experience\n{lessons}")
    return "\n\n".join(sections)
Use this pattern to inject a ## Context block at the top of the user message or as part of the system prompt. Keep total injected memory under 2000 tokens to leave room for the actual task and agent reasoning.
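One way to enforce that 2000-token ceiling, reusing the rough 4-characters-per-token estimate from earlier: drop whole sections from the end of the list until the estimate fits. This sketch assumes sections are ordered most-important-first, which matches the order `build_agent_context` produces:

```python
def trim_context(sections: list[str], token_budget: int = 2000) -> str:
    """Keep whole sections, in order, until the rough token estimate
    (len // 4, as above) would exceed the budget."""
    kept: list[str] = []
    used = 0
    for section in sections:
        cost = len(section) // 4
        if used + cost > token_budget:
            break  # drop this section and everything after it
        kept.append(section)
        used += cost
    return "\n\n".join(kept)
```

Trimming at section boundaries keeps the injected block coherent; truncating mid-section tends to leave dangling bullet fragments that confuse the model.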
Install this skill directly: skilldb add ai-agent-orchestration-skills
Related Skills
agent-architecture
Core patterns for building AI agent systems: the observe-think-act loop, ReAct pattern implementation, tool-use cycles, memory systems (short-term and long-term), and planning strategies. Covers how to structure an agent's main loop, manage state between iterations, and wire together perception, reasoning, and action into a reliable autonomous system.
agent-error-recovery
Handling failures in AI agent systems: retry strategies with backoff, fallback tools, graceful degradation, human-in-the-loop escalation, stuck-loop detection, and context recovery after crashes. Covers practical patterns for making agents robust against tool failures, API errors, and reasoning dead-ends.
agent-evaluation
Testing and evaluating AI agents: trajectory evaluation, task completion metrics, tool-use accuracy measurement, regression testing, benchmark suites, and A/B testing agent configurations. Covers practical approaches to measuring whether agents are working correctly and improving over time.
agent-frameworks
Comparison of major AI agent frameworks: LangGraph, CrewAI, AutoGen, Semantic Kernel, and Claude Agent SDK. Covers when to use each framework, their trade-offs, core patterns, practical setup examples, and migration strategies between frameworks.
agent-guardrails
Safety and control systems for AI agents: input and output validation, action authorization, rate limiting, cost controls, content filtering, scope restriction, and audit logging. Covers practical implementations for keeping agents within bounds while maintaining their usefulness.
agent-planning
Planning strategies for AI agents: chain-of-thought prompting, tree-of-thought exploration, plan-and-execute patterns, iterative refinement, task decomposition, and goal tracking. Covers practical implementations that make agents more reliable at complex, multi-step tasks by thinking before acting.