
agent-memory

Memory systems for AI agents: conversation history management, summarization strategies, vector-based long-term memory, entity memory, episodic memory, and memory retrieval patterns. Covers practical implementations for giving agents persistent, searchable memory across sessions and within long-running tasks.


Agent Memory

Give agents persistent, searchable memory that works across turns, sessions, and tasks.


Conversation History Management

The simplest memory: the messages array. The challenge is keeping it within context limits.

Sliding Window

def sliding_window(messages: list[dict], max_turns: int = 20) -> list[dict]:
    """Keep the first message (task) and last N turn pairs."""
    if len(messages) <= max_turns * 2 + 1:
        return messages

    first = messages[0]
    recent = messages[-(max_turns * 2):]
    return [first] + recent
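One caveat when windowing real agent transcripts: if the cut lands on a tool_result message, its matching tool_use was dropped with the older messages, and the API will reject the request. A boundary adjustment (a sketch; assumes Anthropic-style content blocks, where tool results arrive as a list of typed blocks inside a user message):

```python
def safe_window_start(messages: list[dict], start: int) -> int:
    """Advance the window start past any orphaned tool_result message."""
    while start < len(messages):
        content = messages[start].get("content")
        is_orphan = isinstance(content, list) and any(
            isinstance(block, dict) and block.get("type") == "tool_result"
            for block in content
        )
        if not is_orphan:
            return start
        start += 1
    return start
```

Apply it to the index where the recent window begins before slicing, so the first retained message is never a dangling tool result.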

Token-Aware Truncation

def estimate_tokens(text: str) -> int:
    """Rough token estimate: 1 token per 4 characters."""
    return len(str(text)) // 4


def truncate_to_budget(messages: list[dict], token_budget: int = 80000) -> list[dict]:
    """Remove oldest messages (except the first) to fit within budget."""
    total = sum(estimate_tokens(str(m)) for m in messages)

    if total <= token_budget:
        return messages

    # Always keep the first message; drop from the oldest end until within budget
    first = messages[0]
    trimmed = list(messages[1:])

    while total > token_budget and len(trimmed) > 2:
        removed = trimmed.pop(0)
        total -= estimate_tokens(str(removed))

    return [first] + trimmed

Summarization Memory

When the conversation grows long, summarize older messages and inject the summary.

import anthropic

client = anthropic.Anthropic()


def summarize_conversation(messages: list[dict]) -> str:
    """Ask the model to summarize a conversation segment."""
    conversation_text = ""
    for msg in messages:
        role = msg["role"]
        content = msg["content"] if isinstance(msg["content"], str) else str(msg["content"])
        conversation_text += f"{role}: {content[:500]}\n"

    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": f"Summarize this conversation in 3-5 bullet points. "
                       f"Focus on: decisions made, facts learned, actions taken, "
                       f"and current status.\n\n{conversation_text}",
        }],
    )
    return response.content[0].text


class SummarizingMemory:
    """Memory that auto-summarizes when the conversation gets too long."""

    def __init__(self, max_turns: int = 30):
        self.messages: list[dict] = []
        self.summaries: list[str] = []
        self.max_turns = max_turns

    def add(self, message: dict):
        self.messages.append(message)

        if len(self.messages) > self.max_turns * 2:
            self._compress()

    def _compress(self):
        # Summarize the oldest half
        split = len(self.messages) // 2
        old_messages = self.messages[:split]
        summary = summarize_conversation(old_messages)
        self.summaries.append(summary)
        self.messages = self.messages[split:]

    def get_messages(self) -> list[dict]:
        if not self.summaries:
            return self.messages

        summary_text = "Previous conversation summary:\n"
        for i, s in enumerate(self.summaries):
            summary_text += f"\n--- Segment {i+1} ---\n{s}"

        return [
            {"role": "user", "content": summary_text},
            {"role": "assistant", "content": "Understood. I have the context from our previous conversation."},
        ] + self.messages
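One subtlety in _compress: len(self.messages) // 2 can land between a user message and its reply, splitting a turn pair across the summary boundary. A pair-aligned split (a small sketch, assuming strict user/assistant alternation):

```python
def pair_aligned_split(messages: list[dict]) -> int:
    """Return a split index near the midpoint that sits on a turn boundary.

    Assumes messages alternate user/assistant starting with user, so even
    indices begin a new turn pair.
    """
    split = len(messages) // 2
    if split % 2 == 1:  # would cut between a user message and its reply
        split -= 1
    return split
```

Swapping this in for the bare `len(self.messages) // 2` keeps each summarized segment made of complete exchanges.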

Vector-Based Long-Term Memory

Use embeddings and a vector store for semantic retrieval across sessions.

import json
import numpy as np
from pathlib import Path


def get_embedding(text: str) -> list[float]:
    """Get embedding from OpenAI (or swap in any embedding provider)."""
    from openai import OpenAI
    oai = OpenAI()
    response = oai.embeddings.create(
        model="text-embedding-3-small",
        input=text,
    )
    return response.data[0].embedding


def cosine_similarity(a: list[float], b: list[float]) -> float:
    a_arr = np.array(a)
    b_arr = np.array(b)
    return float(np.dot(a_arr, b_arr) / (np.linalg.norm(a_arr) * np.linalg.norm(b_arr)))
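If you normalize embeddings once when storing them, cosine similarity reduces to a plain dot product, which is cheaper when scoring many entries on every query. A minimal sketch (not part of the store below):

```python
import numpy as np


def normalize(v: list[float]) -> np.ndarray:
    """Scale a vector to unit length."""
    arr = np.array(v, dtype=float)
    return arr / np.linalg.norm(arr)


a, b = [3.0, 4.0], [4.0, 3.0]
# dot product of unit vectors equals cosine similarity of the originals
sim = float(np.dot(normalize(a), normalize(b)))  # 0.96
```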


import time


class VectorMemory:
    """Simple file-backed vector memory. For production, use a proper vector DB."""

    def __init__(self, path: str = ".memory/vectors.json"):
        self.path = Path(path)
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.entries: list[dict] = []
        if self.path.exists():
            self.entries = json.loads(self.path.read_text())

    def store(self, text: str, metadata: dict | None = None):
        embedding = get_embedding(text)
        entry = {
            "text": text,
            "embedding": embedding,
            "metadata": metadata or {},
            "timestamp": time.time(),
        }
        self.entries.append(entry)
        self._save()

    def search(self, query: str, top_k: int = 5) -> list[dict]:
        if not self.entries:
            return []

        query_embedding = get_embedding(query)
        scored = []
        for entry in self.entries:
            sim = cosine_similarity(query_embedding, entry["embedding"])
            scored.append({"text": entry["text"], "score": sim,
                           "metadata": entry["metadata"]})

        scored.sort(key=lambda x: x["score"], reverse=True)
        return scored[:top_k]

    def _save(self):
        self.path.write_text(json.dumps(self.entries))
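Calling a live embedding API in unit tests is slow and flaky. One option is a deterministic stand-in for get_embedding (a hashing sketch; the vectors carry no semantic meaning, so it only exercises the storage and scoring plumbing, not retrieval quality):

```python
import hashlib


def fake_embedding(text: str, dim: int = 32) -> list[float]:
    """Deterministic pseudo-embedding for offline tests.

    Same text always yields the same vector; different texts differ,
    but distances are meaningless.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255 for byte in digest[:dim]]
```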

Using Vector Memory as Agent Tools

def make_memory_tools(memory: VectorMemory) -> list[dict]:
    return [
        {
            "name": "remember",
            "description": "Store an important fact or observation for future reference.",
            "input_schema": {
                "type": "object",
                "properties": {
                    "fact": {"type": "string", "description": "The fact to remember."},
                    "category": {"type": "string", "description": "Category: preference, fact, decision, instruction."},
                },
                "required": ["fact"],
            },
        },
        {
            "name": "recall",
            "description": "Search memory for relevant past information.",
            "input_schema": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "What to search for."},
                },
                "required": ["query"],
            },
        },
    ]


def execute_memory_tool(name: str, inputs: dict, memory: VectorMemory) -> str:
    if name == "remember":
        memory.store(inputs["fact"], {"category": inputs.get("category", "general")})
        return f"Stored: {inputs['fact'][:80]}..."
    elif name == "recall":
        results = memory.search(inputs["query"])
        if not results:
            return "No relevant memories found."
        lines = []
        for r in results:
            lines.append(f"- [{r['score']:.2f}] {r['text']}")
        return "Relevant memories:\n" + "\n".join(lines)
    return f"Unknown memory tool: {name}"

Entity Memory

Track structured information about specific entities (people, projects, systems).

import time


class EntityMemory:
    """Track facts about named entities."""

    def __init__(self, path: str = ".memory/entities.json"):
        self.path = Path(path)
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.entities: dict[str, dict] = {}
        if self.path.exists():
            self.entities = json.loads(self.path.read_text())

    def update_entity(self, name: str, facts: dict):
        """Add or update facts about an entity."""
        name = name.lower().strip()
        if name not in self.entities:
            self.entities[name] = {"name": name, "facts": {}, "history": []}

        for key, value in facts.items():
            old = self.entities[name]["facts"].get(key)
            self.entities[name]["facts"][key] = value
            self.entities[name]["history"].append({
                "key": key, "old": old, "new": value,
                "timestamp": time.time(),
            })
        self._save()

    def get_entity(self, name: str) -> dict:
        return self.entities.get(name.lower().strip(), {})

    def search_entities(self, query: str) -> list[dict]:
        """Find entities matching a keyword."""
        results = []
        query_lower = query.lower()
        for name, data in self.entities.items():
            if query_lower in name:
                results.append(data)
                continue
            for v in data["facts"].values():
                if query_lower in str(v).lower():
                    results.append(data)
                    break
        return results

    def format_for_prompt(self, entity_names: list[str]) -> str:
        """Format entity info for injection into a prompt."""
        parts = []
        for name in entity_names:
            entity = self.get_entity(name)
            if entity:
                facts = entity.get("facts", {})
                fact_lines = "\n".join(f"  - {k}: {v}" for k, v in facts.items())
                parts.append(f"Entity: {name}\n{fact_lines}")
        return "\n\n".join(parts) if parts else ""

    def _save(self):
        self.path.write_text(json.dumps(self.entities, indent=2))


# Usage
em = EntityMemory()
em.update_entity("Alice Chen", {
    "role": "Engineering Lead",
    "team": "Platform",
    "preference": "Prefers async communication",
    "timezone": "PST",
})
em.update_entity("Project Atlas", {
    "status": "In progress",
    "deadline": "2026-06-01",
    "owner": "Alice Chen",
})

Episodic Memory

Store complete episodes (task attempts) for the agent to learn from.

import time


class EpisodicMemory:
    """Store and retrieve task episodes for agent learning."""

    def __init__(self, path: str = ".memory/episodes.json"):
        self.path = Path(path)
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.episodes: list[dict] = []
        if self.path.exists():
            self.episodes = json.loads(self.path.read_text())

    def record_episode(self, task: str, steps: list[str],
                       outcome: str, success: bool):
        episode = {
            "task": task,
            "steps": steps,
            "outcome": outcome,
            "success": success,
            "timestamp": time.time(),
        }
        self.episodes.append(episode)
        self._save()

    def find_similar_episodes(self, task: str, top_k: int = 3) -> list[dict]:
        """Find episodes with similar tasks (keyword overlap)."""
        task_words = set(task.lower().split())
        scored = []
        for ep in self.episodes:
            ep_words = set(ep["task"].lower().split())
            overlap = len(task_words & ep_words)
            if overlap > 0:
                scored.append((overlap, ep))
        scored.sort(reverse=True, key=lambda x: x[0])
        return [ep for _, ep in scored[:top_k]]

    def get_lessons(self, task: str) -> str:
        """Format past episodes as lessons for the agent."""
        similar = self.find_similar_episodes(task)
        if not similar:
            return ""

        lines = ["Lessons from past similar tasks:"]
        for ep in similar:
            status = "SUCCESS" if ep["success"] else "FAILED"
            lines.append(f"\n[{status}] Task: {ep['task'][:100]}")
            lines.append(f"  Steps: {', '.join(ep['steps'][:5])}")
            lines.append(f"  Outcome: {ep['outcome'][:150]}")

        return "\n".join(lines)

    def _save(self):
        self.path.write_text(json.dumps(self.episodes, indent=2))
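Raw overlap counts favor long task descriptions, since they share more words by chance. Jaccard similarity normalizes by the union of the two word sets; a drop-in scoring variant (a sketch, not part of the class above):

```python
def jaccard(a: str, b: str) -> float:
    """Word-set Jaccard similarity: |intersection| / |union|, in [0, 1]."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


jaccard("deploy the web service", "deploy the batch service")  # 3 shared / 5 total = 0.6
```

To use it, score each episode with `jaccard(task, ep["task"])` instead of the raw overlap count, keeping the same sort-and-slice logic.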

Memory Retrieval Strategies

Recency + Relevance Scoring

import time


def hybrid_recall(entries: list[dict], query: str,
                  recency_weight: float = 0.3,
                  relevance_weight: float = 0.7,
                  top_k: int = 5) -> list[dict]:
    """Combine semantic relevance with recency for better recall."""
    query_embedding = get_embedding(query)
    now = time.time()

    scored = []
    for entry in entries:
        # Relevance score (cosine similarity)
        relevance = cosine_similarity(query_embedding, entry["embedding"])

        # Recency score (exponential decay, half-life = 1 day)
        age_hours = (now - entry["timestamp"]) / 3600
        recency = 2 ** (-age_hours / 24)

        combined = relevance_weight * relevance + recency_weight * recency
        scored.append({**entry, "score": combined})

    scored.sort(key=lambda x: x["score"], reverse=True)
    return scored[:top_k]
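The recency term is easy to sanity-check in isolation (same 24-hour half-life assumed above):

```python
def recency_score(age_hours: float, half_life_hours: float = 24.0) -> float:
    """Exponential decay: the score halves every half_life_hours."""
    return 2 ** (-age_hours / half_life_hours)


recency_score(0)   # 1.0  (just stored)
recency_score(24)  # 0.5  (one day old)
recency_score(48)  # 0.25 (two days old)
```

Tune the half-life to the task: hours for short-lived working notes, weeks for stable user preferences.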

Injecting Memory into the Agent Prompt

def build_agent_context(task: str, vector_mem: VectorMemory,
                        entity_mem: EntityMemory,
                        episodic_mem: EpisodicMemory) -> str:
    """Assemble relevant memory context for an agent prompt."""
    sections = []

    # Vector recall
    relevant = vector_mem.search(task, top_k=5)
    if relevant:
        facts = "\n".join(f"- {r['text']}" for r in relevant if r["score"] > 0.3)
        if facts:
            sections.append(f"## Relevant Facts\n{facts}")

    # Entity context
    entities = entity_mem.search_entities(task)
    if entities:
        for ent in entities[:3]:
            info = "\n".join(f"  - {k}: {v}" for k, v in ent["facts"].items())
            sections.append(f"## Entity: {ent['name']}\n{info}")

    # Past lessons
    lessons = episodic_mem.get_lessons(task)
    if lessons:
        sections.append(f"## Past Experience\n{lessons}")

    return "\n\n".join(sections)

Use this pattern to inject a ## Context block at the top of the user message or as part of the system prompt. Keep total injected memory under 2000 tokens to leave room for the actual task and agent reasoning.
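To enforce that cap, one approach is to keep whole sections in priority order until the budget runs out (a sketch reusing the rough 4-characters-per-token estimate from earlier):

```python
def cap_memory_context(sections: list[str], token_budget: int = 2000) -> str:
    """Keep whole sections, in priority order, until the token budget is spent."""
    kept, used = [], 0
    for section in sections:
        cost = len(section) // 4  # rough token estimate, as in estimate_tokens
        if used + cost > token_budget:
            break  # sections are priority-ordered, so stop at the first overflow
        kept.append(section)
        used += cost
    return "\n\n".join(kept)
```

Order the sections by importance before calling this (e.g. entity facts first, episodic lessons last), since everything after the first overflow is dropped.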

