
Agent Tool Design Principles

Design the tools that an LLM agent uses. Covers naming, parameter design, return formats, and error handling.

The agent's effectiveness is bounded by the quality of its tools. A great LLM with bad tools is a frustrated user; a moderate LLM with great tools gets things done. Tool design is a leverage point that's easy to underinvest in.

This skill covers the principles that make tools usable by LLMs: clear naming, focused responsibilities, predictable returns, and helpful errors.

Tool Naming

LLMs decide which tool to call partly based on the tool's name. The goal is a name specific enough to be unambiguous and general enough to be reusable.

Good names:

  • search_documents — clear scope, clear action.
  • get_user_profile — clear input (user_id), clear output (profile).
  • send_email — clear action, well-known semantics.

Bad names:

  • do_thing — vague.
  • helper — too general.
  • process — what does it process?
  • api_v2_call — implementation detail leaks into the interface.

Each tool has one responsibility. Don't combine "search and summarize" into one tool; split into search and summarize. The agent decides whether to call summarize after search; that flexibility is part of why agents are useful.
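
For example, a minimal sketch of the split; the names and signatures here are illustrative assumptions, not a fixed API:

# A hypothetical split: two focused tools the agent can compose itself.
def search(query: str, max_results: int = 10) -> list[dict]:
    """Return matching documents with ids and snippets; no summarization."""
    ...

def summarize(document_id: str, max_words: int = 200) -> str:
    """Summarize one document the agent chose from search results."""
    ...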

Tool Description

Each tool has a description shown to the agent. The description is the tool's documentation from the LLM's perspective.

Good description:

search_documents(query: str, max_results: int = 10) -> list[Document]:
    Search the knowledge base for documents matching the query.
    Returns up to max_results documents, each with title, snippet,
    and document_id. Use document_id with get_document() to fetch
    full text. Best for keyword and short-phrase queries.

The description tells the LLM:

  • What the tool does.
  • What inputs it takes.
  • What it returns.
  • When to use it.
  • What follows naturally (the cross-reference to get_document).

Bad description:

search_documents(query) -> list:
    Searches for documents.

No type info. No semantics. No usage guidance. The LLM has to guess.

Parameter Design

Tool parameters are the interface; design them so they're easy for an LLM to use.

Principles:

  • Required vs. optional clearly separated. Required parameters have no defaults; optional ones do.
  • Types as specific as possible. int is better than number; Literal["en", "fr", "de"] is better than str.
  • Descriptive parameter names. query not q. recipient_email not to. max_results not n.
  • Avoid coupled parameters. Don't make start_date and end_date two separate parameters where one without the other is invalid; make a single date_range parameter.
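
A sketch that applies these principles to the search tool from earlier; Pydantic and the exact field names are assumptions for illustration:

from datetime import date
from typing import Literal
from pydantic import BaseModel

class DateRange(BaseModel):
    start: date
    end: date

def search_documents(
    query: str,                                   # descriptive name, not "q"
    language: Literal["en", "fr", "de"] = "en",   # constrained, not bare str
    date_range: DateRange | None = None,          # coupled dates combined into one
    max_results: int = 10,                        # optional parameter has a default
) -> list[dict]: ...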

For complex inputs, prefer structured types over flat lists of parameters:

# good
class MeetingRequest(BaseModel):
    attendees: list[str]
    duration_minutes: int
    earliest_start: datetime
    latest_start: datetime
    title: str
    description: str | None = None

def schedule_meeting(meeting: MeetingRequest) -> MeetingResult: ...

Versus:

# noisier
def schedule_meeting(
    attendees: list[str],
    duration_minutes: int,
    earliest_start: datetime,
    latest_start: datetime,
    title: str,
    description: str | None = None,
) -> MeetingResult: ...

For LLMs, structured types with good field names work well; the LLM fills in a JSON object.

Return Format

Tool returns are part of the agent's context. Format them so they are easy for the LLM to read and for the orchestration framework to process.

Good return:

{
  "success": true,
  "documents": [
    {
      "id": "doc-123",
      "title": "Onboarding Guide",
      "snippet": "Welcome to the team...",
      "score": 0.87
    },
    ...
  ],
  "total_results": 47,
  "showing_top": 10
}

The structure tells the LLM:

  • Did the call succeed?
  • What was returned?
  • Are there more results?

For long results, include pagination cues. The LLM can decide whether to call again with offset=10 to get more.

Bad return: a giant string with all the documents serialized. The LLM has to parse it, and it parses imperfectly.

Token efficiency matters. If the result is 50,000 tokens of search snippets, the LLM struggles. Prefer:

  • Top N results with snippets.
  • A separate get_document(id) for fetching full content.

The agent can call search_documents first, decide which to read, and call get_document only for the relevant ones.
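
A sketch of the paired tools; the backing index is a hypothetical search backend, and the offset parameter is added here to show the pagination cue:

def search_documents(query: str, max_results: int = 10, offset: int = 0) -> dict:
    """Return top-N snippets plus pagination cues, never full text."""
    matches = index.search(query)  # index is a hypothetical search backend
    page = matches[offset : offset + max_results]
    return {
        "success": True,
        "documents": [m.summary() for m in page],  # id, title, snippet only
        "total_results": len(matches),
        "showing_top": len(page),
    }

def get_document(document_id: str) -> dict:
    """Fetch full text for a single document the agent selected."""
    ...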

Error Handling

When tools fail, return errors that help the LLM recover.

Good error:

{
  "success": false,
  "error": {
    "code": "permission_denied",
    "message": "User does not have access to this document",
    "details": {
      "document_id": "doc-123",
      "required_role": "admin",
      "user_role": "viewer"
    },
    "suggestion": "Try a different document or request access from the document owner."
  }
}

The LLM sees:

  • The error category (machine-readable).
  • A human-readable message.
  • Specifics that may help.
  • A suggestion for what to do next.

Bad error:

Exception: PermissionError at line 47 of permissions.py

The LLM can't recover from this. It might retry, see the same error, retry again. With a structured error, the LLM can decide to try a different document or report to the user.
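
One way to guarantee this shape is a wrapper around every tool call; a minimal sketch, with illustrative error codes and suggestion text:

def run_tool(fn, **kwargs) -> dict:
    """Run a tool and return a structured result, never a raw traceback."""
    try:
        return {"success": True, "result": fn(**kwargs)}
    except PermissionError as e:
        return {
            "success": False,
            "error": {
                "code": "permission_denied",
                "message": str(e),
                "suggestion": "Try a different resource or request access.",
            },
        }
    except Exception as e:  # catch-all: keep stack traces out of context
        return {
            "success": False,
            "error": {"code": "internal_error", "message": str(e)},
        }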

Idempotence

Many tool calls should be idempotent. Repeated calls with the same parameters produce the same effect.

Reasons:

  • LLMs sometimes call the same tool twice on the same input (caching mismatch, retry logic, model confusion).
  • Network errors cause retries.
  • Idempotence simplifies error recovery.

For mutation tools (create, update, delete), use idempotency keys:

def send_email(to: str, subject: str, body: str, idempotency_key: str) -> SendResult: ...

The agent generates the key; the underlying service uses it to deduplicate.
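
A minimal sketch of the dedup side, assuming an in-memory cache and a hypothetical _deliver() helper; a real service would persist keys with an expiry:

_sent: dict[str, dict] = {}

def send_email(to: str, subject: str, body: str, idempotency_key: str) -> dict:
    if idempotency_key in _sent:
        return _sent[idempotency_key]  # repeat call: same result, no second send
    result = _deliver(to, subject, body)  # hypothetical delivery call
    _sent[idempotency_key] = result
    return result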

Side Effect Confirmation

For tools with side effects (sending email, charging money, deleting data), the design should support confirmation flows.

Pattern: the tool has a confirm: bool = False parameter. When false, the tool returns a "preview" of what it would do, without executing. When true, it executes.

class EmailResult(BaseModel):
    preview: bool = False
    would_send_to: str | None = None
    would_subject: str | None = None
    estimated_cost: float = 0.0

def send_email(
    to: str,
    subject: str,
    body: str,
    confirm: bool = False,
) -> EmailResult:
    if not confirm:
        # dry run: report what would happen without executing
        return EmailResult(
            preview=True,
            would_send_to=to,
            would_subject=subject,
            estimated_cost=0,
        )
    # actually send
    ...

The orchestrator can call without confirm, show the user, and call again with confirm=True after user approval.
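
Sketched as orchestrator code, with ask_user standing in for whatever approval UI the system has (a hypothetical helper):

draft = send_email(to="a@example.com", subject="Weekly report", body="...", confirm=False)
if ask_user(f"Send '{draft.would_subject}' to {draft.would_send_to}?"):
    sent = send_email(to="a@example.com", subject="Weekly report", body="...", confirm=True)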

Tool Discoverability

For agent systems with many tools (10+), how the agent discovers which tool to use matters.

Strategies:

  • Tool list with descriptions. All tools listed in the prompt. Works for ~10-20 tools; more becomes context-expensive.
  • Tool retrieval. A vector search over tool descriptions retrieves the top relevant ones for the current task. Scales to hundreds of tools.
  • Hierarchical tools. A meta-tool exposes categories; once a category is selected, the specific tools are revealed.

For an MCP-server-style architecture (like SkillDB's), the model is exposed to a curated subset relevant to the current task; the rest are queryable.
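
As a sketch of the tool-retrieval strategy: embed() is a hypothetical embedding call assumed to return unit-normalized vectors, so the dot product is cosine similarity:

import numpy as np

def top_k_tools(task: str, tools: dict[str, str], k: int = 5) -> list[str]:
    """Pick the k tools whose descriptions best match the current task."""
    task_vec = embed(task)
    scored = [(name, float(np.dot(embed(desc), task_vec)))
              for name, desc in tools.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in scored[:k]]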

Testing Tools

Each tool has tests:

  • Unit tests of the tool's logic with mocked dependencies.
  • Integration tests with real services in a test environment.
  • Agent tests where an LLM is given the tool and a task, and the tool's usability is observed.

Agent tests catch usability issues that unit tests don't. If the LLM consistently misuses a parameter, the parameter's name or description is bad. If it never calls the tool when it should, the description doesn't surface the right cues.
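
A sketch of what an agent test can look like; run_agent is a hypothetical harness that hands the model the tool schemas and a task, then records its tool calls:

def test_agent_searches_before_fetching():
    trace = run_agent(
        tools=[search_documents, get_document],
        task="Find the onboarding guide and quote its first section.",
    )
    calls = [c.tool for c in trace.tool_calls]
    assert "search_documents" in calls  # the right tool was discovered
    assert calls.index("search_documents") < calls.index("get_document")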

Iterate on tool design based on agent test results.

Anti-Patterns

Vague tool names. "process," "helper," "do_thing." The LLM doesn't know when to call them.

Unstructured return values. A blob of text the LLM has to parse. Use structured returns.

Errors as exceptions or stack traces. Useless to the LLM. Return structured error objects.

Coupled parameters. Two parameters where neither is valid without the other. Combine them into a structured input.

No idempotency support. LLM retries cause duplicate side effects. Use idempotency keys for mutations.

Tools that try to do everything. "Search and summarize and reply." Split into focused tools; let the agent compose.

No confirmation pattern for side effects. Every send is final. Use preview-then-confirm.
