The Tool Boundary
A language model without tools can only generate text. It can write beautiful prose about reading a file, but it can't actually read the file. It can describe perfectly what API call should be made, but it can't execute that call.
Tools transform text generators into actors in the world. The boundary between "thinking about doing something" and "actually doing something" is the tool boundary. Crossing it is what makes AI agents agents.
This transformation isn't incremental--it's categorical. A model with tools is a fundamentally different system than a model without them. The same underlying model becomes capable of entirely different classes of tasks.
Anatomy of a Tool
Every tool has the same basic structure:
Name: How the agent refers to it. Clear names reduce confusion. read_file beats rf.
Description: What it does. The agent uses this to decide when to call the tool. Descriptions matter more than you'd think--a vague description leads to misuse.
Parameters: What inputs it accepts. A schema defines types, required versus optional fields, and constraints.
Returns: What it outputs. This becomes context for the next agent step.
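Concretely, most frameworks express this anatomy as a JSON-style schema. The sketch below is illustrative rather than tied to any particular framework; the exact field names and shapes vary.

```python
# Illustrative tool definition; the fields mirror the common JSON-Schema style,
# but the exact shape differs between agent frameworks.
read_file_tool = {
    "name": "read_file",                       # how the agent refers to it
    "description": (
        "Read a UTF-8 text file and return its contents. "
        "Use this when you need to inspect a file before editing it."
    ),
    "parameters": {                            # input schema: types and constraints
        "type": "object",
        "properties": {
            "path": {"type": "string", "description": "Path to the file"},
            "max_bytes": {"type": "integer", "minimum": 1},
        },
        "required": ["path"],
    },
    # Returns: the file contents as a string, which becomes context
    # for the agent's next step.
}
```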
The agent doesn't execute tools--it generates tool calls. The orchestration system receives those calls, executes the actual operations, and returns results. This separation is crucial for security and reliability.
When the agent decides to use a tool, it's predicting that this tool call with these parameters would be helpful. Whether it actually is helpful depends on whether the execution succeeds and whether the agent can interpret the results.
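A minimal sketch of that separation, with a hypothetical handler registry standing in for a real orchestration system:

```python
import json

# Hypothetical registry mapping tool names to the functions that do the real work.
HANDLERS = {
    "read_file": lambda path: open(path, encoding="utf-8").read(),
}

def execute_tool_call(tool_call: dict) -> dict:
    """The orchestrator, not the model, performs the operation and returns a result."""
    name = tool_call["name"]
    args = tool_call.get("arguments", {})
    if name not in HANDLERS:
        return {"success": False, "result": None, "error": f"Unknown tool: {name}"}
    try:
        return {"success": True, "result": HANDLERS[name](**args), "error": None}
    except Exception as exc:  # surface the failure as context for the agent's next step
        return {"success": False, "result": None, "error": str(exc)}

# The model only *generates* a call like this; it never touches the filesystem itself.
call = {"name": "read_file", "arguments": {"path": "notes.txt"}}
print(json.dumps(execute_tool_call(call)))
```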
Tool Design Principles
Building effective tools is an interface design problem. The consumer is an AI model that reads descriptions and parameter schemas to understand usage.
Clarity over brevity: Tool descriptions should explain not just what the tool does but when to use it and what results to expect. An extra sentence in the description is cheaper than the failed calls it prevents.
Constrained action spaces: Offer specific operations rather than general interfaces. delete_file(path) is clearer than file_operation(action, path). The agent doesn't need to reason about which action to choose.
Meaningful errors: When operations fail, return errors that explain why and suggest recovery. "Permission denied: file is read-only" beats "Error 403".
Predictable outputs: Structure results consistently. If every tool returns JSON with success, result, and error fields, the agent learns the pattern quickly.
Idempotent when possible: Operations that can be safely retried are easier to use in agentic loops. If the agent calls the same tool twice by mistake, nothing breaks.
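A sketch of a single tool applying several of these principles at once: a constrained operation, a predictable structured result, a meaningful error, and idempotent behavior. The function and result shape are illustrative.

```python
import os

def delete_file(path: str) -> dict:
    """Delete a single file. Safe to retry: deleting a missing file is a no-op."""
    if not os.path.exists(path):
        # Idempotent: a repeat call after success does not turn into an error.
        return {"success": True, "result": f"{path} already absent", "error": None}
    try:
        os.remove(path)
        return {"success": True, "result": f"Deleted {path}", "error": None}
    except PermissionError:
        # Meaningful error: explain why and hint at recovery.
        return {"success": False, "result": None,
                "error": f"Permission denied: {path} is read-only or protected"}
```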
The Model Context Protocol (MCP)
MCP standardizes how tools are exposed to AI agents. Instead of every application defining its own tool format, MCP provides a common interface.
An MCP server exposes tools with standard schemas. An MCP client (like an AI agent framework) can connect to any server and discover what tools are available. The agent gets a unified view of capabilities regardless of where they're implemented.
This matters for composability. A knowledge graph server, a filesystem server, and an API gateway server can all expose tools through MCP. The agent sees them as a single coherent toolkit.
MCP also standardizes resources (data the agent can read) and prompts (templates the agent can use). The pattern is consistent: expose capabilities through a standard interface so agents can discover and use them without custom integration.
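As a concrete illustration, a minimal server might look like the sketch below, which assumes the official Python MCP SDK's FastMCP helper; exact imports and decorators may differ across SDK versions.

```python
# Sketch assuming the Python MCP SDK's FastMCP helper; check the SDK docs
# for the exact API in your version.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("docs-tools")  # server name is illustrative

@mcp.tool()
def read_file(path: str) -> str:
    """Read a UTF-8 text file and return its contents."""
    with open(path, encoding="utf-8") as f:
        return f.read()

if __name__ == "__main__":
    mcp.run()  # serve over stdio so any MCP client can discover and call read_file
```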
Tool Categories
Different tools serve different purposes in agentic workflows:
Read tools: Access information without changing state. File reading, database queries, API lookups. Low risk--the worst case is getting wrong information.
Write tools: Modify state. File writing, database updates, API calls with side effects. Higher risk--mistakes can't always be undone.
Execution tools: Run code or commands. Shell access, script execution, container operations. Highest risk--the agent can affect the entire system.
Search tools: Find relevant content. Semantic search, keyword search, filtering. Essential for retrieval-augmented workflows.
Communication tools: Interact with external services. Email, messaging, API integrations. May have rate limits, authentication requirements, or cost implications.
Production systems typically tier access to tools: read tools are available freely, write tools require confirmation, and execution tools are restricted to specific contexts or require human approval.
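One way to express that tiering is a policy table consulted before every call. The tier names and tool assignments below are illustrative.

```python
from enum import Enum

class Tier(Enum):
    READ = "read"          # free to call
    WRITE = "write"        # requires confirmation
    EXECUTE = "execute"    # requires human approval

# Illustrative policy table; real systems often load this from configuration.
TOOL_TIERS = {
    "read_file": Tier.READ,
    "search_docs": Tier.READ,
    "write_file": Tier.WRITE,
    "run_shell": Tier.EXECUTE,
}

def requires_approval(tool_name: str) -> bool:
    """Gate risky tools behind confirmation or human approval."""
    tier = TOOL_TIERS.get(tool_name, Tier.EXECUTE)  # unknown tools treated as riskiest
    return tier is not Tier.READ
```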
Failure Modes in Tool Use
Tools introduce specific failure patterns:
Wrong tool selection: The agent picks a tool that doesn't fit the task. Usually caused by ambiguous descriptions or overlapping capabilities.
Wrong parameters: The agent calls the right tool with incorrect inputs. Type mismatches, invalid values, missing required fields.
Misinterpreted results: The tool returns correctly but the agent misunderstands the output. Often happens with complex structured data.
Side effect blindness: The agent doesn't anticipate consequences of tool calls. Deleting a file without checking if anything depends on it.
Rate limit violations: The agent makes too many calls too quickly. External APIs enforce limits that the agent doesn't model.
Cost explosion: Some tools have costs per call. Without limits, an agent can incur significant charges through repeated operations.
Mitigation requires both tool design and agent training. Better descriptions reduce wrong selection. Schema validation catches wrong parameters. Structured outputs reduce misinterpretation. Explicit limits prevent runaway costs.
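A sketch of two of those mitigations: validating arguments against the tool's parameter schema before execution (here using the third-party jsonschema package) and enforcing a hard cap on calls per session. The budget and field names are illustrative.

```python
from jsonschema import ValidationError, validate

MAX_CALLS_PER_SESSION = 50       # illustrative budget
calls_made = 0

def guarded_call(tool: dict, handler, arguments: dict) -> dict:
    """Validate parameters and enforce a call budget before executing."""
    global calls_made
    if calls_made >= MAX_CALLS_PER_SESSION:
        return {"success": False, "error": "Call budget exhausted for this session"}
    try:
        validate(instance=arguments, schema=tool["parameters"])
    except ValidationError as exc:
        # Caught before execution: wrong types, invalid values, missing required fields.
        return {"success": False, "error": f"Invalid arguments: {exc.message}"}
    calls_made += 1
    return {"success": True, "result": handler(**arguments)}
```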
Capability Expansion Patterns
Tools enable several patterns for extending what agents can accomplish:
Grounding: Tools connect language to reality. Instead of generating plausible-sounding code, the agent can execute it and observe whether it works.
Verification: Tool outputs provide feedback. The agent proposes a hypothesis, a tool tests it, and the results inform the next step.
Augmentation: Tools provide capabilities the model lacks. A calculator tool handles arithmetic precisely. A search tool accesses current information.
Delegation: Tools can invoke other agents. A "run_specialist" tool hands off to a specialized system and returns results.
State management: Tools can persist and retrieve information beyond the context window. This is how agents maintain memory across sessions.
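A sketch of grounding and verification together: rather than trusting generated code, execute it and feed the observed result back as context for the next step. The run_python tool here is a hypothetical example, not a standard API.

```python
import subprocess
import sys

def run_python(code: str, timeout: int = 10) -> dict:
    """Ground a proposal in reality: execute it and report what actually happened."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=timeout,
    )
    return {
        "success": proc.returncode == 0,
        "stdout": proc.stdout,
        "stderr": proc.stderr,   # feedback the agent uses to revise its next attempt
    }

# Verification in miniature: the observed output, not the model's guess, decides.
print(run_python("print(sum(range(10)))"))   # {'success': True, 'stdout': '45\n', ...}
```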
The most powerful agentic systems combine these patterns. Grounding ensures relevance to reality. Verification enables learning within the session. Augmentation extends base capabilities. Delegation enables scale. State management enables continuity.
The Composability Advantage
Individual tools are useful. Compositions of tools are transformative.
Consider a workflow: search for relevant documents, extract key information, write a summary, save it to a file, and send a notification. Five tools, each simple, combined into a capability that none could provide alone.
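In code, that composition is just sequencing. The tool functions below are hypothetical, trivially implemented stand-ins; the point is how five small operations chain into one capability.

```python
# Hypothetical stand-ins for real tool implementations, kept trivial so the
# composition itself is the point.
def search_documents(query: str) -> list[str]:
    return [f"doc about {query}"]

def extract_key_points(docs: list[str]) -> list[str]:
    return [d.upper() for d in docs]

def write_summary(points: list[str]) -> str:
    return "Summary: " + "; ".join(points)

def save_to_file(path: str, text: str) -> str:
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    return path

def send_notification(message: str) -> None:
    print(f"[notify] {message}")

# Five simple tools, one capability none of them provides alone.
summary = write_summary(extract_key_points(search_documents("tool design")))
send_notification(f"Report ready: {save_to_file('report.md', summary)}")
```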
The agent becomes an orchestrator of tools rather than a direct implementer. Its job is determining which tools to call in what order with what parameters. The actual work happens in tool implementations.
This is why tool ecosystems matter more than individual tool quality. A rich ecosystem of composable tools enables workflows that no one tool designer anticipated. The agent discovers useful compositions through exploration and instruction.
Design for Human-AI Collaboration
Tools mediate between human intent and machine execution. Designing them well requires considering both sides.
Human-readable outputs: Even if the primary consumer is an agent, make outputs interpretable by humans reviewing what happened.
Confirmation hooks: For risky operations, tools should support human-in-the-loop confirmation. The agent requests an action; a human approves before execution.
Audit trails: Log what tools did, when, with what parameters, and what results. Essential for debugging and accountability.
Graceful limits: When rate limits or budgets are exceeded, fail clearly rather than silently. The agent needs to know why it can't proceed.
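A sketch combining a confirmation hook with an audit trail as a wrapper around tool execution; the prompt text and log format are illustrative.

```python
import json
import logging
import time

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("tool_audit")

def call_with_oversight(name: str, handler, arguments: dict, risky: bool = False) -> dict:
    """Optionally ask a human before executing, and record everything that happened."""
    if risky:
        answer = input(f"Allow {name}({json.dumps(arguments)})? [y/N] ")
        if answer.strip().lower() != "y":
            audit_log.info("DENIED %s %s", name, json.dumps(arguments))
            return {"success": False, "error": "Denied by human reviewer"}
    started = time.time()
    result = handler(**arguments)
    audit_log.info("CALLED %s %s -> %r (%.2fs)",
                   name, json.dumps(arguments), result, time.time() - started)
    return {"success": True, "result": result}
```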
The best tool designs enable autonomous operation while maintaining human oversight. Full automation of simple tasks. Assisted automation with confirmation for consequential ones. Clear visibility into what's happening at all times.
Related: A4 explores how tools integrate with memory systems. A5 covers recovery when tool calls fail.