MCP—A Better Way?
Why Natural Language is the True Native Interface for AI Agents
The adoption of the Model Context Protocol (MCP) has exploded. It is the industry darling of the moment, offering a standardized way to connect AI models to data and tools. Yet in our collective haste to build agentic systems, we seem to have bypassed a fundamental question: "Is MCP actually the best interface?"
In this post, I argue that MCP, and the broader paradigm of "tool calling," may not be the right fit for AI agents. While the industry rushes to standardize around rigid protocols, we are overlooking the medium where Large Language Models (LLMs) truly thrive. The ideal interface for interacting with other agents and systems isn't a rigid API; it is the natural language chat interface itself.
The Problematic Legacy of Contractual Interfaces
MCP is a tool interface rooted in the "pre-AI" era of software engineering. It relies on well-defined contracts that strictly define every aspect of the input and output. The mindset is the legacy one of saying, "Here is a tool; call it a drill," and describing its function solely as "making holes." We then rely on making every potential user aware of this strict definition and of the exact input/output parameters required to call the tool.
The core issue lies in this contract. The code that wants to use the drill, in this case an AI agent, must implement the machinery to consume the drill contract perfectly. It must:
- Match the exact parameters (inputs and outputs).
- Reference the capability by its precise name and description.
- Act as a brittle "go-between" for the LLM/agent and the tool.
All this work is performed in code, upfront, before the drill is ever used. We write hundreds of lines of code to implement a brittle and rigid contract just so the AI agent can make a hole. This forces the agent to take on the added complexity of controlling the drill via a strict API, when in reality, the AI agent only requires a specific outcome.
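To make this concrete, here is a minimal sketch of what such an upfront contract looks like. The drill tool, its field values, and the glue function are all hypothetical; the shape follows the JSON-schema style of MCP tool definitions.

```python
# Hypothetical MCP-style contract for the "drill" tool. Every field is
# fixed upfront, and any deviation by the model breaks the call.
DRILL_TOOL = {
    "name": "drill",
    "description": "Makes holes.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "diameter_mm": {"type": "number", "description": "Hole diameter in mm"},
            "depth_mm": {"type": "number", "description": "Hole depth in mm"},
            "material": {"type": "string", "enum": ["wood", "metal", "drywall"]},
        },
        "required": ["diameter_mm", "depth_mm", "material"],
    },
}

def call_drill(arguments: dict) -> dict:
    """The brittle go-between: validate the model's arguments against the
    contract and reject anything that deviates before the tool ever runs."""
    for field in DRILL_TOOL["inputSchema"]["required"]:
        if field not in arguments:
            raise ValueError(f"Contract violation: missing '{field}'")
    # ...invoke the actual drill here, then return a result in the agreed shape.
    return {"status": "ok", "hole": arguments}
```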
This legacy approach forces developers to painstakingly format inputs and outputs. We try to parse the agent's conversation into something the code understands, usually by hunting for specific phrases or words, and we ultimately rely on hope: hope that the LLM happens to phrase things in a format that meets these rigid requirements. We go as far as setting special flags on the LLM, or spending thousands of dollars fine-tuning models on structured inputs and outputs, all in the expectation that a non-deterministic model will reliably produce output a deterministic tool can parse.
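A hedged sketch of what this hope-based parsing tends to look like in practice; the reply text and the regex are invented for illustration:

```python
import re

# A free-form model reply that almost, but not quite, fits our expectations.
reply = "Sure! I'd go with roughly a 4-inch hole, maybe 4.5 to be safe."

# Hunt for the one phrase the code happens to expect...
match = re.search(r"(\d+(?:\.\d+)?)[-\s]*inch", reply)
if match:
    diameter_in = float(match.group(1))  # grabs 4, silently drops "maybe 4.5 to be safe"
else:
    diameter_in = None  # the conversation drifted from the expected phrasing; the pipeline stalls
```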
This raises an obvious question: doesn't the agent already have an interface built on natural language rather than a rigid contract? Could we simply use the agent's natural language capabilities and the existing chat interface?
The "Pre-AI" vs. "Post-AI" Interface
MCP misses the point of AI. It stems from a world of thinking centered on well-defined, rigid contracts. That style of implementation belongs to the pre-AI days, when tools were wired into "dumb," static processes. The problem is that pre-AI applications require structure, while AI works best in conversational, instruction-driven environments.
Post-AI systems require a different approach. The interface must be an intelligent, conversational one, reflecting how AI actually operates. An AI agent is, at its heart, a chat interface. Its content cannot be counted on to be structured, because it is modeled on human conversation. In practice, relying on forced structure and strict content formats has proven difficult and unreliable.
Yes, models have modes that provide structure (like JSON mode), but using them feels forced because the chat interface is the model's natural way of operating. Even if you manage to get the model to produce a structured output, you still need to request the content in the specific format you require. The recurring theme here is that we are demanding formatting and structure simply because that is how our old pre-AI applications operated, not because it is what the AI needs.
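As one concrete illustration, here is a sketch of that double ask using OpenAI-style JSON mode; the model choice and prompt are assumptions, and any provider's structured-output mode shows the same pattern: the mode constrains the syntax, while the prompt still has to spell out the exact fields.

```python
from openai import OpenAI

# JSON mode guarantees syntactically valid JSON, but the prompt must still
# beg for the specific keys our downstream code expects.
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",  # assumed model choice
    response_format={"type": "json_object"},
    messages=[{
        "role": "user",
        "content": (
            "Extract the hole request. Respond in JSON with exactly these keys: "
            "diameter_mm (number), depth_mm (number), material (string). "
            "Request: I need a 4-inch hole in some drywall."
        ),
    }],
)
print(response.choices[0].message.content)  # structured, but only by coercion
```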
A Better Way: The Capability Agent
But what if the tool was not a static API but an agent with a "hole-making capability"? How would this look?
Rather than publishing an upfront contract as the MCP tool does, the AI agent with the hole-making capability would simply listen to the agents' shared group chat, watching for requests from agents seeking assistance.
The Dialogue Approach: The Capability Agent listens for a request like, "I need a hole." It then responds through its natural language interface:
- Capability Agent: "I can help you make a hole. Can you describe the size of the hole you need?"
- Requesting Agent: "I need a 4-inch hole."
- Capability Agent: "I have converted this to metric and will be making a 101.6 mm hole, which is 4 inches. Is that OK?"
- Requesting Agent: "Yes."
We should use a chat interface where two systems engage in a conversation to complete a task. This resolves the legacy issue of brittle contracts and provides an interface that is native to AI.
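Here is a minimal sketch of what such a Capability Agent could look like in code, assuming a hypothetical shared group chat object and an LLM wrapped as a callable; every name is illustrative, since the point is that the interface is messages, not schemas.

```python
# Every name here is hypothetical. Note there is no schema anywhere: the
# "contract" lives in the system prompt, in plain language.
class HoleCapabilityAgent:
    SYSTEM_PROMPT = (
        "You can make holes. When another agent asks for a hole, ask "
        "clarifying questions in plain language until you know the size, "
        "confirm your understanding, then perform the task."
    )

    def __init__(self, chat, llm):
        self.chat = chat  # a shared group chat the agent listens to
        self.llm = llm    # any chat-completion model, wrapped as a callable

    def on_message(self, message: str) -> None:
        # No schema and no parameter order: the model itself decides whether
        # this request concerns it and what to say next.
        reply = self.llm(system=self.SYSTEM_PROMPT,
                         history=self.chat.history,
                         user=message)
        self.chat.post(sender="hole-agent", text=reply)
```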
The Screenshot Scenario: Consider a more complex case where a tool agent is able to take a screenshot. Instead of using an MCP-type interface, the AI agent requesting the action simply says in the chat, "I need a screenshot of the operating system I am connected to."
In the conversation group, a "Screenshot Capability Agent" responds: "I can help you with that. Can you tell me more about what you want? I need the following from you: a name for the captured screenshot and a description. Furthermore, I can only create and deliver the screenshot as a JPG or as a base64-encoded image. Here is a reference for this task: [Reference ID]."
The agent needing the screenshot replies, "That works well for me; go ahead. Name it 'debug_01,' the description is 'error state,' and use JPG format." The Capability Agent then captures the screenshot and replies, "Here is the screenshot: [data]. Please let me know when we are finished. If I don't hear back from you in five minutes, I will close this task and the conversation."
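Under the hood, the Capability Agent needs little more than a task record and a deadline. A sketch, with hypothetical names and message shapes:

```python
import time

# Hypothetical task record for the screenshot exchange, including the
# five-minute timeout the Capability Agent announces above.
TASK_TIMEOUT_S = 5 * 60

task = {
    "reference_id": "ref-1234",    # stands in for the [Reference ID] above
    "opened_at": time.monotonic(),
    "gathered": {},                # filled in piece by piece through conversation
    "needed": ["name", "description", "format"],
}

def still_missing(task):
    """What the Capability Agent still has to ask for in its next message."""
    return [need for need in task["needed"] if need not in task["gathered"]]

def is_expired(task):
    """Close the task and the conversation if the requester goes quiet."""
    return time.monotonic() - task["opened_at"] > TASK_TIMEOUT_S
```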
Natural Negotiation Over Structure
The chat interface is the primary means by which an AI agent communicates and comprehends information. Using any other method, such as rigid API calls, works against the nature of the AI agent and hurts its performance. Rather than creating a "tool," we must create an AI agent with capabilities.
I advocate grouping capabilities into a single agent. For example, you might have a "Terraform Agent." It isn't a tool; it is an entity able to hold a conversation with you or with another agent. It can handle chat-style requests without any fixed structure. Inputs can arrive in any order, and if the agent is unsure, it will simply say so.
Through conversation, the Requesting Agent and the Capability Agent engage in a dialogue until they fully understand the request and have captured every requirement; only then does the Capability Agent perform the task. Both parties negotiate through conversation to arrive at a consensus.
The Capability Agent knows what data it needs to perform the task. It converts those needs into natural language requests, asks the requester to supply them, and keeps working with the requester until it has everything. It negotiates the returned information the same way: the format, and any gap between what the requester needs and what the Capability Agent can provide.
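A sketch of that negotiation loop, with `ask_in_chat` and `perform_task` standing in for hypothetical LLM-backed helpers:

```python
# `ask_in_chat` turns unmet needs into a plain-language question and returns
# whatever requirements the requester's answer supplied, as a dict. Both
# helpers are hypothetical stand-ins for LLM-backed functions.
def negotiate_and_run(requirements, ask_in_chat, perform_task):
    # `requirements` maps each need to a value, or None if not yet captured.
    while missing := [name for name, value in requirements.items() if value is None]:
        # e.g. "Can you tell me the diameter, material?"
        answer = ask_in_chat(f"Can you tell me the {', '.join(missing)}?")
        requirements.update(answer)  # answers may arrive in any order, and that's fine
    # Consensus reached: both sides agree on the full request.
    return perform_task(requirements)
```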
This approach is natural, and it is the way AI agents want to work. It emphasizes communication instead of adhering to a rigid structure. Being "out of order" is natural in a conversation; the agents will figure it out as they go, self-correcting along the way.
Conclusion
The result of this shift is a natural way for a Capability AI Agent to perform a task and for a Requesting Agent to have a task performed, all through natural conversation, playing to the strengths of LLM-based AI agents.
Contrast this with the tool-call world, where the tool fails to leverage the true strengths of the LLM. In that world, software developers and model developers expend massive effort trying to fit outdated, structured thinking into a landscape dominated by unstructured chat. We are forcing out-of-order inputs into strict boxes with no guarantee that the resulting text will match what was originally requested.
We need to stop building for the past and start building Capability Agents for the future.
