The Model Context Protocol (MCP) changed how AI agents interact with tools. Instead of every agent team building custom integrations for Slack, GitHub, databases, and APIs, MCP provides a standard interface: agents speak MCP, tools expose MCP servers, and everyone connects.
Then reality set in. One agent connecting to one MCP server is a demo. Fifty agents connecting to twenty MCP servers across five teams is production. And production needs a gateway.
There's no shortage of MCP gateway guides out there — Docker, Traefik, Composio, and others have published their takes. They cover the fundamentals well: centralized routing, auth translation, tool aggregation. But they all stop at the same point: getting traffic from agents to tools.
This guide goes further. We'll cover the standard gateway architecture, then address the layer that determines whether your MCP deployment stays financially viable: economic governance.
What Is an MCP Gateway?
An MCP gateway sits between AI agents and MCP servers. Instead of each agent maintaining direct connections to every tool server, agents connect to the gateway, and the gateway manages upstream connections.
# Without a gateway:
Agent A → MCP Server (GitHub)
Agent A → MCP Server (Slack)
Agent A → MCP Server (Database)
Agent B → MCP Server (GitHub)
Agent B → MCP Server (Slack)
Agent B → MCP Server (Database)
# 6 connections, each configured separately
# With a gateway:
Agent A → MCP Gateway → MCP Server (GitHub)
Agent B → MCP Gateway → MCP Server (Slack)
                      → MCP Server (Database)
# 2 agent connections, gateway manages the rest

This centralization solves three immediate problems:
- Configuration sprawl. Without a gateway, each agent needs credentials and connection details for every tool. With a gateway, agents authenticate once.
- Auth translation. MCP servers often need specific credentials (OAuth tokens, API keys, service accounts). The gateway handles credential management so agents don't carry sensitive tokens.
- Tool discovery. The gateway aggregates tool definitions from all upstream servers, presenting agents with a unified catalog of available capabilities.
If you've worked with API gateways (Kong, Envoy, Traefik), MCP gateways serve an analogous role for the MCP protocol. The difference is what flows through them: not arbitrary HTTP requests, but JSON-RPC messages carrying tool calls with structured inputs and outputs.
MCP Gateway Architecture: The Standard Stack
Most MCP gateway implementations share a common architecture with four layers:
Layer 1: Transport
MCP defines two standard transports: stdio (local processes) and Streamable HTTP. The older HTTP+SSE transport (Server-Sent Events) was superseded by Streamable HTTP in the 2025-03-26 revision of the specification, but many deployed servers and clients still speak it. A gateway typically accepts connections via Streamable HTTP or SSE on the client side, and connects to upstream servers using whatever transport they support.
# Gateway transport configuration
gateway:
  listen:
    transport: streamable-http
    port: 8080
    path: /mcp
  upstreams:
    - name: github
      transport: stdio
      command: "npx @modelcontextprotocol/server-github"
    - name: database
      transport: sse
      url: "https://internal.corp/mcp/database"
    - name: slack
      transport: streamable-http
      url: "https://internal.corp/mcp/slack"

Transport bridging is table stakes. Every gateway handles this. The key decision is which client-facing transport to expose: Streamable HTTP is the recommended choice for new deployments, with SSE as a fallback for older clients.
Layer 2: Authentication & Authorization
The gateway becomes your authentication boundary. Agents authenticate to the gateway; the gateway authenticates to upstream servers. This is where most guides spend their time, and for good reason — getting auth wrong means either agents can't connect or agents can access everything.
# Standard auth: API keys or OAuth
gateway:
  auth:
    type: bearer
    validate: https://auth.corp/validate
  # Per-upstream credentials (managed by gateway)
  upstreams:
    github:
      auth:
        type: token
        secret: GITHUB_PAT  # From vault
    database:
      auth:
        type: service-account
        credentials: /etc/creds/db.json

Standard auth answers one question: is this agent allowed to connect? Binary. Yes or no. We'll come back to why this isn't sufficient.
Layer 3: Tool Aggregation & Filtering
When an agent connects to the gateway, it calls tools/list to discover available tools. The gateway aggregates tool definitions from all upstream servers, optionally filtering based on the agent's role or permissions.
# Tool filtering by agent role
policies:
  research-agent:
    allowed_tools:
      - github.search_code
      - github.get_file
      - database.query  # read-only
    denied_tools:
      - github.create_issue
      - database.execute  # no writes
  admin-agent:
    allowed_tools: ["*"]  # full access

Tool filtering prevents agents from seeing or calling tools they shouldn't access. It's the authorization complement to authentication, determining not just who can connect but what they can do.
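To make the aggregation and filtering step concrete, here is a minimal Python sketch. It assumes the gateway namespaces tools as server.tool and treats policy entries as glob patterns where deny wins over allow; the function names and the upstream inventory are hypothetical, not any particular gateway's API.

```python
from fnmatch import fnmatchcase

# Hypothetical upstream inventory: server name -> tool names it exposes.
UPSTREAMS = {
    "github": ["search_code", "get_file", "create_issue"],
    "database": ["query", "execute"],
}

def aggregate_tools(upstreams):
    """Merge every server's tools into one catalog, namespaced as server.tool."""
    return sorted(
        f"{server}.{tool}"
        for server, tools in upstreams.items()
        for tool in tools
    )

def visible_tools(catalog, allowed, denied=()):
    """Return tools that match an allow pattern and no deny pattern (deny wins)."""
    return [
        tool
        for tool in catalog
        if any(fnmatchcase(tool, pattern) for pattern in allowed)
        and not any(fnmatchcase(tool, pattern) for pattern in denied)
    ]

CATALOG = aggregate_tools(UPSTREAMS)
research_view = visible_tools(
    CATALOG,
    allowed=["github.search_code", "github.get_file", "database.query"],
    denied=["github.create_issue", "database.execute"],
)
admin_view = visible_tools(CATALOG, allowed=["*"])
print(research_view)
print(admin_view)
```

The same filter should run twice in a real gateway: once when answering tools/list, and again on tools/call, since an agent can always try to call a tool it was never shown.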
Layer 4: Observability
The gateway is the natural place to instrument MCP traffic. Every tool call passes through it, so you get a complete audit log without modifying agents or servers.
# Structured log entry for every tool call
{
  "timestamp": "2026-03-24T14:00:00Z",
  "agent_id": "research-agent-v3",
  "tool": "github.search_code",
  "input": {"query": "authentication handler", "repo": "org/api"},
  "duration_ms": 342,
  "status": "success",
  "tokens_used": 1250
}

Structured logging, metrics export (Prometheus, DataDog), and trace correlation are standard gateway capabilities. They tell you what happened. Which brings us to the gap.
The Gap: What Standard MCP Gateways Miss
If you follow Docker's MCP gateway guide, or Traefik's, or Composio's, you'll end up with a working gateway that routes traffic, handles auth, aggregates tools, and logs everything. That's genuinely useful.
It's also incomplete in a way that won't be obvious until the first cost incident.
Here's the scenario: A research agent connects to your MCP gateway. It has access to a code search tool (fast, cheap) and a code analysis tool (slow, expensive — it invokes an LLM under the hood). The agent is tasked with reviewing a large codebase. It calls the analysis tool 800 times in two hours.
Your gateway logged every call. Your metrics show a spike. Your alert fires. But the damage is done — $2,400 in compute costs, triggered by a single agent with a poorly constrained objective.
The standard gateway stack had four opportunities to prevent this. It used zero of them:
- Authentication confirmed the agent was valid. It didn't check whether the agent could afford 800 expensive tool calls.
- Authorization confirmed the agent was allowed to use the analysis tool. It didn't limit how much the agent could spend on it.
- Observability recorded every call. It didn't stop any of them.
- Rate limiting (if configured) counted requests per window. It didn't know that some requests cost $0.01 and others cost $3.00.
This is the economic governance gap. It's not a hypothetical — it's the reason teams who deploy MCP at scale inevitably add a cost control layer, either proactively or after the first surprise bill.
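The arithmetic behind the scenario is worth spelling out. The sketch below uses the figures from the incident (800 calls, $3.00 each, two hours) and two limits that are purely hypothetical: a 10 requests-per-minute rate limit and a $100 budget. It also assumes the calls are spread evenly over the window.

```python
# Figures from the scenario above, kept in integer cents to avoid float drift.
CALLS = 800
COST_PER_CALL_CENTS = 300      # $3.00 per analysis call
WINDOW_MINUTES = 120           # two hours

# Hypothetical limits, chosen only for illustration.
RATE_LIMIT_PER_MINUTE = 10
BUDGET_CENTS = 10_000          # $100.00

total_cents = CALLS * COST_PER_CALL_CENTS
calls_per_minute = CALLS / WINDOW_MINUTES
rate_limit_trips = calls_per_minute > RATE_LIMIT_PER_MINUTE
calls_within_budget = BUDGET_CENTS // COST_PER_CALL_CENTS

print(f"total spend: ${total_cents / 100:,.2f}")
print(f"average rate: {calls_per_minute:.2f} calls/min, limiter trips: {rate_limit_trips}")
print(f"a ${BUDGET_CENTS / 100:,.2f} budget stops the agent after {calls_within_budget} calls")
```

Under these assumptions the agent averages fewer than 7 calls per minute, so the request-count limiter never trips, while a cost-weighted budget would have stopped it after a few dozen calls.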
Layer 5: Economic Governance
Economic governance adds three capabilities to your MCP gateway that the standard four layers don't provide:
1. Per-Tool Cost Modeling
Every tool in your MCP catalog has an economic weight. A search_code call that hits a local index costs virtually nothing. A generate_analysis call that invokes Claude costs real money. The gateway needs to know the difference.
# Cost model for MCP tools
tools:
  github.search_code:
    cost: 1 credit  # ~$0.01
  github.get_file:
    cost: 1 credit
  analysis.review_code:
    cost: 50 credits  # ~$0.50 (invokes LLM)
  analysis.generate_report:
    cost: 200 credits  # ~$2.00 (long-form generation)
  database.query:
    cost: 2 credits
  slack.send_message:
    cost: 1 credit

With cost modeling, rate limiting becomes budget limiting. An agent with 500 credits can make 500 searches, or 10 code reviews, or 2 report generations. The agent allocates. The gateway enforces.
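As a quick check of those numbers, here is a small Python sketch that treats the cost model as a lookup table. The helper names are hypothetical; only the credit values come from the cost model above.

```python
# Credit costs from the cost model above.
TOOL_COSTS = {
    "github.search_code": 1,
    "github.get_file": 1,
    "analysis.review_code": 50,
    "analysis.generate_report": 200,
    "database.query": 2,
    "slack.send_message": 1,
}

def affordable_calls(budget, tool):
    """How many calls to `tool` fit in `budget` credits."""
    return budget // TOOL_COSTS[tool]

def plan_cost(plan):
    """Total credits for a plan given as {tool: number_of_calls}."""
    return sum(TOOL_COSTS[tool] * count for tool, count in plan.items())

# A mixed plan that spends exactly 500 credits: 100 + 4*50 + 1*200.
mixed_plan = {
    "github.search_code": 100,
    "analysis.review_code": 4,
    "analysis.generate_report": 1,
}
print(affordable_calls(500, "analysis.review_code"))
print(plan_cost(mixed_plan))
```

The mixed plan is the interesting case: the agent is free to trade cheap searches against expensive reviews, as long as the total stays inside its allocation.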
2. Budget-Aware Tokens
Standard bearer tokens say "this agent is authenticated." Budget-aware tokens say "this agent is authenticated and has 1,000 credits remaining." The token itself carries the economic context.
SatGate implements this with macaroon tokens — a cryptographic credential format designed at Google that supports embedded caveats. A macaroon can encode:
- Total budget allocation
- Expiration time
- Allowed tools (or tool categories)
- Delegation chain (which parent minted this token)
# Mint a budget-aware token for an agent
satgate mint \
  --budget 1000 \
  --holder "research-agent" \
  --tools "github.search_code,github.get_file,analysis.review_code" \
  --expires 24h
# The resulting macaroon encodes all constraints
# No server-side session needed; it's self-contained
# The gateway validates the token on every tool call

The critical property: macaroons support attenuation. A parent token can mint child tokens with fewer permissions, never more. An orchestrator with 10,000 credits can delegate 2,000 to a research sub-agent. That sub-agent can delegate 500 to a search specialist. The total never exceeds the parent. Authority flows downward and diminishes, which is exactly the pattern multi-agent architectures need.
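Attenuation is easier to trust once you see the mechanism. The sketch below is a toy version of the macaroon construction: each caveat is folded into the signature with a chained HMAC, so a holder can add restrictions without the root key but cannot remove them. The dictionary layout, the caveat syntax ("budget <= N"), and the function names are assumptions made for illustration; this is not SatGate's token format or the macaroon wire format.

```python
import hashlib
import hmac

def _mac(key: bytes, msg: bytes) -> bytes:
    return hmac.new(key, msg, hashlib.sha256).digest()

def mint(root_key: bytes, identifier: str) -> dict:
    """Create a token with no caveats, signed with the gateway's root key."""
    return {"id": identifier, "caveats": [], "sig": _mac(root_key, identifier.encode())}

def attenuate(token: dict, caveat: str) -> dict:
    """Append a caveat. Needs only the current signature, not the root key."""
    return {
        "id": token["id"],
        "caveats": token["caveats"] + [caveat],
        "sig": _mac(token["sig"], caveat.encode()),
    }

def verify(root_key: bytes, token: dict) -> bool:
    """Recompute the HMAC chain from the root key and compare signatures."""
    sig = _mac(root_key, token["id"].encode())
    for caveat in token["caveats"]:
        sig = _mac(sig, caveat.encode())
    return hmac.compare_digest(sig, token["sig"])

def effective_budget(token: dict):
    """The tightest budget caveat wins, so a child can only lower the ceiling."""
    limits = [int(c.split("<=")[1]) for c in token["caveats"] if c.startswith("budget <=")]
    return min(limits) if limits else None

ROOT = b"gateway-root-key"  # hypothetical secret held only by the gateway
parent = attenuate(mint(ROOT, "orchestrator"), "budget <= 10000")
child = attenuate(parent, "budget <= 2000")
# Tampering: strip the tighter caveat but keep the child's signature.
forged = dict(child, caveats=child["caveats"][:1])

print(verify(ROOT, child), verify(ROOT, forged), effective_budget(child))
```

Note that the signature check only proves the caveats are intact. The gateway still has to evaluate each caveat on every call, which is why a child that appends a looser budget gains nothing: the minimum still applies.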
3. Pre-Call Enforcement
This is the distinction between observability and governance. Observability logs a tool call after it happens. Governance decides whether the call happens at all.
# Gateway decision flow for each tool call:
1. Agent calls tools/call with macaroon token
2. Gateway validates macaroon signature ✓
3. Gateway checks: is this tool allowed? ✓
4. Gateway looks up tool cost: 50 credits
5. Gateway checks remaining budget: 30 credits
6. 30 < 50 → DENY

# Response to agent:
{
  "error": "budget_exhausted",
  "required": 50,
  "remaining": 30,
  "alternatives": [
    {"tool": "github.search_code", "cost": 1}
  ],
  "topup_url": "https://gateway.corp/budget/request"
}

The denial is structured. The agent gets machine-readable context: how much it has, how much it needs, and what cheaper alternatives exist. A well-designed agent can adapt: switch to a cheaper tool, request more budget from its parent, or gracefully inform the user. Compare this to a rate-limit 429, which just says "try again later" and triggers a retry loop.
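The decision flow can be sketched as a single function. This is a minimal illustration, not SatGate's implementation: the function name, the "tool_not_allowed" error code, and the in-memory cost table are assumptions, and signature validation is assumed to have happened already.

```python
# Hypothetical cost table; every allowed tool is assumed to have an entry.
TOOL_COSTS = {"github.search_code": 1, "analysis.review_code": 50}

def authorize(tool, allowed_tools, remaining):
    """Decide a tool call before it runs; return an allow or a structured denial."""
    if tool not in allowed_tools:
        return {"allow": False, "error": "tool_not_allowed"}
    cost = TOOL_COSTS[tool]
    if remaining < cost:
        # Suggest allowed tools the agent can still afford.
        alternatives = [
            {"tool": name, "cost": TOOL_COSTS[name]}
            for name in sorted(allowed_tools)
            if TOOL_COSTS[name] <= remaining
        ]
        return {
            "allow": False,
            "error": "budget_exhausted",
            "required": cost,
            "remaining": remaining,
            "alternatives": alternatives,
        }
    return {"allow": True, "remaining": remaining - cost}

ALLOWED = {"github.search_code", "analysis.review_code"}
print(authorize("analysis.review_code", ALLOWED, 30))
print(authorize("github.search_code", ALLOWED, 30))
```

In a real gateway the check and the debit must be one atomic operation against the budget store; otherwise two concurrent calls can both pass the check and overspend.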
Setting Up an MCP Gateway with Economic Governance
Here's a practical setup that combines the standard gateway stack with SatGate's economic governance layer.
Step 1: Define Your Tool Catalog
# catalog.yaml: your MCP server inventory
servers:
  github:
    command: "npx @modelcontextprotocol/server-github"
    env:
      GITHUB_TOKEN: ${GITHUB_PAT}
    costs:
      search_code: 1
      get_file: 1
      create_issue: 5
      create_pull_request: 10
  postgres:
    url: "https://mcp.internal/postgres"
    costs:
      query: 2
      execute: 10
  slack:
    command: "npx @modelcontextprotocol/server-slack"
    env:
      SLACK_BOT_TOKEN: ${SLACK_TOKEN}
    costs:
      send_message: 1
      search_messages: 3

Step 2: Configure Governance Policies
# policies.yaml: who can do what, and how much
teams:
  engineering:
    daily_budget: 10000
    agents:
      code-review-agent:
        budget: 3000
        tools: [github.*, postgres.query]
      deploy-agent:
        budget: 1000
        tools: [github.create_pull_request, slack.send_message]
  research:
    daily_budget: 5000
    agents:
      research-agent:
        budget: 5000
        tools: ["*"]  # all tools
alerts:
  - threshold: 80%
    notify: [slack:#platform-alerts]
  - threshold: 95%
    notify: [slack:#platform-alerts, pagerduty]

Step 3: Start the Gateway
# Install SatGate
go install github.com/satgate-io/satgate/cmd/satgate-mcp@latest

# Start with your catalog and policies
satgate-mcp serve \
  --catalog catalog.yaml \
  --policies policies.yaml \
  --port 8080

# Mint tokens for your agents
satgate-mcp mint \
  --team engineering \
  --agent code-review-agent \
  --budget 3000 \
  --expires 24h

Step 4: Point Agents at the Gateway
# In your agent's MCP client config
{
  "mcpServers": {
    "gateway": {
      "url": "https://gateway.corp:8080/mcp",
      "transport": "streamable-http",
      "headers": {
        "Authorization": "Bearer <macaroon_token>"
      }
    }
  }
}

The agent connects to one endpoint instead of multiple servers. The gateway handles routing, auth translation, tool aggregation, and budget enforcement. The agent doesn't need to know about any of this; it just makes tool calls, and the gateway either allows or denies them.
Gateway Comparison: Routing vs. Governance
Not all MCP gateways are built for the same job. Here's where the current landscape stands:
Docker's MCP gateway, Traefik Hub's MCP gateway, and Composio's gateway handle the four standard layers (transport, auth, tool aggregation, and observability) well. They're solid routing infrastructure. But routing without economics is like a firewall without deny rules: it organizes traffic without controlling what matters.
Multi-Agent Delegation Through the Gateway
The gateway pattern becomes especially powerful in multi-agent architectures. Consider an orchestrator that coordinates three specialist agents:
# Orchestrator has 10,000 credits
# It mints sub-tokens for each specialist
Orchestrator (10,000 credits)
├── Research Agent (3,000 credits)
│   └── tools: search_code, get_file, query
├── Analysis Agent (5,000 credits)
│   └── tools: review_code, generate_report
└── Communication Agent (1,000 credits)
    └── tools: send_message, create_issue
# Total delegated: 9,000 ≤ 10,000 ✓
# Each sub-token is cryptographically derived
# Gateway enforces each agent's ceiling independently
# Full audit trail traces back to orchestrator

Without economic delegation, multi-agent systems have two bad options: shared credentials (no attribution, no individual limits) or separate credentials (no relationship between them, orchestrator can't control downstream spend). Macaroon-based delegation gives you the third option: hierarchical authority with diminishing permissions at each level.
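The budget side of that hierarchy reduces to one invariant: a holder can never delegate more than it has left. The sketch below models only that bookkeeping, using the numbers from the tree above; the Grant class and its methods are hypothetical, and the cryptographic derivation of sub-tokens is out of scope here.

```python
from dataclasses import dataclass, field

@dataclass
class Grant:
    holder: str
    budget: int
    children: list["Grant"] = field(default_factory=list)

    def undelegated(self) -> int:
        """Credits this holder has not yet handed to a child."""
        return self.budget - sum(child.budget for child in self.children)

    def delegate(self, holder: str, budget: int) -> "Grant":
        """Create a child grant; refuse if it would exceed what is left."""
        if budget > self.undelegated():
            raise ValueError(
                f"{self.holder} has {self.undelegated()} credits left, cannot delegate {budget}"
            )
        child = Grant(holder, budget)
        self.children.append(child)
        return child

orchestrator = Grant("orchestrator", 10_000)
research = orchestrator.delegate("research-agent", 3_000)
analysis = orchestrator.delegate("analysis-agent", 5_000)
comms = orchestrator.delegate("communication-agent", 1_000)
print(orchestrator.undelegated())
```

Because each child is itself a Grant, the same rule applies recursively: the research agent can split its 3,000 credits among its own sub-agents, and no path through the tree can exceed the orchestrator's ceiling.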
Production Considerations
Before deploying an MCP gateway in production, address these operational concerns:
- Latency budget. The gateway adds a hop. For stdio-backed servers, the gateway process also manages server lifecycles. Measure per-call overhead and set latency budgets for each tool. Macaroon validation is a short chain of HMAC computations rather than a database query, so it typically adds well under a millisecond. The remaining-budget lookup costs whatever your budget store costs, so measure that path too.
- High availability. The gateway is a single point of failure. Run at least two instances behind a load balancer. For budget state, use a shared store (Redis) or a consensus-based approach. SatGate's macaroon approach minimizes shared state requirements since the token itself carries the constraints.
- Server lifecycle management. Stdio-based MCP servers are processes that the gateway spawns and manages. Monitor for server crashes, implement restart policies, and set resource limits (memory, CPU) per server process.
- Graceful degradation. If an upstream MCP server is down, the gateway should remove its tools from the catalog rather than returning errors for every call. Agents adapt to available tools; they don't handle transient server failures well.
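The graceful degradation point can be sketched in a few lines. This assumes the gateway already tracks which upstreams are healthy (how it determines that is not covered here); the function name and the inventory are hypothetical.

```python
# Hypothetical inventory: upstream server -> tools it exposes.
UPSTREAM_TOOLS = {
    "github": ["search_code", "get_file"],
    "postgres": ["query"],
    "slack": ["send_message"],
}

def live_catalog(upstream_tools, healthy):
    """List namespaced tools only for upstreams currently marked healthy."""
    return sorted(
        f"{server}.{tool}"
        for server, tools in upstream_tools.items()
        if server in healthy
        for tool in tools
    )

# postgres is down, so its tools disappear from what agents see.
print(live_catalog(UPSTREAM_TOOLS, healthy={"github", "slack"}))
```

When the upstream recovers, adding it back to the healthy set restores its tools on the next tools/list call, with no agent-side changes.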
The MCP Gateway Maturity Model
Think of MCP gateway deployment as a progression:
- Level 0: Direct connections. Each agent connects to each server. Works for prototypes. Doesn't scale.
- Level 1: Routing gateway. Centralized connections, auth translation, tool aggregation. This is where most guides end.
- Level 2: Observable gateway. Add structured logging, metrics, and alerting. You know what happened. You can't prevent it.
- Level 3: Governed gateway. Add cost modeling, budget enforcement, and hierarchical delegation. You control what happens, in real time, before the cost is incurred.
Most teams in early 2026 are at Level 1 or 2. The cost incidents that push them to Level 3 are predictable and preventable — if you build the economic layer from the start rather than bolting it on after the first surprise bill.
Getting Started
If you're deploying MCP in production, start with whichever routing gateway fits your infrastructure — Docker's gateway for container-heavy environments, Traefik Hub if you're already a Traefik shop, or a custom solution if you need specific transport handling.
Then ask the economic question: when one of our agents burns 10x the expected tool calls, what stops it? If the answer is "an alert that fires after the fact," you're at Level 2. That's fine for dev. It's a risk in production.
Economic governance isn't about distrust — it's about enabling autonomy safely. Agents with clear budget boundaries can operate more independently, because the organization knows the blast radius is contained. The gateway doesn't slow agents down. It lets you give them a longer leash.
SatGate adds economic governance to your MCP gateway. Open source, deploys in minutes:
go install github.com/satgate-io/satgate/cmd/satgate-mcp@latest