In a traditional setup, you guard your API with a rate limit, such as 1,000 requests per minute (RPM). Any client that exceeds it gets HTTP 429 "Too Many Requests."
AI agents, by contrast, retry failed calls automatically. Against a rate limit, many agents simply retry blocked calls until they get through. In slowdown mode, they wait. In budget-exhaustion mode, they fail gracefully.
The problem isn't volume — it's unpredictability.
For agents, you need budget limits — not rate limits. Predictable spending, not just predictable requests.
The Runaway Agent Horror Story
Imagine a user asks a research agent: "Find me all AI startups in California."
The agent is designed to:
- Search Google.
- For every result, visit the website.
- If the website mentions "AI," save it.
What happens when it finds a "List of 1,000 Startups" directory?
The agent dutifully visits all 1,000 links. Each visit requires a browser tool call and a summarization call (GPT-4).
Cost per link: $0.10. Total links: 1,000. Total cost: $100.00 for a single query.
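The arithmetic is worth making concrete, along with what a hard cap changes. This is a toy simulation, not SatGate code: `crawl` is a hypothetical function, each link is assumed to cost 10 cents (one browser call plus one GPT-4 summary, as above), and the $5.00 cap is an arbitrary illustrative value. A request-per-minute limit would only slow this loop down; it would not change the $100 total.

```go
package main

import "fmt"

// costPerLinkCents is the per-link cost from the example: one browser tool
// call plus one GPT-4 summarization call, roughly $0.10. Integer cents avoid
// floating-point rounding in the running total.
const costPerLinkCents = 10

// crawl (hypothetical) simulates visiting up to links pages and returns how
// many were visited and the total spend in cents. A budgetCents of 0 or less
// means "no budget": nothing stops the loop except running out of links.
func crawl(links, budgetCents int) (visited, spentCents int) {
	for i := 0; i < links; i++ {
		// The budget is checked before each visit, so the call that would
		// overspend never runs.
		if budgetCents > 0 && spentCents+costPerLinkCents > budgetCents {
			break
		}
		spentCents += costPerLinkCents
		visited++
	}
	return visited, spentCents
}

func main() {
	for _, budget := range []int{0, 500} { // no cap, then a hypothetical $5.00 cap
		visited, spent := crawl(1000, budget)
		fmt.Printf("budget %5d cents: %4d links, $%d.%02d\n", budget, visited, spent/100, spent%100)
	}
}
```

The unbudgeted run visits all 1,000 links for $100.00; the capped run stops after 50 links at $5.00.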
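A budget limit stops this before the money is spent. When a call would exceed the remaining budget, an enforcer in the request path rejects it before it executes and returns a structured JSON-RPC error like this one:
```json
{
  "jsonrpc": "2.0",
  "id": 42,
  "error": {
    "code": -32000,
    "message": "Budget exhausted",
    "data": {
      "error": "budget_exhausted",
      "tool": "dalle_generate",
      "cost_credits": 50,
      "remaining_credits": 0
    }
  }
}
```
The agent gets a structured error it can handle gracefully, not a crashed process or an infinite retry.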
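On the agent side, "handling it gracefully" means decoding the JSON-RPC `error` object and treating `budget_exhausted` as terminal rather than retryable. A minimal sketch with Go's standard library; `shouldRetry` and the struct names are hypothetical, the field names follow the example response, and treating every other error as retryable is a simplifying assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rpcResponse keeps only the part of a JSON-RPC 2.0 response this sketch needs.
type rpcResponse struct {
	Error *rpcError `json:"error,omitempty"`
}

type rpcError struct {
	Code    int             `json:"code"`
	Message string          `json:"message"`
	Data    budgetErrorData `json:"data"` // assumed to be absent or an object
}

// budgetErrorData mirrors the structured fields in error.data above.
type budgetErrorData struct {
	Error            string `json:"error"`
	Tool             string `json:"tool"`
	CostCredits      int    `json:"cost_credits"`
	RemainingCredits int    `json:"remaining_credits"`
}

// shouldRetry (hypothetical) reports whether a tool-call response is worth
// retrying. budget_exhausted is terminal: retrying cannot succeed until the
// budget is raised, so the agent should stop and report instead.
func shouldRetry(raw []byte) (bool, error) {
	var resp rpcResponse
	if err := json.Unmarshal(raw, &resp); err != nil {
		return false, err
	}
	if resp.Error == nil {
		return false, nil // success: nothing to retry
	}
	return resp.Error.Data.Error != "budget_exhausted", nil
}

func main() {
	raw := []byte(`{"jsonrpc":"2.0","id":42,"error":{"code":-32000,"message":"Budget exhausted","data":{"error":"budget_exhausted","tool":"dalle_generate","cost_credits":50,"remaining_credits":0}}}`)
	retry, err := shouldRetry(raw)
	fmt.Println(retry, err)
}
```

For the example response, `shouldRetry` returns `false`, so the agent stops instead of looping.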
Cost Granularity Matters
Not all tool calls cost the same. Our resolver supports exact match and wildcard prefixes:
```yaml
tools:
  defaultCost: 5
  costs:
    web_search: 5
    database_query: 5
    gpt4_summarize: 25
    gpt4_*: 25  # wildcard: gpt4_analyze, gpt4_translate...
    dalle_generate: 50
    code_execute: 15
```
Resolution order: exact match → longest wildcard prefix → catch-all `*` → default.
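The resolution order fits in a few lines. This is an illustrative Go sketch, not SatGate's resolver: `resolveCost` is a hypothetical name, it takes the `costs` map and `defaultCost` from the config above, and it treats only a trailing `*` as a wildcard.

```go
package main

import (
	"fmt"
	"strings"
)

// resolveCost (hypothetical) returns a tool's credit cost in the documented
// order: exact match, longest matching wildcard prefix ("gpt4_*"), a
// catch-all "*" entry, then defaultCost.
func resolveCost(tool string, costs map[string]int, defaultCost int) int {
	// 1. Exact match.
	if c, ok := costs[tool]; ok {
		return c
	}
	// 2. Longest wildcard prefix. Go map iteration order is unspecified, so
	//    prefix lengths are compared explicitly instead of taking the first hit.
	bestLen, bestCost := -1, 0
	for pattern, c := range costs {
		if pattern == "*" || !strings.HasSuffix(pattern, "*") {
			continue
		}
		prefix := strings.TrimSuffix(pattern, "*")
		if strings.HasPrefix(tool, prefix) && len(prefix) > bestLen {
			bestLen, bestCost = len(prefix), c
		}
	}
	if bestLen >= 0 {
		return bestCost
	}
	// 3. Catch-all.
	if c, ok := costs["*"]; ok {
		return c
	}
	// 4. Default.
	return defaultCost
}

func main() {
	costs := map[string]int{
		"web_search": 5, "database_query": 5, "gpt4_summarize": 25,
		"gpt4_*": 25, "dalle_generate": 50, "code_execute": 15,
	}
	for _, tool := range []string{"dalle_generate", "gpt4_translate", "send_email"} {
		fmt.Println(tool, resolveCost(tool, costs, 5))
	}
}
```

With the config above, `dalle_generate` resolves to 50, `gpt4_translate` falls through to the `gpt4_*` wildcard for 25, and the hypothetical `send_email` gets the default of 5.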
For Production Teams
Enterprise features unlock:
- _RedisBudgetEnforcer_: Atomic spend tracking across replicas
- _Postgres Evidence Pack_: Spend attribution for chargebacks
- _Paid-rail governance_: Paid-rail context for external agent/API monetization
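The point of atomic spend tracking is that the check ("is there enough budget?") and the deduction happen as one step, so concurrent calls cannot both pass the check and overspend. The sketch below shows that property within a single process using a mutex. It is a hypothetical stand-in, not `RedisBudgetEnforcer`; per the list above, that component provides the same guarantee across replicas.

```go
package main

import (
	"fmt"
	"sync"
)

// Enforcer is a hypothetical in-process budget tracker. Across replicas, the
// same check-and-deduct step must run atomically in a shared store instead of
// behind a local mutex.
type Enforcer struct {
	mu        sync.Mutex
	remaining int // credits left
}

// Reserve atomically deducts cost if enough credits remain. It returns false,
// and deducts nothing, when the call would overspend.
func (e *Enforcer) Reserve(cost int) bool {
	e.mu.Lock()
	defer e.mu.Unlock()
	if cost > e.remaining {
		return false
	}
	e.remaining -= cost
	return true
}

// Remaining returns the current credit balance.
func (e *Enforcer) Remaining() int {
	e.mu.Lock()
	defer e.mu.Unlock()
	return e.remaining
}

// spendConcurrently fires n goroutines that each try to reserve cost credits
// and returns how many reservations succeeded.
func spendConcurrently(e *Enforcer, n, cost int) int {
	var wg sync.WaitGroup
	var mu sync.Mutex
	succeeded := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if e.Reserve(cost) {
				mu.Lock()
				succeeded++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return succeeded
}

func main() {
	e := &Enforcer{remaining: 100}
	// 20 concurrent dalle_generate-style calls at 50 credits each.
	fmt.Println(spendConcurrently(e, 20, 50), e.Remaining())
}
```

Twenty concurrent 50-credit calls against a 100-credit pool yield exactly two successful reservations and zero credits left, regardless of goroutine scheduling.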
FAQ
AI agent spending limit questions
What are AI agent spending limits?
AI agent spending limits are request-path policies that cap how much an autonomous agent can spend by agent, tool, model, route, workflow, or time window before upstream API or MCP tool calls execute.
Why are rate limits not enough for AI agent cost control?
Rate limits control request volume, not money. AI agents can still choose expensive tools, retry costly calls, or fan out across subtasks while staying under a request-per-minute limit.
Where should teams enforce AI agent spend limits?
Enforce spending limits in the request path at an economic firewall or MCP proxy so budget, revocation, routing, and audit policy are checked before a costly call executes.
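To make "in the request path" concrete, here is a minimal HTTP middleware sketch in Go. It prices a JSON-RPC `tools/call` request by its `params.name`, reserves credits, and answers with the same `budget_exhausted` error shape shown earlier instead of forwarding when the budget cannot cover the call. It is not SatGate's proxy: `budgetGuard`, `budget`, `countingUpstream`, and `callTool` are hypothetical, the whole request body is read into memory, and credits are deducted on reservation with no refund if the upstream call fails.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync"
)

// budget is a hypothetical credit pool; reserve checks and deducts in one step.
type budget struct {
	mu        sync.Mutex
	remaining int
}

func (b *budget) reserve(cost int) (ok bool, left int) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if cost > b.remaining {
		return false, b.remaining
	}
	b.remaining -= cost
	return true, b.remaining
}

// budgetGuard prices each JSON-RPC tool call by params.name and, if the budget
// cannot cover it, answers with a budget_exhausted error without calling next.
func budgetGuard(next http.Handler, b *budget, costs map[string]int, defaultCost int) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		var req struct {
			ID     json.RawMessage `json:"id"`
			Params struct {
				Name string `json:"name"`
			} `json:"params"`
		}
		body, err := io.ReadAll(r.Body)
		if err != nil || json.Unmarshal(body, &req) != nil {
			http.Error(w, "malformed JSON-RPC request", http.StatusBadRequest)
			return
		}
		cost, known := costs[req.Params.Name]
		if !known {
			cost = defaultCost
		}
		if ok, left := b.reserve(cost); !ok {
			json.NewEncoder(w).Encode(map[string]any{
				"jsonrpc": "2.0", "id": req.ID,
				"error": map[string]any{
					"code": -32000, "message": "Budget exhausted",
					"data": map[string]any{
						"error": "budget_exhausted", "tool": req.Params.Name,
						"cost_credits": cost, "remaining_credits": left,
					},
				},
			})
			return
		}
		r.Body = io.NopCloser(bytes.NewReader(body)) // let the upstream re-read the body
		next.ServeHTTP(w, r)
	})
}

// countingUpstream stands in for the real MCP server and counts forwarded calls.
func countingUpstream(calls *int) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		*calls++
		fmt.Fprint(w, `{"jsonrpc":"2.0","id":1,"result":{}}`)
	})
}

// callTool sends one tools/call request through h and returns error.data.error
// ("" if the call went through). %q is valid JSON for plain ASCII tool names.
func callTool(h http.Handler, tool string) string {
	payload := fmt.Sprintf(`{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":%q}}`, tool)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodPost, "/mcp", bytes.NewBufferString(payload)))
	var resp struct {
		Error *struct {
			Data struct {
				Error string `json:"error"`
			} `json:"data"`
		} `json:"error"`
	}
	if json.Unmarshal(rec.Body.Bytes(), &resp) != nil || resp.Error == nil {
		return ""
	}
	return resp.Error.Data.Error
}

func main() {
	calls := 0
	h := budgetGuard(countingUpstream(&calls), &budget{remaining: 60}, map[string]int{"dalle_generate": 50}, 5)
	first := callTool(h, "dalle_generate")  // 60 -> 10 credits left; forwarded
	second := callTool(h, "dalle_generate") // needs 50, only 10 left; rejected at the proxy
	fmt.Printf("%q %q upstream calls: %d\n", first, second, calls)
}
```

Running it prints `"" "budget_exhausted" upstream calls: 1`: the second call is refused before the upstream ever sees it.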
Can AI agent spending limits be set per workflow or time window?
Yes. AI agent spending limits can be scoped by workflow, task, agent, sub-agent, model, MCP tool, route, customer, environment, day, week, or token expiry window so each workload receives a precise hard budget.
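Scoping a budget to a workflow and a time window amounts to keying the spend ledger on both. A minimal in-memory sketch, assuming fixed UTC-aligned windows and integer credits; `ScopedBudgets`, `Charge`, `exampleStart`, and the `workflow:research` scope name are hypothetical, and a production system would keep this ledger in a shared, persistent store.

```go
package main

import (
	"fmt"
	"time"
)

// Day is the window length used in this sketch; any whole-second duration works.
const Day = 24 * time.Hour

// exampleStart is an arbitrary timestamp used by main (hypothetical data).
var exampleStart = time.Date(2024, 5, 1, 9, 0, 0, 0, time.UTC)

// scopeKey identifies one budget bucket: a scope (workflow, agent, tool...)
// plus the index of the fixed, UTC-aligned window the spend falls into.
type scopeKey struct {
	scope  string
	window int64
}

// ScopedBudgets is a hypothetical in-memory ledger of hard caps per scope and
// window. It is not safe for concurrent use and does not persist.
type ScopedBudgets struct {
	limits map[string]int // credits allowed per window, by scope
	spent  map[scopeKey]int
	window time.Duration // must be at least one second
}

func NewScopedBudgets(window time.Duration, limits map[string]int) *ScopedBudgets {
	return &ScopedBudgets{limits: limits, spent: map[scopeKey]int{}, window: window}
}

// Charge records cost against scope for the window containing now. It refuses
// (returns false, records nothing) if the scope has no cap or the charge would
// push that window's spend over the cap.
func (s *ScopedBudgets) Charge(scope string, cost int, now time.Time) bool {
	limit, ok := s.limits[scope]
	if !ok {
		return false
	}
	key := scopeKey{scope, now.Unix() / int64(s.window/time.Second)}
	if s.spent[key]+cost > limit {
		return false
	}
	s.spent[key] += cost
	return true
}

func main() {
	b := NewScopedBudgets(Day, map[string]int{"workflow:research": 100})
	fmt.Println(b.Charge("workflow:research", 75, exampleStart))
	fmt.Println(b.Charge("workflow:research", 50, exampleStart.Add(time.Hour)))
	fmt.Println(b.Charge("workflow:research", 50, exampleStart.Add(Day)))
}
```

With a 100-credit daily cap, the second 50-credit charge on the same day is refused, while the same charge the next day succeeds (the program prints `true`, `false`, `true`).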
The code is open source. Try it:
```bash
go install github.com/satgate-io/satgate/cmd/satgate-mcp@latest
```