In a traditional setup, you guard your API with rate limits: 1000 RPM. Any client exceeding that gets HTTP 429 "Too Many Requests."
AI agents, by contrast, auto-retry failed calls. Against a plain rate limit, many agents will simply retry blocked calls until they get through. A budget gives them a better signal: in slowdown mode they wait, and on budget exhaustion they fail gracefully.
The problem isn't volume — it's unpredictability.
For agents, you need budget limits — not rate limits. Predictable spending, not just predictable requests.
The Runaway Agent Horror Story
Imagine a user asks a research agent: "Find me all AI startups in California."
The agent is designed to:
- Search Google.
- For every result, visit the website.
- If the website mentions "AI," save it.
What happens when it finds a "List of 1,000 Startups" directory?
The agent dutifully visits all 1,000 links. Each visit requires a browser tool call and a summarization call (GPT-4).
Cost per link: $0.10. Total links: 1,000. Total cost: $100.00 for a single query.
With a budget in place, the agent instead hits a wall it can reason about. The blocked call returns a structured JSON-RPC error:

```json
{"jsonrpc":"2.0","id":42,"error":{
  "code":-32000,
  "message":"Budget exhausted",
  "data":{
    "error":"budget_exhausted",
    "tool":"dalle_generate",
    "cost_credits":50,
    "remaining_credits":0
  }
}}
```

The agent gets a structured error it can handle gracefully, not a crashed process or an infinite retry.
Cost Granularity Matters
Not all tool calls cost the same. Our resolver supports exact match and wildcard prefixes:
```yaml
tools:
  defaultCost: 5
  costs:
    web_search: 5
    database_query: 5
    gpt4_summarize: 25
    gpt4_*: 25          # wildcard: gpt4_analyze, gpt4_translate...
    dalle_generate: 50
    code_execute: 15
```

Resolution order: exact match → longest wildcard prefix → catch-all `*` → default.
For Production Teams
For production deployments, enterprise features unlock:
- _RedisBudgetEnforcer_: Atomic spend tracking across replicas
- _Postgres audit trail_: Spend attribution for chargebacks
- _Fiat402_: Lightning micropayments (L402) for real spend control
The code is open source. Try it:
```shell
go install github.com/satgate-io/satgate/cmd/satgate-mcp@latest
```