Agent cost controls
LLM-driven agents can consume budget without bound if a single user message triggers an infinite tool-call loop, or if a long-running conversation keeps spending past a ceiling you intended as a hard stop. Orbit gates every agent run on three independent ceilings, all configurable per agent.
The three ceilings
| Ceiling | Scope | Default | What triggers it |
|---|---|---|---|
| max_cost_per_run_cents | One chat / invoke / chat-stream call (one user turn → one assistant turn, including tool-call iterations). | Unbounded (org-level cap still applies) | Aggregated cost across all LLM calls within a single run, computed from per-model token rates. |
| max_cost_per_conversation_cents | The lifetime of a single agent_conversations.id: every run that touches the same conversation row. | Unbounded | Probed before every LLM iteration, including iteration 0, so a conversation that hit the cap on a previous turn is also blocked on the next turn. |
| max_api_calls_per_run | LLM API calls within one run (tool-iteration loop guard). | 25 (platform max) | Counts every LLM round-trip, including tool-loop iterations. |
When any ceiling is exceeded, the agent emits a final error event and stops:
{ "type": "error", "code": "COST_LIMIT", "message": "Cost budget exceeded" }
The corresponding codes are COST_LIMIT (per-run), COST_CAP_EXCEEDED (per-conversation), API_CALL_LIMIT (iteration cap), and TOKEN_LIMIT (per-conversation token aggregate).
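A client reading the event stream can branch on the code field to tell which ceiling fired. The following is a minimal sketch, not part of the Orbit API: the function name classify_error_event and the human-readable labels are assumptions made for illustration, and only the four codes and the event shape come from the text above.

```python
import json
from typing import Optional

# Codes listed above, mapped to the ceiling that produced them.
# The label strings are illustrative, not Orbit identifiers.
CEILING_BY_CODE = {
    "COST_LIMIT": "per-run cost",
    "COST_CAP_EXCEEDED": "per-conversation cost",
    "API_CALL_LIMIT": "per-run API calls",
    "TOKEN_LIMIT": "per-conversation tokens",
}


def classify_error_event(raw: str) -> Optional[str]:
    """Return the ceiling label for a cost-control error event, else None."""
    event = json.loads(raw)
    if event.get("type") != "error":
        return None  # not an error event at all
    return CEILING_BY_CODE.get(event.get("code"))  # None for unrelated error codes
```

Of these, COST_CAP_EXCEEDED is the one the page documents as persistent: it keeps firing on the same conversation until a fresh one is provisioned, so a caller may want to surface it differently from COST_LIMIT.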
Set the caps when creating an agent:
curl -X POST https://orbit-api.devotel.io/api/v1/agents \
  -H "X-API-Key: dv_live_sk_..." \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Support Agent",
    "model": "gpt-4o",
    "instructions": "You are a customer support agent for Acme Corp.",
    "max_cost_per_run_cents": 25,
    "max_cost_per_conversation_cents": 500
  }'
Or update them on an existing agent:
curl -X PATCH https://orbit-api.devotel.io/api/v1/agents/agent_abc123 \
  -H "X-API-Key: dv_live_sk_..." \
  -H "Content-Type: application/json" \
  -d '{
    "max_cost_per_run_cents": 25,
    "max_cost_per_conversation_cents": 500
  }'
Both fields are persisted under config.max_cost_per_run_cents / config.max_cost_per_conversation_cents on the agent row.
Semantics
| Value | Meaning |
|---|---|
| Positive integer | The cap (in USD cents). Run / conversation aborts when accumulated cost exceeds this value. |
| null (or omitted) | Unbounded at this layer. The platform-wide org-level cap still applies. |
| 0 | Kill switch. Blocks all spend immediately: the first LLM call fails with COST_LIMIT. Useful when you want to disable an agent without deleting it. |
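The three cases in the table can be read as a single predicate. This is a hedged sketch of that reading, not Orbit's implementation: is_blocked is a hypothetical helper, and it takes the strict "exceeds" wording above literally, so spend exactly equal to the cap is still allowed.

```python
from typing import Optional


def is_blocked(cap_cents: Optional[int], accumulated_cents: float) -> bool:
    """Hypothetical check mirroring the semantics table."""
    if cap_cents is None:
        return False  # null / omitted: unbounded at this layer
    if cap_cents == 0:
        return True  # kill switch: block before any spend happens
    return accumulated_cents > cap_cents  # abort only once cost exceeds the cap
```

The org-level cap mentioned in the table is outside this function; it would apply as a separate check even when is_blocked returns False.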
How accumulated cost is computed
- Per-run: accumulated in real time across the streaming loop. Each LLM call is estimated from prompt + completion tokens at the model’s per-1K rate, plus a flat per-API-call overhead.
- Per-conversation: tracked in Redis under a key derived from agent_conversations.id. Each run adds its cost to the counter on completion; the next run’s iteration-0 probe reads the counter and refuses to start if it is already over the cap.
- The counter is reset only by deleting the conversation. Re-using a conversation_id after the cap has been hit will keep failing until you provision a fresh one.
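The accounting described above can be sketched in a few lines. Everything named here is an assumption for illustration: the Rates fields, the example rate values, and the in-memory ConversationCounter, which stands in for the Redis counter so the sketch runs without a server. It is not Orbit's actual code.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Rates:
    # Illustrative rates in cents; real per-model rates are not listed on this page.
    prompt_per_1k: float
    completion_per_1k: float
    per_call_overhead: float


def call_cost_cents(prompt_tokens: int, completion_tokens: int, rates: Rates) -> float:
    """Estimate one LLM call: token cost at per-1K rates plus a flat overhead."""
    return (
        prompt_tokens / 1000 * rates.prompt_per_1k
        + completion_tokens / 1000 * rates.completion_per_1k
        + rates.per_call_overhead
    )


class ConversationCounter:
    """In-memory stand-in for the Redis counter keyed by conversation id."""

    def __init__(self) -> None:
        self._spent: Dict[str, float] = {}

    def probe(self, conversation_id: str, cap_cents: Optional[int]) -> bool:
        """Iteration-0 probe: True if a new run may start."""
        if cap_cents is None:
            return True  # unbounded at this layer
        if cap_cents == 0:
            return False  # kill switch
        return self._spent.get(conversation_id, 0.0) <= cap_cents

    def record_run(self, conversation_id: str, run_cost_cents: float) -> None:
        """Add a completed run's cost to the conversation total."""
        self._spent[conversation_id] = self._spent.get(conversation_id, 0.0) + run_cost_cents

    def delete(self, conversation_id: str) -> None:
        """Deleting the conversation is the only way the counter resets."""
        self._spent.pop(conversation_id, None)
```

Because the counter is only updated when a run completes, a run that starts just under the cap can finish over it; the next probe on the same conversation id is what refuses further work.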
The platform also enforces the following hard limits, which no agent config can override:
| Limit | Value |
|---|---|
| AGENT_MAX_API_CALLS_PER_RUN | 25 |
| AGENT_MAX_TOKENS_PER_CONVERSATION | 200_000 |
| MAX_TOOL_ITERATIONS | 10 |
They exist to prevent runaway behaviour even when an agent’s per-run cap is set to a high value.
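One way to picture "no agent config can override" is as a clamp applied to the configured value. This sketch assumes clamping; the page does not say whether Orbit clamps silently or rejects an out-of-range value, and effective_api_call_limit is a hypothetical helper.

```python
from typing import Optional

AGENT_MAX_API_CALLS_PER_RUN = 25  # platform value from the table above


def effective_api_call_limit(configured: Optional[int]) -> int:
    """Assumed rule: an agent may lower the limit but never raise it."""
    if configured is None:
        return AGENT_MAX_API_CALLS_PER_RUN  # fall back to the platform value
    return min(configured, AGENT_MAX_API_CALLS_PER_RUN)
```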
Observability
Each time a cost cap fires, Orbit emits a structured log line and an event on the agent’s run trace:
{
  "event": "agent.run.aborted",
  "reason": "cost_cap_per_conversation",
  "agent_id": "agent_abc123",
  "conversation_id": "conv_def456",
  "accumulated_cents": 514,
  "cap_cents": 500
}
Subscribe to the agent.run.aborted webhook event to alert on runs that hit the cap.
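A webhook consumer could turn that event into an alert line as sketched below. This assumes the webhook body has the same fields as the log line shown above, which the page does not state explicitly; format_abort_alert and the message format are illustrative only.

```python
import json
from typing import Optional


def format_abort_alert(payload: str) -> Optional[str]:
    """Build a one-line alert from an agent.run.aborted payload, else None."""
    event = json.loads(payload)
    if event.get("event") != "agent.run.aborted":
        return None  # ignore other webhook events
    overshoot = event["accumulated_cents"] - event["cap_cents"]
    return (
        f"{event['agent_id']} / {event['conversation_id']}: {event['reason']} "
        f"({event['accumulated_cents']}c spent, cap {event['cap_cents']}c, over by {overshoot}c)"
    )
```

Reporting the overshoot alongside the cap makes it easier to see whether a cap is being clipped by a few cents or blown through by a runaway loop.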
Recommended starting values
- Tier-1 customer support agent: max_cost_per_run_cents=25, max_cost_per_conversation_cents=500.
- Voice agent (longer turns, STT/TTS overhead): max_cost_per_run_cents=50, max_cost_per_conversation_cents=1000.
- Internal QA / sandbox: max_cost_per_run_cents=10, max_cost_per_conversation_cents=50 to catch loops fast.
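The starting values above can be kept as named presets and serialised into the JSON body for the PATCH request. The preset names and the patch_body helper are assumptions for illustration; only the two field names and the numbers come from this page.

```python
import json

# Preset keys are hypothetical labels; values are the starting points listed above.
PRESETS = {
    "support": {"max_cost_per_run_cents": 25, "max_cost_per_conversation_cents": 500},
    "voice": {"max_cost_per_run_cents": 50, "max_cost_per_conversation_cents": 1000},
    "sandbox": {"max_cost_per_run_cents": 10, "max_cost_per_conversation_cents": 50},
}


def patch_body(preset: str) -> str:
    """Return the JSON body to send when applying a preset to an agent."""
    return json.dumps(PRESETS[preset])
```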
See also