Agent cost controls

LLM-driven agents can consume budget unboundedly if a single user message triggers an infinite tool-call loop, or if a long-running conversation crosses a cost ceiling you only want to hit once. Orbit gates every agent run on three independent ceilings, all configurable per agent.

The three ceilings

Ceiling	Scope	Default	What triggers it
`max_cost_per_run_cents`	One `chat` / `invoke` / `chat-stream` call (one user turn → one assistant turn, including tool-call iterations).	unbounded (org-level cap still applies)	Aggregated cost across all LLM calls within a single run, computed from per-model token rates.
`max_cost_per_conversation_cents`	The lifetime of a single `agent_conversations.id` — every run that touches the same conversation row.	unbounded	Probed before every LLM iteration, including iteration 0, so a conversation that hit the cap on a previous turn is also blocked on the next turn.
`max_api_calls_per_run`	LLM API calls within one run (tool-iteration loop guard).	25 (platform max)	Counts every LLM round-trip including tool-loop iterations.

When any ceiling is exceeded the agent emits a final error event and stops:

{ "type": "error", "code": "COST_LIMIT", "message": "Cost budget exceeded" }

The corresponding codes are COST_LIMIT (per-run), COST_CAP_EXCEEDED (per-conversation), API_CALL_LIMIT (iteration cap), and TOKEN_LIMIT (per-conversation token aggregate).

Configure on agent create

curl -X POST https://orbit-api.devotel.io/api/v1/agents \
  -H "X-API-Key: dv_live_sk_..." \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Support Agent",
    "model": "gpt-4o",
    "instructions": "You are a customer support agent for Acme Corp.",
    "max_cost_per_run_cents": 25,
    "max_cost_per_conversation_cents": 500
  }'

Configure on agent update

curl -X PATCH https://orbit-api.devotel.io/api/v1/agents/agent_abc123 \
  -H "X-API-Key: dv_live_sk_..." \
  -H "Content-Type: application/json" \
  -d '{
    "max_cost_per_run_cents": 25,
    "max_cost_per_conversation_cents": 500
  }'

Both fields are persisted under config.max_cost_per_run_cents / config.max_cost_per_conversation_cents on the agent row.

Semantics

Value	Meaning
Positive integer	The cap (in USD cents). Run / conversation aborts when accumulated cost exceeds this value.
`null` (or omitted)	Unbounded at this layer. The platform-wide org-level cap still applies.
`0`	Kill switch. Blocks ALL spend immediately — first LLM call fails with `COST_LIMIT`. Useful when you want to disable an agent without deleting it.

How accumulated cost is computed

Per-run: real-time accumulation across the streaming loop. Estimated from prompt + completion tokens at the model’s per-1K rate, plus a flat per-API-call overhead.
Per-conversation: tracked in Redis under a key keyed off agent_conversations.id. Each run debits the counter on completion; the next run’s iteration-0 probe reads the counter and refuses if it’s already over the cap.
The counter is reset only by deleting the conversation. Re-using a conversation_id after the cap will keep failing until you provision a fresh one.

Hard platform ceilings (always enforced)

These are floor / ceiling values that no agent config can override:

Limit	Value
`AGENT_MAX_API_CALLS_PER_RUN`	25
`AGENT_MAX_TOKENS_PER_CONVERSATION`	200_000
`MAX_TOOL_ITERATIONS`	10

They exist to prevent runaway behaviour even when an agent’s per-run cap is set to a high value.

Observability

Every cost-cap fire emits a structured log line and an event on the agent’s run trace:

{
  "event": "agent.run.aborted",
  "reason": "cost_cap_per_conversation",
  "agent_id": "agent_abc123",
  "conversation_id": "conv_def456",
  "accumulated_cents": 514,
  "cap_cents": 500
}

Subscribe to the agent.run.aborted webhook event to alert on runs that hit the cap.

Recommended starting values

Tier-1 customer support agent — max_cost_per_run_cents=25, max_cost_per_conversation_cents=500.
Voice agent (longer turns, STT/TTS overhead) — max_cost_per_run_cents=50, max_cost_per_conversation_cents=1000.
Internal QA / sandbox — max_cost_per_run_cents=10, max_cost_per_conversation_cents=50 to catch loops fast.

Getting Started

Channels

AI Agents

Flows

Numbers & Verify

Webhooks

Compliance

Changelog

Agent cost controls

Agent cost controls

The three ceilings

Configure on agent create

Configure on agent update

Semantics

How accumulated cost is computed

Hard platform ceilings (always enforced)

Observability

Recommended starting values

See also

Getting Started

Channels

AI Agents

Flows

Numbers & Verify

Webhooks

Compliance

Changelog

Documentation Index

​Agent cost controls

​The three ceilings

​Configure on agent create

​Configure on agent update

​Semantics

​How accumulated cost is computed

​Hard platform ceilings (always enforced)

​Observability

​Recommended starting values

​See also

Agent cost controls

The three ceilings

Configure on agent create

Configure on agent update

Semantics

How accumulated cost is computed

Hard platform ceilings (always enforced)

Observability

Recommended starting values

See also