Agent cost controls

LLM-driven agents can spend without bound if a single user message triggers an infinite tool-call loop, or if a long-running conversation keeps running after it has crossed its cost ceiling. Orbit gates every agent run on three independent ceilings, all configurable per agent.

The three ceilings

| Ceiling | Scope | Default | What triggers it |
| --- | --- | --- | --- |
| max_cost_per_run_cents | One chat / invoke / chat-stream call (one user turn → one assistant turn, including tool-call iterations). | unbounded (org-level cap still applies) | Aggregated cost across all LLM calls within a single run, computed from per-model token rates. |
| max_cost_per_conversation_cents | The lifetime of a single agent_conversations.id: every run that touches the same conversation row. | unbounded | Probed before every LLM iteration, including iteration 0, so a conversation that hit the cap on a previous turn is also blocked on the next turn. |
| max_api_calls_per_run | LLM API calls within one run (tool-iteration loop guard). | 25 (platform max) | Counts every LLM round-trip including tool-loop iterations. |

When any ceiling is exceeded, the agent emits a final error event and stops:
{ "type": "error", "code": "COST_LIMIT", "message": "Cost budget exceeded" }
The corresponding codes are COST_LIMIT (per-run), COST_CAP_EXCEEDED (per-conversation), API_CALL_LIMIT (iteration cap), and TOKEN_LIMIT (per-conversation token aggregate).
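A client that consumes the event stream can branch on the code to tell a per-run limit from a conversation-scoped one. The following is a minimal sketch, not part of any Orbit SDK: the helper name `classify_error_event` and the shape of its return value are assumptions, and only the four codes and the event shape come from the text above.

```python
# Hypothetical client-side helper; only the four codes come from the docs above.
LIMIT_CODES = {
    "COST_LIMIT": "per-run cost ceiling",
    "COST_CAP_EXCEEDED": "per-conversation cost ceiling",
    "API_CALL_LIMIT": "per-run LLM call ceiling",
    "TOKEN_LIMIT": "per-conversation token aggregate",
}

# Codes whose ceiling is scoped to the conversation rather than to one run.
CONVERSATION_SCOPED = {"COST_CAP_EXCEEDED", "TOKEN_LIMIT"}


def classify_error_event(event: dict) -> dict:
    """Describe which ceiling (if any) a final error event refers to."""
    code = event.get("code")
    is_limit = event.get("type") == "error" and code in LIMIT_CODES
    return {
        "is_limit": is_limit,
        "ceiling": LIMIT_CODES[code] if is_limit else None,
        "conversation_scoped": is_limit and code in CONVERSATION_SCOPED,
    }
```

A result with `conversation_scoped` set to `True` suggests that retrying on the same conversation is unlikely to help.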

Configure on agent create

curl -X POST https://orbit-api.devotel.io/api/v1/agents \
  -H "X-API-Key: dv_live_sk_..." \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Support Agent",
    "model": "gpt-4o",
    "instructions": "You are a customer support agent for Acme Corp.",
    "max_cost_per_run_cents": 25,
    "max_cost_per_conversation_cents": 500
  }'

Configure on agent update

curl -X PATCH https://orbit-api.devotel.io/api/v1/agents/agent_abc123 \
  -H "X-API-Key: dv_live_sk_..." \
  -H "Content-Type: application/json" \
  -d '{
    "max_cost_per_run_cents": 25,
    "max_cost_per_conversation_cents": 500
  }'
Both fields are persisted under config.max_cost_per_run_cents / config.max_cost_per_conversation_cents on the agent row.
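The same update can be prepared from Python with the standard library. This is a sketch under assumptions: the helper name `build_cap_update` is hypothetical, the endpoint, header, and field names are copied from the curl example above, and the response shape is not documented here, so the request is only built, not sent.

```python
import json
import urllib.request
from typing import Optional

API_BASE = "https://orbit-api.devotel.io/api/v1"  # taken from the curl examples


def build_cap_update(
    agent_id: str,
    api_key: str,
    run_cents: Optional[int],
    conversation_cents: Optional[int],
) -> urllib.request.Request:
    """Build (but do not send) a PATCH request that sets both cost caps."""
    body = json.dumps(
        {
            "max_cost_per_run_cents": run_cents,  # None serializes to null
            "max_cost_per_conversation_cents": conversation_cents,
        }
    ).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/agents/{agent_id}",
        data=body,
        method="PATCH",
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; passing `None` for a cap serializes to `null`.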

Semantics

| Value | Meaning |
| --- | --- |
| Positive integer | The cap (in USD cents). Run / conversation aborts when accumulated cost exceeds this value. |
| null (or omitted) | Unbounded at this layer. The platform-wide org-level cap still applies. |
| 0 | Kill switch. Blocks ALL spend immediately: the first LLM call fails with COST_LIMIT. Useful when you want to disable an agent without deleting it. |
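The three cases above can be read as a single predicate. The sketch below is illustrative only: `would_abort` is a hypothetical name, and rejecting negative values is an assumption, since this page does not say how they are handled.

```python
from typing import Optional


def would_abort(cap_cents: Optional[int], accumulated_cents: float) -> bool:
    """Return True when spend should stop under the semantics described above."""
    if cap_cents is None:
        # null or omitted: unbounded at this layer (the org-level cap is separate)
        return False
    if cap_cents < 0:
        # Assumption: negative caps are treated as invalid input.
        raise ValueError("cap_cents must be None, 0, or a positive integer")
    if cap_cents == 0:
        # Kill switch: block before any spend at all.
        return True
    # Positive cap: abort once accumulated cost exceeds the value.
    return accumulated_cents > cap_cents
```

The strict comparison means that spend exactly equal to the cap does not abort, matching the "exceeds this value" wording.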

How accumulated cost is computed

  • Per-run: accumulated in real time across the streaming loop. Each LLM call is estimated from its prompt and completion tokens at the model’s per-1K rate, plus a flat per-API-call overhead.
  • Per-conversation: tracked in a Redis counter keyed by agent_conversations.id. Each run debits the counter on completion; the next run’s iteration-0 probe reads the counter and refuses to start if it is already over the cap.
  • The counter is reset only by deleting the conversation. Reusing a conversation_id that has hit the cap keeps failing; provision a fresh conversation to continue.
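The accounting described in the list above can be sketched as follows. Everything here is illustrative: the rates and the overhead value are placeholders rather than Orbit's real pricing, and `ConversationLedger` is an in-memory stand-in for the Redis counter, not the actual implementation.

```python
from dataclasses import dataclass
from typing import Dict, Optional

PER_CALL_OVERHEAD_CENTS = 0.5  # placeholder; the real overhead is not documented here


@dataclass
class ModelRate:
    prompt_cents_per_1k: float      # hypothetical per-1K prompt-token rate
    completion_cents_per_1k: float  # hypothetical per-1K completion-token rate


def call_cost_cents(rate: ModelRate, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate one LLM call: token cost at per-1K rates plus flat overhead."""
    return (
        prompt_tokens / 1000 * rate.prompt_cents_per_1k
        + completion_tokens / 1000 * rate.completion_cents_per_1k
        + PER_CALL_OVERHEAD_CENTS
    )


class ConversationLedger:
    """In-memory stand-in for the per-conversation counter."""

    def __init__(self) -> None:
        self._spent: Dict[str, float] = {}

    def probe(self, conversation_id: str, cap_cents: Optional[int]) -> bool:
        """Iteration-0 check: True if the conversation may make an LLM call."""
        if cap_cents is None:
            return True   # unbounded at this layer
        if cap_cents == 0:
            return False  # kill switch
        return self._spent.get(conversation_id, 0.0) <= cap_cents

    def debit(self, conversation_id: str, run_cost_cents: float) -> None:
        """Add a completed run's cost to the conversation total."""
        self._spent[conversation_id] = self._spent.get(conversation_id, 0.0) + run_cost_cents

    def delete(self, conversation_id: str) -> None:
        """Deleting the conversation is the only way the counter resets."""
        self._spent.pop(conversation_id, None)
```

Because the debit happens after a run completes, a conversation can end slightly over its cap (for example 514 against 500) and is then refused on the next probe.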

Hard platform ceilings (always enforced)

These are hard limits that no agent config can override:

| Limit | Value |
| --- | --- |
| AGENT_MAX_API_CALLS_PER_RUN | 25 |
| AGENT_MAX_TOKENS_PER_CONVERSATION | 200_000 |
| MAX_TOOL_ITERATIONS | 10 |

They exist to prevent runaway behaviour even when an agent’s per-run cap is set to a high value.
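One way to picture how a per-agent setting interacts with a platform ceiling is a simple clamp. This sketch assumes that a configured max_api_calls_per_run above the platform maximum is capped rather than rejected, which this page does not state; the function name is hypothetical.

```python
from typing import Optional

# Values copied from the table above.
AGENT_MAX_API_CALLS_PER_RUN = 25
AGENT_MAX_TOKENS_PER_CONVERSATION = 200_000
MAX_TOOL_ITERATIONS = 10


def effective_api_call_limit(configured: Optional[int]) -> int:
    """Return the call limit that applies to a run.

    Assumption: values above the platform ceiling are clamped, not rejected.
    """
    if configured is None:
        return AGENT_MAX_API_CALLS_PER_RUN  # default is the platform max
    return min(configured, AGENT_MAX_API_CALLS_PER_RUN)
```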

Observability

Each time a cost cap fires, the platform emits a structured log line and an event on the agent’s run trace:
{
  "event": "agent.run.aborted",
  "reason": "cost_cap_per_conversation",
  "agent_id": "agent_abc123",
  "conversation_id": "conv_def456",
  "accumulated_cents": 514,
  "cap_cents": 500
}
Subscribe to the agent.run.aborted webhook event to alert on runs that hit the cap.
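A webhook consumer could turn that payload into an alert line. The sketch below assumes the webhook body has the same fields as the log line shown above, which this page does not confirm; `summarize_abort` is a hypothetical helper.

```python
import json
from typing import Optional


def summarize_abort(raw_body: str) -> Optional[str]:
    """Return a one-line alert for an abort payload, or None for other events."""
    payload = json.loads(raw_body)
    if payload.get("event") != "agent.run.aborted":
        return None
    # How far past the cap the run or conversation ended up.
    over = payload["accumulated_cents"] - payload["cap_cents"]
    return (
        f"{payload['agent_id']} / {payload['conversation_id']} aborted "
        f"({payload['reason']}): {payload['accumulated_cents']} of "
        f"{payload['cap_cents']} cents, {over} over"
    )
```

Fed the example payload above, this returns "agent_abc123 / conv_def456 aborted (cost_cap_per_conversation): 514 of 500 cents, 14 over".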
Example settings by agent type:
  • Tier-1 customer support agent: max_cost_per_run_cents=25, max_cost_per_conversation_cents=500.
  • Voice agent (longer turns, STT/TTS overhead): max_cost_per_run_cents=50, max_cost_per_conversation_cents=1000.
  • Internal QA / sandbox: max_cost_per_run_cents=10, max_cost_per_conversation_cents=50, to catch loops fast.

See also