LLM API¶
The LLM API is the public interface to ARX's frontier-model HA pair. Every call is routed through the platform's LLM router, which evaluates policy, writes an audit log entry, and fails over automatically when a vendor is down, rate-limits the request, or suspends the account. Callers reference logical tiers (frontier, fast, cheap) rather than vendor-specific model ids.
All endpoints are scoped to the authenticated user's organization. Provider credentials are stored encrypted (Fernet) in llm_credentials and decrypted only at call time — no credential ever leaves ARX in plaintext.
Chat Completion¶
Executes a chat completion with automatic failover between Anthropic (Claude) and OpenAI.
- Method: POST
- Path: /v1/llm/chat
- Required Role: Any authenticated user
Request Body¶
| Field | Type | Required | Description |
|---|---|---|---|
| messages | Message[] | Yes | Chat history. At least one message required. |
| model_tier | "frontier" \| "fast" \| "cheap" | No | Tier → model-id resolution happens per-provider. Default: frontier. |
| max_tokens | int | No | 1 – 200 000. Default: 4096. |
| temperature | float | No | 0.0 – 2.0. Default: 0.7. |
| tools | Tool[] | No | Tool definitions. JSON-Schema input_schema; the provider adapter wraps as needed. |
| tool_choice | string | No | "auto", "any", "none", or a specific tool name. |
| stop_sequences | string[] | No | Optional stop sequences. |
| agent_id | UUID | No | The agent making the call. Drives per-agent policy evaluation. |
| request_id | string | No | Caller-supplied correlation id; written to the audit row. |
| metadata | object | No | Free-form metadata, forwarded to the policy engine as session_context. |
Message:
| Field | Type | Description |
|---|---|---|
| role | "system" \| "user" \| "assistant" \| "tool" | Standard chat role. |
| content | string \| object[] | Text or structured content blocks. |
| tool_call_id | string | Required when role == "tool". |
| name | string | Optional message name. |
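The constraints in the two tables above can be checked on the client before a request is sent. The following is a minimal sketch, not the server's validation logic: validate_chat_request is a hypothetical helper name, and the error strings are illustrative.

```python
from typing import Any

VALID_ROLES = {"system", "user", "assistant", "tool"}
VALID_TIERS = {"frontier", "fast", "cheap"}


def validate_chat_request(body: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a /v1/llm/chat request body.

    Hypothetical client-side check built only from the documented limits.
    """
    errors: list[str] = []

    messages = body.get("messages")
    if not isinstance(messages, list) or not messages:
        errors.append("messages: at least one message is required")
        messages = []

    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            errors.append(f"messages[{i}]: must be an object")
            continue
        role = msg.get("role")
        if role not in VALID_ROLES:
            errors.append(f"messages[{i}].role: unknown role {role!r}")
        if role == "tool" and not msg.get("tool_call_id"):
            errors.append(f"messages[{i}].tool_call_id: required when role is 'tool'")

    # Defaults mirror the table: frontier, 4096, 0.7.
    tier = body.get("model_tier", "frontier")
    if tier not in VALID_TIERS:
        errors.append(f"model_tier: unknown tier {tier!r}")

    max_tokens = body.get("max_tokens", 4096)
    if not isinstance(max_tokens, int) or not 1 <= max_tokens <= 200_000:
        errors.append("max_tokens: must be an integer between 1 and 200000")

    temperature = body.get("temperature", 0.7)
    if not isinstance(temperature, (int, float)) or not 0.0 <= temperature <= 2.0:
        errors.append("temperature: must be between 0.0 and 2.0")

    return errors
```

An empty list means the body satisfies the documented limits; the server remains the authority on what it accepts.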
Example Request¶
curl -X POST https://api.arxsec.io/v1/llm/chat \
  -H "Authorization: Bearer $JWT" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "system", "content": "You are a terse SOC analyst."},
      {"role": "user", "content": "Summarize the top 3 findings from last hour."}
    ],
    "model_tier": "frontier",
    "agent_id": "7a4d…"
  }'
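For callers working in Python rather than curl, the same request can be sketched with the standard library alone. build_chat_request is a hypothetical helper name; it only constructs the request object, and nothing is sent until the result is passed to urllib.request.urlopen.

```python
import json
import urllib.request
from typing import Optional

API_BASE = "https://api.arxsec.io"  # host taken from the curl example


def build_chat_request(
    jwt: str,
    messages: list,
    model_tier: str = "frontier",
    agent_id: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/llm/chat request."""
    body = {"messages": messages, "model_tier": model_tier}
    if agent_id is not None:
        body["agent_id"] = agent_id
    return urllib.request.Request(
        url=f"{API_BASE}/v1/llm/chat",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {jwt}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, wrap the call as `with urllib.request.urlopen(req) as resp: result = json.load(resp)`; a non-2xx status raises urllib.error.HTTPError.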
Example Response¶
{
  "content": "1. …",
  "tool_calls": [],
  "finish_reason": "stop",
  "usage": {
    "prompt_tokens": 142,
    "completion_tokens": 78,
    "total_tokens": 220
  },
  "provider_used": "openai",
  "model_used": "gpt-5"
}
provider_used is the vendor that actually served the response. If it differs from the configured primary, the request was failed over. The same information is persisted on the audit row along with failover_hops and an attempts[] array.
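A caller can apply the comparison described above directly to the response body. This is a small sketch; failover_summary is a hypothetical helper, and the caller is assumed to know its configured primary (for instance, the first entry of its provider order).

```python
def failover_summary(response: dict, primary: str) -> dict:
    """Report which provider served a chat response and whether it was the primary."""
    provider = response.get("provider_used")
    return {
        "provider_used": provider,
        "model_used": response.get("model_used"),
        # A served provider that differs from the primary implies a failover.
        "failed_over": provider is not None and provider != primary,
    }
```

Applied to the example response with "anthropic" as the primary, failed_over is True because provider_used is "openai".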
Failover Behavior¶
| Provider response | Action |
|---|---|
| 401 / 403 | Fail over to the next provider (covers vendor-side account suspension). |
| 429 | Fail over (the caller can't fix noisy-neighbor rate limiting). |
| 5xx, timeout, connection error | Retry 3× with exponential backoff within the provider, then fail over. |
| 400 invalid_request | Re-raise as 400 to the caller; do not fail over (the error is deterministic). |
| Content-policy refusal | Re-raise as 400; the secondary would refuse the same prompt. |
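The decision table above can be read as a small routing loop. The sketch below is an illustration under stated assumptions, not the router's actual code: ProviderError, classify, and call_with_failover are hypothetical names, "Retry 3×" is interpreted as one initial attempt plus three retries, the 0.5 s base delay is invented, and any status the table does not list is re-raised.

```python
import time
from typing import Callable, Optional, Sequence

MAX_RETRIES = 3  # "Retry 3x" from the table, read as three retries after the first attempt


class ProviderError(Exception):
    """Hypothetical error raised by a provider adapter."""

    def __init__(self, status: Optional[int], content_refusal: bool = False):
        super().__init__(f"provider error (status={status})")
        self.status = status  # None models a timeout or connection error
        self.content_refusal = content_refusal


def classify(err: ProviderError) -> str:
    """Map a provider error to 'raise', 'failover', or 'retry' per the table."""
    if err.content_refusal or err.status == 400:
        return "raise"
    if err.status in (401, 403, 429):
        return "failover"
    if err.status is None or 500 <= err.status <= 599:
        return "retry"
    return "raise"  # assumption: statuses not in the table are not retried


def call_with_failover(
    providers: Sequence[str],
    call: Callable[[str], str],
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
):
    """Try each provider in order; return (provider, result) from the first success."""
    for provider in providers:
        for attempt in range(1 + MAX_RETRIES):
            try:
                return provider, call(provider)
            except ProviderError as err:
                action = classify(err)
                if action == "raise":
                    raise
                if action == "failover" or attempt == MAX_RETRIES:
                    break  # move on to the next provider
                sleep(base_delay * 2 ** attempt)  # exponential backoff
    raise RuntimeError("all providers exhausted")  # surfaced to callers as 503
```

Passing sleep as a parameter keeps the backoff testable without real delays.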
Status Codes¶
| Code | Meaning |
|---|---|
| 200 | Completion returned. |
| 400 | Deterministic caller error or malformed tool call. |
| 403 | Policy engine denied the call (DENY verdict). |
| 502 | Unexpected LLM error. |
| 503 | All providers exhausted (all in the failover order failed or are circuit-broken). |
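On the caller side, the status table above might be turned into a typed error. This is a sketch only: ChatApiError and check_status are hypothetical names, and treating 502 and 503 as worth retrying later is an assumption rather than documented guidance.

```python
STATUS_MEANINGS = {
    200: "Completion returned.",
    400: "Deterministic caller error or malformed tool call.",
    403: "Policy engine denied the call (DENY verdict).",
    502: "Unexpected LLM error.",
    503: "All providers exhausted.",
}


class ChatApiError(Exception):
    """Hypothetical client-side error carrying the documented meaning."""

    def __init__(self, status: int, meaning: str, retryable: bool):
        super().__init__(f"{status}: {meaning}")
        self.status = status
        self.retryable = retryable


def check_status(status: int) -> None:
    """Return None on 200, otherwise raise ChatApiError."""
    if status == 200:
        return
    meaning = STATUS_MEANINGS.get(status, "Status code not listed in the table.")
    # Assumption: 400 and 403 will fail identically on retry; 502/503 may clear up.
    raise ChatApiError(status, meaning, retryable=status in (502, 503))
```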
Credentials Management¶
Manages encrypted per-organization API keys. Plaintext keys are never returned after creation; responses carry only a masked preview.
List Credentials¶
- Method: GET
- Path: /v1/llm/credentials
Returns all credentials scoped to the caller's org, each with a masked key preview (sk-ant-…abcd).
Create Credential¶
- Method: POST
- Path: /v1/llm/credentials
{
  "provider": "anthropic",
  "api_key": "sk-ant-…",
  "label": "production"
}
The response echoes the stored credential with the key already masked. The plaintext key is encrypted with the platform Fernet key before it reaches storage.
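A masked preview in the sk-ant-…abcd shape could be produced as follows. This is a sketch: mask_api_key is a hypothetical helper, and the choice of a 7-character prefix and 4-character suffix is an assumption inferred from that one sample, not a documented rule.

```python
def mask_api_key(api_key: str, prefix_len: int = 7, suffix_len: int = 4) -> str:
    """Return a preview that keeps only the start and end of the key."""
    if len(api_key) <= prefix_len + suffix_len:
        # Too short to mask safely: reveal nothing at all.
        return "…"
    return f"{api_key[:prefix_len]}…{api_key[-suffix_len:]}"
```

Short keys collapse to a bare ellipsis so that the preview never exposes most of a key.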
Revoke Credential¶
- Method: DELETE
- Path: /v1/llm/credentials/{credential_id}
Soft-deletes the credential (sets active = false). In-flight requests that have already loaded the key continue to work; subsequent lookups skip the row.
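The soft-delete behavior can be illustrated with an in-memory SQLite table. Only the table name llm_credentials and the active flag come from this page; the other columns, the helper names, and the use of SQLite are assumptions made for the sketch.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the credentials table (schema is assumed)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE llm_credentials (
            id TEXT PRIMARY KEY,
            org_id TEXT NOT NULL,
            provider TEXT NOT NULL,
            encrypted_key BLOB NOT NULL,
            active INTEGER NOT NULL DEFAULT 1
        )
        """
    )
    return conn


def revoke_credential(conn: sqlite3.Connection, credential_id: str) -> bool:
    """Soft-delete: flip active to 0 and keep the row. True if a row changed."""
    cur = conn.execute(
        "UPDATE llm_credentials SET active = 0 WHERE id = ? AND active = 1",
        (credential_id,),
    )
    conn.commit()
    return cur.rowcount == 1


def find_active_credential(conn: sqlite3.Connection, org_id: str, provider: str):
    """Lookup used at call time: revoked rows are skipped by the active filter."""
    return conn.execute(
        "SELECT id, encrypted_key FROM llm_credentials "
        "WHERE org_id = ? AND provider = ? AND active = 1",
        (org_id, provider),
    ).fetchone()
```

A request that fetched the key before the revoke still holds it in memory, which matches the in-flight behavior described above; only later lookups return nothing.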
Failover Order¶
Each organization may override the platform-default failover order.
Get Order¶
- Method: GET
- Path: /v1/llm/provider-order
{
  "order": ["anthropic", "openai"],
  "source": "org",
  "default": ["anthropic", "openai"]
}
source is "org" when the org has set an explicit override, "default" when falling back to the platform configuration.
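The override rule above amounts to a simple fallback. The sketch below reproduces the documented response shape; resolve_provider_order is a hypothetical name, and treating an empty override the same as no override is an assumption.

```python
from typing import Optional, Sequence


def resolve_provider_order(
    org_override: Optional[Sequence[str]],
    platform_default: Sequence[str],
) -> dict:
    """Return the effective failover order in the GET /v1/llm/provider-order shape."""
    if org_override:
        order, source = list(org_override), "org"
    else:
        # No explicit override: fall back to the platform configuration.
        order, source = list(platform_default), "default"
    return {"order": order, "source": source, "default": list(platform_default)}
```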
Set Order¶
- Method: PUT
- Path: /v1/llm/provider-order
{ "order": ["openai", "anthropic"] }
Provider Health¶
Exposes recent probe results and failover events. Probes run every 60 s via Celery beat against a cheap model; failures feed the shared Redis circuit breaker.
- Method: GET
- Path: /v1/llm/health?hours=24
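How probe failures might feed a circuit breaker can be sketched with an in-memory stand-in for the shared Redis state. Everything here is illustrative: the CircuitBreaker class, the failure threshold, and the cooldown are assumptions, since this page does not specify the breaker's parameters.

```python
import time
from typing import Callable, Dict


class CircuitBreaker:
    """In-memory sketch; the platform keeps this state in Redis."""

    def __init__(
        self,
        failure_threshold: int = 3,      # assumed value
        cooldown_s: float = 120.0,       # assumed value
        clock: Callable[[], float] = time.monotonic,
    ):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.clock = clock
        self._failures: Dict[str, int] = {}
        self._opened_at: Dict[str, float] = {}

    def record(self, provider: str, ok: bool) -> None:
        """Record one probe result for a provider."""
        if ok:
            self._failures[provider] = 0
            self._opened_at.pop(provider, None)
            return
        self._failures[provider] = self._failures.get(provider, 0) + 1
        if self._failures[provider] >= self.failure_threshold:
            self._opened_at[provider] = self.clock()

    def is_open(self, provider: str) -> bool:
        """True while the provider should be skipped in the failover order."""
        opened = self._opened_at.get(provider)
        if opened is None:
            return False
        if self.clock() - opened >= self.cooldown_s:
            # Cooldown elapsed: let traffic through again (simplified half-open).
            del self._opened_at[provider]
            self._failures[provider] = 0
            return False
        return True
```

Injecting the clock makes the cooldown testable without waiting in real time.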
Response¶
| Field | Type | Description |
|---|---|---|
| latest | ProviderHealthSnapshot[] | Newest snapshot per provider. |
| history | ProviderHealthSnapshot[] | All snapshots in the window (max 500). |
| recent_failovers | FailoverEvent[] | Audit rows from the caller's org where failover_hops > 0. |
ProviderHealthSnapshot fields: provider, status (healthy | degraded | unhealthy), latency_ms, error, probe_kind, checked_at.
FailoverEvent fields: created_at, provider_used, failover_hops, primary_error, duration_ms.
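A dashboard or alerting job could condense the health response as follows. This is a sketch that relies only on the field names listed above; summarize_health is a hypothetical helper, and the sample values in the usage lines are invented for illustration.

```python
def summarize_health(health: dict) -> dict:
    """Group providers by latest status and count recent failovers."""
    by_status = {"healthy": [], "degraded": [], "unhealthy": []}
    for snap in health.get("latest", []):
        by_status.setdefault(snap["status"], []).append(snap["provider"])

    failovers = health.get("recent_failovers", [])
    return {
        "by_status": by_status,
        "failover_count": len(failovers),
        "max_failover_hops": max((e["failover_hops"] for e in failovers), default=0),
    }


# Illustrative payload, not real probe data.
sample = {
    "latest": [
        {"provider": "anthropic", "status": "degraded"},
        {"provider": "openai", "status": "healthy"},
    ],
    "history": [],
    "recent_failovers": [{"provider_used": "openai", "failover_hops": 1}],
}
summary = summarize_health(sample)
```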
Related¶
- Audit API: per-call rows with provider_used, failover_hops, and the attempts chain.
- Platform overview (Model Continuity): narrative description of the HA pair.