
POST /v1/chat/completions

OpenAI-compatible chat completions, streaming and non-streaming.

If you have code that talks to https://api.openai.com/v1/chat/completions, point the base URL at https://api.olava.dev/v1 and your existing client should work.

Auth

Send the header Authorization: Bearer olv_sk_... on each request; see Authentication.
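A minimal sketch of building an authenticated request with Python's standard library. The helper name build_chat_request and the Content-Type header are assumptions for illustration; only the base URL, the path, and the Bearer scheme come from this page. The key is left as the placeholder olv_sk_... and must be replaced with a real key.

```python
import json
import urllib.request

OLAVA_BASE_URL = "https://api.olava.dev/v1"  # base URL given on this page


def build_chat_request(api_key, messages, model=None):
    """Build (but do not send) a POST request for /chat/completions.

    Hypothetical helper, not part of any official client.
    """
    body = {"messages": messages}
    if model is not None:
        body["model"] = model  # optional; the service falls back to its default
    return urllib.request.Request(
        url=f"{OLAVA_BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumption: JSON bodies are sent as application/json.
            "Content-Type": "application/json",
        },
        method="POST",
    )


# "olv_sk_..." is a placeholder; substitute your own key before sending.
req = build_chat_request("olv_sk_...", [{"role": "user", "content": "Hello"}])
# To send: urllib.request.urlopen(req), then JSON-decode the response body.
```

Building the request separately from sending it keeps the URL and header logic easy to inspect before any network call is made.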

Request body

The body shape mirrors OpenAI's /v1/chat/completions. Fields we care about:

| Field | Required | Notes |
| --- | --- | --- |
| model | no | Defaults to olava-extract. |
| messages | yes (or prompt) | Standard OpenAI message list. |
| prompt | yes (or messages) | Plain prompt; used by some legacy clients. |
| stream | no | When true, returns Server-Sent Events. |
| max_tokens | no | Clamped to 4,096. Higher values, missing values, zero, or non-integer values are treated as 4,096. |
| Standard OpenAI params (temperature, top_p, stop, etc.) | no | Honored. |
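To make the max_tokens rule concrete, here is a client-side sketch of the value the service would be expected to use for a given input. The function names are hypothetical, and treating negative values like zero is an assumption: this page does not say how negatives are handled.

```python
MAX_TOKENS_CAP = 4096  # cap stated in the field notes


def effective_max_tokens(value=None):
    """Return the max_tokens value expected to apply for a requested value."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        return MAX_TOKENS_CAP  # missing or non-integer
    if value <= 0:
        return MAX_TOKENS_CAP  # zero per the docs; negatives are an assumption
    return min(value, MAX_TOKENS_CAP)  # higher values are clamped


def has_required_input(body):
    """A request needs either messages or prompt."""
    return "messages" in body or "prompt" in body
```

Only in-range positive integers pass through unchanged; everything else resolves to the cap.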

Size limits

  • Input above 32,768 tokens is rejected with 413 input_too_large.
  • A request body above 32 MB is rejected with a plain 413 (no error code).
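The body-size limit can be checked before sending. The sketch below is an assumption-laden pre-check: this page does not say whether MB means 10^6 or 2^20 bytes, so the smaller decimal value is used to stay on the safe side, and the function names are hypothetical. The token limit cannot be checked this way without the service's tokenizer.

```python
import json

# Assumption: decimal megabytes, the more conservative reading of "32 MB".
MAX_BODY_BYTES = 32 * 1000 * 1000


def encoded_body_size(payload):
    """Size in bytes of the payload once serialized as UTF-8 JSON."""
    return len(json.dumps(payload).encode("utf-8"))


def body_within_limit(payload):
    """True if the serialized payload is at or under the assumed byte limit."""
    return encoded_body_size(payload) <= MAX_BODY_BYTES
```

The measured size only matches what is sent if the request body is serialized the same way.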

Non-streaming response

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "olava-extract",
  "choices": [
    {"message": {"role": "assistant", "content": "Hi!"}, "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 7, "completion_tokens": 3, "total_tokens": 10}
}

usage is always populated.
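A short sketch of reading the fields shown above from a non-streaming response. The helper name summarize_completion is hypothetical; the sample is the response body from this section.

```python
import json

SAMPLE_RESPONSE = """
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "olava-extract",
  "choices": [
    {"message": {"role": "assistant", "content": "Hi!"}, "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 7, "completion_tokens": 3, "total_tokens": 10}
}
"""


def summarize_completion(raw_json):
    """Extract the first choice's text, finish reason, and the usage block."""
    data = json.loads(raw_json)
    choice = data["choices"][0]
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "usage": data["usage"],  # documented as always populated
    }


summary = summarize_completion(SAMPLE_RESPONSE)
print(summary["text"], summary["usage"]["total_tokens"])
```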

Streaming response

Set stream: true to receive Server-Sent Events. A final usage chunk arrives before [DONE]:

HTTP/1.1 200 OK
Content-Type: text/event-stream

data: {"id":"chatcmpl-...","choices":[{"delta":{"content":"Hi"}}]}

data: {"id":"chatcmpl-...","choices":[],"usage":{"prompt_tokens":7,"completion_tokens":3,"total_tokens":10}}

data: [DONE]
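The stream above can be consumed line by line. This is a minimal sketch, assuming the lines have already been read and decoded to text from the HTTP response; collect_stream is a hypothetical helper, and it ignores everything except data: lines.

```python
import json


def collect_stream(lines):
    """Join delta content and capture the final usage chunk, stopping at [DONE]."""
    text_parts = []
    usage = None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # blank separators and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            piece = choice.get("delta", {}).get("content")
            if piece:
                text_parts.append(piece)
        if chunk.get("usage"):
            usage = chunk["usage"]  # arrives in the last chunk before [DONE]
    return "".join(text_parts), usage


sample = [
    'data: {"id":"chatcmpl-...","choices":[{"delta":{"content":"Hi"}}]}',
    "",
    'data: {"id":"chatcmpl-...","choices":[],"usage":{"prompt_tokens":7,"completion_tokens":3,"total_tokens":10}}',
    "",
    "data: [DONE]",
]
text, usage = collect_stream(sample)
print(text, usage)
```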

Mid-stream cap abort

If your spend cap is reached mid-stream, the stream emits an error event and closes:

event: error
data: {"code":"spend_cap_exceeded","message":"Spend cap reached during this request. Raise your cap to continue.","docs_url":"https://docs.olava.dev/billing/spend_cap_exceeded"}

Chunks delivered before the abort are billable. See Spend cap.
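Detecting the cap abort requires tracking the SSE event: field, not just data: lines. The sketch below is one way to do that, assuming decoded text lines as input; iter_sse_events, raise_on_spend_cap, and SpendCapExceeded are hypothetical names.

```python
import json


class SpendCapExceeded(Exception):
    """Raised when the stream reports spend_cap_exceeded (hypothetical type)."""


def iter_sse_events(lines):
    """Yield (event_name, data) pairs; the event name defaults to 'message'."""
    event_name, data_lines = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":  # a blank line ends the current event
            if data_lines:
                yield event_name, "\n".join(data_lines)
            event_name, data_lines = "message", []
        elif line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
    if data_lines:  # stream closed without a trailing blank line
        yield event_name, "\n".join(data_lines)


def raise_on_spend_cap(event_name, data):
    """Raise SpendCapExceeded if this event is the documented cap abort."""
    if event_name == "error":
        err = json.loads(data)
        if err.get("code") == "spend_cap_exceeded":
            raise SpendCapExceeded(err.get("message", ""))
```

Text collected before the exception is still valid output, and per the note above it is billable.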

Service errors mid-stream

If the service hits an internal error after streaming has started:

data: {"error":"service_unavailable","status":500}
data: [DONE]

In that case you are not billed for the call: only fully completed calls produce a usage record.
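Unlike the cap abort, this error arrives as an ordinary data: payload, so a client has to inspect each chunk for an error key. A minimal sketch, with service_error_in as a hypothetical helper that takes the text after the data: prefix:

```python
import json


def service_error_in(payload):
    """Return the error object if the payload is a mid-stream service error, else None."""
    if payload == "[DONE]":
        return None
    chunk = json.loads(payload)
    if isinstance(chunk, dict) and "error" in chunk:
        return chunk
    return None
```

When this returns an error, one reasonable policy is to discard the partial text and retry the whole call, since the failed call is not billed.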

Status codes

| Status | Code | Notes |
| --- | --- | --- |
| 200 | | Success. |
| 400 | invalid_json_body, body_must_be_object | Malformed request body. |
| 401 | (various) | See Authentication. |
| 402 | onboarding_incomplete, spend_cap_exceeded, payment_failed, subscription_canceled | See Errors. |
| 413 | input_too_large | Input token count above 32,768. |
| 413 | | Body above 32 MB. |
| 429 | (various) | See Rate limits. |
| 503 | | Service temporarily unavailable. Honor Retry-After and try again. |
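A sketch of turning a response status and headers into a retry decision. It assumes Retry-After carries a number of seconds; the HTTP-date form that the header may also take is not parsed here and falls back to a default delay. The function name and the 1-second default are assumptions.

```python
RETRYABLE_STATUSES = {429, 503}  # the two statuses this page pairs with Retry-After


def retry_delay_seconds(status, headers, default=1.0):
    """Return how long to wait before retrying, or None if the status is not retryable."""
    if status not in RETRYABLE_STATUSES:
        return None
    value = headers.get("Retry-After")
    try:
        return max(0.0, float(value))
    except (TypeError, ValueError):
        # Header missing, or in a form this sketch does not parse.
        return default
```

The headers argument is treated as a plain mapping with the exact key Retry-After; adapt the lookup if your HTTP client normalizes header names differently.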

Rate limit

Plan against the per-key request rate. The default is approximately 10 requests per second (RPS) per key, and support can change it for an individual key. Honor Retry-After on 429. See Rate limits.
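One way to stay under a per-key rate is to space requests on the client. The sketch below is a simple pacer, not something the service requires; the class name is hypothetical, the 10-per-second default mirrors the approximate figure above, and the clock is injectable so the logic can be exercised without sleeping.

```python
import time


class Pacer:
    """Space calls at least 1/rate seconds apart (client-side smoothing)."""

    def __init__(self, rate_per_second=10.0, clock=time.monotonic):
        self.min_interval = 1.0 / rate_per_second
        self.clock = clock
        self.next_allowed = None  # earliest time the next call may start

    def wait_time(self):
        """Return seconds to sleep before the next request and book that slot."""
        now = self.clock()
        if self.next_allowed is None or now >= self.next_allowed:
            self.next_allowed = now + self.min_interval
            return 0.0
        delay = self.next_allowed - now
        self.next_allowed += self.min_interval
        return delay
```

Typical use is time.sleep(pacer.wait_time()) immediately before each request. This does not replace handling 429 responses, because the server-side limit is the authority.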

Billing guarantees

  • You are billed only for fully completed responses. If the call errors before any tokens are produced, or fails mid-stream due to a service issue, that call is not billable. The exception is a mid-stream spend cap abort, where chunks delivered before the abort are billable.
  • A reservation against your spend cap is taken before the call runs, and the difference between the reservation and the actual cost is refunded when the call finishes. Within rounding precision, you will never be billed past your monthly spend cap.
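The reserve-then-refund accounting can be illustrated with simple arithmetic. This is only a sketch of the idea: the function name, the integer units, and the figures in the example are all hypothetical, and this page does not say how the reservation amount is computed.

```python
def settle(cap_remaining, reserved, actual_cost):
    """Return the cap remaining after a reservation is taken and the unused part refunded.

    All amounts are integers in some smallest billing unit (an assumption).
    """
    after_reserve = cap_remaining - reserved  # held before the call runs
    refund = reserved - actual_cost           # returned when the call finishes
    return after_reserve + refund


# Hypothetical figures: 1000 units left, 400 reserved, 150 actually used.
print(settle(1000, 400, 150))  # 850
```

The net effect equals subtracting the actual cost. Taking the reservation up front is what keeps concurrent calls from jointly overshooting the cap.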
