POST /v1/chat/completions

OpenAI-compatible chat completions. If your code already calls https://api.openai.com/v1/chat/completions, point its base URL at https://api.olava.dev/v1 and authenticate with an Olava API key; your existing client should then work.


Auth

Send `Authorization: Bearer olv_sk_...` with every request. See Authentication.
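The sketch below builds an authenticated request without sending it, using only Python's standard library. The helper `build_chat_request` and the `OLAVA_API_KEY` environment variable are assumptions made for this example, not part of Olava's documented interface.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.olava.dev/v1"  # base URL from this page


def build_chat_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST to /v1/chat/completions.

    Hypothetical helper for illustration only.
    """
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # Bearer olv_sk_... key
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Assumption: the key lives in an OLAVA_API_KEY environment variable.
    key = os.environ.get("OLAVA_API_KEY", "olv_sk_placeholder")
    req = build_chat_request(key, {"messages": [{"role": "user", "content": "Hi"}]})
    print(req.get_method(), req.full_url)
    # To send it for real: urllib.request.urlopen(req)
```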

Request body

The request body mirrors OpenAI's /v1/chat/completions. Relevant fields:

Field	Required	Notes
`model`	no	Defaults to `olava-extract`.
`messages`	yes (or `prompt`)	Standard OpenAI message list.
`prompt`	yes (or `messages`)	Plain prompt; used by some legacy clients.
`stream`	no	When `true`, returns Server-Sent Events.
`max_tokens`	no	Clamped to 4,096. Higher values, missing values, zero, or non-integer values are treated as 4,096.
Standard OpenAI params (`temperature`, `top_p`, `stop`, etc.)	no	Honored.
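The `max_tokens` rule and the `messages`/`prompt` requirement can be mirrored client-side so a request's effective values are visible before it is sent. This is a sketch: `effective_max_tokens` and `build_body` are hypothetical helpers, and the handling of booleans, whole-number floats such as `100.0`, and negative values is an assumption the table does not settle.

```python
from typing import Any, Optional

MAX_TOKENS_CAP = 4096  # documented clamp for max_tokens


def effective_max_tokens(value: Any = None) -> int:
    """Mirror the documented max_tokens rule on the client side.

    Missing, zero, non-integer, or above-cap values are treated as 4,096.
    Assumptions (not stated on this page): bool counts as non-integer,
    floats such as 100.0 count as non-integer, and negative values are
    rejected locally because their server-side handling is undocumented.
    """
    if value is None or isinstance(value, bool) or not isinstance(value, int):
        return MAX_TOKENS_CAP
    if value < 0:
        raise ValueError("negative max_tokens: server behavior is undocumented")
    if value == 0 or value > MAX_TOKENS_CAP:
        return MAX_TOKENS_CAP
    return value


def build_body(messages: Optional[list] = None,
               prompt: Optional[str] = None,
               **params: Any) -> dict:
    """Assemble a request body (hypothetical helper, not part of any SDK).

    `params` carries optional fields such as model, stream, temperature,
    top_p, or stop.
    """
    if messages is None and prompt is None:
        raise ValueError("either messages or prompt is required")
    body: dict = dict(params)
    if messages is not None:
        body["messages"] = messages
    if prompt is not None:
        body["prompt"] = prompt
    if "max_tokens" in body:
        body["max_tokens"] = effective_max_tokens(body["max_tokens"])
    return body
```

Omitting `model` lets the server apply its `olava-extract` default.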

Size limits

Input above 32,768 tokens is rejected with 413 `input_too_large`.
A request body above 32 MB is rejected with a plain 413 (no error code).
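The 32,768-token input limit cannot be checked exactly without the model's tokenizer, which this page does not specify, so the sketch below only guards body size and leaves `input_too_large` to the server. `encode_body` is a hypothetical helper, and reading "32 MB" as 32,000,000 bytes is an assumption chosen to be conservative whichever definition of MB applies.

```python
import json

# The page says "32 MB" without saying whether MB is 10**6 or 2**20 bytes;
# this sketch uses the smaller reading so the check is conservative either way.
MAX_BODY_BYTES = 32 * 1000 * 1000


def encode_body(body: dict, max_bytes: int = MAX_BODY_BYTES) -> bytes:
    """Serialize a request body, failing fast before the server's plain 413."""
    raw = json.dumps(body, separators=(",", ":")).encode("utf-8")
    if len(raw) > max_bytes:
        raise ValueError(f"body is {len(raw)} bytes, over the {max_bytes}-byte limit")
    return raw
```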

Non-streaming response

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "olava-extract",
  "choices": [
    {"message": {"role": "assistant", "content": "Hi!"}, "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 7, "completion_tokens": 3, "total_tokens": 10}
}

The `usage` object is always populated.
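A minimal sketch of reading a non-streaming response with the standard `json` module. The `Completion` dataclass and `parse_completion` are hypothetical names, and the sketch reads only the first choice, matching the example above.

```python
import json
from dataclasses import dataclass


@dataclass
class Completion:
    text: str
    finish_reason: str
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int


def parse_completion(raw: str) -> Completion:
    """Pull the first choice and the usage block out of a non-streaming response."""
    payload = json.loads(raw)
    choice = payload["choices"][0]
    usage = payload["usage"]  # documented as always populated
    return Completion(
        text=choice["message"]["content"],
        finish_reason=choice["finish_reason"],
        prompt_tokens=usage["prompt_tokens"],
        completion_tokens=usage["completion_tokens"],
        total_tokens=usage["total_tokens"],
    )
```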

Streaming response

Set `stream: true` to receive Server-Sent Events. Content arrives in `delta` chunks, and a final chunk with an empty `choices` array and a `usage` object arrives before `data: [DONE]`:

HTTP/1.1 200 OK
Content-Type: text/event-stream

data: {"id":"chatcmpl-...","choices":[{"delta":{"content":"Hi"}}]}

data: {"id":"chatcmpl-...","choices":[],"usage":{"prompt_tokens":7,"completion_tokens":3,"total_tokens":10}}

data: [DONE]
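One way to consume the stream in Python is a small line-based SSE reader plus an accumulator that joins the `delta` contents and keeps the final `usage` chunk. This sketch handles only the SSE features shown on this page (`event:`, `data:`, comment lines, blank-line dispatch); `sse_events` and `collect_stream` are hypothetical names, and error events are not singled out.

```python
import json
from typing import Iterable, Iterator, List, Optional, Tuple


def sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Group decoded SSE lines into (event_name, data) pairs.

    Covers only the subset this endpoint uses: `event:` and `data:` fields,
    `:` comment lines, and a blank line to dispatch. As in the SSE spec, an
    event with no terminating blank line is dropped.
    """
    event = "message"
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment / keep-alive line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the spec strips one leading space
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)


def collect_stream(lines: Iterable[str]) -> Tuple[str, Optional[dict]]:
    """Join content deltas and keep the usage chunk sent before [DONE]."""
    parts: List[str] = []
    usage: Optional[dict] = None
    for _event, data in sse_events(lines):
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            parts.append(choice.get("delta", {}).get("content") or "")
        if chunk.get("usage"):
            usage = chunk["usage"]
    return "".join(parts), usage
```

With `urllib.request.urlopen`, the response yields raw byte lines, so decode them first, for example `collect_stream(line.decode("utf-8") for line in resp)`.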

Mid-stream cap abort

If your spend cap is reached mid-stream, the stream emits an error event and closes:

event: error
data: {"code":"spend_cap_exceeded","message":"Spend cap reached during this request. Raise your cap to continue.","docs_url":"https://docs.olava.dev/billing/spend_cap_exceeded"}

Chunks delivered before the abort are billable. See Spend cap.
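A client can turn this event into a dedicated exception while keeping whatever text it has already accumulated, since those chunks are billed. In this sketch `SpendCapExceeded` and `check_stream_event` are hypothetical names; it expects `(event_name, data)` pairs from an SSE reader, and any other `error` code (none are documented here) is raised as a generic `RuntimeError`.

```python
import json


class SpendCapExceeded(Exception):
    """Hypothetical client-side exception for the mid-stream spend-cap abort."""

    def __init__(self, message: str, docs_url: str = "") -> None:
        super().__init__(message)
        self.docs_url = docs_url


def check_stream_event(event: str, data: str) -> None:
    """Raise if an SSE event is an `error` event; do nothing otherwise."""
    if event != "error":
        return
    err = json.loads(data)
    if err.get("code") == "spend_cap_exceeded":
        raise SpendCapExceeded(err.get("message", ""), err.get("docs_url", ""))
    # Other codes are not documented for this event; surface them generically.
    raise RuntimeError(f"stream error: {err.get('code', 'unknown')}")
```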

Service errors mid-stream

If the service hits an internal error after streaming has started:

data: {"error":"service_unavailable","status":500}

data: [DONE]

In that case you are not billed for the call: only fully completed calls produce a usage record.
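Because this error arrives as an ordinary `data:` payload rather than as an `error` event, the client has to inspect each chunk for an `error` key. A sketch, with `StreamServiceError` and `check_data_payload` as hypothetical names. Since the call is not billed, discarding the partial output and retrying is a reasonable client choice, though not one this page prescribes.

```python
import json
from typing import Optional


class StreamServiceError(Exception):
    """Hypothetical client-side exception for the in-stream service error payload."""

    def __init__(self, error: str, status: Optional[int]) -> None:
        super().__init__(f"{error} (status {status})")
        self.error = error
        self.status = status


def check_data_payload(data: str) -> Optional[dict]:
    """Return the parsed chunk, None for [DONE], or raise on a service error payload."""
    if data == "[DONE]":
        return None
    chunk = json.loads(data)
    if "error" in chunk:
        raise StreamServiceError(chunk["error"], chunk.get("status"))
    return chunk
```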

Status codes

Status	Code	Notes
200	—	Success.
400	`invalid_json_body`, `body_must_be_object`	Malformed request body.
401	(various)	See Authentication.
402	`onboarding_incomplete`, `spend_cap_exceeded`, `payment_failed`, `subscription_canceled`	See Errors.
413	`input_too_large`	Input token count above 32,768.
413	—	Body above 32 MB.
429	(various)	See Rate limits.
503	—	Service temporarily unavailable. Honor `Retry-After` and try again.
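Only 429 and 503 are described here as retryable after waiting. The sketch below computes a wait time from `Retry-After`, accepting both the delay-seconds and HTTP-date forms that HTTP defines, and falls back to capped exponential backoff when the header is absent or unparseable. `retry_delay`, its `base`/`cap` defaults, and the backoff fallback are assumptions, not documented Olava behavior.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional

# Statuses this page says to retry after waiting; others are not retried here.
RETRYABLE_STATUSES = {429, 503}


def retry_delay(status: int, headers: Mapping[str, str], attempt: int,
                base: float = 1.0, cap: float = 30.0,
                now: Optional[datetime] = None) -> Optional[float]:
    """Return seconds to wait before retrying, or None if `status` isn't retryable."""
    if status not in RETRYABLE_STATUSES:
        return None
    value = (headers.get("Retry-After") or "").strip()
    if value.isascii() and value.isdigit():
        return float(value)  # delay-seconds form
    if value:
        try:
            when = parsedate_to_datetime(value)  # HTTP-date form
        except (TypeError, ValueError):
            when = None
        if when is not None:
            if when.tzinfo is None:
                when = when.replace(tzinfo=timezone.utc)
            current = now or datetime.now(timezone.utc)
            return max(0.0, (when - current).total_seconds())
    # No usable Retry-After: capped exponential backoff (our choice, not documented).
    return min(cap, base * (2 ** attempt))
```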

Rate limit

Plan against the per-key request rate. The default is approximately 10 requests per second (RPS) per key, and support can change it for an individual key. On a 429, honor the `Retry-After` header before retrying. See Rate limits.
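To stay under the limit proactively, a client can pace its own requests. This sketch uses simple fixed-interval pacing (at most one request every `1/rate` seconds) rather than a token bucket; the `Throttle` class is hypothetical, and `rate=10.0` assumes your key still has the default limit. Pacing does not replace honoring `Retry-After` on a 429.

```python
import threading
import time
from typing import Callable


class Throttle:
    """Space calls at least 1/rate seconds apart (fixed-interval pacing).

    Assumption: 10 RPS is the default per-key limit from this page; your
    key's limit may differ if support has changed it.
    """

    def __init__(self, rate: float = 10.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.interval = 1.0 / rate
        self.clock = clock
        self.sleep = sleep
        self._next = 0.0  # earliest time the next call may start
        self._lock = threading.Lock()

    def wait(self) -> None:
        """Block until the caller may send its next request."""
        with self._lock:  # serializes callers so spacing holds across threads
            now = self.clock()
            if now < self._next:
                self.sleep(self._next - now)
                now = self._next
            self._next = now + self.interval
```

Call `throttle.wait()` immediately before each request; the injectable `clock` and `sleep` make the pacing testable without real delays.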

Billing guarantees

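You are billed only for fully completed responses, with one exception: if your spend cap is reached mid-stream, the chunks delivered before the abort are billable (see Mid-stream cap abort). If a call errors before any tokens are produced, or fails mid-stream because of a service issue, it is not billable.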
Before a call runs, an amount is reserved against your spend cap; when the call finishes, the difference between the reservation and the actual cost is refunded. Apart from rounding, you are never billed past your monthly spend cap.
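To make the reserve-then-refund arithmetic concrete, here is a toy model. Amounts are abstract currency units; how Olava sizes a reservation and prices tokens is not described on this page, so `SpendCapModel`, its `reserve`/`settle` methods, and the rule that billing is capped at the reservation are all assumptions of the model, not the service's implementation.

```python
from decimal import Decimal


class SpendCapModel:
    """Toy model of reserve-then-settle accounting against a monthly cap.

    Illustrative only: real reservation sizes and pricing are not documented here.
    """

    def __init__(self, cap: Decimal) -> None:
        self.cap = cap
        self.billed = Decimal("0")    # settled charges this month
        self.reserved = Decimal("0")  # held for in-flight calls

    def reserve(self, amount: Decimal) -> bool:
        """Hold `amount` before a call runs; refuse if it would exceed the cap."""
        if self.billed + self.reserved + amount > self.cap:
            return False
        self.reserved += amount
        return True

    def settle(self, reserved: Decimal, actual: Decimal) -> None:
        """Release the hold and bill the actual cost (0 for a non-billable call).

        Model assumption: billing never exceeds the reservation, which keeps
        billed + reserved <= cap at all times.
        """
        self.reserved -= reserved
        self.billed += min(actual, reserved)
```

Reserving 0.50 and settling at an actual cost of 0.12 bills 0.12 and returns the other 0.38 to the available cap.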