POST /v1/chat/completions
OpenAI-compatible chat completions, streaming and non-streaming.
If you have code that talks to `https://api.openai.com/v1/chat/completions`, point its base URL at `https://api.olava.dev/v1` and your existing client should work.
Auth
Send your API key as a bearer token: `Authorization: Bearer olv_sk_...`. See Authentication.
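If you are not using an OpenAI SDK, the request can be assembled with Python's standard library alone. This is a minimal sketch: `build_chat_request` and the `olv_sk_PLACEHOLDER` key are illustrative names, not part of the API, and the sketch only builds the request without sending it. `model` is omitted, so it defaults to `olava-extract`.

```python
import json
import urllib.request

OLAVA_BASE_URL = "https://api.olava.dev/v1"


def build_chat_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build (without sending) a POST /v1/chat/completions request."""
    return urllib.request.Request(
        url=f"{OLAVA_BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Bearer auth with an olv_sk_ key, as described under Auth.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# "olv_sk_PLACEHOLDER" is a made-up key; substitute your own.
req = build_chat_request(
    "olv_sk_PLACEHOLDER",
    {"messages": [{"role": "user", "content": "Hello"}]},
)
# Sending it: urllib.request.urlopen(req) (needs a real key and network access).
```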
Request body
The body mirrors the request shape of OpenAI's `/v1/chat/completions`. Relevant fields:
| Field | Required | Notes |
|---|---|---|
| `model` | no | Defaults to `olava-extract`. |
| `messages` | yes (or `prompt`) | Standard OpenAI message list. |
| `prompt` | yes (or `messages`) | Plain prompt; used by some legacy clients. |
| `stream` | no | When `true`, returns Server-Sent Events. |
| `max_tokens` | no | Clamped to 4,096. Higher values, missing values, zero, or non-integer values are treated as 4,096. |
| Standard OpenAI params (`temperature`, `top_p`, `stop`, etc.) | no | Honored. |
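The `max_tokens` clamp and the `messages`/`prompt` requirement can be mirrored client-side, for example so your logs show the limit a request will actually run with. A sketch under stated assumptions: `effective_max_tokens` and `build_body` are hypothetical helpers, the server remains the authority, and because the table does not cover negative values, this sketch rejects them rather than guessing.

```python
MAX_TOKENS_CAP = 4096  # documented clamp for max_tokens


def effective_max_tokens(value=None) -> int:
    """Return the max_tokens value the server will use, per the table above.

    Missing, zero, non-integer, and above-cap values all become 4,096.
    """
    # bool is a subclass of int in Python; treat True/False as non-integers (assumption).
    if value is None or isinstance(value, bool) or not isinstance(value, int):
        return MAX_TOKENS_CAP
    if value < 0:
        # Not covered by the docs; this sketch refuses to guess.
        raise ValueError("negative max_tokens: server behavior is undocumented")
    if value == 0 or value > MAX_TOKENS_CAP:
        return MAX_TOKENS_CAP
    return value


def build_body(messages=None, prompt=None, **params) -> dict:
    """Assemble a request body; at least one of messages or prompt is required."""
    if messages is None and prompt is None:
        raise ValueError("either messages or prompt is required")
    # Other standard OpenAI params (temperature, top_p, stop, stream, model) pass through.
    body = dict(params)
    # The docs don't say what happens if both are sent; this sketch passes both through.
    if messages is not None:
        body["messages"] = messages
    if prompt is not None:
        body["prompt"] = prompt
    # Send the effective value explicitly so it is visible in your own logs.
    body["max_tokens"] = effective_max_tokens(params.get("max_tokens"))
    return body


body = build_body(messages=[{"role": "user", "content": "Hi"}], max_tokens=10_000)
# body["max_tokens"] is now 4096
```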
Size limits
- Input above 32,768 tokens is rejected with `413 input_too_large`.
- A request body above 32 MB is rejected with a plain `413` (no error code).
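A client can catch the body-size limit before uploading anything. The token limit is harder to pre-check because it depends on the service's tokenizer, which this page does not describe, so this sketch leaves that check to the server's `413 input_too_large`. `encode_body` is a hypothetical helper, and since the page does not say whether "32 MB" means 10^6 or 2^20 bytes per MB, it uses the smaller reading (32,000,000 bytes).

```python
import json

# The docs say "32 MB" without defining MB; this uses the smaller reading
# (10**6 bytes per MB) so the check never lets an oversized body through.
MAX_BODY_BYTES = 32 * 10**6


def encode_body(body: dict) -> bytes:
    """Serialize a request body, refusing bodies that would get a plain 413."""
    data = json.dumps(body).encode("utf-8")
    if len(data) > MAX_BODY_BYTES:
        raise ValueError(f"request body is {len(data):,} bytes; the limit is 32 MB")
    # The 32,768-token input limit is not checked here: counting tokens needs
    # the service's tokenizer, so that check is left to the server.
    return data
```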
Non-streaming response
```json
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "olava-extract",
  "choices": [
    {"message": {"role": "assistant", "content": "Hi!"}, "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 7, "completion_tokens": 3, "total_tokens": 10}
}
```

`usage` is always populated.
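Reading a non-streaming response needs only the first choice and the `usage` object. A minimal sketch; `read_completion` is a hypothetical helper, and the sample is the response body from this page.

```python
import json


def read_completion(raw: str):
    """Return (assistant text, finish_reason, usage) from a non-streaming body."""
    resp = json.loads(raw)
    choice = resp["choices"][0]
    # The docs state usage is always populated, so index it directly.
    return choice["message"]["content"], choice["finish_reason"], resp["usage"]


sample = """{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "olava-extract",
  "choices": [
    {"message": {"role": "assistant", "content": "Hi!"}, "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 7, "completion_tokens": 3, "total_tokens": 10}
}"""

text, finish_reason, usage = read_completion(sample)
```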
Streaming response
Set `stream: true` to receive Server-Sent Events. Content arrives in `delta` chunks; a final chunk with an empty `choices` array and the `usage` totals arrives before `[DONE]`:

```text
HTTP/1.1 200 OK
Content-Type: text/event-stream

data: {"id":"chatcmpl-...","choices":[{"delta":{"content":"Hi"}}]}

data: {"id":"chatcmpl-...","choices":[],"usage":{"prompt_tokens":7,"completion_tokens":3,"total_tokens":10}}

data: [DONE]
```
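A streaming client has to split the stream into events, concatenate the `delta` contents, and keep the trailing `usage` chunk. The parser below is a simplified sketch covering only the SSE fields this endpoint uses (`event:` and `data:`; it ignores `id:` and `retry:`), not a full SSE implementation. `iter_sse_events` and `collect_stream` are hypothetical names. It raises on either mid-stream error form instead of returning partial text.

```python
import json


def iter_sse_events(lines):
    """Yield (event_name, data) pairs from an iterable of SSE text lines.

    Simplified SSE handling: a blank line ends an event, ':' lines are comments,
    several data: lines are joined with newlines, and id:/retry: are ignored.
    """
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)
    if data:  # tolerate a final event with no trailing blank line
        yield event, "\n".join(data)


def collect_stream(lines):
    """Return (full_text, usage) from a chat-completions SSE stream."""
    parts, usage = [], None
    for event, data in iter_sse_events(lines):
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        if event == "error" or "error" in chunk:
            # Spend-cap abort or mid-stream service error: not a complete answer.
            raise RuntimeError(f"stream aborted ({event}): {data}")
        for choice in chunk.get("choices", []):
            parts.append(choice.get("delta", {}).get("content") or "")
        if chunk.get("usage"):
            usage = chunk["usage"]
    return "".join(parts), usage


sample = [
    'data: {"id":"chatcmpl-...","choices":[{"delta":{"content":"Hi"}}]}', "",
    'data: {"id":"chatcmpl-...","choices":[],"usage":{"prompt_tokens":7,"completion_tokens":3,"total_tokens":10}}', "",
    "data: [DONE]", "",
]
text, usage = collect_stream(sample)
```

To read a live response opened with `urllib.request.urlopen(req)`, pass `(raw.decode("utf-8") for raw in resp)` as `lines`.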
Mid-stream cap abort
If your spend cap is reached mid-stream, the stream emits an `error` event and closes:

```text
event: error
data: {"code":"spend_cap_exceeded","message":"Spend cap reached during this request. Raise your cap to continue.","docs_url":"https://docs.olava.dev/billing/spend_cap_exceeded"}
```
Chunks delivered before the abort are billable. See Spend cap.
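Because chunks delivered before a spend-cap abort are billable, a client may want to keep the partial text rather than discard it. A minimal sketch, assuming your SSE reader already hands you each event's name and `data` payload along with the text accumulated so far; `check_error_event` and `SpendCapExceeded` are hypothetical names.

```python
import json


class SpendCapExceeded(Exception):
    """The stream was cut off by the spend-cap error event."""

    def __init__(self, message: str, docs_url: str, partial_text: str):
        super().__init__(message)
        self.docs_url = docs_url
        # Chunks delivered before the abort are billable, so keep what arrived.
        self.partial_text = partial_text


def check_error_event(event: str, data: str, partial_text: str) -> None:
    """Raise if (event, data) is an SSE error event; do nothing otherwise."""
    if event != "error":
        return
    payload = json.loads(data)
    if payload.get("code") == "spend_cap_exceeded":
        raise SpendCapExceeded(payload["message"], payload["docs_url"], partial_text)
    # Any other error code is not described on this page.
    raise RuntimeError(f"stream error event: {data}")
```

Call it for each event before appending that event's content; on `SpendCapExceeded`, show `partial_text` and point the user at `docs_url`.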
Service errors mid-stream
If the service hits an internal error after streaming has started, the stream ends with:

```text
data: {"error":"service_unavailable","status":500}

data: [DONE]
```

In that case you are not billed for the call; only fully completed calls produce a usage record.
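A normal `data: [DONE]` still follows this error payload, so a client that only watches for `[DONE]` would treat truncated output as a complete answer. A minimal check, assuming you examine each `data` payload as it arrives; `is_service_error` is a hypothetical helper. Since the call is not billed, discarding the partial text and retrying is a reasonable response.

```python
import json


def is_service_error(data: str) -> bool:
    """True if an SSE data payload is the documented mid-stream service error."""
    if data == "[DONE]":
        return False
    payload = json.loads(data)
    return isinstance(payload, dict) and "error" in payload


payloads = ['{"error":"service_unavailable","status":500}', "[DONE]"]
# Check every payload before trusting [DONE]; here the first one is an error.
failed = any(is_service_error(p) for p in payloads)
```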
Status codes
| Status | Code | Notes |
|---|---|---|
| 200 | — | Success. |
| 400 | invalid_json_body, body_must_be_object | Malformed request body. |
| 401 | (various) | See Authentication. |
| 402 | onboarding_incomplete, spend_cap_exceeded, payment_failed, subscription_canceled | See Errors. |
| 413 | input_too_large | Input token count above 32,768. |
| 413 | — | Body above 32 MB. |
| 429 | (various) | See Rate limits. |
| 503 | — | Service temporarily unavailable. Honor Retry-After and try again. |
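The two retryable statuses on this page, `429` and `503`, both call for honoring `Retry-After`. Per the HTTP specification, that header carries either a number of seconds or an HTTP date, so the sketch below handles both. The fallback used when the header is missing or unparseable (1, 2, 4, ... seconds, capped at 30) is an assumption, not documented behavior, and `retry_delay_seconds` is a hypothetical helper.

```python
import email.utils
from datetime import datetime, timezone
from typing import Optional

RETRYABLE_STATUSES = {429, 503}  # statuses this page says to retry after Retry-After


def retry_delay_seconds(
    status: int,
    retry_after: Optional[str],
    attempt: int,
    now: Optional[datetime] = None,
) -> Optional[float]:
    """Seconds to wait before retrying, or None if the status is not retryable."""
    if status not in RETRYABLE_STATUSES:
        return None
    if retry_after:
        value = retry_after.strip()
        if value.isdigit():  # delta-seconds form, e.g. "3"
            return float(value)
        try:  # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
            when = email.utils.parsedate_to_datetime(value)
        except (TypeError, ValueError):
            when = None
        if when is not None:
            if when.tzinfo is None:
                when = when.replace(tzinfo=timezone.utc)
            now = now or datetime.now(timezone.utc)
            return max(0.0, (when - now).total_seconds())
    # No usable Retry-After: capped exponential backoff (assumed, not documented).
    return float(min(2 ** attempt, 30))
```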
Rate limit
Plan against the per-key request rate: the default is approximately 10 requests per second (RPS) per key, and support can configure it per key. Honor `Retry-After` on `429`. See Rate limits.
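To stay under the per-key rate instead of relying on `429` responses, a client can space its requests out. The sketch below hands out evenly spaced request slots, a simple fixed-interval pacer (one option among several; a token bucket would allow short bursts instead). `Pacer` is a hypothetical class, and its 10 RPS default only mirrors the documented default; pass your key's actual limit if support has changed it.

```python
import threading
import time


class Pacer:
    """Hand out evenly spaced request slots so one key stays under `rps`."""

    def __init__(self, rps=10.0, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / rps  # seconds between request starts
        self.clock = clock
        self.sleep = sleep
        self.next_slot = float("-inf")
        self.lock = threading.Lock()

    def wait(self):
        """Block until this caller's slot arrives."""
        with self.lock:
            now = self.clock()
            slot = max(now, self.next_slot)
            self.next_slot = slot + self.interval
        # Sleep outside the lock so other threads can reserve later slots.
        delay = slot - now
        if delay > 0:
            self.sleep(delay)


pacer = Pacer(rps=10.0)  # documented default; use your key's actual limit
# Call pacer.wait() immediately before each request.
```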
Billing guarantees
- You are billed only for fully completed responses. If a call errors before any tokens are produced, or fails mid-stream because of a service issue, it is not billable. Chunks delivered before a spend-cap abort, however, are billable (see Mid-stream cap abort).
- A reservation is taken against your spend cap before the call runs, and the difference between the reservation and the final cost is refunded when the call finishes. You are never billed past your monthly spend cap, to within rounding precision.