Rate limits
Per-key rate limits and how to back off correctly.
The chat completions endpoint is rate-limited per key. When you exceed a
limit, the response is 429 Too Many Requests with a Retry-After header
giving the number of seconds to wait. Always honor Retry-After: it is the
authoritative back-off signal. The response body looks like this:
{
  "error": {
    "type": "rate_limited",
    "code": "api_key_rate_limited",
    "message": "API key request limit exceeded. Retry after the window rolls.",
    "docs_url": "https://docs.olava.dev/api/rate_limits"
  }
}
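To tell a rate-limit response apart from other failures, a client can check the status code and then read the error object from the body. The following is a minimal sketch, not an official client: the function name parse_rate_limit_error is hypothetical, and it assumes the body has the shape shown above.

```python
import json


def parse_rate_limit_error(status_code, body):
    """Return the "error" object for a 429 response, or None otherwise.

    Hypothetical helper. A 429 whose body cannot be parsed still counts
    as rate-limited, so it yields an empty dict rather than None.
    """
    if status_code != 429:
        return None
    try:
        payload = json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return {}
    error = payload.get("error") if isinstance(payload, dict) else None
    return error if isinstance(error, dict) else {}
```

Callers can test the result against None to decide whether to back off, and log the "code" and "message" fields when they are present.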
Ceiling
The default per-key ceiling is approximately 10 requests per second, smoothed over a rolling window. If you need a higher ceiling for legitimate batch workloads, contact support; limits are configurable per key.
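One way to stay under the ceiling is to pace requests on the client side. The sketch below spaces calls at a fixed minimum interval. The Pacer class and its default of 8 requests per second are assumptions chosen to leave headroom under the approximate 10 requests per second figure; they are not values from the API. The class is not thread-safe.

```python
import time


class Pacer:
    """Space calls at least 1/rate seconds apart (simple client-side throttle)."""

    def __init__(self, rate_per_second=8.0, clock=time.monotonic, sleep=time.sleep):
        # 8.0 is an assumed safety margin below the documented ~10 req/s.
        self.interval = 1.0 / rate_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_slot = clock()

    def wait(self):
        """Block until the next slot is free and return the seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_slot - now)
        if delay:
            self._sleep(delay)
        # Reserve the following slot, measured from whichever is later.
        self._next_slot = max(now, self._next_slot) + self.interval
        return delay
```

Call pacer.wait() immediately before each request. Pacing reduces how often you hit 429s but does not replace honoring Retry-After when one arrives.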
How to back off
- Read Retry-After. It's an integer number of seconds.
- Sleep that long. Do not retry sooner: the window is sliding, and an early retry just resets your own counter.
- If you keep hitting 429s despite honoring Retry-After, you are genuinely above the steady-state ceiling. Either spread traffic across multiple keys (one per environment) or contact support to raise the limit on your key.
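The steps above can be sketched as a small retry loop. This is an illustration under stated assumptions, not a reference implementation: send is a hypothetical callable you supply that performs one request and returns (status, headers, body), and max_attempts and default_wait are illustrative parameters, not API settings.

```python
import time


def retry_after_seconds(headers, default=1.0):
    """Read Retry-After (integer seconds) from headers, case-insensitively."""
    for name, value in headers.items():
        if name.lower() == "retry-after":
            try:
                return max(0.0, float(value))
            except ValueError:
                return default  # unexpected format: fall back to the default
    return default  # header missing: fall back to the default


def call_with_backoff(send, max_attempts=5, default_wait=1.0, sleep=time.sleep):
    """Call send() until it returns a non-429 response or attempts run out.

    send is a hypothetical callable returning (status, headers, body).
    The last response is returned even if it is still a 429.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status != 429 or attempt == max_attempts:
            return status, headers, body
        # Sleep for the full Retry-After interval; never retry sooner.
        sleep(retry_after_seconds(headers, default_wait))
```

If the final response is still a 429 after every wait was honored, treat that as the sign that your workload is above the steady-state ceiling rather than retrying harder.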
Multiple keys
You can mint multiple API keys per account, and the rate limit is per-key. Separating production / staging / dev across keys also isolates their limits, which is the recommended pattern. Don't try to game the limit by rotating through many keys for a single workload — abusive rotation patterns trip account-level controls.
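A simple way to keep one key per environment is to resolve the key from configuration at startup. The sketch below is only one possible layout: the environment variable names and the api_key_for function are hypothetical, not names defined by the API.

```python
import os

# Hypothetical variable names; use whatever your deployment already defines.
ENV_KEY_VARS = {
    "production": "OLAVA_API_KEY_PRODUCTION",
    "staging": "OLAVA_API_KEY_STAGING",
    "dev": "OLAVA_API_KEY_DEV",
}


def api_key_for(environment, environ=os.environ):
    """Return the API key configured for one environment."""
    if environment not in ENV_KEY_VARS:
        raise ValueError(f"unknown environment: {environment!r}")
    var = ENV_KEY_VARS[environment]
    key = environ.get(var)
    if not key:
        raise RuntimeError(f"{var} is not set")
    return key
```

Each environment then draws on its own limit, and a runaway job in dev cannot exhaust the production key.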