Rate limits

The chat completions endpoint is rate-limited per key. When you exceed a limit, the response is 429 Too Many Requests with a Retry-After header giving the wait in seconds. Always honor Retry-After: it is the authoritative back-off signal. The response body looks like this:

{
  "error": {
    "type": "rate_limited",
    "code": "api_key_rate_limited",
    "message": "API key request limit exceeded. Retry after the window rolls.",
    "docs_url": "https://docs.olava.dev/api/rate_limits"
  }
}
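
As a minimal sketch of recognizing this response in client code, the helper below returns the "error" object for a 429 and None for anything else. The function name rate_limit_error is hypothetical, and the sketch assumes only the body shape shown above; it treats the 429 status as the signal and the body as optional detail.

```python
import json
from typing import Optional


def rate_limit_error(status_code: int, body_text: str) -> Optional[dict]:
    """Return the "error" object for a 429 response, or None for any other status.

    An empty dict means the status was 429 but the body was not the
    JSON shape shown above.
    """
    if status_code != 429:
        return None
    try:
        payload = json.loads(body_text)
    except ValueError:
        # Not JSON: the status code alone still marks a rate limit.
        return {}
    error = payload.get("error") if isinstance(payload, dict) else None
    return error if isinstance(error, dict) else {}
```

Callers can log the returned "code" and "message" fields, but the wait time should still come from the Retry-After header rather than from the body.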

Ceiling

The default per-key ceiling is approximately 10 requests per second, smoothed over a rolling window. If you need a higher ceiling for legitimate batch workloads, contact support. Limits are configurable per key.
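
One way to stay under the ceiling is to pace requests on the client. The sketch below spaces sends evenly at a chosen rate. It is an assumption about client design, not a description of how the server counts requests: the Pacer class and its parameters are hypothetical, and the default of 10 per second simply mirrors the approximate figure above.

```python
import time


class Pacer:
    """Space out calls so that at most max_per_second are started per second."""

    def __init__(self, max_per_second: float = 10.0,
                 clock=time.monotonic, sleep=time.sleep):
        self._interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_slot = clock()  # earliest time the next call may start

    def wait(self) -> float:
        """Block until the next send slot is free; return the seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_slot - now)
        if delay > 0:
            self._sleep(delay)
        # Reserve the following slot one interval after this one.
        self._next_slot = max(now, self._next_slot) + self._interval
        return delay
```

Call wait() immediately before each request. Because the server-side window is only approximately 10 requests per second, pacing slightly below that rate leaves some headroom, and a 429 should still be handled when it occurs.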

How to back off

1. Read Retry-After. It's an integer number of seconds.
2. Sleep that long. Do not retry sooner: the window is sliding, and an early retry just resets your own counter.
3. If you keep hitting 429s despite honoring Retry-After, you are genuinely above the steady-state ceiling. Either spread traffic across multiple keys (one per environment) or contact support to raise the limit on your key.
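
The back-off steps can be sketched as a small retry loop. This is an illustration under stated assumptions, not an official client: send is a hypothetical zero-argument callable that performs one request and returns (status, headers, body), and max_attempts and fallback_delay are illustrative parameters that do not come from the API.

```python
import time
from typing import Callable, Mapping, Tuple

# Hypothetical response shape for this sketch: (status, headers, body).
Response = Tuple[int, Mapping[str, str], str]


def retry_after_seconds(headers: Mapping[str, str], fallback: float) -> float:
    """Read Retry-After as integer seconds; use fallback if absent or malformed."""
    for name, value in headers.items():
        if name.lower() == "retry-after":  # header names are case-insensitive
            try:
                return float(max(0, int(value.strip())))
            except ValueError:
                return fallback
    return fallback


def call_with_backoff(send: Callable[[], Response], max_attempts: int = 5,
                      fallback_delay: float = 1.0, sleep=time.sleep) -> Response:
    """Call send(), sleeping for Retry-After after each 429, up to max_attempts tries."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    response = send()
    for _ in range(max_attempts - 1):
        if response[0] != 429:
            break
        # Sleep the full advertised time; retrying sooner does not help.
        sleep(retry_after_seconds(response[1], fallback_delay))
        response = send()
    return response
```

If the final attempt still returns 429, the function hands that response back to the caller rather than looping forever, which is the point at which spreading traffic or requesting a higher limit applies.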

Multiple keys

You can mint multiple API keys per account, and the rate limit is per key. Separating production, staging, and dev across keys also isolates their limits, which is the recommended pattern. Don't try to game the limit by rotating through many keys for a single workload: abusive rotation patterns trip account-level controls.
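
A minimal sketch of the one-key-per-environment pattern follows. The environment variable names (API_KEY_PRODUCTION, API_KEY_STAGING, API_KEY_DEV) and the api_key_for helper are hypothetical and chosen only for illustration; use whatever secret storage your deployment already has.

```python
import os
from typing import Mapping

ENVIRONMENTS = ("production", "staging", "dev")


def api_key_for(environment: str, env: Mapping[str, str] = os.environ) -> str:
    """Return the API key dedicated to one environment.

    Each environment reads its own variable, so one environment's
    traffic never counts against another environment's key.
    """
    if environment not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment!r}")
    variable = f"API_KEY_{environment.upper()}"  # hypothetical naming scheme
    try:
        return env[variable]
    except KeyError:
        raise RuntimeError(f"{variable} is not set") from None
```

The mapping is fixed per environment on purpose: the helper never cycles through keys for a single workload, which is the rotation pattern the paragraph above warns against.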
