Rate limits & quotas
The API rate limiter, and how MarkDB backs off from provider quota errors.
There are two separate throttling mechanisms to know about: MarkDB's own HTTP rate limit on the API, and the background worker's backoff when a model provider returns quota errors.
API rate limits
The Memory API (api.markdb.cloud/v1/*) is rate-limited
per tenant on a fixed one-minute window:
| Tier | Default |
|---|---|
| Personal | 60 requests/min (MARKDB_RATE_LIMIT_PERSONAL_PER_MINUTE) |
| Team | 600 requests/min (MARKDB_RATE_LIMIT_TEAM_PER_MINUTE) |
When you exceed the limit within a window, you get 429 Too Many Requests with these headers:
- Retry-After -- seconds until the window resets
- X-RateLimit-Limit -- your limit
- X-RateLimit-Remaining -- 0
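A client can use these headers to wait out a 429 instead of retrying immediately. The following is a minimal sketch, not an official MarkDB client: `send` is a hypothetical callable standing in for whatever HTTP call you make, and the sketch assumes `Retry-After` arrives as a plain number of seconds, as described above.

```python
import time
from typing import Callable, Mapping, Tuple

# (status code, response headers, body); a hypothetical shape for this sketch.
Response = Tuple[int, Mapping[str, str], bytes]


def retry_after_seconds(headers: Mapping[str, str], default: float = 1.0) -> float:
    """Read Retry-After as a number of seconds; fall back to `default`."""
    # Header names are case-insensitive, so normalise before looking up.
    lowered = {k.lower(): v for k, v in headers.items()}
    try:
        return max(0.0, float(lowered["retry-after"]))
    except (KeyError, ValueError):
        return default


def call_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call `send`, waiting out 429 responses for up to `max_attempts` tries."""
    response = send()
    for _ in range(max_attempts - 1):
        status, headers, _body = response
        if status != 429:
            break
        # Wait for the window to reset before trying again.
        sleep(retry_after_seconds(headers))
        response = send()
    return response
```

Passing `sleep` as a parameter keeps the helper easy to test; in production the default `time.sleep` is used.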
What is NOT rate-limited
Only the /v1 data-plane API is limited. The LLM proxy
(proxy.markdb.cloud) and the MCP server (mcp.markdb.cloud) are not
rate-limited by MarkDB -- their throughput is governed by your upstream model
provider's own limits. The limiter depends on Redis; if Redis is down it fails
open (requests pass through unthrottled).
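To make the fixed-window and fail-open behavior concrete, here is a rough sketch of a per-tenant fixed-window counter. It illustrates the general technique only and is not MarkDB's implementation: `CounterStore` is an in-memory stand-in for a shared store such as Redis, and `check`, its parameters, and the choice of `ConnectionError` as the outage signal are all assumptions made for this example.

```python
import math
from typing import Dict, Tuple

WINDOW_SECONDS = 60  # fixed one-minute window


class CounterStore:
    """In-memory stand-in for a shared counter store such as Redis."""

    def __init__(self) -> None:
        self._counts: Dict[Tuple[str, int], int] = {}

    def incr(self, key: Tuple[str, int]) -> int:
        self._counts[key] = self._counts.get(key, 0) + 1
        return self._counts[key]


def check(store, tenant: str, limit: int, now: float) -> Tuple[bool, Dict[str, str]]:
    """Return (allowed, headers) for one request from `tenant` at time `now`."""
    # Every request in the same minute shares one counter per tenant.
    window = int(now // WINDOW_SECONDS)
    try:
        count = store.incr((tenant, window))
    except ConnectionError:
        # Fail open: if the store is unreachable, let the request through.
        return True, {}
    if count <= limit:
        return True, {}
    # Over the limit: report the seconds left until the next window starts.
    reset_in = (window + 1) * WINDOW_SECONDS - now
    return False, {
        "Retry-After": str(max(1, math.ceil(reset_in))),
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": "0",
    }
```

Because the window is fixed rather than sliding, the counter resets at each minute boundary, so a tenant that exhausts its limit late in a window can send again as soon as the next window begins.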
Provider quotas & backoff
Enrichment and embeddings call your model provider, which enforces its own
per-minute and per-day quotas. When an upstream returns a quota/rate-limit error
(e.g. Gemini RESOURCE_EXHAUSTED, a spending cap, or a rate_limit_exceeded),
MarkDB classifies it and backs off instead of hammering the provider:
- Escalating retry. A quota-blocked enrichment job is requeued with a growing delay -- roughly 5 min, then 30 min, then 2 h, then 6 h (or the provider's own retryDelay hint if it's longer).
- Per-pass circuit breaker. If several jobs in one worker pass hit quota errors, the pass stops scheduling the rest and tries again on the next tick, rather than burning the whole batch against a capped provider.
- No data loss. Quota-blocked jobs are retried indefinitely; they are never dropped. Once quota frees up, they complete.
Search indexing behaves similarly, retrying quota-blocked pages on subsequent passes.
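The escalating retry and per-pass circuit breaker can be sketched as follows. This is an illustration under stated assumptions, not the worker's actual code: `QuotaError`, `next_delay`, `run_pass`, and the job dictionary shape are hypothetical names, the breaker threshold of 3 is a guess at "several", and holding the delay at 6 h after the last step is an assumption consistent with jobs being retried indefinitely.

```python
from typing import Callable, Dict, Iterable, Optional

# Approximate delays from the text above, in seconds: 5 min, 30 min, 2 h, 6 h.
BACKOFF_SCHEDULE = [5 * 60, 30 * 60, 2 * 3600, 6 * 3600]


class QuotaError(Exception):
    """Hypothetical exception for a classified provider quota error."""

    def __init__(self, retry_delay: Optional[float] = None) -> None:
        super().__init__("provider quota exceeded")
        self.retry_delay = retry_delay  # provider's hint in seconds, if any


def next_delay(attempt: int, provider_hint: Optional[float] = None) -> float:
    """Delay before retry number `attempt` (0-based), honouring a longer hint."""
    # Assumption: once the schedule is exhausted, stay at the last step.
    scheduled = BACKOFF_SCHEDULE[min(attempt, len(BACKOFF_SCHEDULE) - 1)]
    return max(scheduled, provider_hint or 0)


def run_pass(
    jobs: Iterable[Dict],
    process: Callable[[Dict], None],
    requeue: Callable[[Dict, float], None],
    breaker_threshold: int = 3,  # assumed value for "several"
) -> int:
    """Process jobs until the breaker trips; return how many were attempted."""
    quota_errors = 0
    attempted = 0
    for job in jobs:
        if quota_errors >= breaker_threshold:
            break  # breaker open: leave the remaining jobs for the next tick
        attempted += 1
        try:
            process(job)
        except QuotaError as err:
            quota_errors += 1
            # Requeue with a growing delay; the job is never dropped.
            requeue(job, next_delay(job["attempts"], err.retry_delay))
            job["attempts"] += 1
    return attempted
```

Jobs left unattempted when the breaker trips keep their attempt count, so they start from the shortest delay if they later hit a quota error themselves.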
What you'll observe
Capture is never blocked by enrichment quota -- your agent's proxy requests still succeed and are mirrored. Under provider quota pressure, only summaries and the search index lag until quota recovers. If summaries stop appearing, check your provider key and its quota, and consider a higher-throughput enrichment model (see Models and Processing).