Models & routing
How MarkDB picks a provider from a model name, and the models it uses for enrichment and embeddings.
Models show up in MarkDB in two places: the model your client asks the proxy for, and the models MarkDB uses internally to enrich and embed your memory.
How routing works
The proxy chooses the upstream provider from the model name, matching a set of prefix rules in order (first match wins, case-insensitive):
| Model prefix | Provider | Upstream |
|---|---|---|
| claude- | anthropic | api.anthropic.com |
| gpt-, chatgpt-, o1, o3, o4 | openai | api.openai.com |
| gemini- | google | Gemini (OpenAI-compatible endpoint) |
| grok- | xai | api.x.ai |
The provider name is also the credential bucket: it's the key MarkDB looks up
in your LLM keys to call that upstream. For claude-*, gpt-*/o*, and gemini-*
models, MarkDB dispatches through the native provider SDK, so the Chat
Completions, Responses, and Messages surfaces all work regardless of which
provider actually serves the model.
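The routing rules and the credential-bucket lookup can be sketched in a few lines of Python. This is a minimal illustration, not MarkDB's proxy code: the names ROUTING_RULES, route, and credential_for are hypothetical, and llm_keys stands in for wherever your LLM keys are stored. Rules are checked in order against the lowercased model name, which gives the first-match-wins, case-insensitive behavior described above.

```python
from typing import Optional

# Hypothetical encoding of the routing table above: (prefixes, provider), in order.
ROUTING_RULES: list[tuple[tuple[str, ...], str]] = [
    (("claude-",), "anthropic"),
    (("gpt-", "chatgpt-", "o1", "o3", "o4"), "openai"),
    (("gemini-",), "google"),
    (("grok-",), "xai"),
]


def route(model: str) -> Optional[str]:
    """Return the provider for a model name, or None if no rule matches."""
    name = model.lower()  # matching is case-insensitive
    for prefixes, provider in ROUTING_RULES:
        if name.startswith(prefixes):  # str.startswith accepts a tuple of prefixes
            return provider  # first match wins
    return None


def credential_for(model: str, llm_keys: dict[str, str]) -> str:
    """Resolve the upstream key: the provider name doubles as the credential bucket."""
    provider = route(model)
    if provider is None:
        raise ValueError(f"no routing rule matches model {model!r}")
    try:
        return llm_keys[provider]
    except KeyError:
        raise KeyError(f"no LLM key stored for provider {provider!r}") from None
```

For example, route("GPT-4o") and route("o3-mini") both return "openai", while a name no rule covers returns None. Because o1, o3, and o4 are bare prefixes, they also match any longer name that starts with those characters.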
Any matching model routes
You are not limited to the curated list below. Any model name that matches a
prefix rule routes to that provider, so you can send claude-haiku-4-5, gpt-4o,
gemini-2.5-pro, and so on. Operators can replace the routing table entirely with
MARKDB_PROXY_ROUTING_RULES (inline JSON) or MARKDB_PROXY_ROUTING_RULES_FILE.
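An operator override could be read at startup along these lines. The JSON shape used here (an array of objects with "prefix" and "provider" fields) and the choice to prefer the inline variable over the file are assumptions for illustration; this section documents neither. The hypothetical load_routing_rules returns None when neither variable is set, so the caller keeps the built-in table.

```python
import json
import os
from collections.abc import Mapping
from typing import Optional

# (prefix, provider) pairs, checked in order by the router.
Rule = tuple[str, str]


def load_routing_rules(env: Optional[Mapping[str, str]] = None) -> Optional[list[Rule]]:
    """Return override rules from the environment, or None to keep the built-in table.

    ASSUMPTION (hypothetical format): both variables carry a JSON array such as
    [{"prefix": "claude-", "provider": "anthropic"}], and the inline variable
    wins when both are set.
    """
    env = os.environ if env is None else env
    raw = env.get("MARKDB_PROXY_ROUTING_RULES")
    if raw is None:
        path = env.get("MARKDB_PROXY_ROUTING_RULES_FILE")
        if path is None:
            return None  # no override: keep the default routing table
        with open(path, encoding="utf-8") as f:
            raw = f.read()
    entries = json.loads(raw)
    # The override replaces the table entirely; list order is kept so the first
    # matching prefix still wins. Prefixes are lowercased for case-insensitive matching.
    return [(entry["prefix"].lower(), entry["provider"]) for entry in entries]
```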
The curated catalog
The dashboard exposes a curated catalog for choosing your enrichment and embedding models under Settings -> Processing. These are the entries validated for internal use; the proxy itself will still route any prefix-matching model.
Chat models
| Model | Provider | Notes |
|---|---|---|
| claude-haiku-4-5 | Anthropic | Cheap, fast. Default for Anthropic. |
| claude-sonnet-4-6 | Anthropic | Balanced; higher-quality summaries. |
| claude-opus-4-7 | Anthropic | Highest quality, slowest. |
| gpt-4o-mini | OpenAI | Cheap, fast. Default for OpenAI. |
| gpt-4o | OpenAI | Balanced quality and cost. |
| gpt-5 | OpenAI | Frontier model. |
| o3-mini | OpenAI | Reasoning-tuned; slower. |
| gemini-3.1-flash-lite | Google | Cheapest, fastest. Default enrichment model. |
| gemini-2.5-flash | Google | Balanced; legacy default. |
| gemini-2.5-flash-lite | Google | Legacy budget tier. |
| gemini-2.5-pro | Google | Higher quality, more expensive. |
Embedding models
| Model | Provider | Dimensions |
|---|---|---|
| gemini-embedding-001 | Google | 768 / 1536 / 3072 (default) |
| text-embedding-3-large | OpenAI | 256 / 512 / 1024 / 2048 / 3072 |
| text-embedding-3-small | OpenAI | 256 / 512 / 1024 / 1536 |
Embedding dimensions follow a Matryoshka grid: a single model can emit several sizes, and MarkDB indexes at one fixed size (3072 by default).
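A Matryoshka-trained embedding front-loads its information, so a shorter vector is just its leading components, re-normalized to unit length so cosine and dot-product scores stay comparable. The helper below is a generic sketch of that technique, not MarkDB code; the name truncate_embedding is hypothetical, and this page does not say whether MarkDB truncates vectors itself or requests a size from the provider.

```python
import math


def truncate_embedding(vec: list[float], dim: int) -> list[float]:
    """Shorten a Matryoshka embedding to `dim` components and L2-normalize it."""
    if not 0 < dim <= len(vec):
        raise ValueError(f"dim must be in 1..{len(vec)}, got {dim}")
    head = vec[:dim]  # Matryoshka training packs the most information up front
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize an all-zero prefix")
    return [x / norm for x in head]
```

For example, truncate_embedding([3.0, 4.0, 12.0], 2) returns [0.6, 0.8].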
Defaults
When a tenant hasn't configured anything, MarkDB falls back to:
| Role | Model | Provider |
|---|---|---|
| Enrichment (summaries) | gemini-3.1-flash-lite | Google |
| Embeddings (search index) | gemini-embedding-001 (3072-d) | Google |
Both default to Google so a tenant only needs one provider key to run the whole
pipeline. gemini-3.1-flash-lite is the enrichment default because it's cheaper
than gemini-2.5-flash, has a much higher requests-per-day ceiling, and scores
better on benchmarks despite the "Lite" name.
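The fallback can be pictured as filling unset fields with the documented defaults. ProcessingSettings and effective_settings are hypothetical names, and one design choice is an assumption: the default dimension (3072) is applied only together with the default embedding model, since not every model supports that size (text-embedding-3-small tops out at 1536).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProcessingSettings:
    """Hypothetical shape of a tenant's Settings -> Processing values."""
    enrichment_model: Optional[str] = None
    embedding_model: Optional[str] = None
    embedding_dim: Optional[int] = None


# Fallbacks from the Defaults table above.
DEFAULT_ENRICHMENT_MODEL = "gemini-3.1-flash-lite"
DEFAULT_EMBEDDING_MODEL = "gemini-embedding-001"
DEFAULT_EMBEDDING_DIM = 3072


def effective_settings(tenant: ProcessingSettings) -> ProcessingSettings:
    """Fill unset fields with the documented defaults."""
    enrichment = tenant.enrichment_model or DEFAULT_ENRICHMENT_MODEL
    if tenant.embedding_model is None:
        # No embedder configured: use the default model at its default size.
        embedding, dim = DEFAULT_EMBEDDING_MODEL, DEFAULT_EMBEDDING_DIM
    else:
        # A configured embedder keeps the size the tenant chose (ASSUMPTION:
        # no default dimension is documented for non-default models).
        embedding, dim = tenant.embedding_model, tenant.embedding_dim
    return ProcessingSettings(enrichment, embedding, dim)
```

With nothing configured, both roles resolve to Google models, so a single Google key covers the pipeline.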
Keep the embedder consistent
Query-time embeddings must use the same model and dimension as the indexed pages, or vector search can't compare them. If you change the embedding model, the search index is rebuilt against the new signature. See Hybrid search.
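A guard for this rule might compare the (model, dimension) signature stored with the index against the one used for the query. EmbedderSignature and check_query_embedder are hypothetical names for illustration. Equal dimensions are not enough on their own: two different models place vectors in unrelated spaces even when the sizes match, so both fields must agree.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbedderSignature:
    """The (model, dimension) pair a set of vectors was produced with."""
    model: str
    dim: int


def check_query_embedder(index: EmbedderSignature, query: EmbedderSignature) -> None:
    """Refuse to search when query vectors come from a different embedder than the index."""
    if index != query:  # dataclass equality compares both model and dim
        raise ValueError(
            f"index built with {index.model} ({index.dim}-d) but query uses "
            f"{query.model} ({query.dim}-d); rebuild the index or match the embedder"
        )
```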
Configuring per tenant
Set your enrichment and embedding models -- and toggle processing on or off -- under Settings -> Processing. Add the matching provider credential under Settings -> LLM keys so MarkDB can call it.
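Because each model's provider is also its credential bucket, a settings check can flag a chosen model that has no matching key. This sketch assumes the lowercase bucket names from the routing table (anthropic, openai, google) and looks models up in the curated catalog; CATALOG_PROVIDER and missing_keys are hypothetical names, and models outside the catalog raise KeyError here.

```python
# Credential bucket for each curated catalog entry (tables above).
CATALOG_PROVIDER: dict[str, str] = {
    "claude-haiku-4-5": "anthropic",
    "claude-sonnet-4-6": "anthropic",
    "claude-opus-4-7": "anthropic",
    "gpt-4o-mini": "openai",
    "gpt-4o": "openai",
    "gpt-5": "openai",
    "o3-mini": "openai",
    "gemini-3.1-flash-lite": "google",
    "gemini-2.5-flash": "google",
    "gemini-2.5-flash-lite": "google",
    "gemini-2.5-pro": "google",
    "gemini-embedding-001": "google",
    "text-embedding-3-large": "openai",
    "text-embedding-3-small": "openai",
}


def missing_keys(enrichment_model: str, embedding_model: str, llm_keys: dict[str, str]) -> set[str]:
    """Return the providers the chosen models need but that have no stored key."""
    needed = {CATALOG_PROVIDER[enrichment_model], CATALOG_PROVIDER[embedding_model]}
    return needed - set(llm_keys)
```

With the defaults and only a Google key stored, missing_keys returns an empty set; pairing claude-haiku-4-5 with text-embedding-3-small and only an Anthropic key returns {"openai"}.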