Models & routing

How MarkDB picks a provider from a model name, and the models it uses for enrichment and embeddings.

Models show up in MarkDB in two places: the model your client asks the proxy for, and the models MarkDB uses internally to enrich and embed your memory.

How routing works

The proxy chooses the upstream provider from the model name, matching a set of prefix rules in order (first match wins, case-insensitive):

Model prefix	Provider	Upstream
`claude-`	`anthropic`	`api.anthropic.com`
`gpt-`, `chatgpt-`, `o1`, `o3`, `o4`	`openai`	`api.openai.com`
`gemini-`	`google`	Gemini (OpenAI-compat endpoint)
`grok-`	`xai`	`api.x.ai`

The provider name is also the credential bucket: it's the key MarkDB looks up in your LLM keys to call that upstream. For `claude-*`, `gpt-*`/`o*`, and `gemini-*` models, MarkDB dispatches through the native provider SDK, so the Chat Completions, Responses, and Messages surfaces all work regardless of which provider actually serves the model.
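
The matching behaviour described above can be sketched in a few lines of Python. This is an illustration only, not MarkDB's implementation: `ROUTING_RULES` and `resolve_provider` are hypothetical names, and the rule list simply mirrors the table above.

```python
from typing import Optional

# Ordered (prefix, provider) rules mirroring the routing table.
# Hypothetical structure for illustration; not MarkDB's internal format.
ROUTING_RULES: list[tuple[str, str]] = [
    ("claude-", "anthropic"),
    ("gpt-", "openai"),
    ("chatgpt-", "openai"),
    ("o1", "openai"),
    ("o3", "openai"),
    ("o4", "openai"),
    ("gemini-", "google"),
    ("grok-", "xai"),
]


def resolve_provider(model: str) -> Optional[str]:
    """Return the provider for the first rule whose prefix matches, else None."""
    name = model.lower()  # matching is case-insensitive
    for prefix, provider in ROUTING_RULES:
        if name.startswith(prefix):
            return provider  # first match wins
    return None


print(resolve_provider("claude-haiku-4-5"))  # anthropic
print(resolve_provider("O3-mini"))           # openai
print(resolve_provider("mistral-large"))     # None
```

The returned provider name doubles as the credential bucket, so in this sketch the same string would be used to look up the tenant's key for that upstream.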

Any matching model routes

You are not limited to the curated list below. Any model name that matches a prefix rule routes to that provider: send `claude-haiku-4-5`, `gpt-4o`, `gemini-2.5-pro`, and so on. Operators can replace the routing table entirely with `MARKDB_PROXY_ROUTING_RULES` (inline JSON) or `MARKDB_PROXY_ROUTING_RULES_FILE`.

The curated catalog

The dashboard exposes a curated catalog for choosing your enrichment and embedding models under Settings -> Processing. These are the entries validated for internal use; the proxy itself will still route any prefix-matching model.

Chat models

Model	Provider	Notes
`claude-haiku-4-5`	Anthropic	Cheap, fast. Default for Anthropic.
`claude-sonnet-4-6`	Anthropic	Balanced; higher-quality summaries.
`claude-opus-4-7`	Anthropic	Highest quality, slowest.
`gpt-4o-mini`	OpenAI	Cheap, fast. Default for OpenAI.
`gpt-4o`	OpenAI	Balanced quality + cost.
`gpt-5`	OpenAI	Frontier model.
`o3-mini`	OpenAI	Reasoning-tuned; slower.
`gemini-3.1-flash-lite`	Google	Cheapest, fastest. Default enrichment model.
`gemini-2.5-flash`	Google	Balanced; legacy default.
`gemini-2.5-flash-lite`	Google	Legacy budget tier.
`gemini-2.5-pro`	Google	Higher quality, more expensive.

Embedding models

Model	Provider	Dimensions
`gemini-embedding-001`	Google	768 / 1536 / 3072 (default)
`text-embedding-3-large`	OpenAI	256 / 512 / 1024 / 2048 / 3072
`text-embedding-3-small`	OpenAI	256 / 512 / 1024 / 1536

The listed dimensions form a Matryoshka grid: a single model can emit vectors at any of the sizes shown for it, and MarkDB indexes at one fixed size (3072 by default).
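
To make the Matryoshka idea concrete: such embeddings are trained so that a leading prefix of the full vector is itself a usable, smaller embedding once it is rescaled to unit length. The sketch below shows only that principle on a toy vector. `truncate_embedding` is a hypothetical helper, not a MarkDB or provider API; in practice you would normally request the smaller size from the provider rather than truncate locally.

```python
import math


def truncate_embedding(vector: list[float], dims: int) -> list[float]:
    """Keep the first `dims` components and rescale them to unit length."""
    if not 0 < dims <= len(vector):
        raise ValueError(f"dims must be between 1 and {len(vector)}")
    head = vector[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # all-zero prefix: nothing to normalise
    return [x / norm for x in head]


# Toy 4-d vector standing in for a full-size embedding.
full = [3.0, 4.0, 12.0, 0.5]
small = truncate_embedding(full, 2)
print(small)  # [0.6, 0.8]
```

Whichever size is chosen, the index and every query must use the same one, which is why MarkDB fixes a single dimension per index.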

Defaults

When a tenant hasn't configured anything, MarkDB falls back to:

Role	Model	Provider
Enrichment (summaries)	`gemini-3.1-flash-lite`	Google
Embeddings (search index)	`gemini-embedding-001` (3072-d)	Google

Both default to Google so a tenant only needs one provider key to run the whole pipeline. `gemini-3.1-flash-lite` is the enrichment default because it's cheaper than `gemini-2.5-flash`, has a much higher requests-per-day ceiling, and scores better on benchmarks despite the "Lite" name.
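
The fallback can be pictured as a small resolution step. The sketch below is illustrative only: `TenantProcessingConfig` and `effective_models` are hypothetical names, and falling back per field (rather than only when nothing at all is configured) is an assumption, not documented behaviour. The default values themselves come from the table above.

```python
from dataclasses import dataclass
from typing import Optional

# Fallback values taken from the Defaults table.
DEFAULT_ENRICHMENT_MODEL = "gemini-3.1-flash-lite"
DEFAULT_EMBEDDING = ("gemini-embedding-001", 3072)  # (model, dimensions)


@dataclass
class TenantProcessingConfig:
    """Hypothetical per-tenant settings; None means 'not configured'."""
    enrichment_model: Optional[str] = None
    # Model and dimension are kept as a pair so they can never be mixed
    # with defaults that belong to a different embedding model.
    embedding: Optional[tuple[str, int]] = None


def effective_models(cfg: TenantProcessingConfig) -> dict:
    """Fill unset fields with the documented defaults (assumed per-field)."""
    model, dims = cfg.embedding or DEFAULT_EMBEDDING
    return {
        "enrichment_model": cfg.enrichment_model or DEFAULT_ENRICHMENT_MODEL,
        "embedding_model": model,
        "embedding_dims": dims,
    }


print(effective_models(TenantProcessingConfig()))
```

With nothing configured, both roles resolve to Google models, so a single Google key is enough to run enrichment and embedding.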

Keep the embedder consistent

Query-time embeddings must use the same model and dimension as the indexed pages, or vector search can't compare them. If you change the embedding model, the search index is rebuilt against the new signature (the model and dimension pair). See Hybrid search.
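
A minimal sketch of the consistency rule, assuming the signature is simply the (model, dimension) pair. `EmbeddingSignature`, `needs_reindex`, and `check_query_vector` are hypothetical names used for illustration, not MarkDB APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddingSignature:
    """Hypothetical (model, dimension) pair describing how vectors were produced."""
    model: str
    dims: int


def needs_reindex(indexed: EmbeddingSignature, configured: EmbeddingSignature) -> bool:
    """A change in either the model or the dimension invalidates the index."""
    return indexed != configured


def check_query_vector(vector: list[float], indexed: EmbeddingSignature) -> None:
    """Reject a query vector whose length does not match the indexed dimension."""
    if len(vector) != indexed.dims:
        raise ValueError(
            f"query vector has {len(vector)} dims, index expects {indexed.dims}"
        )


indexed = EmbeddingSignature("gemini-embedding-001", 3072)
print(needs_reindex(indexed, EmbeddingSignature("gemini-embedding-001", 3072)))    # False
print(needs_reindex(indexed, EmbeddingSignature("gemini-embedding-001", 1536)))    # True
print(needs_reindex(indexed, EmbeddingSignature("text-embedding-3-large", 3072)))  # True
```

The last case matters: two models can share a dimension and still be incomparable, because each model places text in its own vector space. A length check alone is therefore not enough, and the model name has to be part of the comparison.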

Configuring per tenant

Set your enrichment and embedding models -- and toggle processing on or off -- under Settings -> Processing. Add the matching provider credential under Settings -> LLM keys so MarkDB can call it.