MarkDB

Models & routing

How MarkDB picks a provider from a model name, and the models it uses for enrichment and embeddings.

Models show up in MarkDB in two places: the model your client asks the proxy for, and the models MarkDB uses internally to enrich and embed your memory.

How routing works

The proxy chooses the upstream provider from the model name, matching a set of prefix rules in order (first match wins, case-insensitive):

| Model prefix | Provider | Upstream |
|---|---|---|
| claude- | anthropic | api.anthropic.com |
| gpt-, chatgpt-, o1, o3, o4 | openai | api.openai.com |
| gemini- | google | Gemini (OpenAI-compat endpoint) |
| grok- | xai | api.x.ai |
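The sketch below shows the first-match, case-insensitive rule in Python. It is a minimal illustration, not MarkDB's implementation. `DEFAULT_RULES` and `route_model` are hypothetical names, and the rule order mirrors the table above. A model that matches no rule returns `None` here. This page does not say how the real proxy handles that case.

```python
# Hypothetical sketch of ordered, case-insensitive prefix routing.
# Not MarkDB's code: names are illustrative, rules mirror the routing table.
from typing import Optional

# Ordered (prefix, provider) pairs; order matters because the first match wins.
DEFAULT_RULES: tuple[tuple[str, str], ...] = (
    ("claude-", "anthropic"),
    ("gpt-", "openai"),
    ("chatgpt-", "openai"),
    ("o1", "openai"),
    ("o3", "openai"),
    ("o4", "openai"),
    ("gemini-", "google"),
    ("grok-", "xai"),
)


def route_model(model: str, rules=DEFAULT_RULES) -> Optional[str]:
    """Return the provider for `model`, or None if no prefix rule matches."""
    name = model.lower()  # matching is case-insensitive
    for prefix, provider in rules:
        if name.startswith(prefix.lower()):
            return provider  # first match wins; later rules are never checked
    return None  # assumption: unmatched models simply have no route


if __name__ == "__main__":
    for m in ["claude-haiku-4-5", "GPT-4o", "o3-mini", "gemini-2.5-pro", "grok-4"]:
        print(m, "->", route_model(m))
```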

The provider name doubles as the credential bucket: it is the key MarkDB looks up in your LLM keys when calling that upstream. For claude-*, gpt-*/o*, and gemini-* models, MarkDB dispatches through each provider's native SDK, so the Chat Completions, Responses, and Messages API surfaces all work regardless of which provider actually serves the model.
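The provider name is also the credential bucket, so key lookup can be a plain mapping access on that name. The sketch below assumes the tenant's LLM keys are exposed as a mapping from provider name to API key. `credential_for` and `MissingCredentialError` are hypothetical names. MarkDB's actual key storage is not described on this page.

```python
# Hypothetical sketch: the routed provider name is the lookup key into the
# tenant's LLM keys. Storage as a plain mapping is an assumption.
from typing import Mapping


class MissingCredentialError(LookupError):
    """Raised when the tenant has no key stored under the routed provider."""


def credential_for(provider: str, llm_keys: Mapping[str, str]) -> str:
    """Return the API key stored under `provider` (e.g. "google", "openai")."""
    try:
        return llm_keys[provider]
    except KeyError:
        raise MissingCredentialError(
            f"no LLM key configured for provider {provider!r}"
        ) from None


if __name__ == "__main__":
    keys = {"google": "example-google-key", "anthropic": "example-anthropic-key"}
    print(credential_for("google", keys))
```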

Any matching model routes

You are not limited to the curated list below. Any model name that matches a prefix rule routes to that provider, so you can send claude-haiku-4-5, gpt-4o, gemini-2.5-pro, and so on. Operators can replace the routing table entirely by setting MARKDB_PROXY_ROUTING_RULES (inline JSON) or MARKDB_PROXY_ROUTING_RULES_FILE.
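This page does not document the schema these variables expect. The sketch below assumes, purely for illustration, a JSON array of `{"prefix": ..., "provider": ...}` objects. It also assumes the inline variable takes precedence when both are set, and that MARKDB_PROXY_ROUTING_RULES_FILE holds a path to a JSON file. The point it illustrates is replace-not-merge: when either variable is set, the supplied list becomes the entire table.

```python
# Hypothetical loader for an operator-supplied routing table.
# ASSUMPTIONS: the JSON shape, inline-over-file precedence, and the file
# variable holding a path are all illustrative, not MarkDB's documented format.
import json
import os
from typing import Mapping, Optional


def load_routing_rules(
    env: Mapping[str, str] = os.environ,
) -> Optional[list[tuple[str, str]]]:
    """Return ordered (prefix, provider) rules, or None to keep the defaults."""
    inline = env.get("MARKDB_PROXY_ROUTING_RULES")
    path = env.get("MARKDB_PROXY_ROUTING_RULES_FILE")
    if inline:
        raw = json.loads(inline)
    elif path:
        with open(path, encoding="utf-8") as f:
            raw = json.load(f)
    else:
        return None  # nothing set: the built-in prefix table stays in effect
    # Replace, don't merge: the supplied list becomes the whole table, in order.
    return [(rule["prefix"], rule["provider"]) for rule in raw]
```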

The curated catalog

The dashboard exposes a curated catalog for choosing your enrichment and embedding models under Settings -> Processing. These are the entries validated for internal use; the proxy itself will still route any prefix-matching model.

Chat models

| Model | Provider | Notes |
|---|---|---|
| claude-haiku-4-5 | Anthropic | Cheap, fast. Default for Anthropic. |
| claude-sonnet-4-6 | Anthropic | Balanced; higher-quality summaries. |
| claude-opus-4-7 | Anthropic | Highest quality, slowest. |
| gpt-4o-mini | OpenAI | Cheap, fast. Default for OpenAI. |
| gpt-4o | OpenAI | Balanced quality + cost. |
| gpt-5 | OpenAI | Frontier model. |
| o3-mini | OpenAI | Reasoning-tuned; slower. |
| gemini-3.1-flash-lite | Google | Cheapest, fastest. Default enrichment model. |
| gemini-2.5-flash | Google | Balanced; legacy default. |
| gemini-2.5-flash-lite | Google | Legacy budget tier. |
| gemini-2.5-pro | Google | Higher quality, more expensive. |

Embedding models

| Model | Provider | Supported dimensions |
|---|---|---|
| gemini-embedding-001 | Google | 768 / 1536 / 3072 (default) |
| text-embedding-3-large | OpenAI | 256 / 512 / 1024 / 2048 / 3072 |
| text-embedding-3-small | OpenAI | 256 / 512 / 1024 / 1536 |

Embedding dimensions follow a Matryoshka grid: a single model can emit vectors at several sizes, and MarkDB indexes at one fixed size (3072 by default).
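A Matryoshka-trained model packs the most useful information into the leading components, so a shorter vector can be taken as a prefix of the full one and then rescaled to unit length. The sketch below shows that idea client-side. It does not claim to be how MarkDB produces smaller vectors. Providers can also return a shorter vector directly, for example through OpenAI's `dimensions` parameter or Gemini's `outputDimensionality`. `truncate_embedding` is a hypothetical name.

```python
# Hypothetical sketch of Matryoshka-style truncation: keep the first `dim`
# components, then L2-normalize so cosine / dot-product scores stay comparable.
import math


def truncate_embedding(vec: list[float], dim: int) -> list[float]:
    if not 0 < dim <= len(vec):
        raise ValueError(f"dim must be in 1..{len(vec)}, got {dim}")
    head = vec[:dim]  # leading components carry the most information
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # all-zero prefix: nothing to rescale
    return [x / norm for x in head]
```

Renormalizing matters because a truncated prefix is no longer unit length, and an index that compares vectors with a dot product assumes it is.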

Defaults

When a tenant hasn't configured anything, MarkDB falls back to:

| Role | Model | Provider |
|---|---|---|
| Enrichment (summaries) | gemini-3.1-flash-lite | Google |
| Embeddings (search index) | gemini-embedding-001 (3072-d) | Google |

Both defaults use Google, so a tenant needs only one provider key to run the whole pipeline. gemini-3.1-flash-lite is the enrichment default because it's cheaper than gemini-2.5-flash, has a much higher requests-per-day ceiling, and scores better on benchmarks despite the "Lite" name.
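The fallback can be pictured as filling unset tenant settings with the documented defaults. In this sketch, `TenantModels` and `effective_models` are hypothetical. Treating each setting as independently optional is also an assumption, because the page only describes the case where nothing is configured. The embedding model and its dimension fall back together, since they form one index signature. Both default model names begin with `gemini-`, which is why a single Google key covers the pipeline.

```python
# Hypothetical sketch of per-tenant fallback to MarkDB's documented defaults.
from dataclasses import dataclass
from typing import Optional

# Values from the Defaults table.
DEFAULT_ENRICHMENT = "gemini-3.1-flash-lite"
DEFAULT_EMBEDDING = "gemini-embedding-001"
DEFAULT_EMBEDDING_DIM = 3072


@dataclass(frozen=True)
class TenantModels:
    """Hypothetical per-tenant settings; None means 'not configured'."""
    enrichment: Optional[str] = None
    embedding: Optional[str] = None
    embedding_dim: Optional[int] = None


def effective_models(cfg: TenantModels) -> TenantModels:
    """Fill unset settings with the documented defaults."""
    enrichment = cfg.enrichment if cfg.enrichment is not None else DEFAULT_ENRICHMENT
    if cfg.embedding is None:
        # Model and dimension form one index signature, so they fall back together.
        embedding, dim = DEFAULT_EMBEDDING, DEFAULT_EMBEDDING_DIM
    else:
        embedding, dim = cfg.embedding, cfg.embedding_dim
    return TenantModels(enrichment, embedding, dim)
```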

Keep the embedder consistent

Query-time embeddings must use the same model and dimension as the indexed pages, or vector search can't compare them. If you change the embedding model, the search index is rebuilt against the new model-and-dimension signature. See Hybrid search.
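The consistency rule amounts to treating (model, dimension) as a signature and refusing to compare vectors across signatures. The sketch below shows that check. `IndexSignature`, `check_query_vector`, and `needs_rebuild` are hypothetical names. MarkDB's actual rebuild logic is not described on this page.

```python
# Hypothetical sketch: (model, dimension) acts as the index signature.
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexSignature:
    model: str
    dim: int


def check_query_vector(index: IndexSignature, query_model: str, query_vec: list[float]) -> None:
    """Raise if a query embedding can't be compared against the index."""
    query_sig = IndexSignature(query_model, len(query_vec))
    if query_sig != index:
        raise ValueError(
            f"query signature {query_sig} does not match index signature {index}; "
            "re-embed the query or rebuild the index"
        )


def needs_rebuild(index: IndexSignature, configured: IndexSignature) -> bool:
    """A changed model or dimension means stored vectors are no longer comparable."""
    return index != configured
```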

Configuring per tenant

Set your enrichment and embedding models, and toggle processing on or off, under Settings -> Processing. Then add the matching provider credential under Settings -> LLM keys so MarkDB can call that provider.