Architecture
One daemon, four cooperating systems. Local-first by design — runs on your own machine or server, no cloud dependency required.
Gateway
A FastAPI app exposing a WebSocket endpoint (/ws) for real-time chat and a
REST API (/api/v1/...) for status, session, channel, memory, and metrics
operations — see the REST API page. The same process also hosts
the channel adapters, so a single cortex start brings the whole system up.
Model Router
Every request is classified into a task type, and each task type maps to an ordered fallback chain — if the first model fails or is unavailable, the router automatically tries the next:
| Task type | Fallback chain (in order) |
|---|---|
| complex_reasoning | Claude Opus → GPT-4o → Gemini Pro → Ollama |
| code_generation | DeepSeek Coder → Claude Sonnet → GPT-4o → Gemini Flash |
| code_review | DeepSeek Coder → GPT-4o → Gemini Flash → Ollama |
| task_decomposition | Claude Sonnet → GPT-4o → Gemini Pro → Ollama |
| summarization / intent_extraction / reflection / validation | Gemini Flash → GPT-4o mini → Ollama |
| cheap_inference | Ollama → GPT-4o mini → Gemini Flash |
| general (default) | Gemini Flash → GPT-4o mini → Ollama |
Setting local = "ollama/..." as the primary model in
configuration enables full privacy mode: no requests
ever leave your machine.
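The fallback behaviour can be sketched in a few lines of Python. This is only an illustration: the chain contents are abbreviated from the table, and the model labels, `route`, `AllModelsFailed`, and the `call_model` callback are hypothetical names rather than the project's actual API.

```python
from typing import Callable

# Hypothetical, abbreviated chains. The strings are illustrative labels,
# not real provider model IDs.
FALLBACK_CHAINS: dict[str, list[str]] = {
    "complex_reasoning": ["claude-opus", "gpt-4o", "gemini-pro", "ollama"],
    "cheap_inference": ["ollama", "gpt-4o-mini", "gemini-flash"],
    "general": ["gemini-flash", "gpt-4o-mini", "ollama"],
}


class AllModelsFailed(RuntimeError):
    """Raised when every model in a chain has failed."""


def route(
    task_type: str,
    prompt: str,
    call_model: Callable[[str, str], str],
) -> tuple[str, str]:
    """Try each model in the chain for task_type; return (model, response)."""
    # Unknown task types fall back to the default "general" chain.
    chain = FALLBACK_CHAINS.get(task_type, FALLBACK_CHAINS["general"])
    errors: list[str] = []
    for model in chain:
        try:
            return model, call_model(model, prompt)
        except Exception as exc:  # a real router would catch narrower errors
            errors.append(f"{model}: {exc}")
    raise AllModelsFailed("; ".join(errors))


def fake_call(model: str, prompt: str) -> str:
    """Stand-in for a provider client: pretends the first model is down."""
    if model == "claude-opus":
        raise ConnectionError("unavailable")
    return f"[{model}] answer to: {prompt}"
```

With `fake_call`, a `complex_reasoning` request skips the failing first entry and is answered by the second model in the chain.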
Memory Pipeline
Three tiers, queried in order, merged into one ranked context for the LLM prompt:
- Short-term (Redis) — recent conversation turns, TTL-expired automatically
- Semantic (Qdrant) — vector similarity search over past conversations
- Long-term (SQLite) — durable storage with importance scoring, auto-pruning, tagging, and cross-session sharing
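One way to picture the merge step is the sketch below. It assumes each tier returns scored hits on a comparable scale and that duplicates are resolved by keeping the highest score; the source does not specify the actual ranking rule, and `MemoryHit`, `build_context`, and the three in-memory tier functions are hypothetical stand-ins for Redis, Qdrant, and SQLite.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class MemoryHit:
    text: str
    score: float  # higher is more relevant; the scale is an assumption
    tier: str


Tier = Callable[[str], Iterable[MemoryHit]]


def build_context(query: str, tiers: list[Tier], limit: int = 5) -> list[MemoryHit]:
    """Query tiers in order, drop duplicate texts, return the top `limit` hits."""
    best: dict[str, MemoryHit] = {}
    for tier in tiers:
        for hit in tier(query):
            current = best.get(hit.text)
            # Keep only the highest-scoring copy of a repeated memory.
            if current is None or hit.score > current.score:
                best[hit.text] = hit
    return sorted(best.values(), key=lambda h: h.score, reverse=True)[:limit]


# In-memory stand-ins for the three stores (illustrative data only).
def short_term(query: str) -> list[MemoryHit]:
    return [MemoryHit("user asked about deploys", 0.9, "short_term")]


def semantic(query: str) -> list[MemoryHit]:
    return [
        MemoryHit("deploys run via CI", 0.7, "semantic"),
        MemoryHit("user asked about deploys", 0.6, "semantic"),
    ]


def long_term(query: str) -> list[MemoryHit]:
    return [MemoryHit("user prefers terse answers", 0.8, "long_term")]


context = build_context("deploy", [short_term, semantic, long_term])
```

Here the duplicate turn found by both the short-term and semantic tiers appears once in `context`, carrying the higher of its two scores.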
Reflection Engine
After the model router returns a response, the reflection engine scores it 0–100 on relevance, completeness, accuracy, and tone using a cheap model. Responses below the configured threshold are regenerated once with corrective guidance before being sent.
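A minimal sketch of that score-then-regenerate loop follows. It assumes the four criterion scores are averaged into one overall score and uses a threshold of 70; neither detail is stated in the source, and `reflect`, `generate`, and `score` are hypothetical callables standing in for the router and the cheap scoring model.

```python
from typing import Callable, Optional

CRITERIA = ("relevance", "completeness", "accuracy", "tone")

Generate = Callable[[str, Optional[str]], str]
Score = Callable[[str, str], dict[str, int]]


def reflect(prompt: str, generate: Generate, score: Score, threshold: int = 70) -> str:
    """Return the first response if it scores well, else one corrected retry."""
    response = generate(prompt, None)
    scores = score(prompt, response)
    # Assumption: the overall score is the plain mean of the four criteria.
    overall = sum(scores[c] for c in CRITERIA) / len(CRITERIA)
    if overall >= threshold:
        return response
    weak = [c for c in CRITERIA if scores[c] < threshold]
    guidance = "Improve: " + ", ".join(weak)
    # Regenerated exactly once; the retry is not scored again.
    return generate(prompt, guidance)


def fake_generate(prompt: str, guidance: Optional[str] = None) -> str:
    """Stand-in for the model router."""
    return "draft" if guidance is None else f"revised ({guidance})"


def fake_score(prompt: str, response: str) -> dict[str, int]:
    """Stand-in for the cheap scoring model."""
    return {"relevance": 90, "completeness": 40, "accuracy": 80, "tone": 60}
```

With these stand-ins the draft averages 67.5, which is below 70, so the weak criteria are passed back as corrective guidance for a single retry.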
Plugin System
Plugins are discovered via Python entry points and run in the same process.
Each plugin is a typed Plugin subclass that contributes tools, channel
adapters, or lifecycle hooks; see Plugins & SDK.
Observability
Logs are structured JSON (or human-readable via rich in a TTY). Prometheus
metrics are exposed at GET /api/v1/metrics, with a JSON snapshot at
GET /api/v1/metrics/snapshot for the web UI.
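For a sense of what structured JSON logging looks like, here is a minimal sketch built on the standard `logging` module. The field names (`ts`, `level`, `logger`, `message`) and the `JsonFormatter` and `make_logger` helpers are assumptions, not the project's actual log schema.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(payload)


def make_logger(name: str, stream) -> logging.Logger:
    """Build a logger that writes JSON lines to `stream`."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate output via the root logger
    return logger


buffer = io.StringIO()
log = make_logger("gateway", buffer)
log.info("session %s opened", "abc123")
line = json.loads(buffer.getvalue())
```

Each record becomes a single parseable line, which is what makes the logs easy to ship to an aggregator.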