CortexFlow-AI connects a single AI agent to all your messaging platforms — with smarter 3-tier memory, task-aware LLM routing, token-by-token streaming replies, and voice that actually works. Self-hosted. Privacy-first. No cloud required.
Built from the ground up to be smarter, more private, and more extensible than the alternatives.
Telegram, Discord, Slack, WhatsApp, Email, SMS, Matrix, IRC, Signal, Webhook, Mastodon, Teams, Mattermost, Nextcloud — one agent, every platform.
Redis short-term context, Qdrant semantic search, SQLite long-term persistence — shared across every channel, auto-pruned by importance.
Claude for deep reasoning, Gemini Flash for speed, DeepSeek for code, GPT-4, or Ollama for fully offline/private — with automatic fallback chains.
Local Whisper STT, ElevenLabs/Kokoro/system TTS, open-source wake-word detection, and full voice-note round trips on Telegram and Discord.
Every response is quality-scored before it reaches you — low-quality answers are automatically regenerated with corrective guidance.
A dependency-free cortexflow-sdk package for building tools, channel adapters, and plugins — sandboxed, typed, pip-installable.
Token-by-token streaming for all 5 LLM providers — Claude, Gemini, DeepSeek, GPT-4, and Ollama each stream natively, no chunking-after-the-fact.
A real Windows/macOS/Linux app — system tray, global hotkey, auto-start on login, and native notifications. Same dashboard, wrapped in Tauri.
Every channel — Telegram, Discord, Slack, or the REST API — flows through the same gateway pipeline.
A channel adapter (Telegram, Discord, Slack, …) receives an inbound message and normalizes it.
The session pipeline pulls recent context from Redis, relevant facts from Qdrant, and history from SQLite.
The task-aware router picks Claude, Gemini, DeepSeek, GPT-4, or local Ollama based on complexity and privacy mode.
The response is quality-scored before delivery; low-quality answers are regenerated with corrective guidance.
The reply streams back token-by-token through the originating channel as it's generated, and the exchange is written back into memory.
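The routing and quality-gate steps above can be sketched in a few lines of Python. This is a minimal illustration, not the gateway's actual code: the task labels, the order of each fallback chain, the `classify` and `score` heuristics, and the corrective hint text are all hypothetical stand-ins.

```python
from typing import Callable, Dict, List

# Hypothetical provider signature: takes a prompt, returns reply text.
Provider = Callable[[str], str]

# Hypothetical fallback chains per task type. The model names mirror the
# ones listed on this page; the ordering and task labels are assumptions.
CHAINS: Dict[str, List[str]] = {
    "code": ["deepseek", "claude", "ollama"],
    "reasoning": ["claude", "gpt-4", "ollama"],
    "quick": ["gemini-flash", "ollama"],
}


def classify(message: str) -> str:
    """Toy task classifier, standing in for the real router's logic."""
    text = message.lower()
    if "```" in message or "def " in text or "traceback" in text:
        return "code"
    if len(message.split()) > 40 or "why" in text:
        return "reasoning"
    return "quick"


def pick_chain(message: str, private: bool = False) -> List[str]:
    """In privacy mode only the local model is eligible."""
    if private:
        return ["ollama"]
    return CHAINS[classify(message)]


def score(reply: str) -> float:
    """Toy quality score in [0, 1]: empty or very short replies score low."""
    return min(len(reply.strip()) / 20.0, 1.0)


def answer(
    message: str,
    providers: Dict[str, Provider],
    private: bool = False,
    threshold: float = 0.5,
    max_retries: int = 1,
) -> str:
    """Try each provider in the chain; regenerate low-scoring replies."""
    last_error = None
    for name in pick_chain(message, private):
        provider = providers.get(name)
        if provider is None:
            continue  # provider not configured, move down the chain
        try:
            reply = provider(message)
            retries = 0
            while score(reply) < threshold and retries < max_retries:
                # Regenerate with corrective guidance appended to the prompt
                hint = "\n\n[Previous answer was too thin; answer in more detail.]"
                reply = provider(message + hint)
                retries += 1
            return reply
        except Exception as exc:
            last_error = exc  # fall back to the next provider
    raise RuntimeError("no provider in the chain succeeded") from last_error
```

A failing provider is skipped rather than surfaced to the user, so one upstream outage degrades to the next model in the chain instead of an error.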
Same agent, same memory, three ways in: a chat channel, the REST API, or your own plugin.
// Connect to the gateway and send a chat message. Replies stream
// in token-by-token, not as one final blob.
const ws = new WebSocket("ws://127.0.0.1:7432/ws");

ws.onopen = () => {
  ws.send(JSON.stringify({
    type: "message",
    id: "msg-1",
    text: "Summarize my last 3 conversations"
  }));
};

ws.onmessage = (event) => {
  const frame = JSON.parse(event.data);
  // frame.type: "hello" | "message_chunk" | "message_done" | "error"
  if (frame.type === "message_chunk") process.stdout.write(frame.delta);
  if (frame.type === "message_done") console.log("\n[done]", frame.text);
};
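For clients that want the whole reply rather than a live stream, the frame types in the snippet above can be folded back into a single string. The sketch below is transport-agnostic Python: it assumes each frame arrives as a JSON text message with the `type`, `delta`, and `text` fields shown above. The shape of an `error` frame is not shown on this page, so the sketch just raises with the whole frame.

```python
import json
from typing import Iterable, List


def assemble_reply(raw_frames: Iterable[str]) -> str:
    """Rebuild one reply from a sequence of JSON text frames."""
    parts: List[str] = []
    for raw in raw_frames:
        frame = json.loads(raw)
        kind = frame.get("type")
        if kind == "message_chunk":
            parts.append(frame.get("delta", ""))
        elif kind == "message_done":
            # Prefer the final text if present, otherwise join the deltas
            return frame.get("text") or "".join(parts)
        elif kind == "error":
            raise RuntimeError(f"gateway error frame: {frame}")
        # "hello" and any unknown frame types are ignored
    # Stream ended without message_done: return whatever arrived
    return "".join(parts)
```

Any WebSocket library can feed this function; it only needs an iterable of the raw text messages received on the socket.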
# Search memory across all three tiers
curl -s "http://127.0.0.1:7432/api/v1/memory/search?q=portfolio&limit=5" | jq

# Check gateway + channel status
curl -s http://127.0.0.1:7432/api/v1/status | jq
curl -s http://127.0.0.1:7432/api/v1/channels | jq
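The same calls can be made from Python with only the standard library. This is a sketch that reuses the paths and query parameters from the curl commands above. The helper names are illustrative, and since the response shapes are not shown on this page, the parsed JSON is returned as-is.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:7432"  # address used in the curl examples


def memory_search_url(query: str, limit: int = 5, base_url: str = BASE_URL) -> str:
    """Build the memory-search URL with properly escaped query parameters."""
    params = urlencode({"q": query, "limit": limit})
    return f"{base_url}/api/v1/memory/search?{params}"


def get_json(url: str, timeout: float = 5.0):
    """GET a URL and parse the body as JSON (needs a running gateway)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# Example, with the gateway running locally:
#   results = get_json(memory_search_url("portfolio"))
#   status = get_json(f"{BASE_URL}/api/v1/status")
#   channels = get_json(f"{BASE_URL}/api/v1/channels")
```

Building the query string with `urlencode` avoids hand-escaping search terms that contain spaces or punctuation.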
# pip install cortexflow-sdk
from cortexflow_sdk import Tool, ToolResult


class WeatherTool(Tool):
    name = "get_weather"
    description = "Look up current weather for a city."

    async def run(self, city: str) -> ToolResult:
        try:
            # _fetch is not shown here; it stands for your own weather lookup
            data = await self._fetch(city)
            return ToolResult.ok(data)
        except Exception as exc:
            return ToolResult.error(str(exc))
OpenClaw popularized the personal-AI-gateway idea. CortexFlow-AI goes further on the dimensions that matter most.
| Dimension | OpenClaw | CortexFlow-AI |
|---|---|---|
| Memory | LanceDB only (flat vector) | Redis + Qdrant + SQLite (3-tier) |
| LLM Routing | Manual model config | Auto task-aware routing + fallback chains |
| Voice | macOS/iOS wake-word only | Cross-platform STT/TTS + open-source wake word |
| Web UI | Static WebChat widget | Full dashboard: memory explorer, history, metrics |
| Configuration | Complex YAML (~50 keys) | Simple TOML, works in 3 lines |
| Observability | Stdout logs only | Structured JSON logs + Prometheus metrics |
| Plugin Security | In-process, no sandboxing | Subprocess-sandboxed, typed SDK |
Local-first by design — runs as one daemon on your own machine or server, no required cloud dependency.
The gateway itself, the plugin SDK, and three example plugins are all real, installable packages.
The full gateway — multi-channel daemon, 3-tier memory, model routing, voice, and the cortex CLI. Business Source License 1.1 (free for non-production use).
pip install cortexflow-ai
View on PyPI →
Typed Plugin, Tool, and ChannelAdapter base classes — zero gateway dependencies.
pip install cortexflow-sdk
View on PyPI →
Lists recent GitHub repository events (pushes, PRs, issues) via the public REST API.
pip install cortexflow-github
View on PyPI →
Searches Notion pages and databases shared with your integration.
pip install cortexflow-notion
View on PyPI →
Lists upcoming Google Calendar events for the connected account.
pip install cortexflow-google-calendar
View on PyPI →
No SDK required for the gateway itself — every route is plain JSON over HTTP and one WebSocket connection.
Every route above — request/response shapes, auth (there is none; it's a local single-user daemon), and WebSocket frame types — is documented with real examples cross-checked against the gateway source.
Read the API docs →
Works with three lines of config: a model and one channel token.
# 1. Install (published on PyPI)
pip install cortexflow-ai

# 2. Guided setup wizard: model + channel + voice test
cortex init

# 3. Start the gateway daemon
cortex start --background

# 4. Or just talk to it right now from the terminal
cortex chat
# No Python setup needed — pull the public image
docker pull ghcr.io/theamitchandra/cortexflow-ai:latest
docker run -d --name cortexflow-ai -p 7432:7432 \
-e ANTHROPIC_API_KEY=sk-ant-... \
-v cortexflow-data:/root/.cortexflow \
ghcr.io/theamitchandra/cortexflow-ai:latest
curl http://localhost:7432/health
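A freshly started container may take a moment before it answers, so scripts that depend on the gateway can poll the health route instead of calling it once. The sketch below assumes that `/health` returns HTTP 200 once the daemon is ready, which this page does not state explicitly. The function names and retry parameters are illustrative.

```python
import time
from typing import Callable
from urllib.request import urlopen


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with HTTP 200."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection refused, timeouts, and urllib's URLError/HTTPError
        return False


def wait_until_healthy(
    url: str = "http://localhost:7432/health",
    attempts: int = 10,
    delay: float = 1.0,
    probe: Callable[[str], bool] = http_ok,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the health route until it succeeds or attempts run out."""
    for i in range(attempts):
        if probe(url):
            return True
        if i < attempts - 1:
            sleep(delay)  # no need to wait after the final attempt
    return False
```

The `probe` and `sleep` parameters are injectable so the retry logic can be exercised without a running container.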