🧠 Cerebro
Enterprise Knowledge Extraction & Distributed RAG Platform
| Property | Value |
|---|---|
| Language | Python / TypeScript (React) |
| Lines of code | ~139k |
| Repo | `~/master/cerebro` |
Cerebro is a standalone, hermetic platform for knowledge extraction and repository intelligence. It combines AST-based static analysis, local RAG workflows, and production-grade cloud integration adapters (Azure, GCP).
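As a minimal illustration of AST-based static analysis, the sketch below uses Python's standard `ast` module to list the functions and classes defined in a source string, with their line numbers and docstrings. It is not Cerebro's analyzer; the helper `extract_symbols` and the `Symbol` record are hypothetical.

```python
import ast
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Symbol:
    kind: str                 # "function", "async_function" or "class"
    name: str
    lineno: int               # 1-based line where the definition starts
    docstring: Optional[str]


def extract_symbols(source: str) -> List[Symbol]:
    """Hypothetical helper: collect every function and class definition in `source`."""
    tree = ast.parse(source)
    kinds = {
        ast.FunctionDef: "function",
        ast.AsyncFunctionDef: "async_function",
        ast.ClassDef: "class",
    }
    symbols = []
    for node in ast.walk(tree):
        kind = kinds.get(type(node))
        if kind is not None:
            symbols.append(Symbol(kind, node.name, node.lineno, ast.get_docstring(node)))
    # ast.walk is breadth-first, so sort by position for a stable, readable order
    return sorted(symbols, key=lambda s: s.lineno)


if __name__ == "__main__":
    code = '''
class Retriever:
    """Finds relevant chunks."""
    def search(self, query):
        return []

async def main():
    pass
'''
    for sym in extract_symbols(code):
        print(sym.kind, sym.name, sym.lineno)
```

Working on the syntax tree rather than on raw text means nested definitions, decorators, and multi-line signatures are handled by the parser instead of by fragile regular expressions.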
Architecture Design
Cerebro uses a modular architecture that keeps compute, storage, and interfaces separate.
The Triple Interface
- CLI (`cerebro`): Built with Typer. Groups system capabilities into subcommands such as `knowledge`, `ops`, `rag`, `setup`, and `metrics`. Ideal for CI/CD and automation.
- TUI (`ctui`): A rich Textual-based terminal UI for deep dives without leaving the shell.
- Dashboard (`cdash`): A React/Vite web application for visual management, analytics, and interactive chat workflows.
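The `cerebro` CLI is built with Typer. To stay dependency-free, the sketch below swaps in the standard library's `argparse` to show the same idea: a two-level command tree of the form `<group> <action> [args]`. The `analyze`, `ingest`, and `query` actions mirror the Quickstart invocations, but the argument names (`path`, `goal`, `artifacts`, `question`) are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical command tree: `cerebro <group> <action> [args]`."""
    parser = argparse.ArgumentParser(prog="cerebro")
    groups = parser.add_subparsers(dest="group", required=True)

    # `cerebro knowledge analyze <path> <goal>`
    knowledge = groups.add_parser("knowledge", help="codebase analysis")
    k_actions = knowledge.add_subparsers(dest="action", required=True)
    analyze = k_actions.add_parser("analyze")
    analyze.add_argument("path")
    analyze.add_argument("goal")

    # `cerebro rag ingest <artifacts>` and `cerebro rag query <question>`
    rag = groups.add_parser("rag", help="retrieval-augmented generation")
    r_actions = rag.add_subparsers(dest="action", required=True)
    ingest = r_actions.add_parser("ingest")
    ingest.add_argument("artifacts")
    query = r_actions.add_parser("query")
    query.add_argument("question")

    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["rag", "query", "Explain the Core Services"])
    print(args.group, args.action, args.question)
```

Grouping commands this way keeps each capability discoverable through `--help` at every level, which is what makes such a CLI convenient to script in CI/CD.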
Core Capabilities
- RigorousRAGEngine: The core component (located in `src/cerebro/core/rag/`) handles context retrieval, prompt assembly, and response generation, with a focus on grounded, factual answers.
- Intelligence & Analytics: Codebase analysis tools (`analyzer`), reporting (`briefing`), and zero-token repository metrics (computed without spending LLM tokens).
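The three stages named above (retrieval, prompt assembly, generation) can be sketched as a small pipeline. This is not the RigorousRAGEngine implementation: the toy retriever ranks chunks by shared query terms instead of embeddings, the LLM is any callable passed in as `generate`, and all names (`Chunk`, `retrieve`, `assemble_prompt`, `answer`) are hypothetical. Grounding is approximated by numbering the sources, asking for citations, and refusing when nothing relevant is retrieved.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass(frozen=True)
class Chunk:
    source: str
    text: str


def _tokens(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, chunks: List[Chunk], k: int = 3) -> List[Chunk]:
    """Toy lexical retriever: rank chunks by how many query terms they share."""
    q = _tokens(query)
    scored = [(len(q & _tokens(c.text)), i, c) for i, c in enumerate(chunks)]
    scored = [s for s in scored if s[0] > 0]      # drop chunks with no overlap
    scored.sort(key=lambda s: (-s[0], s[1]))      # best score first, original order on ties
    return [c for _, _, c in scored[:k]]


def assemble_prompt(query: str, context: List[Chunk]) -> str:
    """Number each chunk so the model can cite it, and forbid answers outside the context."""
    sources = "\n".join(f"[{i}] ({c.source}) {c.text}" for i, c in enumerate(context, 1))
    return (
        "Answer using ONLY the sources below and cite them as [n]. "
        "If the sources do not contain the answer, say you don't know.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {query}\nAnswer:"
    )


def answer(query: str, chunks: List[Chunk], generate: Callable[[str], str]) -> str:
    context = retrieve(query, chunks)
    if not context:
        # Refuse instead of letting the model answer from its own memory
        return "I don't know: no relevant context was found."
    return generate(assemble_prompt(query, context))
```

Passing the model in as a plain callable keeps the pipeline testable without network access; any of the configured backends could be adapted to that signature.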
Pluggable Providers
The core intelligence engine uses a strict Factory pattern to instantiate the configured LLM backend and vector database at runtime.
Vector Stores
- PGVector (PostgreSQL) - Default for enterprise-scale deployments.
- ChromaDB - Used for local or ephemeral workloads.
- Elasticsearch - Used for hybrid search with Reciprocal Rank Fusion (RRF).
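Hybrid search typically fuses a lexical (keyword/BM25) ranking with a vector-similarity ranking. Reciprocal Rank Fusion scores each document as the sum of 1 / (k + rank) over every result list it appears in, with ranks starting at 1. The sketch below implements that standard formula with the commonly used constant k = 60; whether Cerebro or its Elasticsearch configuration uses the same constant is an assumption, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[Tuple[str, float]]:
    """Fuse several ranked lists of document IDs into one.

    Each document scores sum(1 / (k + rank)) over every list it appears in
    (ranks start at 1). A list that omits a document contributes nothing for it.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    lexical = ["doc_a", "doc_b", "doc_c"]   # keyword results
    vector = ["doc_c", "doc_a", "doc_d"]    # embedding results
    for doc_id, score in reciprocal_rank_fusion([lexical, vector]):
        print(f"{doc_id}: {score:.5f}")
```

Because RRF uses only ranks, it needs no normalization between keyword scores and cosine similarities, which live on different scales; documents that rank well in both lists rise to the top.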
LLM Backends
- Google Gemini
- Anthropic Claude
- OpenAI (and compatible endpoints)
- Groq
- llama.cpp (for local/offline execution)
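A registry-based factory is one common way to implement configuration-driven provider loading. The sketch below covers LLM backends only; the same shape would apply to vector stores. The registry, the `provider`/`options` config keys, and the `EchoBackend` stand-in are hypothetical and do not reflect Cerebro's actual configuration schema. "Strict" is modeled here as rejecting duplicate registrations and unknown provider names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Protocol


class LLMBackend(Protocol):
    def complete(self, prompt: str) -> str: ...


# Hypothetical registry: provider name (as it would appear in config) -> constructor
_LLM_REGISTRY: Dict[str, Callable[[dict], LLMBackend]] = {}


def register_llm(name: str):
    """Decorator that records a backend class under a configuration key."""
    def decorator(cls):
        if name in _LLM_REGISTRY:
            raise ValueError(f"duplicate LLM provider: {name}")
        _LLM_REGISTRY[name] = cls
        return cls
    return decorator


def create_llm(config: dict) -> LLMBackend:
    """Factory: build the backend named in config['provider'] with config['options']."""
    provider = config.get("provider")
    try:
        factory = _LLM_REGISTRY[provider]
    except KeyError:
        known = ", ".join(sorted(_LLM_REGISTRY))
        raise ValueError(f"unknown LLM provider {provider!r}; expected one of: {known}") from None
    return factory(config.get("options", {}))


@register_llm("echo")
@dataclass
class EchoBackend:
    """Stand-in for a real backend (Gemini, Claude, OpenAI, Groq, llama.cpp)."""
    options: dict

    def complete(self, prompt: str) -> str:
        return f"{self.options.get('prefix', '')}{prompt}"
```

With this design, adding a provider means registering one class; code that calls `create_llm` does not change.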
Deployment & DevOps
Cerebro is designed to be cloud-native from the ground up:
- Kubernetes (K8s): Deployment manifests for `Service`, `Deployment`, and `Ingress` resources in `/kubernetes`, optimized for AKS (Azure Kubernetes Service).
- Nix: Hermetic, reproducible environments via `flake.nix`.
- CI/CD Pipelines: Automated workflows for GitHub Actions, GitLab CI, and Azure Pipelines.
- Automation Scripts: A large collection of utilities in `/scripts` to manage billing, benchmark metrics, run documentation ETL, and synchronize vector stores.
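The synchronization scripts themselves are not described here. One common approach to keeping a vector index in sync with its sources is to compare content hashes, so only new or changed documents are re-embedded and documents removed at the source are deleted from the index. The sketch below computes such a plan; `plan_sync`, `SyncPlan`, and the in-memory dictionaries are hypothetical and not Cerebro's script API.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass
class SyncPlan:
    upsert: List[str] = field(default_factory=list)  # new or changed doc IDs to (re)embed
    delete: List[str] = field(default_factory=list)  # doc IDs no longer present at the source


def plan_sync(source_docs: Dict[str, str], indexed_hashes: Dict[str, str]) -> SyncPlan:
    """Compare source documents (id -> text) with hashes stored in the index (id -> hash)."""
    plan = SyncPlan()
    for doc_id in sorted(source_docs):
        # Missing from the index, or stored hash differs: needs (re)embedding
        if indexed_hashes.get(doc_id) != content_hash(source_docs[doc_id]):
            plan.upsert.append(doc_id)
    plan.delete = sorted(set(indexed_hashes) - set(source_docs))
    return plan
```

Hashing avoids re-embedding unchanged documents, which is usually the dominant cost of a full re-index.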
⚡ Quickstart
To work with the project from its Nix development shell:
```bash
cd ~/master/cerebro
nix develop

# Run the initial configuration wizard
cerebro setup

# Analyze and index the current repository
cerebro knowledge analyze . "General codebase review"
cerebro rag ingest ./data/analyzed/all_artifacts.jsonl

# Query your local instance
cerebro rag query "Explain the architecture of the Core Services"
```
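Before ingestion, artifacts are usually split into overlapping chunks so each embedding covers a bounded span of text. The record schema of `all_artifacts.jsonl` is not documented here, so the sketch below assumes hypothetical `id` and `text` fields and a simple character-window chunker; Cerebro's actual ingest logic may differ.

```python
import json
from typing import Iterable, Iterator, List


def chunk_text(text: str, size: int = 800, overlap: int = 100) -> List[str]:
    """Split text into windows of `size` characters; neighbours share `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def iter_chunks(lines: Iterable[str], size: int = 800, overlap: int = 100) -> Iterator[dict]:
    """Yield one record per chunk from JSONL lines (hypothetical schema: {"id", "text"})."""
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        for n, piece in enumerate(chunk_text(record["text"], size, overlap)):
            yield {"id": f"{record['id']}#{n}", "text": piece}
```

Since iterating an open file yields its lines, `iter_chunks` can be fed a file handle directly, streaming large artifact files without loading them into memory. The overlap keeps a sentence that straddles a window boundary retrievable from at least one chunk.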