
🧠 Cerebro

Enterprise Knowledge Extraction & Distributed RAG Platform

Language: Python / TypeScript (React)
LoC: ~139k
Repo: ~/master/cerebro

Cerebro is a standalone, hermetic knowledge and repository intelligence platform. It natively combines deep static analysis (ASTs), local RAG workflows, and production-grade cloud integration adapters (Azure, GCP).
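The following sketch makes "static analysis (ASTs)" concrete. It uses only Python's standard `ast` module and counts structural elements of a source file without any LLM call. The function `ast_metrics` and the chosen metrics are hypothetical illustrations, not Cerebro's analyzer API.

```python
import ast
from collections import Counter


def ast_metrics(source: str) -> dict[str, int]:
    """Count structural elements of Python source (hypothetical helper, no LLM involved)."""
    tree = ast.parse(source)
    # Tally every node type in the tree; Counter returns 0 for types never seen.
    counts = Counter(type(node).__name__ for node in ast.walk(tree))
    return {
        "functions": counts["FunctionDef"] + counts["AsyncFunctionDef"],
        "classes": counts["ClassDef"],
        "imports": counts["Import"] + counts["ImportFrom"],
        "lines": len(source.splitlines()),
    }


sample = '''
import os
from pathlib import Path

class Loader:
    def load(self, p):
        return Path(p).read_text()

async def fetch():
    pass
'''
print(ast_metrics(sample))
```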

🏗️ Architecture Design

Cerebro has a modular architecture that separates concerns across compute, storage, and interfaces.

The Triple Interface

  1. CLI (cerebro): Built with Typer. Groups system capabilities into subcommands such as knowledge, ops, rag, setup, and metrics. Ideal for CI/CD and automation.
  2. TUI (ctui): A rich Textual-based terminal UI for deep-dives without leaving the shell.
  3. Dashboard (cdash): A React/Vite web application for visual management, analytics, and interactive chat workflows.
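The CLI's command grouping can be pictured with the sketch below. Cerebro's real CLI is built with Typer; this version uses the standard-library `argparse` as a dependency-free stand-in. The group names come from the list above and the `analyze`, `ingest`, and `query` subcommands come from the Quickstart. The argument names (`path`, `goal`, `file`, `question`) and the `build_parser` helper are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical command tree mirroring the documented command groups (argparse stand-in for Typer)."""
    parser = argparse.ArgumentParser(prog="cerebro")
    groups = parser.add_subparsers(dest="group", required=True)

    # `cerebro knowledge analyze <path> <goal>`
    knowledge = groups.add_parser("knowledge", help="Codebase analysis")
    k_cmds = knowledge.add_subparsers(dest="command", required=True)
    analyze = k_cmds.add_parser("analyze")
    analyze.add_argument("path")
    analyze.add_argument("goal")

    # `cerebro rag ingest <file>` and `cerebro rag query <question>`
    rag = groups.add_parser("rag", help="Retrieval-augmented generation")
    r_cmds = rag.add_subparsers(dest="command", required=True)
    r_cmds.add_parser("ingest").add_argument("file")
    r_cmds.add_parser("query").add_argument("question")

    # The remaining groups are bare stubs in this sketch.
    for name in ("ops", "setup", "metrics"):
        groups.add_parser(name)
    return parser


args = build_parser().parse_args(["rag", "query", "Explain the architecture"])
print(args.group, args.command, args.question)
```

Nesting subparsers like this corresponds to registering one sub-application per command group in Typer.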

Core Capabilities

  • RigorousRAGEngine: The core component (located in src/cerebro/core/rag/) handles context retrieval, prompt assembly, and response generation with a focus on grounded, factual answers.
  • Intelligence & Analytics: Codebase analysis tools (analyzer), reporting (briefing), and "zero-token" repository metrics computed without consuming LLM tokens.
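The retrieve, assemble, and generate flow described for the RigorousRAGEngine can be sketched roughly as follows. This is not the engine's actual code. `Chunk`, `retrieve`, `assemble_prompt`, and `answer` are hypothetical names, word overlap stands in for vector search, and `llm` is any callable that maps a prompt to text. Grounding is approximated in two ways: context passages are numbered so the model can cite them, and the function refuses to answer when nothing relevant is retrieved.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class Chunk:
    source: str  # where the text came from, e.g. a file path
    text: str


def _tokens(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query: str, chunks: list[Chunk], k: int = 2) -> list[Chunk]:
    """Return up to k chunks sharing the most words with the query (stand-in for vector search)."""
    q = _tokens(query)
    scored = [(len(q & _tokens(c.text)), c) for c in chunks]
    relevant = [pair for pair in scored if pair[0] > 0]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in relevant[:k]]


def assemble_prompt(query: str, context: list[Chunk]) -> str:
    """Number each passage so the model can cite it, and restrict it to that context."""
    passages = "\n".join(f"[{i}] ({c.source}) {c.text}" for i, c in enumerate(context, start=1))
    return (
        "Answer using only the numbered context and cite passages as [n]. "
        "If the context is insufficient, say so.\n\n"
        f"{passages}\n\nQuestion: {query}"
    )


def answer(query: str, chunks: list[Chunk], llm: Callable[[str], str]) -> str:
    context = retrieve(query, chunks)
    if not context:
        return "Insufficient context to answer."  # refuse rather than guess
    return llm(assemble_prompt(query, context))


docs = [
    Chunk("rag_notes.md", "The RAG engine retrieves context and assembles prompts"),
    Chunk("deploy_notes.md", "Deployment manifests for AKS"),
]
# Echo the prompt instead of calling a real model.
print(answer("How does the RAG engine work?", docs, llm=lambda prompt: prompt))
```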

🔌 Pluggable Providers

The core intelligence engine uses a strict Factory pattern to load LLM and vector database backends dynamically based on configuration.
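A configuration-driven factory of this kind can be sketched with a small registry. The `provider`/`options` config keys, `register`, `create_store`, and the in-memory store are illustrative assumptions. Real adapters would wrap PGVector, ChromaDB, or Elasticsearch clients instead.

```python
from typing import Callable, Protocol


class VectorStore(Protocol):
    def add(self, doc_id: str, text: str) -> None: ...
    def count(self) -> int: ...


class InMemoryStore:
    """Toy store standing in for a real vector database adapter (hypothetical)."""

    def __init__(self, **options: object) -> None:
        self.options = options
        self.docs: dict[str, str] = {}

    def add(self, doc_id: str, text: str) -> None:
        self.docs[doc_id] = text

    def count(self) -> int:
        return len(self.docs)


_REGISTRY: dict[str, Callable[..., VectorStore]] = {}


def register(name: str):
    """Decorator that records a factory under a provider name."""
    def decorator(factory: Callable[..., VectorStore]) -> Callable[..., VectorStore]:
        _REGISTRY[name] = factory
        return factory
    return decorator


@register("memory")
def make_memory_store(**options: object) -> VectorStore:
    return InMemoryStore(**options)


def create_store(config: dict) -> VectorStore:
    """Strict lookup: an unknown provider fails loudly instead of falling back silently."""
    provider = config["provider"]
    try:
        factory = _REGISTRY[provider]
    except KeyError:
        raise ValueError(f"Unknown vector store provider: {provider!r}") from None
    return factory(**config.get("options", {}))


store = create_store({"provider": "memory", "options": {"collection": "docs"}})
store.add("readme", "Cerebro overview")
print(store.count())
```

Adding a backend then means registering one more factory. The calling code only ever sees the `VectorStore` interface.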

Vector Stores

  • PGVector (PostgreSQL) - Default for enterprise-scale deployments.
  • ChromaDB - Used for local or ephemeral workloads.
  • Elasticsearch - Used for Hybrid Reciprocal Rank Fusion (RRF) search.
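Reciprocal Rank Fusion merges several ranked lists (for example, a BM25 keyword ranking and a vector-similarity ranking). Each document's fused score is the sum of 1 / (k + rank) across all lists. The sketch below shows only the formula. Whether Cerebro relies on Elasticsearch's built-in RRF or fuses results client-side is not stated here, and k = 60 is simply a commonly used default.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of document IDs: score(d) = sum over lists of 1 / (k + rank(d))."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


bm25 = ["doc_a", "doc_b", "doc_c"]
vector = ["doc_b", "doc_c", "doc_d"]
for doc_id, score in reciprocal_rank_fusion([bm25, vector]):
    print(f"{doc_id}: {score:.4f}")
```

Documents that appear in both lists (doc_b, doc_c) rise above documents that top only one list. RRF uses ranks rather than raw scores, so BM25 and cosine scores never need to be normalized against each other.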

LLM Backends

  • Google Gemini
  • Anthropic Claude
  • OpenAI (and compatible endpoints)
  • Groq
  • Llama.cpp (for local/offline execution)

🚀 Deployment & DevOps

Cerebro is "Cloud-Native" from the ground up:

  • Kubernetes (K8s): Full deployment manifests for Service, Deployment, and Ingress in /kubernetes. Optimized for AKS (Azure Kubernetes Service).
  • Nix: Hermetic and reproducible environments via flake.nix.
  • CI/CD Pipelines: Automated workflows available for GitHub Actions, GitLab CI, and Azure Pipelines.
  • Automation Scripts: A collection of utilities in /scripts for managing billing, benchmarking metrics, running documentation ETL, and synchronizing vector stores.
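The internals of the vector-store synchronization scripts are not documented here. One common approach, shown as a hypothetical sketch, compares content hashes so that only new or changed documents are re-embedded and removed documents are deleted. `content_hash` and `plan_sync` are illustrative names. The sketch also assumes the store keeps a SHA-256 hash for each document ID.

```python
import hashlib


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_sync(source: dict[str, str], indexed: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (ids_to_upsert, ids_to_delete).

    source:  doc_id -> current document text
    indexed: doc_id -> content hash recorded in the vector store (assumed)
    """
    # New or changed documents: missing from the index, or stored hash differs.
    upsert = [doc_id for doc_id, text in source.items() if indexed.get(doc_id) != content_hash(text)]
    # Documents still indexed but no longer present in the source.
    delete = [doc_id for doc_id in indexed if doc_id not in source]
    return sorted(upsert), sorted(delete)


source_docs = {"a": "hello", "b": "world"}
index_state = {"a": content_hash("hello"), "c": content_hash("stale")}
print(plan_sync(source_docs, index_state))
```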

⚡ Quickstart

To interact with the project directly via its Nix shell:

cd ~/master/cerebro
nix develop

# Initial configuration wizard
cerebro setup

# Indexing the current repository
cerebro knowledge analyze . "General codebase review"
cerebro rag ingest ./data/analyzed/all_artifacts.jsonl

# Querying your local instance
cerebro rag query "Explain the architecture of the Core Services"
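The `rag ingest` step reads a JSONL file of analysis artifacts. This README does not describe the record schema, so the sketch below assumes each line is a JSON object with a text field (hypothetically named `content`). It splits that text into overlapping character windows, a common preprocessing step before embedding. The helper names and the 800/100 window sizes are assumptions, not Cerebro defaults.

```python
import json
from typing import Iterable, Iterator


def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into windows of `size` chars, each overlapping the previous by `overlap` chars."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    if not text:
        return []
    step = size - overlap
    # Stop once a window would add nothing beyond the previous window's end.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def iter_chunks(lines: Iterable[str], text_field: str = "content") -> Iterator[dict]:
    """Parse JSONL lines and yield one dict per text chunk (field name is an assumption)."""
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        for idx, piece in enumerate(chunk_text(record.get(text_field, ""))):
            yield {"line": line_no, "chunk": idx, "text": piece}


# Usage with the file produced by the quickstart (schema assumed):
# with open("data/analyzed/all_artifacts.jsonl", encoding="utf-8") as fh:
#     chunks = list(iter_chunks(fh))
print(chunk_text("abcdefghij", size=4, overlap=1))
```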