SPECTRE Framework - Architecture Decision Records
Document Type: Technical ADRs with Trade-Off Analysis Purpose: Support informed decision-making for SPECTRE integration strategy Audience: System Architect (kernelcore) Status: Draft for Review Last Updated: 2026-01-09
Table of Contentsβ
- Executive Summary
- ADR-001: Phase 1 Project Integration Priority
- ADR-002: Service Integration Pattern
- ADR-003: Python Service Deployment Strategy
- ADR-004: Secrets Management Architecture
- ADR-005: LLM Request Routing Strategy
- ADR-006: Observability Architecture
- ADR-007: Event Schema Versioning Strategy
- Project Prioritization Framework
- Integration Scenarios
- Risk Analysis
- References
Executive Summaryβ
This document provides rigorous trade-off analysis for critical architectural decisions in the SPECTRE framework integration. Each ADR presents 3-4 viable alternatives with quantitative and qualitative evaluation across 5 dimensions:
- Performance (latency, throughput, resource usage)
- Maintainability (code complexity, debugging, evolution)
- Complexity (implementation effort, learning curve)
- Cost (development time, operational overhead, cloud spend)
- Risk (technical debt, failure modes, rollback difficulty)
Key Principle: This document presents options, not prescriptions. Final decisions rest with the architect based on business priorities and constraints.
ADR-001: Phase 1 Project Integration Priorityβ
Contextβ
Phase 1 (Security Infrastructure) requires choosing 1-2 projects to integrate first. This decision establishes patterns for subsequent integrations and validates the event-driven architecture.
Constraints:
- Must complete within 2 weeks (Phase 1 timeline)
- Should demonstrate value quickly (proof of concept)
- Should validate core SPECTRE capabilities (event bus, proxy, secrets)
Available Projects (Ranked by Maturity)β
| Project | Maturity | Tech Stack | Integration Complexity | Value |
|---|---|---|---|---|
| securellm-bridge | Production β | Rust (Axum, TLS, audit) | Medium | High |
| cognitive-vault | Production β | Rust+Go (FFI, crypto) | Low | Medium |
| ml-offload-api | Alpha π§ | Rust (Axum, NVIDIA) | Medium | High |
| ai-agent-os | Alpha π§ | Rust (6 crates, Hyprland) | High | Medium |
| intelagent | Alpha π§ | Rust (complex orchestration) | Very High | Very High |
| ragtex | Beta π§ | Python (LangChain, Vertex) | Medium | High |
| arch-analyzer | Beta π§ | Python (async, caching) | Low | Low |
Alternative 1: Start with cognitive-vault (Conservative)β
Approach: Extract crypto primitives from cognitive-vault into spectre-secrets
Pros:
- β Lowest complexity: Crypto library extraction is straightforward
- β No external dependencies: Self-contained Rust crate
- β Foundation for Phase 1: Secrets management is prerequisite for proxy auth
- β Low risk: Well-tested crypto stack (AES-256-GCM, Argon2id)
Cons:
- β Low immediate value: Secrets alone don't demo end-to-end flows
- β No event bus validation: Doesn't stress NATS architecture
- β Limited observability: Secrets operations are infrequent
Trade-Offs:
| Dimension | Score | Rationale |
|---|---|---|
| Performance | 9/10 | In-memory crypto, no network I/O |
| Maintainability | 9/10 | Small, focused crate |
| Complexity | 10/10 | Direct code extraction |
| Cost | 10/10 | 2-3 days development |
| Risk | 10/10 | Proven crypto, no new patterns |
Estimated Effort: 2-3 days Risk Level: Very Low Demo Value: Low (internal component)
Alternative 2: Start with securellm-bridge (Balanced)β
Approach: Wrap securellm-bridge HTTP endpoints to publish NATS events, keep existing API
Pros:
- β High demo value: End-to-end LLM request β response flow
- β Validates event bus: Real-world request/reply pattern
- β Production-ready code: Mature codebase with TLS, rate limiting, audit
- β FinOps integration: Cost tracking already implemented
- β Clear success criteria: Measure latency overhead (target: <50ms)
Cons:
- β Dual-write complexity: Must maintain HTTP API + NATS simultaneously
- β Requires secrets management: Needs API key rotation (dependency on cognitive-vault)
- β Performance overhead: Additional hop through NATS (5-10ms latency)
Trade-Offs:
| Dimension | Score | Rationale |
|---|---|---|
| Performance | 7/10 | +5-10ms latency from NATS hop |
| Maintainability | 7/10 | Dual-write increases complexity |
| Complexity | 6/10 | HTTPβNATS bridge pattern new |
| Cost | 6/10 | 5-7 days development |
| Risk | 7/10 | Backward compatibility concerns |
Estimated Effort: 5-7 days Risk Level: Medium Demo Value: Very High (LLM requests demo'd)
Alternative 3: Start with ml-offload-api (Aggressive)β
Approach: Full NATS integration with VRAM monitoring events
Pros:
- β High technical value: Validates hardware monitoring β event stream
- β FinOps critical: Local inference cost = $0, major selling point
- β Unique capabilities: VRAM-aware routing demonstrates intelligence
- β Event-rich: Publishes many event types (VRAM, inference, cost)
Cons:
- β Hardware dependency: Requires NVIDIA GPU (not in CI/CD)
- β Alpha maturity: Less stable than securellm-bridge
- β Complex integration: Model registry + backend abstraction
- β Testing challenges: GPU mocking for unit tests
Trade-Offs:
| Dimension | Score | Rationale |
|---|---|---|
| Performance | 8/10 | Local inference, no API costs |
| Maintainability | 6/10 | Hardware-specific code complex |
| Complexity | 5/10 | GPU monitoring + NATS integration |
| Cost | 5/10 | 7-10 days development |
| Risk | 5/10 | Alpha code, hardware dependency |
Estimated Effort: 7-10 days Risk Level: Medium-High Demo Value: High (local AI demo'd)
Alternative 4: Dual Integration - cognitive-vault + securellm-bridge (Recommended)β
Approach:
- Days 1-3: Extract crypto from cognitive-vault β spectre-secrets
- Days 4-10: Integrate securellm-bridge with NATS events + spectre-secrets auth
Pros:
- β Complete Phase 1: Both spectre-secrets and real integration
- β Sequential dependency: Secrets ready before securellm needs it
- β High demo value: End-to-end secure LLM gateway
- β Validates full stack: Crypto, events, proxy, observability
- β Clear milestone: Each sub-project has testable output
Cons:
- β Tight timeline: 10 days for both (risk of scope creep)
- β Sequential risk: Delay in secrets blocks securellm integration
- β Higher complexity: Two integrations simultaneously
Trade-Offs:
| Dimension | Score | Rationale |
|---|---|---|
| Performance | 8/10 | Efficient crypto + acceptable NATS overhead |
| Maintainability | 8/10 | Two clean integrations |
| Complexity | 6/10 | Sequential execution reduces risk |
| Cost | 5/10 | 10 days total |
| Risk | 6/10 | Tight timeline but sequential dependencies |
Estimated Effort: 10 days (2 weeks) Risk Level: Medium Demo Value: Very High (secure LLM gateway with cost tracking)
Recommendation Matrixβ
| Alternative | Effort | Risk | Demo Value | Strategic Fit | Overall Score |
|---|---|---|---|---|---|
| Alt 1: vault only | 2-3d | Very Low | Low | 6/10 | 6.5/10 |
| Alt 2: securellm | 5-7d | Medium | Very High | 9/10 | 8/10 |
| Alt 3: ml-offload | 7-10d | Medium-High | High | 8/10 | 7/10 |
| Alt 4: vault + securellm | 10d | Medium | Very High | 10/10 | 8.5/10 β |
Recommended Decision: Alternative 4 (Dual Integration)β
Rationale:
- Completes entire Phase 1 scope (secrets + proxy validation)
- Demonstrates complete value chain: secure request β LLM β cost tracking
- Sequential execution reduces risk (secrets first, then securellm)
- Fits 2-week Phase 1 timeline with buffer
Decision Criteria for Architect:
- If timeline is critical β Choose Alt 1 (cognitive-vault only)
- If demo value paramount β Choose Alt 2 (securellm-bridge)
- If FinOps is priority β Choose Alt 3 (ml-offload-api)
- If complete Phase 1 β Choose Alt 4 (recommended)
Success Metrics:
- spectre-secrets: Encrypt/decrypt with <1ms latency, rotation in <5s
- securellm-bridge: NATS overhead <50ms p99, zero data loss
- Integration tests: 95% coverage, CI/CD green
ADR-002: Service Integration Patternβ
Contextβ
Domain services (Rust/Python) must integrate with SPECTRE event bus. The integration pattern affects performance, maintainability, and evolution of the architecture.
Problem Statementβ
How should external services communicate with SPECTRE?
Requirements:
- Support both Rust and Python services
- Minimal latency overhead (<50ms p99)
- Backward compatibility with existing HTTP APIs
- Observable (request tracing, metrics)
- Testable (unit + integration tests)
Alternative 1: Direct NATS Integration (Native Pattern)β
Approach: Services directly use NATS client libraries
// Rust service
use spectre_events::EventBus;
#[tokio::main]
async fn main() {
let bus = EventBus::connect("nats://localhost:4222").await?;
let mut sub = bus.subscribe("llm.request.v1").await?;
while let Some(msg) = sub.next().await {
let response = handle_request(msg).await?;
bus.publish(&response).await?;
}
}
Pros:
- β Lowest latency: Direct connection, no middleware
- β Simplest architecture: No additional components
- β Best observability: Native NATS tracing
- β Scales horizontally: Queue groups for load balancing
Cons:
- β Requires code changes: Must refactor existing services
- β No HTTP backward compatibility: Breaking change for HTTP clients
- β Language-specific clients: Different APIs for Rust vs Python
Trade-Offs:
| Dimension | Score | Rationale |
|---|---|---|
| Performance | 10/10 | No middleware, native NATS |
| Maintainability | 8/10 | Simple but requires refactor |
| Complexity | 7/10 | NATS learning curve |
| Cost | 7/10 | Refactor effort per service |
| Risk | 8/10 | Breaking changes for HTTP clients |
Latency: 5-10ms (NATS pub/sub) Effort: Medium (per service refactor) Backward Compatibility: β None
Alternative 2: HTTP Bridge Pattern (Hybrid)β
Approach: Keep existing HTTP APIs, add NATS event publishing
// Existing HTTP handler
async fn handle_completion(req: ChatRequest) -> Result<ChatResponse> {
// NEW: Publish event
bus.publish(&Event::new(
EventType::LlmRequest,
service_id,
serde_json::to_value(&req)?
)).await?;
// Existing logic
let response = provider.complete(req).await?;
// NEW: Publish response event
bus.publish(&Event::new(
EventType::LlmResponse,
service_id,
serde_json::to_value(&response)?
)).await?;
Ok(response)
}
Pros:
- β Backward compatible: HTTP API unchanged
- β Incremental migration: Can dual-write during transition
- β Low risk: Existing clients unaffected
- β Observability added: Events for monitoring without breaking API
Cons:
- β Dual-write complexity: Maintain two interfaces
- β Event/HTTP inconsistency risk: Can diverge over time
- β Performance overhead: Serialize twice (HTTP + NATS)
- β Technical debt: Eventually need to migrate away from HTTP
Trade-Offs:
| Dimension | Score | Rationale |
|---|---|---|
| Performance | 7/10 | +2-5ms for event publishing |
| Maintainability | 6/10 | Dual-write increases complexity |
| Complexity | 7/10 | Familiar HTTP + new events |
| Cost | 8/10 | Minimal refactor per service |
| Risk | 9/10 | Low risk, backward compatible |
Latency: HTTP baseline + 2-5ms Effort: Low (add event publishing) Backward Compatibility: β Full
Alternative 3: Sidecar Proxy Pattern (Enterprise)β
Approach: Deploy spectre-proxy as sidecar, intercepts HTTP and publishes events
βββββββββββββββββββββββββββββββββββββββ
β Service Container β
β ββββββββββββββ ββββββββββββββββ β
β β Service βββββΆβ Sidecar β β
β β (HTTP) β β spectre- β β
β β ββββββ proxy β β
β ββββββββββββββ ββββββββ¬ββββββββ β
β β β
ββββββββββββββββββββββββββββββΌββββββββββ
β
βΌ
NATS Event Bus
Pros:
- β Zero code changes: Service unaware of NATS
- β Polyglot support: Works with any HTTP service
- β Centralized policy: Auth, rate limiting in proxy
- β Ops-friendly: Sidecar managed independently
Cons:
- β High complexity: Requires Kubernetes/Docker Compose orchestration
- β Performance overhead: Extra network hop (5-15ms)
- β Operational burden: More containers to manage
- β Debugging harder: Network issues obscured by sidecar
Trade-Offs:
| Dimension | Score | Rationale |
|---|---|---|
| Performance | 6/10 | +5-15ms from network hop |
| Maintainability | 7/10 | Centralized but more moving parts |
| Complexity | 4/10 | Sidecar orchestration complex |
| Cost | 5/10 | Infrastructure overhead |
| Risk | 6/10 | Network failure modes |
Latency: 5-15ms (localhost sidecar) Effort: High (infrastructure setup) Backward Compatibility: β Full
Alternative 4: Subprocess Wrapper Pattern (Simple)β
Approach: SPECTRE spawns services as subprocesses, captures stdio events
// spectre-orchestrator
let mut child = Command::new("python")
.arg("ragtex/main.py")
.stdin(Stdio::piped())
.stdout(Stdio::piped())
.spawn()?;
// Send request via stdin
writeln!(child.stdin, "{}", request_json)?;
// Read response from stdout
let response = BufReader::new(child.stdout).lines().next()?;
bus.publish(&Event::new(EventType::RagResponse, service_id, response)).await?;
Pros:
- β Simplest integration: No service code changes
- β Language agnostic: Works with any executable
- β Process isolation: Crashes don't affect SPECTRE
Cons:
- β Poor performance: Process spawn overhead (50-100ms)
- β Limited scalability: Can't horizontally scale
- β No request-reply: Async only (can't handle sync requests well)
- β Debugging nightmare: Stdio piping fragile
Trade-Offs:
| Dimension | Score | Rationale |
|---|---|---|
| Performance | 3/10 | 50-100ms process spawn |
| Maintainability | 5/10 | Simple but fragile |
| Complexity | 8/10 | Just spawn processes |
| Cost | 9/10 | No integration code needed |
| Risk | 4/10 | Process management issues |
Latency: 50-100ms (process spawn) Effort: Very Low Backward Compatibility: β Full
Recommendation Matrixβ
| Pattern | Performance | Maintainability | Complexity | Cost | Risk | Overall |
|---|---|---|---|---|---|---|
| Native NATS | 10/10 | 8/10 | 7/10 | 7/10 | 8/10 | 8/10 β |
| HTTP Bridge | 7/10 | 6/10 | 7/10 | 8/10 | 9/10 | 7.4/10 |
| Sidecar Proxy | 6/10 | 7/10 | 4/10 | 5/10 | 6/10 | 5.6/10 |
| Subprocess | 3/10 | 5/10 | 8/10 | 9/10 | 4/10 | 5.8/10 |
Recommended Decision: Hybrid Approachβ
Phase 1-2 (Short-term): Use HTTP Bridge Pattern for initial integration
- Minimal disruption, backward compatible
- Validate event schemas and observability
Phase 3+ (Long-term): Migrate to Native NATS Integration
- Optimal performance and simplicity
- Remove HTTP dual-write technical debt
Rationale:
- De-risks Phase 1 with incremental approach
- Allows validating event patterns before full commitment
- Provides clear migration path
Decision Criteria:
- If performance critical β Use Native NATS immediately
- If backward compat critical β Use HTTP Bridge
- If polyglot at scale β Use Sidecar Proxy
- If quick prototype β Use Subprocess (temp only)
ADR-003: Python Service Deployment Strategyβ
Contextβ
Two Python services exist: ragtex (RAG system) and arch-analyzer (NixOS analysis). They must integrate with Rust-based SPECTRE infrastructure.
Problem Statementβ
How should Python services be deployed and managed?
Requirements:
- Support async Python (asyncio)
- Integrate with NATS event bus
- Reproducible deployments
- Observable and debuggable
- Cost-effective (no unnecessary Docker overhead)
Alternative 1: Docker Containers (Industry Standard)β
Approach: Containerize Python services with nats-py client
# Dockerfile
FROM python:3.13-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["python", "main.py"]
# docker-compose.yml
services:
ragtex:
build: ~/dev/low-level/ragtex
environment:
- NATS_URL=nats://nats:4222
depends_on:
- nats
Pros:
- β Industry standard: Well-understood pattern
- β Isolation: Dependencies don't conflict
- β Portability: Runs anywhere Docker runs
- β Resource limits: Can cap CPU/memory via Docker
Cons:
- β Overhead: 50-100MB per container + startup time
- β Dev friction: Rebuild image on every code change
- β Debugging harder: Need to exec into container
- β Not Nix-native: Doesn't leverage NixOS reproducibility
Trade-Offs:
| Dimension | Score | Rationale |
|---|---|---|
| Performance | 7/10 | Container overhead minimal |
| Maintainability | 8/10 | Standard tooling |
| Complexity | 7/10 | Docker learning curve |
| Cost | 7/10 | Disk space + build time |
| Risk | 9/10 | Proven pattern |
Resource Usage: ~100MB RAM per service Startup Time: 2-5 seconds Development Friction: Medium (rebuild on changes)
Alternative 2: Nix Flake + systemd Service (NixOS Native) ββ
Approach: Define Python environment in flake.nix, run as systemd service
# ragtex/flake.nix
{
outputs = { self, nixpkgs }: {
packages.x86_64-linux.default = nixpkgs.legacyPackages.x86_64-linux.python3Packages.buildPythonApplication {
pname = "ragtex";
version = "0.1.0";
src = ./.;
propagatedBuildInputs = with nixpkgs.legacyPackages.x86_64-linux.python3Packages; [
nats-py
langchain
# ...
];
};
nixosModules.ragtex = { config, lib, pkgs, ... }: {
systemd.services.ragtex = {
description = "RAGTeX NATS Service";
wantedBy = [ "multi-user.target" ];
after = [ "nats.service" ];
serviceConfig = {
ExecStart = "${self.packages.x86_64-linux.default}/bin/ragtex";
Restart = "always";
Environment = "NATS_URL=nats://localhost:4222";
};
};
};
};
}
Pros:
- β Nix-native: Leverages NixOS declarative config
- β Zero overhead: Native process, no container
- β Reproducible: Exact dependencies via Nix
- β systemd integration: Logs via journalctl, auto-restart
Cons:
- β NixOS-specific: Won't work on non-NixOS systems
- β Learning curve: Nix ecosystem complex
- β Slower builds: Nix builds can be slow initially
Trade-Offs:
| Dimension | Score | Rationale |
|---|---|---|
| Performance | 10/10 | Native process, no overhead |
| Maintainability | 9/10 | Declarative, version-controlled |
| Complexity | 6/10 | Nix learning curve steep |
| Cost | 9/10 | No container overhead |
| Risk | 7/10 | NixOS-specific, less portable |
Resource Usage: ~30MB RAM (native Python) Startup Time: <1 second Development Friction: Low (nix develop)
Alternative 3: Embedded PyO3 (Rust-Python Bridge)β
Approach: Embed Python interpreter in Rust using PyO3
// spectre-python-bridge/src/lib.rs
use pyo3::prelude::*;
pub fn run_ragtex(query: &str) -> PyResult<String> {
Python::with_gil(|py| {
let ragtex = PyModule::import(py, "ragtex")?;
let result = ragtex.getattr("query")?.call1((query,))?;
result.extract()
})
}
Pros:
- β Single binary: No separate deployment
- β Low latency: No IPC overhead
- β Type safety: Rust types at boundary
Cons:
- β GIL contention: Python Global Interpreter Lock limits parallelism
- β Crash risk: Python crash kills Rust process
- β Complex builds: PyO3 + Python deps in Nix
- β Debugging nightmare: Mixed-language stack traces
Trade-Offs:
| Dimension | Score | Rationale |
|---|---|---|
| Performance | 6/10 | GIL limits concurrency |
| Maintainability | 4/10 | Very complex to debug |
| Complexity | 3/10 | PyO3 + build complexity high |
| Cost | 5/10 | High initial effort |
| Risk | 4/10 | Crash isolation poor |
Resource Usage: Shared with Rust process Startup Time: Instant (already embedded) Development Friction: Very High
Alternative 4: Hybrid - Docker for Python, Native for Rustβ
Approach: Python services in Docker, Rust services native
# docker-compose.yml
services:
# Python services (Docker)
ragtex:
build: ~/dev/low-level/ragtex
networks: [spectre-net]
arch-analyzer:
build: ~/dev/low-level/arch-analyzer
networks: [spectre-net]
# Rust services (native systemd)
# Managed by NixOS configuration.nix
Pros:
- β Best of both worlds: Docker where it makes sense, native for Rust
- β Pragmatic: Doesn't force Nix on Python devs
- β Flexible: Can migrate Python to Nix later
Cons:
- β Inconsistent: Two deployment methods to maintain
- β Docker still required: Can't eliminate Docker entirely
Trade-Offs:
| Dimension | Score | Rationale |
|---|---|---|
| Performance | 8/10 | Rust native + acceptable Python overhead |
| Maintainability | 7/10 | Two systems to maintain |
| Complexity | 6/10 | Manageable split |
| Cost | 8/10 | Good pragmatic balance |
| Risk | 8/10 | Standard patterns for each |
Recommendation Matrixβ
| Alternative | Performance | Maintainability | Complexity | Cost | Risk | Overall |
|---|---|---|---|---|---|---|
| Docker | 7/10 | 8/10 | 7/10 | 7/10 | 9/10 | 7.6/10 |
| Nix + systemd | 10/10 | 9/10 | 6/10 | 9/10 | 7/10 | 8.2/10 β |
| PyO3 Embedded | 6/10 | 4/10 | 3/10 | 5/10 | 4/10 | 4.4/10 |
| Hybrid | 8/10 | 7/10 | 6/10 | 8/10 | 8/10 | 7.4/10 |
Recommended Decision: Nix + systemd (Alternative 2)β
Rationale:
- NixOS-native: You're already on NixOS, leverage it fully
- Best performance: Native processes, no container overhead
- Declarative: Entire stack in configuration.nix
- Reproducible: Exact versions pinned in flake.lock
Migration Path:
- Phase 1: Start with Docker (faster to prototype)
- Phase 2: Migrate to Nix+systemd once patterns validated
Decision Criteria:
- If NixOS environment β Use Nix + systemd (recommended)
- If need portability β Use Docker
- If single binary important β Consider PyO3 (carefully)
- If mixed team β Use Hybrid approach
ADR-004: Secrets Management Architectureβ
Contextβ
SPECTRE requires secure credential storage and automatic rotation for:
- LLM API keys (OpenAI, Anthropic, Vertex AI)
- Database passwords (TimescaleDB, Neo4j)
- Service authentication tokens
- TLS certificates
Existing Asset: cognitive-vault has production-ready crypto (AES-256-GCM, Argon2id)
Problem Statementβ
Should we extract cognitive-vault crypto into spectre-secrets, or integrate vault directly?
Alternative 1: Extract Crypto Library (Minimal)β
Approach: Copy crypto primitives from cognitive-vault into spectre-secrets
// spectre-secrets/src/crypto.rs (extracted)
pub struct CryptoEngine {
key: SecretKey,
}
impl CryptoEngine {
pub fn new(password: &str, salt: &[u8]) -> Result<Self> {
let key = argon2_derive_key(password, salt)?;
Ok(Self { key })
}
pub fn encrypt(&self, plaintext: &[u8]) -> Result<Vec<u8>> {
aes_gcm_encrypt(&self.key, plaintext)
}
pub fn decrypt(&self, ciphertext: &[u8]) -> Result<Vec<u8>> {
aes_gcm_decrypt(&self.key, ciphertext)
}
}
Pros:
- β Clean separation: No dependency on cognitive-vault crate
- β SPECTRE-specific: Can optimize for SPECTRE use case
- β No FFI: Pure Rust, no Go dependencies
Cons:
- β Code duplication: cognitive-vault and spectre-secrets diverge
- β Security risk: Might miss security updates in cognitive-vault
- β Lost features: cognitive-vault has CLI, backup, etc.
Trade-Offs:
| Dimension | Score | Rationale |
|---|---|---|
| Performance | 10/10 | Optimized for SPECTRE |
| Maintainability | 6/10 | Code duplication |
| Complexity | 8/10 | Simple extraction |
| Cost | 8/10 | 2-3 days extraction |
| Risk | 7/10 | Security divergence risk |
Effort: 2-3 days Ongoing Maintenance: Medium (sync security fixes)
Alternative 2: Depend on cognitive-vault Crate (DRY)β
Approach: Add cognitive-vault as Git dependency
# spectre-secrets/Cargo.toml
[dependencies]
vault_core = { git = "https://github.com/kernelcore/cognitive-vault", branch = "main" }
Pros:
- β No duplication: Single source of truth
- β Automatic updates: Benefit from cognitive-vault improvements
- β Proven code: Battle-tested crypto
Cons:
- β External dependency: SPECTRE depends on separate repo
- β Version coupling: Breaking changes in vault break SPECTRE
- β Go FFI complexity: Brings Go build into SPECTRE (if using CLI)
Trade-Offs:
| Dimension | Score | Rationale |
|---|---|---|
| Performance | 10/10 | Same as extraction |
| Maintainability | 9/10 | DRY principle |
| Complexity | 7/10 | External dependency management |
| Cost | 10/10 | Minimal integration work |
| Risk | 6/10 | Coupling to external repo |
Effort: 1 day integration Ongoing Maintenance: Low (upstream does work)
Alternative 3: Secrets Service Pattern (Microservice)β
Approach: Run cognitive-vault as separate NATS service
Client β NATS secrets.retrieve.v1 β cognitive-vault service β NATS secrets.response.v1 β Client
Pros:
- β Loose coupling: cognitive-vault evolves independently
- β Centralized secrets: Single service manages all credentials
- β Polyglot access: Any language can request secrets via NATS
Cons:
- β Network latency: +5-10ms per secret retrieval
- β Single point of failure: If vault down, all services blocked
- β Operational overhead: Another service to deploy/monitor
Trade-Offs:
| Dimension | Score | Rationale |
|---|---|---|
| Performance | 6/10 | Network round-trip overhead |
| Maintainability | 8/10 | Clean separation |
| Complexity | 6/10 | Service orchestration |
| Cost | 7/10 | 4-5 days integration |
| Risk | 5/10 | Single point of failure |
Effort: 4-5 days Ongoing Maintenance: Medium (service management)
Alternative 4: Hybrid - Extract Crypto, Keep CLI Separateβ
Approach:
- Create
spectre-cryptolib crate (extracted primitives) - cognitive-vault CLI uses
spectre-crypto(shared library) - spectre-secrets uses
spectre-cryptodirectly
ββββββββββββββββββββββββββββ
β spectre-crypto (lib) β β Shared crypto primitives
βββββββββββββ¬βββββββββββββββ
β
ββββββββ΄βββββββ
β β
βΌ βΌ
βββββββββββ ββββββββββββββββββ
β spectre-β β cognitive-vaultβ
β secrets β β CLI (Go FFI) β
βββββββββββ ββββββββββββββββββ
Pros:
- β Best of both worlds: Shared crypto, independent tools
- β No duplication: Both use same library
- β Backward compat: cognitive-vault CLI unchanged
Cons:
- β Refactor cognitive-vault: Needs restructuring
- β More crates: Adds
spectre-cryptoto workspace
Trade-Offs:
| Dimension | Score | Rationale |
|---|---|---|
| Performance | 10/10 | Direct library access |
| Maintainability | 9/10 | Shared code, clear boundaries |
| Complexity | 6/10 | Requires refactoring |
| Cost | 6/10 | 5-6 days refactor + integration |
| Risk | 8/10 | Clean architecture |
Effort: 5-6 days Ongoing Maintenance: Low (shared crate)
Recommendation Matrixβ
| Alternative | Performance | Maintainability | Complexity | Cost | Risk | Overall |
|---|---|---|---|---|---|---|
| Extract Crypto | 10/10 | 6/10 | 8/10 | 8/10 | 7/10 | 7.8/10 |
| Depend on Vault | 10/10 | 9/10 | 7/10 | 10/10 | 6/10 | 8.4/10 |
| Secrets Service | 6/10 | 8/10 | 6/10 | 7/10 | 5/10 | 6.4/10 |
| Hybrid (Shared Lib) | 10/10 | 9/10 | 6/10 | 6/10 | 8/10 | 7.8/10 β |
Recommended Decision: Hybrid Approach (Alternative 4)β
Phase 1 (Quick Start): Use Alternative 2 (Depend on vault_core)
- Fast integration (1 day)
- Validate secrets management patterns
Phase 2 (Long-term): Refactor to Alternative 4 (Shared Lib)
- Extract
spectre-cryptocrate - Both SPECTRE and cognitive-vault use it
Rationale:
- De-risks Phase 1: Get secrets working quickly
- Future-proof: Shared library is clean long-term architecture
- Incremental: Can refactor after validating patterns
Decision Criteria:
- If time critical β Use Alt 2 (depend on vault)
- If long-term architecture β Use Alt 4 (shared lib)
- If polyglot important β Consider Alt 3 (service)
ADR-005: LLM Request Routing Strategyβ
Contextβ
Multiple LLM providers available:
- Vertex AI (Gemini) - Cloud, powerful, expensive ($0.05-0.20 per request)
- ml-offload-api (llama.cpp) - Local, fast, free, limited by VRAM
- securellm-bridge (OpenAI/DeepSeek/Anthropic) - Proxy with unified API
Goal: Minimize cost (FinOps) while maintaining quality
Problem Statementβ
How should SPECTRE route LLM requests to optimize cost vs quality?
Alternative 1: Simple Priority (Local-First)β
Approach: Try local first, failover to cloud
async fn route_llm_request(req: LlmRequest) -> Result<LlmResponse> {
// Try ml-offload-api (local, free)
match ml_offload.complete(req.clone()).await {
Ok(response) => return Ok(response),
Err(VramExhausted) => {
// Failover to Vertex AI
return vertex_ai.complete(req).await;
}
}
}
Pros:
- β Maximizes cost savings: Local inference free
- β Simple logic: Easy to understand
- β Fast for simple queries: Local has lower latency
Cons:
- β Quality issues: Local models less capable for complex tasks
- β No load balancing: All requests hit local until exhausted
- β No quality feedback: Can't learn from failures
Trade-Offs:
| Dimension | Score | Rationale |
|---|---|---|
| Performance | 8/10 | Local fast, cloud slower |
| Maintainability | 9/10 | Simple code |
| Complexity | 9/10 | Trivial logic |
| Cost | 9/10 | Maximal local usage |
| Risk | 6/10 | Quality unpredictable |
Cost Savings: ~80% (if local handles 80% of requests) Latency: Local 100-500ms, Cloud 1-3s Quality: Variable
Alternative 2: Intelligent Routing (Complexity-Based)β
Approach: Route based on request complexity
async fn route_llm_request(req: LlmRequest) -> Result<LlmResponse> {
let complexity = analyze_complexity(&req);
match complexity {
Complexity::Simple => ml_offload.complete(req).await,
Complexity::Moderate if ml_offload.vram_available() > 2GB => {
ml_offload.complete(req).await
},
Complexity::Moderate | Complexity::Complex => {
vertex_ai.complete(req).await
}
}
}
fn analyze_complexity(req: &LlmRequest) -> Complexity {
if req.prompt.len() > 2000 { return Complexity::Complex; }
if req.requires_reasoning { return Complexity::Complex; }
if req.context_length > 8000 { return Complexity::Moderate; }
Complexity::Simple
}
Pros:
- β Cost-quality balance: Routes appropriately
- β VRAM-aware: Checks availability before routing
- β Measurable: Can track routing decisions
Cons:
- β Heuristic tuning: analyze_complexity() needs calibration
- β More complex: Additional logic to maintain
- β False positives: May route simple tasks to cloud unnecessarily
Trade-Offs:
| Dimension | Score | Rationale |
|---|---|---|
| Performance | 9/10 | Optimal latency per complexity |
| Maintainability | 7/10 | Heuristics need tuning |
| Complexity | 6/10 | Non-trivial routing logic |
| Cost | 8/10 | Good balance (60-70% local) |
| Risk | 7/10 | Heuristics can be wrong |
Cost Savings: ~60-70% Latency: Optimized per complexity Quality: High (complex β cloud)
Alternative 3: ML-Based Routing (Advanced)β
Approach: Train small classifier to predict best provider
async fn route_llm_request(req: LlmRequest) -> Result<LlmResponse> {
let features = extract_features(&req); // prompt length, keywords, etc.
let prediction = routing_model.predict(features); // local vs cloud
match prediction {
Provider::Local => ml_offload.complete(req).await,
Provider::Cloud => vertex_ai.complete(req).await,
}
}
// Background task: Train model on historical requests
async fn train_routing_model() {
let history = load_historical_requests().await;
let labels = history.iter().map(|r| {
if r.local_succeeded && r.quality_score > 0.8 { Provider::Local }
else { Provider::Cloud }
});
routing_model.train(history, labels).await;
}
Pros:
- β Optimal routing: Learns from data
- β Adaptive: Improves over time
- β Cost-aware: Can optimize for cost vs quality tradeoff
Cons:
- β High complexity: Requires ML pipeline (training, inference)
- β Cold start problem: Needs historical data
- β Operational overhead: Model training, versioning, deployment
Trade-Offs:
| Dimension | Score | Rationale |
|---|---|---|
| Performance | 10/10 | Optimal learned routing |
| Maintainability | 5/10 | ML pipeline complex |
| Complexity | 3/10 | Requires ML infrastructure |
| Cost | 9/10 | Best cost optimization |
| Risk | 5/10 | Model drift, cold start issues |
Cost Savings: ~70-80% (optimal learned) Latency: Optimized Quality: High (learned from feedback)
Alternative 4: Cost-Capped Round Robinβ
Approach: Use local until daily cost budget, then round-robin cloud
async fn route_llm_request(req: LlmRequest) -> Result<LlmResponse> {
if daily_cloud_cost() < DAILY_BUDGET {
// Still in budget, prefer cloud for quality
return vertex_ai.complete(req).await;
}
// Over budget, use local only
ml_offload.complete(req).await
}
Pros:
- β Budget guarantee: Never exceed cost
- β Simple logic: Easy to implement
- β Quality-first: Uses cloud while budget allows
Cons:
- β End-of-day degradation: All local at month-end
- β No optimization: Doesn't minimize cost intelligently
- β Poor UX: Quality drops suddenly when budget hit
Trade-Offs:
| Dimension | Score | Rationale |
|---|---|---|
| Performance | 7/10 | Cloud until budget, then local |
| Maintainability | 9/10 | Simple budget check |
| Complexity | 9/10 | Trivial logic |
| Cost | 7/10 | Guarantees budget but not optimal |
| Risk | 6/10 | Poor UX at budget limit |
Cost Savings: Fixed by budget Latency: Cloud baseline until budget Quality: High until budget, then degrades
Recommendation Matrixβ
| Alternative | Performance | Maintainability | Complexity | Cost | Risk | Overall |
|---|---|---|---|---|---|---|
| Local-First | 8/10 | 9/10 | 9/10 | 9/10 | 6/10 | 8.2/10 |
| Intelligent Routing | 9/10 | 7/10 | 6/10 | 8/10 | 7/10 | 7.4/10 β |
| ML-Based | 10/10 | 5/10 | 3/10 | 9/10 | 5/10 | 6.4/10 |
| Cost-Capped | 7/10 | 9/10 | 9/10 | 7/10 | 6/10 | 7.6/10 |
Recommended Decision: Phased Approachβ
Phase 1 (Immediate): Start with Alternative 1 (Local-First)
- Simple, fast to implement
- Validate cost savings hypothesis
Phase 2 (Month 2): Upgrade to Alternative 2 (Intelligent Routing)
- Add complexity heuristics
- Measure cost vs quality tradeoff
Phase 3 (Future): Consider Alternative 3 (ML-Based) if data justifies
- Only if historical data shows clear patterns
- Requires dedicated ML engineer
Rationale:
- Incremental complexity: Start simple, add sophistication based on data
- Data-driven: Phase 1 generates data to inform Phase 2 heuristics
- Risk mitigation: Avoid premature optimization (ML-based routing)
Decision Criteria:
- If cost paramount β Use Alt 1 (local-first) aggressively
- If quality paramount β Use Alt 4 (cost-capped) with high budget
- If balanced β Use Alt 2 (intelligent routing) β
- If ML expertise available β Consider Alt 3 long-term
ADR-006: Observability Architectureβ
Contextβ
SPECTRE must provide:
- Real-time metrics (request latency, throughput, error rates)
- Cost tracking (per service, per user, per request)
- Distributed tracing (correlation IDs across services)
- Anomaly detection (ML-based alerts)
- Dependency visualization (service graph)
Infrastructure Available: TimescaleDB (time-series), Neo4j (graph), NATS (event stream)
Problem Statementβ
How should observability data flow through the system?
Alternative 1: Centralized Collector (Simple)β
Approach: Single spectre-observability service subscribes to all events
All Services β NATS (wildcard *.*.v1) β spectre-observability β TimescaleDB/Neo4j
Pros:
- β Simple architecture: One service handles all observability
- β Centralized logic: Anomaly detection in one place
- β Easy to reason about: Clear data flow
Cons:
- β Single point of failure: If observability down, lose all metrics
- β Bottleneck: All events funnel through one service
- β Scaling challenges: Hard to horizontally scale
Trade-Offs:
| Dimension | Score | Rationale |
|---|---|---|
| Performance | 7/10 | Potential bottleneck at scale |
| Maintainability | 9/10 | Simple, single service |
| Complexity | 9/10 | Straightforward |
| Cost | 9/10 | Low operational overhead |
| Risk | 6/10 | Single point of failure |
Throughput: ~10K events/sec (single instance) Latency: 5-10ms (async processing) Operational Complexity: Low
Alternative 2: Distributed Collectors (Scalable)β
Approach: Multiple observability workers with queue groups
All Services β NATS (wildcard) β [Collector 1, Collector 2, Collector 3] β Databases
(Queue Group: load balanced)
Pros:
- β Horizontal scaling: Add workers as load increases
- β High availability: Workers can fail without data loss
- β Load balanced: NATS queue groups distribute work
Cons:
- β Consistency challenges: Multiple writers to databases
- β More complex: Worker coordination needed
- β Anomaly detection harder: State distributed across workers
Trade-Offs:
| Dimension | Score | Rationale |
|---|---|---|
| Performance | 10/10 | Linear scaling with workers |
| Maintainability | 7/10 | Worker coordination complexity |
| Complexity | 6/10 | Distributed system challenges |
| Cost | 7/10 | More infrastructure |
| Risk | 8/10 | Better fault tolerance |
Throughput: ~100K events/sec (10 workers) Latency: 5-10ms (unchanged) Operational Complexity: Medium
Alternative 3: Embedded Observability (Performance)β
Approach: Each service writes directly to TimescaleDB/Neo4j
Each Service β TimescaleDB (metrics) + Neo4j (dependencies) directly
β NATS (for real-time dashboard only)
Pros:
- β Lowest latency: No intermediary
- β No SPOF: Observability failures don't cascade
- β Simple per service: Each service manages own metrics
Cons:
- β Tight coupling: Services depend on observability DBs
- β Credentials everywhere: Every service needs DB passwords
- β Anomaly detection fragmented: No central view
Trade-Offs:
| Dimension | Score | Rationale |
|---|---|---|
| Performance | 10/10 | Direct writes, no middleware |
| Maintainability | 5/10 | Fragmented, tight coupling |
| Complexity | 6/10 | Every service has DB logic |
| Cost | 8/10 | No observability service |
| Risk | 5/10 | Coupling risk high |
Throughput: Limited by DB capacity Latency: 1-2ms (direct write) Operational Complexity: High (credentials mgmt)
Alternative 4: Hybrid - Central Collector + Service-Level Cachingβ
Approach: Services emit events, observability caches hot metrics
Services β NATS β spectre-observability (caches hot metrics in Redis) β TimescaleDB
β
Real-time Dashboard (reads from Redis)
Pros:
- β Real-time dashboard: Hot metrics in Redis (ms latency)
- β Historical analysis: Full data in TimescaleDB
- β Decoupled: Services don't know about observability storage
Cons:
- β Redis dependency: Another component to manage
- β Cache consistency: Hot vs cold data can diverge
- β More complex: Two storage tiers
Trade-Offs:
| Dimension | Score | Rationale |
|---|---|---|
| Performance | 9/10 | Redis for hot data, DB for cold |
| Maintainability | 7/10 | Two-tier storage |
| Complexity | 6/10 | Cache invalidation complexity |
| Cost | 7/10 | Redis + TimescaleDB |
| Risk | 7/10 | Cache consistency risks |
Throughput: ~50K events/sec Latency: <1ms (Redis reads), 10ms (DB writes) Operational Complexity: Medium
Recommendation Matrixβ
| Alternative | Performance | Maintainability | Complexity | Cost | Risk | Overall |
|---|---|---|---|---|---|---|
| Centralized | 7/10 | 9/10 | 9/10 | 9/10 | 6/10 | 8/10 β |
| Distributed | 10/10 | 7/10 | 6/10 | 7/10 | 8/10 | 7.6/10 |
| Embedded | 10/10 | 5/10 | 6/10 | 8/10 | 5/10 | 6.8/10 |
| Hybrid Cache | 9/10 | 7/10 | 6/10 | 7/10 | 7/10 | 7.2/10 |
Recommended Decision: Start Centralized, Scale to Distributedβ
Phase 1-2: Use Alternative 1 (Centralized Collector)
- Simplest to implement and debug
- Sufficient for initial load (< 10K events/sec)
- Single service to monitor
Phase 3+: Migrate to Alternative 2 (Distributed Collectors) if needed
- Only if load exceeds 10K events/sec
- Horizontal scaling when necessary
Rationale:
- YAGNI: Don't optimize for scale you don't have yet
- Incremental: Easy migration path (just add workers)
- Data-driven: Phase 1 will reveal actual load
Decision Criteria:
- If load < 10K events/sec β Use Alt 1 (centralized) β
- If load > 10K events/sec β Use Alt 2 (distributed)
- If real-time dashboard critical β Consider Alt 4 (hybrid cache)
- If zero observability overhead β Consider Alt 3 (embedded)
ADR-007: Event Schema Versioning Strategyβ
Contextβ
Event schemas will evolve over time. Breaking changes must not disrupt services.
Problem Statementβ
How to handle event schema evolution?
Alternative 1: Semantic Versioning in Subjectβ
Approach: Include version in NATS subject
llm.request.v1 β Current production
llm.request.v2 β New version with breaking changes
Migration: Services subscribe to both during transition
// Publisher (new)
bus.publish("llm.request.v2", new_schema).await;
// Subscriber (during migration)
let mut sub_v1 = bus.subscribe("llm.request.v1").await?;
let mut sub_v2 = bus.subscribe("llm.request.v2").await?;
tokio::select! {
msg = sub_v1.next() => handle_v1(msg),
msg = sub_v2.next() => handle_v2(msg),
}
Pros:
- β Explicit versioning: Clear which version used
- β Backward compatible: Old services keep working
- β Gradual migration: Can dual-subscribe during transition
Cons:
- β Subject proliferation: Many v1/v2/v3 subjects
- β Cleanup burden: Must deprecate old versions
Recommended: β This is the standard approach
Project Prioritization Frameworkβ
Evaluation Matrixβ
| Project | Maturity | Complexity | Value | Risk | Dependencies | Priority Score |
|---|---|---|---|---|---|---|
| cognitive-vault | 10/10 | 2/10 | 6/10 | 2/10 | None | 7.5/10 π₯ |
| securellm-bridge | 10/10 | 6/10 | 10/10 | 4/10 | vault (auth) | 7.2/10 π₯ |
| ml-offload-api | 6/10 | 6/10 | 9/10 | 6/10 | GPU hardware | 6.0/10 π₯ |
| ragtex | 7/10 | 5/10 | 8/10 | 5/10 | Vertex AI | 5.8/10 |
| arch-analyzer | 7/10 | 3/10 | 4/10 | 3/10 | None | 5.0/10 |
| ai-agent-os | 6/10 | 7/10 | 5/10 | 5/10 | Hyprland (opt) | 4.8/10 |
| intelagent | 6/10 | 10/10 | 10/10 | 8/10 | DAO, ZK circuits | 4.4/10 |
Scoring Methodologyβ
Maturity (10 = production, 0 = prototype):
- Code quality, test coverage, documentation
- Production usage history
Complexity (10 = trivial, 0 = very complex):
- Integration effort (person-days)
- Dependencies and prerequisites
Value (10 = critical, 0 = nice-to-have):
- Business impact
- Architectural demonstration value
Risk (10 = no risk, 0 = high risk):
- Technical debt
- Failure blast radius
- Rollback difficulty
Priority Score: Weighted average (Maturity: 30%, Complexity: 20%, Value: 30%, Risk: 20%)
Integration Scenariosβ
Scenario A: Conservative (Low Risk)β
Timeline: 4 weeks Projects: cognitive-vault β arch-analyzer β ai-agent-os
Rationale:
- Start with simplest integrations
- Build confidence before complex projects
- Minimal external dependencies
Pros: Low risk, steady progress Cons: Low demo value early
Scenario B: Balanced (Recommended) β β
Timeline: 6 weeks Projects: cognitive-vault + securellm-bridge β ml-offload-api β ragtex
Rationale:
- Phase 1: Secrets + LLM gateway (high value)
- Phase 2: Local inference (cost savings demo)
- Phase 3: RAG system (AI capabilities)
Pros: Good balance of risk and value Cons: Moderate complexity
Scenario C: Aggressive (High Value)β
Timeline: 8 weeks Projects: cognitive-vault + securellm-bridge β intelagent β ml-offload-api
Rationale:
- Tackle most valuable project (intelagent) early
- Accept higher risk for higher reward
- DAO governance showcases innovation
Pros: Maximum value demonstrated Cons: High complexity, technical risk
Risk Analysisβ
Technical Risksβ
| Risk | Probability | Impact | Mitigation |
|---|---|---|---|
| NATS performance bottleneck | Low | High | Load test Phase 0, distributed collectors |
| Event schema changes break services | Medium | Medium | Semantic versioning (ADR-007) |
| Python NATS integration issues | Medium | Low | Prototype with nats-py early |
| cognitive-vault crypto bugs | Low | Critical | Extensive security audit + testing |
| GPU availability (ml-offload) | High | Medium | Fallback to cloud (Vertex AI) |
Schedule Risksβ
| Risk | Probability | Impact | Mitigation |
|---|---|---|---|
| Phase 1 overruns 2 weeks | Medium | Medium | Reduce scope to vault-only if needed |
| intelagent complexity underestimated | High | High | Defer to Phase 4 (Scenario B) |
| Integration testing takes longer | Medium | Low | Automated test harness from Phase 0 |
Referencesβ
Related Documentsβ
STATUS.md- Current project statusNEXT_STEPS.md- Phase 1 detailed roadmapINTEGRATION.md- Integration guide for servicesREADME.md- Architecture overview
External Resourcesβ
Document Status: Draft for Architect Review Next Action: Review ADRs, select Phase 1 integration strategy Approval Required: Phase 1 project selection (ADR-001), Integration pattern (ADR-002)
ADR-0040: Service Mesh Adoption β Linkerd over Istio/Ciliumβ
Formal ADR: ADR-0040 in adr-ledger
Status: Accepted Date: 2026-02-17 Classification: Major Project: SPECTRE (spectre-proxy) Issue: #45 Service Mesh Evaluation
Contextβ
SPECTRE operates under a zero-trust network model where east-west traffic between services (spectre-proxy β neutron, spectre-proxy β NATS) must be encrypted and mutually authenticated. Phase 3 validation exposed three concrete requirements:
- mTLS between all service-to-service calls β prevent MITM on cluster networks
- L7 observability β per-route latency/error-rate without code changes to spectre-proxy
- Traffic policies β timeouts and retry budgets per endpoint (e.g.
/ingestvs/health)
A service mesh was chosen over application-level TLS because:
- Application TLS requires managing certificates in every service (cognitive-vault, neutron, proxy)
- L7 metrics would need per-service Prometheus instrumentation
- Retry/timeout logic is already implemented in spectre-proxy but a mesh makes it auditable externally
Decisionβ
Linkerd stable-2.14.x was selected as the service mesh for SPECTRE.
Linkerd is deployed on the kind cluster (spectre-dev) with:
- Automatic sidecar injection for
spectre-proxy(2/2 containers confirmed) - Stub neutron (
ghcr.io/mccutchen/go-httpbin:v2.14.0) injected for mTLS validation - ServiceProfile CRD defining per-route policies for spectre-proxy
- NATS outbound ports (4222) excluded from proxy interception to avoid protocol misdetection
Alternatives Consideredβ
Alternative 1: Istio (Rejected)β
Pros:
- β Feature-rich: traffic splitting, fault injection, Wasm filters
- β Large community, extensive documentation
- β Native Kubernetes Gateway API support
Cons:
- β Heavy control plane: ~300MB memory (istiod + proxies) vs ~15MB for Linkerd
- β Complex CRD surface: VirtualService, DestinationRule, Gateway, PeerAuthentication (25+ CRDs)
- β Envoy proxy per sidecar: larger attack surface, harder to audit
- β SPIFFE/SPIRE cert rotation has had CVEs (CVE-2022-24752)
Trade-Offs:
| Dimension | Score | Rationale |
|---|---|---|
| Performance | 6/10 | Envoy overhead ~2-5ms p99 |
| Maintainability | 5/10 | CRD sprawl, complex upgrades |
| Complexity | 3/10 | 25+ CRDs to understand |
| Resource Usage | 4/10 | ~300MB control plane |
| Security | 7/10 | Mature but large attack surface |
Overall: 5/10 β Eliminated; SPECTRE does not need traffic splitting or Wasm.
Alternative 2: Cilium Service Mesh (Rejected)β
Pros:
- β eBPF-based: kernel-level enforcement, no sidecar overhead
- β Network policy + mesh in one agent
- β Excellent performance: near-zero latency overhead
Cons:
- β Requires Linux kernel β₯ 5.10 (kind nodes run 5.15+ but production constraint)
- β eBPF maps need
CAP_BPF/CAP_NET_ADMINβ restricted in hardened clusters - β Mutual TLS via WireGuard (node-level, not pod-level) β cannot enforce per-pod identity
- β L7 policies require Hubble which adds ~100MB overhead
- β kind cluster eBPF support requires privileged containers (security regression)
Trade-Offs:
| Dimension | Score | Rationale |
|---|---|---|
| Performance | 10/10 | eBPF near-zero overhead |
| Maintainability | 7/10 | Single agent, unified network+mesh |
| Complexity | 5/10 | eBPF debugging requires kernel expertise |
| Resource Usage | 8/10 | No sidecars, one DaemonSet |
| Security | 6/10 | Node-level mTLS, not pod-level identity |
Overall: 7/10 β Strong candidate for Phase 5 when running on bare-metal NixOS nodes. Deferred: kind environment and current kernel constraints make it premature.
Alternative 3: Linkerd (Selected) β β
Pros:
- β Lightweight: ~15MB Rust proxy (linkerd2-proxy) per sidecar
- β Zero-config mTLS: automatic SPIFFE certificate rotation via linkerd-identity
- β Simple mental model: ~5 CRDs total (ServiceProfile, Server, HTTPRoute, etc.)
- β Rust-based data plane β shares safety properties with spectre-proxy's Rust codebase
- β ServiceProfile CRD: per-route timeout + retry budget without application changes
- β
linkerd vizgolden metrics (success rate, RPS, p50/p95/p99) per deployment
Cons:
- β No traffic splitting without SMI adaptor (not needed for SPECTRE currently)
- β UDP not proxied (NATS uses TCP so this is irrelevant)
- β Smaller community than Istio
Trade-Offs:
| Dimension | Score | Rationale |
|---|---|---|
| Performance | 9/10 | Rust proxy +~0.5ms p50, <2ms p99 |
| Maintainability | 9/10 | Minimal CRDs, clean upgrade path |
| Complexity | 9/10 | linkerd install + inject annotation |
| Resource Usage | 9/10 | ~15MB sidecar, ~50MB control plane |
| Security | 9/10 | SPIFFE identity, automatic mTLS, RBAC |
Overall: 9/10 β Best fit for SPECTRE's current requirements.
Recommendation Matrixβ
| Mesh | Performance | Maintainability | Complexity | Resources | Security | Overall |
|---|---|---|---|---|---|---|
| Istio | 6/10 | 5/10 | 3/10 | 4/10 | 7/10 | 5.0/10 |
| Cilium | 10/10 | 7/10 | 5/10 | 8/10 | 6/10 | 7.2/10 |
| Linkerd | 9/10 | 9/10 | 9/10 | 9/10 | 9/10 | 9.0/10 β |
Consequencesβ
Positiveβ
- mTLS is automatic for all meshed pods β no application code changes required
linkerd viz tapprovides real-time L7 request inspection for debugging- ServiceProfile enables per-route SLO enforcement (timeouts, retries) externally from app code
- Benchmark shows +~0.5ms p50 / +~1.5ms p99 overhead (acceptable for async workloads)
- Zero-trust posture achieved: all spectre-proxy β neutron traffic encrypted + authenticated
Negative / Trade-offsβ
- Each meshed pod uses ~15MB additional RAM (linkerd2-proxy sidecar)
- NATS port 4222 must be excluded from interception (
skip-outbound-ports: 4222) because Linkerd cannot proxy the NATS binary protocol - Linkerd does not proxy UDP β irrelevant now but limits future UDP-based protocols
Migration Pathβ
- Phase 4: Deploy production neutron behind Linkerd mesh for real mTLS validation
- Phase 5: Evaluate Cilium as replacement if eBPF kernel constraints are met on bare metal
Validationβ
| Check | Command | Expected Result |
|---|---|---|
| mTLS active | linkerd viz edges deployment | TLS column = true for spectre-proxy β neutron |
| Traffic visible | linkerd viz tap deployment/spectre-proxy --to deployment/neutron | Requests visible with mTLS |
| Routes active | linkerd viz routes deployment/spectre-proxy | POST /ingest and GET /health listed |
| Overhead | wrk2 with vs without sidecar | p50 delta β€ 1ms, p99 delta β€ 2ms |
Referencesβ
- Linkerd stable-2.14 docs
- Linkerd ServiceProfile spec
- Linkerd vs Istio benchmark (2024)
- SPECTRE #45: Service Mesh Evaluation
nix/kubernetes/neutron-stub.nixβ stub neutron Deployment + Servicenix/kubernetes/service-profile.nixβ ServiceProfile CRD for spectre-proxy
ADR-0043: Phase 3β4 Transition β Stub Neutron to Production Backendβ
Formal ADR: ADR-0043 in adr-ledger
Status: Proposed Date: 2026-02-17 Classification: Major Project: SPECTRE (spectre-proxy, neutron/NEXUS) Supersedes: β Related: ADR-0040 (Service Mesh Adoption)
Contextβ
Phase 3 validated the service mesh infrastructure using a stub neutron β a stateless HTTP
echo server (ghcr.io/mccutchen/go-httpbin:v2.14.0) deployed under the same Kubernetes service
name (neutron.default.svc.cluster.local:8000) that the real backend will use.
This stub was sufficient to prove:
- mTLS between spectre-proxy β neutron (SECURED = β via
linkerd viz edges) - Linkerd sidecar injection on both ends (2/2 containers)
- ServiceProfile route classification (POST /ingest, GET /health)
- Golden metrics collection (success rate, RPS, p50/p95/p99)
However, the stub does not exercise:
- Actual inference latency (CUDA, model loading, GPU scheduling)
- NATS event flow end-to-end (proxy β NATS β neutron β response)
- Circuit breaker under real failure modes (OOM, GPU exhaustion, model timeout)
- Trace propagation across the full proxy β neutron path (W3C traceparent)
- Realistic payload sizes (LLM request/response bodies: 1KBβ100KB)
Phase 4 requires replacing the stub with a production-capable neutron backend.
Decisionβ
Adopt a graduated replacement strategy with three stages, each independently deployable and validated before proceeding to the next.
Stage 1: Lightweight Neutron Shim (Phase 4 entry)β
Image: Custom minimal container (Python/FastAPI or Rust/Axum) Scope: HTTP API contract + NATS consumer β no CUDA, no real inference
spectre-proxy β neutron-shim (HTTP :8000)
βββ POST /ingest β fake inference (random latency 50-500ms)
βββ GET /health β 200
βββ NATS subscriber: spectre.ingest.v1 β ack
What it validates:
- Full NATS round-trip (proxy publishes β neutron consumes β response)
- Realistic HTTP response structure (JSON with model output fields)
- Circuit breaker under simulated failures (shim returns 500 at configurable rate)
- Trace propagation: spectre-proxy span β neutron-shim span in Jaeger
- Mesh overhead under sustained load (wrk2 benchmark with real payload)
Nix integration:
nix/kubernetes/neutron-shim.nixreplacesneutron-stub.nix- Same Service name, same port β zero changes to spectre-proxy config
- Linkerd injection annotation preserved
Exit criteria:
- NATS publish β consume β response latency < 50ms p99
- Circuit breaker trips at 50% error rate, recovers after 30s
- Trace spans visible in Jaeger:
spectre-proxyβneutron-shim - wrk2 benchmark: β₯ 10K RPS on /ingest with mesh (p99 < 20ms)
Stage 2: NEXUS Lite (Phase 4 mid)β
Image: Stripped NEXUS build β Python + FastAPI + Ray Serve, no CUDA Scope: Real inference API with CPU-only models (e.g., distilbert, small LLMs via llama.cpp CPU)
spectre-proxy β neutron-lite (HTTP :8000)
βββ POST /ingest β real inference (CPU model, 200ms-2s)
βββ GET /health β 200 + model status
βββ GET /metrics β Prometheus (inference_latency, queue_depth)
βββ NATS: spectre.ingest.v1 β inference β spectre.result.v1
What it validates:
- Real inference latency profiles under mesh (not fake random)
- Memory pressure from model loading (1-4GB for CPU models)
- Ray Serve autoscaling interaction with Linkerd load balancing
- Request queuing behavior (burst β queue β timeout β circuit break)
- pgvector integration for RAG context retrieval (if applicable)
Image build strategy:
- Multi-stage Dockerfile: Python deps in layer 1, model weights volume-mounted
- No CUDA toolkit β image < 2GB (vs 8-12GB for full NEXUS)
- Nix:
nix build .#neutron-lite-imageusingpkgs.dockerTools.buildLayeredImage
Exit criteria:
- Inference round-trip (proxy β NATS β neutron β response) < 3s p99
- Model hot-reload without pod restart
- Memory stays < 4GB under sustained load
- Mesh overhead negligible vs inference latency (< 5% of total p99)
Stage 3: Full NEXUS (Phase 4 exit / Phase 5 entry)β
Image: Full NEXUS with CUDA, Ray, pgvector Scope: Production deployment β GPU inference, vector search, multi-model routing
spectre-proxy β neutron (HTTP :8000)
βββ POST /ingest β GPU inference (50ms-5s depending on model)
βββ POST /embed β vector embedding
βββ GET /health β 200 + GPU/VRAM status
βββ GET /metrics β inference_latency, vram_usage, queue_depth
βββ NATS: full event schema (ingest, result, error, cost)
Prerequisites:
- Bare-metal or GPU-enabled nodes (NVIDIA runtime, device plugin)
- Persistent volume for model weights (50-200GB)
- NATS JetStream for durable delivery (exactly-once on inference requests)
What changes from mesh perspective:
- ServiceProfile updated: POST /ingest timeout 10s β 30s (GPU inference slower)
- RetryBudget reduced: 20% β 5% (inference is expensive, don't retry aggressively)
- Linkerd load balancing: EWMA for latency-aware routing across GPU replicas
Exit criteria:
- GPU inference < 5s p99 through mesh
- VRAM-aware routing via custom metrics (HPA + Prometheus adapter)
- mTLS maintained with zero application changes from Stage 1
- ServiceProfile routes updated for new endpoints (/embed, /models)
Alternatives Consideredβ
Alt 1: Skip Directly to Full NEXUS (Rejected)β
Build and deploy the complete NEXUS image immediately.
Pros: No intermediate steps, production-ready sooner Cons:
- β CUDA image is 8-12GB β slow build, slow kind load
- β Requires GPU node (not available in kind cluster)
- β Debugging mesh issues mixed with GPU/CUDA issues
- β Blocks Phase 4 progress until GPU infrastructure ready
Verdict: Rejected. Coupling infrastructure validation with GPU complexity adds risk.
Alt 2: Keep Stub Forever, Test Real Backend Outside Mesh (Rejected)β
Keep go-httpbin as stub, validate NEXUS separately without mesh.
Pros: Simple, no new containers to build Cons:
- β Never validates NATS end-to-end through mesh
- β Mesh overhead under real payloads unknown until production
- β Circuit breaker behavior with real failure modes untested
Verdict: Rejected. Defeats the purpose of Phase 3β4 graduation.
Alt 3: Graduated Replacement (Selected) β β
Three stages: shim β lite β full (this ADR).
Pros:
- β Each stage independently deployable and testable
- β Zero changes to spectre-proxy between stages (same Service name/port)
- β Mesh infrastructure validated incrementally under increasing realism
- β Can run in kind (stages 1-2) or bare metal (stage 3)
Verdict: Selected. De-risks transition, validates mesh under progressively realistic conditions.
Consequencesβ
Positiveβ
- spectre-proxy is completely decoupled from the neutron backend implementation β
any container exposing HTTP :8000 with the
neutronService name works - Each stage produces measurable validation data (latency, throughput, failure modes) before committing to the next
- Stub β shim β lite β full path means Phase 4 can start immediately without GPU hardware
- Mesh configuration (ServiceProfile, mTLS, viz) is validated once and carried forward
Negative / Trade-offsβ
- Three container images to maintain during transition (stub, shim, lite)
- ServiceProfile timeouts must be updated per stage (10s β 30s for GPU)
- Stage 2 CPU inference is not representative of GPU latency β benchmarks are directional, not production baselines
What Stays the Same Across All Stagesβ
| Component | Value |
|---|---|
| Service name | neutron.default.svc.cluster.local |
| Service port | 8000 |
| Linkerd injection | linkerd.io/inject: enabled |
| mTLS | Automatic (SPIFFE identity) |
| spectre-proxy config | NEUTRON_URL=http://neutron.default.svc.cluster.local:8000 |
| Nix package | nix build .#neutron-{stub,shim,lite}-manifests |
Timelineβ
| Stage | Target | Blockers |
|---|---|---|
| Stage 1 (Shim) | Phase 4 start (Mar 2026) | None β can start now |
| Stage 2 (Lite) | Phase 4 mid (Apr 2026) | NEXUS Python deps in Nix |
| Stage 3 (Full) | Phase 4 exit (Jun 2026) | GPU nodes, NVIDIA runtime |
Referencesβ
- ADR-0040: Service Mesh Adoption (Linkerd)
- SPECTRE ROADMAP.md: Phase 4 Enterprise Features
nix/kubernetes/neutron-stub.nixβ current Stage 0 (go-httpbin)nix/kubernetes/default.nixβ Kubernetes manifest composition- NEXUS/neutron:
~/dev/low-level/neutron/(external repo)