
SPECTRE Framework - Architecture Decision Records

Document Type: Technical ADRs with Trade-Off Analysis
Purpose: Support informed decision-making for SPECTRE integration strategy
Audience: System Architect (kernelcore)
Status: Draft for Review
Last Updated: 2026-01-09


Table of Contents​

  1. Executive Summary
  2. ADR-001: Phase 1 Project Integration Priority
  3. ADR-002: Service Integration Pattern
  4. ADR-003: Python Service Deployment Strategy
  5. ADR-004: Secrets Management Architecture
  6. ADR-005: LLM Request Routing Strategy
  7. ADR-006: Observability Architecture
  8. ADR-007: Event Schema Versioning Strategy
  9. Project Prioritization Framework
  10. Integration Scenarios
  11. Risk Analysis
  12. References

Executive Summary​

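This document provides rigorous trade-off analysis for critical architectural decisions in the SPECTRE framework integration. Each ADR presents 3-4 viable alternatives with quantitative and qualitative evaluation across 5 dimensions. Scores are out of 10 and higher is always better, so a high Complexity, Cost, or Risk score means low complexity, cost, or risk: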

  1. Performance (latency, throughput, resource usage)
  2. Maintainability (code complexity, debugging, evolution)
  3. Complexity (implementation effort, learning curve)
  4. Cost (development time, operational overhead, cloud spend)
  5. Risk (technical debt, failure modes, rollback difficulty)

Key Principle: This document presents options, not prescriptions. Final decisions rest with the architect based on business priorities and constraints.


ADR-001: Phase 1 Project Integration Priority​

Context​

Phase 1 (Security Infrastructure) requires choosing 1-2 projects to integrate first. This decision establishes patterns for subsequent integrations and validates the event-driven architecture.

Constraints:

  • Must complete within 2 weeks (Phase 1 timeline)
  • Should demonstrate value quickly (proof of concept)
  • Should validate core SPECTRE capabilities (event bus, proxy, secrets)

Available Projects (Ranked by Maturity)

| Project | Maturity | Tech Stack | Integration Complexity | Value |
|---|---|---|---|---|
| securellm-bridge | Production ✅ | Rust (Axum, TLS, audit) | Medium | High |
| cognitive-vault | Production ✅ | Rust+Go (FFI, crypto) | Low | Medium |
| ml-offload-api | Alpha 🚧 | Rust (Axum, NVIDIA) | Medium | High |
| ai-agent-os | Alpha 🚧 | Rust (6 crates, Hyprland) | High | Medium |
| intelagent | Alpha 🚧 | Rust (complex orchestration) | Very High | Very High |
| ragtex | Beta 🚧 | Python (LangChain, Vertex) | Medium | High |
| arch-analyzer | Beta 🚧 | Python (async, caching) | Low | Low |

Alternative 1: Start with cognitive-vault (Conservative)​

Approach: Extract crypto primitives from cognitive-vault into spectre-secrets

Pros:

  • ✅ Lowest complexity: Crypto library extraction is straightforward
  • ✅ No external dependencies: Self-contained Rust crate
  • ✅ Foundation for Phase 1: Secrets management is prerequisite for proxy auth
  • ✅ Low risk: Well-tested crypto stack (AES-256-GCM, Argon2id)

Cons:

  • ❌ Low immediate value: Secrets alone don't demo end-to-end flows
  • ❌ No event bus validation: Doesn't stress NATS architecture
  • ❌ Limited observability: Secrets operations are infrequent

Trade-Offs:

| Dimension | Score | Rationale |
|---|---|---|
| Performance | 9/10 | In-memory crypto, no network I/O |
| Maintainability | 9/10 | Small, focused crate |
| Complexity | 10/10 | Direct code extraction |
| Cost | 10/10 | 2-3 days development |
| Risk | 10/10 | Proven crypto, no new patterns |

  • Estimated Effort: 2-3 days
  • Risk Level: Very Low
  • Demo Value: Low (internal component)


Alternative 2: Start with securellm-bridge (Balanced)​

Approach: Wrap securellm-bridge HTTP endpoints to publish NATS events, keep existing API

Pros:

  • ✅ High demo value: End-to-end LLM request → response flow
  • ✅ Validates event bus: Real-world request/reply pattern
  • ✅ Production-ready code: Mature codebase with TLS, rate limiting, audit
  • ✅ FinOps integration: Cost tracking already implemented
  • ✅ Clear success criteria: Measure latency overhead (target: <50ms)

Cons:

  • ❌ Dual-write complexity: Must maintain HTTP API + NATS simultaneously
  • ❌ Requires secrets management: Needs API key rotation (dependency on cognitive-vault)
  • ❌ Performance overhead: Additional hop through NATS (5-10ms latency)

Trade-Offs:

| Dimension | Score | Rationale |
|---|---|---|
| Performance | 7/10 | +5-10ms latency from NATS hop |
| Maintainability | 7/10 | Dual-write increases complexity |
| Complexity | 6/10 | HTTP→NATS bridge pattern is new |
| Cost | 6/10 | 5-7 days development |
| Risk | 7/10 | Backward compatibility concerns |

  • Estimated Effort: 5-7 days
  • Risk Level: Medium
  • Demo Value: Very High (LLM requests demoed)


Alternative 3: Start with ml-offload-api (Aggressive)​

Approach: Full NATS integration with VRAM monitoring events

Pros:

  • ✅ High technical value: Validates hardware monitoring → event stream
  • ✅ FinOps critical: Local inference cost = $0, major selling point
  • ✅ Unique capabilities: VRAM-aware routing demonstrates intelligence
  • ✅ Event-rich: Publishes many event types (VRAM, inference, cost)

Cons:

  • ❌ Hardware dependency: Requires NVIDIA GPU (not in CI/CD)
  • ❌ Alpha maturity: Less stable than securellm-bridge
  • ❌ Complex integration: Model registry + backend abstraction
  • ❌ Testing challenges: GPU mocking for unit tests

Trade-Offs:

| Dimension | Score | Rationale |
|---|---|---|
| Performance | 8/10 | Local inference, no API costs |
| Maintainability | 6/10 | Hardware-specific code is complex |
| Complexity | 5/10 | GPU monitoring + NATS integration |
| Cost | 5/10 | 7-10 days development |
| Risk | 5/10 | Alpha code, hardware dependency |

  • Estimated Effort: 7-10 days
  • Risk Level: Medium-High
  • Demo Value: High (local AI demoed)


Alternative 4: cognitive-vault + securellm-bridge (Sequential)

Approach:

  1. Days 1-3: Extract crypto from cognitive-vault → spectre-secrets
  2. Days 4-10: Integrate securellm-bridge with NATS events + spectre-secrets auth

Pros:

  • ✅ Complete Phase 1: Both spectre-secrets and a real integration
  • ✅ Sequential dependency: Secrets ready before securellm needs it
  • ✅ High demo value: End-to-end secure LLM gateway
  • ✅ Validates full stack: Crypto, events, proxy, observability
  • ✅ Clear milestone: Each sub-project has testable output

Cons:

  • ❌ Tight timeline: 10 days for both (risk of scope creep)
  • ❌ Sequential risk: Delay in secrets blocks securellm integration
  • ❌ Higher complexity: Two integrations within a single phase

Trade-Offs:

| Dimension | Score | Rationale |
|---|---|---|
| Performance | 8/10 | Efficient crypto + acceptable NATS overhead |
| Maintainability | 8/10 | Two clean integrations |
| Complexity | 6/10 | Sequential execution reduces risk |
| Cost | 5/10 | 10 days total |
| Risk | 6/10 | Tight timeline but sequential dependencies |

  • Estimated Effort: 10 days (2 weeks)
  • Risk Level: Medium
  • Demo Value: Very High (secure LLM gateway with cost tracking)


Recommendation Matrix

| Alternative | Effort | Risk | Demo Value | Strategic Fit | Overall Score |
|---|---|---|---|---|---|
| Alt 1: vault only | 2-3d | Very Low | Low | 6/10 | 6.5/10 |
| Alt 2: securellm | 5-7d | Medium | Very High | 9/10 | 8/10 |
| Alt 3: ml-offload | 7-10d | Medium-High | High | 8/10 | 7/10 |
| Alt 4: vault + securellm | 10d | Medium | Very High | 10/10 | 8.5/10 ✅ |

Rationale:

  1. Completes entire Phase 1 scope (secrets + proxy validation)
  2. Demonstrates complete value chain: secure request β†’ LLM β†’ cost tracking
  3. Sequential execution reduces risk (secrets first, then securellm)
  4. Fits the 2-week Phase 1 timeline (10 working days), though with little buffer

Decision Criteria for Architect:

  • If timeline is critical → Choose Alt 1 (cognitive-vault only)
  • If demo value is paramount → Choose Alt 2 (securellm-bridge)
  • If FinOps is the priority → Choose Alt 3 (ml-offload-api)
  • If completing all of Phase 1 is the goal → Choose Alt 4 (recommended)

Success Metrics:

  • spectre-secrets: Encrypt/decrypt with <1ms latency, rotation in <5s
  • securellm-bridge: NATS overhead <50ms p99, zero data loss
  • Integration tests: 95% coverage, CI/CD green
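
The `<50ms p99` target in the metrics above can be checked mechanically once latency samples have been collected. The sketch below uses the nearest-rank percentile definition. The function names and the choice of microseconds as the unit are assumptions made for illustration and are not part of SPECTRE.

```rust
/// Nearest-rank percentile over latency samples in microseconds.
/// Returns None for an empty sample set or a percentile outside 1..=100.
fn percentile_us(samples: &[u64], pct: u32) -> Option<u64> {
    if samples.is_empty() || pct == 0 || pct > 100 {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    // Nearest-rank: rank = ceil(pct / 100 * n), then convert to a 0-based index.
    let n = sorted.len();
    let rank = (pct as usize * n + 99) / 100;
    Some(sorted[rank - 1])
}

/// True when the p99 of the samples is strictly below the budget.
/// An empty sample set is treated as a failure (hypothetical policy).
fn meets_p99_budget(samples: &[u64], budget_us: u64) -> bool {
    match percentile_us(samples, 99) {
        Some(p99) => p99 < budget_us,
        None => false,
    }
}
```

A CI job could feed this the measured NATS overhead per request and fail the build when `meets_p99_budget(&samples, 50_000)` returns false.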

ADR-002: Service Integration Pattern​

Context​

Domain services (Rust/Python) must integrate with SPECTRE event bus. The integration pattern affects performance, maintainability, and evolution of the architecture.

Problem Statement​

How should external services communicate with SPECTRE?

Requirements:

  • Support both Rust and Python services
  • Minimal latency overhead (<50ms p99)
  • Backward compatibility with existing HTTP APIs
  • Observable (request tracing, metrics)
  • Testable (unit + integration tests)

Alternative 1: Direct NATS Integration (Native Pattern)​

Approach: Services directly use NATS client libraries

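    // Rust service
    use futures::StreamExt; // for `.next()`, assuming the subscription implements Stream
    use spectre_events::EventBus;

    #[tokio::main]
    async fn main() -> Result<(), Box<dyn std::error::Error>> {
        let bus = EventBus::connect("nats://localhost:4222").await?;
        let mut sub = bus.subscribe("llm.request.v1").await?;

        // Handle each incoming request and publish the response as an event
        while let Some(msg) = sub.next().await {
            let response = handle_request(msg).await?;
            bus.publish(&response).await?;
        }
        Ok(())
    }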

Pros:

  • ✅ Lowest latency: Direct connection, no middleware
  • ✅ Simplest architecture: No additional components
  • ✅ Best observability: Native NATS tracing
  • ✅ Scales horizontally: Queue groups for load balancing

Cons:

  • ❌ Requires code changes: Must refactor existing services
  • ❌ No HTTP backward compatibility: Breaking change for HTTP clients
  • ❌ Language-specific clients: Different APIs for Rust vs Python

Trade-Offs:

| Dimension | Score | Rationale |
|---|---|---|
| Performance | 10/10 | No middleware, native NATS |
| Maintainability | 8/10 | Simple but requires refactor |
| Complexity | 7/10 | NATS learning curve |
| Cost | 7/10 | Refactor effort per service |
| Risk | 8/10 | Breaking changes for HTTP clients |

  • Latency: 5-10ms (NATS pub/sub)
  • Effort: Medium (per service refactor)
  • Backward Compatibility: ❌ None


Alternative 2: HTTP Bridge Pattern (Hybrid)​

Approach: Keep existing HTTP APIs, add NATS event publishing

    // Existing HTTP handler
    async fn handle_completion(req: ChatRequest) -> Result<ChatResponse> {
        // NEW: Publish event
        bus.publish(&Event::new(
            EventType::LlmRequest,
            service_id,
            serde_json::to_value(&req)?
        )).await?;

        // Existing logic
        let response = provider.complete(req).await?;

        // NEW: Publish response event
        bus.publish(&Event::new(
            EventType::LlmResponse,
            service_id,
            serde_json::to_value(&response)?
        )).await?;

        Ok(response)
    }

Pros:

  • ✅ Backward compatible: HTTP API unchanged
  • ✅ Incremental migration: Can dual-write during transition
  • ✅ Low risk: Existing clients unaffected
  • ✅ Observability added: Events for monitoring without breaking API

Cons:

  • ❌ Dual-write complexity: Maintain two interfaces
  • ❌ Event/HTTP inconsistency risk: Can diverge over time
  • ❌ Performance overhead: Serialize twice (HTTP + NATS)
  • ❌ Technical debt: Eventually need to migrate away from HTTP

Trade-Offs:

| Dimension | Score | Rationale |
|---|---|---|
| Performance | 7/10 | +2-5ms for event publishing |
| Maintainability | 6/10 | Dual-write increases complexity |
| Complexity | 7/10 | Familiar HTTP + new events |
| Cost | 8/10 | Minimal refactor per service |
| Risk | 9/10 | Low risk, backward compatible |

  • Latency: HTTP baseline + 2-5ms
  • Effort: Low (add event publishing)
  • Backward Compatibility: ✅ Full
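
In the HTTP bridge handler shown earlier in this alternative, each `bus.publish(...).await?` propagates a publish failure to the HTTP caller, which would make the existing API depend on NATS availability. One way to keep the "existing clients unaffected" property is to treat event publishing as best-effort. The following is a synchronous, std-only sketch of that idea. The `EventSink` trait, `RecordingSink`, the echo handler, and the `llm.response.v1` subject are all hypothetical names introduced for illustration.

```rust
/// Hypothetical abstraction over the event bus; a real one would wrap a NATS client.
trait EventSink {
    fn publish(&mut self, subject: &str, payload: &str) -> Result<(), String>;
}

/// Test double that records events and can simulate a bus outage.
struct RecordingSink {
    available: bool,
    published: Vec<(String, String)>,
}

impl EventSink for RecordingSink {
    fn publish(&mut self, subject: &str, payload: &str) -> Result<(), String> {
        if !self.available {
            return Err("bus unavailable".to_string());
        }
        self.published.push((subject.to_string(), payload.to_string()));
        Ok(())
    }
}

/// Publishes an event but never fails the caller; returns whether it was delivered.
fn publish_best_effort(sink: &mut dyn EventSink, subject: &str, payload: &str) -> bool {
    match sink.publish(subject, payload) {
        Ok(()) => true,
        Err(err) => {
            // Log and continue: the HTTP response must not depend on the bus.
            eprintln!("event publish to {} failed: {}", subject, err);
            false
        }
    }
}

/// Stand-in for the existing HTTP handler logic (here it only echoes the prompt).
fn handle_completion(sink: &mut dyn EventSink, prompt: &str) -> String {
    publish_best_effort(sink, "llm.request.v1", prompt);
    let response = format!("echo: {}", prompt);
    publish_best_effort(sink, "llm.response.v1", &response);
    response
}
```

The trade-off is that events can be lost silently during a bus outage, so this choice would need to be weighed against any zero-data-loss requirement on the event stream.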


Alternative 3: Sidecar Proxy Pattern (Enterprise)​

Approach: Deploy spectre-proxy as sidecar, intercepts HTTP and publishes events

    ┌─────────────────────────────────────┐
    │ Service Container                   │
    │ ┌────────────┐    ┌──────────────┐  │
    │ │ Service    │───▶│ Sidecar      │  │
    │ │ (HTTP)     │    │ spectre-     │  │
    │ │            │◀───│ proxy        │  │
    │ └────────────┘    └──────┬───────┘  │
    │                          │          │
    └──────────────────────────┼──────────┘
                               │
                               ▼
                        NATS Event Bus

Pros:

  • ✅ Zero code changes: Service unaware of NATS
  • ✅ Polyglot support: Works with any HTTP service
  • ✅ Centralized policy: Auth, rate limiting in proxy
  • ✅ Ops-friendly: Sidecar managed independently

Cons:

  • ❌ High complexity: Requires Kubernetes/Docker Compose orchestration
  • ❌ Performance overhead: Extra network hop (5-15ms)
  • ❌ Operational burden: More containers to manage
  • ❌ Debugging harder: Network issues obscured by sidecar

Trade-Offs:

| Dimension | Score | Rationale |
|---|---|---|
| Performance | 6/10 | +5-15ms from network hop |
| Maintainability | 7/10 | Centralized but more moving parts |
| Complexity | 4/10 | Sidecar orchestration is complex |
| Cost | 5/10 | Infrastructure overhead |
| Risk | 6/10 | Network failure modes |

  • Latency: 5-15ms (localhost sidecar)
  • Effort: High (infrastructure setup)
  • Backward Compatibility: ✅ Full


Alternative 4: Subprocess Wrapper Pattern (Simple)​

Approach: SPECTRE spawns services as subprocesses, captures stdio events

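    // spectre-orchestrator
    use std::io::{BufRead, BufReader, Write};
    use std::process::{Command, Stdio};

    let mut child = Command::new("python")
        .arg("ragtex/main.py")
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;

    // Send request via stdin (one JSON document per line)
    let mut stdin = child.stdin.take().expect("child stdin was piped");
    writeln!(stdin, "{}", request_json)?;

    // Read one response line from stdout (empty string if the child closed stdout)
    let stdout = child.stdout.take().expect("child stdout was piped");
    let response = BufReader::new(stdout).lines().next().transpose()?.unwrap_or_default();
    bus.publish(&Event::new(EventType::RagResponse, service_id, response)).await?;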

Pros:

  • ✅ Simplest integration: No service code changes
  • ✅ Language agnostic: Works with any executable
  • ✅ Process isolation: Crashes don't affect SPECTRE

Cons:

  • ❌ Poor performance: Process spawn overhead (50-100ms)
  • ❌ Limited scalability: Can't horizontally scale
  • ❌ No request-reply: Async only (can't handle sync requests well)
  • ❌ Debugging nightmare: Stdio piping fragile

Trade-Offs:

| Dimension | Score | Rationale |
|---|---|---|
| Performance | 3/10 | 50-100ms process spawn |
| Maintainability | 5/10 | Simple but fragile |
| Complexity | 8/10 | Just spawn processes |
| Cost | 9/10 | No integration code needed |
| Risk | 4/10 | Process management issues |

  • Latency: 50-100ms (process spawn)
  • Effort: Very Low
  • Backward Compatibility: ✅ Full


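Recommendation Matrix

| Pattern | Performance | Maintainability | Complexity | Cost | Risk | Overall |
|---|---|---|---|---|---|---|
| Native NATS | 10/10 | 8/10 | 7/10 | 7/10 | 8/10 | 8/10 ✅ |
| HTTP Bridge | 7/10 | 6/10 | 7/10 | 8/10 | 9/10 | 7.4/10 |
| Sidecar Proxy | 6/10 | 7/10 | 4/10 | 5/10 | 6/10 | 5.6/10 |
| Subprocess | 3/10 | 5/10 | 8/10 | 9/10 | 4/10 | 5.8/10 |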
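
The Overall column in this matrix is consistent with an unweighted mean of the five dimension scores, for example (7 + 6 + 7 + 8 + 9) / 5 = 7.4 for the HTTP Bridge. The sketch below reproduces that arithmetic and adds a weighted variant for cases where the architect values some dimensions more than others. The function names and the weighting scheme are assumptions of this sketch, not something the ADR defines.

```rust
/// Unweighted mean of scores given in the order:
/// performance, maintainability, complexity, cost, risk.
fn overall(scores: [u8; 5]) -> f64 {
    let sum: u32 = scores.iter().map(|&s| u32::from(s)).sum();
    f64::from(sum) / 5.0
}

/// Weighted mean for when the dimensions are not equally important.
/// Weights need not sum to 1; returns None if their total is not positive.
fn weighted_overall(scores: [u8; 5], weights: [f64; 5]) -> Option<f64> {
    let total: f64 = weights.iter().sum();
    if total <= 0.0 {
        return None;
    }
    let weighted: f64 = scores
        .iter()
        .zip(weights.iter())
        .map(|(&s, &w)| f64::from(s) * w)
        .sum();
    Some(weighted / total)
}
```

With equal weights the two functions agree. Doubling the weight on risk, for instance, would narrow the gap between Native NATS and the HTTP Bridge, which is one way to make the "backward compat critical" criterion explicit.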

Phase 1-2 (Short-term): Use HTTP Bridge Pattern for initial integration

  • Minimal disruption, backward compatible
  • Validate event schemas and observability

Phase 3+ (Long-term): Migrate to Native NATS Integration

  • Optimal performance and simplicity
  • Remove HTTP dual-write technical debt

Rationale:

  • De-risks Phase 1 with incremental approach
  • Allows validating event patterns before full commitment
  • Provides clear migration path

Decision Criteria:

  • If performance is critical → Use Native NATS immediately
  • If backward compatibility is critical → Use HTTP Bridge
  • If polyglot at scale → Use Sidecar Proxy
  • If quick prototype → Use Subprocess (temporary only)

ADR-003: Python Service Deployment Strategy​

Context​

Two Python services exist: ragtex (RAG system) and arch-analyzer (NixOS analysis). They must integrate with Rust-based SPECTRE infrastructure.

Problem Statement​

How should Python services be deployed and managed?

Requirements:

  • Support async Python (asyncio)
  • Integrate with NATS event bus
  • Reproducible deployments
  • Observable and debuggable
  • Cost-effective (no unnecessary Docker overhead)

Alternative 1: Docker Containers (Industry Standard)​

Approach: Containerize Python services with nats-py client

    # Dockerfile
    FROM python:3.13-slim
    WORKDIR /app
    COPY requirements.txt .
    RUN pip install -r requirements.txt
    COPY . .
    CMD ["python", "main.py"]

    # docker-compose.yml
    services:
      ragtex:
        build: ~/dev/low-level/ragtex
        environment:
          - NATS_URL=nats://nats:4222
        depends_on:
          - nats

Pros:

  • ✅ Industry standard: Well-understood pattern
  • ✅ Isolation: Dependencies don't conflict
  • ✅ Portability: Runs anywhere Docker runs
  • ✅ Resource limits: Can cap CPU/memory via Docker

Cons:

  • ❌ Overhead: 50-100MB per container + startup time
  • ❌ Dev friction: Rebuild image on every code change
  • ❌ Debugging harder: Need to exec into container
  • ❌ Not Nix-native: Doesn't leverage NixOS reproducibility

Trade-Offs:

| Dimension | Score | Rationale |
|---|---|---|
| Performance | 7/10 | Container overhead minimal |
| Maintainability | 8/10 | Standard tooling |
| Complexity | 7/10 | Docker learning curve |
| Cost | 7/10 | Disk space + build time |
| Risk | 9/10 | Proven pattern |

  • Resource Usage: ~100MB RAM per service
  • Startup Time: 2-5 seconds
  • Development Friction: Medium (rebuild on changes)


Alternative 2: Nix Flake + systemd Service (NixOS Native) ⭐​

Approach: Define Python environment in flake.nix, run as systemd service

    # ragtex/flake.nix
    {
      outputs = { self, nixpkgs }: {
        packages.x86_64-linux.default = nixpkgs.legacyPackages.x86_64-linux.python3Packages.buildPythonApplication {
          pname = "ragtex";
          version = "0.1.0";
          src = ./.;
          propagatedBuildInputs = with nixpkgs.legacyPackages.x86_64-linux.python3Packages; [
            nats-py
            langchain
            # ...
          ];
        };

        nixosModules.ragtex = { config, lib, pkgs, ... }: {
          systemd.services.ragtex = {
            description = "RAGTeX NATS Service";
            wantedBy = [ "multi-user.target" ];
            after = [ "nats.service" ];
            serviceConfig = {
              ExecStart = "${self.packages.x86_64-linux.default}/bin/ragtex";
              Restart = "always";
              Environment = "NATS_URL=nats://localhost:4222";
            };
          };
        };
      };
    }

Pros:

  • ✅ Nix-native: Leverages NixOS declarative config
  • ✅ Zero overhead: Native process, no container
  • ✅ Reproducible: Exact dependencies via Nix
  • ✅ systemd integration: Logs via journalctl, auto-restart

Cons:

  • ❌ NixOS-specific: Won't work on non-NixOS systems
  • ❌ Learning curve: Nix ecosystem complex
  • ❌ Slower builds: Nix builds can be slow initially

Trade-Offs:

| Dimension | Score | Rationale |
|---|---|---|
| Performance | 10/10 | Native process, no overhead |
| Maintainability | 9/10 | Declarative, version-controlled |
| Complexity | 6/10 | Nix learning curve is steep |
| Cost | 9/10 | No container overhead |
| Risk | 7/10 | NixOS-specific, less portable |

  • Resource Usage: ~30MB RAM (native Python)
  • Startup Time: <1 second
  • Development Friction: Low (nix develop)


Alternative 3: Embedded PyO3 (Rust-Python Bridge)​

Approach: Embed Python interpreter in Rust using PyO3

    // spectre-python-bridge/src/lib.rs
    use pyo3::prelude::*;

    pub fn run_ragtex(query: &str) -> PyResult<String> {
        Python::with_gil(|py| {
            let ragtex = PyModule::import(py, "ragtex")?;
            let result = ragtex.getattr("query")?.call1((query,))?;
            result.extract()
        })
    }

Pros:

  • ✅ Single binary: No separate deployment
  • ✅ Low latency: No IPC overhead
  • ✅ Type safety: Rust types at boundary

Cons:

  • ❌ GIL contention: Python Global Interpreter Lock limits parallelism
  • ❌ Crash risk: Python crash kills Rust process
  • ❌ Complex builds: PyO3 + Python deps in Nix
  • ❌ Debugging nightmare: Mixed-language stack traces

Trade-Offs:

| Dimension | Score | Rationale |
|---|---|---|
| Performance | 6/10 | GIL limits concurrency |
| Maintainability | 4/10 | Very complex to debug |
| Complexity | 3/10 | PyO3 + build complexity high |
| Cost | 5/10 | High initial effort |
| Risk | 4/10 | Crash isolation poor |

  • Resource Usage: Shared with Rust process
  • Startup Time: Instant (already embedded)
  • Development Friction: Very High


Alternative 4: Hybrid - Docker for Python, Native for Rust​

Approach: Python services in Docker, Rust services native

    # docker-compose.yml
    services:
      # Python services (Docker)
      ragtex:
        build: ~/dev/low-level/ragtex
        networks: [spectre-net]

      arch-analyzer:
        build: ~/dev/low-level/arch-analyzer
        networks: [spectre-net]

    # Rust services (native systemd)
    # Managed by NixOS configuration.nix

Pros:

  • ✅ Best of both worlds: Docker where it makes sense, native for Rust
  • ✅ Pragmatic: Doesn't force Nix on Python devs
  • ✅ Flexible: Can migrate Python to Nix later

Cons:

  • ❌ Inconsistent: Two deployment methods to maintain
  • ❌ Docker still required: Can't eliminate Docker entirely

Trade-Offs:

| Dimension | Score | Rationale |
|---|---|---|
| Performance | 8/10 | Rust native + acceptable Python overhead |
| Maintainability | 7/10 | Two systems to maintain |
| Complexity | 6/10 | Manageable split |
| Cost | 8/10 | Good pragmatic balance |
| Risk | 8/10 | Standard patterns for each |

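Recommendation Matrix

| Alternative | Performance | Maintainability | Complexity | Cost | Risk | Overall |
|---|---|---|---|---|---|---|
| Docker | 7/10 | 8/10 | 7/10 | 7/10 | 9/10 | 7.6/10 |
| Nix + systemd | 10/10 | 9/10 | 6/10 | 9/10 | 7/10 | 8.2/10 ✅ |
| PyO3 Embedded | 6/10 | 4/10 | 3/10 | 5/10 | 4/10 | 4.4/10 |
| Hybrid | 8/10 | 7/10 | 6/10 | 8/10 | 8/10 | 7.4/10 |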

Rationale:

  1. NixOS-native: You're already on NixOS, leverage it fully
  2. Best performance: Native processes, no container overhead
  3. Declarative: Entire stack in configuration.nix
  4. Reproducible: Exact versions pinned in flake.lock

Migration Path:

  • Phase 1: Start with Docker (faster to prototype)
  • Phase 2: Migrate to Nix+systemd once patterns validated

Decision Criteria:

  • If NixOS environment → Use Nix + systemd (recommended)
  • If portability is needed → Use Docker
  • If a single binary is important → Consider PyO3 (carefully)
  • If mixed team → Use Hybrid approach

ADR-004: Secrets Management Architecture​

Context​

SPECTRE requires secure credential storage and automatic rotation for:

  • LLM API keys (OpenAI, Anthropic, Vertex AI)
  • Database passwords (TimescaleDB, Neo4j)
  • Service authentication tokens
  • TLS certificates

Existing Asset: cognitive-vault has production-ready crypto (AES-256-GCM, Argon2id)

Problem Statement​

Should we extract cognitive-vault crypto into spectre-secrets, or integrate vault directly?

Alternative 1: Extract Crypto Library (Minimal)​

Approach: Copy crypto primitives from cognitive-vault into spectre-secrets

    // spectre-secrets/src/crypto.rs (extracted)
    pub struct CryptoEngine {
        key: SecretKey,
    }

    impl CryptoEngine {
        pub fn new(password: &str, salt: &[u8]) -> Result<Self> {
            let key = argon2_derive_key(password, salt)?;
            Ok(Self { key })
        }

        pub fn encrypt(&self, plaintext: &[u8]) -> Result<Vec<u8>> {
            aes_gcm_encrypt(&self.key, plaintext)
        }

        pub fn decrypt(&self, ciphertext: &[u8]) -> Result<Vec<u8>> {
            aes_gcm_decrypt(&self.key, ciphertext)
        }
    }

Pros:

  • ✅ Clean separation: No dependency on cognitive-vault crate
  • ✅ SPECTRE-specific: Can optimize for SPECTRE use case
  • ✅ No FFI: Pure Rust, no Go dependencies

Cons:

  • ❌ Code duplication: cognitive-vault and spectre-secrets diverge
  • ❌ Security risk: Might miss security updates in cognitive-vault
  • ❌ Lost features: cognitive-vault has CLI, backup, etc.

Trade-Offs:

| Dimension | Score | Rationale |
|---|---|---|
| Performance | 10/10 | Optimized for SPECTRE |
| Maintainability | 6/10 | Code duplication |
| Complexity | 8/10 | Simple extraction |
| Cost | 8/10 | 2-3 days extraction |
| Risk | 7/10 | Security divergence risk |

  • Effort: 2-3 days
  • Ongoing Maintenance: Medium (sync security fixes)


Alternative 2: Depend on cognitive-vault Crate (DRY)​

Approach: Add cognitive-vault as Git dependency

    # spectre-secrets/Cargo.toml
    [dependencies]
    vault_core = { git = "https://github.com/kernelcore/cognitive-vault", branch = "main" }

Pros:

  • ✅ No duplication: Single source of truth
  • ✅ Automatic updates: Benefit from cognitive-vault improvements
  • ✅ Proven code: Battle-tested crypto

Cons:

  • ❌ External dependency: SPECTRE depends on separate repo
  • ❌ Version coupling: Breaking changes in vault break SPECTRE
  • ❌ Go FFI complexity: Brings Go build into SPECTRE (if using CLI)

Trade-Offs:

| Dimension | Score | Rationale |
|---|---|---|
| Performance | 10/10 | Same as extraction |
| Maintainability | 9/10 | DRY principle |
| Complexity | 7/10 | External dependency management |
| Cost | 10/10 | Minimal integration work |
| Risk | 6/10 | Coupling to external repo |

  • Effort: 1 day integration
  • Ongoing Maintenance: Low (upstream does work)


Alternative 3: Secrets Service Pattern (Microservice)​

Approach: Run cognitive-vault as separate NATS service

    Client → NATS secrets.retrieve.v1 → cognitive-vault service → NATS secrets.response.v1 → Client

Pros:

  • ✅ Loose coupling: cognitive-vault evolves independently
  • ✅ Centralized secrets: Single service manages all credentials
  • ✅ Polyglot access: Any language can request secrets via NATS

Cons:

  • ❌ Network latency: +5-10ms per secret retrieval
  • ❌ Single point of failure: If vault down, all services blocked
  • ❌ Operational overhead: Another service to deploy/monitor

Trade-Offs:

| Dimension | Score | Rationale |
|---|---|---|
| Performance | 6/10 | Network round-trip overhead |
| Maintainability | 8/10 | Clean separation |
| Complexity | 6/10 | Service orchestration |
| Cost | 7/10 | 4-5 days integration |
| Risk | 5/10 | Single point of failure |

  • Effort: 4-5 days
  • Ongoing Maintenance: Medium (service management)


Alternative 4: Hybrid - Extract Crypto, Keep CLI Separate​

Approach:

  1. Create spectre-crypto lib crate (extracted primitives)
  2. cognitive-vault CLI uses spectre-crypto (shared library)
  3. spectre-secrets uses spectre-crypto directly

    ┌──────────────────────────┐
    │ spectre-crypto (lib)     │ ← Shared crypto primitives
    └───────────┬──────────────┘
                │
         ┌──────┴──────┐
         │             │
         ▼             ▼
    ┌─────────┐ ┌────────────────┐
    │ spectre-│ │ cognitive-vault│
    │ secrets │ │ CLI (Go FFI)   │
    └─────────┘ └────────────────┘

Pros:

  • ✅ Best of both worlds: Shared crypto, independent tools
  • ✅ No duplication: Both use same library
  • ✅ Backward compat: cognitive-vault CLI unchanged

Cons:

  • ❌ Refactor cognitive-vault: Needs restructuring
  • ❌ More crates: Adds spectre-crypto to workspace

Trade-Offs:

| Dimension | Score | Rationale |
|---|---|---|
| Performance | 10/10 | Direct library access |
| Maintainability | 9/10 | Shared code, clear boundaries |
| Complexity | 6/10 | Requires refactoring |
| Cost | 6/10 | 5-6 days refactor + integration |
| Risk | 8/10 | Clean architecture |

  • Effort: 5-6 days
  • Ongoing Maintenance: Low (shared crate)


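Recommendation Matrix

| Alternative | Performance | Maintainability | Complexity | Cost | Risk | Overall |
|---|---|---|---|---|---|---|
| Extract Crypto | 10/10 | 6/10 | 8/10 | 8/10 | 7/10 | 7.8/10 |
| Depend on Vault | 10/10 | 9/10 | 7/10 | 10/10 | 6/10 | 8.4/10 |
| Secrets Service | 6/10 | 8/10 | 6/10 | 7/10 | 5/10 | 6.4/10 |
| Hybrid (Shared Lib) | 10/10 | 9/10 | 6/10 | 6/10 | 8/10 | 7.8/10 ✅ |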

Phase 1 (Quick Start): Use Alternative 2 (Depend on vault_core)

  • Fast integration (1 day)
  • Validate secrets management patterns

Phase 2 (Long-term): Refactor to Alternative 4 (Shared Lib)

  • Extract spectre-crypto crate
  • Both SPECTRE and cognitive-vault use it

Rationale:

  1. De-risks Phase 1: Get secrets working quickly
  2. Future-proof: Shared library is clean long-term architecture
  3. Incremental: Can refactor after validating patterns

Decision Criteria:

  • If time is critical → Use Alt 2 (depend on vault)
  • If long-term architecture matters most → Use Alt 4 (shared lib)
  • If polyglot access is important → Consider Alt 3 (service)

ADR-005: LLM Request Routing Strategy​

Context​

Multiple LLM providers available:

  • Vertex AI (Gemini) - Cloud, powerful, expensive ($0.05-0.20 per request)
  • ml-offload-api (llama.cpp) - Local, fast, free, limited by VRAM
  • securellm-bridge (OpenAI/DeepSeek/Anthropic) - Proxy with unified API

Goal: Minimize cost (FinOps) while maintaining quality

Problem Statement​

How should SPECTRE route LLM requests to optimize cost vs quality?

Alternative 1: Simple Priority (Local-First)​

Approach: Try local first, failover to cloud

    async fn route_llm_request(req: LlmRequest) -> Result<LlmResponse> {
        // Try ml-offload-api (local, free)
        match ml_offload.complete(req.clone()).await {
            Ok(response) => Ok(response),
            // Failover to Vertex AI (assumes `VramExhausted` is an error variant in scope)
            Err(VramExhausted) => vertex_ai.complete(req).await,
            // Any other local error is returned to the caller unchanged
            Err(other) => Err(other),
        }
    }

Pros:

  • ✅ Maximizes cost savings: Local inference free
  • ✅ Simple logic: Easy to understand
  • ✅ Fast for simple queries: Local has lower latency

Cons:

  • ❌ Quality issues: Local models less capable for complex tasks
  • ❌ No load balancing: All requests hit local until exhausted
  • ❌ No quality feedback: Can't learn from failures

Trade-Offs:

| Dimension | Score | Rationale |
|---|---|---|
| Performance | 8/10 | Local fast, cloud slower |
| Maintainability | 9/10 | Simple code |
| Complexity | 9/10 | Trivial logic |
| Cost | 9/10 | Maximal local usage |
| Risk | 6/10 | Quality unpredictable |

  • Cost Savings: ~80% (if local handles 80% of requests)
  • Latency: Local 100-500ms, Cloud 1-3s
  • Quality: Variable
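
The "~80% if local handles 80% of requests" figure follows directly from treating local inference as free and every cloud request as equally priced. The sketch below makes that arithmetic explicit. It is a simplification: it assumes a uniform cloud price per request and ignores hardware and electricity costs for local inference, and the function names are hypothetical.

```rust
/// Expected cloud spend when `local_fraction` of requests is served locally
/// at zero marginal cost. The fraction is clamped to the range 0.0..=1.0.
fn cloud_spend(requests: u64, local_fraction: f64, cloud_cost_per_request: f64) -> f64 {
    let local = local_fraction.clamp(0.0, 1.0);
    requests as f64 * (1.0 - local) * cloud_cost_per_request
}

/// Savings relative to sending every request to the cloud, as a fraction.
fn savings_fraction(requests: u64, local_fraction: f64, cloud_cost_per_request: f64) -> f64 {
    let baseline = cloud_spend(requests, 0.0, cloud_cost_per_request);
    if baseline == 0.0 {
        return 0.0;
    }
    1.0 - cloud_spend(requests, local_fraction, cloud_cost_per_request) / baseline
}
```

For example, with 1,000 requests at an illustrative $0.10 each (inside the $0.05-0.20 range quoted in the Context), serving 80% locally reduces cloud spend from about $100 to about $20.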


Alternative 2: Intelligent Routing (Complexity-Based)​

Approach: Route based on request complexity

    // Assumes vram_available() reports free VRAM in bytes
    const TWO_GB: u64 = 2 * 1024 * 1024 * 1024;

    async fn route_llm_request(req: LlmRequest) -> Result<LlmResponse> {
        let complexity = analyze_complexity(&req);

        match complexity {
            Complexity::Simple => ml_offload.complete(req).await,
            // Moderate requests stay local only while more than 2 GB of VRAM is free
            Complexity::Moderate if ml_offload.vram_available() > TWO_GB => {
                ml_offload.complete(req).await
            }
            Complexity::Moderate | Complexity::Complex => {
                vertex_ai.complete(req).await
            }
        }
    }

    fn analyze_complexity(req: &LlmRequest) -> Complexity {
        if req.prompt.len() > 2000 { return Complexity::Complex; }
        if req.requires_reasoning { return Complexity::Complex; }
        if req.context_length > 8000 { return Complexity::Moderate; }
        Complexity::Simple
    }

Pros:

  • ✅ Cost-quality balance: Routes appropriately
  • ✅ VRAM-aware: Checks availability before routing
  • ✅ Measurable: Can track routing decisions

Cons:

  • ❌ Heuristic tuning: analyze_complexity() needs calibration
  • ❌ More complex: Additional logic to maintain
  • ❌ False positives: May route simple tasks to cloud unnecessarily

Trade-Offs:

| Dimension | Score | Rationale |
|---|---|---|
| Performance | 9/10 | Optimal latency per complexity |
| Maintainability | 7/10 | Heuristics need tuning |
| Complexity | 6/10 | Non-trivial routing logic |
| Cost | 8/10 | Good balance (60-70% local) |
| Risk | 7/10 | Heuristics can be wrong |

  • Cost Savings: ~60-70%
  • Latency: Optimized per complexity
  • Quality: High (complex → cloud)
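
Because the heuristics are expected to need calibration, it may help to separate the routing decision from the I/O so it can be unit-tested and tuned in isolation. The sketch below restates the same checks as pure functions with the thresholds pulled into a struct. The `Thresholds` type, the field names, and the use of gigabytes for VRAM are assumptions made for this example. The default values mirror the constants in the routing code of this alternative.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Complexity {
    Simple,
    Moderate,
    Complex,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Provider {
    Local,
    Cloud,
}

/// Tunable limits (hypothetical struct); defaults mirror the ADR's constants.
struct Thresholds {
    complex_prompt_len: usize,
    moderate_context_len: usize,
    min_free_vram_gb: f64,
}

impl Default for Thresholds {
    fn default() -> Self {
        Thresholds {
            complex_prompt_len: 2000,
            moderate_context_len: 8000,
            min_free_vram_gb: 2.0,
        }
    }
}

/// Applies the same checks, in the same order, as `analyze_complexity`.
fn classify(prompt_len: usize, requires_reasoning: bool, context_len: usize, t: &Thresholds) -> Complexity {
    if prompt_len > t.complex_prompt_len || requires_reasoning {
        Complexity::Complex
    } else if context_len > t.moderate_context_len {
        Complexity::Moderate
    } else {
        Complexity::Simple
    }
}

/// Pure routing decision: no network calls, so it is trivial to test.
fn choose_provider(complexity: Complexity, free_vram_gb: f64, t: &Thresholds) -> Provider {
    match complexity {
        Complexity::Simple => Provider::Local,
        Complexity::Moderate if free_vram_gb > t.min_free_vram_gb => Provider::Local,
        Complexity::Moderate | Complexity::Complex => Provider::Cloud,
    }
}
```

Note that, as in the original logic, Simple requests are routed locally regardless of free VRAM, so a failover path would still be needed for the case where the local backend is exhausted.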


Alternative 3: ML-Based Routing (Advanced)​

Approach: Train small classifier to predict best provider

    async fn route_llm_request(req: LlmRequest) -> Result<LlmResponse> {
        let features = extract_features(&req); // prompt length, keywords, etc.
        let prediction = routing_model.predict(features); // local vs cloud

        match prediction {
            Provider::Local => ml_offload.complete(req).await,
            Provider::Cloud => vertex_ai.complete(req).await,
        }
    }

    // Background task: Train model on historical requests
    async fn train_routing_model() {
        let history = load_historical_requests().await;
        // Label a request "local" only if local inference succeeded with good quality
        let labels: Vec<Provider> = history
            .iter()
            .map(|r| {
                if r.local_succeeded && r.quality_score > 0.8 {
                    Provider::Local
                } else {
                    Provider::Cloud
                }
            })
            .collect();
        // Borrow both so `history` is not moved after `labels` was derived from it
        routing_model.train(&history, &labels).await;
    }

Pros:

  • ✅ Optimal routing: Learns from data
  • ✅ Adaptive: Improves over time
  • ✅ Cost-aware: Can optimize for cost vs quality tradeoff

Cons:

  • ❌ High complexity: Requires ML pipeline (training, inference)
  • ❌ Cold start problem: Needs historical data
  • ❌ Operational overhead: Model training, versioning, deployment

Trade-Offs:

| Dimension | Score | Rationale |
|---|---|---|
| Performance | 10/10 | Optimal learned routing |
| Maintainability | 5/10 | ML pipeline complex |
| Complexity | 3/10 | Requires ML infrastructure |
| Cost | 9/10 | Best cost optimization |
| Risk | 5/10 | Model drift, cold start issues |

  • Cost Savings: ~70-80% (optimal learned)
  • Latency: Optimized
  • Quality: High (learned from feedback)


Alternative 4: Cost-Capped Round Robin​

Approach: Use cloud while the daily cost budget lasts, then fall back to local only

    async fn route_llm_request(req: LlmRequest) -> Result<LlmResponse> {
        if daily_cloud_cost() < DAILY_BUDGET {
            // Still in budget, prefer cloud for quality
            return vertex_ai.complete(req).await;
        }

        // Over budget, use local only
        ml_offload.complete(req).await
    }

Pros:

  • ✅ Budget guarantee: Never exceed cost
  • ✅ Simple logic: Easy to implement
  • ✅ Quality-first: Uses cloud while budget allows

Cons:

  • ❌ End-of-day degradation: All requests go local once the daily budget is spent
  • ❌ No optimization: Doesn't minimize cost intelligently
  • ❌ Poor UX: Quality drops suddenly when budget hit

Trade-Offs:

| Dimension | Score | Rationale |
|---|---|---|
| Performance | 7/10 | Cloud until budget, then local |
| Maintainability | 9/10 | Simple budget check |
| Complexity | 9/10 | Trivial logic |
| Cost | 7/10 | Guarantees budget but not optimal |
| Risk | 6/10 | Poor UX at budget limit |

  • Cost Savings: Fixed by budget
  • Latency: Cloud baseline until budget
  • Quality: High until budget, then degrades
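
A check of the form `daily_cloud_cost() < DAILY_BUDGET` can overshoot the budget by up to one request, because the request that crosses the limit is still sent to the cloud. If the "never exceed" guarantee is meant strictly, the estimated cost can be reserved before routing. The sketch below shows one way to do that with a daily reset. The `BudgetRouter` type, the integer-cents unit, and the day counter are hypothetical choices for illustration.

```rust
#[derive(Debug, PartialEq)]
enum Route {
    Cloud,
    Local,
}

/// Tracks cloud spend per day in integer cents to avoid floating-point drift.
struct BudgetRouter {
    daily_budget_cents: u64,
    spent_cents: u64,
    current_day: u32, // e.g. days since some epoch (assumption)
}

impl BudgetRouter {
    fn new(daily_budget_cents: u64, start_day: u32) -> Self {
        BudgetRouter { daily_budget_cents, spent_cents: 0, current_day: start_day }
    }

    /// Routes a request arriving on `today` with the given estimated cloud cost.
    /// The cost is reserved up front, so recorded spend never exceeds the budget.
    fn route(&mut self, today: u32, estimated_cost_cents: u64) -> Route {
        if today != self.current_day {
            // A new day started: reset the spend counter.
            self.current_day = today;
            self.spent_cents = 0;
        }
        let projected = self.spent_cents.saturating_add(estimated_cost_cents);
        if projected <= self.daily_budget_cents {
            self.spent_cents = projected;
            Route::Cloud
        } else {
            Route::Local
        }
    }
}
```

The guarantee here is only as good as the cost estimate. If actual cost is known only after the response, the reserved amount would need to be reconciled afterwards.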


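Recommendation Matrix

| Alternative | Performance | Maintainability | Complexity | Cost | Risk | Overall |
|---|---|---|---|---|---|---|
| Local-First | 8/10 | 9/10 | 9/10 | 9/10 | 6/10 | 8.2/10 |
| Intelligent Routing | 9/10 | 7/10 | 6/10 | 8/10 | 7/10 | 7.4/10 ✅ |
| ML-Based | 10/10 | 5/10 | 3/10 | 9/10 | 5/10 | 6.4/10 |
| Cost-Capped | 7/10 | 9/10 | 9/10 | 7/10 | 6/10 | 7.6/10 |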

Phase 1 (Immediate): Start with Alternative 1 (Local-First)

  • Simple, fast to implement
  • Validate cost savings hypothesis

Phase 2 (Month 2): Upgrade to Alternative 2 (Intelligent Routing)

  • Add complexity heuristics
  • Measure cost vs quality tradeoff

Phase 3 (Future): Consider Alternative 3 (ML-Based) if data justifies

  • Only if historical data shows clear patterns
  • Requires dedicated ML engineer

Rationale:

  1. Incremental complexity: Start simple, add sophistication based on data
  2. Data-driven: Phase 1 generates data to inform Phase 2 heuristics
  3. Risk mitigation: Avoid premature optimization (ML-based routing)

Decision Criteria:

  • If cost paramount → Use Alt 1 (local-first) aggressively
  • If quality paramount → Use Alt 4 (cost-capped) with high budget
  • If balanced → Use Alt 2 (intelligent routing) ✅
  • If ML expertise available → Consider Alt 3 long-term

ADR-006: Observability Architecture​

Context​

SPECTRE must provide:

  • Real-time metrics (request latency, throughput, error rates)
  • Cost tracking (per service, per user, per request)
  • Distributed tracing (correlation IDs across services)
  • Anomaly detection (ML-based alerts)
  • Dependency visualization (service graph)

Infrastructure Available: TimescaleDB (time-series), Neo4j (graph), NATS (event stream)

Problem Statement​

How should observability data flow through the system?

Alternative 1: Centralized Collector (Simple)​

Approach: Single spectre-observability service subscribes to all events

All Services → NATS (wildcard *.*.v1) → spectre-observability → TimescaleDB/Neo4j
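
To make the wildcard subscription concrete, here is a minimal sketch of NATS subject matching, where `*` matches exactly one token and `>` matches one or more trailing tokens. The function `subject_matches` is a hypothetical helper written for illustration; it is not part of any NATS client library.

```rust
/// Returns true if `subject` matches `pattern` under NATS wildcard rules:
/// `*` matches exactly one token, `>` matches one or more trailing tokens.
pub fn subject_matches(pattern: &str, subject: &str) -> bool {
    let mut pat = pattern.split('.');
    let mut sub = subject.split('.');
    loop {
        match (pat.next(), sub.next()) {
            // `>` is only valid as the final pattern token.
            (Some(">"), Some(_)) => return pat.next().is_none(),
            (Some("*"), Some(_)) => continue,
            (Some(p), Some(s)) if p == s => continue,
            (None, None) => return true,
            _ => return false,
        }
    }
}
```

One consequence worth noting: `*.*.v1` only covers three-token subjects, so a four-token subject such as `llm.request.stream.v1` would not reach the collector under this pattern.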

Pros:

  • ✅ Simple architecture: One service handles all observability
  • ✅ Centralized logic: Anomaly detection in one place
  • ✅ Easy to reason about: Clear data flow

Cons:

  • ❌ Single point of failure: If observability down, lose all metrics
  • ❌ Bottleneck: All events funnel through one service
  • ❌ Scaling challenges: Hard to horizontally scale

Trade-Offs:

| Dimension | Score | Rationale |
|---|---|---|
| Performance | 7/10 | Potential bottleneck at scale |
| Maintainability | 9/10 | Simple, single service |
| Complexity | 9/10 | Straightforward |
| Cost | 9/10 | Low operational overhead |
| Risk | 6/10 | Single point of failure |

Throughput: ~10K events/sec (single instance)
Latency: 5-10ms (async processing)
Operational Complexity: Low


Alternative 2: Distributed Collectors (Scalable)​

Approach: Multiple observability workers with queue groups

All Services → NATS (wildcard) → [Collector 1, Collector 2, Collector 3] → Databases
                                 (Queue Group: load balanced)

Pros:

  • ✅ Horizontal scaling: Add workers as load increases
  • ✅ High availability: Workers can fail without data loss
  • ✅ Load balanced: NATS queue groups distribute work

Cons:

  • ❌ Consistency challenges: Multiple writers to databases
  • ❌ More complex: Worker coordination needed
  • ❌ Anomaly detection harder: State distributed across workers

Trade-Offs:

| Dimension | Score | Rationale |
|---|---|---|
| Performance | 10/10 | Linear scaling with workers |
| Maintainability | 7/10 | Worker coordination complexity |
| Complexity | 6/10 | Distributed system challenges |
| Cost | 7/10 | More infrastructure |
| Risk | 8/10 | Better fault tolerance |

Throughput: ~100K events/sec (10 workers)
Latency: 5-10ms (unchanged)
Operational Complexity: Medium


Alternative 3: Embedded Observability (Performance)​

Approach: Each service writes directly to TimescaleDB/Neo4j

Each Service → TimescaleDB (metrics) + Neo4j (dependencies) directly
             → NATS (for real-time dashboard only)

Pros:

  • ✅ Lowest latency: No intermediary
  • ✅ No SPOF: Observability failures don't cascade
  • ✅ Simple per service: Each service manages own metrics

Cons:

  • ❌ Tight coupling: Services depend on observability DBs
  • ❌ Credentials everywhere: Every service needs DB passwords
  • ❌ Anomaly detection fragmented: No central view

Trade-Offs:

| Dimension | Score | Rationale |
|---|---|---|
| Performance | 10/10 | Direct writes, no middleware |
| Maintainability | 5/10 | Fragmented, tight coupling |
| Complexity | 6/10 | Every service has DB logic |
| Cost | 8/10 | No observability service |
| Risk | 5/10 | Coupling risk high |

Throughput: Limited by DB capacity
Latency: 1-2ms (direct write)
Operational Complexity: High (credentials mgmt)


Alternative 4: Hybrid - Central Collector + Service-Level Caching​

Approach: Services emit events, observability caches hot metrics

Services → NATS → spectre-observability (caches hot metrics in Redis) → TimescaleDB
                            ↓
                  Real-time Dashboard (reads from Redis)

Pros:

  • ✅ Real-time dashboard: Hot metrics in Redis (ms latency)
  • ✅ Historical analysis: Full data in TimescaleDB
  • ✅ Decoupled: Services don't know about observability storage

Cons:

  • ❌ Redis dependency: Another component to manage
  • ❌ Cache consistency: Hot vs cold data can diverge
  • ❌ More complex: Two storage tiers

Trade-Offs:

| Dimension | Score | Rationale |
|---|---|---|
| Performance | 9/10 | Redis for hot data, DB for cold |
| Maintainability | 7/10 | Two-tier storage |
| Complexity | 6/10 | Cache invalidation complexity |
| Cost | 7/10 | Redis + TimescaleDB |
| Risk | 7/10 | Cache consistency risks |

Throughput: ~50K events/sec
Latency: <1ms (Redis reads), 10ms (DB writes)
Operational Complexity: Medium


Recommendation Matrix​

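| Alternative | Performance | Maintainability | Complexity | Cost | Risk | Overall |
|---|---|---|---|---|---|---|
| Centralized | 7/10 | 9/10 | 9/10 | 9/10 | 6/10 | 8.0/10 ✅ |
| Distributed | 10/10 | 7/10 | 6/10 | 7/10 | 8/10 | 7.6/10 |
| Embedded | 10/10 | 5/10 | 6/10 | 8/10 | 5/10 | 6.8/10 |
| Hybrid Cache | 9/10 | 7/10 | 6/10 | 7/10 | 7/10 | 7.2/10 |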

Phase 1-2: Use Alternative 1 (Centralized Collector)

  • Simplest to implement and debug
  • Sufficient for initial load (< 10K events/sec)
  • Single service to monitor

Phase 3+: Migrate to Alternative 2 (Distributed Collectors) if needed

  • Only if load exceeds 10K events/sec
  • Horizontal scaling when necessary

Rationale:

  1. YAGNI: Don't optimize for scale you don't have yet
  2. Incremental: Easy migration path (just add workers)
  3. Data-driven: Phase 1 will reveal actual load

Decision Criteria:

  • If load < 10K events/sec → Use Alt 1 (centralized) ✅
  • If load > 10K events/sec → Use Alt 2 (distributed)
  • If real-time dashboard critical → Consider Alt 4 (hybrid cache)
  • If zero observability overhead → Consider Alt 3 (embedded)

ADR-007: Event Schema Versioning Strategy​

Context​

Event schemas will evolve over time. Breaking changes must not disrupt services.

Problem Statement​

How to handle event schema evolution?

Alternative 1: Semantic Versioning in Subject​

Approach: Include version in NATS subject

llm.request.v1 → Current production
llm.request.v2 → New version with breaking changes

Migration: Services subscribe to both during transition

// Publisher (new)
bus.publish("llm.request.v2", new_schema).await?;

// Subscriber (during migration): keep polling both versions until cutover
let mut sub_v1 = bus.subscribe("llm.request.v1").await?;
let mut sub_v2 = bus.subscribe("llm.request.v2").await?;
loop {
    tokio::select! {
        Some(msg) = sub_v1.next() => handle_v1(msg),
        Some(msg) = sub_v2.next() => handle_v2(msg),
        else => break, // both subscriptions closed
    }
}
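
During the dual-subscribe window it can help to normalize old events into the new shape so that only one handler carries business logic. The sketch below shows one way to do that: split the version suffix off a subject, then upcast a v1 payload to v2. The payload structs and the `max_tokens` field are hypothetical, invented here to illustrate a breaking change; the document does not define the actual schemas.

```rust
/// Splits a subject such as "llm.request.v2" into its base and version number.
pub fn split_version(subject: &str) -> Option<(&str, u32)> {
    let (base, last) = subject.rsplit_once('.')?;
    let version = last.strip_prefix('v')?.parse::<u32>().ok()?;
    Some((base, version))
}

// Hypothetical payloads: v2 adds a required `max_tokens` field.
#[derive(Debug, PartialEq)]
pub struct RequestV1 {
    pub prompt: String,
}

#[derive(Debug, PartialEq)]
pub struct RequestV2 {
    pub prompt: String,
    pub max_tokens: u32,
}

/// Upcasts a v1 event so downstream code only handles the v2 shape.
pub fn upcast(v1: RequestV1, default_max_tokens: u32) -> RequestV2 {
    RequestV2 { prompt: v1.prompt, max_tokens: default_max_tokens }
}
```

With this approach `handle_v1` shrinks to an upcast followed by a call into the v2 handler, which makes removing the v1 subject later a small change.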

Pros:

  • ✅ Explicit versioning: Clear which version used
  • ✅ Backward compatible: Old services keep working
  • ✅ Gradual migration: Can dual-subscribe during transition

Cons:

  • ❌ Subject proliferation: Many v1/v2/v3 subjects
  • ❌ Cleanup burden: Must deprecate old versions

Recommended: ✅ This is the standard approach


Project Prioritization Framework​

Evaluation Matrix​

| Project | Maturity | Complexity | Value | Risk | Dependencies | Priority Score |
|---|---|---|---|---|---|---|
| cognitive-vault | 10/10 | 2/10 | 6/10 | 2/10 | None | 7.5/10 🥇 |
| securellm-bridge | 10/10 | 6/10 | 10/10 | 4/10 | vault (auth) | 7.2/10 🥈 |
| ml-offload-api | 6/10 | 6/10 | 9/10 | 6/10 | GPU hardware | 6.0/10 🥉 |
| ragtex | 7/10 | 5/10 | 8/10 | 5/10 | Vertex AI | 5.8/10 |
| arch-analyzer | 7/10 | 3/10 | 4/10 | 3/10 | None | 5.0/10 |
| ai-agent-os | 6/10 | 7/10 | 5/10 | 5/10 | Hyprland (opt) | 4.8/10 |
| intelagent | 6/10 | 10/10 | 10/10 | 8/10 | DAO, ZK circuits | 4.4/10 |

Scoring Methodology​

Maturity (10 = production, 0 = prototype):

  • Code quality, test coverage, documentation
  • Production usage history

Complexity (10 = trivial, 0 = very complex):

  • Integration effort (person-days)
  • Dependencies and prerequisites

Value (10 = critical, 0 = nice-to-have):

  • Business impact
  • Architectural demonstration value

Risk (10 = no risk, 0 = high risk):

  • Technical debt
  • Failure blast radius
  • Rollback difficulty

Priority Score: Weighted average (Maturity: 30%, Complexity: 20%, Value: 30%, Risk: 20%)
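
As a sketch of the weighting just described, the function below computes a weighted average with the stated 30/20/30/20 split. It assumes every input is already on the "10 = best" scale defined in the methodology; the `Scores` struct and the example values are hypothetical and are not taken from the evaluation matrix.

```rust
/// Dimension scores on a 0-10 scale where higher is better (assumed).
pub struct Scores {
    pub maturity: f64,
    pub complexity: f64,
    pub value: f64,
    pub risk: f64,
}

/// Weighted average: Maturity 30%, Complexity 20%, Value 30%, Risk 20%.
pub fn priority(s: &Scores) -> f64 {
    0.30 * s.maturity + 0.20 * s.complexity + 0.30 * s.value + 0.20 * s.risk
}
```

For a hypothetical project scoring 8, 7, 9, and 6, this gives 2.4 + 1.4 + 2.7 + 1.2 = 7.7.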


Integration Scenarios​

Scenario A: Conservative (Low Risk)​

Timeline: 4 weeks
Projects: cognitive-vault → arch-analyzer → ai-agent-os

Rationale:

  • Start with simplest integrations
  • Build confidence before complex projects
  • Minimal external dependencies

Pros: Low risk, steady progress
Cons: Low demo value early


Scenario B: Balanced

Timeline: 6 weeks
Projects: cognitive-vault + securellm-bridge → ml-offload-api → ragtex

Rationale:

  • Phase 1: Secrets + LLM gateway (high value)
  • Phase 2: Local inference (cost savings demo)
  • Phase 3: RAG system (AI capabilities)

Pros: Good balance of risk and value
Cons: Moderate complexity


Scenario C: Aggressive (High Value)​

Timeline: 8 weeks
Projects: cognitive-vault + securellm-bridge → intelagent → ml-offload-api

Rationale:

  • Tackle most valuable project (intelagent) early
  • Accept higher risk for higher reward
  • DAO governance showcases innovation

Pros: Maximum value demonstrated
Cons: High complexity, technical risk


Risk Analysis​

Technical Risks​

| Risk | Probability | Impact | Mitigation |
|---|---|---|---|
| NATS performance bottleneck | Low | High | Load test Phase 0, distributed collectors |
| Event schema changes break services | Medium | Medium | Semantic versioning (ADR-007) |
| Python NATS integration issues | Medium | Low | Prototype with nats-py early |
| cognitive-vault crypto bugs | Low | Critical | Extensive security audit + testing |
| GPU availability (ml-offload) | High | Medium | Fallback to cloud (Vertex AI) |

Schedule Risks​

| Risk | Probability | Impact | Mitigation |
|---|---|---|---|
| Phase 1 overruns 2 weeks | Medium | Medium | Reduce scope to vault-only if needed |
| intelagent complexity underestimated | High | High | Defer to Phase 4 (Scenario B) |
| Integration testing takes longer | Medium | Low | Automated test harness from Phase 0 |

References​

  • STATUS.md - Current project status
  • NEXT_STEPS.md - Phase 1 detailed roadmap
  • INTEGRATION.md - Integration guide for services
  • README.md - Architecture overview


Document Status: Draft for Architect Review
Next Action: Review ADRs, select Phase 1 integration strategy
Approval Required: Phase 1 project selection (ADR-001), Integration pattern (ADR-002)


ADR-0040: Service Mesh Adoption - Linkerd over Istio/Cilium

Formal ADR: ADR-0040 in adr-ledger

Status: Accepted
Date: 2026-02-17
Classification: Major
Project: SPECTRE (spectre-proxy)
Issue: #45 Service Mesh Evaluation


Context​

SPECTRE operates under a zero-trust network model where east-west traffic between services (spectre-proxy → neutron, spectre-proxy → NATS) must be encrypted and mutually authenticated. Phase 3 validation exposed three concrete requirements:

  1. mTLS between all service-to-service calls: prevent MITM on cluster networks
  2. L7 observability: per-route latency/error-rate without code changes to spectre-proxy
  3. Traffic policies: timeouts and retry budgets per endpoint (e.g. /ingest vs /health)

A service mesh was chosen over application-level TLS because:

  • Application TLS requires managing certificates in every service (cognitive-vault, neutron, proxy)
  • L7 metrics would need per-service Prometheus instrumentation
  • Retry/timeout logic is already implemented in spectre-proxy but a mesh makes it auditable externally

Decision​

Linkerd stable-2.14.x was selected as the service mesh for SPECTRE.

Linkerd is deployed on the kind cluster (spectre-dev) with:

  • Automatic sidecar injection for spectre-proxy (2/2 containers confirmed)
  • Stub neutron (ghcr.io/mccutchen/go-httpbin:v2.14.0) injected for mTLS validation
  • ServiceProfile CRD defining per-route policies for spectre-proxy
  • NATS outbound ports (4222) excluded from proxy interception to avoid protocol misdetection

Alternatives Considered​

Alternative 1: Istio (Rejected)​

Pros:

  • ✅ Feature-rich: traffic splitting, fault injection, Wasm filters
  • ✅ Large community, extensive documentation
  • ✅ Native Kubernetes Gateway API support

Cons:

  • ❌ Heavy control plane: ~300MB memory (istiod + proxies) vs ~50MB control plane plus ~15MB per sidecar for Linkerd
  • ❌ Complex CRD surface: VirtualService, DestinationRule, Gateway, PeerAuthentication (25+ CRDs)
  • ❌ Envoy proxy per sidecar: larger attack surface, harder to audit
  • ❌ SPIFFE/SPIRE cert rotation has had CVEs (CVE-2022-24752)

Trade-Offs:

| Dimension | Score | Rationale |
|---|---|---|
| Performance | 6/10 | Envoy overhead ~2-5ms p99 |
| Maintainability | 5/10 | CRD sprawl, complex upgrades |
| Complexity | 3/10 | 25+ CRDs to understand |
| Resource Usage | 4/10 | ~300MB control plane |
| Security | 7/10 | Mature but large attack surface |

Overall: 5/10. Eliminated; SPECTRE does not need traffic splitting or Wasm.


Alternative 2: Cilium Service Mesh (Rejected)​

Pros:

  • ✅ eBPF-based: kernel-level enforcement, no sidecar overhead
  • ✅ Network policy + mesh in one agent
  • ✅ Excellent performance: near-zero latency overhead

Cons:

  • ❌ Requires Linux kernel ≥ 5.10 (kind nodes run 5.15+, but this constrains production nodes)
  • ❌ eBPF maps need CAP_BPF / CAP_NET_ADMIN, which are restricted in hardened clusters
  • ❌ Mutual TLS via WireGuard (node-level, not pod-level): cannot enforce per-pod identity
  • ❌ L7 policies require Hubble which adds ~100MB overhead
  • ❌ kind cluster eBPF support requires privileged containers (security regression)

Trade-Offs:

| Dimension | Score | Rationale |
|---|---|---|
| Performance | 10/10 | eBPF near-zero overhead |
| Maintainability | 7/10 | Single agent, unified network+mesh |
| Complexity | 5/10 | eBPF debugging requires kernel expertise |
| Resource Usage | 8/10 | No sidecars, one DaemonSet |
| Security | 6/10 | Node-level mTLS, not pod-level identity |

Overall: 7.2/10. Strong candidate for Phase 5 when running on bare-metal NixOS nodes. Deferred: kind environment and current kernel constraints make it premature.


Alternative 3: Linkerd (Selected) ✅

Pros:

  • ✅ Lightweight: ~15MB Rust proxy (linkerd2-proxy) per sidecar
  • ✅ Zero-config mTLS: automatic SPIFFE certificate rotation via linkerd-identity
  • ✅ Simple mental model: ~5 CRDs total (ServiceProfile, Server, HTTPRoute, etc.)
  • ✅ Rust-based data plane: shares safety properties with spectre-proxy's Rust codebase
  • ✅ ServiceProfile CRD: per-route timeout + retry budget without application changes
  • ✅ linkerd viz golden metrics (success rate, RPS, p50/p95/p99) per deployment

Cons:

  • ❌ No traffic splitting without SMI adaptor (not needed for SPECTRE currently)
  • ❌ UDP not proxied (NATS uses TCP so this is irrelevant)
  • ❌ Smaller community than Istio

Trade-Offs:

| Dimension | Score | Rationale |
|---|---|---|
| Performance | 9/10 | Rust proxy +~0.5ms p50, <2ms p99 |
| Maintainability | 9/10 | Minimal CRDs, clean upgrade path |
| Complexity | 9/10 | linkerd install + inject annotation |
| Resource Usage | 9/10 | ~15MB sidecar, ~50MB control plane |
| Security | 9/10 | SPIFFE identity, automatic mTLS, RBAC |

Overall: 9/10. Best fit for SPECTRE's current requirements.


Recommendation Matrix​

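| Mesh | Performance | Maintainability | Complexity | Resources | Security | Overall |
|---|---|---|---|---|---|---|
| Istio | 6/10 | 5/10 | 3/10 | 4/10 | 7/10 | 5.0/10 |
| Cilium | 10/10 | 7/10 | 5/10 | 8/10 | 6/10 | 7.2/10 |
| Linkerd | 9/10 | 9/10 | 9/10 | 9/10 | 9/10 | 9.0/10 ✅ |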

Consequences​

Positive​

  • mTLS is automatic for all meshed pods; no application code changes required
  • linkerd viz tap provides real-time L7 request inspection for debugging
  • ServiceProfile enables per-route SLO enforcement (timeouts, retries) externally from app code
  • Benchmark shows +~0.5ms p50 / +~1.5ms p99 overhead (acceptable for async workloads)
  • Zero-trust posture achieved: all spectre-proxy ↔ neutron traffic encrypted + authenticated

Negative / Trade-offs

  • Each meshed pod uses ~15MB additional RAM (linkerd2-proxy sidecar)
  • NATS port 4222 must be excluded from interception (config.linkerd.io/skip-outbound-ports: "4222") because Linkerd's protocol detection misidentifies NATS traffic (the NATS server speaks first on connect)
  • Linkerd does not proxy UDP: irrelevant now, but it limits future UDP-based protocols

Migration Path​

  • Phase 4: Deploy production neutron behind Linkerd mesh for real mTLS validation
  • Phase 5: Evaluate Cilium as replacement if eBPF kernel constraints are met on bare metal

Validation​

| Check | Command | Expected Result |
|---|---|---|
| mTLS active | linkerd viz edges deployment | SECURED column = ✓ for spectre-proxy ↔ neutron |
| Traffic visible | linkerd viz tap deployment/spectre-proxy --to deployment/neutron | Requests visible with mTLS |
| Routes active | linkerd viz routes deployment/spectre-proxy | POST /ingest and GET /health listed |
| Overhead | wrk2 with vs without sidecar | p50 delta ≤ 1ms, p99 delta ≤ 2ms |
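
The overhead check compares latency percentiles from two benchmark runs. As a minimal sketch, assuming latency samples in milliseconds have been exported from both runs, the functions below compute nearest-rank percentiles and apply the two thresholds from the table. The names `percentile` and `overhead_ok` are hypothetical, and wrk2's own percentile reporting may use a different interpolation.

```rust
/// Nearest-rank percentile of latency samples in milliseconds (p in 0..=100).
pub fn percentile(samples: &[f64], p: f64) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    Some(sorted[rank.clamp(1, sorted.len()) - 1])
}

/// Applies the overhead thresholds: p50 delta <= 1 ms and p99 delta <= 2 ms.
/// Returns None when either sample set is empty.
pub fn overhead_ok(baseline: &[f64], meshed: &[f64]) -> Option<bool> {
    let d50 = percentile(meshed, 50.0)? - percentile(baseline, 50.0)?;
    let d99 = percentile(meshed, 99.0)? - percentile(baseline, 99.0)?;
    Some(d50 <= 1.0 && d99 <= 2.0)
}
```

Comparing percentiles of the two distributions, rather than per-request differences, matches how the check is phrased: the runs are independent, so only the aggregate shapes can be compared.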


ADR-0043: Phase 3→4 Transition - Stub Neutron to Production Backend

Formal ADR: ADR-0043 in adr-ledger

Status: Proposed
Date: 2026-02-17
Classification: Major
Project: SPECTRE (spectre-proxy, neutron/NEXUS)
Supersedes: none
Related: ADR-0040 (Service Mesh Adoption)


Context​

Phase 3 validated the service mesh infrastructure using a stub neutron: a stateless HTTP echo server (ghcr.io/mccutchen/go-httpbin:v2.14.0) deployed under the same Kubernetes service name (neutron.default.svc.cluster.local:8000) that the real backend will use.

This stub was sufficient to prove:

  • mTLS between spectre-proxy ↔ neutron (SECURED = ✓ via linkerd viz edges)
  • Linkerd sidecar injection on both ends (2/2 containers)
  • ServiceProfile route classification (POST /ingest, GET /health)
  • Golden metrics collection (success rate, RPS, p50/p95/p99)

However, the stub does not exercise:

  • Actual inference latency (CUDA, model loading, GPU scheduling)
  • NATS event flow end-to-end (proxy → NATS → neutron → response)
  • Circuit breaker under real failure modes (OOM, GPU exhaustion, model timeout)
  • Trace propagation across the full proxy → neutron path (W3C traceparent)
  • Realistic payload sizes (LLM request/response bodies: 1KB-100KB)
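
Trace propagation depends on both services reading and forwarding the W3C `traceparent` header, whose version-00 form is `00-<32 hex trace id>-<16 hex parent id>-<2 hex flags>`. The sketch below parses that value and builds the header for a downstream call. `TraceParent`, `parse_traceparent`, and `child_header` are hypothetical names for illustration; a real service would more likely rely on an OpenTelemetry propagator.

```rust
#[derive(Debug, PartialEq)]
pub struct TraceParent {
    pub trace_id: String,
    pub parent_id: String,
    pub sampled: bool,
}

/// True if `s` is exactly `len` lowercase hexadecimal characters.
fn is_lower_hex(s: &str, len: usize) -> bool {
    s.len() == len && s.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
}

/// Parses a version-00 `traceparent` header value.
pub fn parse_traceparent(header: &str) -> Option<TraceParent> {
    let parts: Vec<&str> = header.trim().split('-').collect();
    if parts.len() != 4 || parts[0] != "00" {
        return None;
    }
    let (trace_id, parent_id, flags) = (parts[1], parts[2], parts[3]);
    if !is_lower_hex(trace_id, 32) || !is_lower_hex(parent_id, 16) || !is_lower_hex(flags, 2) {
        return None;
    }
    // All-zero trace or parent IDs are invalid.
    if trace_id.bytes().all(|b| b == b'0') || parent_id.bytes().all(|b| b == b'0') {
        return None;
    }
    let flag_bits = u8::from_str_radix(flags, 16).ok()?;
    Some(TraceParent {
        trace_id: trace_id.to_string(),
        parent_id: parent_id.to_string(),
        sampled: flag_bits & 0x01 == 0x01,
    })
}

/// Builds the header for an outgoing call: same trace ID, this hop's span ID.
pub fn child_header(parent: &TraceParent, span_id: &str) -> String {
    let flags = if parent.sampled { "01" } else { "00" };
    format!("00-{}-{}-{}", parent.trace_id, span_id, flags)
}
```

The trace ID stays constant across hops while the parent ID changes at each one, which is what lets a tracing backend stitch the proxy and neutron spans into a single trace.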

Phase 4 requires replacing the stub with a production-capable neutron backend.


Decision​

Adopt a graduated replacement strategy with three stages, each independently deployable and validated before proceeding to the next.


Stage 1: Lightweight Neutron Shim (Phase 4 entry)​

Image: Custom minimal container (Python/FastAPI or Rust/Axum)
Scope: HTTP API contract + NATS consumer; no CUDA, no real inference

spectre-proxy → neutron-shim (HTTP :8000)
├── POST /ingest → fake inference (random latency 50-500ms)
├── GET /health → 200
└── NATS subscriber: spectre.ingest.v1 → ack

What it validates:

  • Full NATS round-trip (proxy publishes → neutron consumes → response)
  • Realistic HTTP response structure (JSON with model output fields)
  • Circuit breaker under simulated failures (shim returns 500 at configurable rate)
  • Trace propagation: spectre-proxy span → neutron-shim span in Jaeger
  • Mesh overhead under sustained load (wrk2 benchmark with real payload)
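
The shim's two simulated behaviors, random latency in the 50-500ms range and a configurable rate of 500 responses, can be sketched without any HTTP framework. The example below uses a small linear congruential generator so it needs no external crate; `Lcg` and `fake_ingest` are hypothetical names, and a real shim would likely use the `rand` crate and an async sleep instead.

```rust
/// Tiny linear congruential generator so the sketch needs no external crate.
pub struct Lcg(u64);

impl Lcg {
    pub fn new(seed: u64) -> Self {
        Lcg(seed)
    }

    fn next_u32(&mut self) -> u32 {
        // Commonly cited 64-bit LCG constants; the high bits are the better ones.
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        (self.0 >> 32) as u32
    }
}

/// Simulated /ingest outcome: HTTP status plus artificial latency in ms.
/// `error_rate_percent` is the share of requests that should return 500.
pub fn fake_ingest(rng: &mut Lcg, error_rate_percent: u32) -> (u16, u64) {
    let latency_ms = 50 + (rng.next_u32() % 451) as u64; // 50..=500
    let status = if rng.next_u32() % 100 < error_rate_percent { 500 } else { 200 };
    (status, latency_ms)
}
```

Seeding the generator makes failure sequences reproducible, which helps when comparing circuit breaker behavior across benchmark runs.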

Nix integration:

  • nix/kubernetes/neutron-shim.nix replaces neutron-stub.nix
  • Same Service name, same port: zero changes to spectre-proxy config
  • Linkerd injection annotation preserved

Exit criteria:

  • NATS publish → consume → response latency < 50ms p99
  • Circuit breaker trips at 50% error rate, recovers after 30s
  • Trace spans visible in Jaeger: spectre-proxy → neutron-shim
  • wrk2 benchmark: ≥ 10K RPS on /ingest with mesh (p99 < 20ms, measured with the simulated inference delay disabled)
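
The circuit breaker criterion (trip at a 50% error rate, recover after 30s) can be read as a three-state machine. The sketch below is one minimal interpretation, not spectre-proxy's actual implementation: it uses cumulative counters rather than a sliding window, takes the current time as a parameter so it can be tested without a clock, and introduces a hypothetical `min_calls` threshold so a single early failure does not trip the breaker.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum State {
    Closed,
    Open { since_s: u64 },
    HalfOpen,
}

pub struct Breaker {
    state: State,
    successes: u32,
    failures: u32,
    min_calls: u32,  // hypothetical: minimum sample size before evaluating
    cooldown_s: u64, // 30 in the exit criteria above
}

impl Breaker {
    pub fn new(min_calls: u32, cooldown_s: u64) -> Self {
        Self { state: State::Closed, successes: 0, failures: 0, min_calls, cooldown_s }
    }

    pub fn state(&self) -> State {
        self.state
    }

    /// Whether a call may proceed at time `now_s` (seconds, any monotonic origin).
    pub fn allow(&mut self, now_s: u64) -> bool {
        if let State::Open { since_s } = self.state {
            if now_s.saturating_sub(since_s) < self.cooldown_s {
                return false;
            }
            // Cooldown elapsed: let probe traffic through.
            self.state = State::HalfOpen;
        }
        true
    }

    /// Records the outcome of a call that was allowed.
    pub fn record(&mut self, ok: bool, now_s: u64) {
        let state = self.state;
        match state {
            State::HalfOpen => {
                if ok {
                    self.state = State::Closed;
                    self.successes = 0;
                    self.failures = 0;
                } else {
                    self.state = State::Open { since_s: now_s };
                }
            }
            State::Closed => {
                if ok {
                    self.successes += 1;
                } else {
                    self.failures += 1;
                }
                let total = self.successes + self.failures;
                // Trip once at least half of the observed calls have failed.
                if total >= self.min_calls && self.failures * 2 >= total {
                    self.state = State::Open { since_s: now_s };
                }
            }
            State::Open { .. } => {}
        }
    }
}
```

A caller would check `allow` before each request and call `record` with the outcome afterwards; driving it with the shim's configurable 500 rate is one way to exercise the trip and recovery paths.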

Stage 2: NEXUS Lite (Phase 4 mid)​

Image: Stripped NEXUS build (Python + FastAPI + Ray Serve, no CUDA)
Scope: Real inference API with CPU-only models (e.g., distilbert, small LLMs via llama.cpp CPU)

spectre-proxy → neutron-lite (HTTP :8000)
├── POST /ingest → real inference (CPU model, 200ms-2s)
├── GET /health → 200 + model status
├── GET /metrics → Prometheus (inference_latency, queue_depth)
└── NATS: spectre.ingest.v1 → inference → spectre.result.v1

What it validates:

  • Real inference latency profiles under mesh (not fake random)
  • Memory pressure from model loading (1-4GB for CPU models)
  • Ray Serve autoscaling interaction with Linkerd load balancing
  • Request queuing behavior (burst → queue → timeout → circuit break)
  • pgvector integration for RAG context retrieval (if applicable)

Image build strategy:

  • Multi-stage Dockerfile: Python deps in layer 1, model weights volume-mounted
  • No CUDA toolkit → image < 2GB (vs 8-12GB for full NEXUS)
  • Nix: nix build .#neutron-lite-image using pkgs.dockerTools.buildLayeredImage

Exit criteria:

  • Inference round-trip (proxy → NATS → neutron → response) < 3s p99
  • Model hot-reload without pod restart
  • Memory stays < 4GB under sustained load
  • Mesh overhead negligible vs inference latency (< 5% of total p99)

Stage 3: Full NEXUS (Phase 4 exit / Phase 5 entry)​

Image: Full NEXUS with CUDA, Ray, pgvector
Scope: Production deployment with GPU inference, vector search, multi-model routing

spectre-proxy → neutron (HTTP :8000)
├── POST /ingest → GPU inference (50ms-5s depending on model)
├── POST /embed → vector embedding
├── GET /health → 200 + GPU/VRAM status
├── GET /metrics → inference_latency, vram_usage, queue_depth
└── NATS: full event schema (ingest, result, error, cost)

Prerequisites:

  • Bare-metal or GPU-enabled nodes (NVIDIA runtime, device plugin)
  • Persistent volume for model weights (50-200GB)
  • NATS JetStream for durable delivery (exactly-once on inference requests)

What changes from mesh perspective:

  • ServiceProfile updated: POST /ingest timeout 10s → 30s (GPU inference slower)
  • RetryBudget reduced: 20% → 5% (inference is expensive, don't retry aggressively)
  • Linkerd load balancing: EWMA for latency-aware routing across GPU replicas
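
Two of these changes are simple arithmetic. The sketch below shows a plain exponentially weighted moving average used to pick the replica with the lowest estimated latency, plus the retry allowance implied by a ratio-based budget. It is a simplified stand-in: Linkerd's own balancer and retry budget have additional parameters (for example a minimum retries-per-second floor and a TTL window) that are omitted here, and all names are hypothetical.

```rust
/// Exponentially weighted moving average of observed latency in milliseconds.
pub struct Ewma {
    value_ms: f64,
    alpha: f64, // weight of the newest sample, between 0 and 1
}

impl Ewma {
    pub fn new(initial_ms: f64, alpha: f64) -> Self {
        Self { value_ms: initial_ms, alpha }
    }

    /// Folds a new latency sample into the running estimate.
    pub fn observe(&mut self, sample_ms: f64) {
        self.value_ms = self.alpha * sample_ms + (1.0 - self.alpha) * self.value_ms;
    }

    pub fn value(&self) -> f64 {
        self.value_ms
    }
}

/// Index of the replica with the lowest estimated latency, if any.
pub fn pick_replica(replicas: &[Ewma]) -> Option<usize> {
    replicas
        .iter()
        .enumerate()
        .min_by(|a, b| a.1.value().total_cmp(&b.1.value()))
        .map(|(i, _)| i)
}

/// Retries permitted for `requests` original calls under a ratio-based budget.
pub fn allowed_retries(requests: u64, ratio_percent: u64) -> u64 {
    requests * ratio_percent / 100
}
```

Under this simplification, dropping the budget from 20% to 5% cuts the retries permitted per 1,000 requests from 200 to 50, which bounds how much extra GPU work a failing backend can attract.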

Exit criteria:

  • GPU inference < 5s p99 through mesh
  • VRAM-aware routing via custom metrics (HPA + Prometheus adapter)
  • mTLS maintained with zero application changes from Stage 1
  • ServiceProfile routes updated for new endpoints (/embed, /models)

Alternatives Considered​

Alt 1: Skip Directly to Full NEXUS (Rejected)​

Build and deploy the complete NEXUS image immediately.

Pros: No intermediate steps, production-ready sooner

Cons:

  • ❌ CUDA image is 8-12GB: slow build, slow kind load
  • ❌ Requires GPU node (not available in kind cluster)
  • ❌ Debugging mesh issues mixed with GPU/CUDA issues
  • ❌ Blocks Phase 4 progress until GPU infrastructure ready

Verdict: Rejected. Coupling infrastructure validation with GPU complexity adds risk.

Alt 2: Keep Stub Forever, Test Real Backend Outside Mesh (Rejected)​

Keep go-httpbin as stub, validate NEXUS separately without mesh.

Pros: Simple, no new containers to build

Cons:

  • ❌ Never validates NATS end-to-end through mesh
  • ❌ Mesh overhead under real payloads unknown until production
  • ❌ Circuit breaker behavior with real failure modes untested

Verdict: Rejected. Defeats the purpose of Phase 3→4 graduation.

Alt 3: Graduated Replacement (Selected) ✅

Three stages: shim → lite → full (this ADR).

Pros:

  • ✅ Each stage independently deployable and testable
  • ✅ Zero changes to spectre-proxy between stages (same Service name/port)
  • ✅ Mesh infrastructure validated incrementally under increasing realism
  • ✅ Can run in kind (stages 1-2) or bare metal (stage 3)

Verdict: Selected. De-risks transition, validates mesh under progressively realistic conditions.


Consequences​

Positive​

  • spectre-proxy is completely decoupled from the neutron backend implementation: any container exposing HTTP :8000 with the neutron Service name works
  • Each stage produces measurable validation data (latency, throughput, failure modes) before committing to the next
  • Stub → shim → lite → full path means Phase 4 can start immediately without GPU hardware
  • Mesh configuration (ServiceProfile, mTLS, viz) is validated once and carried forward

Negative / Trade-offs​

  • Three container images to maintain during transition (stub, shim, lite)
  • ServiceProfile timeouts must be updated per stage (10s → 30s for GPU)
  • Stage 2 CPU inference is not representative of GPU latency; benchmarks are directional, not production baselines

What Stays the Same Across All Stages​

| Component | Value |
|---|---|
| Service name | neutron.default.svc.cluster.local |
| Service port | 8000 |
| Linkerd injection | linkerd.io/inject: enabled |
| mTLS | Automatic (SPIFFE identity) |
| spectre-proxy config | NEUTRON_URL=http://neutron.default.svc.cluster.local:8000 |
| Nix package | nix build .#neutron-{stub,shim,lite}-manifests |

Timeline​

| Stage | Target | Blockers |
|---|---|---|
| Stage 1 (Shim) | Phase 4 start (Mar 2026) | None; can start now |
| Stage 2 (Lite) | Phase 4 mid (Apr 2026) | NEXUS Python deps in Nix |
| Stage 3 (Full) | Phase 4 exit (Jun 2026) | GPU nodes, NVIDIA runtime |

References​

  • ADR-0040: Service Mesh Adoption (Linkerd)
  • SPECTRE ROADMAP.md: Phase 4 Enterprise Features
  • nix/kubernetes/neutron-stub.nix: current Stage 0 (go-httpbin)
  • nix/kubernetes/default.nix: Kubernetes manifest composition
  • NEXUS/neutron: ~/dev/low-level/neutron/ (external repo)