Skip to main content

tensorforge β€” Nix & NixOS Guide

For the Nix community: what works, what doesn't, and why.


TL;DR​

GoalCommand
Try llama-server right now (CUDA)nix shell .#llama-cpp-cuda
Try llama-server right now (CPU)nix shell nixpkgs#llama-cpp
Full dev shell (Rust + CUDA)nix develop .#cuda
NixOS production service./tensorforge/scripts/bootstrap-nix.sh nixos --gpu <profile>
Install Nix on bare Linux./tensorforge/scripts/bootstrap-nix.sh install-nix

The CUDA problem in Nix​

This is the part that bites everyone:

nix shell nixpkgs#llama-cpp does NOT give you CUDA.

cudaSupport is a nixpkgs-level flag β€” it must be set when importing nixpkgs, not per-package. This means:

# This does NOT work
pkgs.llama-cpp.override { cudaSupport = true; }

# This DOES work β€” set at import time
pkgs = import nixpkgs { config.cudaSupport = true; config.allowUnfree = true; };
pkgs.llama-cpp # now CUDA-enabled

This flake exposes a separate cudaPkgs nixpkgs instance with cudaSupport = true. The CUDA packages are under .#llama-cpp-cuda, devShells.cuda, and apps.llama-server.

First build warning: building llama-cpp with CUDA from source takes ~10 minutes. There's no official Nix binary cache for CUDA packages because CUDA is unfree. Subsequent runs use the Nix store cache.


Flake outputs​

packages.default β†’ ml-offload-api (Rust API server)
packages.api β†’ same as default
packages.forge β†’ tensorforge (Rust inference engine)
packages.python β†’ Python env for arch-analyzer
packages.llama-cpp-cuda β†’ llama.cpp with CUDA (x86_64-linux only, builds from source)

apps.default β†’ ml-offload-api
apps.forge β†’ tensorforge
apps.llama-server β†’ llama-server binary (CUDA-enabled)

devShells.default β†’ Rust + CUDA headers
devShells.python β†’ Python 3.13 + arch-analyzer deps
devShells.cuda β†’ Rust + llama-server + nvcc (full inference dev env)

Quick start β€” Nix (non-NixOS)​

# Install Nix first (Determinate Systems β€” flakes enabled by default)
curl -fsSL https://install.determinate.systems/nix | sh -s -- install

# Clone
git clone https://github.com/VoidNxSEC/ml-ops-api
cd ml-ops-api

# Drop into CUDA shell (builds llama.cpp with CUDA β€” ~10 min first run)
nix develop .#cuda

# Or use the bootstrap script
./tensorforge/scripts/bootstrap-nix.sh quick # auto-detects GPU

NixOS β€” production service​

# Generate /etc/nixos/tensorforge.nix for your GPU
sudo ./tensorforge/scripts/bootstrap-nix.sh nixos --gpu b200
# profiles: b200 Β· h100 Β· l40s Β· generic

# Rebuild
sudo nixos-rebuild switch
systemctl status llamacpp-turbo

Or manually, add to your configuration.nix:

{ config, pkgs, ... }:
{
imports = [ /path/to/ml-ops-api/tensorforge/nix/b200-optimized/default.nix ];

# Required for CUDA packages
nixpkgs.config.allowUnfree = true;
hardware.nvidia.package = config.boot.kernelPackages.nvidiaPackages.stable;
hardware.opengl.enable = true;

services.tensorforge.b200Optimized = {
enable = true;
llamacpp.enable = true;
systemTuning.enable = true;
monitoring.enable = true;
};
}

NixOS CUDA requirements​

# hardware-configuration.nix or configuration.nix
hardware.opengl = {
enable = true;
driSupport = true;
driSupport32Bit = true;
};

hardware.nvidia = {
modesetting.enable = true;
powerManagement.enable = false;
open = false;
nvidiaSettings = true;
package = config.boot.kernelPackages.nvidiaPackages.stable;
};

services.xserver.videoDrivers = [ "nvidia" ]; # even on headless

Dev workflow​

# Default: Rust + CUDA headers
nix develop

# Full CUDA shell with llama-server
nix develop .#cuda
llama-server --version # confirms CUDA build

# Python / arch-analyzer
nix develop .#python
python arch-analyzer/src/analyzer.py --help

Why bare Linux scripts exist alongside the Nix path​

The bare Linux scripts (bootstrap.sh, server.sh, etc.) exist because:

  1. Cloud GPU instances (Shadeform, Lambda, RunPod) run Ubuntu, not NixOS. Installing Nix there and waiting for a CUDA build is overkill for a spot instance.
  2. NVIDIA NIM containers have no Nix at all.
  3. Speed: bootstrap.sh compiles llama.cpp in ~3 min on 12 cores. nix build .#llama-cpp-cuda takes ~10 min without a binary cache.

On NixOS workstations and persistent servers, the Nix path is the right choice. On ephemeral GPU nodes, the bash scripts are more practical.


Binary cache note​

There's no public binary cache for .#llama-cpp-cuda β€” CUDA is unfree and the package depends on your driver version. You can set up your own cache with:

# Attic or Cachix
nix build .#llama-cpp-cuda
cachix push your-cache result

Then add to flake.nix:

nixConfig.extra-substituters = [ "https://your-cache.cachix.org" ];
nixConfig.extra-trusted-public-keys = [ "your-cache.cachix.org-1:..." ];

GPU arch reference (for CUDA_ARCH in bare scripts):

GPUsm--cuda-arch
B200 / B300100--cuda-arch 100
H100 / H20090--cuda-arch 90
A10080--cuda-arch 80
L40S / RTX 409089--cuda-arch 89
RTX 309086--cuda-arch 86
T475--cuda-arch 75

Maintained by: VoidNxSEC β€” NVIDIA Inception (B200/B300)