
Spider-Nix 🕷️


Enterprise-grade OSINT/web crawler toolkit built with Python 3.13, asyncio, and NixOS

Features • Architecture • Quick Start • Documentation • Contributing


Why This Matters

Spider-Nix demonstrates production-ready software engineering practices:

  • Security-First Architecture: Multi-layered security scanning (SAST, dependency auditing, secret detection)
  • CI/CD Excellence: Automated testing across Python 3.11-3.13, coverage tracking, parallel job execution
  • Modern Python: Async/await throughout, type hints, Pydantic models, httpx/Playwright
  • DevOps Integration: NixOS flakes for reproducible environments, pre-commit hooks, Justfile automation
  • Test-Driven Development: 63 test cases with comprehensive coverage, pytest-asyncio, matrix testing
  • Professional Standards: Ruff linting, mypy type checking, comprehensive documentation

By The Numbers

4,638 LOC │ 17 modules │ 63 tests │ Python 3.11-3.13
6 OSINT categories │ 20+ integrations │ 4 anti-detection techniques

Features

Core Capabilities

  • Dual-Mode Crawling: HTTP (httpx) for speed, Browser (Playwright) for JavaScript-heavy sites
  • Advanced Stealth: Canvas fingerprint spoofing, WebGL spoofing, navigator masking, automation-detection bypass
  • Full OSINT Suite: DNS enumeration, WHOIS, subdomain discovery, port scanning, vulnerability assessment
  • External Integrations: Shodan, VirusTotal, URLScan.io with correlation engine
  • Intelligent Proxy Rotation: 4 strategies (round-robin, random, weighted, health-based)
  • Job Intelligence: Career page discovery, salary extraction, opportunity scoring

Technical Highlights

  • Async Architecture: Built on asyncio for high concurrency (configurable limits)
  • Type Safety: Pydantic models for configuration and data validation
  • Storage Flexibility: JSON, CSV, SQLite with FTS5 full-text search
  • CLI Excellence: Typer + Rich for beautiful terminal interfaces
  • NixOS Integration: Flakes for reproducible dev environments, declarative dependencies
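The "configurable limits" on concurrency are commonly implemented with an asyncio.Semaphore. The sketch below shows that pattern under the assumption that fetching is a plain coroutine. fake_fetch is a hypothetical stand-in for an httpx request, and crawl() is illustrative rather than the signature used in crawler.py.

```python
import asyncio


async def fake_fetch(url: str) -> str:
    # Hypothetical stand-in for an httpx request; sleeps to simulate latency.
    await asyncio.sleep(0.01)
    return f"<html>{url}</html>"


async def crawl(urls: list[str], max_concurrency: int = 5) -> tuple[dict[str, str], int]:
    """Fetch all URLs with at most max_concurrency requests in flight.

    Returns the fetched pages and the peak number of simultaneous fetches.
    """
    sem = asyncio.Semaphore(max_concurrency)
    in_flight = 0
    peak = 0

    async def worker(url: str) -> tuple[str, str]:
        nonlocal in_flight, peak
        async with sem:  # blocks once max_concurrency workers hold the semaphore
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return url, await fake_fetch(url)
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(worker(u) for u in urls))
    return dict(results), peak
```

Calling asyncio.run(crawl(urls, max_concurrency=5)) schedules every URL at once, but the semaphore guarantees that no more than five fetches run simultaneously.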

Architecture

graph TB
CLI[CLI Interface<br/>Typer + Rich] --> Core[Core Crawler Engine]
CLI --> OSINT[OSINT Module]
CLI --> Intel[Job Intel Module]

Core --> HTTP[HTTP Crawler<br/>httpx + asyncio]
Core --> Browser[Browser Crawler<br/>Playwright]
Core --> Stealth[Anti-Detection<br/>Canvas/WebGL Spoofing]
Core --> Proxy[Proxy Rotator<br/>4 Strategies]

OSINT --> Recon[Reconnaissance<br/>DNS, WHOIS, Subdomains]
OSINT --> Scanner[Port Scanner<br/>TCP/UDP + Service Detection]
OSINT --> Vuln[Vulnerability Scanner<br/>CVE Matching + Headers]
OSINT --> Integrations[External APIs<br/>Shodan, VirusTotal, URLScan]
OSINT --> Correlator[Correlation Engine<br/>Entity-Relationship Graph]

Intel --> Career[Career Page Finder]
Intel --> JobAnalyzer[Job Opportunity Analyzer]

Core --> Storage[Storage Backend<br/>JSON/CSV/SQLite FTS5]
OSINT --> Storage
Intel --> Storage

style CLI fill:#2ea44f,color:#fff
style Core fill:#0969da,color:#fff
style OSINT fill:#bf3989,color:#fff
style Intel fill:#bf8700,color:#fff
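The Correlation Engine in the diagram links entities (domains, IPs, subdomains) into a graph that can be exported as JSON or Graphviz DOT. Here is a minimal sketch of such a graph, assuming entities are identified by plain strings. EntityGraph and its methods are hypothetical names, not the API of correlator.py.

```python
import json


class EntityGraph:
    """Hypothetical entity-relationship graph; not correlator.py's real API."""

    def __init__(self) -> None:
        self.nodes: dict[str, str] = {}  # entity id -> entity type
        self.edges: set[tuple[str, str, str]] = set()  # (source, relation, target)

    def add_relation(self, src: str, src_type: str, relation: str, dst: str, dst_type: str) -> None:
        self.nodes[src] = src_type
        self.nodes[dst] = dst_type
        self.edges.add((src, relation, dst))

    def neighbors(self, entity: str) -> set[str]:
        # Edges are followed in both directions when asking what an entity touches.
        outgoing = {d for s, _, d in self.edges if s == entity}
        incoming = {s for s, _, d in self.edges if d == entity}
        return outgoing | incoming

    def to_json(self) -> str:
        return json.dumps({
            "nodes": [{"id": n, "type": t} for n, t in sorted(self.nodes.items())],
            "edges": [{"source": s, "relation": r, "target": d} for s, r, d in sorted(self.edges)],
        })

    def to_dot(self) -> str:
        # Assumes entity names contain no double quotes (no DOT escaping here).
        lines = ["digraph entities {"]
        for node, kind in sorted(self.nodes.items()):
            lines.append(f'  "{node}" [label="{node}\\n({kind})"];')
        for src, rel, dst in sorted(self.edges):
            lines.append(f'  "{src}" -> "{dst}" [label="{rel}"];')
        lines.append("}")
        return "\n".join(lines)
```

The DOT output can be rendered with Graphviz, for example dot -Tpng graph.dot -o graph.png.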

Quick Start

Prerequisites

  • NixOS or the Nix package manager on Linux/macOS, with the nix-command and flakes features enabled (required by nix develop)
  • Python 3.11+ (provided by Nix)
  • Git

Installation

# Clone repository
git clone https://github.com/VoidNxSEC/spider-nix.git
cd spider-nix

# Enter Nix development shell (installs all dependencies)
nix develop

# Install pre-commit hooks
spider hooks-install

# Run tests to verify setup
spider test

Usage Examples

# Basic crawling
spider crawl https://example.com --pages 10

# Browser mode for JavaScript sites
spider crawl https://spa-site.com --browser --pages 5

# OSINT reconnaissance
spider recon dns example.com
spider recon subdomains example.com -o results.json
spider recon portscan 192.168.1.1 -p 1-1000

# Job hunting intelligence
spider job-hunt example.com --pages 20 --output jobs.json

# Aggressive mode with proxy rotation
spider crawl https://target.com --aggressive --proxy-file proxies.txt
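The recon portscan example probes a port range. A minimal sketch of a TCP connect scan with banner grabbing using asyncio streams is shown below. probe() and scan_ports() are hypothetical names, and the timeouts and 128-byte banner read are assumptions, not the behavior of scanner.py (which also covers UDP and service detection).

```python
import asyncio
from collections.abc import Iterable


async def probe(host: str, port: int, timeout: float = 1.0) -> tuple[int, bool, str]:
    """TCP connect probe (hypothetical helper): returns (port, is_open, banner)."""
    try:
        reader, writer = await asyncio.wait_for(asyncio.open_connection(host, port), timeout)
    except (OSError, asyncio.TimeoutError):
        return port, False, ""  # refused, unreachable, or filtered
    banner = ""
    try:
        # Services such as SSH, SMTP and FTP greet first; others stay silent.
        data = await asyncio.wait_for(reader.read(128), 0.5)
        banner = data.decode(errors="replace").strip()
    except (asyncio.TimeoutError, OSError):
        pass  # silent or reset: the port still accepted the connection
    finally:
        writer.close()
        try:
            await writer.wait_closed()
        except OSError:
            pass
    return port, True, banner


async def scan_ports(host: str, ports: Iterable[int], concurrency: int = 100) -> dict[int, str]:
    """Return {open_port: banner}, with at most `concurrency` probes in flight."""
    sem = asyncio.Semaphore(concurrency)

    async def bounded(port: int) -> tuple[int, bool, str]:
        async with sem:
            return await probe(host, port)

    results = await asyncio.gather(*(bounded(p) for p in ports))
    return {port: banner for port, is_open, banner in results if is_open}
```

The -p 1-1000 range from the example above corresponds to asyncio.run(scan_ports("192.168.1.1", range(1, 1001))).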

Development

Setup Development Environment

# Enter Nix devShell
nix develop

# Install pre-commit hooks
spider hooks-install

# Run full CI checks locally
spider ci-local

Development Commands

spider test # Run tests
spider test-cov # Tests with coverage report
spider check # Run linters
spider typecheck # Run mypy type checking
spider security # Run security scans
spider ci-local # Simulate full CI pipeline

Testing

# Run all tests
spider test

# Run with coverage
spider test-cov

# Run specific test file
pytest tests/test_crawler.py

# Run tests matching pattern
pytest -k "test_dns"

Project Structure

spider-nix/
├── src/spider_nix/
│   ├── cli.py                 # Typer CLI interface (600 LOC)
│   ├── crawler.py             # HTTP async crawler (214 LOC)
│   ├── browser.py             # Playwright integration (209 LOC)
│   ├── stealth.py             # Anti-detection techniques (159 LOC)
│   ├── proxy.py               # Proxy rotation engine (141 LOC)
│   ├── storage.py             # Storage backends (162 LOC)
│   ├── config.py              # Pydantic configuration (62 LOC)
│   ├── osint/
│   │   ├── reconnaissance.py  # DNS, WHOIS, subdomains (560 LOC)
│   │   ├── scanner.py         # Port scanning (491 LOC)
│   │   ├── analyzer.py        # Tech detection (433 LOC)
│   │   ├── vulnerability.py   # Vuln assessment (421 LOC)
│   │   ├── integrations.py    # Shodan, VirusTotal, URLScan (486 LOC)
│   │   └── correlator.py      # Entity correlation (454 LOC)
│   └── intel/
│       └── jobs.py            # Job intelligence (194 LOC)
├── tests/                     # 63 test cases (1,123 LOC)
├── .github/workflows/         # CI/CD pipelines
├── flake.nix                  # Nix development environment
├── pyproject.toml             # Python package config
└── Justfile                   # Development commands
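storage.py provides the SQLite backend with FTS5 full-text search. A minimal sketch of that idea with the standard-library sqlite3 module follows, assuming the SQLite build includes FTS5 (most do). The pages table layout and the open_store/save_page/search helpers are hypothetical, not the project's actual schema.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # Hypothetical schema: url is stored but not tokenized; title and body are searchable.
    conn.execute("CREATE VIRTUAL TABLE IF NOT EXISTS pages USING fts5(url UNINDEXED, title, body)")
    return conn


def save_page(conn: sqlite3.Connection, url: str, title: str, body: str) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO pages (url, title, body) VALUES (?, ?, ?)", (url, title, body))


def search(conn: sqlite3.Connection, query: str, limit: int = 10) -> list[tuple[str, str]]:
    # MATCH takes FTS5 query syntax; bm25() scores lower for better matches.
    rows = conn.execute(
        "SELECT url, title FROM pages WHERE pages MATCH ? ORDER BY bm25(pages) LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()
```

The default unicode61 tokenizer folds case, so a search for python also matches "Python". Raw user input containing FTS5 operators or punctuation may need quoting before it is passed to MATCH.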

OSINT Arsenal

20 modules across 6 categories:

| Category | Modules | Key Features |
|---|---|---|
| Reconnaissance | DNS, WHOIS, Subdomain Enum | Certificate Transparency, DNS bruteforce, 7 record types |
| Analysis | Content Analyzer, Tech Detector | Wappalyzer-style detection, 50+ frameworks/CMS |
| Scanning | Port Scanner, Service Detector | 25+ service signatures, TCP/UDP, banner grabbing |
| Vulnerability | Scanner, Header Checker, CVE Matcher | Security score (0-100), HSTS/CSP analysis |
| Integrations | Shodan, URLScan, VirusTotal, Aggregator | Multi-source correlation, reputation checks |
| Correlation | Entity-Relationship Graph | Graph export (JSON, Graphviz DOT) |
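The Vulnerability row mentions a 0-100 security score with HSTS/CSP analysis. The sketch below shows one way a header checker could produce such a score. The header weights, the half credit for a CSP containing 'unsafe-inline', and the score_headers() name are all assumptions for illustration, not the scoring used by vulnerability.py.

```python
import re

# Hypothetical weights summing to 100; the real scoring may differ.
WEIGHTS = {
    "strict-transport-security": 25,
    "content-security-policy": 25,
    "x-frame-options": 15,
    "x-content-type-options": 15,
    "referrer-policy": 10,
    "permissions-policy": 10,
}


def score_headers(headers: dict[str, str]) -> tuple[int, list[str]]:
    """Return (score 0-100, findings) for a response's headers (illustrative)."""
    h = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    score, findings = 0, []
    for name, weight in WEIGHTS.items():
        value = h.get(name)
        if value is None:
            findings.append(f"missing {name}")
            continue
        if name == "strict-transport-security":
            m = re.search(r"max-age=(\d+)", value)
            if not m or int(m.group(1)) == 0:
                findings.append("HSTS max-age missing or zero")
                continue
        if name == "content-security-policy" and "'unsafe-inline'" in value:
            findings.append("CSP allows 'unsafe-inline'")
            score += weight // 2  # partial credit for a weakened policy
            continue
        if name == "x-content-type-options" and value.strip().lower() != "nosniff":
            findings.append("X-Content-Type-Options is not 'nosniff'")
            continue
        score += weight
    return score, findings
```

A response that sends all six headers with sound values scores 100. Each missing or weakened header lowers the score and adds a human-readable finding.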

Security

Spider-Nix runs several layers of security checks:

  • SAST Scanning: Bandit for Python-specific vulnerabilities
  • Dependency Auditing: Safety + pip-audit for known CVEs
  • Secret Detection: Gitleaks in CI + pre-commit hooks
  • Ruff Security Rules: Ruff's built-in flake8-bandit rule set (the S rules)

See SECURITY.md for our security policy and how to report vulnerabilities.

Contributing

We welcome contributions! Please see CONTRIBUTING.md for:

  • Development setup
  • Code style guidelines
  • Testing requirements
  • Pull request process

License

This project is licensed under the MIT License - see LICENSE for details.

Acknowledgments

Built with modern Python tools:


Built for the security and OSINT communities