Spider-Nix 🕷️
Enterprise-grade OSINT/web crawler toolkit built with Python 3.13, asyncio, and NixOS
Features • Architecture • Quick Start • Documentation • Contributing
Why This Matters
Spider-Nix demonstrates production-ready software engineering practices:
- Security-First Architecture: Multi-layered security scanning (SAST, dependency auditing, secret detection)
- CI/CD Excellence: Automated testing across Python 3.11-3.13, coverage tracking, parallel job execution
- Modern Python: Async/await throughout, type hints, Pydantic models, httpx/Playwright
- DevOps Integration: NixOS flakes for reproducible environments, pre-commit hooks, Justfile automation
- Test-Driven Development: 63 test cases run with pytest-asyncio across a Python version matrix, with coverage tracking
- Professional Standards: Ruff linting, mypy type checking, comprehensive documentation
By The Numbers
4,638 LOC • 17 modules • 63 tests • Python 3.11-3.13
6 OSINT categories • 20+ integrations • 4 anti-detection techniques
Features
Core Capabilities
- Dual-Mode Crawling: HTTP (httpx) for speed, Browser (Playwright) for JavaScript-heavy sites
- Advanced Stealth: Canvas fingerprinting, WebGL spoofing, navigator masking, automation detection bypass
- Full OSINT Suite: DNS enumeration, WHOIS, subdomain discovery, port scanning, vulnerability assessment
- External Integrations: Shodan, VirusTotal, URLScan.io with correlation engine
- Intelligent Proxy Rotation: 4 strategies (round-robin, random, weighted, health-based)
- Job Intelligence: Career page discovery, salary extraction, opportunity scoring
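The four proxy rotation strategies named above can be sketched as follows. This is a minimal illustration, not the project's `proxy.py`: the `Proxy` and `ProxyRotator` names, their fields, and the health formula (success ratio) are all assumptions made for the example.

```python
from __future__ import annotations

import itertools
import random
from dataclasses import dataclass


@dataclass
class Proxy:
    """Hypothetical proxy record; field names are illustrative."""
    url: str
    weight: float = 1.0  # only used by the "weighted" strategy
    successes: int = 0
    failures: int = 0

    @property
    def health(self) -> float:
        """Success ratio in [0, 1]; a proxy with no history counts as healthy."""
        total = self.successes + self.failures
        return 1.0 if total == 0 else self.successes / total


class ProxyRotator:
    """Picks the next proxy according to one of four strategies."""

    def __init__(self, proxies: list[Proxy], strategy: str = "round-robin",
                 rng: random.Random | None = None) -> None:
        if not proxies:
            raise ValueError("at least one proxy is required")
        self.proxies = proxies
        self.strategy = strategy
        self._cycle = itertools.cycle(proxies)
        self._rng = rng or random.Random()

    def next(self) -> Proxy:
        if self.strategy == "round-robin":
            return next(self._cycle)
        if self.strategy == "random":
            return self._rng.choice(self.proxies)
        if self.strategy == "weighted":
            weights = [p.weight for p in self.proxies]
            return self._rng.choices(self.proxies, weights=weights, k=1)[0]
        if self.strategy == "health-based":
            # Prefer the proxy with the best observed success ratio.
            return max(self.proxies, key=lambda p: p.health)
        raise ValueError(f"unknown strategy: {self.strategy}")

    def report(self, proxy: Proxy, ok: bool) -> None:
        """Record the outcome of a request so health-based selection can adapt."""
        if ok:
            proxy.successes += 1
        else:
            proxy.failures += 1
```

A caller would call `next()` before each request and `report()` after it, so that a proxy that keeps failing is chosen less often under the health-based strategy.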
Technical Highlights
- Async Architecture: Built on asyncio for high concurrency (configurable limits)
- Type Safety: Pydantic models for configuration and data validation
- Storage Flexibility: JSON, CSV, SQLite with FTS5 full-text search
- CLI Excellence: Typer + Rich for beautiful terminal interfaces
- NixOS Integration: Flakes for reproducible dev environments, declarative dependencies
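To make the "configurable limits" on concurrency concrete, here is one common way to cap in-flight requests with `asyncio.Semaphore`. It is a sketch under assumptions: `crawl_all`, `fetch`, and `max_concurrency` are hypothetical names, and the fetch function is passed in so the example stays independent of any particular HTTP client.

```python
from __future__ import annotations

import asyncio
from collections.abc import Awaitable, Callable


async def crawl_all(
    urls: list[str],
    fetch: Callable[[str], Awaitable[str]],
    max_concurrency: int = 10,
) -> dict[str, str | BaseException]:
    """Fetch every URL while keeping at most max_concurrency requests in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(url: str) -> str:
        # Each task waits here until a slot is free.
        async with semaphore:
            return await fetch(url)

    # return_exceptions=True keeps one failing URL from cancelling the rest;
    # the exception object is stored as that URL's result instead.
    results = await asyncio.gather(
        *(bounded(url) for url in urls), return_exceptions=True
    )
    return dict(zip(urls, results))
```

With httpx, `fetch` could be a small coroutine that awaits `client.get(url)` on a shared `httpx.AsyncClient` and returns the response text.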
Architecture
graph TB
CLI[CLI Interface<br/>Typer + Rich] --> Core[Core Crawler Engine]
CLI --> OSINT[OSINT Module]
CLI --> Intel[Job Intel Module]
Core --> HTTP[HTTP Crawler<br/>httpx + asyncio]
Core --> Browser[Browser Crawler<br/>Playwright]
Core --> Stealth[Anti-Detection<br/>Canvas/WebGL Spoofing]
Core --> Proxy[Proxy Rotator<br/>4 Strategies]
OSINT --> Recon[Reconnaissance<br/>DNS, WHOIS, Subdomains]
OSINT --> Scanner[Port Scanner<br/>TCP/UDP + Service Detection]
OSINT --> Vuln[Vulnerability Scanner<br/>CVE Matching + Headers]
OSINT --> Integrations[External APIs<br/>Shodan, VirusTotal, URLScan]
OSINT --> Correlator[Correlation Engine<br/>Entity-Relationship Graph]
Intel --> Career[Career Page Finder]
Intel --> JobAnalyzer[Job Opportunity Analyzer]
Core --> Storage[Storage Backend<br/>JSON/CSV/SQLite FTS5]
OSINT --> Storage
Intel --> Storage
style CLI fill:#2ea44f,color:#fff
style Core fill:#0969da,color:#fff
style OSINT fill:#bf3989,color:#fff
style Intel fill:#bf8700,color:#fff
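The Correlation Engine node in the diagram is described as an entity-relationship graph. A minimal sketch of such a structure, with JSON and Graphviz DOT export, might look like the following. The `EntityGraph` class, its method names, and the relation labels are hypothetical and are not taken from `correlator.py`.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field


def _quote(text: str) -> str:
    """Escape double quotes so the value is safe inside a DOT string."""
    return text.replace('"', '\\"')


@dataclass
class EntityGraph:
    """Hypothetical graph of OSINT entities (domains, IPs, ...) and relations."""
    nodes: dict[str, str] = field(default_factory=dict)  # entity id -> kind
    edges: set[tuple[str, str, str]] = field(default_factory=set)  # (src, relation, dst)

    def add_entity(self, entity_id: str, kind: str) -> None:
        self.nodes[entity_id] = kind

    def relate(self, src: str, relation: str, dst: str) -> None:
        # Entities seen only in a relation are registered with an unknown kind.
        for entity_id in (src, dst):
            self.nodes.setdefault(entity_id, "unknown")
        self.edges.add((src, relation, dst))

    def to_json(self) -> str:
        return json.dumps(
            {
                "nodes": [{"id": i, "kind": k} for i, k in sorted(self.nodes.items())],
                "edges": [
                    {"source": s, "relation": r, "target": d}
                    for s, r, d in sorted(self.edges)
                ],
            },
            indent=2,
        )

    def to_dot(self) -> str:
        lines = ["digraph osint {"]
        for entity_id, kind in sorted(self.nodes.items()):
            label = f"{_quote(entity_id)}\\n({_quote(kind)})"
            lines.append(f'  "{_quote(entity_id)}" [label="{label}"];')
        for src, relation, dst in sorted(self.edges):
            lines.append(
                f'  "{_quote(src)}" -> "{_quote(dst)}" [label="{_quote(relation)}"];'
            )
        lines.append("}")
        return "\n".join(lines)
```

The DOT output can be rendered with the Graphviz `dot` tool, while the JSON form is easier to feed into other tooling.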
Quick Start
Prerequisites
- NixOS (or Nix package manager on Linux/macOS)
- Python 3.11+ (provided by Nix)
- Git
Installation
# Clone repository
git clone https://github.com/VoidNxSEC/spider-nix.git
cd spider-nix
# Enter Nix development shell (installs all dependencies)
nix develop
# Install pre-commit hooks
spider hooks-install
# Run tests to verify setup
spider test
Usage Examples
# Basic crawling
spider crawl https://example.com --pages 10
# Browser mode for JavaScript sites
spider crawl https://spa-site.com --browser --pages 5
# OSINT reconnaissance
spider recon dns example.com
spider recon subdomains example.com -o results.json
spider recon portscan 192.168.1.1 -p 1-1000
# Job hunting intelligence
spider job-hunt example.com --pages 20 --output jobs.json
# Aggressive mode with proxy rotation
spider crawl https://target.com --aggressive --proxy-file proxies.txt
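For readers curious what a command like `spider recon portscan` does conceptually, here is a minimal TCP connect scan written with asyncio. It is an illustrative sketch, not the code in `scanner.py`: the function names, the default timeout, and the concurrency limit are assumptions, and it covers neither UDP nor service detection.

```python
from __future__ import annotations

import asyncio
from collections.abc import Iterable


async def check_port(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        _reader, writer = await asyncio.wait_for(
            asyncio.open_connection(host, port), timeout
        )
    except (OSError, asyncio.TimeoutError):
        # Refused, unreachable, or no answer in time: treat as not open.
        return False
    writer.close()
    try:
        await writer.wait_closed()
    except OSError:
        pass  # the peer may already have reset the connection
    return True


async def scan_ports(
    host: str,
    ports: Iterable[int],
    concurrency: int = 100,
    timeout: float = 1.0,
) -> list[int]:
    """Probe the given ports concurrently and return the open ones, sorted."""
    semaphore = asyncio.Semaphore(concurrency)

    async def probe(port: int) -> tuple[int, bool]:
        async with semaphore:
            return port, await check_port(host, port, timeout)

    results = await asyncio.gather(*(probe(port) for port in ports))
    return sorted(port for port, is_open in results if is_open)
```

A range such as `-p 1-1000` would correspond to `asyncio.run(scan_ports("192.168.1.1", range(1, 1001)))`.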
Development
Setup Development Environment
# Enter Nix devShell
nix develop
# Install pre-commit hooks
spider hooks-install
# Run full CI checks locally
spider ci-local
Development Commands
spider test # Run tests
spider test-cov # Tests with coverage report
spider check # Run linters
spider typecheck # Run mypy type checking
spider security # Run security scans
spider ci-local # Simulate full CI pipeline
Testing
# Run all tests
spider test
# Run with coverage
spider test-cov
# Run specific test file
pytest tests/test_crawler.py
# Run tests matching pattern
pytest -k "test_dns"
Project Structure
spider-nix/
├── src/spider_nix/
│   ├── cli.py                  # Typer CLI interface (600 LOC)
│   ├── crawler.py              # HTTP async crawler (214 LOC)
│   ├── browser.py              # Playwright integration (209 LOC)
│   ├── stealth.py              # Anti-detection techniques (159 LOC)
│   ├── proxy.py                # Proxy rotation engine (141 LOC)
│   ├── storage.py              # Storage backends (162 LOC)
│   ├── config.py               # Pydantic configuration (62 LOC)
│   ├── osint/
│   │   ├── reconnaissance.py   # DNS, WHOIS, subdomains (560 LOC)
│   │   ├── scanner.py          # Port scanning (491 LOC)
│   │   ├── analyzer.py         # Tech detection (433 LOC)
│   │   ├── vulnerability.py    # Vuln assessment (421 LOC)
│   │   ├── integrations.py     # Shodan, VirusTotal, URLScan (486 LOC)
│   │   └── correlator.py       # Entity correlation (454 LOC)
│   └── intel/
│       └── jobs.py             # Job intelligence (194 LOC)
├── tests/                      # 63 test cases (1,123 LOC)
├── .github/workflows/          # CI/CD pipelines
├── flake.nix                   # Nix development environment
├── pyproject.toml              # Python package config
└── Justfile                    # Development commands
OSINT Arsenal
20 modules across 6 categories:
| Category | Modules | Key Features |
|---|---|---|
| Reconnaissance | DNS, WHOIS, Subdomain Enum | Certificate Transparency, DNS bruteforce, 7 record types |
| Analysis | Content Analyzer, Tech Detector | Wappalyzer-style detection, 50+ frameworks/CMS |
| Scanning | Port Scanner, Service Detector | 25+ service signatures, TCP/UDP, banner grabbing |
| Vulnerability | Scanner, Header Checker, CVE Matcher | Security score (0-100), HSTS/CSP analysis |
| Integrations | Shodan, URLScan, VirusTotal, Aggregator | Multi-source correlation, reputation checks |
| Correlation | Entity-Relationship Graph | Graph export (JSON, Graphviz DOT) |
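The Vulnerability row mentions a 0-100 security score based on header analysis. The README does not say how that score is computed, so the sketch below uses made-up weights purely to show the idea: each recommended response header that is present adds points, and the missing ones are reported back. `HEADER_WEIGHTS` and `score_headers` are hypothetical names.

```python
from __future__ import annotations

# Hypothetical weights (they sum to 100); the real scoring rules are not documented here.
HEADER_WEIGHTS = {
    "strict-transport-security": 30,  # HSTS
    "content-security-policy": 30,    # CSP
    "x-content-type-options": 15,
    "x-frame-options": 15,
    "referrer-policy": 10,
}


def score_headers(headers: dict[str, str]) -> tuple[int, list[str]]:
    """Return (score from 0 to 100, names of the missing security headers)."""
    # Header names are case-insensitive; empty values count as absent.
    present = {name.lower() for name, value in headers.items() if value.strip()}
    score = sum(w for name, w in HEADER_WEIGHTS.items() if name in present)
    missing = [name for name in HEADER_WEIGHTS if name not in present]
    return score, missing
```

A response that sets only HSTS and CSP would score 60 under these weights. A fuller checker would also inspect header values, for example a very short HSTS `max-age` or a CSP that allows `unsafe-inline`.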
Security
Spider-Nix takes security seriously:
- SAST Scanning: Bandit for Python-specific vulnerabilities
- Dependency Auditing: Safety + pip-audit for known CVEs
- Secret Detection: Gitleaks in CI + pre-commit hooks
- Ruff Security Rules: Flake8-bandit integration
See SECURITY.md for our security policy and how to report vulnerabilities.
Contributing
We welcome contributions! Please see CONTRIBUTING.md for:
- Development setup
- Code style guidelines
- Testing requirements
- Pull request process
License
This project is licensed under the MIT License - see LICENSE for details.
Acknowledgments
Built with modern Python tools:
- httpx - HTTP client
- Playwright - Browser automation
- Typer - CLI framework
- Pydantic - Data validation
- NixOS - Reproducible environments
Built for the security and OSINT communities