Testing Guide - Phase 1
Quick Validation (No Dependencies)β
Syntax checks (fast, no Nix needed):
python -m py_compile src/spider_nix/extraction/*.py
python -m py_compile src/spider_nix/ml/*.py
Full Test Suite (Requires Nix Environment)β
Enter Nix Shellβ
nix develop
Run All Testsβ
pytest tests/ -v
Run Specific Modulesβ
# Extraction tests
pytest tests/extraction/ -v
# ML tests
pytest tests/ml/ -v
# With coverage
pytest tests/ --cov=spider_nix --cov-report=html
Quick Import Testβ
nix develop --command python -c "
from spider_nix.extraction import BoundingBox, VisionExtractor, DOMAnalyzer, FusionEngine
from spider_nix.ml import FeedbackLogger, FailureClassifier, StrategySelector
print('β All imports successful')
"
Module Structureβ
Extraction (src/spider_nix/extraction/)β
models.py- BoundingBox, VisionDetection, DOMElement, FusedElementvision_extractor.py- VisionExtractor (ml-offload-api client)dom_analyzer.py- DOMAnalyzer (lxml HTML parser)fusion_engine.py- FusionEngine (IoU matching)
ML Feedback (src/spider_nix/ml/)β
models.py- FailureClass, Strategy, CrawlAttempt, StrategyEffectivenessfeedback_logger.py- FeedbackLogger (async SQLite logging)failure_classifier.py- FailureClassifier (rule-based, Phase 1)strategy_selector.py- StrategySelector (epsilon-greedy bandit)schema.sql- Database schema
Test Filesβ
Extraction Tests (tests/extraction/)β
test_models.py- BoundingBox IoU, data modelstest_fusion_engine.py- Fusion algorithms
ML Tests (tests/ml/)β
test_failure_classifier.py- Failure classification rules
Manual Integration Testβ
import asyncio
from spider_nix.extraction import VisionExtractor, DOMAnalyzer, FusionEngine
from spider_nix.ml import FeedbackLogger, FailureClassifier, StrategySelector
async def test_integration():
# Test ML feedback system
logger = FeedbackLogger("test_feedback.db")
await logger.initialize()
classifier = FailureClassifier()
selector = StrategySelector(logger, epsilon=0.2)
# Classify a failure
failure = classifier.classify(status_code=429, response_time_ms=5000)
print(f"Classified as: {failure}")
# Select strategies
strategies = await selector.select_strategies("https://example.com")
print(f"Selected strategies: {strategies}")
# Get stats
stats = await logger.get_stats()
print(f"Stats: {stats}")
print("β ML integration test passed")
asyncio.run(test_integration())
Phase 1 Checklistβ
- β Extraction module structure
- β Vision extractor (ml-offload-api integration)
- β DOM analyzer (lxml parser)
- β Fusion engine (IoU algorithm)
- β ML feedback database schema
- β Failure classifier (rule-based)
- β Strategy selector (bandit algorithm)
- β Enhanced stealth (9 JS patches)
- β Python syntax validation
- β³ Full pytest suite (run when nix develop ready)
Next Stepsβ
- Enter nix shell:
nix develop - Run tests:
pytest tests/ -v - Review PHASE1_IMPLEMENTATION.md for full details
- Start Phase 2 when ready