DARTS Search Pipeline¶
This page maps the conceptual DARTS workflow onto the actual ForeBlocks modules.
Why this pipeline exists¶
ForeBlocks does not treat DARTS as a single opaque training call. The search stack is deliberately staged so you can:
- prune bad candidates early with cheap signals
- spend short differentiable-search budgets only on promoted candidates
- derive a discrete architecture explicitly
- retrain the final model from scratch when desired
That structure makes it easier to debug, benchmark, and reason about the search than a monolithic NAS loop.
Phase-to-module map¶
| Phase | Main code location | Role |
|---|---|---|
| search-space definition | foreblocks/darts/config.py |
defines architecture modes, op pools, and search budgets |
| candidate generation | foreblocks/darts/search/orchestrator.py |
samples candidate configs and coordinates evaluation |
| zero-cost ranking | foreblocks/darts/search/zero_cost.py |
computes cheap pre-training metrics |
| bilevel search training | foreblocks/darts/training/darts_loop.py |
runs the mixed-architecture DARTS phase |
| multi-fidelity orchestration | foreblocks/darts/search/multi_fidelity.py |
promotes top candidates and manages staged budgets |
| final fixed-model retraining | foreblocks/darts/training/final_trainer.py |
retrains the derived architecture |
| result analysis | foreblocks/darts/evaluation/analyzer.py |
builds search-result summaries and plots |
| public entry surface | foreblocks/darts/trainer.py |
exposes the staged workflow as DARTSTrainer methods |
Public control flow¶
The high-level path is:
- instantiate
DARTSTrainer - call
multi_fidelity_search(...) - inspect the returned candidates and final metrics
- save the best final model
Internally, the call chain is roughly:
DARTSTrainer.multi_fidelity_search(...)
-> search/multi_fidelity.py
-> search/orchestrator.py
-> search/zero_cost.py
-> training/darts_loop.py
-> trainer.derive_final_architecture(...)
-> training/final_trainer.py
Result artifacts by phase¶
Candidate generation and zero-cost ranking¶
Produces:
- raw candidates
- cheap ranking metrics
- promoted
top_kcandidates
This is the best phase for debugging search-space quality.
Bilevel DARTS phase¶
Produces:
- mixed searched model
- training and validation losses
- alpha histories
- intermediate metrics
This is the best phase for debugging collapse, instability, or bad promotion thresholds.
Final retraining phase¶
Produces:
- the fixed
final_model - final metrics
- training info
This is the artifact that should matter most for downstream use.
The important design choice: fixed-model retraining¶
ForeBlocks explicitly separates search-time mixed architectures from final deployment artifacts. That means the model you save after derive_final_architecture(...) is not just the same mixed model with a different label; it is a discrete architecture intended to be retrained and validated on its own merits.
This separation matters because:
- mixed-model performance can overestimate the final discrete architecture
- retraining reveals whether the search signal really transfers
- saving and sharing the final model becomes cleaner
Stats collection and benchmarking¶
When collect_stats=True, the multi-fidelity pipeline stores timing and worker-parallelism artifacts. This is useful when you want to compare:
- how expensive phase 1 really is
- whether more workers help in practice
- whether candidate generation or search training is the true bottleneck
Treat this as an experiment-benchmarking feature, not something you need for every search run.