# ForeBlocks DARTS Guide
ForeBlocks includes a staged neural architecture search subsystem for time-series forecasting. The DARTS stack is built around `DARTSTrainer`, but it is not just a single differentiable loop. It combines:
- search-space sampling
- zero-cost candidate screening
- bilevel DARTS training on promoted candidates
- conversion to a discrete fixed model
- final retraining and optional result analysis
## Install
For the search loop itself:
If you also want the analyzer and richer search-result visuals:
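A plausible sketch of both setups, assuming the package is published on PyPI under the name `foreblocks` (verify the package name and any extras against the project's own install instructions; the plotting dependencies below are an assumption, not a documented requirement):

```shell
# Base package for the search loop itself (assumed PyPI name).
pip install foreblocks

# The analyzer and richer search-result visuals need plotting support;
# the exact dependency set is an assumption -- check the project docs.
pip install matplotlib seaborn
```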
## Public imports
```python
from foreblocks.darts import (
    DARTSTrainer,
    DARTSConfig,
    DARTSTrainConfig,
    FinalTrainConfig,
    MultiFildelitySearchConfig,
    AblationSearchConfig,
    RobustPoolSearchConfig,
)
```
## When to use DARTS here
Use the DARTS subsystem when:
- your operation pool is already fairly well defined, but you do not know which combinations work best
- you want a cheaper staged NAS workflow instead of fully training every candidate
- you want to inspect alpha evolution, promoted candidates, and final retrained metrics
Do not start here if:
- your base training loop is not working yet
- you have not verified tensor shapes with a normal `ForecastingModel` run
- you are still deciding between totally different problem formulations
## Quick start
```python
from foreblocks.darts import DARTSTrainer

trainer = DARTSTrainer(
    input_dim=5,
    hidden_dims=[32, 64, 128],
    forecast_horizon=24,
    seq_length=48,
    device="auto",
)

results = trainer.multi_fidelity_search(
    train_loader=train_loader,
    val_loader=val_loader,
    test_loader=test_loader,
    num_candidates=20,
    search_epochs=20,
    final_epochs=80,
    max_samples=32,
    top_k=5,
)

best_model = results["final_model"]
trainer.save_best_model("best_darts_model.pth")
```
This is the highest-level API. It runs candidate generation, ranking, short DARTS training, discrete derivation, and final retraining in one call.
## Mental model
The public entry point is `DARTSTrainer`, but the implementation is intentionally split into focused modules:
- `foreblocks/darts/search/zero_cost.py`: cheap candidate scoring
- `foreblocks/darts/training/darts_loop.py`: bilevel search training
- `foreblocks/darts/training/final_trainer.py`: fixed-model retraining
- `foreblocks/darts/search/multi_fidelity.py`: staged orchestration
- `foreblocks/darts/evaluation/analyzer.py`: post-search analysis
That means you can use the trainer in two styles:
- as one end-to-end NAS pipeline via `multi_fidelity_search(...)`
- as a lower-level orchestration layer where you call each phase manually
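In the manual style, the phases chain together roughly as follows. This is an illustrative sketch, not the exact API: the result keys (`"model"`, `"val_loss"`), the `derive_final_architecture` call signature, and the use of the zero-cost result as a single scalar ranking score are all assumptions.

```python
def manual_search(trainer, candidates, train_loader, val_loader, top_k=3):
    """Staged search using the per-phase methods instead of multi_fidelity_search."""
    # Phase 1: cheap zero-cost screening of every candidate. The metric result
    # is treated here as one scalar score, which is a simplification.
    scored = [
        (trainer.evaluate_zero_cost_metrics(model=c, dataloader=val_loader), c)
        for c in candidates
    ]
    promoted = [c for _, c in sorted(scored, key=lambda s: s[0], reverse=True)[:top_k]]

    # Phase 2: short bilevel DARTS training on the promoted candidates only.
    searched = [
        trainer.train_darts_model(
            model=c, train_loader=train_loader, val_loader=val_loader, epochs=5
        )
        for c in promoted
    ]

    # Phase 3: discretize the best searched model and retrain it as a fixed model.
    best = min(searched, key=lambda r: r["val_loss"])
    fixed = trainer.derive_final_architecture(best["model"])
    return trainer.train_final_model(
        model=fixed, train_loader=train_loader, val_loader=val_loader, epochs=20
    )
```

The value of the manual style is that you can checkpoint, inspect, or swap out any single phase without rerunning the others.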
## Search phases
The multi-fidelity flow is:
1. generate `num_candidates` random architectures from the configured search space
2. score them with zero-cost metrics
3. keep the top `top_k`
4. run short bilevel DARTS training on the promoted candidates
5. derive a discrete architecture for each searched mixed model
6. select the best candidate by validation behavior
7. retrain the best fixed model and report final metrics
This is closer to a staged NAS pipeline than to a single bare DARTS loop.
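The staging logic can be compressed into a toy sketch. Everything here (the candidate representation, the proxy score, the validation proxy) is an invented stand-in for the real components, kept pure-Python so the promote-then-train funnel is easy to follow; only the returned key names mirror the documented result structure:

```python
import random

def multi_fidelity_sketch(num_candidates, top_k, seed=0):
    rng = random.Random(seed)

    # 1. sample random (toy) architectures from a search space
    candidates = [
        {"ops": rng.randint(2, 6), "hidden": rng.choice([32, 64, 128])}
        for _ in range(num_candidates)
    ]

    # 2-3. cheap proxy score, then promote the top_k
    # (stand-in for the zero-cost screening phase)
    def cheap_score(c):
        return c["ops"] * c["hidden"]

    promoted = sorted(candidates, key=cheap_score, reverse=True)[:top_k]

    # 4-6. short training of the promoted candidates, then pick the best
    # by a validation proxy (here just the inverse of the cheap score)
    best = min(promoted, key=lambda c: 1.0 / cheap_score(c))

    # 7. `best` would now be discretized and retrained at full budget
    return {"candidates": candidates, "top_candidates": promoted, "best_candidate": best}
```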
## Core APIs
| Method | Use it when | Returns |
|---|---|---|
| `evaluate_zero_cost_metrics(...)` | you want a cheap score before expensive training | weighted zero-cost metrics |
| `evaluate_zero_cost_metrics_raw(...)` | you want the underlying raw zero-cost signals | raw metrics without weighting |
| `train_darts_model(...)` | you want to search a single candidate with bilevel optimization | searched model, losses, alphas, metrics |
| `derive_final_architecture(...)` | you want a discrete model from a mixed searched one | fixed model |
| `train_final_model(...)` | you want to retrain a fixed model independently | final metrics and training info |
| `multi_fidelity_search(...)` | you want the end-to-end staged workflow | full search result dictionary |
| `ablation_weight_search(...)` | you want to study zero-cost weighting choices | ablation artifacts and rankings |
| `robust_initial_pool_over_op_pools(...)` | you want to test sensitivity to op-pool composition | robustness report |
### evaluate_zero_cost_metrics(...)
```python
metrics = trainer.evaluate_zero_cost_metrics(
    model=candidate,
    dataloader=val_loader,
    max_samples=32,
    num_batches=1,
    fast_mode=True,
)
```
Useful knobs:
- `max_samples`
- `num_batches`
- `fast_mode`
- `ablation`
- `n_random`
- `random_sigma`
- `seed`
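The difference between the weighted and raw variants can be illustrated with a small aggregation sketch. The metric names and weights below are invented for illustration; the real signals and weighting scheme live in `foreblocks/darts/search/zero_cost.py`:

```python
def weighted_score(raw_metrics, weights):
    """Collapse raw zero-cost signals into one scalar used for ranking."""
    # Only combine metrics that have a weight; extra raw signals are ignored.
    common = raw_metrics.keys() & weights.keys()
    return sum(raw_metrics[k] * weights[k] for k in common)

# Hypothetical raw signals for one candidate (names are illustrative only).
raw = {"grad_norm": 0.8, "param_count": 0.2, "jacobian_score": 0.5}
weights = {"grad_norm": 0.5, "jacobian_score": 0.5}
score = weighted_score(raw, weights)  # 0.5 * 0.8 + 0.5 * 0.5 = 0.65
```

This is why the raw variant matters for debugging: if the weighted ranking looks wrong, the raw signals tell you whether the metrics or the weights are at fault.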
### train_darts_model(...)
```python
search_results = trainer.train_darts_model(
    model=candidate,
    train_loader=train_loader,
    val_loader=val_loader,
    epochs=30,
    arch_learning_rate=3e-3,
    model_learning_rate=1e-3,
    use_bilevel_optimization=True,
)
```
This runs the differentiable search phase for a single candidate and returns a dictionary with the searched model, train/validation losses, alpha traces, and final metrics.
### derive_final_architecture(...)
Use this after the mixed architecture has converged enough that you want a discrete artifact for final training or export.
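Conceptually, derivation collapses each mixed edge to its strongest operation. A minimal sketch of that argmax step, assuming alphas are stored per edge as an op-name-to-weight mapping (the real storage format inside the library may differ):

```python
def discretize(alphas):
    """Keep only the highest-weight operation on each mixed edge."""
    return {edge: max(ops, key=ops.get) for edge, ops in alphas.items()}

# Toy alpha weights after search; op names come from the default op pool.
alphas = {
    "edge_0": {"Identity": 0.1, "TCN": 0.7, "Fourier": 0.2},
    "edge_1": {"DLinear": 0.55, "GRN": 0.45},
}
fixed = discretize(alphas)  # {"edge_0": "TCN", "edge_1": "DLinear"}
```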
### train_final_model(...)
```python
final_results = trainer.train_final_model(
    model=fixed_model,
    train_loader=train_loader,
    val_loader=val_loader,
    test_loader=test_loader,
    epochs=100,
)
```
### multi_fidelity_search(...)
```python
results = trainer.multi_fidelity_search(
    train_loader=train_loader,
    val_loader=val_loader,
    test_loader=test_loader,
    num_candidates=10,
    search_epochs=10,
    final_epochs=100,
    max_samples=32,
    top_k=5,
)
```
## Result structure
The end-to-end search dictionary includes:
- `final_model`
- `candidates`
- `top_candidates`
- `trained_candidates`
- `best_candidate`
- `final_results`
- `search_config`
- `stats` when `collect_stats=True`
In practice, the most useful objects to inspect are:
- `results["best_candidate"]`
- `results["final_results"]["final_metrics"]`
- `results["trained_candidates"]`
- `results["candidates"]` if you are debugging promotion behavior
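For logging, those pieces can be pulled out with a small helper. It assumes only the keys listed above (plus that `final_results` is a dict containing `final_metrics`) and tolerates `stats` being absent when `collect_stats` was off:

```python
def summarize_search(results):
    """Extract the headline objects from a multi_fidelity_search result dict."""
    summary = {
        "best_candidate": results.get("best_candidate"),
        "final_metrics": results.get("final_results", {}).get("final_metrics"),
        "num_candidates": len(results.get("candidates", [])),
        "num_promoted": len(results.get("top_candidates", [])),
    }
    if "stats" in results:  # only present when collect_stats=True
        summary["stats"] = results["stats"]
    return summary
```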
## Configuration surface
The config dataclasses in `foreblocks.darts.config` map cleanly onto the staged workflow:
| Config | What it controls |
|---|---|
| `DARTSSearchSpaceConfig` | op pool, architecture modes, hidden dims, cells, nodes, family grouping |
| `DARTSTrainConfig` | bilevel search phase, regularization, pruning, temperature schedule |
| `FinalTrainConfig` | fixed-model retraining budget and optimizer behavior |
| `MultiFildelitySearchConfig` | candidate count, promotion size, epoch budgets, worker/stats settings |
| `AblationSearchConfig` | zero-cost weighting ablations |
| `RobustPoolSearchConfig` | sensitivity to alternative op-pool definitions |
## Default search-space ingredients
The default architecture modes are:
- `encoder_decoder`
- `encoder_only`
- `decoder_only`
- `mamba`
The default operation pool includes:
- `Identity`
- `TimeConv`
- `GRN`
- `Wavelet`
- `Fourier`
- `TCN`
- `ResidualMLP`
- `ConvMixer`
- `MultiScaleConv`
- `PyramidConv`
- `PatchEmbed`
- `InvertedAttention`
- `DLinear`
- `TimeMixer`
- `NBeats`
- `TimesNet`
Important note: `mamba` is represented as an architecture mode rather than as a normal operation in the default op list.
## Recommended tuning order
When the search behaves poorly, change things in this order:
1. shrink or clarify the search space
2. verify zero-cost metrics are ranking plausible candidates
3. reduce `top_k` and search budgets only after phase 2 looks sensible
4. tune bilevel parameters like `arch_learning_rate`, `model_learning_rate`, and `architecture_update_freq`
5. only then add more advanced regularization or pruning rules
Useful early knobs:
- `hidden_dims`
- `cell_range`
- `node_range`
- `min_ops`
- `max_ops`
- `num_candidates`
- `top_k`
- `search_epochs`
## Practical tips
- Run a tiny end-to-end search first, even if the result quality is poor. That verifies your search loop, result structure, and save/load path.
- Keep `num_candidates`, `top_k`, and `search_epochs` small while you are validating the search space.
- Use `evaluate_zero_cost_metrics(...)` directly before trusting a longer search run.
- Treat `collect_stats=True` as a benchmarking tool, not a default setting for every run.
- Save the best discrete model with `trainer.save_best_model(...)` once you have a search worth keeping.