Time Series Generator¶
foretools/tsgen/ts_gen.py provides a pedagogical synthetic time-series generator with explicit structural components. It is useful when you need data that is fully controlled, explainable, and easy to visualize.
Import path¶

    from foretools.tsgen.ts_gen import TimeSeriesGenerator

Core workflow¶

    gen = TimeSeriesGenerator(random_state=42)
    df, meta = gen.make(
        n_series=3,
        n_steps=365,
        freq="D",
        trend={
            "type": "piecewise",
            "knots": [120, 250],
            "slopes": [0.08, -0.03, 0.05],
            "intercept": 18.0,
        },
        seasonality=[
            {"period": 7.0, "amplitude": 4.2, "harmonics": 2},
            {"period": 30.5, "amplitude": 1.8, "harmonics": 1},
        ],
        cycle={"period": 180.0, "amplitude": 2.4, "freq_drift": 0.08, "phase": 0.2},
        noise={"ar": [0.55], "ma": [0.2], "sigma": 0.9},
        regime={
            "n_states": 2,
            "p": [[0.96, 0.04], [0.08, 0.92]],
            "state_bias": [0.0, 3.2],
            "state_sigma_scale": [1.0, 1.35],
        },
        exog={
            "n_features": 2,
            "types": ["random_walk", "seasonal"],
            "beta": [0.22, -1.1],
        },
        add_calendar=True,
        return_components=True,
    )
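To make the regime block in the call above concrete, the following sketch samples a latent state path from a row-stochastic transition matrix and maps it to a per-step bias and noise scale. It uses only the standard library. The helper name sample_regime_path, its seed and start arguments, and the choice to begin in state 0 are assumptions for illustration, not the generator's actual implementation.

```python
import random

def sample_regime_path(n_steps, p, seed=0, start=0):
    """Sample a first-order Markov chain; p[i][j] is P(next=j | current=i)."""
    for row in p:
        if abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("each row of the transition matrix must sum to 1")
    rng = random.Random(seed)
    states, state = [], start
    for _ in range(n_steps):
        states.append(state)
        # Draw the next state using the current state's row as weights.
        state = rng.choices(range(len(p)), weights=p[state])[0]
    return states

# Values copied from the core workflow example.
p = [[0.96, 0.04], [0.08, 0.92]]
state_bias = [0.0, 3.2]
state_sigma_scale = [1.0, 1.35]

states = sample_regime_path(365, p, seed=42)
bias_path = [state_bias[s] for s in states]          # additive shift per step
scale_path = [state_sigma_scale[s] for s in states]  # noise multiplier per step
```

With p[0][0] = 0.96, state 0 lasts 1 / (1 - 0.96) = 25 steps on average, so regimes appear as long plateaus rather than rapid flicker.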
What make() returns¶
df¶
A tidy dataframe with one row per timestamp per series.
Expected columns:
- series
- time
- y
- optional exogenous columns such as x1, x2, ...
- optional calendar columns such as month, dayofweek, fourier_wk_sin, fourier_yr_cos
meta¶
A metadata dictionary that contains the ground truth behind the generated series.
Most useful keys:
- meta["time_index"]: shared pandas time index
- meta["params"]: the configuration used to generate the data
- meta["components"]: arrays for trend, seasonality, cycle, noise, regime bias, and exogenous effect
- meta["states"]: latent regime state path when regime switching is enabled
- meta["missing_mask"]: missing-value mask if missingness was injected
- meta["splits"]: chronological split boundaries if splits= was requested
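As a sketch of how this ground truth might be used, the snippet below rebuilds a noiseless signal by summing component arrays and scores a trend estimate against the true trend. The stand-in meta dictionary, the key names inside components, the flat list layout, and the assumption that components combine additively are all hypothetical. Inspect the real meta returned by make() before relying on any of them.

```python
# Stand-in for the dictionary returned by make(). The keys inside "components"
# and the one-list-per-component layout are assumptions for illustration.
meta = {
    "components": {
        "trend": [1.0, 1.5, 2.0, 2.5],
        "seasonality": [0.5, -0.5, 0.5, -0.5],
        "noise": [0.25, -0.25, 0.0, 0.5],
    },
}

def sum_components(components, skip=()):
    """Add component arrays element-wise, optionally skipping some of them."""
    kept = [vals for name, vals in components.items() if name not in skip]
    return [sum(step) for step in zip(*kept)]

def mean_abs_error(estimate, truth):
    """Average absolute gap between an estimate and the ground truth."""
    return sum(abs(e - t) for e, t in zip(estimate, truth)) / len(truth)

# Noiseless signal: everything except the noise component.
signal = sum_components(meta["components"], skip=("noise",))

# A made-up trend estimate, standing in for the output of a decomposition model.
trend_estimate = [1.1, 1.4, 2.2, 2.4]
trend_error = mean_abs_error(trend_estimate, meta["components"]["trend"])
```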
Supported component families¶
| Component | Main options | Notes |
|---|---|---|
| Trend | linear, poly, piecewise | piecewise uses knots and per-segment slopes |
| Seasonality | list of seasonal specs | each seasonal spec supports period, amplitude, phase, harmonics |
| Cycle | period, amplitude, freq_drift, phase | low-frequency oscillation separate from the main seasonal terms |
| Noise | ar, ma, sigma | ARMA-style base noise |
| Regime | n_states, transition matrix p, state_bias, state_sigma_scale | adds discrete state-dependent bias and volatility |
| Heteroskedasticity | {"type": "arch1", "alpha0": ..., "alpha1": ...} | replaces the base noise with ARCH-like noise |
| Outliers | prob, scale | injects heavy-tailed spikes |
| Missingness | prob, block_prob, block_max | applies missing values to y only |
| Exogenous inputs | n_features, types, beta | supported types are random_walk, seasonal, binary_event, noise |
| Multivariate mixing | n_factors, mix_strength | induces cross-series correlation after component synthesis |
| Calendar features | add_calendar=True | adds date parts and Fourier encodings |
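The trend and seasonality rows of the table can be read as the following standard-library sketch. It shows one plausible interpretation of knots, slopes, and harmonics, not the generator's code. The function names, the convention that a new slope takes effect at the knot index, and the 1/k damping of higher harmonics are assumptions.

```python
import math

def piecewise_trend(n_steps, knots, slopes, intercept=0.0):
    """Continuous piecewise-linear trend; slopes[i] applies up to knots[i]."""
    if len(slopes) != len(knots) + 1:
        raise ValueError("need exactly one more slope than knots")
    values, level, segment = [], intercept, 0
    for t in range(n_steps):
        values.append(level)
        # Switch to the next slope once step t has reached a knot.
        while segment < len(knots) and t >= knots[segment]:
            segment += 1
        level += slopes[segment]
    return values

def harmonic_seasonality(n_steps, period, amplitude, harmonics=1, phase=0.0):
    """Sum of sine harmonics; the k-th harmonic is damped by 1/k (assumed)."""
    return [
        amplitude * sum(
            math.sin(2 * math.pi * k * t / period + phase) / k
            for k in range(1, harmonics + 1)
        )
        for t in range(n_steps)
    ]

# Parameters taken from the core workflow example.
trend = piecewise_trend(365, knots=[120, 250], slopes=[0.08, -0.03, 0.05], intercept=18.0)
season = harmonic_seasonality(28, period=7.0, amplitude=4.2, harmonics=2)
```

Under these assumptions the trend rises from 18.0 to about 27.6 at step 120, falls to about 23.7 at step 250, and then climbs again.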
Common examples¶
1. Fully observed structural series¶
Use this when you want clean series for plotting, teaching, or debugging a model's ability to recover trend and seasonality.

    df, meta = gen.make(
        n_series=2,
        n_steps=400,
        trend={"type": "linear", "slope": 0.03, "intercept": 5.0},
        seasonality=[{"period": 7.0, "amplitude": 3.0, "harmonics": 2}],
        cycle={"period": 160.0, "amplitude": 1.2},
        noise={"ar": [0.5], "ma": [0.1], "sigma": 0.4},
        return_components=True,
    )
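For intuition about the noise argument in the example above, here is a minimal ARMA recursion using only the standard library. It is a textbook formulation with zero initial conditions. The function name arma_noise and its seed argument are assumptions, and the generator may initialize or scale its noise differently.

```python
import random

def arma_noise(n_steps, ar, ma, sigma, seed=0):
    """x[t] = e[t] + sum(ar[i] * x[t-1-i]) + sum(ma[j] * e[t-1-j])."""
    rng = random.Random(seed)
    shocks = [rng.gauss(0.0, sigma) for _ in range(n_steps)]
    out = []
    for t in range(n_steps):
        value = shocks[t]
        # Autoregressive part: weighted past outputs.
        for i, phi in enumerate(ar, start=1):
            if t - i >= 0:
                value += phi * out[t - i]
        # Moving-average part: weighted past shocks.
        for j, theta in enumerate(ma, start=1):
            if t - j >= 0:
                value += theta * shocks[t - j]
        out.append(value)
    return out

# Same coefficients as the example: AR(1) = 0.5, MA(1) = 0.1, sigma = 0.4.
noise = arma_noise(400, ar=[0.5], ma=[0.1], sigma=0.4)
```

A positive AR coefficient makes consecutive noise values correlated, which is why such series look smoother than white noise with the same sigma.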
2. Stress-test with missingness, regimes, and exogenous drivers¶
Use this when you want a harder forecasting or preprocessing benchmark.

    df, meta = gen.make(
        n_series=4,
        n_steps=730,
        trend={"type": "piecewise", "knots": [180, 500], "slopes": [0.05, -0.02, 0.01]},
        seasonality=[
            {"period": 7.0, "amplitude": 2.0, "harmonics": 2},
            {"period": 30.5, "amplitude": 1.0, "harmonics": 1},
        ],
        regime={
            "n_states": 3,
            "p": [[0.95, 0.04, 0.01], [0.05, 0.90, 0.05], [0.02, 0.08, 0.90]],
            "state_bias": [0.0, 1.5, -1.0],
            "state_sigma_scale": [1.0, 1.4, 1.8],
        },
        exog={
            "n_features": 3,
            "types": ["random_walk", "binary_event", "seasonal"],
            "beta": [0.1, 2.0, -0.7],
        },
        missing={"prob": 0.02, "block_prob": 0.01, "block_max": 7},
        outliers={"prob": 0.01, "scale": 5.0},
        return_components=True,
    )
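The missing argument above combines isolated gaps with contiguous blocks. One way such a mask could be built is sketched below with the standard library. The reading of prob as a per-step chance of a single gap, block_prob as a per-step chance of starting a block, and block_max as the longest block length is an assumption, as is the helper name missing_mask.

```python
import random

def missing_mask(n_steps, prob, block_prob, block_max, seed=0):
    """Return a list of booleans; True marks a step whose y value is dropped."""
    rng = random.Random(seed)
    mask = [False] * n_steps
    for t in range(n_steps):
        # Isolated missing point.
        if rng.random() < prob:
            mask[t] = True
        # Start of a contiguous missing block of random length.
        if rng.random() < block_prob:
            length = rng.randint(1, block_max)
            for u in range(t, min(n_steps, t + length)):
                mask[u] = True
    return mask

y = [float(t) for t in range(730)]  # placeholder signal
mask = missing_mask(730, prob=0.02, block_prob=0.01, block_max=7, seed=1)

# Only y is masked; timestamps and any exogenous columns stay intact.
y_observed = [float("nan") if m else v for m, v in zip(mask, y)]
```

Keeping the mask separate from the data mirrors the meta["missing_mask"] entry, so imputation methods can be scored against the values that were hidden.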
3. Create a train/val/test split quickly¶
make_train_ready() pivots the generated data into wide format and reserves the last two horizons for validation and test.

    dataset = gen.make_train_ready(
        n_series=3,
        n_steps=240,
        horizon=24,
        trend={"type": "linear", "slope": 0.02},
        seasonality=[{"period": 24.0, "amplitude": 1.5}],
        noise={"ar": [0.3], "sigma": 0.5},
    )
    train_df = dataset["train"]
    val_df = dataset["val"]
    test_df = dataset["test"]
    meta = dataset["meta"]
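The split arithmetic implied by "reserves the last two horizons" can be sketched as follows. This assumes the final horizon is the test window and the one before it is the validation window. The helper chronological_split is hypothetical and is not part of TimeSeriesGenerator.

```python
def chronological_split(n_steps, horizon):
    """Return index ranges for train, val, and test in time order."""
    if n_steps <= 2 * horizon:
        raise ValueError("n_steps must exceed two horizons to leave training data")
    val_start = n_steps - 2 * horizon
    test_start = n_steps - horizon
    return {
        "train": range(0, val_start),
        "val": range(val_start, test_start),
        "test": range(test_start, n_steps),
    }

# Same sizes as the example: 240 steps with a 24-step horizon.
splits = chronological_split(n_steps=240, horizon=24)
sizes = {name: len(idx) for name, idx in splits.items()}
```

For 240 steps and a horizon of 24 this leaves 192 training steps, followed by 24 validation steps and 24 test steps, with no shuffling across time.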
Plotting helpers¶
TimeSeriesGenerator includes simple plotting helpers for fast inspection:
- plot_series() shows the observed y path for a single series.
- plot_decompose() renders separate trend, seasonality, cycle, and noise plots.
If you need a richer figure that overlays all components at once, use the companion notebook foretools/tsgen/ts_gen_complete_series.ipynb.
Practical behavior to know¶
- Missingness is applied after the full signal is constructed, and only to the y column.
- When heterosked={"type": "arch1", ...} is enabled, the generator replaces the previously built base noise with ARCH-like noise for clarity.
- Calendar features are shared across all series because they come from the common timestamp index.
- Multivariate factor mixing happens after the univariate structural components are combined, so it affects the final series rather than individual components.
- return_components=False keeps the dataframe output but omits the large component arrays from meta.
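The ARCH-like noise mentioned in the list above can be illustrated with the textbook ARCH(1) recursion, in which each step's variance depends on the previous squared shock. This is a standard-library sketch under that textbook definition. The function name arch1_noise, the zero starting shock, and the example values for alpha0 and alpha1 are assumptions, and the generator's own variant may differ.

```python
import math
import random

def arch1_noise(n_steps, alpha0, alpha1, seed=0):
    """eps[t] = sqrt(alpha0 + alpha1 * eps[t-1]**2) * z[t], z[t] ~ N(0, 1)."""
    if alpha0 <= 0 or not 0 <= alpha1 < 1:
        raise ValueError("require alpha0 > 0 and 0 <= alpha1 < 1")
    rng = random.Random(seed)
    eps, prev = [], 0.0
    for _ in range(n_steps):
        # Conditional variance grows after a large previous shock.
        variance = alpha0 + alpha1 * prev ** 2
        prev = math.sqrt(variance) * rng.gauss(0.0, 1.0)
        eps.append(prev)
    return eps

# Illustrative values only; they do not come from the generator's defaults.
eps = arch1_noise(500, alpha0=0.2, alpha1=0.5)
```

With alpha1 below 1 the long-run variance is alpha0 / (1 - alpha1), which is 0.4 for the values above, while individual stretches can be much calmer or much noisier.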
Recommended workflow¶
- Start with return_components=True so you can inspect the exact decomposition.
- Keep missingness and outliers off for first-pass model debugging.
- Add regimes, exogenous drivers, and missingness only after the baseline workflow is working.
- Use the notebook examples when you want publication-style plots or teaching material.