Ship a demand forecast and capacity-alert artifact with rolling backtests, alert review, and retraining policy.
The ranking capstone influenced which products users could purchase. Warehouse teams now need an operational input: forecast daily parcel volume by fulfillment center so staffing and packing capacity can be planned before demand arrives.
This capstone ships a forecast and alert artifact. It doesn't automatically hire labor, move inventory, or page an operator on every miss. It creates a versioned expectation, detects unusually large residuals, and records evidence for a planner's decision.
Predict daily shipped parcels for each fulfillment center and service tier over the next seven days. Use an explicit planning contract:
| Field | Contract |
|---|---|
| entity | fulfillment center and shipping service tier |
| target | parcels shipped per calendar day |
| horizon | next seven days |
| decision | planner reviews capacity when forecast or alert requires it |
| baseline | same weekday from prior week |
| evaluation | MAE plus underforecast cost by high-volume slice |
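The evaluation row pairs MAE with an underforecast cost on high-volume days. A minimal sketch of that metric follows, assuming an asymmetric weighting in which each underforecast parcel costs three times an overforecast parcel. The 3:1 ratio, the function names, and the sample numbers are illustrative assumptions, not values from the planning contract.

```python
def underforecast_cost(actuals, forecasts, under_weight=3.0, over_weight=1.0):
    """Mean asymmetric cost in parcel units. The 3:1 weighting is an assumed example."""
    if not actuals:
        return 0.0
    total = 0.0
    for actual, forecast in zip(actuals, forecasts):
        error = actual - forecast
        # A positive error means demand exceeded the forecast (an underforecast).
        total += under_weight * error if error > 0 else over_weight * -error
    return total / len(actuals)

def high_volume_slice(actuals, forecasts, threshold):
    """Keep only the days whose actual volume reached the threshold."""
    kept = [(a, f) for a, f in zip(actuals, forecasts) if a >= threshold]
    return [a for a, _ in kept], [f for _, f in kept]

# Illustrative values only.
actuals = [100, 150, 90]
forecasts = [110, 130, 90]
overall = underforecast_cost(actuals, forecasts)
peak_actuals, peak_forecasts = high_volume_slice(actuals, forecasts, threshold=140)
peak = underforecast_cost(peak_actuals, peak_forecasts)
print("overall cost:", round(overall, 2))
print("high-volume cost:", peak)
```

Reporting the sliced cost next to MAE keeps a single expensive miss from being averaged away by many easy days.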
Demand can change around promotions, holidays, seller campaigns, inventory shortages, and data outages. Those known drivers should appear as features only if they are scheduled and available before the forecast cutoff.
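One way to enforce the "scheduled and available before the cutoff" rule is to filter planned events by the date they became known. The sketch below assumes each event record carries an `announced_on` field; that field name and the sample events are hypothetical and may not match the real `planned_events.json` schema.

```python
from datetime import date

# Hypothetical event records; the field names are assumptions for illustration.
planned_events = [
    {"name": "spring_sale", "event_date": date(2024, 3, 15), "announced_on": date(2024, 3, 1)},
    {"name": "flash_deal", "event_date": date(2024, 3, 14), "announced_on": date(2024, 3, 13)},
]

def events_known_at(events, cutoff):
    """Keep only events that were already scheduled on or before the forecast cutoff."""
    return [event for event in events if event["announced_on"] <= cutoff]

cutoff = date(2024, 3, 10)
usable = events_known_at(planned_events, cutoff)
print([event["name"] for event in usable])
```

The flash deal falls inside the forecast horizon but was announced after the cutoff, so it cannot be a feature for that forecast without leaking future information.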
Hyndman and Athanasopoulos explain why forecast evaluation must use later observations and rolling forecasting origins rather than random splits.[1] For this project, each backtest run records its training cutoff, horizon, model version, and the actual values that arrived afterward.
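A rolling-origin schedule can be sketched as a generator of training cutoffs and the forecast window that follows each one. The weekly step and the function name are assumptions; the real `rolling_backtest.py` may choose its origins differently.

```python
from datetime import date, timedelta

def rolling_origins(first_cutoff, last_cutoff, step_days=7, horizon_days=7):
    """Yield (training_cutoff, forecast_start, forecast_end) for each backtest origin."""
    cutoff = first_cutoff
    while cutoff <= last_cutoff:
        # The forecast window starts the day after the cutoff, so no target
        # day can overlap the training data.
        forecast_start = cutoff + timedelta(days=1)
        forecast_end = cutoff + timedelta(days=horizon_days)
        yield cutoff, forecast_start, forecast_end
        cutoff += timedelta(days=step_days)

windows = list(rolling_origins(date(2024, 1, 7), date(2024, 1, 28)))
for training_cutoff, start, end in windows:
    print("train through", training_cutoff, "then forecast", start, "to", end)
```

Each tuple gives the backtest record its cutoff and horizon; the model version and the actuals that arrived afterward would be attached when the run is scored.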
Your repository surface should look like:
```text
demand-forecast/
  data/
    warehouse_daily_counts.parquet
    planned_events.json
    split_manifest.json
  forecasting/
    seasonal_baseline.py
    train_candidate.py
    rolling_backtest.py
  alerts/
    residual_policy.json
    evaluate_alerts.py
  reports/
    backtest_metrics.json
    alert_review.csv
  tests/
    test_future_rows_excluded.py
    test_alert_contract.py
```

The candidate can be a tree model over lag features, rolling means, service tier, weekday, and known promotions. It must beat the seasonal baseline on later windows, especially where underforecasting is expensive. A candidate that marginally improves MAE but misses peak-volume days should remain blocked.
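The lag and rolling-mean features can be built so that they only read values observed at or before the training cutoff. The sketch below is one possible layout, assuming `history` is a list of daily counts ordered oldest to newest and ending on the cutoff day; the feature names and the 0 to 6 weekday encoding are illustrative assumptions.

```python
from statistics import mean

def features_for_horizon(history, horizon, cutoff_weekday):
    """Features for the day `horizon` days after the cutoff, using only observed history."""
    if not 1 <= horizon <= 7:
        raise ValueError("horizon must be between 1 and 7 days")
    if len(history) < 14:
        raise ValueError("need at least 14 days of history")
    return {
        # Same weekday as the target, one and two weeks before the target day.
        "lag_same_weekday_1": history[horizon - 8],
        "lag_same_weekday_2": history[horizon - 15],
        # Mean of the last seven observed days, all at or before the cutoff.
        "rolling_mean_7": mean(history[-7:]),
        "target_weekday": (cutoff_weekday + horizon) % 7,
    }

history = list(range(1, 15))  # placeholder counts 1..14, oldest first
print(features_for_horizon(history, horizon=7, cutoff_weekday=6))
print(features_for_horizon(history, horizon=1, cutoff_weekday=6))
```

Negative indexing from the end of `history` makes it impossible to reach past the cutoff, which is the property `test_future_rows_excluded.py` is meant to protect.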
This small fixture compares a same-weekday baseline against one candidate forecast and records alerts when actual volume exceeds the candidate by at least 20 parcels.
```python
# Seven days of observed parcel counts and two competing forecasts.
actual = [104, 110, 119, 116, 160, 84, 78]
baseline = [100, 112, 115, 118, 132, 82, 76]
candidate = [103, 111, 117, 117, 140, 83, 77]

def mae(values, predictions):
    """Mean absolute error between observed values and predictions."""
    return sum(abs(value - prediction) for value, prediction in zip(values, predictions)) / len(values)

def capacity_alerts(values, predictions, limit=20):
    """Return one alert per day whose underforecast (observed minus expected) reaches the limit."""
    return [
        {"day": index + 1, "observed": value, "expected": prediction, "residual": value - prediction}
        for index, (value, prediction) in enumerate(zip(values, predictions))
        if value - prediction >= limit
    ]

baseline_mae = mae(actual, baseline)
candidate_mae = mae(actual, candidate)
alerts = capacity_alerts(actual, candidate)
# This fixture gates on MAE alone; alerts are reported alongside the decision.
decision = "eligible_for_planner_review" if candidate_mae < baseline_mae else "hold"

print("baseline MAE:", round(baseline_mae, 1))
print("candidate MAE:", round(candidate_mae, 1))
print("alerts:", alerts)
print("decision:", decision)
```

```text
baseline MAE: 6.3
candidate MAE: 3.9
alerts: [{'day': 5, 'observed': 160, 'expected': 140, 'residual': 20}]
decision: eligible_for_planner_review
```

Even the improved candidate underforecasts the peak Friday (day 5) by 20 parcels. That alert isn't a failure to hide: it is a deliverable. A planner can examine whether the spike came from a known campaign, decide whether to add capacity, and attach the resolution to the alert row.
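Attaching a resolution to an alert row can be sketched with a small record type. The resolution categories follow the useful, expected, or data issue classification used in the alert resolution review; the class name, field names, and the sample note are hypothetical and may differ from the columns of `alert_review.csv`.

```python
from dataclasses import dataclass, asdict
from typing import Optional

# Categories from the alert resolution review; the string spellings are assumed.
RESOLUTIONS = {"useful", "expected", "data_issue"}

@dataclass
class AlertRow:
    day: int
    observed: int
    expected: int
    residual: int
    resolution: Optional[str] = None  # stays None until a planner reviews the alert
    note: str = ""

def resolve(alert, resolution, note):
    """Record the planner's classification, rejecting unknown categories."""
    if resolution not in RESOLUTIONS:
        raise ValueError(f"unknown resolution: {resolution!r}")
    alert.resolution = resolution
    alert.note = note
    return alert

row = resolve(
    AlertRow(day=5, observed=160, expected=140, residual=20),
    "expected",
    "hypothetical note: spike matched a campaign on the planning calendar",
)
print(asdict(row))
```

Rejecting free-text categories keeps later review-rate metrics computable without manual cleanup.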
The report needs two sections:
| Report | Metrics | Release question |
|---|---|---|
| forecast quality | MAE, weighted underforecast cost, error by center/tier/horizon | does candidate help planning? |
| alert policy | useful review rate, missed high-volume events, alert volume | does the review queue help operations? |
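The alert policy metrics can be computed from reviewed alert resolutions. This is a rough sketch: the sample resolutions, the missed-event count, and the 28-day review window are invented for illustration, and the definition of a missed high-volume event is assumed to be settled elsewhere.

```python
from collections import Counter

# Hypothetical planner classifications for one review window.
reviewed = ["useful", "expected", "useful", "data_issue", "useful"]

def alert_policy_report(resolutions, missed_high_volume_events, days_covered):
    """Summarize alert volume, useful review rate, and missed events for one window."""
    counts = Counter(resolutions)
    total = len(resolutions)
    return {
        "alert_volume_per_day": total / days_covered,
        "useful_review_rate": counts["useful"] / total if total else 0.0,
        "missed_high_volume_events": missed_high_volume_events,
    }

report = alert_policy_report(reviewed, missed_high_volume_events=1, days_covered=28)
print(report)
```

A high useful review rate with many missed events suggests the threshold is too strict, while a low rate with high volume suggests it is too loose.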
Avoid claiming a prediction interval is reliable until it has been measured on held-out windows. If a 90 percent interval misses too many future days, the product should report that coverage failure and adjust the candidate or uncertainty method before relying on it.
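Interval coverage can be measured as the share of held-out days whose actual value fell inside the stated bounds. The sketch below uses invented bounds around the seven-day fixture and an assumed tolerance of two percentage points. Seven days is far too few to judge a 90 percent interval, so a real check would pool many backtest windows.

```python
def empirical_coverage(actuals, lower, upper):
    """Fraction of days whose actual value lies within [lower, upper]."""
    inside = sum(1 for a, lo, hi in zip(actuals, lower, upper) if lo <= a <= hi)
    return inside / len(actuals)

def coverage_status(coverage, nominal=0.90, tolerance=0.02):
    """Flag a shortfall when measured coverage is below nominal minus an assumed tolerance."""
    return "coverage_shortfall" if coverage < nominal - tolerance else "within_tolerance"

# Illustrative bounds only; they are not produced by any fitted model.
actuals = [104, 110, 119, 116, 160, 84, 78]
lower = [95, 100, 105, 105, 125, 75, 70]
upper = [112, 122, 129, 129, 155, 92, 85]

coverage = empirical_coverage(actuals, lower, upper)
print("coverage:", round(coverage, 3), coverage_status(coverage))
```

Here the peak day falls above its upper bound, so six of seven days are covered and the status reports a shortfall.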
Known promotions offer a useful slice. If the candidate improves routine days but misses every promotion peak, a global MAE improvement doesn't support deployment to promotion planning. Store slices and block claims the evidence doesn't support.
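The promotion slice can be checked by computing MAE separately for promotion and routine days. The rows below are invented to show a candidate that wins globally but loses on the promotion day; the tuple layout is an assumption for this sketch.

```python
def mae(values, predictions):
    return sum(abs(v - p) for v, p in zip(values, predictions)) / len(values)

# (actual, baseline, candidate, is_promotion_day); illustrative values only.
rows = [
    (100, 106, 101, False),
    (98, 104, 99, False),
    (102, 96, 101, False),
    (180, 170, 160, True),
]

def slice_mae(rows, forecast_column, promotion):
    """MAE of one forecast column over promotion days or routine days."""
    selected = [row for row in rows if row[3] == promotion]
    return mae([row[0] for row in selected], [row[forecast_column] for row in selected])

supported_claims = {
    "routine": slice_mae(rows, 2, False) < slice_mae(rows, 1, False),
    "promotion": slice_mae(rows, 2, True) < slice_mae(rows, 1, True),
}
print(supported_claims)
```

Storing the per-slice result makes it explicit that this candidate supports a routine-day claim but not a promotion-planning claim.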
New outcomes arrive daily, but model replacement should happen on a scheduled or triggered review cycle. Record the required decision for each operational item:
| Operational item | Required decision |
|---|---|
| daily observation join | attach actual count to stored forecast |
| weekly accuracy report | compare baseline and current candidate |
| alert resolution review | classify useful, expected, or data issue |
| retraining trigger | sustained cost regression or approved calendar cadence |
| model promotion gate | rolling backtest and planner review |
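A "sustained cost regression" trigger can be sketched as a check that the current candidate's weekly cost exceeded the baseline's for several consecutive weeks. The three-week requirement and the sample costs are assumptions; the actual policy would set its own window and cost metric.

```python
def sustained_regression(candidate_costs, baseline_costs, weeks_required=3):
    """True when the candidate cost exceeded the baseline cost in each of the last N weeks."""
    recent = list(zip(candidate_costs, baseline_costs))[-weeks_required:]
    # Require a full window so that a short history cannot trigger retraining.
    return len(recent) == weeks_required and all(c > b for c, b in recent)

# Illustrative weekly costs, oldest first.
baseline_costs = [6.0, 6.1, 6.2, 6.3, 6.4]
drifting_candidate = [4.0, 4.2, 6.5, 6.8, 7.1]
one_bad_week = [4.0, 4.1, 4.0, 4.2, 7.0]

print(sustained_regression(drifting_candidate, baseline_costs))
print(sustained_regression(one_bad_week, baseline_costs))
```

Requiring consecutive weeks keeps a single unusual week from forcing a retrain, which matches the idea that new outcomes arrive daily but replacement happens on a review cycle.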
This capstone provides forecasting artifacts for the next project, which will automate validation and promotion boundaries for several predictive models.
| Artifact | Strong submission demonstrates |
|---|---|
| forecast package | time-aware training windows, baseline, uncertainty or error policy, and backtest report |
| alert workflow | residual-based alerts with reason codes and planner resolution logging |
| operations | retraining cadence, monitoring, model promotion gates, and rollback plan |
| Symptom | Cause | Fix |
|---|---|---|
| Backtest looks accurate but live forecasts miss peaks | future or promotion leakage | freeze cutoff and known-in-advance fields |
| Planner receives noisy alerts | threshold lacks reviewed outcomes | evaluate alert usefulness separately |
| Forecast changes without explanation | artifact and cutoff missing | log versioned forecast bundle |
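The "log versioned forecast bundle" fix can be sketched as a record that stores the model version, cutoff, and forecasts together with a content hash. The field names, the version string, and the 12-character identifier are assumptions for illustration, not a prescribed bundle format.

```python
import hashlib
import json

def forecast_bundle(model_version, training_cutoff, horizon_days, forecasts):
    """Build a bundle whose id changes whenever any logged input changes."""
    payload = {
        "model_version": model_version,
        "training_cutoff": training_cutoff,  # ISO date string
        "horizon_days": horizon_days,
        "forecasts": forecasts,
    }
    # Sorting keys gives a stable serialization, so equal content hashes equally.
    canonical = json.dumps(payload, sort_keys=True)
    payload["bundle_id"] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]
    return payload

bundle = forecast_bundle("candidate-v3", "2024-03-10", 7, [103, 111, 117, 117, 140, 83, 77])
print(bundle["bundle_id"], bundle["training_cutoff"])
```

When a forecast changes, comparing two bundles shows whether the cutoff, the model version, or the forecast values moved.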