Capstone: Production ML Pipeline

Assemble predictive ML artifacts into validated training, registry promotion, canary monitoring, and rollback.

Four products now need the same operational discipline: a late-delivery warning model, a product ranker, a warehouse demand forecast, and a damaged-package photo classifier. Each uses different metrics, but each relies on immutable data evidence, validated candidates, controlled promotion, monitoring, and rollback.

This capstone assembles that discipline into one ML platform workflow. It isn't tied to a particular orchestrator or cloud vendor. A reviewer must be able to trace any live decision back to data, feature, model, policy, and promotion evidence. That requires stored receipts, not a mutable Boolean that happens to say passed.

Figure: Production alias timeline for delivery-risk-v1. A failed offline receipt holds on a critical-slice failure, while accepted offline-receipt-2 opens only the canary alias beside production v0. The first-hour window has 500 requests, error rate 0.002, p95 latency 118 milliseconds, and no delayed labels, so it holds. The day-seven window has 4200 requests, error rate 0.003, p95 latency 124 milliseconds, delayed labels ready, and a late-warning cost delta of -0.08, so canary-receipt-3 becomes ready. Promotion rechecks production v0 before moving production to v1. Production-receipt-1 records a day-eight cost delta of +0.14, so an audited rollback restores the complete v0 release. In short, production stays on v0 until stored offline and canary receipts authorize promotion, and a stored production-monitor receipt later records the cost regression that triggers the rollback.

Define the Shared Release Tuple

Models differ, but their release manifest can share a schema:

Field	ETA example	Ranking example	Forecast example	Vision example
data snapshot	carrier events through cutoff	catalog and judged queries	daily counts through cutoff	return photos grouped by shipment
feature or preprocessing version	`eta-features-v1`	`ranking-features-v1`	`demand-lags-v1`	`parcel-rgb-224-center-crop-v1`
model artifact	`delay-model-v1`	`market-ranker-v1`	`warehouse-demand-v1`	`damage-cnn-v1`
action policy	warning threshold	eligibility and slate rule	alert threshold	quality check and review threshold
promotion policy	slice recall and cost limits	blocked-listing and NDCG limits	peak underforecast-cost limit	usable-image and source-slice limits
monitor	delayed labels and freshness	impressions and returns	residuals and alert review	photo quality and reviewer labels
previous release	`delivery-risk-v0`	`market-ranker-v0`	`warehouse-demand-v0`	`damage-cnn-v0`

The release tuple prevents an incident review from asking which threshold, feature transform, or gate policy happened to be active. Sculley et al. warn that ML systems accumulate debt through data dependencies, configuration, and feedback loops unless those boundaries are managed explicitly. [1]
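A minimal sketch of how a shared manifest schema might be checked before registration. The field names mirror the table rows above; `REQUIRED_FIELDS`, `missing_fields`, and the `monitor_version` key are hypothetical names chosen for illustration, not part of the course repository.

```python
# Hypothetical field names that mirror the rows of the release-tuple table.
REQUIRED_FIELDS = (
    "data_snapshot",
    "feature_version",
    "model_artifact",
    "action_policy_version",
    "promotion_policy_version",
    "monitor_version",
    "previous_release",
)

def missing_fields(manifest: dict[str, object]) -> list[str]:
    """Return required keys that are absent from a manifest.

    previous_release may be None for a first release, so only the
    key's presence is checked, not its value.
    """
    return [field for field in REQUIRED_FIELDS if field not in manifest]

complete = {
    "data_snapshot": "carrier-events-through-2026-05-31",
    "feature_version": "eta-features-v1",
    "model_artifact": "delay-model-v1",
    "action_policy_version": "eta-threshold-v1",
    "promotion_policy_version": "eta-promotion-v1",
    "monitor_version": "eta-monitor-v1",  # assumed identifier
    "previous_release": "delivery-risk-v0",
}
# Drop one field to simulate a manifest that forgot its action policy.
incomplete = {k: v for k, v in complete.items() if k != "action_policy_version"}

print(missing_fields(complete))
print(missing_fields(incomplete))
```

A manifest that fails this check would be rejected before it reaches the registry, which keeps an incident review from discovering the gap later.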

Figure: Pipeline flow from the trigger (validate schema and cutoff), to registering the candidate as an immutable release, to the offline eval gate (append receipt), and on to the pass branch.

Build the Portfolio Repository

Submit a small but inspectable platform surface:

production-ml-platform/
  contracts/
    release_manifest.schema.json
    promotion_policy.json
  pipelines/
    validate_snapshot.py
    train_candidate.py
    evaluate_candidate.py
    promote_alias.py
  registry/
    releases.jsonl
  monitoring/
    live_windows.py
    rollback_policy.py
  receipts/
    offline_gate_report.json
    canary_monitor_report.json
    alias_audit.jsonl
  projects/
    eta/
    ranking/
    forecast/
    vision/
  tests/
    test_failed_gate_never_promotes.py
    test_empty_monitor_window_holds.py
    test_unregistered_receipt_never_promotes.py
    test_alias_race_blocks_promotion.py
    test_delayed_labels_block_promotion.py
    test_rollback_restores_manifest.py

Google Cloud's MLOps architecture separates automated data/model validation, metadata, serving, monitoring, and continuous-training triggers around promotion. [2] Your repository needn't copy that platform, but it should prove each boundary through a deterministic local fixture and test.
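One of the listed tests might look like the sketch below. It uses a deliberately tiny stand-in controller (`open_canary_stub`, a hypothetical name) so the file runs on its own; a real test would import the repository's pipeline code instead.

```python
# Sketch of tests/test_failed_gate_never_promotes.py with a stand-in controller.
REQUIRED_GATES = ("schema_valid", "no_leakage", "critical_slice_pass", "cost_improves")

def open_canary_stub(
    aliases: dict[str, str], candidate_id: str, gates: dict[str, bool]
) -> str:
    """Hypothetical stand-in: open a canary only when every required gate passed."""
    failed = [gate for gate in REQUIRED_GATES if not gates.get(gate, False)]
    if failed:
        return "hold_offline"
    aliases["canary"] = candidate_id
    return "open_canary"

def test_failed_gate_never_promotes() -> None:
    aliases = {"production": "delivery-risk-v0"}
    gates = {gate: True for gate in REQUIRED_GATES}
    gates["critical_slice_pass"] = False

    decision = open_canary_stub(aliases, "delivery-risk-v1", gates)

    assert decision == "hold_offline"
    # Neither alias may move when an offline gate fails.
    assert aliases == {"production": "delivery-risk-v0"}

def test_missing_gate_counts_as_failure() -> None:
    aliases = {"production": "delivery-risk-v0"}
    # A gate that was never reported is treated the same as a failed gate.
    decision = open_canary_stub(aliases, "delivery-risk-v1", {"schema_valid": True})
    assert decision == "hold_offline"
    assert "canary" not in aliases

test_failed_gate_never_promotes()
test_missing_gate_counts_as_failure()
print("both tests passed")
```

The functions follow the `test_*` naming convention, so a runner such as pytest would collect them; the direct calls at the bottom only make the sketch runnable as a script.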

Register a Candidate Without Moving Production

Training completion is evidence, not permission to change live behavior. Start by freezing the whole release tuple. The candidate records both its previous release and its promotion policy, so a reviewer can replay the rollback target and gate thresholds before any traffic moves.

01-register-release-registry.py

from dataclasses import asdict, dataclass
import json

@dataclass(frozen=True)
class PromotionPolicy:
    policy_id: str
    required_offline_gates: tuple[str, ...]
    min_canary_windows: int
    max_error_rate: float
    max_p95_latency_ms: int
    max_late_warning_cost_delta: float

@dataclass(frozen=True)
class Release:
    release_id: str
    data_snapshot: str
    feature_version: str
    model_artifact: str
    action_policy_version: str
    promotion_policy_version: str
    previous_release: str | None

@dataclass(frozen=True)
class OfflineReceipt:
    receipt_id: str
    candidate: str
    production_before: str
    promotion_policy_version: str | None
    failed_gates: tuple[str, ...]
    decision: str

POLICIES = {
    "eta-promotion-v1": PromotionPolicy(
        policy_id="eta-promotion-v1",
        required_offline_gates=(
            "schema_valid",
            "no_leakage",
            "critical_slice_pass",
            "cost_improves",
        ),
        min_canary_windows=2,
        max_error_rate=0.01,
        max_p95_latency_ms=250,
        max_late_warning_cost_delta=0.0,
    )
}

registry = {
    "delivery-risk-v0": Release(
        "delivery-risk-v0",
        "carrier-events-through-2026-04-30",
        "eta-features-v1",
        "delay-model-v0",
        "eta-threshold-v1",
        "eta-promotion-v1",
        None,
    ),
    "delivery-risk-v1": Release(
        "delivery-risk-v1",
        "carrier-events-through-2026-05-31",
        "eta-features-v1",
        "delay-model-v1",
        "eta-threshold-v1",
        "eta-promotion-v1",
        "delivery-risk-v0",
    ),
}
aliases = {"production": "delivery-risk-v0"}
offline_receipts: dict[str, OfflineReceipt] = {}

print("registry:", list(registry))
print("production:", aliases["production"])

Output

registry: ['delivery-risk-v0', 'delivery-risk-v1']
production: delivery-risk-v0

02-open-canary-after-offline-gates.py

def open_canary(candidate_id: str, gates: dict[str, bool]) -> OfflineReceipt:
    candidate = registry.get(candidate_id)
    production_before = aliases["production"]
    failed = []
    policy = POLICIES.get(candidate.promotion_policy_version) if candidate else None
    if candidate is None:
        failed.append("candidate_not_registered")
    elif policy is None:
        failed.append("promotion_policy_not_registered")
    else:
        failed.extend(
            gate for gate in policy.required_offline_gates if not gates.get(gate, False)
        )
    if candidate is not None and candidate.previous_release != production_before:
        failed.append("previous_release_mismatch")
    if aliases.get("canary") not in (None, candidate_id):
        failed.append("another_canary_is_active")

    receipt = OfflineReceipt(
        receipt_id=f"offline-receipt-{len(offline_receipts) + 1}",
        candidate=candidate_id,
        production_before=production_before,
        promotion_policy_version=(
            candidate.promotion_policy_version if candidate is not None else None
        ),
        failed_gates=tuple(sorted(failed)),
        decision="hold_offline" if failed else "open_canary",
    )
    offline_receipts[receipt.receipt_id] = receipt
    if receipt.decision == "open_canary":
        aliases["canary"] = candidate_id
    return receipt

bad_gates = {
    "schema_valid": True,
    "no_leakage": True,
    "critical_slice_pass": False,
    "cost_improves": True,
}
good_gates = {**bad_gates, "critical_slice_pass": True}

bad_offline_receipt = open_canary("delivery-risk-v1", bad_gates)
accepted_offline_receipt = open_canary("delivery-risk-v1", good_gates)
print(json.dumps(asdict(bad_offline_receipt), indent=2))
print(json.dumps(asdict(accepted_offline_receipt), indent=2))
print("aliases:", json.dumps(aliases, sort_keys=True))

Output

{
  "receipt_id": "offline-receipt-1",
  "candidate": "delivery-risk-v1",
  "production_before": "delivery-risk-v0",
  "promotion_policy_version": "eta-promotion-v1",
  "failed_gates": [
    "critical_slice_pass"
  ],
  "decision": "hold_offline"
}
{
  "receipt_id": "offline-receipt-2",
  "candidate": "delivery-risk-v1",
  "production_before": "delivery-risk-v0",
  "promotion_policy_version": "eta-promotion-v1",
  "failed_gates": [],
  "decision": "open_canary"
}
aliases: {"canary": "delivery-risk-v1", "production": "delivery-risk-v0"}

A failed offline slice leaves production unchanged. Passing evaluation gates opens only the canary alias. Each attempt appends an immutable receipt with candidate, production base, policy version, verdict, and reasons. Nothing in this cell can overwrite production.
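The receipts above live in an in-memory dict. One way to persist them in the spirit of `receipts/alias_audit.jsonl` is sketched below. `append_receipt` and `read_receipts` are hypothetical helpers, the file name is assumed, and a temporary directory stands in for the repository path.

```python
import json
import tempfile
from pathlib import Path

def append_receipt(path: Path, receipt: dict[str, object]) -> None:
    """Append one receipt as a single JSON line; existing lines are not rewritten."""
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(receipt, sort_keys=True) + "\n")

def read_receipts(path: Path) -> list[dict[str, object]]:
    """Replay every stored receipt in the order it was written."""
    with path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]

with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "offline_receipts.jsonl"  # assumed file name
    append_receipt(log, {"receipt_id": "offline-receipt-1", "decision": "hold_offline"})
    append_receipt(log, {"receipt_id": "offline-receipt-2", "decision": "open_canary"})
    stored = read_receipts(log)

print([receipt["decision"] for receipt in stored])
```

Opening the file in append mode is a convention rather than enforcement. A production store would likely need write-once storage or access control to make the log tamper-resistant.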

Wait for Canary Evidence

Fast checks catch broken schemas, errors, and latency spikes. They can't prove prediction quality when labels arrive later. A canary rollout needs both kinds of evidence. The controller below stores each observation window inside an immutable receipt, refuses to promote after the first hour because late-delivery outcomes aren't ready yet, and handles an empty window list as a hold rather than crashing.

03-canary-window-contract.py

@dataclass(frozen=True)
class CanaryWindow:
    window_id: str
    release_id: str
    observed_day: int
    requests: int
    error_rate: float
    p95_latency_ms: int
    delayed_labels_ready: bool
    late_warning_cost_delta: float | None

@dataclass(frozen=True)
class CanaryReceipt:
    receipt_id: str
    candidate: str
    production_before: str
    offline_receipt_id: str
    promotion_policy_version: str | None
    windows: tuple[CanaryWindow, ...]
    failed_gates: tuple[str, ...]
    decision: str

canary_receipts: dict[str, CanaryReceipt] = {}

print("canary policy windows:", POLICIES["eta-promotion-v1"].min_canary_windows)

04-evaluate-canary-evidence.py

def evaluate_canary(
    candidate_id: str,
    offline_receipt_id: str,
    windows: list[CanaryWindow],
) -> CanaryReceipt:
    candidate = registry.get(candidate_id)
    policy = POLICIES.get(candidate.promotion_policy_version) if candidate else None
    offline_receipt = offline_receipts.get(offline_receipt_id)
    failed = []
    abort_reasons = []
    if candidate is None:
        failed.append("candidate_not_registered")
        abort_reasons.append("candidate_not_registered")
    elif policy is None:
        failed.append("promotion_policy_not_registered")
        abort_reasons.append("promotion_policy_not_registered")
    if aliases.get("canary") != candidate_id:
        failed.append("canary_alias_missing")
    if (
        offline_receipt is None
        or offline_receipt.candidate != candidate_id
        or offline_receipt.decision != "open_canary"
    ):
        failed.append("accepted_offline_receipt_missing")
        abort_reasons.append("accepted_offline_receipt_missing")
    if candidate is not None and candidate.previous_release != aliases["production"]:
        failed.append("production_changed_during_canary")
        abort_reasons.append("production_changed_during_canary")
    if policy is not None and len(windows) < policy.min_canary_windows:
        failed.append("observation_window_incomplete")
    if len({window.window_id for window in windows}) != len(windows):
        failed.append("duplicate_window_id")
        abort_reasons.append("duplicate_window_id")
    observed_days = [window.observed_day for window in windows]
    if observed_days != sorted(set(observed_days)):
        failed.append("window_order_invalid")
        abort_reasons.append("window_order_invalid")
    if any(window.release_id != candidate_id for window in windows):
        failed.append("mixed_release_windows")
        abort_reasons.append("mixed_release_windows")
    if any(window.requests <= 0 for window in windows):
        failed.append("request_count_missing")
        abort_reasons.append("request_count_missing")
    if policy is not None and any(window.error_rate > policy.max_error_rate for window in windows):
        failed.append("error_rate_regression")
        abort_reasons.append("error_rate_regression")
    if policy is not None and any(
        window.p95_latency_ms > policy.max_p95_latency_ms for window in windows
    ):
        failed.append("latency_regression")
        abort_reasons.append("latency_regression")

    latest = windows[-1] if windows else None
    if latest is None or not latest.delayed_labels_ready:
        failed.append("delayed_quality_not_ready")
    elif (
        policy is not None
        and (
            latest.late_warning_cost_delta is None
            or latest.late_warning_cost_delta > policy.max_late_warning_cost_delta
        )
    ):
        failed.append("late_warning_cost_regression")
        abort_reasons.append("late_warning_cost_regression")

    decision = (
        "abort_canary"
        if abort_reasons
        else "hold_canary"
        if failed
        else "ready_for_promotion"
    )
    receipt = CanaryReceipt(
        receipt_id=f"canary-receipt-{len(canary_receipts) + 1}",
        candidate=candidate_id,
        production_before=aliases["production"],
        offline_receipt_id=offline_receipt_id,
        promotion_policy_version=(
            candidate.promotion_policy_version if candidate is not None else None
        ),
        windows=tuple(windows),
        failed_gates=tuple(sorted(failed)),
        decision=decision,
    )
    canary_receipts[receipt.receipt_id] = receipt
    if decision == "abort_canary":
        aliases.pop("canary", None)
    return receipt

first_hour = CanaryWindow("first-hour", "delivery-risk-v1", 0, 500, 0.002, 118, False, None)
day_seven = CanaryWindow("day-seven", "delivery-risk-v1", 7, 4200, 0.003, 124, True, -0.08)

empty_receipt = evaluate_canary("delivery-risk-v1", accepted_offline_receipt.receipt_id, [])
early_receipt = evaluate_canary(
    "delivery-risk-v1", accepted_offline_receipt.receipt_id, [first_hour]
)
ready_receipt = evaluate_canary(
    "delivery-risk-v1", accepted_offline_receipt.receipt_id, [first_hour, day_seven]
)
print("empty decision:", empty_receipt.decision)
print("early decision:", early_receipt.decision)
print("ready decision:", ready_receipt.decision)

05-print-canary-receipts.py

print("empty:", json.dumps(asdict(empty_receipt), indent=2))
print("early:", json.dumps(asdict(early_receipt), indent=2))
print("ready:", json.dumps(asdict(ready_receipt), indent=2))

Output

empty: {
  "receipt_id": "canary-receipt-1",
  "candidate": "delivery-risk-v1",
  "production_before": "delivery-risk-v0",
  "offline_receipt_id": "offline-receipt-2",
  "promotion_policy_version": "eta-promotion-v1",
  "windows": [],
  "failed_gates": [
    "delayed_quality_not_ready",
    "observation_window_incomplete"
  ],
  "decision": "hold_canary"
}
early: {
  "receipt_id": "canary-receipt-2",
  "candidate": "delivery-risk-v1",
  "production_before": "delivery-risk-v0",
  "offline_receipt_id": "offline-receipt-2",
  "promotion_policy_version": "eta-promotion-v1",
  "windows": [
    {
      "window_id": "first-hour",
      "release_id": "delivery-risk-v1",
      "observed_day": 0,
      "requests": 500,
      "error_rate": 0.002,
      "p95_latency_ms": 118,
      "delayed_labels_ready": false,
      "late_warning_cost_delta": null
    }
  ],
  "failed_gates": [
    "delayed_quality_not_ready",
    "observation_window_incomplete"
  ],
  "decision": "hold_canary"
}
ready: {
  "receipt_id": "canary-receipt-3",
  "candidate": "delivery-risk-v1",
  "production_before": "delivery-risk-v0",
  "offline_receipt_id": "offline-receipt-2",
  "promotion_policy_version": "eta-promotion-v1",
  "windows": [
    {
      "window_id": "first-hour",
      "release_id": "delivery-risk-v1",
      "observed_day": 0,
      "requests": 500,
      "error_rate": 0.002,
      "p95_latency_ms": 118,
      "delayed_labels_ready": false,
      "late_warning_cost_delta": null
    },
    {
      "window_id": "day-seven",
      "release_id": "delivery-risk-v1",
      "observed_day": 7,
      "requests": 4200,
      "error_rate": 0.003,
      "p95_latency_ms": 124,
      "delayed_labels_ready": true,
      "late_warning_cost_delta": -0.08
    }
  ],
  "failed_gates": [],
  "decision": "ready_for_promotion"
}

late_warning_cost_delta=-0.08 means the candidate reduced late-warning cost by eight percent relative to the previous release on this local fixture. It's a teaching threshold, not a universal production policy. Real teams choose windows and limits from product risk, traffic volume, and label delay.
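The paragraph above reads the delta as a relative change, but the exact formula isn't given. Assuming it is the relative change in average late-warning cost against the previous release, the arithmetic could look like this. `relative_cost_delta` and the cost figures are illustrative assumptions.

```python
def relative_cost_delta(candidate_cost: float, baseline_cost: float) -> float:
    """Relative change in cost; negative means the candidate is cheaper.

    Assumed definition: (candidate - baseline) / baseline.
    """
    if baseline_cost <= 0:
        raise ValueError("baseline_cost must be positive")
    return (candidate_cost - baseline_cost) / baseline_cost

MAX_DELTA = 0.0  # mirrors max_late_warning_cost_delta in eta-promotion-v1

# Made-up average costs chosen to reproduce the fixture's two deltas.
canary_delta = relative_cost_delta(candidate_cost=230.0, baseline_cost=250.0)
regressed_delta = relative_cost_delta(candidate_cost=285.0, baseline_cost=250.0)

print(round(canary_delta, 2), canary_delta <= MAX_DELTA)
print(round(regressed_delta, 2), regressed_delta <= MAX_DELTA)
```

Under this assumed definition the first case yields -0.08 and passes the gate, while the second yields 0.14 and exceeds the limit of 0.0.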

An incomplete window returns hold_canary: gather more evidence without widening exposure. A measured latency, error-rate, or delayed-quality regression returns abort_canary and removes the canary alias. Corrupted or mismatched telemetry aborts too because the controller can't prove safe exposure. Missing evidence and negative evidence aren't the same operational state.

Promote Last, Then Prove Rollback

The final cell makes alias movement explicit. Promotion fetches a stored canary receipt by ID rather than trusting a caller-supplied decision dictionary. Rollback follows the same rule: production metrics become a stored receipt before they can restore the previous release. Both paths recheck the live production alias immediately before movement.

06-promote-with-stored-receipt.py

@dataclass(frozen=True)
class AliasEvent:
    action: str
    from_release: str
    to_release: str
    evidence_receipt_id: str
    reasons: tuple[str, ...]

@dataclass(frozen=True)
class ProductionReceipt:
    receipt_id: str
    window: CanaryWindow
    failed_gates: tuple[str, ...]
    decision: str

audit_events: list[AliasEvent] = []
production_receipts: dict[str, ProductionReceipt] = {}

def promote(candidate_id: str, canary_receipt_id: str) -> dict[str, object]:
    candidate = registry.get(candidate_id)
    canary_receipt = canary_receipts.get(canary_receipt_id)
    failed = []
    if candidate is None:
        failed.append("candidate_not_registered")
    if aliases.get("canary") != candidate_id:
        failed.append("canary_alias_missing")
    if canary_receipt is None:
        failed.append("canary_receipt_not_registered")
    elif canary_receipt.candidate != candidate_id:
        failed.append("canary_receipt_candidate_mismatch")
    elif canary_receipt.decision != "ready_for_promotion":
        failed.append("canary_receipt_not_ready")
    elif (
        candidate is not None
        and canary_receipt.promotion_policy_version != candidate.promotion_policy_version
    ):
        failed.append("canary_receipt_policy_mismatch")
    elif (
        offline_receipts.get(canary_receipt.offline_receipt_id) is None
        or offline_receipts[canary_receipt.offline_receipt_id].decision != "open_canary"
    ):
        failed.append("offline_receipt_not_registered")
    if candidate is not None and candidate.previous_release != aliases["production"]:
        failed.append("production_changed_since_canary_open")
    if canary_receipt is not None and canary_receipt.production_before != aliases["production"]:
        failed.append("production_changed_since_canary_receipt")
    if failed:
        return {"action": "hold_promotion", "reasons": sorted(failed)}

    previous = aliases["production"]
    aliases["previous_production"] = previous
    aliases["production"] = candidate_id
    aliases.pop("canary")
    event = AliasEvent("promote", previous, candidate_id, canary_receipt_id, ())
    audit_events.append(event)
    return asdict(event)

print("promote helper ready")

07-monitor-production-and-rollback.py

def rollback_reasons(window: CanaryWindow) -> list[str]:
    release = registry.get(window.release_id)
    if release is None:
        return ["production_window_release_not_registered"]
    policy = POLICIES.get(release.promotion_policy_version)
    if policy is None:
        return ["production_policy_not_registered"]
    failed = []
    if window.error_rate > policy.max_error_rate:
        failed.append("error_rate_regression")
    if window.p95_latency_ms > policy.max_p95_latency_ms:
        failed.append("latency_regression")
    if not window.delayed_labels_ready:
        failed.append("delayed_quality_not_ready")
    elif (
        window.late_warning_cost_delta is None
        or window.late_warning_cost_delta > policy.max_late_warning_cost_delta
    ):
        failed.append("late_warning_cost_regression")
    return failed

def evaluate_production(window: CanaryWindow) -> ProductionReceipt:
    failed = rollback_reasons(window)
    release_mismatch = window.release_id != aliases["production"]
    if release_mismatch:
        failed.append("production_window_release_mismatch")
    decision = (
        "hold_rollback"
        if release_mismatch
        else "rollback_required"
        if failed
        else "keep_production"
    )
    receipt = ProductionReceipt(
        receipt_id=f"production-receipt-{len(production_receipts) + 1}",
        window=window,
        failed_gates=tuple(sorted(failed)),
        decision=decision,
    )
    production_receipts[receipt.receipt_id] = receipt
    return receipt

def rollback_if_needed(production_receipt_id: str) -> dict[str, object]:
    receipt = production_receipts.get(production_receipt_id)
    if receipt is None:
        return {"action": "hold_rollback", "reason": "production_receipt_not_registered"}
    if receipt.decision == "hold_rollback":
        return {"action": "hold_rollback", "reasons": list(receipt.failed_gates)}
    if receipt.window.release_id != aliases["production"]:
        return {"action": "hold_rollback", "reason": "production_changed_since_monitor_receipt"}
    if receipt.decision == "keep_production":
        return {"action": "keep_production", "release": aliases["production"]}
    if receipt.decision != "rollback_required":
        return {"action": "hold_rollback", "reason": "production_receipt_decision_invalid"}

    previous = aliases.get("previous_production")
    if previous is None or previous not in registry:
        return {"action": "hold_rollback", "reason": "previous_production_not_registered"}
    failed_release = aliases["production"]
    aliases["production"] = previous
    aliases["rollback_from"] = failed_release
    event = AliasEvent(
        "rollback",
        failed_release,
        previous,
        receipt.receipt_id,
        receipt.failed_gates,
    )
    audit_events.append(event)
    return asdict(event)

print("rollback helper ready")

08-attempt-promotion-paths.py

print("fabricated promotion:", promote("delivery-risk-v1", "canary-receipt-missing"))
print("early promotion:", promote("delivery-risk-v1", early_receipt.receipt_id))
print("approved promotion:", promote("delivery-risk-v1", ready_receipt.receipt_id))

09-detect-production-regression.py

degraded = CanaryWindow("production-day-eight", "delivery-risk-v1", 8, 900, 0.004, 130, True, 0.14)
degraded_receipt = evaluate_production(degraded)
rollback_result = rollback_if_needed(degraded_receipt.receipt_id)
print("production decision:", degraded_receipt.decision)
print("rollback action:", rollback_result["action"])

10-audit-alias-movements.py

print("production receipt:", json.dumps(asdict(degraded_receipt), indent=2))
print("rollback:", rollback_result)
print("aliases:", json.dumps(aliases, sort_keys=True))
print("audit:", json.dumps([asdict(event) for event in audit_events], indent=2))

Output

production receipt: {
  "receipt_id": "production-receipt-1",
  "window": {
    "window_id": "production-day-eight",
    "release_id": "delivery-risk-v1",
    "observed_day": 8,
    "requests": 900,
    "error_rate": 0.004,
    "p95_latency_ms": 130,
    "delayed_labels_ready": true,
    "late_warning_cost_delta": 0.14
  },
  "failed_gates": [
    "late_warning_cost_regression"
  ],
  "decision": "rollback_required"
}
rollback: {'action': 'rollback', 'from_release': 'delivery-risk-v1', 'to_release': 'delivery-risk-v0', 'evidence_receipt_id': 'production-receipt-1', 'reasons': ('late_warning_cost_regression',)}
aliases: {"previous_production": "delivery-risk-v0", "production": "delivery-risk-v0", "rollback_from": "delivery-risk-v1"}
audit: [
  {
    "action": "promote",
    "from_release": "delivery-risk-v0",
    "to_release": "delivery-risk-v1",
    "evidence_receipt_id": "canary-receipt-3",
    "reasons": []
  },
  {
    "action": "rollback",
    "from_release": "delivery-risk-v1",
    "to_release": "delivery-risk-v0",
    "evidence_receipt_id": "production-receipt-1",
    "reasons": [
      "late_warning_cost_regression"
    ]
  }
]

Rollback restores delivery-risk-v0, not its weights alone. That distinction matters because preprocessing, features, thresholds, and policy can all change serving behavior. Each audit event names the stored receipt that authorized its alias movement, so a later reviewer can reconstruct both promotion and rollback.
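To see why restoring the whole release matters, the sketch below resolves an alias to every component a server would load. `ReleaseView`, `resolve`, and the `eta-threshold-v2` identifier are hypothetical; the fixture above keeps the same threshold version across both releases.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ReleaseView:
    release_id: str
    feature_version: str
    model_artifact: str
    action_policy_version: str

RELEASES = {
    "delivery-risk-v0": ReleaseView(
        "delivery-risk-v0", "eta-features-v1", "delay-model-v0", "eta-threshold-v1"
    ),
    # Hypothetical variant in which the candidate also shipped a new threshold.
    "delivery-risk-v1": ReleaseView(
        "delivery-risk-v1", "eta-features-v1", "delay-model-v1", "eta-threshold-v2"
    ),
}

def resolve(aliases: dict[str, str], alias: str) -> ReleaseView:
    """Serving code reads every component through the alias, never piecemeal."""
    return RELEASES[aliases[alias]]

aliases = {"production": "delivery-risk-v1"}
before = resolve(aliases, "production")

aliases["production"] = "delivery-risk-v0"  # rollback moves one pointer
after = resolve(aliases, "production")

print(before.model_artifact, before.action_policy_version)
print(after.model_artifact, after.action_policy_version)
```

Because the threshold travels with the release, moving the alias back restores `eta-threshold-v1` together with `delay-model-v0`. A rollback that swapped only the weights would leave the newer threshold in place.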

Join Fast and Delayed Monitoring

Live checks differ by product, but the promotion controller handles the same categories:

Gate type	ETA	Ranking	Forecast	Vision
immediate data health	scan freshness	eligible candidate supply	latest counts loaded	photo quality
immediate service health	latency/errors	scoring latency	forecast API availability	image scoring latency
delayed quality	late warning cost	purchase/return experiment	MAE and peak residual	reviewer-confirmed damage
rollback event	stale warning spike	blocked listing exposure	broken alert flood	unsupported escalations

For scoring systems with delayed labels, canary monitoring should pause wider promotion until enough outcomes arrive. A model that hasn't failed yet isn't the same as a model that has passed.

Continuous training is appropriate when a schedule or monitored condition creates a candidate run. It should never skip data validation, offline comparisons, or a promotion record. The pipeline's value isn't automation alone; it's refusing untraceable changes.
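A hedged sketch of that rule: a trigger may start a candidate run, but its result never touches an alias. `should_start_candidate_run`, the seven-day schedule, and the drift threshold are hypothetical values, not recommendations.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TriggerDecision:
    start_run: bool
    reasons: tuple[str, ...]

def should_start_candidate_run(
    days_since_last_run: int,
    drift_score: float,
    schedule_days: int = 7,        # assumed retraining cadence
    drift_threshold: float = 0.2,  # assumed monitored condition
) -> TriggerDecision:
    """Decide whether to create a candidate run.

    The decision only starts training. Data validation, offline gates,
    and promotion receipts still have to happen afterwards.
    """
    reasons = []
    if days_since_last_run >= schedule_days:
        reasons.append("schedule_due")
    if drift_score > drift_threshold:
        reasons.append("drift_detected")
    return TriggerDecision(start_run=bool(reasons), reasons=tuple(reasons))

print(should_start_candidate_run(days_since_last_run=3, drift_score=0.05))
print(should_start_candidate_run(days_since_last_run=3, drift_score=0.31))
print(should_start_candidate_run(days_since_last_run=9, drift_score=0.31))
```

Recording the trigger reasons alongside the run makes it possible to trace a candidate back to the condition that created it.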

Practice: break the release controller

Rerun the examples after each mutation. Before reading the output, predict which receipt or alias changes.

Change the delivery-risk-v1 constructor's previous_release from "delivery-risk-v0" to "delivery-risk-v-missing".
Set good_gates["critical_slice_pass"] back to False.
Pass [day_seven, first_hour] to one evaluate_canary call. Confirm that the receipt records window_order_invalid.
Change the day_seven constructor's delayed_labels_ready argument from True to False.
Change the day_seven constructor's late_warning_cost_delta argument from -0.08 to 0.05.
Replace ready_receipt.receipt_id with "canary-receipt-missing" in the approved promotion call.
Before approved promotion, set aliases["production"] = "delivery-risk-v1" to simulate an out-of-band alias move. Confirm that promotion holds, then reset it to "delivery-risk-v0".
Change the degraded constructor's late_warning_cost_delta argument from 0.14 to -0.01.
Change the degraded constructor's release ID from "delivery-risk-v1" to "delivery-risk-v0". Confirm that monitoring evidence for a different release can't move the current alias.
Replace degraded_receipt.receipt_id with "production-receipt-missing" in the rollback call. Which authorization check holds the alias?
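For the window-order mutation, it can help to isolate the ordering check. The comparison below is the one used in `evaluate_canary`; `window_order_valid` is a hypothetical wrapper around it.

```python
def window_order_valid(observed_days: list[int]) -> bool:
    """True only when days are strictly increasing with no repeats."""
    return observed_days == sorted(set(observed_days))

print(window_order_valid([0, 7]))     # first_hour, then day_seven
print(window_order_valid([7, 0]))     # reversed order
print(window_order_valid([0, 0, 7]))  # repeated day
print(window_order_valid([]))         # no windows at all
```

An empty list satisfies this particular check, which is why the controller relies on separate gates (`observation_window_incomplete` and `delayed_quality_not_ready`) to hold an empty window.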

Submission checklist

Artifact	Acceptance condition
release schema	identifies data, features, model, action policy, promotion policy, and previous release
registry	contains immutable stable and candidate releases
append-only receipts	record why each candidate passed, held, aborted, promoted, or rolled back
alias promotion code	fetches stored receipt and rechecks production before moving alias
monitor policy	defines the conditions for canary pause, promote, abort, and rollback
tests	execute empty-window, failed-gate, fabricated-receipt, alias-race, and rollback paths

This completes the conventional production ML portfolio. The next capstone returns to LLM products: document QA must apply the same lineage and release discipline to retrieved evidence and generated answers.

Mastery check

Evaluation rubric

Artifact	Strong submission demonstrates
reproducible run	versioned data, features, model artifact, threshold policy, and evaluation evidence
controlled promotion	candidate alias, automated gates, canary criteria, and explicit production move
recovery	monitoring tied to actions, rollback trigger, and deployable prior release

Common failures

Symptom	Cause	Fix
Retrain job changes behavior with no review	training and promotion merged	separate candidate registry from aliases
Rollback restores weights but not threshold	policy omitted from release bundle	version complete release tuple
Empty canary window crashes controller	latest observation indexed before evidence exists	return a stored hold receipt for missing evidence
Canary promotes before outcomes exist	only latency checked	require delayed quality window
Fabricated `ready` dictionary promotes candidate	alias mover trusts caller-owned state	fetch immutable receipt by stored ID
Production changed after canary opened	rollback base checked only once	recheck current alias before promotion
Old monitoring window rolls back current release	rollback trusts reasons without release identity	verify monitor window matches production alias
Caller fabricates rollback reasons	alias mover accepts an unregistered reason list	evaluate monitoring once, store receipt, and fetch it by ID
Rollback points at an unknown state	candidate omits previous release	verify rollback target before opening canary traffic
Production changes but incident review has no history	alias moves aren't audited	append promotion and rollback events

Next Step

Continue to Capstone: Document QA

You have shipped a validated predictive-ML promotion path. Next you'll carry the same evidence discipline into a document-answering service whose outputs must cite approved source material or abstain.


References

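[1] Sculley et al. (2015). Hidden Technical Debt in Machine Learning Systems. https://research.google/pubs/hidden-technical-debt-in-machine-learning-systems/

[2] Google Cloud (2026). MLOps: Continuous Delivery and Automation Pipelines in Machine Learning. Official documentation. https://docs.cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning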

Discussion

Questions and insights from fellow learners.

Discussion loads when you reach this section.

Back to Topics

LearnPortfolio CapstonesCapstone: Production ML Pipeline

⚙️HardMLOps & Deployment

Capstone: Production ML Pipeline

Assemble predictive ML artifacts into validated training, registry promotion, canary monitoring, and rollback.

17 min read

Learning path

Step 83 of 158 in the full curriculum

Capstone: Image Damage Classifier Capstone: Document QA

Define the Shared Release Tuple

Models differ, but their release manifest can share a schema:

Field	ETA example	Ranking example	Forecast example	Vision example
data snapshot	carrier events through cutoff	catalog and judged queries	daily counts through cutoff	return photos grouped by shipment
feature or preprocessing version	`eta-features-v1`	`ranking-features-v1`	`demand-lags-v1`	`parcel-rgb-224-center-crop-v1`
model artifact	`delay-model-v1`	`market-ranker-v1`	`warehouse-demand-v1`	`damage-cnn-v1`
action policy	warning threshold	eligibility and slate rule	alert threshold	quality check and review threshold
promotion policy	slice recall and cost limits	blocked-listing and NDCG limits	peak underforecast-cost limit	usable-image and source-slice limits
monitor	delayed labels and freshness	impressions and returns	residuals and alert review	photo quality and reviewer labels
previous release	`delivery-risk-v0`	`market-ranker-v0`	`warehouse-demand-v0`	`damage-cnn-v0`
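
One way to make the tuple enforceable is to derive each release's identity from all of its fields. The sketch below is an illustration rather than part of the capstone code: `REQUIRED_FIELDS`, `missing_fields`, `fingerprint`, and the values `eta-monitor-v1` and `eta-threshold-v2` are assumptions chosen to mirror the table rows.

```python
import hashlib
import json

# Hypothetical field names, one per row of the release tuple table.
REQUIRED_FIELDS = (
    "data_snapshot",
    "feature_version",
    "model_artifact",
    "action_policy",
    "promotion_policy",
    "monitor",
    "previous_release",
)

def missing_fields(manifest: dict) -> list[str]:
    """Return the required fields that are absent from the manifest."""
    return [field for field in REQUIRED_FIELDS if field not in manifest]

def fingerprint(manifest: dict) -> str:
    """Hash the canonical JSON form so a change to any field yields a new identity."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

eta_v1 = {
    "data_snapshot": "carrier-events-through-2026-05-31",
    "feature_version": "eta-features-v1",
    "model_artifact": "delay-model-v1",
    "action_policy": "eta-threshold-v1",
    "promotion_policy": "eta-promotion-v1",
    "monitor": "eta-monitor-v1",  # hypothetical monitor identifier
    "previous_release": "delivery-risk-v0",
}
# Same weights, different warning threshold (hypothetical policy version).
same_weights_new_threshold = {**eta_v1, "action_policy": "eta-threshold-v2"}

print("missing:", missing_fields(eta_v1))
print("identity changed:", fingerprint(eta_v1) != fingerprint(same_weights_new_threshold))
```

Because the hash covers the action policy as well as the model artifact, a threshold change with unchanged weights still counts as a different release.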

Build the Portfolio Repository

Submit a small but inspectable platform surface:

production-ml-platform/
  contracts/
    release_manifest.schema.json
    promotion_policy.json
  pipelines/
    validate_snapshot.py
    train_candidate.py
    evaluate_candidate.py
    promote_alias.py
  registry/
    releases.jsonl
  monitoring/
    live_windows.py
    rollback_policy.py
  receipts/
    offline_gate_report.json
    canary_monitor_report.json
    alias_audit.jsonl
  projects/
    eta/
    ranking/
    forecast/
    vision/
  tests/
    test_failed_gate_never_promotes.py
    test_empty_monitor_window_holds.py
    test_unregistered_receipt_never_promotes.py
    test_alias_race_blocks_promotion.py
    test_delayed_labels_block_promotion.py
    test_rollback_restores_manifest.py
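
The `.jsonl` files in this layout suggest one JSON object per line, appended in order. A minimal sketch of that pattern follows, assuming hypothetical helpers `append_event` and `read_events`; it writes to a temporary directory so it can run anywhere. Opening a file in append mode doesn't by itself stop another process from rewriting it, so treat this as the write path only, not as tamper protection.

```python
import json
import tempfile
from pathlib import Path

def append_event(path: Path, event: dict) -> None:
    """Append one JSON object as a single line; earlier lines are left untouched."""
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(event, sort_keys=True) + "\n")

def read_events(path: Path) -> list[dict]:
    """Read the log back in the order it was written."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]

with tempfile.TemporaryDirectory() as workdir:
    audit_path = Path(workdir) / "alias_audit.jsonl"
    append_event(
        audit_path,
        {"action": "promote", "from_release": "delivery-risk-v0", "to_release": "delivery-risk-v1"},
    )
    append_event(
        audit_path,
        {"action": "rollback", "from_release": "delivery-risk-v1", "to_release": "delivery-risk-v0"},
    )
    events = read_events(audit_path)

print([event["action"] for event in events])
```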

Register a Candidate Without Moving Production

01-register-release-registry.py

from dataclasses import asdict, dataclass
import json

@dataclass(frozen=True)
class PromotionPolicy:
    policy_id: str
    required_offline_gates: tuple[str, ...]
    min_canary_windows: int
    max_error_rate: float
    max_p95_latency_ms: int
    max_late_warning_cost_delta: float

@dataclass(frozen=True)
class Release:
    release_id: str
    data_snapshot: str
    feature_version: str
    model_artifact: str
    action_policy_version: str
    promotion_policy_version: str
    previous_release: str | None

@dataclass(frozen=True)
class OfflineReceipt:
    receipt_id: str
    candidate: str
    production_before: str
    promotion_policy_version: str | None
    failed_gates: tuple[str, ...]
    decision: str

POLICIES = {
    "eta-promotion-v1": PromotionPolicy(
        policy_id="eta-promotion-v1",
        required_offline_gates=(
            "schema_valid",
            "no_leakage",
            "critical_slice_pass",
            "cost_improves",
        ),
        min_canary_windows=2,
        max_error_rate=0.01,
        max_p95_latency_ms=250,
        max_late_warning_cost_delta=0.0,
    )
}

registry = {
    "delivery-risk-v0": Release(
        "delivery-risk-v0",
        "carrier-events-through-2026-04-30",
        "eta-features-v1",
        "delay-model-v0",
        "eta-threshold-v1",
        "eta-promotion-v1",
        None,
    ),
    "delivery-risk-v1": Release(
        "delivery-risk-v1",
        "carrier-events-through-2026-05-31",
        "eta-features-v1",
        "delay-model-v1",
        "eta-threshold-v1",
        "eta-promotion-v1",
        "delivery-risk-v0",
    ),
}
aliases = {"production": "delivery-risk-v0"}
offline_receipts: dict[str, OfflineReceipt] = {}

print("registry:", list(registry))
print("production:", aliases["production"])

Output

registry: ['delivery-risk-v0', 'delivery-risk-v1']
production: delivery-risk-v0

02-open-canary-after-offline-gates.py

def open_canary(candidate_id: str, gates: dict[str, bool]) -> OfflineReceipt:
    # Every call stores a receipt, including holds, so a reviewer can see why a candidate stopped.
    candidate = registry.get(candidate_id)
    production_before = aliases["production"]
    failed = []
    policy = POLICIES.get(candidate.promotion_policy_version) if candidate else None
    if candidate is None:
        failed.append("candidate_not_registered")
    elif policy is None:
        failed.append("promotion_policy_not_registered")
    else:
        # A gate missing from the caller's dictionary counts as failed.
        failed.extend(
            gate for gate in policy.required_offline_gates if not gates.get(gate, False)
        )
    # The rollback target must be the release that is live right now.
    if candidate is not None and candidate.previous_release != production_before:
        failed.append("previous_release_mismatch")
    # Only one candidate may hold the canary alias at a time.
    if aliases.get("canary") not in (None, candidate_id):
        failed.append("another_canary_is_active")

    receipt = OfflineReceipt(
        receipt_id=f"offline-receipt-{len(offline_receipts) + 1}",
        candidate=candidate_id,
        production_before=production_before,
        promotion_policy_version=(
            candidate.promotion_policy_version if candidate is not None else None
        ),
        failed_gates=tuple(sorted(failed)),
        decision="hold_offline" if failed else "open_canary",
    )
    offline_receipts[receipt.receipt_id] = receipt
    if receipt.decision == "open_canary":
        aliases["canary"] = candidate_id
    return receipt

bad_gates = {
    "schema_valid": True,
    "no_leakage": True,
    "critical_slice_pass": False,
    "cost_improves": True,
}
good_gates = {**bad_gates, "critical_slice_pass": True}

bad_offline_receipt = open_canary("delivery-risk-v1", bad_gates)
accepted_offline_receipt = open_canary("delivery-risk-v1", good_gates)
print(json.dumps(asdict(bad_offline_receipt), indent=2))
print(json.dumps(asdict(accepted_offline_receipt), indent=2))
print("aliases:", json.dumps(aliases, sort_keys=True))

Output

{
  "receipt_id": "offline-receipt-1",
  "candidate": "delivery-risk-v1",
  "production_before": "delivery-risk-v0",
  "promotion_policy_version": "eta-promotion-v1",
  "failed_gates": [
    "critical_slice_pass"
  ],
  "decision": "hold_offline"
}
{
  "receipt_id": "offline-receipt-2",
  "candidate": "delivery-risk-v1",
  "production_before": "delivery-risk-v0",
  "promotion_policy_version": "eta-promotion-v1",
  "failed_gates": [],
  "decision": "open_canary"
}
aliases: {"canary": "delivery-risk-v1", "production": "delivery-risk-v0"}

Wait for Canary Evidence

03-canary-window-contract.py

@dataclass(frozen=True)
class CanaryWindow:
    window_id: str
    release_id: str
    observed_day: int
    requests: int
    error_rate: float
    p95_latency_ms: int
    delayed_labels_ready: bool
    late_warning_cost_delta: float | None

@dataclass(frozen=True)
class CanaryReceipt:
    receipt_id: str
    candidate: str
    production_before: str
    offline_receipt_id: str
    promotion_policy_version: str | None
    windows: tuple[CanaryWindow, ...]
    failed_gates: tuple[str, ...]
    decision: str

canary_receipts: dict[str, CanaryReceipt] = {}

print("canary policy windows:", POLICIES["eta-promotion-v1"].min_canary_windows)

04-evaluate-canary-evidence.py

def evaluate_canary(
    candidate_id: str,
    offline_receipt_id: str,
    windows: list[CanaryWindow],
) -> CanaryReceipt:
    candidate = registry.get(candidate_id)
    policy = POLICIES.get(candidate.promotion_policy_version) if candidate else None
    offline_receipt = offline_receipts.get(offline_receipt_id)
    failed = []
    abort_reasons = []
    if candidate is None:
        failed.append("candidate_not_registered")
        abort_reasons.append("candidate_not_registered")
    elif policy is None:
        failed.append("promotion_policy_not_registered")
        abort_reasons.append("promotion_policy_not_registered")
    if aliases.get("canary") != candidate_id:
        failed.append("canary_alias_missing")
    if (
        offline_receipt is None
        or offline_receipt.candidate != candidate_id
        or offline_receipt.decision != "open_canary"
    ):
        failed.append("accepted_offline_receipt_missing")
        abort_reasons.append("accepted_offline_receipt_missing")
    if candidate is not None and candidate.previous_release != aliases["production"]:
        failed.append("production_changed_during_canary")
        abort_reasons.append("production_changed_during_canary")
    if policy is not None and len(windows) < policy.min_canary_windows:
        failed.append("observation_window_incomplete")
    if len({window.window_id for window in windows}) != len(windows):
        failed.append("duplicate_window_id")
        abort_reasons.append("duplicate_window_id")
    observed_days = [window.observed_day for window in windows]
    if observed_days != sorted(set(observed_days)):
        failed.append("window_order_invalid")
        abort_reasons.append("window_order_invalid")
    if any(window.release_id != candidate_id for window in windows):
        failed.append("mixed_release_windows")
        abort_reasons.append("mixed_release_windows")
    if any(window.requests <= 0 for window in windows):
        failed.append("request_count_missing")
        abort_reasons.append("request_count_missing")
    if policy is not None and any(window.error_rate > policy.max_error_rate for window in windows):
        failed.append("error_rate_regression")
        abort_reasons.append("error_rate_regression")
    if policy is not None and any(
        window.p95_latency_ms > policy.max_p95_latency_ms for window in windows
    ):
        failed.append("latency_regression")
        abort_reasons.append("latency_regression")

    # Guard the index: an empty window list must produce a stored hold, not an IndexError.
    latest = windows[-1] if windows else None
    if latest is None or not latest.delayed_labels_ready:
        failed.append("delayed_quality_not_ready")
    elif (
        policy is not None
        and (
            latest.late_warning_cost_delta is None
            or latest.late_warning_cost_delta > policy.max_late_warning_cost_delta
        )
    ):
        failed.append("late_warning_cost_regression")
        abort_reasons.append("late_warning_cost_regression")

    # Contrary evidence aborts the canary; evidence that is merely missing holds it.
    decision = (
        "abort_canary"
        if abort_reasons
        else "hold_canary"
        if failed
        else "ready_for_promotion"
    )
    receipt = CanaryReceipt(
        receipt_id=f"canary-receipt-{len(canary_receipts) + 1}",
        candidate=candidate_id,
        production_before=aliases["production"],
        offline_receipt_id=offline_receipt_id,
        promotion_policy_version=(
            candidate.promotion_policy_version if candidate is not None else None
        ),
        windows=tuple(windows),
        failed_gates=tuple(sorted(failed)),
        decision=decision,
    )
    canary_receipts[receipt.receipt_id] = receipt
    if decision == "abort_canary":
        aliases.pop("canary", None)
    return receipt

first_hour = CanaryWindow("first-hour", "delivery-risk-v1", 0, 500, 0.002, 118, False, None)
day_seven = CanaryWindow("day-seven", "delivery-risk-v1", 7, 4200, 0.003, 124, True, -0.08)

empty_receipt = evaluate_canary("delivery-risk-v1", accepted_offline_receipt.receipt_id, [])
early_receipt = evaluate_canary(
    "delivery-risk-v1", accepted_offline_receipt.receipt_id, [first_hour]
)
ready_receipt = evaluate_canary(
    "delivery-risk-v1", accepted_offline_receipt.receipt_id, [first_hour, day_seven]
)
print("empty decision:", empty_receipt.decision)
print("early decision:", early_receipt.decision)
print("ready decision:", ready_receipt.decision)

05-print-canary-receipts.py

print("empty:", json.dumps(asdict(empty_receipt), indent=2))
print("early:", json.dumps(asdict(early_receipt), indent=2))
print("ready:", json.dumps(asdict(ready_receipt), indent=2))

Output

empty: {
  "receipt_id": "canary-receipt-1",
  "candidate": "delivery-risk-v1",
  "production_before": "delivery-risk-v0",
  "offline_receipt_id": "offline-receipt-2",
  "promotion_policy_version": "eta-promotion-v1",
  "windows": [],
  "failed_gates": [
    "delayed_quality_not_ready",
    "observation_window_incomplete"
  ],
  "decision": "hold_canary"
}
early: {
  "receipt_id": "canary-receipt-2",
  "candidate": "delivery-risk-v1",
  "production_before": "delivery-risk-v0",
  "offline_receipt_id": "offline-receipt-2",
  "promotion_policy_version": "eta-promotion-v1",
  "windows": [
    {
      "window_id": "first-hour",
      "release_id": "delivery-risk-v1",
      "observed_day": 0,
      "requests": 500,
      "error_rate": 0.002,
      "p95_latency_ms": 118,
      "delayed_labels_ready": false,
      "late_warning_cost_delta": null
    }
  ],
  "failed_gates": [
    "delayed_quality_not_ready",
    "observation_window_incomplete"
  ],
  "decision": "hold_canary"
}
ready: {
  "receipt_id": "canary-receipt-3",
  "candidate": "delivery-risk-v1",
  "production_before": "delivery-risk-v0",
  "offline_receipt_id": "offline-receipt-2",
  "promotion_policy_version": "eta-promotion-v1",
  "windows": [
    {
      "window_id": "first-hour",
      "release_id": "delivery-risk-v1",
      "observed_day": 0,
      "requests": 500,
      "error_rate": 0.002,
      "p95_latency_ms": 118,
      "delayed_labels_ready": false,
      "late_warning_cost_delta": null
    },
    {
      "window_id": "day-seven",
      "release_id": "delivery-risk-v1",
      "observed_day": 7,
      "requests": 4200,
      "error_rate": 0.003,
      "p95_latency_ms": 124,
      "delayed_labels_ready": true,
      "late_warning_cost_delta": -0.08
    }
  ],
  "failed_gates": [],
  "decision": "ready_for_promotion"
}
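
The three receipts show holds and a ready decision, but not an abort. The sketch below restates the evaluator's decision rule in isolation: the gate names are copied from `evaluate_canary`, while the `MISSING_EVIDENCE` grouping and the `decide` helper are an illustrative summary, not code from the controller.

```python
# Gates that mean "not enough evidence yet". In the evaluator above, these are the
# only failures that are not also recorded as abort reasons.
MISSING_EVIDENCE = frozenset(
    {
        "canary_alias_missing",
        "delayed_quality_not_ready",
        "observation_window_incomplete",
    }
)

def decide(failed_gates: list[str]) -> str:
    """Abort on contrary evidence, hold on missing evidence, otherwise allow promotion."""
    contrary = [gate for gate in failed_gates if gate not in MISSING_EVIDENCE]
    if contrary:
        return "abort_canary"
    if failed_gates:
        return "hold_canary"
    return "ready_for_promotion"

print(decide(["delayed_quality_not_ready", "observation_window_incomplete"]))
print(decide(["delayed_quality_not_ready", "latency_regression"]))
print(decide([]))
```

A hold leaves the canary alias in place so more windows can arrive; an abort removes it, because waiting longer can't turn a measured regression into a pass.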

Promote Last, Then Prove Rollback

06-promote-with-stored-receipt.py

@dataclass(frozen=True)
class AliasEvent:
    action: str
    from_release: str
    to_release: str
    evidence_receipt_id: str
    reasons: tuple[str, ...]

@dataclass(frozen=True)
class ProductionReceipt:
    receipt_id: str
    window: CanaryWindow
    failed_gates: tuple[str, ...]
    decision: str

audit_events: list[AliasEvent] = []
production_receipts: dict[str, ProductionReceipt] = {}

def promote(candidate_id: str, canary_receipt_id: str) -> dict[str, object]:
    candidate = registry.get(candidate_id)
    canary_receipt = canary_receipts.get(canary_receipt_id)
    failed = []
    if candidate is None:
        failed.append("candidate_not_registered")
    if aliases.get("canary") != candidate_id:
        failed.append("canary_alias_missing")
    if canary_receipt is None:
        failed.append("canary_receipt_not_registered")
    elif canary_receipt.candidate != candidate_id:
        failed.append("canary_receipt_candidate_mismatch")
    elif canary_receipt.decision != "ready_for_promotion":
        failed.append("canary_receipt_not_ready")
    elif (
        candidate is not None
        and canary_receipt.promotion_policy_version != candidate.promotion_policy_version
    ):
        failed.append("canary_receipt_policy_mismatch")
    elif (
        offline_receipts.get(canary_receipt.offline_receipt_id) is None
        or offline_receipts[canary_receipt.offline_receipt_id].decision != "open_canary"
    ):
        failed.append("offline_receipt_not_registered")
    # Recheck the live alias at move time: production may have changed since the canary
    # opened or since the canary receipt was stored.
    if candidate is not None and candidate.previous_release != aliases["production"]:
        failed.append("production_changed_since_canary_open")
    if canary_receipt is not None and canary_receipt.production_before != aliases["production"]:
        failed.append("production_changed_since_canary_receipt")
    if failed:
        return {"action": "hold_promotion", "reasons": sorted(failed)}

    previous = aliases["production"]
    aliases["previous_production"] = previous
    aliases["production"] = candidate_id
    aliases.pop("canary")
    event = AliasEvent("promote", previous, candidate_id, canary_receipt_id, ())
    audit_events.append(event)
    return asdict(event)

print("promote helper ready")

07-monitor-production-and-rollback.py

def rollback_reasons(window: CanaryWindow) -> list[str]:
    release = registry.get(window.release_id)
    if release is None:
        return ["production_window_release_not_registered"]
    policy = POLICIES.get(release.promotion_policy_version)
    if policy is None:
        return ["production_policy_not_registered"]
    failed = []
    if window.error_rate > policy.max_error_rate:
        failed.append("error_rate_regression")
    if window.p95_latency_ms > policy.max_p95_latency_ms:
        failed.append("latency_regression")
    if not window.delayed_labels_ready:
        failed.append("delayed_quality_not_ready")
    elif (
        window.late_warning_cost_delta is None
        or window.late_warning_cost_delta > policy.max_late_warning_cost_delta
    ):
        failed.append("late_warning_cost_regression")
    return failed

def evaluate_production(window: CanaryWindow) -> ProductionReceipt:
    failed = rollback_reasons(window)
    release_mismatch = window.release_id != aliases["production"]
    if release_mismatch:
        failed.append("production_window_release_mismatch")
    decision = (
        "hold_rollback"
        if release_mismatch
        else "rollback_required"
        if failed
        else "keep_production"
    )
    receipt = ProductionReceipt(
        receipt_id=f"production-receipt-{len(production_receipts) + 1}",
        window=window,
        failed_gates=tuple(sorted(failed)),
        decision=decision,
    )
    production_receipts[receipt.receipt_id] = receipt
    return receipt

def rollback_if_needed(production_receipt_id: str) -> dict[str, object]:
    # Authorization comes from the stored receipt, never from caller-supplied reasons.
    # Every hold returns a "reasons" list, matching the shape that promote() returns.
    receipt = production_receipts.get(production_receipt_id)
    if receipt is None:
        return {"action": "hold_rollback", "reasons": ["production_receipt_not_registered"]}
    if receipt.decision == "hold_rollback":
        return {"action": "hold_rollback", "reasons": list(receipt.failed_gates)}
    if receipt.window.release_id != aliases["production"]:
        return {"action": "hold_rollback", "reasons": ["production_changed_since_monitor_receipt"]}
    if receipt.decision == "keep_production":
        return {"action": "keep_production", "release": aliases["production"]}
    if receipt.decision != "rollback_required":
        return {"action": "hold_rollback", "reasons": ["production_receipt_decision_invalid"]}

    previous = aliases.get("previous_production")
    if previous is None or previous not in registry:
        return {"action": "hold_rollback", "reasons": ["previous_production_not_registered"]}
    failed_release = aliases["production"]
    aliases["production"] = previous
    aliases["rollback_from"] = failed_release
    event = AliasEvent(
        "rollback",
        failed_release,
        previous,
        receipt.receipt_id,
        receipt.failed_gates,
    )
    audit_events.append(event)
    return asdict(event)

print("rollback helper ready")

08-attempt-promotion-paths.py

print("fabricated promotion:", promote("delivery-risk-v1", "canary-receipt-missing"))
print("early promotion:", promote("delivery-risk-v1", early_receipt.receipt_id))
print("approved promotion:", promote("delivery-risk-v1", ready_receipt.receipt_id))

09-detect-production-regression.py

degraded = CanaryWindow("production-day-eight", "delivery-risk-v1", 8, 900, 0.004, 130, True, 0.14)
degraded_receipt = evaluate_production(degraded)
rollback_result = rollback_if_needed(degraded_receipt.receipt_id)
print("production decision:", degraded_receipt.decision)
print("rollback action:", rollback_result["action"])

10-audit-alias-movements.py

print("production receipt:", json.dumps(asdict(degraded_receipt), indent=2))
print("rollback:", rollback_result)
print("aliases:", json.dumps(aliases, sort_keys=True))
print("audit:", json.dumps([asdict(event) for event in audit_events], indent=2))

Output

production receipt: {
  "receipt_id": "production-receipt-1",
  "window": {
    "window_id": "production-day-eight",
    "release_id": "delivery-risk-v1",
    "observed_day": 8,
    "requests": 900,
    "error_rate": 0.004,
    "p95_latency_ms": 130,
    "delayed_labels_ready": true,
    "late_warning_cost_delta": 0.14
  },
  "failed_gates": [
    "late_warning_cost_regression"
  ],
  "decision": "rollback_required"
}
rollback: {'action': 'rollback', 'from_release': 'delivery-risk-v1', 'to_release': 'delivery-risk-v0', 'evidence_receipt_id': 'production-receipt-1', 'reasons': ('late_warning_cost_regression',)}
aliases: {"previous_production": "delivery-risk-v0", "production": "delivery-risk-v0", "rollback_from": "delivery-risk-v1"}
audit: [
  {
    "action": "promote",
    "from_release": "delivery-risk-v0",
    "to_release": "delivery-risk-v1",
    "evidence_receipt_id": "canary-receipt-3",
    "reasons": []
  },
  {
    "action": "rollback",
    "from_release": "delivery-risk-v1",
    "to_release": "delivery-risk-v0",
    "evidence_receipt_id": "production-receipt-1",
    "reasons": [
      "late_warning_cost_regression"
    ]
  }
]
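
An audit trail is most useful when it can be checked mechanically. The sketch below replays alias events and confirms that each one starts from the release the previous event left in production. `replay` is a hypothetical helper, and the events are copied from the audit output above.

```python
def replay(events: list[dict], initial_production: str) -> str:
    """Walk alias events in order and return the production release they imply."""
    production = initial_production
    for index, event in enumerate(events):
        if event["from_release"] != production:
            raise ValueError(
                f"event {index} starts from {event['from_release']!r}, expected {production!r}"
            )
        if not event.get("evidence_receipt_id"):
            raise ValueError(f"event {index} has no evidence receipt")
        production = event["to_release"]
    return production

audit = [
    {
        "action": "promote",
        "from_release": "delivery-risk-v0",
        "to_release": "delivery-risk-v1",
        "evidence_receipt_id": "canary-receipt-3",
    },
    {
        "action": "rollback",
        "from_release": "delivery-risk-v1",
        "to_release": "delivery-risk-v0",
        "evidence_receipt_id": "production-receipt-1",
    },
]

print("replayed production:", replay(audit, "delivery-risk-v0"))
```

If the replayed release ever disagrees with the live `production` alias, some alias move happened without an audit event, which is the gap an incident review needs to find.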

Join Fast and Delayed Monitoring

Live checks differ by product, but the promotion controller handles the same categories:

Gate type	ETA	Ranking	Forecast	Vision
immediate data health	scan freshness	eligible candidate supply	latest counts loaded	photo quality
immediate service health	latency/errors	scoring latency	forecast API availability	image scoring latency
delayed quality	late warning cost	purchase/return experiment	MAE and peak residual	reviewer-confirmed damage
rollback event	stale warning spike	blocked listing exposure	broken alert flood	unsupported escalations

For scoring systems with delayed labels, canary monitoring should pause wider promotion until enough outcomes arrive. A model that hasn't failed yet isn't the same as a model that has passed.
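
A minimal sketch of that waiting rule, assuming a hypothetical `delayed_quality` helper, a label-coverage threshold of 0.75, and a simple missed-late rate as the quality metric. None of these come from the capstone code, which leaves the definition of `late_warning_cost_delta` to the project.

```python
def delayed_quality(
    predictions: dict[str, bool],  # shipment_id -> warning was issued
    outcomes: dict[str, bool],     # shipment_id -> delivery was late (arrived labels only)
    min_coverage: float,
) -> dict:
    """Report quality only once enough predictions have a matching outcome."""
    labeled = [shipment for shipment in predictions if shipment in outcomes]
    coverage = len(labeled) / len(predictions) if predictions else 0.0
    if not labeled or coverage < min_coverage:
        # Not enough outcomes yet: report "not ready" instead of a misleading score.
        return {"delayed_labels_ready": False, "coverage": coverage, "missed_late_rate": None}
    missed = sum(1 for shipment in labeled if outcomes[shipment] and not predictions[shipment])
    return {
        "delayed_labels_ready": True,
        "coverage": coverage,
        "missed_late_rate": missed / len(labeled),
    }

predictions = {"s1": True, "s2": False, "s3": False, "s4": True}
early_outcomes = {"s1": True}
full_outcomes = {"s1": True, "s2": True, "s3": False, "s4": False}

print(delayed_quality(predictions, early_outcomes, min_coverage=0.75))
print(delayed_quality(predictions, full_outcomes, min_coverage=0.75))
```

With one label out of four, the early window looks perfect on the single shipment it can score, which is exactly why it should report "not ready" rather than a metric.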

Practice: break the release controller

Rerun the examples after each mutation. Before reading the output, predict which receipt or alias changes.

1. Change the `delivery-risk-v1` constructor's `previous_release` from `"delivery-risk-v0"` to `"delivery-risk-v-missing"`.
2. Set `good_gates["critical_slice_pass"]` back to `False`.
3. Pass `[day_seven, first_hour]` to one `evaluate_canary` call. Confirm that the receipt records `window_order_invalid`.
4. Change the `day_seven` constructor's `delayed_labels_ready` argument from `True` to `False`.
5. Change the `day_seven` constructor's `late_warning_cost_delta` argument from `-0.08` to `0.05`.
6. Replace `ready_receipt.receipt_id` with `"canary-receipt-missing"` in the approved promotion call.
7. Before approved promotion, set `aliases["production"] = "delivery-risk-v1"` to simulate an out-of-band alias move. Confirm that promotion holds, then reset it to `"delivery-risk-v0"`.
8. Change the `degraded` constructor's `late_warning_cost_delta` argument from `0.14` to `-0.01`.
9. Change the `degraded` constructor's release ID from `"delivery-risk-v1"` to `"delivery-risk-v0"`. Confirm that monitoring evidence for a different release can't move the current alias.
10. Replace `degraded_receipt.receipt_id` with `"production-receipt-missing"` in the rollback call. Which authorization check holds the alias?

Submission checklist

Artifact	Acceptance condition
release schema	identifies data, features, model, action policy, promotion policy, and previous release
registry	contains immutable stable and candidate releases
append-only receipts	record why each candidate passed, held, aborted, promoted, or rolled back
alias promotion code	fetches stored receipt and rechecks production before moving alias
monitor policy	defines the conditions for canary hold, promotion, and abort, and for production rollback
tests	execute empty-window, failed-gate, fabricated-receipt, alias-race, and rollback paths

Evaluation rubric

Artifact	Strong submission demonstrates
reproducible run	versioned data, features, model artifact, threshold policy, and evaluation evidence
controlled promotion	candidate alias, automated gates, canary criteria, and explicit production move
recovery	monitoring tied to actions, rollback trigger, and deployable prior release

Common failures

Symptom	Cause	Fix
Retrain job changes behavior with no review	training and promotion merged	separate candidate registry from aliases
Rollback restores weights but not threshold	policy omitted from release bundle	version complete release tuple
Empty canary window crashes controller	latest observation indexed before evidence exists	return a stored hold receipt for missing evidence
Canary promotes before outcomes exist	only latency checked	require delayed quality window
Fabricated `ready` dictionary promotes candidate	alias mover trusts caller-owned state	fetch immutable receipt by stored ID
Production changed after canary opened	rollback base checked only once	recheck current alias before promotion
Old monitoring window rolls back current release	rollback trusts reasons without release identity	verify monitor window matches production alias
Caller fabricates rollback reasons	alias mover accepts an unregistered reason list	evaluate monitoring once, store receipt, and fetch it by ID
Rollback points at an unknown state	candidate omits previous release	verify rollback target before opening canary traffic
Production changes but incident review has no history	alias moves aren't audited	append promotion and rollback events

Practice answer sketches

Why does a mismatched previous_release block canary traffic?

What happens when the critical offline slice fails?

Why do missing delayed labels and a positive cost delta produce different canary decisions?

Why can't promotion accept any dictionary whose decision field says ready_for_promotion?

Why does rollback fetch a stored production receipt instead of accepting caller-supplied reasons?

What happens when the degraded production delta becomes -0.01?

Mastery check

Why is continuous training allowed to create a candidate but not to replace production automatically?

What makes the release tuple more useful than storing model weights alone?

Why may a canary need to wait for delayed labels?
