
👁️ Hard · Multimodal Models

Capstone: Image Damage Classifier

Ship a damaged-package photo triage service with quality checks, slice evaluation, serving bundles, and review monitoring.


Learning path

Step 82 of 158 in the full curriculum


Shipment rows, ranked items, and warehouse time series all fit into tables. A customer return adds a new input type: a photo of a package that may be crushed, torn, blurred, dark, or unrelated to the order.

Earlier, you traced a convolutional neural network (CNN) over a damaged-package image patch. This capstone turns that spatial reasoning into a product: an image triage endpoint that flags likely visible damage, rejects unusable photos, preserves evidence for human review, and never turns an uncertain image score directly into a refund.

Figure: Quality-first routing evidence for seven damaged-package test photos. A brightness-versus-blur chart marks the usable region (brightness at least 0.20, blur at most 0.45). R-406-a has a high damage score of 0.94 but is too dark, and R-409-a scores 0.81 but shows no visible package, so both request a new photo. Usable R-404-a scores 0.83 and enters priority review. Four guideline-matched specialist outcomes form a case-level confusion matrix with one true positive, one false positive, one false negative, and one true negative, giving precision and recall of 0.50. The customer-phone recapture rate is 1 of 4 and the warehouse-camera rate is 1 of 3; all 19 gates pass, and damage-cnn-v1 advances only to specialist shadow review, with damage-cnn-v0 as rollback. High damage scores don't override dark or missing-package evidence; only quality-passing photos enter case-level review metrics and a versioned shadow receipt.

Define the photo decision first

ShopFlow receives return photos from customers and warehouse intake stations. The useful product question isn't "does the model recognize every defect?" It's: which photo should a specialist inspect first, and when is the photo too weak to support any decision?

Use three operational outcomes:

Action	Evidence	Product behavior
`request_new_photo`	image is too blurred, dark, or incomplete	ask for a clearer upload before assessing damage
`normal_review`	usable image, low damage score	keep ordinary return workflow
`priority_damage_review`	usable image, high damage score	surface to specialist with photo and score trace

The classifier isn't a refund policy. Product eligibility still depends on order ownership, item type, return window, and specialist judgment. This separation keeps a shadow or reflection that fools the model from triggering a costly action on its own.

A model card should state the intended use, input constraints, decision threshold, evaluated slices, and known failure cases. Model cards were proposed as structured reports for exactly this kind of deployed-model context: users need a metric's operating conditions, not just the metric.[1]
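A model card can live next to the serving bundle as structured data, so a release check can refuse a card with empty sections. This is a minimal sketch, assuming a plain frozen dataclass; the ModelCard fields and the missing_sections helper are hypothetical names, not a schema defined by the Model Cards paper. The values come from this capstone's bundle and text.

```python
from dataclasses import dataclass
import json

# Hypothetical model-card record; field names are illustrative, not a standard schema.
@dataclass(frozen=True)
class ModelCard:
    model_id: str
    intended_use: str
    out_of_scope_uses: tuple[str, ...]
    input_constraints: dict[str, float]
    decision_threshold: float
    evaluated_slices: tuple[str, ...]
    known_failure_cases: tuple[str, ...]

    def missing_sections(self) -> list[str]:
        """Names of sections left empty, so a release check can block the card."""
        return [name for name, value in vars(self).items() if value in ("", (), {}, None)]

card = ModelCard(
    model_id="damage-cnn-v1",
    intended_use="rank usable return photos for specialist damage review",
    out_of_scope_uses=("issuing refunds", "judging order ownership or return eligibility"),
    input_constraints={"max_blur": 0.45, "min_brightness": 0.20},
    decision_threshold=0.70,
    evaluated_slices=("source", "lighting", "packaging", "defect_size"),
    known_failure_cases=("small tears in dark phone photos",),
)

print(card.missing_sections())  # []
print(json.dumps(vars(card), indent=2))
```

Keeping "issuing refunds" in out_of_scope_uses writes the product boundary from the previous paragraph into the artifact that travels with the model.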

Diagram: Photo manifest (shipment groups + time), Quality check (visible + blur + light), Damage scorer (versioned threshold), and Immutable route trace (policy + thresholds).

Build a dataset that can't leak

For tabular models, leakage may be a future delivery timestamp. For photos, leakage often hides in nearly identical pixels. A customer may upload a burst of three photos of the same crushed box. A warehouse may photograph one parcel from four angles. If related images land in both the train and test sets, the model can memorize one package instead of generalizing to new damage, and the holdout score overstates real performance.

Your manifest should contain:

Field	Why it matters
`case_id` and `capture_day`	group all photos for one physical package and preserve time ordering
`source`	separate customer phone uploads from warehouse inspection cameras
`quality_label`	distinguish unusable evidence from visible damage
`damage_label`	record specialist-confirmed visible damage only on usable photos
`split`	hold out later shipments, never random photos from the same case
`reviewer_id` and `guideline_version`	audit disagreement or changed label definitions

At minimum, evaluate these slices: daylight versus dark uploads, customer versus warehouse source, packaging type, and visible-defect size. A global score can hide exactly the failure that matters: small tears that disappear in dark phone images.

Use the CNN from the earlier lesson as a baseline. Fine-tune a pretrained image encoder only if you record its preprocessing and measure it on the same split. A later deep-dive explains Vision Transformer image encoders; this capstone doesn't require that architecture.[2]
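The same-split requirement can be checked mechanically before any metric comparison. This is a minimal sketch, assuming each offline evaluation is stored as a small record; EvalRun, comparison_blockers, the manifest id split-manifest-v1, and the model id pretrained-encoder-ft are hypothetical names. The candidate below omits its preprocessing recipe and skips one test photo, so the comparison is blocked.

```python
from dataclasses import dataclass

# Hypothetical record of one offline evaluation run; field names are illustrative.
@dataclass(frozen=True)
class EvalRun:
    model_id: str
    preprocessing: str                 # recipe id for resize, crop, and normalization
    split_manifest: str                # id of the frozen split manifest used
    evaluated_photo_ids: frozenset[str]

def comparison_blockers(baseline: EvalRun, candidate: EvalRun) -> list[str]:
    """Reasons the two runs can't be compared head to head; empty means comparable."""
    blockers = []
    if not candidate.preprocessing:
        blockers.append("candidate preprocessing not recorded")
    if candidate.split_manifest != baseline.split_manifest:
        blockers.append("runs used different split manifests")
    if candidate.evaluated_photo_ids != baseline.evaluated_photo_ids:
        blockers.append("runs scored different photos")
    return blockers

test_ids = frozenset({"R-404-a", "R-404-b", "R-405-a", "R-406-a", "R-407-a", "R-408-a", "R-409-a"})
cnn = EvalRun("damage-cnn-v1", "parcel-rgb-224-center-crop-v1", "split-manifest-v1", test_ids)
encoder = EvalRun("pretrained-encoder-ft", "", "split-manifest-v1", test_ids - {"R-409-a"})

print(comparison_blockers(cnn, encoder))
# ['candidate preprocessing not recorded', 'runs scored different photos']
```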

Freeze a grouped photo manifest

Start with the dataset boundary. Multiple photos from one physical package belong in one split even when filenames differ. Keep time direction intact too: later return cases should test a model trained on earlier cases.

The local manifest below is small enough to inspect line by line. This historical evaluation fixture contains frozen specialist labels, reviewer IDs, and guideline versions for dataset audit. Damage labels appear only when a specialist judged the image usable. Unusable photos keep confirmed_damage=None because poor evidence shouldn't become supervision. Candidate scores arrive in a separate object, so route code can't quietly read labels as model output.

01-freeze-grouped-photo-manifest.py

from collections import Counter, defaultdict
from dataclasses import dataclass
import json

@dataclass(frozen=True)
class PhotoManifestRow:
    case_id: str
    photo_id: str
    capture_day: int
    split: str
    source: str
    packaging: str
    quality_label: str
    confirmed_damage: bool | None
    reviewer_id: str
    guideline_version: str

PHOTOS = [
    PhotoManifestRow("R-401", "R-401-a", 1, "train", "customer_phone", "corrugated", "usable", True, "S-12", "visible-damage-v1"),
    PhotoManifestRow("R-401", "R-401-b", 2, "train", "customer_phone", "corrugated", "usable", True, "S-12", "visible-damage-v1"),
    PhotoManifestRow("R-402", "R-402-a", 3, "validation", "customer_phone", "corrugated", "unusable", None, "S-08", "visible-damage-v1"),
    PhotoManifestRow("R-403", "R-403-a", 4, "validation", "warehouse_camera", "mailer", "usable", False, "S-08", "visible-damage-v1"),
    PhotoManifestRow("R-404", "R-404-a", 5, "test", "customer_phone", "mailer", "usable", True, "S-12", "visible-damage-v1"),
    PhotoManifestRow("R-404", "R-404-b", 6, "test", "customer_phone", "mailer", "usable", True, "S-12", "visible-damage-v1"),
    PhotoManifestRow("R-405", "R-405-a", 7, "test", "warehouse_camera", "corrugated", "usable", False, "S-08", "visible-damage-v1"),
    PhotoManifestRow("R-406", "R-406-a", 8, "test", "customer_phone", "corrugated", "unusable", None, "S-12", "visible-damage-v1"),
    PhotoManifestRow("R-407", "R-407-a", 9, "test", "customer_phone", "corrugated", "usable", False, "S-12", "visible-damage-v1"),
    PhotoManifestRow("R-408", "R-408-a", 10, "test", "warehouse_camera", "corrugated", "usable", True, "S-08", "visible-damage-v1"),
    PhotoManifestRow("R-409", "R-409-a", 11, "test", "warehouse_camera", "corrugated", "unusable", None, "S-08", "visible-damage-v1"),
]

print("photos:", len(PHOTOS))
print("cases:", len({photo.case_id for photo in PHOTOS}))

Output

photos: 11
cases: 9

02-audit-grouped-split-manifest.py

splits_by_case = defaultdict(set)
for photo in PHOTOS:
    splits_by_case[photo.case_id].add(photo.split)

split_days = {
    split: [photo.capture_day for photo in PHOTOS if photo.split == split]
    for split in ("train", "validation", "test")
}
manifest_checks = {
    "case_groups_do_not_cross_splits": all(len(splits) == 1 for splits in splits_by_case.values()),
    "time_ordered_splits": (
        all(split_days.values())
        and max(split_days["train"]) < min(split_days["validation"])
        and max(split_days["validation"]) < min(split_days["test"])
    ),
    "usable_labels_complete": all(photo.confirmed_damage is not None for photo in PHOTOS if photo.quality_label == "usable"),
    "unusable_labels_abstain": all(photo.confirmed_damage is None for photo in PHOTOS if photo.quality_label == "unusable"),
    "label_lineage_recorded": all(photo.reviewer_id and photo.guideline_version for photo in PHOTOS),
}

print("split photo counts:", dict(Counter(photo.split for photo in PHOTOS)))
print("R-401 splits:", sorted(splits_by_case["R-401"]))
print(json.dumps(manifest_checks, indent=2))

Output

split photo counts: {'train': 2, 'validation': 2, 'test': 7}
R-401 splits: ['train']
{
  "case_groups_do_not_cross_splits": true,
  "time_ordered_splits": true,
  "usable_labels_complete": true,
  "unusable_labels_abstain": true,
  "label_lineage_recorded": true
}

The fixture uses explicit splits so the invariant stays visible. A larger pipeline can use a group-aware splitter, then freeze and audit the resulting manifest. The claim isn't that one splitter solves every dataset; it's that two invariants must hold: no physical case crosses an evaluation boundary, and later cases stay in later splits.
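A minimal sketch of such a splitter, assuming day cutoffs for validation and test. Every photo is mapped to a split by its capture day, and a case is assigned only if all of its photos fall in the same split. Cases that straddle a cutoff are held out for manual placement instead of being cut in two, so both invariants hold for every assigned photo. The function name is hypothetical; the rows are the fixture manifest's (case_id, photo_id, capture_day) values.

```python
from collections import defaultdict

# (case_id, photo_id, capture_day) rows taken from the fixture manifest.
MANIFEST = [
    ("R-401", "R-401-a", 1), ("R-401", "R-401-b", 2),
    ("R-402", "R-402-a", 3), ("R-403", "R-403-a", 4),
    ("R-404", "R-404-a", 5), ("R-404", "R-404-b", 6),
    ("R-405", "R-405-a", 7), ("R-406", "R-406-a", 8),
    ("R-407", "R-407-a", 9), ("R-408", "R-408-a", 10),
    ("R-409", "R-409-a", 11),
]

def time_grouped_split(rows: list[tuple[str, str, int]], validation_start: int, test_start: int):
    """Assign whole cases to time-ordered splits.

    Returns (photo_id -> split, sorted case_ids that straddle a cutoff).
    Straddling cases are held out, so no case crosses splits and every
    train day precedes every validation day, which precedes every test day.
    """
    def split_for(day: int) -> str:
        if day < validation_start:
            return "train"
        return "validation" if day < test_start else "test"

    photos_by_case = defaultdict(list)
    for case_id, photo_id, day in rows:
        photos_by_case[case_id].append((photo_id, day))

    assignment, straddling = {}, []
    for case_id, photos in photos_by_case.items():
        splits = {split_for(day) for _, day in photos}
        if len(splits) > 1:
            straddling.append(case_id)  # hold out for manual placement or exclusion
            continue
        (split,) = splits
        for photo_id, _ in photos:
            assignment[photo_id] = split
    return assignment, sorted(straddling)

assignment, straddling = time_grouped_split(MANIFEST, validation_start=3, test_start=5)
print(assignment["R-401-b"], assignment["R-403-a"], assignment["R-409-a"], straddling)
# train validation test []
```

With cutoffs at days 3 and 5, the sketch reproduces the fixture's explicit splits. Adding a hypothetical case photographed on days 4 and 5 would return that case in the straddling list rather than silently splitting it.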

Encode quality-first review traces

The model endpoint should receive a preprocessing result and a damage score, then choose a review route. Quality checks run before the damage threshold. Otherwise a confidently scored blur, dark frame, or unrelated object can create an unsupported escalation.

The score fixture below is separate from the manifest. Labels stay available for offline evaluation, but the router never reads them. A production system may run its cheap quality checks before an expensive damage model; this compact fixture keeps both outputs so you can test that an unsupported high damage score still abstains.

03-load-model-output-bundle.py

@dataclass(frozen=True)
class ModelOutput:
    damage_probability: float
    blur_score: float
    brightness: float
    box_visible: bool

MODEL_OUTPUTS = {
    "R-401-a": ModelOutput(0.91, 0.12, 0.66, True),
    "R-401-b": ModelOutput(0.88, 0.10, 0.70, True),
    "R-402-a": ModelOutput(0.93, 0.71, 0.51, True),
    "R-403-a": ModelOutput(0.18, 0.08, 0.75, True),
    "R-404-a": ModelOutput(0.83, 0.10, 0.64, True),
    "R-404-b": ModelOutput(0.79, 0.14, 0.61, True),
    "R-405-a": ModelOutput(0.32, 0.11, 0.68, True),
    "R-406-a": ModelOutput(0.94, 0.12, 0.12, True),
    "R-407-a": ModelOutput(0.73, 0.09, 0.65, True),
    "R-408-a": ModelOutput(0.63, 0.07, 0.72, True),
    "R-409-a": ModelOutput(0.81, 0.08, 0.74, False),
}

BUNDLE = {
    "bundle_id": "damage-cnn-v1",
    "previous_bundle": "damage-cnn-v0",
    "preprocessing": "parcel-rgb-224-center-crop-v1",
    "label_guideline": "visible-damage-v1",
    "route_policy": "quality-first-review-v1",
    "damage_threshold": 0.70,
    "max_blur": 0.45,
    "min_brightness": 0.20,
}

print("bundle:", BUNDLE["bundle_id"], "threshold=", BUNDLE["damage_threshold"])

04-quality-first-route.py

def route(photo: PhotoManifestRow, model_output: ModelOutput) -> dict[str, object]:
    if not model_output.box_visible:
        action, reason = "request_new_photo", "package_not_visible"
    elif model_output.blur_score > BUNDLE["max_blur"] or model_output.brightness < BUNDLE["min_brightness"]:
        action, reason = "request_new_photo", "image_quality_gate"
    elif model_output.damage_probability >= BUNDLE["damage_threshold"]:
        action, reason = "priority_damage_review", "damage_threshold"
    else:
        action, reason = "normal_review", "below_threshold"

    return {
        "route_id": f"{BUNDLE['bundle_id']}:{photo.photo_id}",
        "photo_id": photo.photo_id,
        "case_id": photo.case_id,
        "routed_day": photo.capture_day,
        "source": photo.source,
        "packaging": photo.packaging,
        "bundle_id": BUNDLE["bundle_id"],
        "previous_bundle": BUNDLE["previous_bundle"],
        "preprocessing": BUNDLE["preprocessing"],
        "label_guideline": BUNDLE["label_guideline"],
        "route_policy": BUNDLE["route_policy"],
        "damage_threshold": BUNDLE["damage_threshold"],
        "max_blur": BUNDLE["max_blur"],
        "min_brightness": BUNDLE["min_brightness"],
        "damage_probability": model_output.damage_probability,
        "blur_score": model_output.blur_score,
        "brightness": model_output.brightness,
        "box_visible": model_output.box_visible,
        "action": action,
        "reason": reason,
    }

print("route keys:", len(route(PHOTOS[0], MODEL_OUTPUTS["R-401-a"])))

05-route-test-photos.py

test_photos = [photo for photo in PHOTOS if photo.split == "test"]
photo_by_id = {photo.photo_id: photo for photo in PHOTOS}
traces = [route(photo, MODEL_OUTPUTS[photo.photo_id]) for photo in test_photos]
trace_by_photo = {trace["photo_id"]: trace for trace in traces}
unsafe_priority_routes = [
    trace["photo_id"]
    for photo, trace in zip(test_photos, traces)
    if trace["action"] == "priority_damage_review" and photo.quality_label != "usable"
]

for photo_id in ("R-404-a", "R-406-a", "R-409-a"):
    trace = trace_by_photo[photo_id]
    print(photo_id, trace["action"], trace["reason"], f"score={trace['damage_probability']}")
print("route counts:", dict(Counter(trace["action"] for trace in traces)))
print("unsafe priority routes:", unsafe_priority_routes)

Output

R-404-a priority_damage_review damage_threshold score=0.83
R-406-a request_new_photo image_quality_gate score=0.94
R-409-a request_new_photo package_not_visible score=0.81
route counts: {'priority_damage_review': 3, 'normal_review': 2, 'request_new_photo': 2}
unsafe priority routes: []

Photos R-406-a and R-409-a are the important failure tests. Both have high damage scores (0.94 and 0.81), yet neither counts as usable evidence because a quality check fails first: R-406-a is too dark and R-409-a shows no package. The endpoint asks for another photo instead of escalating an unsupported claim.
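A regression test can pin this behavior for every possible score, not just the two fixture photos. This is a minimal sketch of what a file like test_blurry_photo_never_escalates.py might contain, assuming a standalone copy of the routing rule with the bundle's thresholds; quality_first_action is a hypothetical helper, not the endpoint's route function.

```python
# Hypothetical standalone copy of the quality-first rule; thresholds mirror BUNDLE.
MAX_BLUR, MIN_BRIGHTNESS, DAMAGE_THRESHOLD = 0.45, 0.20, 0.70

def quality_first_action(damage_probability: float, blur_score: float,
                         brightness: float, box_visible: bool) -> str:
    # Quality gates run first, so no damage score can bypass them.
    if not box_visible or blur_score > MAX_BLUR or brightness < MIN_BRIGHTNESS:
        return "request_new_photo"
    if damage_probability >= DAMAGE_THRESHOLD:
        return "priority_damage_review"
    return "normal_review"

def test_blurry_photo_never_escalates() -> None:
    for step in range(21):
        damage = step / 20  # sweep 0.00, 0.05, ..., 1.00
        assert quality_first_action(damage, 0.46, 0.66, True) == "request_new_photo"   # blurred
        assert quality_first_action(damage, 0.12, 0.19, True) == "request_new_photo"   # dark
        assert quality_first_action(damage, 0.12, 0.66, False) == "request_new_photo"  # no package

test_blurry_photo_never_escalates()
print(quality_first_action(0.94, 0.12, 0.12, True))  # R-406-a values -> request_new_photo
```

Sweeping the whole score range, rather than testing one high score, guards against a future edit that lets very confident scores skip the gate.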

Package the vision service

Submit an inspectable repository, not a notebook screenshot:


damage-vision-service/
  data/
    label_guidelines.md
    photo_manifest.parquet
    split_manifest.json
  model/
    train_cnn_baseline.py
    evaluate_slices.py
    model_card.md
  service/
    preprocess.py
    route_review.py
    trace_schema.json
  monitoring/
    input_quality_report.py
    delayed_review_outcomes.py
    specialist_shadow_receipt.py
  tests/
    test_case_groups_do_not_cross_splits.py
    test_blurry_photo_never_escalates.py
    test_later_outcomes_join_routes.py
    test_empty_review_window_holds.py
    test_route_trace_is_versioned.py
    test_previous_bundle_required.py

The serving bundle must pin image resize and crop behavior, color normalization, model weights, label version, route policy, damage threshold, quality-gate thresholds, and previous-bundle pointer. A change from center crop to full-frame resize may change whether a torn corner remains visible; it's a model behavior change even when weights remain constant.
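The pin list is easiest to enforce when the bundle gets an identifier derived from everything it pins. This is a minimal sketch, assuming the bundle is a JSON-serializable dict: hash a canonical JSON encoding with SHA-256 so that any pinned change, including a crop change with identical weights, produces a new fingerprint. The setting names and values are hypothetical, and the weights digest is a placeholder rather than a real checksum.

```python
import hashlib
import json

def bundle_fingerprint(bundle: dict) -> str:
    """Short content hash of a serving bundle; any pinned setting change yields a new id."""
    # sort_keys makes the hash independent of dict insertion order.
    canonical = json.dumps(bundle, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

# Hypothetical pinned settings; the weights digest is a placeholder, not a real checksum.
center_crop = {
    "weights_sha256": "placeholder-weights-digest",
    "image_size": 224,
    "crop": "center",
    "color_normalization": "per-channel-mean-std-v1",
    "label_guideline": "visible-damage-v1",
    "route_policy": "quality-first-review-v1",
    "damage_threshold": 0.70,
    "max_blur": 0.45,
    "min_brightness": 0.20,
    "previous_bundle": "damage-cnn-v0",
}
# Same weights, different crop: a behavior change that must get a new bundle id.
full_frame = {**center_crop, "crop": "full_frame_resize"}

print(bundle_fingerprint(center_crop) != bundle_fingerprint(full_frame))  # True
```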

The routing code emits this trace shape directly, so a reviewer can reconstruct any route from its fields:

Response field	Example
bundle and preprocessing	`damage-cnn-v1`, `parcel-rgb-224-center-crop-v1`
rollback and label contract	`damage-cnn-v0`, `visible-damage-v1`
quality values	blur `0.12`, brightness `0.66`, box visible `true`
score and action policy	damage `0.91`, threshold `0.70`, blur maximum `0.45`, brightness minimum `0.20`
route	`priority_damage_review`
human outcome later	`confirmed_damage` or `not_supported`
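One way to confirm that a trace really lets a reviewer reconstruct the route is to replay the decision from the stored fields alone. This sketch assumes the quality-first-review-v1 rules from the routing code; replay_action and replay_mismatches are hypothetical helpers, and the example trace carries only the subset of fields the rule needs, using the example values from the table above.

```python
def replay_action(trace: dict) -> str:
    """Recompute the route using only values stored in the trace itself."""
    if not trace["box_visible"]:
        return "request_new_photo"
    if trace["blur_score"] > trace["max_blur"] or trace["brightness"] < trace["min_brightness"]:
        return "request_new_photo"
    if trace["damage_probability"] >= trace["damage_threshold"]:
        return "priority_damage_review"
    return "normal_review"

def replay_mismatches(traces: list[dict]) -> list[str]:
    """Route ids whose recorded action disagrees with a replay of their own fields."""
    return [t["route_id"] for t in traces if replay_action(t) != t["action"]]

trace = {
    "route_id": "damage-cnn-v1:R-401-a",
    "damage_probability": 0.91, "blur_score": 0.12, "brightness": 0.66, "box_visible": True,
    "damage_threshold": 0.70, "max_blur": 0.45, "min_brightness": 0.20,
    "action": "priority_damage_review",
}
tampered = {**trace, "route_id": "tampered", "action": "normal_review"}

print(replay_mismatches([trace, tampered]))  # ['tampered']
```

Because the thresholds travel inside the trace, replay doesn't depend on whichever bundle is live today. A fuller replayer would dispatch on route_policy rather than hard-coding one rule.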

Join delayed outcomes before promotion

Photo models drift when the image source changes. A new warehouse camera, winter lighting, a mobile upload compressor, or new packaging graphics can alter pixels before a confirmed-damage label exists.

Separate immediate checks from delayed quality:

Window	Monitor	Trigger
immediate	unreadable image rate, brightness, blur, missing package, latency	investigate capture path or fail to manual intake
delayed	specialist-confirmed precision, missed visible damage, route rate by source and packaging	hold promotion or create retraining candidate
safety review	unsupported escalations, policy actions attempted without specialist approval	rollback and audit workflow

Google Cloud's MLOps guidance treats serving, monitoring, validation, metadata, and continuous training as connected stages rather than a one-time deploy step.[3] Apply that same discipline here: a change in image quality creates an investigation or candidate run, never an automatic production replacement.
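The immediate window can be automated without touching promotion. This is a minimal sketch, assuming per-source counts of request_new_photo routes over a monitoring window: compare each rate with a baseline plus a tolerance and emit an investigation signal, never a bundle swap. The 0.10 tolerance and the window counts are hypothetical; the 0.25 and 0.333 baselines echo the fixture's per-source request rates computed later in this capstone, purely for illustration, since rates from seven photos are far too small to calibrate a real alert.

```python
def immediate_quality_alerts(baseline_rates: dict[str, float],
                             window_counts: dict[str, tuple[int, int]],
                             tolerance: float = 0.10) -> dict[str, str]:
    """Flag sources whose request_new_photo rate rose above baseline + tolerance.

    window_counts maps source -> (request_new_photo routes, total photos).
    Alerts only open an investigation; they never replace the production bundle.
    """
    alerts = {}
    for source, (requests, photos) in sorted(window_counts.items()):
        if photos == 0:
            alerts[source] = "no_traffic"  # a silent capture path is also a failure
            continue
        rate = requests / photos
        if rate > baseline_rates.get(source, 0.0) + tolerance:
            alerts[source] = f"investigate_capture_path rate={rate:.3f}"
    return alerts

baseline = {"customer_phone": 0.25, "warehouse_camera": 0.333}
window = {"customer_phone": (30, 100), "warehouse_camera": (25, 50)}  # hypothetical counts

print(immediate_quality_alerts(baseline, window))
# {'warehouse_camera': 'investigate_capture_path rate=0.500'}
```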

The final local receipt appends specialist outcomes only after routing, then joins each outcome to a physical case. Each delayed outcome also carries its reviewer and label guideline, so precision and recall can't silently mix definitions. The receipt measures review precision and recall once per usable test case, records immediate request rates by source, checks that unusable photos abstain, and keeps the previous bundle beside the candidate. When one package has multiple usable photos, the case escalates if any usable photo crosses the priority threshold. These tiny counts teach the contract; production promotion still needs larger slices and shadow traffic.

06-append-specialist-outcomes.py

@dataclass(frozen=True)
class SpecialistOutcome:
    case_id: str
    reviewed_day: int
    confirmed_damage: bool
    reviewer_id: str
    guideline_version: str

SPECIALIST_OUTCOMES = [
    SpecialistOutcome("R-404", 12, True, "S-12", "visible-damage-v1"),
    SpecialistOutcome("R-405", 12, False, "S-08", "visible-damage-v1"),
    SpecialistOutcome("R-407", 12, False, "S-12", "visible-damage-v1"),
    SpecialistOutcome("R-408", 12, True, "S-08", "visible-damage-v1"),
]

usable_test_case_ids = {
    photo.case_id
    for photo in test_photos
    if photo.quality_label == "usable"
}
traces_by_case = defaultdict(list)
for trace in traces:
    traces_by_case[trace["case_id"]].append(trace)

print("usable test cases:", sorted(usable_test_case_ids))
print("specialist outcomes:", len(SPECIALIST_OUTCOMES))

07-aggregate-case-actions.py

def aggregate_usable_case_action(case_id: str) -> str:
    usable_traces = [
        trace
        for trace in traces_by_case[case_id]
        if photo_by_id[trace["photo_id"]].quality_label == "usable"
    ]
    if not usable_traces:
        raise ValueError(f"no usable traces for {case_id}")
    if any(trace["action"] == "priority_damage_review" for trace in usable_traces):
        return "priority_damage_review"
    return "normal_review"

case_actions = {
    case_id: aggregate_usable_case_action(case_id)
    for case_id in sorted(usable_test_case_ids)
}
outcome_by_case = {outcome.case_id: outcome for outcome in SPECIALIST_OUTCOMES}
joined_outcomes = [
    (case_id, case_actions[case_id], outcome_by_case[case_id].confirmed_damage)
    for case_id in sorted(case_actions.keys() & outcome_by_case.keys())
]

print("case_actions:", case_actions)
print("joined usable cases:", len(joined_outcomes))

08-measure-delayed-quality.py

def rate_or_none(numerator: int, denominator: int) -> float | None:
    return round(numerator / denominator, 3) if denominator else None

true_positives = sum(
    action == "priority_damage_review" and confirmed_damage
    for _, action, confirmed_damage in joined_outcomes
)
false_positives = sum(
    action == "priority_damage_review" and not confirmed_damage
    for _, action, confirmed_damage in joined_outcomes
)
false_negatives = sum(
    action != "priority_damage_review" and confirmed_damage
    for _, action, confirmed_damage in joined_outcomes
)
delayed_quality = {
    "priority_precision": rate_or_none(true_positives, true_positives + false_positives),
    "priority_recall": rate_or_none(true_positives, true_positives + false_negatives),
    "reviewed_usable_cases": len(joined_outcomes),
}

source_slices = {
    source: {
        "photos": len(rows),
        "request_new_photo_rate": round(
            sum(trace["action"] == "request_new_photo" for trace in rows) / len(rows),
            3,
        ),
    }
    for source in sorted({trace["source"] for trace in traces})
    for rows in [[trace for trace in traces if trace["source"] == source]]
}

print("delayed_quality:", delayed_quality)
print("source_slices:", source_slices)

09-release-gate-checklist.py

required_trace_fields = {
    "route_id", "photo_id", "case_id", "routed_day", "source", "packaging",
    "bundle_id", "previous_bundle", "preprocessing", "label_guideline",
    "route_policy", "damage_threshold", "max_blur", "min_brightness",
    "damage_probability", "blur_score", "brightness", "box_visible",
    "action", "reason",
}
outcome_case_ids = [outcome.case_id for outcome in SPECIALIST_OUTCOMES]
release_gates = {
    **manifest_checks,
    "route_ids_unique": len({trace["route_id"] for trace in traces}) == len(traces),
    "all_test_photos_routed": len(traces) == len(test_photos),
    "unusable_photos_abstain": all(
        trace_by_photo[photo.photo_id]["action"] == "request_new_photo"
        for photo in test_photos
        if photo.quality_label == "unusable"
    ),
    "unsafe_priority_routes_absent": not unsafe_priority_routes,
    "route_traces_replayable": all(required_trace_fields <= trace.keys() for trace in traces),
    "specialist_outcome_case_ids_unique": len(set(outcome_case_ids)) == len(outcome_case_ids),
    "specialist_outcomes_join_usable_test_cases": set(outcome_case_ids) <= usable_test_case_ids,
    "usable_test_cases_have_specialist_outcomes": usable_test_case_ids <= set(outcome_case_ids),
    "specialist_outcomes_arrive_after_capture": all(
        outcome.reviewed_day > max(
            photo.capture_day for photo in test_photos if photo.case_id == outcome.case_id
        )
        for outcome in SPECIALIST_OUTCOMES
        if outcome.case_id in usable_test_case_ids
    ),
    "specialist_outcome_label_contract_matches_bundle": all(
        outcome.reviewer_id
        and outcome.guideline_version == BUNDLE["label_guideline"]
        for outcome in SPECIALIST_OUTCOMES
    ),
    "priority_precision_evidence_at_least_0_50": (
        delayed_quality["priority_precision"] is not None
        and delayed_quality["priority_precision"] >= 0.50
    ),
    "priority_recall_evidence_at_least_0_50": (
        delayed_quality["priority_recall"] is not None
        and delayed_quality["priority_recall"] >= 0.50
    ),
    "source_slices_recorded": set(source_slices) == {"customer_phone", "warehouse_camera"},
    "rollback_pointer_recorded": bool(BUNDLE["previous_bundle"]),
}

print("release_gates_pass:", all(release_gates.values()))

10-publish-specialist-shadow-receipt.py

receipt = {
    "candidate_bundle": BUNDLE["bundle_id"],
    "previous_bundle": BUNDLE["previous_bundle"],
    "preprocessing": BUNDLE["preprocessing"],
    "label_guideline": BUNDLE["label_guideline"],
    "route_policy": BUNDLE["route_policy"],
    "route_traces": len(traces),
    "later_specialist_outcomes": len(SPECIALIST_OUTCOMES),
    "joined_usable_cases": len(joined_outcomes),
    "case_actions": case_actions,
    "delayed_quality": delayed_quality,
    "source_slices": source_slices,
    "release_gates": release_gates,
    "candidate_decision": "candidate_for_specialist_shadow_review" if all(release_gates.values()) else "hold",
}

print(json.dumps(receipt, indent=2))

Output

{
  "candidate_bundle": "damage-cnn-v1",
  "previous_bundle": "damage-cnn-v0",
  "preprocessing": "parcel-rgb-224-center-crop-v1",
  "label_guideline": "visible-damage-v1",
  "route_policy": "quality-first-review-v1",
  "route_traces": 7,
  "later_specialist_outcomes": 4,
  "joined_usable_cases": 4,
  "case_actions": {
    "R-404": "priority_damage_review",
    "R-405": "normal_review",
    "R-407": "priority_damage_review",
    "R-408": "normal_review"
  },
  "delayed_quality": {
    "priority_precision": 0.5,
    "priority_recall": 0.5,
    "reviewed_usable_cases": 4
  },
  "source_slices": {
    "customer_phone": {
      "photos": 4,
      "request_new_photo_rate": 0.25
    },
    "warehouse_camera": {
      "photos": 3,
      "request_new_photo_rate": 0.333
    }
  },
  "release_gates": {
    "case_groups_do_not_cross_splits": true,
    "time_ordered_splits": true,
    "usable_labels_complete": true,
    "unusable_labels_abstain": true,
    "label_lineage_recorded": true,
    "route_ids_unique": true,
    "all_test_photos_routed": true,
    "unusable_photos_abstain": true,
    "unsafe_priority_routes_absent": true,
    "route_traces_replayable": true,
    "specialist_outcome_case_ids_unique": true,
    "specialist_outcomes_join_usable_test_cases": true,
    "usable_test_cases_have_specialist_outcomes": true,
    "specialist_outcomes_arrive_after_capture": true,
    "specialist_outcome_label_contract_matches_bundle": true,
    "priority_precision_evidence_at_least_0_50": true,
    "priority_recall_evidence_at_least_0_50": true,
    "source_slices_recorded": true,
    "rollback_pointer_recorded": true
  },
  "candidate_decision": "candidate_for_specialist_shadow_review"
}

candidate_for_specialist_shadow_review is narrower than launch approval. The receipt says this frozen bundle deserves comparison beside current production routing. It doesn't claim that four usable local cases validate every camera, lighting condition, package type, or damage shape.

Practice: break the vision contract

Use the runnable examples as a release harness. Change one condition at a time, predict the failure, then rerun the examples.

Change R-401-b split from train to validation. Which dataset gate fails?
Change R-406-a brightness in MODEL_OUTPUTS from 0.12 to 0.30. Why does this expose a quality-detector failure rather than prove the escalation is safe?
Remove damage_threshold from the response trace. Which receipt gate fails?
Change R-408-a damage probability in MODEL_OUTPUTS from 0.63 to 0.75. Which delayed metrics improve?
Set BUNDLE["previous_bundle"] = "". Why is shadow evidence no longer promotion-ready?
Replace SPECIALIST_OUTCOMES with an empty list. Why do evidence gates fail instead of crashing or passing?
Add SpecialistOutcome("R-999", 12, True, "S-12", "visible-damage-v1"). Which join gate fails?
Change R-408 specialist outcome to guideline visible-damage-v2. Which provenance gate fails?


Mastery check

Evaluation rubric

Artifact	Strong submission demonstrates
dataset contract	case-grouped time split, quality labels, damage labels, and reviewed slices
service	versioned preprocessing and safe quality-first routing with abstention
operations	immutable route traces, joined delayed outcomes, model card, source slices, shadow review, and rollback

Common failures

Symptom	Cause	Fix
Holdout score is unrealistically high	photos from one case crossed splits	group by physical case and time
Blurry image triggers damage escalation	score evaluated before quality	gate evidence quality first
Specialist metric changes after a later edit	route trace was overwritten with outcome data	append specialist outcomes and join by case ID
Precision shifts after guideline update	delayed outcomes omit reviewer or label version	bind each outcome to reviewer and bundle label contract
Multi-photo case result depends on row order	case aggregation policy is implicit	define deterministic case-level priority routing
New camera changes decisions silently	preprocessing and source drift untracked	log source/quality slices and version bundle
Candidate advances without rollback target	receipt omits previous bundle pointer	publish immutable candidate and rollback pointer together
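The multi-photo row can be checked by permuting photo order: a deterministic case policy returns the same action for every ordering, while an implicit "last photo wins" rule doesn't. This is a minimal sketch, assuming the per-photo actions of one usable case are already known; both rule functions and the two-photo case are hypothetical.

```python
from itertools import permutations

def any_photo_rule(actions: list[str]) -> str:
    """Deterministic case policy: escalate if any usable photo was routed to priority."""
    return "priority_damage_review" if "priority_damage_review" in actions else "normal_review"

def last_photo_rule(actions: list[str]) -> str:
    """Implicit policy to avoid: whichever photo happens to come last decides."""
    return actions[-1]

# Hypothetical two-photo case with one high-scoring and one low-scoring usable photo.
usable_actions = ["priority_damage_review", "normal_review"]

print(sorted({any_photo_rule(list(order)) for order in permutations(usable_actions)}))
# ['priority_damage_review']
print(sorted({last_photo_rule(list(order)) for order in permutations(usable_actions)}))
# ['normal_review', 'priority_damage_review']
```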

Next Step

Continue to Capstone: Production ML Pipeline

You have shipped tabular, ranking, forecasting, and vision artifacts with their own action gates. Next you'll manage them under one validated promotion, monitoring, and rollback workflow.


References

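[1] Model Cards for Model Reporting. Mitchell, M., Wu, S., Zaldivar, A., et al. · 2019 · FAT* 2019. https://arxiv.org/abs/1810.03993

[2] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. Dosovitskiy, A., et al. · 2020 · ICLR 2021. https://arxiv.org/abs/2010.11929

[3] MLOps: Continuous Delivery and Automation Pipelines in Machine Learning. Google Cloud · 2026 · Official documentation. https://docs.cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning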

Discussion

Questions and insights from fellow learners.

Discussion loads when you reach this section.

Back to Topics

LearnPortfolio CapstonesCapstone: Image Damage Classifier

👁️HardMultimodal Models

Capstone: Image Damage Classifier

Ship a damaged-package photo triage service with quality checks, slice evaluation, serving bundles, and review monitoring.

17 min read

Learning path

Step 82 of 158 in the full curriculum

Capstone: Demand Forecasting Capstone: Production ML Pipeline

Define the photo decision first

Use three operational outcomes:

Action	Evidence	Product behavior
`request_new_photo`	image is too blurred, dark, or incomplete	ask for a clearer upload before assessing damage
`normal_review`	usable image, low damage score	keep ordinary return workflow
`priority_damage_review`	usable image, high damage score	surface to specialist with photo and score trace

Build a dataset that can't leak

Your manifest should contain:

Field	Why it matters
`case_id` and `capture_day`	group all photos for one physical package and preserve time ordering
`source`	separate customer phone uploads from warehouse inspection cameras
`quality_label`	distinguish unusable evidence from visible damage
`damage_label`	record specialist-confirmed visible damage only on usable photos
`split`	hold out later shipments, never random photos from the same case
`reviewer_id` and `guideline_version`	audit disagreement or changed label definitions

Freeze a grouped photo manifest

01-freeze-grouped-photo-manifest.py

from collections import Counter, defaultdict
from dataclasses import dataclass
import json

@dataclass(frozen=True)
class PhotoManifestRow:
    case_id: str
    photo_id: str
    capture_day: int
    split: str
    source: str
    packaging: str
    quality_label: str
    confirmed_damage: bool | None
    reviewer_id: str
    guideline_version: str

PHOTOS = [
    PhotoManifestRow("R-401", "R-401-a", 1, "train", "customer_phone", "corrugated", "usable", True, "S-12", "visible-damage-v1"),
    PhotoManifestRow("R-401", "R-401-b", 2, "train", "customer_phone", "corrugated", "usable", True, "S-12", "visible-damage-v1"),
    PhotoManifestRow("R-402", "R-402-a", 3, "validation", "customer_phone", "corrugated", "unusable", None, "S-08", "visible-damage-v1"),
    PhotoManifestRow("R-403", "R-403-a", 4, "validation", "warehouse_camera", "mailer", "usable", False, "S-08", "visible-damage-v1"),
    PhotoManifestRow("R-404", "R-404-a", 5, "test", "customer_phone", "mailer", "usable", True, "S-12", "visible-damage-v1"),
    PhotoManifestRow("R-404", "R-404-b", 6, "test", "customer_phone", "mailer", "usable", True, "S-12", "visible-damage-v1"),
    PhotoManifestRow("R-405", "R-405-a", 7, "test", "warehouse_camera", "corrugated", "usable", False, "S-08", "visible-damage-v1"),
    PhotoManifestRow("R-406", "R-406-a", 8, "test", "customer_phone", "corrugated", "unusable", None, "S-12", "visible-damage-v1"),
    PhotoManifestRow("R-407", "R-407-a", 9, "test", "customer_phone", "corrugated", "usable", False, "S-12", "visible-damage-v1"),
    PhotoManifestRow("R-408", "R-408-a", 10, "test", "warehouse_camera", "corrugated", "usable", True, "S-08", "visible-damage-v1"),
    PhotoManifestRow("R-409", "R-409-a", 11, "test", "warehouse_camera", "corrugated", "unusable", None, "S-08", "visible-damage-v1"),
]

print("photos:", len(PHOTOS))
print("cases:", len({photo.case_id for photo in PHOTOS}))

Output

photos: 11
cases: 9

02-audit-grouped-split-manifest.py

splits_by_case = defaultdict(set)
for photo in PHOTOS:
    splits_by_case[photo.case_id].add(photo.split)

split_days = {
    split: [photo.capture_day for photo in PHOTOS if photo.split == split]
    for split in ("train", "validation", "test")
}
manifest_checks = {
    "case_groups_do_not_cross_splits": all(len(splits) == 1 for splits in splits_by_case.values()),
    "time_ordered_splits": (
        all(split_days.values())
        and max(split_days["train"]) < min(split_days["validation"])
        and max(split_days["validation"]) < min(split_days["test"])
    ),
    "usable_labels_complete": all(photo.confirmed_damage is not None for photo in PHOTOS if photo.quality_label == "usable"),
    "unusable_labels_abstain": all(photo.confirmed_damage is None for photo in PHOTOS if photo.quality_label == "unusable"),
    "label_lineage_recorded": all(photo.reviewer_id and photo.guideline_version for photo in PHOTOS),
}

print("split photo counts:", dict(Counter(photo.split for photo in PHOTOS)))
print("R-401 splits:", sorted(splits_by_case["R-401"]))
print(json.dumps(manifest_checks, indent=2))

Output

split photo counts: {'train': 2, 'validation': 2, 'test': 7}
R-401 splits: ['train']
{
  "case_groups_do_not_cross_splits": true,
  "time_ordered_splits": true,
  "usable_labels_complete": true,
  "unusable_labels_abstain": true,
  "label_lineage_recorded": true
}
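The manifest above hard-codes each photo's split. The sketch below shows one way such splits could be produced, assuming cases are assigned by their earliest capture day. The hypothetical `assign_grouped_splits` helper puts every photo of a case in the same split. The cutoffs `train_until=2` and `validation_until=4` are chosen here only because they reproduce the manifest. A case whose photos straddle a cutoff can still break the photo-level `time_ordered_splits` check, so rerun the audit after assigning.

```python
from collections import defaultdict

def assign_grouped_splits(
    capture_days: dict[str, list[int]],
    train_until: int,
    validation_until: int,
) -> dict[str, str]:
    """Hypothetical helper: assign each case (not each photo) to one split.

    Cutoffs are inclusive day boundaries; a case goes where its first photo falls.
    """
    splits = {}
    for case_id, days in capture_days.items():
        first_day = min(days)
        if first_day <= train_until:
            splits[case_id] = "train"
        elif first_day <= validation_until:
            splits[case_id] = "validation"
        else:
            splits[case_id] = "test"
    return splits

# (case_id, capture_day) pairs copied from the manifest above.
photo_days = [("R-401", 1), ("R-401", 2), ("R-402", 3), ("R-403", 4),
              ("R-404", 5), ("R-404", 6), ("R-405", 7), ("R-406", 8),
              ("R-407", 9), ("R-408", 10), ("R-409", 11)]
days_by_case = defaultdict(list)
for case_id, day in photo_days:
    days_by_case[case_id].append(day)

case_splits = assign_grouped_splits(days_by_case, train_until=2, validation_until=4)
print(case_splits["R-401"], case_splits["R-403"], case_splits["R-404"])  # train validation test
```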

Encode quality-first review traces

03-load-model-output-bundle.py

@dataclass(frozen=True)
class ModelOutput:
    damage_probability: float
    blur_score: float
    brightness: float
    box_visible: bool

MODEL_OUTPUTS = {
    "R-401-a": ModelOutput(0.91, 0.12, 0.66, True),
    "R-401-b": ModelOutput(0.88, 0.10, 0.70, True),
    "R-402-a": ModelOutput(0.93, 0.71, 0.51, True),
    "R-403-a": ModelOutput(0.18, 0.08, 0.75, True),
    "R-404-a": ModelOutput(0.83, 0.10, 0.64, True),
    "R-404-b": ModelOutput(0.79, 0.14, 0.61, True),
    "R-405-a": ModelOutput(0.32, 0.11, 0.68, True),
    "R-406-a": ModelOutput(0.94, 0.12, 0.12, True),
    "R-407-a": ModelOutput(0.73, 0.09, 0.65, True),
    "R-408-a": ModelOutput(0.63, 0.07, 0.72, True),
    "R-409-a": ModelOutput(0.81, 0.08, 0.74, False),
}

BUNDLE = {
    "bundle_id": "damage-cnn-v1",
    "previous_bundle": "damage-cnn-v0",
    "preprocessing": "parcel-rgb-224-center-crop-v1",
    "label_guideline": "visible-damage-v1",
    "route_policy": "quality-first-review-v1",
    "damage_threshold": 0.70,
    "max_blur": 0.45,
    "min_brightness": 0.20,
}

print("bundle:", BUNDLE["bundle_id"], "threshold=", BUNDLE["damage_threshold"])
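Every trace copies `bundle_id`, which only helps if that id always names the same configuration. A content fingerprint of the serving configuration is one hedged way to make the id checkable. The `bundle_fingerprint` helper below is an illustrative addition, not part of the cells. It hashes the JSON with sorted keys, so key order doesn't change the digest but any threshold edit does. A real bundle would also need the model weights in the hash, which this sketch omits.

```python
import hashlib
import json

def bundle_fingerprint(bundle: dict) -> str:
    """Hypothetical helper: short, key-order-independent SHA-256 digest of a bundle config."""
    canonical = json.dumps(bundle, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]

example_bundle = {
    "bundle_id": "damage-cnn-v1",
    "preprocessing": "parcel-rgb-224-center-crop-v1",
    "damage_threshold": 0.70,
    "max_blur": 0.45,
    "min_brightness": 0.20,
}
reordered = dict(reversed(list(example_bundle.items())))
edited = {**example_bundle, "damage_threshold": 0.65}

print(bundle_fingerprint(example_bundle) == bundle_fingerprint(reordered))  # True
print(bundle_fingerprint(example_bundle) == bundle_fingerprint(edited))     # False
```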

04-quality-first-route.py

def route(photo: PhotoManifestRow, model_output: ModelOutput) -> dict[str, object]:
    if not model_output.box_visible:
        action, reason = "request_new_photo", "package_not_visible"
    elif model_output.blur_score > BUNDLE["max_blur"] or model_output.brightness < BUNDLE["min_brightness"]:
        action, reason = "request_new_photo", "image_quality_gate"
    elif model_output.damage_probability >= BUNDLE["damage_threshold"]:
        action, reason = "priority_damage_review", "damage_threshold"
    else:
        action, reason = "normal_review", "below_threshold"

    return {
        "route_id": f"{BUNDLE['bundle_id']}:{photo.photo_id}",
        "photo_id": photo.photo_id,
        "case_id": photo.case_id,
        "routed_day": photo.capture_day,
        "source": photo.source,
        "packaging": photo.packaging,
        "bundle_id": BUNDLE["bundle_id"],
        "previous_bundle": BUNDLE["previous_bundle"],
        "preprocessing": BUNDLE["preprocessing"],
        "label_guideline": BUNDLE["label_guideline"],
        "route_policy": BUNDLE["route_policy"],
        "damage_threshold": BUNDLE["damage_threshold"],
        "max_blur": BUNDLE["max_blur"],
        "min_brightness": BUNDLE["min_brightness"],
        "damage_probability": model_output.damage_probability,
        "blur_score": model_output.blur_score,
        "brightness": model_output.brightness,
        "box_visible": model_output.box_visible,
        "action": action,
        "reason": reason,
    }

print("route keys:", len(route(PHOTOS[0], MODEL_OUTPUTS["R-401-a"])))

05-route-test-photos.py

test_photos = [photo for photo in PHOTOS if photo.split == "test"]
photo_by_id = {photo.photo_id: photo for photo in PHOTOS}
traces = [route(photo, MODEL_OUTPUTS[photo.photo_id]) for photo in test_photos]
trace_by_photo = {trace["photo_id"]: trace for trace in traces}
unsafe_priority_routes = [
    trace["photo_id"]
    for photo, trace in zip(test_photos, traces)
    if trace["action"] == "priority_damage_review" and photo.quality_label != "usable"
]

for photo_id in ("R-404-a", "R-406-a", "R-409-a"):
    trace = trace_by_photo[photo_id]
    print(photo_id, trace["action"], trace["reason"], f"score={trace['damage_probability']}")
print("route counts:", dict(Counter(trace["action"] for trace in traces)))
print("unsafe priority routes:", unsafe_priority_routes)

Output

R-404-a priority_damage_review damage_threshold score=0.83
R-406-a request_new_photo image_quality_gate score=0.94
R-409-a request_new_photo package_not_visible score=0.81
route counts: {'priority_damage_review': 3, 'normal_review': 2, 'request_new_photo': 2}
unsafe priority routes: []
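The order of the checks in `route()` carries the whole safety argument. A small counterfactual makes that concrete. The sketch below replays the three highlighted test photos through the quality-first order and through a hypothetical score-first order. Values are copied from `MODEL_OUTPUTS` and `BUNDLE`. `score_first_route` is an anti-pattern for illustration, not an alternative policy.

```python
# (damage_probability, blur_score, brightness, box_visible) copied from MODEL_OUTPUTS.
examples = {
    "R-404-a": (0.83, 0.10, 0.64, True),
    "R-406-a": (0.94, 0.12, 0.12, True),
    "R-409-a": (0.81, 0.08, 0.74, False),
}
THRESHOLD, MAX_BLUR, MIN_BRIGHTNESS = 0.70, 0.45, 0.20  # copied from BUNDLE

def quality_ok(blur: float, brightness: float, box_visible: bool) -> bool:
    return box_visible and blur <= MAX_BLUR and brightness >= MIN_BRIGHTNESS

def quality_first_route(score: float, blur: float, brightness: float, box_visible: bool) -> str:
    # Same order as route(): evidence quality first, then the damage threshold.
    if not quality_ok(blur, brightness, box_visible):
        return "request_new_photo"
    return "priority_damage_review" if score >= THRESHOLD else "normal_review"

def score_first_route(score: float, blur: float, brightness: float, box_visible: bool) -> str:
    # Anti-pattern for illustration: a high score skips the quality check entirely.
    if score >= THRESHOLD:
        return "priority_damage_review"
    if not quality_ok(blur, brightness, box_visible):
        return "request_new_photo"
    return "normal_review"

for photo_id, values in examples.items():
    print(photo_id, quality_first_route(*values), score_first_route(*values))
```

Only R-404-a gets the same answer from both orders. Under score-first routing, R-406-a (too dark) and R-409-a (no package visible) would both reach priority review on scores of 0.94 and 0.81.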

Package the Vision Service

Submit an inspectable repository, not a notebook screenshot:

text

damage-vision-service/
  data/
    label_guidelines.md
    photo_manifest.parquet
    split_manifest.json
  model/
    train_cnn_baseline.py
    evaluate_slices.py
    model_card.md
  service/
    preprocess.py
    route_review.py
    trace_schema.json
  monitoring/
    input_quality_report.py
    delayed_review_outcomes.py
    specialist_shadow_receipt.py
  tests/
    test_case_groups_do_not_cross_splits.py
    test_blurry_photo_never_escalates.py
    test_later_outcomes_join_routes.py
    test_empty_review_window_holds.py
    test_route_trace_is_versioned.py
    test_previous_bundle_required.py
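The test file names state the contract without showing it. The functions below are a hedged sketch of what `test_empty_review_window_holds.py` and `test_previous_bundle_required.py` could check. They use a hypothetical `evidence_gates` helper that mirrors the precision, recall, and rollback gates from the release checklist. Real tests would import from the `service/` and `monitoring/` modules, whose interfaces aren't shown here.

```python
from typing import Optional

def rate_or_none(numerator: int, denominator: int) -> Optional[float]:
    # Same convention as the delayed-quality cell: no denominator, no rate.
    return round(numerator / denominator, 3) if denominator else None

def evidence_gates(joined: list[tuple[str, str, bool]], previous_bundle: str) -> dict[str, bool]:
    """Hypothetical helper mirroring the precision, recall, and rollback release gates."""
    tp = sum(action == "priority_damage_review" and confirmed for _, action, confirmed in joined)
    fp = sum(action == "priority_damage_review" and not confirmed for _, action, confirmed in joined)
    fn = sum(action != "priority_damage_review" and confirmed for _, action, confirmed in joined)
    precision = rate_or_none(tp, tp + fp)
    recall = rate_or_none(tp, tp + fn)
    return {
        "precision_at_least_0_50": precision is not None and precision >= 0.50,
        "recall_at_least_0_50": recall is not None and recall >= 0.50,
        "rollback_pointer_recorded": bool(previous_bundle),
    }

def test_empty_review_window_holds():
    # No specialist outcomes yet: both rates are None, so the gates fail closed.
    gates = evidence_gates([], previous_bundle="damage-cnn-v0")
    assert not gates["precision_at_least_0_50"]
    assert not gates["recall_at_least_0_50"]

def test_previous_bundle_required():
    joined = [("R-404", "priority_damage_review", True)]
    assert evidence_gates(joined, previous_bundle="damage-cnn-v0")["rollback_pointer_recorded"]
    assert not evidence_gates(joined, previous_bundle="")["rollback_pointer_recorded"]

if __name__ == "__main__":
    test_empty_review_window_holds()
    test_previous_bundle_required()
    print("ok")
```

An empty window fails closed because `rate_or_none` returns `None` and each gate requires `is not None`. Without that, the check would divide by zero or pass vacuously.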

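The routing cells above emit traces with this shape directly, so a reviewer can reconstruct any route from the trace alone. The example values come from R-401-a. The specialist outcome is not stored in the trace; it is joined later by case ID: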

Response field	Example
bundle and preprocessing	`damage-cnn-v1`, `parcel-rgb-224-center-crop-v1`
rollback and label contract	`damage-cnn-v0`, `visible-damage-v1`
quality values	blur `0.12`, brightness `0.66`, box visible `true`
score and action policy	damage `0.91`, threshold `0.70`, blur maximum `0.45`, brightness minimum `0.20`
route	`priority_damage_review`
human outcome later	`confirmed_damage` or `not_supported`
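Each trace stores the thresholds next to the measured quality values and score, so the action can be recomputed from the trace alone. Below is a minimal replay sketch, assuming traces are plain dicts with the field names used in `route()`. `replay_action` is a hypothetical helper, and the abbreviated trace uses the R-401-a values from the table.

```python
import json

def replay_action(trace: dict) -> str:
    """Recompute the routing action using only values stored in the trace."""
    if not trace["box_visible"]:
        return "request_new_photo"
    if trace["blur_score"] > trace["max_blur"] or trace["brightness"] < trace["min_brightness"]:
        return "request_new_photo"
    if trace["damage_probability"] >= trace["damage_threshold"]:
        return "priority_damage_review"
    return "normal_review"

# Abbreviated R-401-a trace as it might be stored (other trace fields omitted).
stored = json.dumps({
    "bundle_id": "damage-cnn-v1",
    "damage_threshold": 0.70, "max_blur": 0.45, "min_brightness": 0.20,
    "damage_probability": 0.91, "blur_score": 0.12, "brightness": 0.66,
    "box_visible": True, "action": "priority_damage_review",
})
sample_trace = json.loads(stored)
print(replay_action(sample_trace) == sample_trace["action"])  # True
```

If the replayed action differs from the stored one, the trace and the policy disagree, for example because a threshold changed after routing.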

Join delayed outcomes before promotion

Separate checks that can run on every request from quality evidence that arrives only after specialist review:

Window	Monitor	Trigger
immediate	unreadable image rate, brightness, blur, missing package, latency	investigate capture path or fail to manual intake
delayed	specialist-confirmed precision, missed visible damage, route rate by source and packaging	hold promotion or create retraining candidate
safety review	unsupported escalations, policy actions attempted without specialist approval	rollback and audit workflow
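The immediate window needs no labels, so it can run on every batch of routed photos. Below is a sketch of the request-new-photo check by source. It assumes each record carries `source` and `action` like the route traces. The 0.40 limit is a hypothetical value, since the capstone doesn't set one.

```python
from collections import defaultdict

# Hypothetical alert limit; derive a real one from each capture path's baseline.
MAX_REQUEST_NEW_PHOTO_RATE = 0.40

def immediate_alerts(records: list[dict]) -> dict[str, float]:
    """Return {source: rate} for sources whose request-new-photo rate exceeds the limit."""
    totals = defaultdict(int)
    requests = defaultdict(int)
    for record in records:
        totals[record["source"]] += 1
        requests[record["source"]] += int(record["action"] == "request_new_photo")
    rates = {source: round(requests[source] / totals[source], 3) for source in totals}
    return {source: rate for source, rate in rates.items() if rate > MAX_REQUEST_NEW_PHOTO_RATE}

# Same source/action mix as the seven routed test photos.
records = [
    {"source": source, "action": action}
    for source, action in [
        ("customer_phone", "priority_damage_review"),
        ("customer_phone", "priority_damage_review"),
        ("customer_phone", "request_new_photo"),
        ("customer_phone", "priority_damage_review"),
        ("warehouse_camera", "normal_review"),
        ("warehouse_camera", "normal_review"),
        ("warehouse_camera", "request_new_photo"),
    ]
]
# Hypothetical later batch: two more unusable warehouse photos.
dark_batch = records + [
    {"source": "warehouse_camera", "action": "request_new_photo"} for _ in range(2)
]

print(immediate_alerts(records))     # {}
print(immediate_alerts(dark_batch))  # {'warehouse_camera': 0.6}
```

The first batch stays below the limit (0.25 and 0.333). Two more unusable warehouse photos push that source to 0.6, which points at the capture path rather than the model.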

06-append-specialist-outcomes.py

@dataclass(frozen=True)
class SpecialistOutcome:
    case_id: str
    reviewed_day: int
    confirmed_damage: bool
    reviewer_id: str
    guideline_version: str

SPECIALIST_OUTCOMES = [
    SpecialistOutcome("R-404", 12, True, "S-12", "visible-damage-v1"),
    SpecialistOutcome("R-405", 12, False, "S-08", "visible-damage-v1"),
    SpecialistOutcome("R-407", 12, False, "S-12", "visible-damage-v1"),
    SpecialistOutcome("R-408", 12, True, "S-08", "visible-damage-v1"),
]

usable_test_case_ids = {
    photo.case_id
    for photo in test_photos
    if photo.quality_label == "usable"
}
traces_by_case = defaultdict(list)
for trace in traces:
    traces_by_case[trace["case_id"]].append(trace)

print("usable test cases:", sorted(usable_test_case_ids))
print("specialist outcomes:", len(SPECIALIST_OUTCOMES))

07-aggregate-case-actions.py

def aggregate_usable_case_action(case_id: str) -> str:
    usable_traces = [
        trace
        for trace in traces_by_case[case_id]
        if photo_by_id[trace["photo_id"]].quality_label == "usable"
    ]
    if not usable_traces:
        raise ValueError(f"no usable traces for {case_id}")
    if any(trace["action"] == "priority_damage_review" for trace in usable_traces):
        return "priority_damage_review"
    return "normal_review"

case_actions = {
    case_id: aggregate_usable_case_action(case_id)
    for case_id in sorted(usable_test_case_ids)
}
outcome_by_case = {outcome.case_id: outcome for outcome in SPECIALIST_OUTCOMES}
joined_outcomes = [
    (case_id, case_actions[case_id], outcome_by_case[case_id].confirmed_damage)
    for case_id in sorted(case_actions.keys() & outcome_by_case.keys())
]

print("case_actions:", case_actions)
print("joined usable cases:", len(joined_outcomes))

08-measure-delayed-quality.py

def rate_or_none(numerator: int, denominator: int) -> float | None:
    return round(numerator / denominator, 3) if denominator else None

true_positives = sum(
    action == "priority_damage_review" and confirmed_damage
    for _, action, confirmed_damage in joined_outcomes
)
false_positives = sum(
    action == "priority_damage_review" and not confirmed_damage
    for _, action, confirmed_damage in joined_outcomes
)
false_negatives = sum(
    action != "priority_damage_review" and confirmed_damage
    for _, action, confirmed_damage in joined_outcomes
)
delayed_quality = {
    "priority_precision": rate_or_none(true_positives, true_positives + false_positives),
    "priority_recall": rate_or_none(true_positives, true_positives + false_negatives),
    "reviewed_usable_cases": len(joined_outcomes),
}

source_slices = {}
for source in sorted({trace["source"] for trace in traces}):
    # Slice the test traces by capture source (customer phone vs. warehouse camera).
    rows = [trace for trace in traces if trace["source"] == source]
    source_slices[source] = {
        "photos": len(rows),
        "request_new_photo_rate": rate_or_none(
            sum(trace["action"] == "request_new_photo" for trace in rows),
            len(rows),
        ),
    }

print("delayed_quality:", delayed_quality)
print("source_slices:", source_slices)
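Cell 08 counts only the three outcome types that precision and recall need. Adding the fourth, true negatives, gives the full case-level confusion matrix. The sketch below copies the four joined cases from the receipt into `example_joined`. It uses a new name so that rerunning it can't overwrite the harness's `joined_outcomes`.

```python
from collections import Counter

# (case_id, case_action, confirmed_damage) for the four joined usable test cases.
example_joined = [
    ("R-404", "priority_damage_review", True),
    ("R-405", "normal_review", False),
    ("R-407", "priority_damage_review", False),
    ("R-408", "normal_review", True),
]

def confusion_cell(action: str, confirmed: bool) -> str:
    """Name the confusion-matrix cell for one case-level decision."""
    escalated = action == "priority_damage_review"
    if escalated and confirmed:
        return "true_positive"
    if escalated:
        return "false_positive"
    if confirmed:
        return "false_negative"
    return "true_negative"

matrix = Counter(confusion_cell(action, confirmed) for _, action, confirmed in example_joined)
print(dict(matrix))
```

Each of the four cells holds exactly one case, so precision and recall both land at 0.50 on very little evidence.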

09-release-gate-checklist.py

required_trace_fields = {
    "route_id", "photo_id", "case_id", "routed_day", "source", "packaging",
    "bundle_id", "previous_bundle", "preprocessing", "label_guideline",
    "route_policy", "damage_threshold", "max_blur", "min_brightness",
    "damage_probability", "blur_score", "brightness", "box_visible",
    "action", "reason",
}
outcome_case_ids = [outcome.case_id for outcome in SPECIALIST_OUTCOMES]
release_gates = {
    **manifest_checks,
    "route_ids_unique": len({trace["route_id"] for trace in traces}) == len(traces),
    "all_test_photos_routed": len(traces) == len(test_photos),
    "unusable_photos_abstain": all(
        trace_by_photo[photo.photo_id]["action"] == "request_new_photo"
        for photo in test_photos
        if photo.quality_label == "unusable"
    ),
    "unsafe_priority_routes_absent": not unsafe_priority_routes,
    "route_traces_replayable": all(required_trace_fields <= trace.keys() for trace in traces),
    "specialist_outcome_case_ids_unique": len(set(outcome_case_ids)) == len(outcome_case_ids),
    "specialist_outcomes_join_usable_test_cases": set(outcome_case_ids) <= usable_test_case_ids,
    "usable_test_cases_have_specialist_outcomes": usable_test_case_ids <= set(outcome_case_ids),
    "specialist_outcomes_arrive_after_capture": all(
        outcome.reviewed_day > max(
            photo.capture_day for photo in test_photos if photo.case_id == outcome.case_id
        )
        for outcome in SPECIALIST_OUTCOMES
        if outcome.case_id in usable_test_case_ids
    ),
    "specialist_outcome_label_contract_matches_bundle": all(
        outcome.reviewer_id
        and outcome.guideline_version == BUNDLE["label_guideline"]
        for outcome in SPECIALIST_OUTCOMES
    ),
    "priority_precision_evidence_at_least_0_50": (
        delayed_quality["priority_precision"] is not None
        and delayed_quality["priority_precision"] >= 0.50
    ),
    "priority_recall_evidence_at_least_0_50": (
        delayed_quality["priority_recall"] is not None
        and delayed_quality["priority_recall"] >= 0.50
    ),
    "source_slices_recorded": set(source_slices) == {"customer_phone", "warehouse_camera"},
    "rollback_pointer_recorded": bool(BUNDLE["previous_bundle"]),
}

print("release_gates_pass:", all(release_gates.values()))
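`all(release_gates.values())` says whether to hold but not why. A small filter that lists failing gate names could give a hold decision an actionable reason. Attaching that list to the receipt is an assumption, not part of the original receipt. The gate values below are a subset of what the checklist would produce after `SPECIALIST_OUTCOMES` is emptied.

```python
def failing_gates(gates: dict[str, bool]) -> list[str]:
    """Sorted names of the gates that did not pass."""
    return sorted(name for name, passed in gates.items() if not passed)

# A subset of the checklist's results after SPECIALIST_OUTCOMES is emptied.
emptied_window_gates = {
    "route_ids_unique": True,
    "usable_test_cases_have_specialist_outcomes": False,
    "priority_precision_evidence_at_least_0_50": False,
    "priority_recall_evidence_at_least_0_50": False,
    "rollback_pointer_recorded": True,
}
reasons = failing_gates(emptied_window_gates)
decision = "candidate_for_specialist_shadow_review" if not reasons else "hold"
print(decision, reasons)
```

Here the decision is `hold`. The precision, recall, and outcome-coverage gates are named as the reasons.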

10-publish-specialist-shadow-receipt.py

receipt = {
    "candidate_bundle": BUNDLE["bundle_id"],
    "previous_bundle": BUNDLE["previous_bundle"],
    "preprocessing": BUNDLE["preprocessing"],
    "label_guideline": BUNDLE["label_guideline"],
    "route_policy": BUNDLE["route_policy"],
    "route_traces": len(traces),
    "later_specialist_outcomes": len(SPECIALIST_OUTCOMES),
    "joined_usable_cases": len(joined_outcomes),
    "case_actions": case_actions,
    "delayed_quality": delayed_quality,
    "source_slices": source_slices,
    "release_gates": release_gates,
    "candidate_decision": "candidate_for_specialist_shadow_review" if all(release_gates.values()) else "hold",
}

print(json.dumps(receipt, indent=2))

Output

{
  "candidate_bundle": "damage-cnn-v1",
  "previous_bundle": "damage-cnn-v0",
  "preprocessing": "parcel-rgb-224-center-crop-v1",
  "label_guideline": "visible-damage-v1",
  "route_policy": "quality-first-review-v1",
  "route_traces": 7,
  "later_specialist_outcomes": 4,
  "joined_usable_cases": 4,
  "case_actions": {
    "R-404": "priority_damage_review",
    "R-405": "normal_review",
    "R-407": "priority_damage_review",
    "R-408": "normal_review"
  },
  "delayed_quality": {
    "priority_precision": 0.5,
    "priority_recall": 0.5,
    "reviewed_usable_cases": 4
  },
  "source_slices": {
    "customer_phone": {
      "photos": 4,
      "request_new_photo_rate": 0.25
    },
    "warehouse_camera": {
      "photos": 3,
      "request_new_photo_rate": 0.333
    }
  },
  "release_gates": {
    "case_groups_do_not_cross_splits": true,
    "time_ordered_splits": true,
    "usable_labels_complete": true,
    "unusable_labels_abstain": true,
    "label_lineage_recorded": true,
    "route_ids_unique": true,
    "all_test_photos_routed": true,
    "unusable_photos_abstain": true,
    "unsafe_priority_routes_absent": true,
    "route_traces_replayable": true,
    "specialist_outcome_case_ids_unique": true,
    "specialist_outcomes_join_usable_test_cases": true,
    "usable_test_cases_have_specialist_outcomes": true,
    "specialist_outcomes_arrive_after_capture": true,
    "specialist_outcome_label_contract_matches_bundle": true,
    "priority_precision_evidence_at_least_0_50": true,
    "priority_recall_evidence_at_least_0_50": true,
    "source_slices_recorded": true,
    "rollback_pointer_recorded": true
  },
  "candidate_decision": "candidate_for_specialist_shadow_review"
}

Practice: break the vision contract

Use the runnable examples as a release harness. Change one condition at a time, predict the failure, then rerun the examples.

Change R-401-b split from train to validation. Which dataset gate fails?
Change R-406-a brightness in MODEL_OUTPUTS from 0.12 to 0.30. Why does this expose a quality-detector failure rather than prove the escalation is safe?
Remove damage_threshold from the response trace. Which receipt gate fails?
Change R-408-a damage probability in MODEL_OUTPUTS from 0.63 to 0.75. Which delayed metrics improve?
Set BUNDLE["previous_bundle"] = "". Why is shadow evidence no longer promotion-ready?
Replace SPECIALIST_OUTCOMES with an empty list. Why do evidence gates fail instead of crashing or passing?
Add SpecialistOutcome("R-999", 12, True, "S-12", "visible-damage-v1"). Which join gate fails?
Change R-408 specialist outcome to guideline visible-damage-v2. Which provenance gate fails?
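The manifest rows are frozen dataclasses, so a practice change works best as a modified copy built with `dataclasses.replace`. The sketch applies the first exercise. It uses a trimmed `Row` type and a `groups_do_not_cross_splits` helper as simplified stand-ins for `PhotoManifestRow` and the manifest check.

```python
from collections import defaultdict
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Row:
    # Simplified stand-in for PhotoManifestRow.
    case_id: str
    photo_id: str
    split: str

def groups_do_not_cross_splits(rows: list[Row]) -> bool:
    splits_by_case = defaultdict(set)
    for row in rows:
        splits_by_case[row.case_id].add(row.split)
    return all(len(splits) == 1 for splits in splits_by_case.values())

baseline = [
    Row("R-401", "R-401-a", "train"),
    Row("R-401", "R-401-b", "train"),
    Row("R-402", "R-402-a", "validation"),
]
# Exercise 1: move only R-401-b to validation; the baseline rows stay untouched.
mutated = [replace(row, split="validation") if row.photo_id == "R-401-b" else row for row in baseline]
print(groups_do_not_cross_splits(baseline), groups_do_not_cross_splits(mutated))  # True False
```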

Practice answer sketches

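Mastery check

Why must multiple photos of one returned package stay in a single split?
Why does the quality check run before the damage threshold?
Why append specialist outcomes separately instead of writing them into route traces?
What belongs in the deployed vision bundle besides model weights?
Why are request-new-photo rate and specialist-confirmed precision separate metrics?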

Evaluation rubric

Artifact	Strong submission demonstrates
dataset contract	case-grouped time split, quality labels, damage labels, and reviewed slices
service	versioned preprocessing and safe quality-first routing with abstention
operations	immutable route traces, joined delayed outcomes, model card, source slices, shadow review, and rollback

Common failures

Symptom	Cause	Fix
Holdout score is unrealistically high	photos from one case crossed splits	group by physical case and time
Blurry image triggers damage escalation	score evaluated before quality	gate evidence quality first
Specialist metric changes after a later edit	route trace was overwritten with outcome data	append specialist outcomes and join by case ID
Precision shifts after guideline update	delayed outcomes omit reviewer or label version	bind each outcome to reviewer and bundle label contract
Multi-photo case result depends on row order	case aggregation policy is implicit	define deterministic case-level priority routing
New camera changes decisions silently	preprocessing and source drift untracked	log source/quality slices and version bundle
Candidate advances without rollback target	receipt omits previous alias	publish immutable candidate and rollback pointer together
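The row-order failure can be tested directly: aggregate the same photo actions in every order and require a single answer. The sketch uses `itertools.permutations` on a hypothetical three-photo case. It applies the any-priority rule from `aggregate_usable_case_action`, plus a hypothetical last-photo-wins rule for contrast.

```python
from itertools import permutations

def aggregate_case_action(actions: list[str]) -> str:
    """Any-priority rule, as in aggregate_usable_case_action."""
    if not actions:
        raise ValueError("no usable traces")
    return "priority_damage_review" if "priority_damage_review" in actions else "normal_review"

def last_photo_wins(actions: list[str]) -> str:
    """Hypothetical anti-pattern: the case takes whichever photo was routed last."""
    return actions[-1]

photo_actions = ["normal_review", "priority_damage_review", "normal_review"]
stable = {aggregate_case_action(list(order)) for order in permutations(photo_actions)}
unstable = {last_photo_wins(list(order)) for order in permutations(photo_actions)}

print(stable)            # {'priority_damage_review'}
print(sorted(unstable))  # ['normal_review', 'priority_damage_review']
```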

Next Step

Continue to Capstone: Production ML Pipeline

You have shipped tabular, ranking, forecasting, and vision artifacts with their own action gates. Next you'll manage them under one validated promotion, monitoring, and rollback workflow.

References

Model Cards for Model Reporting

Mitchell, M., Wu, S., Zaldivar, A., et al. · 2019 · FAT* 2019

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

Dosovitskiy, A., et al. · 2020 · ICLR 2021

MLOps: Continuous Delivery and Automation Pipelines in Machine Learning

Google Cloud · 2026 · Official documentation

Capstone: Image Damage Classifier

Define the photo decision first

Build a dataset that can't leak

Freeze a grouped photo manifest

Encode quality-first review traces

Package the Vision Service

Join delayed outcomes before promotion

Practice: break the vision contract

Practice answer sketches

Mastery check

Evaluation rubric

Common failures

Mastery Check

Discussion
