Ship a damaged-package photo triage service with quality gates, slice evaluation, serving bundles, and review monitoring.
You have shipped models over shipment rows, ranked items, and warehouse time series. A customer return adds a new input type: a photo of a package that may be crushed, torn, blurred, dark, or unrelated to the order.
Earlier, you traced a convolutional neural network (CNN) over a damaged-package image patch. This capstone turns that spatial reasoning into a product: an image triage endpoint that flags likely visible damage, rejects unusable photos, preserves evidence for human review, and never turns an uncertain image score directly into a refund.
ShopFlow receives return photos from customers and warehouse intake stations. The useful product question is not "does the model recognize every defect?" It is: which photo should a specialist inspect first, and when is the photo too weak to support any decision?
Use three operational outcomes:
| Action | Evidence | Product behavior |
|---|---|---|
| request_new_photo | image is too blurred, dark, or incomplete | ask for a clearer upload before assessing damage |
| normal_review | usable image, low damage score | keep ordinary return workflow |
| priority_damage_review | usable image, high damage score | surface to specialist with photo and score trace |
The classifier isn't a refund policy. Product eligibility still depends on order ownership, item type, return window, and specialist judgment. This separation keeps a shadow or reflection in a photo from triggering a costly action.
A model card should state the intended use, input constraints, decision threshold, evaluated slices, and known failure cases. Model cards were proposed as structured reports for exactly this type of deployed-model context: a metric reported without its operating conditions tells users too little.[1]
For tabular models, leakage may be a future delivery timestamp. For photos, leakage often hides in nearly identical pixels. A customer may upload three burst shots of the same crushed box. A warehouse may photograph one parcel from four angles. If related images land in both train and test sets, the model can memorize one package rather than generalize to new damage.
Your manifest should contain:
| Field | Why it matters |
|---|---|
| shipment_id and capture_at | group all photos for one physical case and preserve time ordering |
| source | separate customer phone uploads from warehouse inspection cameras |
| quality_label | distinguish unusable evidence from visible damage |
| damage_label | record specialist-confirmed visible damage only on usable photos |
| split | hold out later shipments, never random photos from the same case |
| reviewer_id and guideline version | audit disagreement or changed label definitions |
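The split rule in the manifest can be sketched as follows. This is a minimal illustration, not the course's reference implementation: it assumes manifest rows are plain dicts, and the helper names `assign_splits` and `shipments_crossing_splits`, the cutoff date, and the sample rows are all hypothetical. Each shipment is placed by its latest capture time, so the training split never contains a photo taken at or after the cutoff.

```python
from collections import defaultdict
from datetime import datetime


def assign_splits(rows, cutoff):
    """Give every photo of a shipment the same split (hypothetical helper).

    A shipment goes to "test" if any of its photos was captured at or
    after the cutoff, so no later photo leaks into training.
    """
    last_seen = {}
    for row in rows:
        sid = row["shipment_id"]
        if sid not in last_seen or row["capture_at"] > last_seen[sid]:
            last_seen[sid] = row["capture_at"]
    return [
        {**row, "split": "test" if last_seen[row["shipment_id"]] >= cutoff else "train"}
        for row in rows
    ]


def shipments_crossing_splits(rows):
    """Return shipment ids whose photos appear in more than one split."""
    seen = defaultdict(set)
    for row in rows:
        seen[row["shipment_id"]].add(row["split"])
    return sorted(sid for sid, splits in seen.items() if len(splits) > 1)


# Made-up manifest rows for illustration only.
manifest = [
    {"shipment_id": "S-1", "capture_at": datetime(2024, 1, 5, 9, 0)},
    {"shipment_id": "S-1", "capture_at": datetime(2024, 1, 5, 9, 1)},
    {"shipment_id": "S-2", "capture_at": datetime(2024, 2, 20, 14, 0)},
    {"shipment_id": "S-2", "capture_at": datetime(2024, 3, 2, 8, 30)},  # re-photo after cutoff
    {"shipment_id": "S-3", "capture_at": datetime(2024, 3, 10, 11, 0)},
]
split_rows = assign_splits(manifest, cutoff=datetime(2024, 3, 1))
leaks = shipments_crossing_splits(split_rows)
```

Shipment S-2 shows the design choice: its first photo predates the cutoff, but the whole case moves to the test split because one photo was captured later. A check like `shipments_crossing_splits` is the kind of assertion a split test could run against the real manifest.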
Evaluate at least daylight versus dark uploads, customer versus warehouse source, packaging type, and visible-defect size. A global score can hide the exact failure that matters: small tears disappearing in dark phone images.
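To see how a global score can hide a weak slice, here is a small sketch that computes recall on confirmed-damage photos overall and per slice. The record fields (`source`, `lighting`, `damaged`, `score`), the helper `recall_by_slice`, and the sample scores are assumptions made for illustration; only the 0.70 threshold comes from this capstone's policy.

```python
from collections import defaultdict

THRESHOLD = 0.70  # damage threshold from this capstone's routing policy


def recall_by_slice(records, keys):
    """Recall on confirmed-damage photos, grouped by the given slice keys."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for rec in records:
        if not rec["damaged"]:
            continue  # recall only counts photos with confirmed damage
        slice_key = tuple(rec[k] for k in keys)
        totals[slice_key] += 1
        if rec["score"] >= THRESHOLD:
            hits[slice_key] += 1
    return {k: hits[k] / totals[k] for k in totals}


# Made-up evaluation records for illustration only.
records = [
    {"source": "warehouse", "lighting": "daylight", "damaged": True, "score": 0.92},
    {"source": "warehouse", "lighting": "daylight", "damaged": True, "score": 0.81},
    {"source": "customer", "lighting": "daylight", "damaged": True, "score": 0.77},
    {"source": "customer", "lighting": "dark", "damaged": True, "score": 0.41},
    {"source": "customer", "lighting": "dark", "damaged": True, "score": 0.74},
    {"source": "customer", "lighting": "dark", "damaged": False, "score": 0.10},
]

overall = recall_by_slice(records, keys=())  # single slice keyed by ()
per_slice = recall_by_slice(records, keys=("source", "lighting"))
```

In this toy data the overall recall is 0.8, while the customer/dark slice sits at 0.5. Real slices need enough examples per cell before the per-slice number means anything.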
Use the CNN from the earlier lesson as a baseline, then fine-tune a pretrained image encoder only if you record its preprocessing and measure it under the same split. A later deep-dive explains Vision Transformer image encoders; this capstone doesn't require that architecture.[2]
The model endpoint should receive a preprocessing result and a damage score, then choose a review route. The gate below refuses to use high damage confidence when the image evidence is unusable.
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhotoScore:
    case_id: str
    damage_probability: float
    blur_score: float
    brightness: float
    box_visible: bool


# Thresholds and version names that travel with the serving bundle.
POLICY = {
    "damage_threshold": 0.70,
    "max_blur": 0.45,
    "min_brightness": 0.20,
    "model": "damage_cnn_v1",
    "preprocess": "parcel_rgb_224_v1",
}


def route(score: PhotoScore) -> dict[str, str]:
    # Quality gates run first, so an unusable photo never reaches the damage threshold.
    if not score.box_visible:
        return {"action": "request_new_photo", "reason": "package_not_visible"}
    if score.blur_score > POLICY["max_blur"] or score.brightness < POLICY["min_brightness"]:
        return {"action": "request_new_photo", "reason": "image_quality_gate"}
    if score.damage_probability >= POLICY["damage_threshold"]:
        return {"action": "priority_damage_review", "reason": "damage_threshold"}
    return {"action": "normal_review", "reason": "below_threshold"}


photos = [
    PhotoScore("R-401", 0.91, 0.12, 0.66, True),
    PhotoScore("R-402", 0.93, 0.71, 0.51, True),
    PhotoScore("R-403", 0.18, 0.08, 0.75, True),
]

for photo in photos:
    result = route(photo)
    print(photo.case_id, result["action"], result["reason"])
```

```text
R-401 priority_damage_review damage_threshold
R-402 request_new_photo image_quality_gate
R-403 normal_review below_threshold
```

Case R-402 is the important failure test: its high damage probability isn't usable evidence because the blur gate fails first. The endpoint asks for another photo rather than escalating an unsupported claim.
Submit an inspectable repository, not a notebook screenshot:
```text
damage-vision-service/
  data/
    label_guidelines.md
    photo_manifest.parquet
    split_manifest.json
  model/
    train_cnn_baseline.py
    evaluate_slices.py
    model_card.md
  service/
    preprocess.py
    route_review.py
    response_schema.json
  monitoring/
    input_quality_report.py
    delayed_review_outcomes.py
  tests/
    test_shipment_groups_do_not_cross_splits.py
    test_blurry_photo_never_escalates.py
```

The serving bundle must pin image resize and crop behavior, color normalization, model weights, label version, damage threshold, and quality-gate thresholds. A change from center crop to full-frame resize may change whether a torn corner remains visible; it is a model behavior change even when weights remain constant.
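One way to make a pinned bundle detectable is to fingerprint its configuration, so any change to preprocessing or thresholds yields a new identifier. The sketch below is an assumption about how that could look, not a prescribed format: the `bundle_fingerprint` helper and the `resize` and `label_version` field names and values are hypothetical, while the model name, preprocessing name, and thresholds are the ones used in the routing policy.

```python
import hashlib
import json


def bundle_fingerprint(bundle: dict) -> str:
    """Short, order-independent hash of the bundle configuration (hypothetical helper)."""
    canonical = json.dumps(bundle, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


bundle = {
    "model": "damage_cnn_v1",
    "preprocess": "parcel_rgb_224_v1",
    "resize": "center_crop",           # hypothetical field name
    "label_version": "guidelines_v1",  # hypothetical field name and value
    "damage_threshold": 0.70,
    "max_blur": 0.45,
    "min_brightness": 0.20,
}

# Same weights, different crop behavior: the fingerprint must change.
full_frame_bundle = {**bundle, "resize": "full_frame"}

original_id = bundle_fingerprint(bundle)
changed_id = bundle_fingerprint(full_frame_bundle)
```

Logging the fingerprint with every response ties each route to one exact configuration. A fuller version would probably also include a hash of the weights file, which this sketch leaves out.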
Return a trace that lets a reviewer reconstruct the route:
| Response field | Example |
|---|---|
| model and preprocessing | damage_cnn_v1, parcel_rgb_224_v1 |
| quality values | blur 0.12, brightness 0.66, box visible true |
| score and action policy | damage 0.91, threshold 0.70 |
| route | priority_damage_review |
| human outcome later | confirmed_damage or not_supported |
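The trace in the table could be assembled along these lines. This is a sketch under assumptions: the `build_trace` helper and the exact key names are hypothetical, and the values reuse case R-401 from the routing example.

```python
import json


def build_trace(case_id, quality, damage_probability, policy, action, reason):
    """Collect everything a reviewer needs to reconstruct the route (hypothetical helper)."""
    return {
        "case_id": case_id,
        "model": policy["model"],
        "preprocess": policy["preprocess"],
        "quality": quality,
        "damage_probability": damage_probability,
        "damage_threshold": policy["damage_threshold"],
        "action": action,
        "reason": reason,
        # Filled in later by the specialist: confirmed_damage or not_supported.
        "human_outcome": None,
    }


policy = {
    "model": "damage_cnn_v1",
    "preprocess": "parcel_rgb_224_v1",
    "damage_threshold": 0.70,
}

trace = build_trace(
    case_id="R-401",
    quality={"blur_score": 0.12, "brightness": 0.66, "box_visible": True},
    damage_probability=0.91,
    policy=policy,
    action="priority_damage_review",
    reason="damage_threshold",
)

payload = json.dumps(trace, sort_keys=True)  # JSON-serializable for logging
```

Keeping `human_outcome` in the same record, empty until review, lets the delayed monitoring job join specialist outcomes back to the original route without a second lookup table.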
Photo models drift when the image source changes. A new warehouse camera, winter lighting, a mobile upload compressor, or new packaging graphics can alter pixels before a confirmed-damage label exists.
Separate immediate checks from delayed quality:
| Window | Monitor | Trigger |
|---|---|---|
| immediate | unreadable image rate, brightness, blur, missing package, latency | investigate capture path or fail to manual intake |
| delayed | specialist-confirmed precision, missed visible damage, route rate by source and packaging | hold promotion or create retraining candidate |
| safety review | unsupported escalations, policy actions attempted without specialist approval | rollback and audit workflow |
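The immediate window can be checked before any label exists, because it only needs routing events. Here is a minimal sketch, assuming each event records its `source` and routed `action`; the helper names, the baseline rates, and the 0.10 tolerance are hypothetical values chosen for illustration.

```python
from collections import Counter


def unusable_rate_by_source(events):
    """Share of photos routed to request_new_photo, per source."""
    total = Counter()
    unusable = Counter()
    for event in events:
        total[event["source"]] += 1
        if event["action"] == "request_new_photo":
            unusable[event["source"]] += 1
    return {source: unusable[source] / total[source] for source in total}


def sources_to_investigate(events, baseline, max_increase=0.10):
    """Sources whose unusable-photo rate rose more than max_increase above baseline."""
    rates = unusable_rate_by_source(events)
    return sorted(
        source
        for source, rate in rates.items()
        if rate - baseline.get(source, 0.0) > max_increase
    )


# Made-up routing events for one monitoring window.
events = (
    [{"source": "warehouse", "action": "normal_review"}] * 18
    + [{"source": "warehouse", "action": "request_new_photo"}] * 2
    + [{"source": "customer", "action": "normal_review"}] * 12
    + [{"source": "customer", "action": "request_new_photo"}] * 8
)
baseline = {"warehouse": 0.08, "customer": 0.20}  # hypothetical historical rates

rates = unusable_rate_by_source(events)
flagged = sources_to_investigate(events, baseline)
```

In this toy window the customer source doubles its unusable rate and gets flagged, while the warehouse source stays within tolerance. The output is a list of sources to investigate, in line with the trigger column: it opens an investigation and does not change the production model.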
Google Cloud's MLOps guidance treats serving, monitoring, validation, metadata, and continuous training as connected stages rather than a one-time deploy step.[3] Apply that same discipline here: a change in image quality creates an investigation or candidate run, never an automatic production replacement.
| Artifact | Strong submission demonstrates |
|---|---|
| dataset contract | shipment-grouped time split, quality labels, damage labels, and reviewed slices |
| service | versioned preprocessing and safe quality-first routing with abstention |
| operations | model card, delayed specialist outcomes, drift checks, candidate promotion, and rollback |
| Symptom | Cause | Fix |
|---|---|---|
| Holdout score is unrealistically high | photos from one shipment crossed splits | group by physical case and time |
| Blurry image triggers damage escalation | score evaluated before quality | gate evidence quality first |
| New camera changes decisions silently | preprocessing and source drift untracked | log source/quality slices and version bundle |
[1] Model Cards for Model Reporting. Mitchell, M., Wu, S., Zaldivar, A., et al. · 2019 · FAT* 2019
[2] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. Dosovitskiy, A., et al. · 2020 · ICLR 2021
[3] MLOps: Continuous Delivery and Automation Pipelines in Machine Learning. Google Cloud. · 2026 · Official documentation