© 2026 LeetLLM. All rights reserved.

๐Ÿ‘๏ธHardMultimodal Models

Capstone: Image Damage Classifier

Ship a damaged-package photo triage service with quality gates, slice evaluation, serving bundles, and review monitoring.


You have shipped models over shipment rows, ranked items, and warehouse time series. A customer return adds a new input type: a photo of a package that may be crushed, torn, blurred, dark, or unrelated to the order.

Earlier, you traced a convolutional neural network (CNN) over a damaged-package image patch. This capstone turns that spatial reasoning into a product: an image triage endpoint that flags likely visible damage, rejects unusable photos, preserves evidence for human review, and never turns an uncertain image score directly into a refund.

Figure: Damaged-package vision capstone taking a return photo through image-quality checks, a damage classifier, review routing, and delayed monitoring.
A damage model can prioritize review, but a blurry or low-confidence upload must be routed for better evidence rather than treated as proof of damage.

Define the Photo Decision First

ShopFlow receives return photos from customers and warehouse intake stations. The useful product question is not "does the model recognize every defect?" It is: which photo should a specialist inspect first, and when is the photo too weak to support any decision?

Use three operational outcomes:

| Action | Evidence | Product behavior |
|---|---|---|
| request_new_photo | image is too blurred, dark, or incomplete | ask for a clearer upload before assessing damage |
| normal_review | usable image, low damage score | keep the ordinary return workflow |
| priority_damage_review | usable image, high damage score | surface to a specialist with the photo and score trace |

The classifier isn't a refund policy. Product eligibility still depends on order ownership, item type, return window, and specialist judgment. This separation prevents a shadow or reflection in a photo from triggering a costly action.

A model card should state the intended use, input constraints, decision threshold, evaluated slices, and known failure cases. Model cards were proposed as structured reports for exactly this kind of deployed-model context: a metric reported without its operating conditions tells users very little.[1]
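One way to keep the card honest is to store it as structured data next to the model, so a release check can refuse a bundle whose card leaves a section blank. The sketch below assumes a Python dataclass in your own repository; `ModelCard` and `missing_sections` are hypothetical names, not part of any library, and the field values simply echo this capstone's policy.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ModelCard:
    """Hypothetical model card record; field names mirror the sections above."""
    model: str
    preprocess: str
    intended_use: str
    input_constraints: tuple[str, ...]
    damage_threshold: float
    evaluated_slices: tuple[str, ...]
    known_failures: tuple[str, ...]


def missing_sections(card: ModelCard) -> list[str]:
    """Return the names of empty sections so a release check can block the bundle."""
    return [f.name for f in fields(card) if getattr(card, f.name) in ("", ())]


card = ModelCard(
    model="damage_cnn_v1",
    preprocess="parcel_rgb_224_v1",
    intended_use="prioritize specialist review of return photos; never approve refunds",
    input_constraints=("package visible", "blur <= 0.45", "brightness >= 0.20"),
    damage_threshold=0.70,
    evaluated_slices=("daylight vs dark", "customer vs warehouse", "packaging type", "defect size"),
    known_failures=(),  # left empty on purpose to show the check firing
)
print(missing_sections(card))  # ['known_failures']
```

An empty `known_failures` section is the most common gap in practice: a card that lists no failure cases usually means nobody looked, not that none exist.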

Figure: (1) return photo + labels, split by shipment; (2) CNN candidate + gates, preprocessing + slices; (3) review route + monitor human outcomes.

Build a Dataset That Cannot Leak

For tabular models, leakage may be a future delivery timestamp. For photos, leakage often hides in nearly identical pixels. A customer may upload three bursts of the same crushed box. A warehouse may photograph one parcel from four angles. If related images land in both train and test sets, the model can memorize one package rather than generalize to new damage.

Your manifest should contain:

| Field | Why it matters |
|---|---|
| shipment_id and capture_at | group all photos for one physical case and preserve time ordering |
| source | separate customer phone uploads from warehouse inspection cameras |
| quality_label | distinguish unusable evidence from visible damage |
| damage_label | record specialist-confirmed visible damage only on usable photos |
| split | hold out later shipments, never random photos from the same case |
| reviewer_id and guideline version | audit disagreement or changed label definitions |
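The split rule can be sketched as a grouped time split: each shipment is assigned by its first capture time, and every photo inherits its shipment's split. This is a minimal sketch assuming the manifest is available as `(photo_id, shipment_id, capture_at)` tuples; the rows, the cutoff date, and the function name `grouped_time_split` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical manifest rows: (photo_id, shipment_id, capture_at).
manifest = [
    ("p1", "S-100", datetime(2025, 1, 3, 9, 0)),
    ("p2", "S-100", datetime(2025, 1, 3, 9, 1)),
    ("p3", "S-101", datetime(2025, 1, 10, 14, 0)),
    ("p4", "S-102", datetime(2025, 2, 1, 8, 30)),
    ("p5", "S-102", datetime(2025, 2, 1, 8, 31)),
    ("p6", "S-103", datetime(2025, 2, 20, 16, 0)),
]


def grouped_time_split(rows, cutoff):
    """Assign whole shipments to train or test by their first capture time."""
    first_seen = {}
    for _, shipment_id, capture_at in rows:
        if shipment_id not in first_seen or capture_at < first_seen[shipment_id]:
            first_seen[shipment_id] = capture_at
    # Every photo inherits its shipment's split, so no case straddles the boundary.
    return {
        photo_id: "test" if first_seen[shipment_id] >= cutoff else "train"
        for photo_id, shipment_id, _ in rows
    }


splits = grouped_time_split(manifest, cutoff=datetime(2025, 2, 1))

# The property a test such as test_shipment_groups_do_not_cross_splits could assert.
splits_per_shipment = defaultdict(set)
for photo_id, shipment_id, _ in manifest:
    splits_per_shipment[shipment_id].add(splits[photo_id])
assert all(len(s) == 1 for s in splits_per_shipment.values())
print(splits)
```

Keying on the first capture time is a design choice: a shipment photographed on both sides of the cutoff stays entirely in train rather than being split across the boundary.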

At minimum, evaluate slices for daylight versus dark uploads, customer versus warehouse source, packaging type, and visible-defect size. A global score can hide the exact failure that matters: small tears disappearing in dark phone images.
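A small worked example shows how a global number hides a weak slice. This sketch computes recall (confirmed damage the model caught, divided by all confirmed damage) per lighting slice; recall is chosen here because the concern is missed damage. The eight labeled rows are toy values for illustration, not results from any model.

```python
from collections import defaultdict

# Toy evaluation rows: (slice_key, true_damage, predicted_damage).
rows = [
    ("daylight", 1, 1), ("daylight", 1, 1), ("daylight", 1, 0), ("daylight", 0, 0),
    ("dark", 1, 0), ("dark", 1, 0), ("dark", 1, 1), ("dark", 0, 0),
]


def recall_by_slice(rows):
    """Recall per slice: caught confirmed damage / all confirmed damage."""
    caught = defaultdict(int)
    positives = defaultdict(int)
    for slice_key, truth, pred in rows:
        if truth == 1:
            positives[slice_key] += 1
            caught[slice_key] += pred
    return {key: caught[key] / positives[key] for key in positives}


overall = sum(p for _, t, p in rows if t == 1) / sum(t for _, t, _ in rows)
print(f"overall recall: {overall:.2f}")  # overall recall: 0.50
for key, value in recall_by_slice(rows).items():
    print(f"{key}: {value:.2f}")  # daylight: 0.67, then dark: 0.33
```

The overall 0.50 looks mediocre but uniform; the slice view shows dark uploads miss two of three damaged packages. In a real report you would key slices on combinations such as `(lighting, source)` and also report slice sizes, since a tiny slice gives a noisy rate.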

Use the CNN from the earlier lesson as a baseline. Fine-tune a pretrained image encoder only if you record its preprocessing and evaluate it on the same split. A later deep dive explains Vision Transformer image encoders; this capstone doesn't require that architecture.[2]

Encode a Safe Routing Policy

The model endpoint should receive a preprocessing result and a damage score, then choose a review route. The gate below refuses to use high damage confidence when the image evidence is unusable.

damage-photo-routing-policy.py
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhotoScore:
    case_id: str
    damage_probability: float  # classifier score in [0, 1]
    blur_score: float          # higher means blurrier
    brightness: float          # higher means brighter
    box_visible: bool          # whether the package is in frame


# Versioned policy: thresholds, model, and preprocessing are pinned together.
POLICY = {
    "damage_threshold": 0.70,
    "max_blur": 0.45,
    "min_brightness": 0.20,
    "model": "damage_cnn_v1",
    "preprocess": "parcel_rgb_224_v1",
}


def route(score: PhotoScore) -> dict[str, str]:
    # Evidence gates run before the damage score is consulted.
    if not score.box_visible:
        return {"action": "request_new_photo", "reason": "package_not_visible"}
    if score.blur_score > POLICY["max_blur"] or score.brightness < POLICY["min_brightness"]:
        return {"action": "request_new_photo", "reason": "image_quality_gate"}
    # Only a usable photo can be escalated on its damage score.
    if score.damage_probability >= POLICY["damage_threshold"]:
        return {"action": "priority_damage_review", "reason": "damage_threshold"}
    return {"action": "normal_review", "reason": "below_threshold"}


photos = [
    PhotoScore("R-401", 0.91, 0.12, 0.66, True),
    PhotoScore("R-402", 0.93, 0.71, 0.51, True),
    PhotoScore("R-403", 0.18, 0.08, 0.75, True),
]

for photo in photos:
    result = route(photo)
    print(photo.case_id, result["action"], result["reason"])
```
Output

```text
R-401 priority_damage_review damage_threshold
R-402 request_new_photo image_quality_gate
R-403 normal_review below_threshold
```

Case R-402 is the important failure test. Its damage probability of 0.93 is the highest in the batch, but its blur score of 0.71 fails the quality gate first, so the score is not usable evidence. The endpoint asks for another photo rather than escalating an unsupported claim.

Package the Vision Service

Submit an inspectable repository, not a notebook screenshot:

```text
damage-vision-service/
  data/
    label_guidelines.md
    photo_manifest.parquet
    split_manifest.json
  model/
    train_cnn_baseline.py
    evaluate_slices.py
    model_card.md
  service/
    preprocess.py
    route_review.py
    response_schema.json
  monitoring/
    input_quality_report.py
    delayed_review_outcomes.py
  tests/
    test_shipment_groups_do_not_cross_splits.py
    test_blurry_photo_never_escalates.py
```

The serving bundle must pin image resize and crop behavior, color normalization, model weights, label version, damage threshold, and quality-gate thresholds. A change from center crop to full-frame resize may change whether a torn corner remains visible; it is a model behavior change even when weights remain constant.
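One way to enforce that rule is to derive the bundle's identity from every pinned field, so any change, including a crop change with identical weights, produces a new bundle ID. This is a minimal sketch assuming the bundle is a JSON-serializable dict; the field names and values such as `center_224` and `guidelines_v1` are hypothetical, and `bundle_id` is an illustrative helper built only on the standard `json` and `hashlib` modules.

```python
import hashlib
import json

# Hypothetical serving bundle: every field that can change a routing decision.
BUNDLE = {
    "model_weights": "damage_cnn_v1",
    "preprocess": {
        "version": "parcel_rgb_224_v1",
        "resize": 256,
        "crop": "center_224",
        "normalize": "per_channel_mean_std",
    },
    "label_version": "guidelines_v1",
    "damage_threshold": 0.70,
    "max_blur": 0.45,
    "min_brightness": 0.20,
}


def bundle_id(bundle: dict) -> str:
    """Short content hash; sort_keys makes it independent of key order."""
    canonical = json.dumps(bundle, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


baseline = bundle_id(BUNDLE)

# Same weights and same preprocess version label, different crop.
full_frame = {**BUNDLE, "preprocess": {**BUNDLE["preprocess"], "crop": "full_frame_resize"}}
print(baseline != bundle_id(full_frame))  # True: this ships as a new bundle
```

Hashing the full contents rather than trusting a hand-edited version label catches the case above, where someone changes the crop but forgets to bump `parcel_rgb_224_v1`.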

Return a trace that lets a reviewer reconstruct the route:

| Response field | Example |
|---|---|
| model and preprocessing | damage_cnn_v1, parcel_rgb_224_v1 |
| quality values | blur 0.12, brightness 0.66, box visible true |
| score and action policy | damage 0.91, threshold 0.70 |
| route | priority_damage_review |
| human outcome (added later) | confirmed_damage or not_supported |
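A minimal sketch of assembling that trace, assuming the response is a JSON object whose `human_outcome` stays empty until a specialist decides. The function names `build_trace` and `record_outcome` and the nested key layout are hypothetical; only the field contents come from the table.

```python
import json

ALLOWED_OUTCOMES = {"confirmed_damage", "not_supported"}


def build_trace(model, preprocess, blur, brightness, box_visible,
                damage, threshold, route):
    """Assemble a reviewer-facing trace; the human outcome is filled in later."""
    return {
        "model": model,
        "preprocess": preprocess,
        "quality": {"blur": blur, "brightness": brightness, "box_visible": box_visible},
        "score": {"damage": damage, "threshold": threshold},
        "route": route,
        "human_outcome": None,
    }


def record_outcome(trace, outcome):
    """Return a copy of the trace with a validated specialist outcome."""
    if outcome not in ALLOWED_OUTCOMES:
        raise ValueError(f"unknown outcome: {outcome}")
    return {**trace, "human_outcome": outcome}


trace = build_trace("damage_cnn_v1", "parcel_rgb_224_v1", 0.12, 0.66, True,
                    0.91, 0.70, "priority_damage_review")
reviewed = record_outcome(trace, "confirmed_damage")
print(json.dumps(reviewed, indent=2))
```

Returning a new dict instead of mutating the original keeps the served response and the later labeled record distinct, which matters when you join them for delayed precision.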

Monitor Photos and Human Outcomes

Photo models drift when the image source changes. A new warehouse camera, winter lighting, a mobile upload compressor, or new packaging graphics can alter pixels before a confirmed-damage label exists.

Separate immediate checks from delayed quality:

| Window | Monitor | Trigger |
|---|---|---|
| immediate | unreadable image rate, brightness, blur, missing package, latency | investigate the capture path or fall back to manual intake |
| delayed | specialist-confirmed precision, missed visible damage, route rate by source and packaging | hold promotion or create a retraining candidate |
| safety review | unsupported escalations, policy actions attempted without specialist approval | roll back and audit the workflow |
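The immediate window needs no labels, so it can run on every batch. This sketch compares the share of uploads failing the quality gate, per source, against a baseline window and flags sources whose rate rose by more than an absolute margin. The event format, the window contents, and the 0.10 margin are all assumptions; a production monitor might use a statistical test instead of a fixed margin.

```python
from collections import Counter

# Hypothetical upload events: (source, passed_quality_gate).
baseline_window = (
    [("customer", True)] * 90 + [("customer", False)] * 10
    + [("warehouse", True)] * 98 + [("warehouse", False)] * 2
)
current_window = (
    [("customer", True)] * 88 + [("customer", False)] * 12
    + [("warehouse", True)] * 70 + [("warehouse", False)] * 30
)


def unreadable_rate(events):
    """Share of uploads that failed the quality gate, per source."""
    totals, failures = Counter(), Counter()
    for source, passed in events:
        totals[source] += 1
        if not passed:
            failures[source] += 1
    return {source: failures[source] / totals[source] for source in totals}


def immediate_alerts(baseline, current, max_increase=0.10):
    """Sources whose unreadable rate rose by more than max_increase (absolute).

    An alert opens a capture-path investigation; it never retrains or swaps
    the production model on its own.
    """
    before, after = unreadable_rate(baseline), unreadable_rate(current)
    return sorted(s for s in after if after[s] - before.get(s, 0.0) > max_increase)


print(immediate_alerts(baseline_window, current_window))  # ['warehouse']
```

Here customer uploads drift from 10% to 12% unreadable, which stays under the margin, while the warehouse jumps from 2% to 30%, the kind of shift a new camera or lighting change could cause.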

Google Cloud's MLOps guidance treats serving, monitoring, validation, metadata, and continuous training as connected stages rather than a one-time deploy step.[3] Apply that same discipline here: a change in image quality creates an investigation or candidate run, never an automatic production replacement.
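That rule can be made structural: if the monitoring code has no "promote" action at all, no signal can replace production by itself. This is a minimal sketch of hypothetical wiring from monitor signals to the actions in the monitoring table; the `Action` names and `next_actions` signature are illustrative, and it models only the investigation branch of the immediate trigger, not the manual-intake fallback.

```python
from enum import Enum


class Action(str, Enum):
    NO_CHANGE = "no_change"
    INVESTIGATE = "investigate_capture_path"
    HOLD = "hold_promotion"
    CANDIDATE_RUN = "create_candidate_run"
    ROLLBACK = "rollback_and_audit"
    # Deliberately no "promote" member: promotion stays a separate, reviewed step.


def next_actions(input_drift: bool, precision_drop: bool,
                 unsupported_escalations: int) -> list[Action]:
    """Map monitor signals to follow-up actions (hypothetical wiring)."""
    actions = []
    if unsupported_escalations > 0:
        actions.append(Action.ROLLBACK)  # safety review window
    if input_drift:
        actions.append(Action.INVESTIGATE)  # immediate window
    if precision_drop:
        actions += [Action.HOLD, Action.CANDIDATE_RUN]  # delayed window
    return actions or [Action.NO_CHANGE]


print([a.value for a in next_actions(True, False, 0)])  # ['investigate_capture_path']
```

A candidate run produced this way would still have to pass the same shipment-grouped split and slice checks before anyone approves it.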

Mastery Check

Evaluation rubric

| Artifact | Strong submission demonstrates |
|---|---|
| dataset contract | shipment-grouped time split, quality labels, damage labels, and reviewed slices |
| service | versioned preprocessing and safe quality-first routing with abstention |
| operations | model card, delayed specialist outcomes, drift checks, candidate promotion, and rollback |

Common Failures

| Symptom | Cause | Fix |
|---|---|---|
| Holdout score is unrealistically high | photos from one shipment crossed splits | group by physical case and time |
| Blurry image triggers damage escalation | score evaluated before quality | gate evidence quality first |
| New camera changes decisions silently | preprocessing and source drift untracked | log source and quality slices, and version the bundle |
Next Step
Continue to Capstone: Production ML Pipeline

You have shipped tabular, ranking, forecasting, and vision artifacts with their own action gates. Next you will manage them under one validated promotion, monitoring, and rollback workflow.

References

[1] Mitchell, M., Wu, S., Zaldivar, A., et al. (2019). Model Cards for Model Reporting. FAT* 2019.

[2] Dosovitskiy, A., et al. (2020). An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. ICLR 2021.

[3] Google Cloud. (2026). MLOps: Continuous Delivery and Automation Pipelines in Machine Learning. Official documentation.