Capstone: Delivery ETA Prediction

Ship a delivery-delay prediction service with time-safe features, threshold gates, API contract, and drift evidence.

The production ML lessons gave you each component in isolation. This capstone packages them into a service another engineer can evaluate: given an in-transit order at a defined timestamp, estimate late-delivery risk and decide whether the product may show a proactive delay warning.

The product contract is intentionally narrow. The model doesn't promise an exact arrival minute, issue refunds, or change carrier routing. It returns a risk score with a controlled action: normal_tracking, warn_customer, or manual_review when inputs are unreliable.

[Figure: Delivery ETA capstone pipeline combining point-in-time carrier features, boosted delay model, threshold and freshness gate, prediction API, and drift monitor.]
The service issues a warning only when both feature freshness and held-out delay-risk evidence satisfy the published release contract.

Define the Contract Before Choosing a Model

Use one decision moment: two hours after carrier pickup. Use one label: whether delivery occurred after the promised date. A prediction stored without those definitions can't be replayed later.

| Contract field | Pinned value |
| --- | --- |
| prediction event | two hours after first carrier pickup |
| label | delivered after promised end-of-day |
| score output | late_risk between zero and one |
| displayed action | warn only when threshold passes |
| unavailable data action | route to manual_review, no narrow ETA claim |

The feature bundle includes route distance, service tier, origin backlog, scan age, weekday, and carrier code. Every field must be reconstructed as of prediction time. The earlier feature-store lesson explained why point-in-time joins protect training from future scans; Feast documents this historical retrieval pattern for production feature data.[1]
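The as-of rule can be illustrated without a feature store. The sketch below is a minimal, standard-library illustration, not Feast's API: the scan events, the helper names `scans_as_of` and `scan_age_hours`, and the timestamps are all hypothetical. It shows that a scan recorded after the pinned decision moment must not influence the reconstructed feature.

```python
from datetime import datetime, timedelta

# Hypothetical scan events for one order: (event_time, status).
SCANS = [
    (datetime(2026, 3, 2, 8, 0), "picked_up"),
    (datetime(2026, 3, 2, 9, 30), "arrived_origin_hub"),
    (datetime(2026, 3, 2, 14, 0), "departed_origin_hub"),  # after the cutoff
]


def scans_as_of(scans, prediction_time):
    """Keep only events the service could have seen at prediction time."""
    return [(ts, status) for ts, status in scans if ts <= prediction_time]


def scan_age_hours(scans, prediction_time):
    """Hours since the latest visible scan, or None when nothing is visible."""
    visible = scans_as_of(scans, prediction_time)
    if not visible:
        return None
    latest = max(ts for ts, _ in visible)
    return (prediction_time - latest).total_seconds() / 3600


pickup_time = SCANS[0][0]
prediction_time = pickup_time + timedelta(hours=2)  # the pinned decision moment

visible = scans_as_of(SCANS, prediction_time)
age = scan_age_hours(SCANS, prediction_time)
print(len(visible), age)  # prints: 2 0.5
```

The same filter applies to both training replay and live serving, which is what makes a stored prediction reproducible later.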

[Figure: Carrier events as of pickup + 2h, Feature bundle v1 (freshness checked), Delay model v1 (risk score), and Policy gate (fresh + threshold?).]

Establish Baseline and Release Evidence

Your repository artifact should contain this layout:

    eta-prediction/
      data/
        feature_contract.json
        train_snapshot_manifest.json
      training/
        baseline.py
        train_booster.py
        evaluate_slices.py
      artifacts/
        delay_model_v1.json
        threshold_policy_v1.json
        metrics_v1.json
      service/
        api.py
        schemas.py
      monitoring/
        drift_window.py
      tests/
        test_point_in_time_features.py
        test_warning_gate.py

First fit a rule baseline such as hours_since_last_scan >= 18. Then fit the tree candidate using the same time-ordered splits. XGBoost is a defensible implementation for structured features because its boosted-tree system is designed for sparse, scalable tabular learning.[2] It still must beat the baseline on the exact action policy, not only on a model metric.
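Comparing "on the exact action policy" means scoring warnings, not raw model outputs. The sketch below is one possible way to do that under stated assumptions: the two cost constants, the six validation rows, and the function names are hypothetical placeholders, and the candidate scores are hard-coded rather than produced by a trained booster.

```python
# Hypothetical per-order costs; real values come from the product team.
COST_MISSED_WARNING = 5.0  # order was late and no warning was shown
COST_FALSE_WARNING = 1.0   # warning was shown but the order arrived on time

# Hypothetical held-out rows: (hours_since_last_scan, candidate_score, was_late)
VALIDATION = [
    (20, 0.70, True),
    (4, 0.55, True),
    (19, 0.20, False),
    (2, 0.10, False),
    (6, 0.45, True),
    (22, 0.42, False),
]


def rule_baseline(hours_since_last_scan):
    """Rule baseline from the text: warn when the last scan is 18+ hours old."""
    return hours_since_last_scan >= 18


def candidate_policy(score, threshold=0.40):
    """Candidate action policy: warn when the model score reaches the threshold."""
    return score >= threshold


def expected_cost(warnings, labels):
    """Average cost per order for a list of warn/no-warn decisions."""
    total = 0.0
    for warned, late in zip(warnings, labels):
        if late and not warned:
            total += COST_MISSED_WARNING
        elif warned and not late:
            total += COST_FALSE_WARNING
    return total / len(labels)


labels = [late for _, _, late in VALIDATION]
baseline_cost = expected_cost([rule_baseline(h) for h, _, _ in VALIDATION], labels)
candidate_cost = expected_cost([candidate_policy(s) for _, s, _ in VALIDATION], labels)
print(f"baseline={baseline_cost:.2f} candidate={candidate_cost:.2f}")
```

Both policies are scored on the same rows and the same cost function, so a lower candidate cost reflects the decision the product would actually take.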

Required release rows:

| Gate | Requirement |
| --- | --- |
| no feature leakage | replay test excludes post-prediction scans |
| expedited shipments | no missed warning in required validation slice |
| expected warning cost | better than rule baseline |
| feature freshness | stale scan/backlog returns fallback |
| API schema | model, feature, and threshold versions emitted |
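The release rows can be treated as executable checks over collected evidence. This is a rough sketch only: the `evidence` keys and values are hypothetical stand-ins for whatever the replay test, slice evaluation, and service tests actually report, and `manual_review` is assumed to be the fallback route.

```python
# Hypothetical evidence, as it might be assembled from metrics and test results.
evidence = {
    "replay_test_post_prediction_scans": 0,
    "expedited_missed_warnings": 0,
    "candidate_expected_cost": 0.17,
    "baseline_expected_cost": 2.00,
    "stale_input_action": "manual_review",
    "response_versions": {"model", "feature", "threshold"},
}

# One check per release row; every check must pass before promotion.
GATES = {
    "no feature leakage": lambda e: e["replay_test_post_prediction_scans"] == 0,
    "expedited shipments": lambda e: e["expedited_missed_warnings"] == 0,
    "expected warning cost": lambda e: e["candidate_expected_cost"] < e["baseline_expected_cost"],
    "feature freshness": lambda e: e["stale_input_action"] == "manual_review",
    "API schema": lambda e: {"model", "feature", "threshold"} <= e["response_versions"],
}


def evaluate_release(evidence):
    """Return (approved, per-gate results) for one candidate release."""
    results = {name: bool(check(evidence)) for name, check in GATES.items()}
    return all(results.values()), results


approved, results = evaluate_release(evidence)
print(approved, results)
```

Keeping per-gate results alongside the overall verdict lets a reviewer see which row blocked a release instead of a single pass/fail flag.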

Make the Decision Executable

The service below focuses on the boundary. A real late_risk would be produced by the trained model artifact; this compact example tests the policy the service must preserve.

eta-release-boundary.py
    from dataclasses import dataclass


    @dataclass(frozen=True)
    class Prediction:
        order_id: str
        late_risk: float
        scan_age_hours: float
        tier: str


    # Release policy pinned alongside the model artifact.
    POLICY = {"threshold": 0.40, "max_scan_age_hours": 24, "model": "delay_model_v1"}


    def route(prediction):
        # Check freshness first so a stale input never reaches the threshold test.
        if prediction.scan_age_hours > POLICY["max_scan_age_hours"]:
            return {"action": "manual_review", "reason": "stale_features", "model": POLICY["model"]}
        if prediction.late_risk >= POLICY["threshold"]:
            return {"action": "warn_customer", "reason": "late_risk_threshold", "model": POLICY["model"]}
        return {"action": "normal_tracking", "reason": "below_threshold", "model": POLICY["model"]}


    cases = [
        Prediction("O-201", 0.62, 5, "expedited"),
        Prediction("O-202", 0.25, 3, "standard"),
        Prediction("O-203", 0.81, 31, "standard"),
    ]
    for case in cases:
        decision = route(case)  # route once and reuse the result
        print(case.order_id, decision["action"], decision["reason"])
Output
    O-201 warn_customer late_risk_threshold
    O-202 normal_tracking below_threshold
    O-203 manual_review stale_features

Case O-203 is the key design result. A high model score isn't authority to message a customer when the supporting scan features are stale. The product needs a safe fallback independent of model confidence.
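The same boundary can be extended to cover two contract items the compact example leaves out: missing inputs and the full version tuple in the response. The sketch below is an assumption-laden variant, not the service's actual schema: `respond`, `Response`, the reason string `missing_features`, the order ID `O-204`, and the label `feature_bundle_v1` are hypothetical names chosen to mirror the artifact names used earlier.

```python
from dataclasses import dataclass, asdict
from typing import Optional

# Version labels mirror the artifact names in the repository layout;
# "feature_bundle_v1" is an assumed label for the feature contract version.
RELEASE = {
    "model_version": "delay_model_v1",
    "feature_version": "feature_bundle_v1",
    "threshold_version": "threshold_policy_v1",
}
THRESHOLD = 0.40
MAX_SCAN_AGE_HOURS = 24


@dataclass(frozen=True)
class Response:
    order_id: str
    action: str
    reason: str
    model_version: str
    feature_version: str
    threshold_version: str


def respond(order_id: str, late_risk: Optional[float],
            scan_age_hours: Optional[float]) -> Response:
    """Route one order and attach the release tuple to the response."""
    if late_risk is None or scan_age_hours is None:
        # Unavailable data routes to manual review, per the contract table.
        action, reason = "manual_review", "missing_features"
    elif scan_age_hours > MAX_SCAN_AGE_HOURS:
        action, reason = "manual_review", "stale_features"
    elif late_risk >= THRESHOLD:
        action, reason = "warn_customer", "late_risk_threshold"
    else:
        action, reason = "normal_tracking", "below_threshold"
    return Response(order_id, action, reason, **RELEASE)


missing = respond("O-204", 0.90, None)
print(asdict(missing))
```

Because every response carries all three versions, a warning can later be traced to the exact model, feature contract, and threshold policy that produced it.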

Operate the Service After Release

The deployment emits one row per score: request timestamp, feature version, model version, threshold version, feature freshness, score, action, and eventually the delivery label. Immediate monitoring catches nulls, stale scans, error rate, and score-distribution shift. Delayed monitoring computes missed-warning cost, calibration by score bucket, and slice performance.
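Score-distribution shift needs a concrete statistic. The text does not name one, so the sketch below swaps in the Population Stability Index (PSI) as one common choice; the bin count, the smoothing constant `eps`, and both score samples are hypothetical, and any alert threshold is left to the team.

```python
import math


def psi(reference, live, bins=5, eps=1e-6):
    """Population Stability Index over equal-width bins on [0, 1]."""

    def shares(scores):
        counts = [0] * bins
        for s in scores:
            idx = min(int(s * bins), bins - 1)  # a score of 1.0 falls in the last bin
            counts[idx] += 1
        # Floor empty bins at eps so the logarithm below stays defined.
        return [max(c / len(scores), eps) for c in counts]

    ref, cur = shares(reference), shares(live)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


# Hypothetical late_risk scores: a reference window and a shifted live window.
reference_scores = [0.05, 0.12, 0.18, 0.25, 0.33, 0.41, 0.47, 0.58, 0.66, 0.83]
shifted_scores = [0.45, 0.52, 0.58, 0.63, 0.69, 0.74, 0.78, 0.85, 0.91, 0.96]

same = psi(reference_scores, reference_scores)
drift = psi(reference_scores, shifted_scores)
print(same, round(drift, 2))
```

Identical windows give a PSI of zero, while the shifted window produces a large value because two reference bins are empty in the live data. With small windows the `eps` floor dominates, so the statistic is more informative on larger samples.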

Promotion should be an alias move from delay_model_v1 to a separately evaluated candidate. Google Cloud's MLOps guidance describes this separation between validation, metadata, serving, monitoring, and continuous training stages.[3] A triggered retraining job creates evidence; it doesn't silently rewrite live behavior.
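An alias move can be sketched with a toy in-memory registry. This is an illustration under assumptions, not any particular registry's API: `ModelRegistry`, the `production` alias name, and the candidate `delay_model_v2` are hypothetical.

```python
class ModelRegistry:
    """Toy in-memory registry; a real system would persist this state."""

    def __init__(self, initial_version):
        self.aliases = {"production": initial_version}
        self.history = [initial_version]

    def promote(self, candidate, evidence_passed):
        # Retraining only produces a candidate; promotion needs passing evidence.
        if not evidence_passed:
            return False
        self.aliases["production"] = candidate
        self.history.append(candidate)
        return True

    def rollback(self):
        # Re-point the alias at the previous version, which stays deployable.
        if len(self.history) < 2:
            return False
        self.history.pop()
        self.aliases["production"] = self.history[-1]
        return True


registry = ModelRegistry("delay_model_v1")

rejected = registry.promote("delay_model_v2", evidence_passed=False)
after_reject = registry.aliases["production"]

accepted = registry.promote("delay_model_v2", evidence_passed=True)
after_accept = registry.aliases["production"]

registry.rollback()
after_rollback = registry.aliases["production"]
print(after_reject, after_accept, after_rollback)
```

The serving path only ever reads the alias, so promotion and rollback are both single pointer changes that leave the underlying artifacts untouched.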

Submission Checklist

| Artifact | Reviewer should verify |
| --- | --- |
| feature contract | every field has type, timestamp boundary, and missing policy |
| training manifest | time split and dataset fingerprint exist |
| baseline comparison | candidate improves declared cost without required-slice misses |
| service API | stale inputs fail to a safer route |
| monitoring plan | input checks and delayed label metrics are distinct |
| rollback plan | prior artifact and threshold remain deployable |
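The first checklist row lends itself to an automated check. The sketch below assumes a possible shape for `feature_contract.json`; the key names `type`, `as_of`, and `missing_policy`, the field names, and the helper `contract_gaps` are hypothetical, since the actual contract schema is not specified here.

```python
REQUIRED_KEYS = {"type", "as_of", "missing_policy"}

# Hypothetical excerpt of feature_contract.json; key names are assumptions.
feature_contract = {
    "route_distance_km": {
        "type": "float", "as_of": "prediction_time", "missing_policy": "manual_review",
    },
    "service_tier": {
        "type": "str", "as_of": "prediction_time", "missing_policy": "manual_review",
    },
    # This entry deliberately omits its missing policy.
    "scan_age_hours": {"type": "float", "as_of": "prediction_time"},
}


def contract_gaps(contract):
    """Return {field: sorted missing keys} for incomplete contract entries."""
    gaps = {}
    for field, spec in contract.items():
        missing = REQUIRED_KEYS - set(spec)
        if missing:
            gaps[field] = sorted(missing)
    return gaps


gaps = contract_gaps(feature_contract)
print(gaps)  # prints: {'scan_age_hours': ['missing_policy']}
```

A check like this can run in the test suite so that an incomplete contract entry blocks review before any model evidence is considered.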

Mastery Check

Evaluation rubric

| Artifact | Strong submission demonstrates |
| --- | --- |
| model package | time-safe feature contract, baseline comparison, and calibrated warning threshold |
| service | versioned response trace and safe behavior for missing or stale scans |
| operations | input monitoring, delayed-label evaluation, candidate promotion, and rollback |

Common Failures

| Symptom | Cause | Fix |
| --- | --- | --- |
| Warning appears accurate offline but misses live disruptions | future scans leaked into training | enforce as-of tests |
| Customer receives unsupported ETA warning | service trusts score despite stale inputs | gate freshness before action |
| Team can't reproduce a warning | artifact versions absent from response trace | log full release tuple |
Next Step
Continue to Capstone: Product Ranking

You have shipped one prediction service with time-safe features and release gates. Next you will ship a ranked marketplace surface whose exposures must be measured as carefully as its scores.

References

[1] Feast Contributors. Feast: Production Feature Store for Machine Learning. 2024.

[2] Chen, T. & Guestrin, C. XGBoost: A Scalable Tree Boosting System. KDD 2016.

[3] Google Cloud. MLOps: Continuous Delivery and Automation Pipelines in Machine Learning. Official documentation, 2026.