LeetLLM

Feature Engineering for Production ML

Turn delivery events into stable prediction inputs while preventing leakage and training-serving mismatch.


The previous lesson produced versioned datasets with clean splits. A prediction model still can't consume a raw event log directly. To predict whether a parcel will be late, it needs a fixed row of measurements available at the moment the promise is made.

Those measurements are features. A feature such as hours_since_last_scan compresses many warehouse events into one input value. The hard part isn't inventing columns. It's ensuring each value means the same thing during training and while serving live requests.

Figure: Delivery ETA feature contract turning shipment events at prediction time into stable age, carrier, distance, and backlog fields while rejecting post-delivery leakage.
A useful feature row contains only information known at prediction time, with the same transformation used for offline training and online serving.

Start With a Prediction Timestamp

Suppose the product asks at 2026-05-01 09:00: will order O-204 arrive later than its promised day? Carrier scans after that timestamp aren't available to the prediction service and can't appear in its training row.

| Candidate field | Known at prediction time? | Use as feature? |
| --- | --- | --- |
| carrier code | yes | yes, categorical |
| route distance in km | yes | yes, numeric |
| hours since most recent scan | yes | yes, numeric |
| warehouse backlog at origin | yes | yes, numeric |
| delivered timestamp | no | no, it defines the eventual label |
| customer complaint after delay | no | no, it leaks the outcome |

The label is computed later, once the outcome is known: delivered_after_promise = 1 when the parcel arrives after its promised day. A feature, by contrast, must be computed from history ending at the prediction timestamp. If a training row contains the later complaint, offline accuracy will reward a model for reading the answer.
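The time boundary can be made explicit in code. The sketch below is a minimal illustration, not the lesson's own pipeline: the event dictionaries, their `type` and `at` keys, and the helper names `visible_events` and `hours_since_last_scan` are all assumptions introduced for this example.

```python
from datetime import datetime


def visible_events(events, prediction_time):
    """Keep only events stamped at or before the prediction time."""
    return [e for e in events if datetime.fromisoformat(e["at"]) <= prediction_time]


def hours_since_last_scan(events, prediction_time):
    """Age in hours of the newest visible scan, or None if no scan is visible."""
    scans = [
        datetime.fromisoformat(e["at"])
        for e in visible_events(events, prediction_time)
        if e["type"] == "scan"
    ]
    if not scans:
        return None
    return (prediction_time - max(scans)).total_seconds() / 3600


prediction_time = datetime.fromisoformat("2026-05-01T09:00:00")
events = [  # hypothetical event log for order O-204
    {"type": "scan", "at": "2026-04-30T22:00:00"},
    {"type": "scan", "at": "2026-05-01T01:00:00"},
    {"type": "scan", "at": "2026-05-02T07:30:00"},       # after prediction time
    {"type": "delivered", "at": "2026-05-03T14:00:00"},  # defines the label only
]

print(len(visible_events(events, prediction_time)))    # 2
print(hours_since_last_scan(events, prediction_time))  # 8.0
```

The later scan and the delivery event never reach the feature computation, so the same function can be replayed for training rows and called for live requests.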

Sculley et al. describe production ML systems as networks of data and configuration dependencies where hidden feedback and undeclared consumers create technical debt.[1] Feature definitions are one of those dependencies: when their time boundary is unclear, the model's impressive score doesn't survive deployment.

Diagram: shipment events through prediction time; feature contract (types + missing policy); training snapshot (label joins later); online request (same computation).

Define One Row Before Training

Build a small contract for late-delivery prediction:

| Feature | Type | Missing rule | Why it can help |
| --- | --- | --- | --- |
| distance_km | numeric | reject if absent | longer routes allow more disruption |
| hours_since_last_scan | numeric | cap at 168 | stale movement signals delay risk |
| origin_backlog | numeric | use measured queue only | congestion affects departure time |
| carrier | categorical | map unseen to other | carriers have different networks |
| expedited | boolean | default false only when source guarantees it | service class changes promise |

A missing value is a product decision. Filling missing origin_backlog with zero says "unknown congestion means no congestion," which is rarely defensible. Store an additional origin_backlog_missing indicator or stop scoring until the feed recovers.
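One way to encode these missing-value rules is sketched below. It is an assumption-laden illustration: `RejectRow`, `apply_missing_rules`, and the constant name are hypothetical, the input is assumed to already contain `hours_since_last_scan`, and the `0.0` placeholder for an absent backlog is only meaningful because the indicator column travels with it.

```python
MAX_SCAN_AGE_HOURS = 168.0  # cap taken from the contract table


class RejectRow(Exception):
    """Raised when a required field is absent and the row must not be scored."""


def apply_missing_rules(raw):
    # distance_km: reject if absent.
    if raw.get("distance_km") is None:
        raise RejectRow("distance_km is required")
    backlog = raw.get("origin_backlog")
    return {
        "distance_km": float(raw["distance_km"]),
        # hours_since_last_scan: cap at 168 hours.
        "hours_since_last_scan": min(float(raw["hours_since_last_scan"]), MAX_SCAN_AGE_HOURS),
        # origin_backlog: placeholder value plus an explicit indicator,
        # so "unknown" is never silently read as "no congestion".
        "origin_backlog": 0.0 if backlog is None else float(backlog),
        "origin_backlog_missing": int(backlog is None),
    }


row = apply_missing_rules(
    {"distance_km": 620, "hours_since_last_scan": 300, "origin_backlog": None}
)
print(row)
```

Whether to emit the indicator or stop scoring entirely remains the product decision described above; the sketch shows only the indicator branch.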

Categorical values need a policy too. If a new carrier appears after training, the online encoder can't invent a new model column. A reserved other bucket keeps behavior stable while a new model candidate is evaluated.
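A fixed vocabulary with a reserved bucket might look like the following sketch. The carrier names other than northline, the `KNOWN_CARRIERS` tuple, and the `encode_carrier` helper are hypothetical; a real encoder would load its vocabulary from the versioned training artifact.

```python
KNOWN_CARRIERS = ("northline", "eastfreight", "other")  # hypothetical training vocabulary


def encode_carrier(carrier):
    """One-hot encode a carrier, routing unseen or missing values to 'other'."""
    bucket = carrier if carrier in KNOWN_CARRIERS else "other"
    return {f"carrier_{name}": int(name == bucket) for name in KNOWN_CARRIERS}


print(encode_carrier("northline"))
# {'carrier_northline': 1, 'carrier_eastfreight': 0, 'carrier_other': 0}
print(encode_carrier("skyhaul"))
# {'carrier_northline': 0, 'carrier_eastfreight': 0, 'carrier_other': 1}
```

The output always has the same columns, so a carrier added after training changes only which column is set, never the shape of the model input.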

Build a Leakage Gate

The following tiny transformation creates a feature row using only facts visible at prediction time and leaves post-delivery fields out of the model input.

build-feature-row.py
```python
from datetime import datetime

prediction_time = datetime.fromisoformat("2026-05-01T09:00:00")
shipment = {
    "order_id": "O-204",
    "carrier": "northline",
    "distance_km": 620,
    "origin_backlog": 18,
    "expedited": False,
    "last_scan_at": "2026-05-01T01:00:00",
    "delivered_at": "2026-05-03T14:00:00",
    "late_complaint": True,
}

# Fields that exist only after the prediction is made; never valid as features.
future_only = {"delivered_at", "late_complaint"}


def make_features(row, at):
    last_scan = datetime.fromisoformat(row["last_scan_at"])
    features = {
        "carrier": row.get("carrier", "other"),
        "distance_km": float(row["distance_km"]),
        "hours_since_last_scan": (at - last_scan).total_seconds() / 3600,
        "origin_backlog": float(row["origin_backlog"]),
        "expedited": int(row.get("expedited", False)),
    }
    # Leakage gate: check the keys actually returned, not a hard-coded list.
    assert future_only.isdisjoint(features), "future-only field leaked into features"
    return features


features = make_features(shipment, prediction_time)
print(features)
print("contains future outcome:", any(key in features for key in future_only))
```
Output
```
{'carrier': 'northline', 'distance_km': 620.0, 'hours_since_last_scan': 8.0, 'origin_backlog': 18.0, 'expedited': 0}
contains future outcome: False
```

That assertion looks simple, but it expresses a release requirement: no candidate model may train on fields unavailable to live scoring. A stronger pipeline also records data source, event timestamp, transformation version, allowed range, and missing-value rate for each feature.
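The per-feature metadata listed above could be captured in a small record, as in this sketch. The `FeatureSpec` class, its field names, the `validate` helper, and the source name `scan_events` are assumptions made for illustration rather than part of any particular feature platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureSpec:
    name: str
    source: str             # upstream table or feed (hypothetical name below)
    event_time_column: str  # timestamp that bounds what the feature may read
    transform_version: str
    min_value: float
    max_value: float


def validate(spec, values):
    """Report the missing-value rate and the count of out-of-range values."""
    present = [v for v in values if v is not None]
    missing_rate = 1 - len(present) / len(values) if values else 0.0
    out_of_range = sum(1 for v in present if not spec.min_value <= v <= spec.max_value)
    return {"feature": spec.name, "missing_rate": missing_rate, "out_of_range": out_of_range}


spec = FeatureSpec("hours_since_last_scan", "scan_events", "scanned_at", "v1", 0.0, 168.0)
report = validate(spec, [8.0, 12.5, None, 200.0])
print(report)
# {'feature': 'hours_since_last_scan', 'missing_rate': 0.25, 'out_of_range': 1}
```

Storing the spec next to the model artifact gives a release something concrete to compare against when a feed or transformation changes.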

Parity Is Part of the Model

An offline notebook might compute backlog by scanning a completed daily table. The service might read an hourly cache. Even when both columns are named origin_backlog, differences in freshness or aggregation can change predictions. This failure is training-serving skew.

A feature platform can keep definitions versioned and obtain historical features using point-in-time correct joins. Feast documents this separation between historical retrieval for training and online retrieval for serving.[2] The tool isn't the lesson: the contract is. A model release must identify the feature definition and snapshot that produced its score.
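The idea behind a point-in-time correct join can be shown in plain Python without any feature store. This sketch is not Feast's API: the `as_of` helper and the backlog history are hypothetical, and it assumes timestamps are ISO 8601 strings in one consistent format so that string order matches time order.

```python
from bisect import bisect_right


def as_of(history, ts):
    """Latest value recorded at or before ts.

    history is a list of (timestamp, value) pairs sorted by timestamp.
    """
    times = [t for t, _ in history]
    i = bisect_right(times, ts)
    return history[i - 1][1] if i else None


backlog_history = [  # hypothetical origin_backlog measurements
    ("2026-05-01T06:00:00", 12),
    ("2026-05-01T08:00:00", 18),
    ("2026-05-01T10:00:00", 31),
]

print(as_of(backlog_history, "2026-05-01T09:00:00"))  # 18
print(as_of(backlog_history, "2026-05-01T05:00:00"))  # None
```

A training row for the 09:00 prediction receives 18, the value that serving would have seen, rather than the 10:00 reading that a naive join on a completed daily table could pick up.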

Monitor the contract before monitoring accuracy:

| Production check | Failure it catches | Action |
| --- | --- | --- |
| null rate by feature | upstream feed disappeared | fail closed or fallback |
| unseen-category rate | carrier catalog changed | collect labels and retrain |
| freshness lag | online values are stale | pause promotions |
| offline/online parity sample | transformations disagree | repair feature path |
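The four checks in the table can each be expressed as a small function. The sketch below is illustrative only: the function names, the sample rows, the carrier names other than northline, and the tolerance are assumptions, and alert thresholds are deliberately left out because they depend on the product.

```python
from datetime import datetime


def null_rate(rows, feature):
    """Share of rows where the feature is missing."""
    return sum(r.get(feature) is None for r in rows) / len(rows)


def unseen_category_rate(rows, feature, known):
    """Share of rows whose category is outside the training vocabulary."""
    return sum(r.get(feature) not in known for r in rows) / len(rows)


def freshness_lag_minutes(last_update, now):
    """Minutes since the online value was last refreshed."""
    return (now - last_update).total_seconds() / 60


def parity_mismatches(offline, online, tolerance=1e-6):
    """Names of numeric features whose offline and online values disagree."""
    return sorted(
        k for k in offline
        if abs(offline[k] - online.get(k, float("inf"))) > tolerance
    )


rows = [  # hypothetical sample of recent scoring requests
    {"carrier": "northline", "origin_backlog": 18.0},
    {"carrier": "skyhaul", "origin_backlog": None},
    {"carrier": "northline", "origin_backlog": 22.0},
    {"carrier": "eastfreight", "origin_backlog": 9.0},
]

print(null_rate(rows, "origin_backlog"))                                  # 0.25
print(unseen_category_rate(rows, "carrier", {"northline", "eastfreight"}))  # 0.25
print(freshness_lag_minutes(datetime(2026, 5, 1, 8, 15), datetime(2026, 5, 1, 9, 0)))  # 45.0
print(parity_mismatches({"distance_km": 620.0, "origin_backlog": 18.0},
                        {"distance_km": 620.0, "origin_backlog": 21.0}))  # ['origin_backlog']
```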

Mastery Check

Evaluation rubric

| Evidence | A production-ready answer demonstrates |
| --- | --- |
| prediction contract | identifies prediction time, allowed fields, labels, and missing-value meaning |
| leakage control | proves future-only events cannot enter feature construction |
| parity plan | versions transformations and monitors online/offline disagreement |

Common Pitfalls

| Symptom | Cause | Fix |
| --- | --- | --- |
| Offline score is excellent, live accuracy collapses | future event entered features | enforce prediction timestamps and leakage gates |
| New carrier causes errors or silent zeros | categorical mapping wasn't versioned | reserve other and monitor its rate |
| Predictions shift after a data job rewrite | feature meaning changed without model release | version transformation and test parity |

Next Step
Continue to Batch and Streaming Feature Pipelines

You can now define one trustworthy prediction row. Next you will construct those rows from event history without joining future information or serving stale features.

References

[1] Sculley et al. (2015). Hidden Technical Debt in Machine Learning Systems.

[2] Feast Contributors (2024). Feast: Production Feature Store for Machine Learning.