
Batch and Streaming Feature Pipelines

Build point-in-time delivery features from events and preserve the same meaning in online serving.


The last lesson defined a valid feature row for a late-delivery model. Now the system must produce millions of those rows for training and fresh values for live orders. That requires two paths: a historical batch path and a low-latency streaming or online path.

Both paths must answer the same question: what was known about this shipment at the prediction timestamp? If the batch job sees events that happened later, training receives a cleaner view of history than production ever can.

[Figure: Point-in-time feature pipeline showing shipment events feeding historical batch rows and online feature updates under the same feature definition and freshness rule.]
Historical training rows and live online rows can use different infrastructure, but they must share one time-aware feature definition.

Event Time Changes the Join

Consider one order:

| Time | Event |
| --- | --- |
| May 1 08:00 | parcel leaves warehouse |
| May 1 10:00 | prediction requested |
| May 1 13:00 | carrier records weather disruption |
| May 2 17:00 | parcel delivered late |

For the 10:00 prediction, the disruption at 13:00 can't become a feature. A normal database join that selects the latest row today will accidentally attach it during retraining. A point-in-time join selects only records with event_time <= prediction_time.

Feast documents point-in-time correct retrieval for historical training features so later values aren't joined onto earlier entities.[1] This is broader than a particular feature-store product: any batch pipeline needs this temporal rule.
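The difference between the two joins can be sketched in plain Python. This is a minimal illustration rather than Feast's API: the `events` list, the field names, and both helper functions are hypothetical, and the year is assumed only so the timestamps parse.

```python
from datetime import datetime

# Hypothetical carrier events for one order, mirroring the timeline above.
events = [
    {"order": "O-204", "event_time": datetime(2026, 5, 1, 8, 0), "status": "departed"},
    {"order": "O-204", "event_time": datetime(2026, 5, 1, 13, 0), "status": "weather_delay"},
]
prediction_time = datetime(2026, 5, 1, 10, 0)


def latest_join(order):
    """Leaky: takes the newest row in storage, ignoring the prediction time."""
    rows = [e for e in events if e["order"] == order]
    return max(rows, key=lambda e: e["event_time"])["status"]


def point_in_time_join(order, at):
    """As-of join: only rows with event_time <= prediction time are eligible."""
    rows = [e for e in events if e["order"] == order and e["event_time"] <= at]
    return max(rows, key=lambda e: e["event_time"])["status"] if rows else None


print(latest_join("O-204"))                          # weather_delay (leaks the 13:00 event)
print(point_in_time_join("O-204", prediction_time))  # departed
```

The leaky version returns a value the 10:00 request could never have seen, which is exactly the row a retraining job would pick up by accident.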

[Figure: Event log (event timestamps), feature definition v1 (as-of join), batch snapshot (historical training rows), and online updater (fresh known values).]

Compute Historical Rows Correctly

The code below creates hours_since_last_scan for two prediction requests. The scan at 13:00 exists in storage by retraining time, but it can't affect a request scored at 10:00.

point-in-time-features.py

```python
from datetime import datetime


def dt(value):
    """Parse an ISO-8601 timestamp string."""
    return datetime.fromisoformat(value)


scans = [
    {"order": "O-204", "time": dt("2026-05-01T08:00:00"), "status": "departed"},
    {"order": "O-204", "time": dt("2026-05-01T13:00:00"), "status": "weather_delay"},
    {"order": "O-204", "time": dt("2026-05-02T17:00:00"), "status": "delivered"},
]
requests = [
    {"order": "O-204", "at": dt("2026-05-01T10:00:00")},
    {"order": "O-204", "at": dt("2026-05-01T16:00:00")},
]


def as_of_scan(order, at):
    """Return (status, whole hours since scan) for the latest scan known at `at`."""
    # Point-in-time filter: ignore scans that happened after the prediction time.
    visible = [s for s in scans if s["order"] == order and s["time"] <= at]
    if not visible:
        return None, None  # no scan was known yet for this order
    latest = max(visible, key=lambda s: s["time"])
    return latest["status"], int((at - latest["time"]).total_seconds() / 3600)


for request in requests:
    status, age = as_of_scan(request["order"], request["at"])
    print(request["at"].strftime("%H:%M"), status, age)
```

Output

```
10:00 departed 2
16:00 weather_delay 3
```

The same event is correctly absent from the first row and present in the second. A unit test should preserve that behavior permanently: backfilling storage mustn't rewrite what a model could have known in the past.
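One way to write such a test is sketched below. It assumes a variant of `as_of_scan` that takes the scan list as a parameter so the test can simulate a backfill. The test name and the in-memory list standing in for storage are illustrative only.

```python
from datetime import datetime


def as_of_scan(scans, order, at):
    """Return (status, whole hours since scan) for the latest scan known at `at`."""
    visible = [s for s in scans if s["order"] == order and s["time"] <= at]
    if not visible:
        return None, None
    latest = max(visible, key=lambda s: s["time"])
    return latest["status"], int((at - latest["time"]).total_seconds() / 3600)


def test_backfill_does_not_rewrite_history():
    request_at = datetime(2026, 5, 1, 10, 0)
    scans = [{"order": "O-204", "time": datetime(2026, 5, 1, 8, 0), "status": "departed"}]
    before = as_of_scan(scans, "O-204", request_at)

    # Simulate a backfill that lands a later event in storage.
    scans.append(
        {"order": "O-204", "time": datetime(2026, 5, 1, 13, 0), "status": "weather_delay"}
    )
    after = as_of_scan(scans, "O-204", request_at)

    # The 10:00 feature row must be identical before and after the backfill.
    assert before == after == ("departed", 2)


test_backfill_does_not_rewrite_history()
```

Run under a test runner or as a plain script, the assertion fails as soon as a refactor drops the `s["time"] <= at` filter.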

Batch Builds History, Online Serves Now

A batch job is appropriate for training snapshots, daily aggregate features, and recomputation after a bug fix. It can scan complete partitions and write immutable artifacts. An online path is appropriate for hours_since_last_scan or current warehouse queue length when a customer asks for an ETA right now.

| Concern | Batch path | Online path |
| --- | --- | --- |
| purpose | train and evaluate models | score current order |
| latency | minutes or hours | milliseconds |
| correction | backfill a versioned snapshot | process late event or invalidate cache |
| evidence | snapshot manifest and statistics | freshness timestamp and trace |

Different stores are acceptable. Different definitions are not. If batch uses a seven-day backlog mean while online serves a one-hour queue count under the same column name, a validation score can't predict live behavior.
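A simple way to keep the definition shared is to route both paths through one function and compare them on sampled timestamps. The sketch below assumes a hypothetical `backlog_count_1h` feature over queue events. The function names, event shape, and window are illustrative, not part of any specific feature store.

```python
from datetime import datetime, timedelta


def backlog_count_1h(events, at):
    """Single feature definition: events in the hour ending at `at`."""
    window_start = at - timedelta(hours=1)
    return sum(1 for e in events if window_start < e["time"] <= at)


def batch_rows(events, prediction_times):
    """Batch path: recompute the feature for many historical timestamps."""
    return {at: backlog_count_1h(events, at) for at in prediction_times}


def online_value(buffered_events, now):
    """Online path: the same function applied to the stream buffer."""
    return backlog_count_1h(buffered_events, now)


base = datetime(2026, 5, 1, 9, 0)
events = [{"time": base + timedelta(minutes=m)} for m in (5, 20, 50, 70)]
at = base + timedelta(minutes=60)  # 10:00

# Parity sample: both paths must agree for the same timestamp.
offline = batch_rows(events, [at])[at]
online = online_value(events, at)
assert offline == online, "parity sample failed: definitions diverged"
print(offline)  # 3
```

In a real system the two paths read from different stores, so the parity check would compare logged online values against a batch replay for the same timestamps.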

Handle Late Events and Freshness

Carrier events arrive late. An event recorded at 11:00 might describe a scan that happened at 09:30. The pipeline should keep both event_time and ingested_at. Historical training can rebuild the row according to event time, while live scoring at 10:00 can only use events ingested before 10:00 unless the product explicitly allows retroactive correction.
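The two visibility rules can be written side by side. This is a minimal sketch: the event records, the `hub_scan` status, and the function names are hypothetical. It treats "ingested before 10:00" as a strict inequality.

```python
from datetime import datetime

# Hypothetical scans: the second happened at 09:30 but was ingested at 11:00.
events = [
    {"status": "departed",
     "event_time": datetime(2026, 5, 1, 8, 0),
     "ingested_at": datetime(2026, 5, 1, 8, 5)},
    {"status": "hub_scan",
     "event_time": datetime(2026, 5, 1, 9, 30),
     "ingested_at": datetime(2026, 5, 1, 11, 0)},
]


def visible_by_event_time(events, at):
    """Historical rebuild: everything that had happened by `at`."""
    return [e["status"] for e in events if e["event_time"] <= at]


def visible_at_serving(events, at):
    """Live scoring: only events that had also been ingested before `at`."""
    return [
        e["status"]
        for e in events
        if e["event_time"] <= at and e["ingested_at"] < at
    ]


at = datetime(2026, 5, 1, 10, 0)
print(visible_by_event_time(events, at))  # ['departed', 'hub_scan']
print(visible_at_serving(events, at))     # ['departed']
```

Comparing the two lists for sampled requests shows how many late events a historical rebuild would see that live scoring could not.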

Freshness requires a serving policy:

| Condition | Serving decision |
| --- | --- |
| backlog feature updated within 15 minutes | score normally |
| backlog feature 45 minutes stale | score with fallback and log degraded mode |
| no carrier scans for expected lane | abstain from narrow ETA promise |

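The policy table could be encoded as a small decision function. This is a sketch under stated assumptions: the table does not say what happens between 15 and 45 minutes, so the code treats anything older than 15 minutes as degraded, and it checks the missing-scan case first. The names `serving_decision` and `FRESH_LIMIT` are hypothetical.

```python
import logging
from datetime import datetime, timedelta

logger = logging.getLogger("feature_freshness")

# From the table: a backlog feature updated within 15 minutes is fresh.
FRESH_LIMIT = timedelta(minutes=15)


def serving_decision(now, backlog_updated_at, has_carrier_scans):
    """Map feature freshness to one of: 'score', 'fallback', 'abstain'."""
    if not has_carrier_scans:
        # No scans for the expected lane: avoid a narrow ETA promise.
        return "abstain"
    age = now - backlog_updated_at
    if age <= FRESH_LIMIT:
        return "score"
    # Assumption: any age beyond the fresh limit counts as degraded mode.
    logger.warning("degraded mode: backlog feature is %s old", age)
    return "fallback"


now = datetime(2026, 5, 1, 10, 0)
print(serving_decision(now, now - timedelta(minutes=10), True))   # score
print(serving_decision(now, now - timedelta(minutes=45), True))   # fallback
print(serving_decision(now, now - timedelta(minutes=10), False))  # abstain
```

Keeping the thresholds in named constants makes the policy reviewable alongside the feature definition instead of being buried in serving code.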
Google Cloud's production ML guidance separates automated data validation, model validation, pipeline triggers, metadata, and online validation before broad promotion.[2] Feature freshness belongs in that first validation boundary: no model fixes a missing or temporally invalid input.

Mastery Check

Evaluation rubric

| Evidence | A production-ready answer demonstrates |
| --- | --- |
| temporal correctness | constructs training rows with as-of joins and explicit event-time boundaries |
| freshness behavior | distinguishes late data, stale online features, and safe fallback actions |
| operational replay | explains versioned backfills without silently rewriting historical meaning |

Common Pitfalls

| Symptom | Cause | Fix |
| --- | --- | --- |
| Training gets better after a backfill but serving does not | latest-value leakage | use as-of joins |
| Online results disagree with offline replay | transformations diverged | publish one feature definition and parity samples |
| Reliable model emits bad ETAs during upstream lag | freshness wasn't part of serving policy | trace feature age and fall back to a safer response |
Next Step

You can now generate point-in-time correct feature rows. Next you will train a strong tabular baseline and decide when its validation evidence earns deployment.

References

[1] Feast Contributors. Feast: Production Feature Store for Machine Learning. 2024.

[2] Google Cloud. MLOps: Continuous Delivery and Automation Pipelines in Machine Learning. 2026. Official documentation.