LeetLLM
© 2026 LeetLLM. All rights reserved.

Medium · Evaluation & Benchmarks

Forecasting and Anomaly Detection

Forecast parcel demand with time-aware evaluation and turn large residuals into reviewable operational alerts.


A ranking model selects which products appear now. Warehouse planners face a future question: how many parcels will leave a fulfillment center tomorrow, and which observed counts signal an unexpected disruption?

Forecasting differs from ordinary random-split prediction because observations are ordered. Tomorrow isn't exchangeable with last month. A useful evaluation must train on the past and predict a later window, repeatedly if possible.

[Figure: Warehouse volume forecast timeline showing past daily parcel counts, a weekly seasonal baseline, future forecast interval, and an observed spike sent to operations review.]
A forecast uses only earlier counts to predict later capacity; anomaly alerts compare new observations with an expected range rather than with intuition.

Establish a Seasonal Baseline

Suppose Saturday parcel volume is normally lower than weekday volume. Predicting tomorrow from yesterday alone will miss at every weekend boundary: Friday's count overshoots Saturday, and Sunday's count undershoots Monday. A simple seasonal baseline instead predicts each day from the same weekday last week.

| Day | Last week parcels | This week parcels | Weekly-lag error |
|---|---|---|---|
| Monday | 100 | 104 | 4 |
| Tuesday | 112 | 110 | -2 |
| Wednesday | 115 | 119 | 4 |
| Thursday | 118 | 116 | -2 |
| Friday | 132 | 160 | 28 |

The Friday miss might be an anomaly, or it might be a promotion planned outside the dataset. A monitoring system shouldn't announce a warehouse incident from one residual alone; it should surface the deviation with context and a review path.
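
To make the case for the weekly lag concrete, the sketch below scores two naive baselines on the same week: "same as yesterday" and "same weekday last week". It assumes the two weeks run Monday through Sunday and borrows the weekend counts (82, 76, 84, 78) from the article's script; the helper name `mae` is illustrative only.

```python
# Two weeks of daily parcel counts, Monday..Sunday (weekend values as in the article's script).
last_week = [100, 112, 115, 118, 132, 82, 76]
this_week = [104, 110, 119, 116, 160, 84, 78]
series = last_week + this_week


def mae(actuals, forecasts):
    """Mean absolute error in parcels."""
    return sum(abs(a - f) for a, f in zip(actuals, forecasts)) / len(actuals)


season = 7
targets = range(season, len(series))  # indices of this week's days

actuals = [series[t] for t in targets]
yesterday_forecasts = [series[t - 1] for t in targets]    # previous day's count
weekly_forecasts = [series[t - season] for t in targets]  # same weekday last week

mae_yesterday = mae(actuals, yesterday_forecasts)
mae_weekly = mae(actuals, weekly_forecasts)

print("yesterday baseline MAE:", round(mae_yesterday, 1))  # 24.6
print("weekly-lag baseline MAE:", round(mae_weekly, 1))    # 6.3
```

On this small sample the yesterday baseline misses most on Saturday (76 parcels), where it carries Friday's spike into a low-volume day. Two weeks are far too little data to rank the baselines in general; the comparison only illustrates the mechanism.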

Hyndman and Athanasopoulos emphasize that forecast accuracy must be evaluated on genuinely future observations, with time-series cross-validation expanding or rolling the training origin forward.[1] That rule prevents tomorrow's demand from influencing the model that claims it could have predicted tomorrow.

[Diagram: Daily parcel counts (ordered history), Seasonal baseline (same weekday), Future forecast (expected range), and Observed volume.]

Calculate a Forecast and Alert Threshold

Mean absolute error (MAE) averages absolute forecast mistakes:

$$\text{MAE} = \frac{1}{n}\sum_{t=1}^{n}|y_t - \hat{y}_t|$$

Here $y_t$ is observed volume, $\hat{y}_t$ is its forecast, and $n$ is the number of forecast days. MAE remains in parcels, which makes it understandable to operations.

weekly-lag-forecast-alert.py

```python
# Daily parcel counts, Monday..Sunday
history = [100, 112, 115, 118, 132, 82, 76]   # last week
observed = [104, 110, 119, 116, 160, 84, 78]  # this week

forecasts = history[:]  # same weekday last week
errors = [actual - forecast for actual, forecast in zip(observed, forecasts)]
mae = sum(abs(error) for error in errors) / len(errors)

# Fixed illustrative threshold, in parcels
alert_limit = 20
alerts = [
    (day, actual, forecast, error)
    for day, (actual, forecast, error) in enumerate(zip(observed, forecasts, errors), start=1)
    if abs(error) >= alert_limit
]

print("MAE parcels:", round(mae, 1))
for day, actual, forecast, error in alerts:
    print(f"alert day={day} observed={actual} forecast={forecast} residual={error:+d}")
```

Output

```
MAE parcels: 6.3
alert day=5 observed=160 forecast=132 residual=+28
```
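
Worked by hand, the seven residuals from the script (4, -2, 4, -2, 28, 2, 2) give the same figure:

$$\text{MAE} = \frac{|4| + |-2| + |4| + |-2| + |28| + |2| + |2|}{7} = \frac{44}{7} \approx 6.3$$

The Friday residual contributes 28 of the 44 parcels of total absolute error. Without it, the remaining six days average $16 / 6 \approx 2.7$ parcels, which shows how strongly one unusual day can move MAE on a short window.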

The example gives a baseline and one reviewable alert. In a real system, an alert threshold should reflect historical residual variation and capacity cost. A prediction interval or empirically measured quantile of prior residuals is more defensible than picking 20 without evidence.
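
One way to ground the threshold in evidence is to take a high quantile of the absolute residuals seen in earlier backtests. The sketch below is a minimal illustration under stated assumptions: `prior_residuals` is made-up data standing in for real backtest history, and `empirical_quantile` is a hypothetical helper using a simple nearest-rank rule, not a library function.

```python
import math


def empirical_quantile(values, percent):
    """Nearest-rank quantile: smallest value with at least `percent`% of values at or below it."""
    ordered = sorted(values)
    rank = math.ceil(percent * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical residuals (observed minus forecast) from earlier, ordinary weeks.
prior_residuals = [4, -2, 4, -2, 3, 2, 2, -5, 6, -1, 1, -3, 7, 2, -4, 5, -6, 3, 0, -2]
abs_residuals = [abs(r) for r in prior_residuals]

# Alert when a new residual exceeds what 95% of past residuals stayed within.
alert_limit = empirical_quantile(abs_residuals, 95)

# Residuals for the current week, as computed in the article's script.
new_residuals = [4, -2, 4, -2, 28, 2, 2]
flagged_days = [
    day for day, residual in enumerate(new_residuals, start=1)
    if abs(residual) > alert_limit
]

print("alert limit (parcels):", alert_limit)
print("flagged days:", flagged_days)
```

By construction, a 95% quantile flags roughly one ordinary day in twenty, so the percentile should be chosen with the review capacity and the cost of a missed disruption in mind.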

Split by Time, Then Backtest

A single future week is fragile evidence. Run several rolling-origin evaluations:

| Training history | Validation window | Question |
|---|---|---|
| January | first week of February | works after initial launch? |
| January through February | first week of March | adapts after more data? |
| January through March | first week of April | survives seasonal shift? |
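
The expanding-window schedule in the table can be sketched as a small loop. This is an illustration rather than a production backtester: the series is synthetic (the article's weekly pattern plus an assumed 2 parcels of growth per week), and `rolling_origin_folds` and `seasonal_naive` are hypothetical helper names.

```python
def rolling_origin_folds(n_obs, initial_train, horizon, step):
    """Yield (train_end, test_end) index pairs; training always ends before the test window."""
    train_end = initial_train
    while train_end + horizon <= n_obs:
        yield train_end, train_end + horizon
        train_end += step


def seasonal_naive(train, horizon, season=7):
    """Forecast each future day from the same weekday in the last training week."""
    last_season = train[-season:]
    return [last_season[i % season] for i in range(horizon)]


def mae(actuals, forecasts):
    return sum(abs(a - f) for a, f in zip(actuals, forecasts)) / len(actuals)


# Synthetic data: 8 weeks of the weekly pattern with +2 parcels per week.
weekly_pattern = [100, 112, 115, 118, 132, 82, 76]
series = [count + 2 * week for week in range(8) for count in weekly_pattern]

fold_maes = []
for train_end, test_end in rolling_origin_folds(len(series), initial_train=28, horizon=7, step=7):
    train, test = series[:train_end], series[train_end:test_end]
    forecasts = seasonal_naive(train, horizon=len(test))
    fold_maes.append(mae(test, forecasts))
    print(f"train=[0:{train_end}) test=[{train_end}:{test_end}) MAE={fold_maes[-1]:.1f}")
```

Each fold trains only on indices before `train_end`, so no validation day can influence its own forecast. Reporting the per-fold errors, not only their mean, shows whether accuracy drifts as the origin moves forward.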

Record MAE by fulfillment center, service tier, weekday, and promotion status. A national average can hide a warehouse that routinely underforecasts peak demand. For inventory decisions, also consider asymmetry: underpredicting demand may cost more than reserving excess capacity.
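
A short sketch of both ideas, segment-level MAE and asymmetric cost, follows. The rows, warehouse names, and per-parcel cost weights are invented for illustration; real weights would come from staffing and capacity costs.

```python
from collections import defaultdict

# Hypothetical backtest rows: (warehouse, observed, forecast)
rows = [
    ("north", 120, 110),
    ("north", 130, 115),
    ("south", 80, 82),
    ("south", 75, 76),
]


def mae(pairs):
    return sum(abs(obs - fc) for obs, fc in pairs) / len(pairs)


def asymmetric_cost(pairs, under_cost=3.0, over_cost=1.0):
    """Charge more per parcel when demand was underpredicted (assumed weights)."""
    total = 0.0
    for obs, fc in pairs:
        gap = obs - fc
        total += under_cost * gap if gap > 0 else over_cost * -gap
    return total


by_warehouse = defaultdict(list)
for warehouse, obs, fc in rows:
    by_warehouse[warehouse].append((obs, fc))

overall_mae = mae([(obs, fc) for _, obs, fc in rows])
segment_mae = {w: mae(pairs) for w, pairs in by_warehouse.items()}
segment_cost = {w: asymmetric_cost(pairs) for w, pairs in by_warehouse.items()}

print("overall MAE:", overall_mae)  # 7.0
print("MAE by warehouse:", segment_mae)
print("asymmetric cost by warehouse:", segment_cost)
```

Here the overall MAE of 7.0 sits between a well-forecast warehouse (1.5) and one that consistently underforecasts (12.5), and the asymmetric cost widens that gap further.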

The target must match the decision. Forecasting shipped parcel count helps staffing; forecasting late parcels helps customer notification; detecting an unusual scan-drop rate helps incident response. Don't combine those labels into one ambiguous "operations score."

Anomalies Need Explanation and Ownership

An anomaly is a residual large enough to justify inspection, not proof that the model or warehouse failed. The Friday spike may correspond to a campaign, a bulk seller import, duplicated event ingestion, or genuine demand growth.

Log an alert artifact:

| Field | Purpose |
|---|---|
| series and warehouse | route investigation |
| forecast version and training cutoff | reproduce expectation |
| observed value and residual | quantify deviation |
| interval or threshold version | explain alert policy |
| known promotion flag | reduce false alarms |
| owner and resolution | teach future models and policies |
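
The fields above could be captured in a small record type. The sketch below is one possible shape, not a prescribed schema: the class name `ForecastAlert`, the field names, and every example value (warehouse ID, version strings, cutoff date) are hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ForecastAlert:
    series: str
    warehouse: str
    forecast_version: str
    training_cutoff: str      # ISO date of the last training observation
    observed: int
    forecast: int
    threshold_version: str
    known_promotion: bool
    owner: str = "unassigned"
    resolution: str = "open"

    @property
    def residual(self) -> int:
        return self.observed - self.forecast


# All values below are placeholders for illustration.
alert = ForecastAlert(
    series="daily_parcels",
    warehouse="FC-EXAMPLE",
    forecast_version="weekly-lag-v1",
    training_cutoff="2026-01-04",
    observed=160,
    forecast=132,
    threshold_version="fixed-20-parcels",
    known_promotion=False,
)

# asdict() skips properties, so the derived residual is added explicitly.
record = {**asdict(alert), "residual": alert.residual}
print(json.dumps(record, indent=2))
```

Storing the forecast and threshold versions alongside the residual lets a reviewer reproduce why the alert fired even after the model or policy has changed.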

A model that silently swallows anomalies is hard to operate. A model that pages on every seasonal pattern is equally unhelpful. Forecast evaluation and alert evaluation are related but separate: one measures prediction error, the other measures whether the alert led to useful action.
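
Alert evaluation can start as simply as counting how often a resolved alert led to action. The sketch below assumes a hypothetical set of resolution labels and an assumed definition of "actionable"; both would need to be agreed with the operations team.

```python
from collections import Counter

# Hypothetical resolutions recorded by alert owners.
resolutions = [
    "capacity_added",
    "known_promotion",
    "duplicate_ingestion",
    "known_promotion",
]

# Assumption: these outcomes mean the alert led to useful action.
ACTIONABLE = {"capacity_added", "duplicate_ingestion"}

resolution_counts = Counter(resolutions)
actionable_count = sum(1 for r in resolutions if r in ACTIONABLE)
alert_precision = actionable_count / len(resolutions)

print("resolution counts:", dict(resolution_counts))
print("alert precision:", alert_precision)  # 0.5
```

A low precision driven by one repeated label, such as `known_promotion` here, points at a missing input (the promotion flag) and not at a forecasting error.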

Mastery Check

Evaluation rubric

| Evidence | A production-ready answer demonstrates |
|---|---|
| temporal evaluation | uses rolling or future-window backtests rather than shuffled rows |
| forecast use | maps prediction intervals and error costs to inventory or capacity decisions |
| alert policy | detects actionable residual anomalies with escalation and review controls |

Common Pitfalls

| Symptom | Cause | Fix |
|---|---|---|
| Test results collapse at launch | random split leaked future behavior | use rolling-origin validation |
| Alerts fire every Monday | weekly seasonality absent | add seasonal baseline first |
| Operations ignores alerts | no ownership or context | log threshold, residual, and resolution |

Next Step
Continue to Monitoring Predictive Models

You can now measure forecasts and surface surprising observations. Next you will monitor deployed models for input shift, delayed outcome degradation, and retraining decisions.

References

[1] Hyndman, R. J. & Athanasopoulos, G. (2021). Forecasting: Principles and Practice, Third Edition.