LeetLLM

Your go-to resource for mastering AI & LLM systems.

© 2026 LeetLLM. All rights reserved.

All 155 articles free · Login only tracks progress

Learn LLM Engineering

Master the concepts that power modern AI systems. The curriculum runs from foundational transformer architecture to production system design and is structured to take you from basics to expert level.

155 topics · 14 modules · 60h total content · Open curriculum

Step-by-step roadmap

Follow these modules in order. Each step builds directly on the previous one.

  1. Step 1 of 14

    🛠️ Computing Foundations

    NumPy shapes, accelerator basics, data structures, SQL, and algorithmic cost for practical ML systems

    6 topics · ~1h
  2. Step 2 of 14

    📊 Math & Statistics

    Probability, statistics, distributions, uncertainty, hypothesis testing, bootstrap, and pass@k

    8 topics · ~2h
  3. Step 3 of 14

    📚 Preparation & Prerequisites

    Background knowledge for readers new to ML. Skip ahead if you already know neural networks and how models train.

    13 topics · ~3h
  4. Step 4 of 14

    🧮 ML Algorithms & Evaluation

    Regression, validation, PCA, retrieval, decoding, experiments, PyTorch loops, and dataset quality

    11 topics · ~3h
  5. Step 5 of 14

    📦 Production ML Systems

    Feature pipelines, tabular prediction, ranking, forecasting, monitoring, and continuous training for production ML

    6 topics · ~1h
  6. Step 6 of 14

    🧪 Core LLM Foundations

    First working mental models: tokenization, embeddings, evaluation basics, file ingestion, chunking, and instruction-tuned chat

    8 topics · ~2h
  7. Step 7 of 14

    🧰 Applied LLM Engineering

    Practical medium-depth patterns for reasoning, tool use, agents, RAG, evaluation, observability, data, caching, cost, and first product design

    23 topics · ~6h
  8. Step 8 of 14

    🎓 Portfolio Capstones

    Shippable predictive ML and LLM products: ETA, ranking, forecasting, vision, pipelines, QA, evaluation, classifiers, and agents

    9 topics · ~2h
  9. Step 9 of 14

    🧠 Transformer Deep Dives

    Harder internals: sentence embeddings, vector scoring, attention, positions, normalization, and decoding

    8 topics · ~4h
  10. Step 10 of 14

    🧬 Advanced Training & Adaptation

    Scaling laws, distributed training, fine-tuning, alignment, rewards, distillation, merging, and prompt optimization

    16 topics · ~9h
  11. Step 11 of 14

    🤖 Advanced Agents & Retrieval

    Production-grade retrieval and agent systems: vector indexes, GraphRAG, security, orchestration, memory, HITL, and failure recovery

    14 topics · ~9h
  12. Step 12 of 14

    ⚡ Inference & Production Scale

    Serving architecture, KV cache mechanics, batching, quantization, long context, advanced architectures, deployment, and experiments

    20 topics · ~11h
  13. Step 13 of 14

    🏗️ System Design Capstones

    End-to-end hard system design breakdowns for real AI products

    9 topics · ~6h
  14. Step 14 of 14

    🎤 AI Lab Interviewing

    Final interview practice for frontier AI labs: Python systems, design, behavioral evidence, and technical presentation

    4 topics · ~1h

🛠️ Computing Foundations

NumPy shapes, accelerator basics, data structures, SQL, and algorithmic cost for practical ML systems

1

NumPy and Tensor Shapes

A beginner-focused NumPy chapter that teaches axis naming, indexing, broadcasting, reductions, reshape vs transpose, attention score shapes, keepdims safety, and shape assertions.

Easy · 17m
2

CUDA for ML Training

Build beginner-first CUDA intuition for model training: CPU vs GPU roles, host-device copies, asynchronous execution, PyTorch device placement, and first-line debugging of OOM and performance issues.

Easy · 14m
3

MPS & Metal for ML on Mac

Build beginner-first intuition for training on Apple silicon: what Metal and MPS are, why unified memory changes the CUDA mental model, how PyTorch exposes the `mps` device, how to check availability, where CPU fallback appears, and how synchronization and memory pressure still shape performance.

Easy · 13m
4

Data Structures for AI

A beginner-first data-structures chapter that starts with a list scan, then teaches inverted indexes, heaps, queues, and caches through one support-search story.

Easy · 15m
5

SQL and Data Modeling

Turn an in-memory support retriever into durable SQL tables. Create rows and keys, query with parameters and joins, enforce permissions, use transactions, inspect indexes, and see where pgvector fits.

Easy · 12m
6

Algorithms for ML Engineers

Learn to count retrieval work, express growth with Big-O, avoid wasteful selection and pairwise loops, and enforce a latency budget with runnable Python.

Easy · 11m

📊 Math & Statistics

Probability, statistics, distributions, uncertainty, hypothesis testing, bootstrap, and pass@k

1

Gradients and Backprop

Learn why training works by nudging one delivery-time weight, tracing and summing chain-rule paths, checking gradients, and confirming them with PyTorch.

Easy · 13m
2

Vectors, Matrices & Tensors

Turn one gradient vector into batches of model inputs while learning dot products, matrix transforms, tensor axes, and shape debugging.

Easy · 12m
3

Linear Algebra for ML

Find hidden directions in a support-ticket matrix with SVD, then use rank, PCA, truncation, and condition numbers without losing sight of what the numbers mean.

Easy · 12m
4

Adam, Momentum, Schedulers

Trace SGD, momentum, Adam, AdamW, schedules, and gradient clipping on one uneven loss surface. Learn what each optimizer buffer measures and how to validate a training choice.

Easy · 15m
5

Probability for Machine Learning

A beginner-first probability article that teaches events, priors, conditional probability, independence, Bayes rule, and base-rate mistakes through one e-commerce order-risk detector story.

Easy · 19m
6

Statistics and Uncertainty

Estimate fraud risk in a flagged review queue from finite labels, using bootstrap intuition, score intervals, sampling bias checks, and calibrated reporting.

Easy · 12m
7

Distributions and Sampling

Model an e-commerce support agent with binary outcomes, request intents, tool-call counts, and tail latency, then challenge each simulation before trusting it.

Easy · 12m
8

Hypothesis Tests, Intervals, and pass@k

Compare an order-operations coding assistant with paired evidence, uncertainty for lift, and pass@k under a fixed sampling budget.

Easy · 11m

📚 Preparation & Prerequisites

Background knowledge for readers new to ML. Skip ahead if you already know neural networks and how models train.

1

Neural Networks from Scratch

Trace a delivery-risk network from one neuron to a batched NumPy forward pass, then diagnose activation, shape, scale, and numerical-stability failures.

Easy · 11m
2

CNNs from Scratch

Trace a CNN over a damaged-package photo patch: shared kernels, feature-map shapes, pooling, padding failures, and a NumPy-to-PyTorch forward pass.

Easy · 10m
3

Training & Backpropagation

Follow a shipment-delay model through prediction, loss, gradients, parameter updates, scalar autograd, mini-batches, validation checks, and PyTorch.

Easy · 11m
4

Softmax, Cross-Entropy & Optimization

Turn raw action scores into stable probabilities and a useful learning signal, then apply the same loss to next-token predictions.

Easy · 12m
5

RNNs, LSTMs, GRUs, and Sequence Modeling

Trace an RNN over ordered events, see why gradients fade or grow, and use LSTM and GRU gates to control memory.

Easy · 11m
6

Autoencoders and VAEs

Compress a claim-photo patch, turn its latent code into a sampleable distribution, and implement VAE loss and training.

Easy · 12m
7

The Transformer Architecture End-to-End

Trace a support reply through masked attention, a decoder block, and next-token logits with readable NumPy and PyTorch code.

Easy · 12m
8

Language Modeling & Next Tokens

Learn how next-token prediction becomes a trainable language model, from bigram counts and neural n-grams to causal Transformer generation and KV-cache serving.

Easy · 24m
9

From GPT to Modern LLMs

Trace how decoder-only models grew into modern LLMs, then inspect scaling, instruction tuning, open weights, MoE, and serving tradeoffs with runnable examples.

Easy · 21m
10

Prompt Engineering Fundamentals

Build and test grounded prompts with clear roles, few-shot examples, structured outputs, evidence checks, and failure-focused evaluation.

Easy · 17m
11

Calling LLM APIs in Production

Turn a grounded prompt into a reliable API boundary with server-side secrets, typed results, bounded retries, safe actions, and useful telemetry.

Easy · 15m
12

First AI App End-to-End

Ship one traceable return-decision workflow: validated input, model boundary, stored status, clear UI states, failure tests, and deploy checks.

Easy · 12m
13

The LLM Lifecycle

Follow one return-decision assistant from base-model training to post-training, retrieval, serving, evaluation, and the fix chosen after a real failure.

Easy · 13m

🧮 ML Algorithms & Evaluation

Regression, validation, PCA, retrieval, decoding, experiments, PyTorch loops, and dataset quality

1

Linear Regression from Scratch

Fit return-assistant latency by hand, implement least squares and gradient descent in NumPy, then test failure cases and held-out behavior.

Medium · 20m
2

Logistic Regression and Metrics

Route damaged-return requests with logistic regression from scratch: derive sigmoid and log loss, fit NumPy weights, select a cost-aware threshold on validation data, audit ranking and calibration, then compare with scikit-learn.

Medium · 24m
3

Decision Trees, Forests, and Boosting

Model damaged-return review with decision trees from scratch: compute impurity, test a non-perfect stump on held-out cases, compare forests and boosting, and audit feature explanations.

Medium · 17m
4

Reinforcement Learning Basics

Learn reinforcement learning through the damaged-return workflow from earlier lessons. Define an MDP, compute discounted returns and Bellman backups, implement value iteration and Q-learning, model abandonment risk, and connect policy gradients to LLM post-training.

Medium · 13m
5

Validation and Leakage

Make model and policy claims honestly: define the decision moment, split return episodes by time and customer, expose feature and preprocessing leakage, and audit LLM evaluation contamination.

Medium · 15m
6

Clustering and PCA

Inspect unlabeled support-message embeddings with k-means and PCA, then stress-test whether apparent neighborhoods survive scale, metric, and compression choices.

Medium · 17m
7

Core Retrieval Algorithms

Build and evaluate the evidence-selection stage of a support assistant with BM25, dense similarity, rank fusion, reranking, and approximate search audits.

Medium · 16m
8

Decoding Algorithms

Turn retrieved evidence into controlled text by implementing stable softmax, sampling filters, constrained decoding, beam search, and reproducible generation audits.

Medium · 15m
9

Experiment Design and A/B Testing

Design a trustworthy online experiment for an AI support change: randomize customers, measure useful outcomes, quantify uncertainty, and reject false wins.

Medium · 19m
10

PyTorch Training Loops

Build a PyTorch classifier from raw logits through autograd, validation, and reloadable checkpoints.

Medium · 16m
11

Dataset Pipelines and Data Quality

Build versioned AI datasets with schema gates, grouped splits, contamination checks, and auditable receipts.

Medium · 16m

📦 Production ML Systems

Feature pipelines, tabular prediction, ranking, forecasting, monitoring, and continuous training for production ML

1

Feature Engineering for Production ML

Turn delivery events into stable prediction inputs while preventing leakage and training-serving mismatch.

Medium · 9m
2

Batch and Streaming Feature Pipelines

Build point-in-time delivery features from events and preserve the same meaning in online serving.

Medium · 9m
3

Gradient Boosted Trees in Production

Train a boosted ETA-risk baseline from tabular features, evaluate slices, and package deployment evidence.

Medium · 12m
4

Ranking and Recommendation Systems

Rank products for a shopper using candidate retrieval, relevance metrics, and feedback-loop safeguards.

Medium · 12m
5

Forecasting and Anomaly Detection

Forecast parcel demand with time-aware evaluation and turn large forecast errors into reviewable operational alerts.

Medium · 14m
6

Monitoring Predictive Models

Monitor predictive models from feature freshness through delayed labels, then gate retraining, promotion, and rollback.

Medium · 14m

🧪 Core LLM Foundations

First working mental models: tokenization, embeddings, evaluation basics, file ingestion, chunking, and instruction-tuned chat

1

The Bitter Lesson & Compute

Use Sutton's Bitter Lesson to compare rules, learning, and search through a measured support-ticket routing lab.

Medium · 13m
2

BPE, WordPiece, and SentencePiece

Build a small subword tokenizer, compare BPE, WordPiece, and SentencePiece, then audit token cost and Unicode behavior.

Medium · 17m
3

Static to Contextual Embeddings

Turn token IDs into vectors, learn what nearby usage captures, and see why a word such as "charge" needs sentence-dependent representations.

Medium · 13m
4

Perplexity & Model Evaluation

Compute perplexity from held-out token probabilities, compare models under a fixed protocol, normalize across tokenizers, and identify what perplexity (PPL) can't tell you.

Medium · 15m
5

File Ingestion for AI

Turn PDFs, scans, HTML, and Markdown into faithful evidence records with provenance and quality gates before retrieval.

Medium · 10m
6

Chunking Strategies

Turn clean documents into retrieval units that preserve answers, citations, and measurable search quality.

Medium · 12m
7

LLM Benchmarks & Limitations

Build an evaluation suite for a policy-answering LLM: score evidence use, understand public benchmark contracts, control judge bias, and make release decisions from private tests.

Medium · 19m
8

Instruction Tuning & Chat Templates

Teach a base language model to answer as an assistant: curate grounded SFT rows, serialize chat turns exactly, choose loss targets, pack safely, and detect serving-time template drift.

Medium · 16m

🧰 Applied LLM Engineering

Practical medium-depth patterns for reasoning, tool use, agents, RAG, evaluation, observability, data, caching, cost, and first product design

1

Dimensionality Reduction for Embeddings

Shrink and inspect embedding indexes without guessing: measure recall while testing PCA, projections, native shortening, and quantization.

Medium · 20m
2

CoT, ToT & Self-Consistency Prompting

Build and evaluate reasoning controllers: single traces, answer voting, and bounded tree search for multi-step LLM decisions.

Medium · 14m
3

Function Calling & Tool Use

Build a safe tool-calling runtime that validates model requests, executes controlled actions, feeds observations back, and evaluates complete workflows.

Medium · 13m
4

MCP & Tool Protocol Standards

Move from local function calls to reusable MCP capability servers by tracing one real session, building a working stdio integration, and enforcing trust boundaries.

Medium · 18m
5

Prompt Injection Defense

Build a prompt-injection-resistant agent boundary: quarantine untrusted tool content, validate typed action proposals, require approval, and measure unsafe side effects.

Medium · 15m
6

Responsible AI Governance

Turn a tool-bearing LLM workflow into auditable evidence: classify its use, own risks, version controls, preserve traces, and gate releases.

Medium · 18m
7

Data Labeling and Human Feedback

Build a trustworthy human-feedback data flywheel: redact traces, write rubrics, measure agreement, select useful examples, prevent leakage, and promote versioned datasets.

Medium · 16m
8

Evaluating AI Agents

Evaluate ShopFlow refund-agent runs by final state, observable trace, safety gates, cost, and repeatability, then map private tests to public benchmarks.

Medium · 18m
9

Production RAG Pipelines

Design a secure, traceable RAG service around versioned policy evidence, grounded answers, abstention, release gates, and latency budgets.

Medium · 16m
10

Hybrid Search: Dense + Sparse

Upgrade a permission-safe RAG retriever with BM25, semantic scores, rank fusion, and recall gates for exact codes and paraphrased policy questions.

Medium · 17m
11

Reranking and Cross-Encoders for RAG

Turn a permission-safe hybrid candidate list into precise context using cross-encoder reasoning, ordering metrics, latency gates, and traceable evidence selection.

Medium · 13m
12

RAG Evaluation for Reliable Answers

Evaluate a permission-safe RAG answer trace with context, claim, citation, failure-attribution, and release gates before automating softer judgments.

Medium · 14m
13

LLM-as-a-Judge Evaluation

Add calibrated soft judgments to a RAG evaluation trace without letting an LLM override deterministic evidence gates.

Medium · 16m
14

Bias & Fairness in LLMs

Build a matched-pair fairness audit for an LLM judge, measure routing gaps, and block release when evidence is too weak.

Medium · 16m
15

Hallucination Detection & Mitigation

Build a claim-level grounding gate for delivery updates that verifies evidence, catches confident fabrication, abstains safely, and records release traces.

Medium · 14m
16

LLM Observability & Monitoring

Turn claim-level answer traces into production metrics, actionable alerts, privacy-safe debugging records, and reproducible incident evidence.

Medium · 16m
17

Experiment Tracking with MLflow and W&B

Turn a live LLM regression into a reproducible candidate decision by logging inputs, metrics, artifacts, and promotion evidence.

Medium · 15m
18

Mixed Precision Training

Measure how FP16 and BF16 affect training range, update precision, memory, and release evidence before enabling faster low-precision compute.

Medium · 14m
19

Model Versioning & Deployment

Turn an evaluated LLM change into an immutable release bundle, promote it through measured traffic, and roll back without losing lineage.

Medium · 16m
20

Semantic Caching & Cost Optimization

Reuse stable policy answers across paraphrased questions without crossing release, access, or freshness boundaries; then prove the cache is both safe and worth serving.

Medium · 15m
21

LLM Cost Engineering & Token Economics

Build an auditable LLM cost ledger from usage traces, cache decisions, output contracts, offline batch work, and release budget gates.

Medium · 16m
22

Model Gateways, Routing, and Fallbacks

Turn an audited cost contract into a model gateway that preserves data, schema, review, and budget requirements across routing and fallback.

Medium · 16m
23

Design an Automated Support Agent

Assemble a stateful support agent that grounds replies, gates refund actions, preserves gateway policy, and hands difficult cases to humans.

Medium · 16m

🎓 Portfolio Capstones

Shippable predictive ML and LLM products: ETA, ranking, forecasting, vision, pipelines, QA, evaluation, classifiers, and agents

1

Capstone: Delivery ETA Prediction

Ship a delivery-delay warning service with as-of features, versioned policy gates, baseline evidence, and monitored fallback.

Hard · 11m
2

Capstone: Product Ranking

Ship a marketplace ranking candidate with eligible retrieval, separate recall and NDCG gates, replayable exposure rows, and an A/B-ready rollback receipt.

Hard · 11m
3

Capstone: Demand Forecasting

Ship a demand forecast and capacity-alert artifact with rolling backtests, alert review, and retraining policy.

Hard · 11m
4

Capstone: Image Damage Classifier

Ship a damaged-package photo triage service with quality gates, slice evaluation, serving bundles, and review monitoring.

Hard · 13m
5

Capstone: Production ML Pipeline

Assemble predictive ML artifacts into validated training, registry promotion, canary monitoring, and rollback.

Hard · 10m
6

Capstone: Document QA

Ship the policy-evidence service required by a support agent: approved ingestion, cited answers, abstention, and dashboard-ready eval rows.

Hard · 21m
7

Capstone: Eval Dashboard

Build a release dashboard for document QA that turns grounded-answer rows into slice gates, uncertainty checks, and inspectable decisions.

Hard · 21m
8

Capstone: Fine-Tuned Classifier

Train and gate a support-escalation encoder that hands safe intake decisions to a production agent.

Hard · 21m
9

Capstone: Production Agent

Assemble classifier intake, cited policy evidence, approval-gated actions, and episode release tests into a production agent.

Hard · 20m

🧠 Transformer Deep Dives

Harder internals: sentence embeddings, vector scoring, attention, positions, normalization, and decoding

1

Sentence Embeddings & Contrastive Loss

Learn how contrastive losses train sentence embeddings, why hard negatives matter, and how retrieval systems combine bi-encoders, rerankers, and dimension tradeoffs.

Hard · 36m
2

Embedding Similarity & Quantization

Learn vector scoring contracts, evaluate Matryoshka widths, and measure scalar, product, and binary quantization before shipping compressed retrieval.

Hard · 38m
3

Scaled Dot-Product Attention

Learn scaled dot-product attention from first principles, including Q/K/V routing, variance scaling, masks, multi-head shapes, KV-cache costs, and FlashAttention.

Hard · 38m
4

Vision Transformers and Image Encoders

Understand how Vision Transformers split images into patches, build visual tokens, train encoders, and connect to CLIP and multimodal LLMs.

Hard · 23m
5

Positional Encoding: RoPE & ALiBi

Understand why transformers need position information, how sinusoidal encodings work, how RoPE and ALiBi encode relative position, and why long-context extrapolation needs careful evaluation.

Hard · 33m
6

Layer Normalization: Pre-LN vs Post-LN

Understand LayerNorm mechanics, Pre-LN versus Post-LN placement, RMSNorm simplification, gradient stability, and hybrid normalization layouts for deep transformers.

Hard · 26m
7

Mechanistic Interpretability

Learn how sparse autoencoders decompose transformer activations into candidate interpretable features, support circuit tracing, and enable controlled activation-steering experiments.

Hard · 29m
8

Decoding Strategies: Greedy to Nucleus

Compare decoding strategies for text generation: greedy, beam search, top-k, nucleus (top-p), temperature, repetition controls, and newer variants like min-p.

Hard · 37m

🧬 Advanced Training & Adaptation

Scaling laws, distributed training, fine-tuning, alignment, rewards, distillation, merging, and prompt optimization

1

Scaling Laws & Compute-Optimal Training

Learn the empirical power laws governing LLM performance, from Kaplan's parameter-heavy frontier through Chinchilla-optimal ratios to modern inference-aware training strategies.

Hard · 38m
2

Pre-training Data at Scale

Understand how web-scale pre-training data is extracted, filtered, deduplicated, mixed, tokenized, and packed into training-ready shards, including decontamination, late-stage annealing, and synthetic-data tradeoffs.

Hard · 38m
3

Build GPT from Scratch Lab

Build and train a tiny GPT end to end on Shakespeare: tokenize with GPT-style subwords, remap active token IDs, run causal self-attention, track validation loss, save a checkpoint, and sample text.

Hard · 21m
4

Continued Pretraining for Domain Shift

Learn when to keep the causal language-modeling objective and continue pretraining on domain text instead of jumping straight to SFT, and how to evaluate the trade-off against forgetting, cost, and downstream gain.

Hard · 22m
5

Synthetic Data Pipelines

Build synthetic post-training data pipelines with Self-Instruct, Evol-Instruct, calibrated judge signals, verifiers, preference pairs, diversity checks, and decontamination.

Hard · 27m
6

Supervised Fine-Tuning Pipeline

Run supervised fine-tuning as a real training system: choose the learning objective before the update surface, verify response-token loss and packing, track the real batch budget, save resumable checkpoints, and gate export on held-out behavior.

Hard · 24m
7

Distributed Training: FSDP & ZeRO

Understand ZeRO stages, current FSDP1 vs FSDP2 guidance, and when native PyTorch or DeepSpeed is the right choice for large-model training.

Hard · 44m
8

LoRA & Parameter-Efficient Tuning

Understand the mathematics of Low-Rank Adaptation (LoRA), modern adapter targeting strategies, and the real memory tradeoffs compared to full fine-tuning and QLoRA.

Hard · 36m
9

Reward Modeling from Preference Data

Train reward models as a first-class post-training stage: validate chosen/rejected pairs and splits, fit a scalar reward head with Bradley-Terry loss, audit generalization, and decide when explicit rewards are worth the extra complexity.

Hard · 19m
10

RLHF & DPO Alignment

Understand the RLHF pipeline and DPO, including reward modeling, PPO mechanics, and the trade-offs between iterative reinforcement learning and direct preference optimization.

Hard · 37m
11

Constitutional AI & Red Teaming

Understand how Constitutional AI reduces reliance on repeated human preference labeling through AI critique and ranking, and how automated red teaming stress-tests those safeguards.

Hard · 33m
12

RLVR & Verifiable Rewards

Understand RLVR, a post-training approach that uses programmatic verification instead of learned human-preference rewards to improve checked outcomes in math, code, and other contract-driven tasks.

Hard · 40m
13

Knowledge Distillation for LLMs

Understand the main forms of knowledge distillation for LLMs, from logit matching and response-based supervision to on-policy KD. Learn when distillation helps, where student capacity becomes the bottleneck, and how to implement a correct teacher-student training loop.

Hard · 33m
14

Model Merging and Weight Interpolation

Learn model merging techniques, from simple weight averaging and task arithmetic to TIES-Merging and DARE, including practical guidance on tokenizer compatibility, mergekit workflows, and evaluation.

Hard · 36m
15

Prompt Optimization with DSPy

Move beyond manual prompt editing. Use DSPy to search prompt and few-shot candidates from data, then release only after held-out evaluation.

Hard · 29m
16

Recursive Language Models (RLM)

Learn Recursive Language Models (RLMs): keep long context in a programmable environment, delegate targeted sub-calls, and release the design only after measured quality, cost, and safety checks.

Hard · 43m

🤖 Advanced Agents & Retrieval

Production-grade retrieval and agent systems: vector indexes, GraphRAG, security, orchestration, memory, HITL, and failure recovery

1

Vector DB Internals: HNSW & IVF

Learn how approximate nearest neighbor indexes use HNSW, IVF, and Product Quantization to balance speed, recall, and memory in production vector databases.

Hard38m
2

Advanced RAG: HyDE & Self-RAG

Learn how query rewriting, HyDE, Self-RAG, and Corrective RAG change retrieval control, and how to evaluate their cost and evidence quality.

Hard34m
3

GraphRAG & Knowledge Graphs

Learn how GraphRAG uses entity graphs, hierarchical community reports, and embeddings to retrieve evidence for relationship-heavy and corpus-level questions.

Hard36m
4

RAG Security & Access Control

Learn how document ACLs, tenant isolation, retrieval-time authorization, output checks, and audit logs reduce private-data leakage risk in enterprise RAG.

Hard38m
5

Structured Output Generation

Build reliable LLM interfaces with JSON mode, structured outputs, schema validation, and grammar-guided decoding.

Hard40m
6

ReAct & Plan-and-Execute

Compare ReAct for tightly coupled tool use with Plan-and-Execute for longer workflows with explicit planning and replanning.

Hard33m
7

Guardrails & Safety Filters

Build layered guardrails for prompt injection defense, sensitive-data controls, structured outputs, policy enforcement, and safe tool use.

Hard40m
8

Code Generation & Sandboxing

Build code agents that test candidate patches inside bounded sandboxes with runtime evidence and defense-in-depth controls.

Hard35m
9

Computer-Use / GUI / Browser Agents

Build browser and desktop agents whose proposed clicks and keystrokes remain behind host policy, approval, verification, and sandbox controls.

Hard30m
10

Human-in-the-Loop Agent Architecture

Build approval gates, durable checkpoints, and guarded resumes for agent actions that change real-world state.

Hard37m
11

AI Coding Workflow with Agents

Scope coding-agent tasks, isolate execution, keep patches on branches, verify behavior, and preserve human merge ownership.

Hard27m
12

Agent Memory & Persistence

Design agent memory systems with scoped storage, sourced recall, tenant isolation, and durable checkpoints without letting recalled context authorize side effects.

Hard35m
13

Agent Failure & Recovery

Learn how to implement validation gates, retries, checkpointed recovery, state reconciliation, loop breakers, and graceful degradation when LLM agents hallucinate, stall, or drift from their tools.

Hard50m
14

Multi-Agent Orchestration

Master multi-agent orchestration with LangGraph, AutoGen teams, and OpenAI handoffs. Learn DAG-style routing, typed shared state, protocol boundaries, and human-in-the-loop controls for reliable AI systems.

Hard44m
⚡

Inference & Production Scale

Serving architecture, KV cache mechanics, batching, quantization, long context, advanced architectures, deployment, and experiments

1

Inference: TTFT, TPS & KV Cache

Understand the two-phase inference process (prefill vs decode), derive the KV cache memory formula, and learn production optimizations like chunked prefill and prefill/decode disaggregation.

Hard31m
2

Multi-Query & Grouped-Query Attention

Compare MHA, MQA, and GQA architectures, calculate their KV cache footprint, and reason about memory-limited serving tradeoffs.

Hard38m
3

KV Cache & PagedAttention

Calculate KV cache capacity, trace paged block allocation, and separate memory packing from prefix reuse and scheduling tradeoffs.

Hard36m
4

Prefix Caching and Prompt Caching

Structure exact reusable prefixes, validate cache hits from usage fields, and enforce invalidation and tenant-isolation boundaries.

Hard19m
5

FlashAttention & Memory Efficiency

Understand how FlashAttention cuts auxiliary attention memory from O(n²) to O(n) with tiling and online softmax, and analyze its IO complexity.

Hard33m
6

Continuous Batching & Scheduling

Understand how LLM schedulers use continuous batching, chunked prefill, and prefill-decode disaggregation to improve throughput without violating TTFT or inter-token latency targets.

Hard34m
7

Scaling LLM Inference

Learn why decode-heavy LLM serving is often memory-bound and how KV-cache design, batching, PagedAttention, and speculative decoding improve scale.

Hard41m
8

Model Parallelism for LLM Inference

Learn tensor parallelism, pipeline parallelism, sequence parallelism, and how multi-GPU serving trades memory capacity for communication overhead.

Hard20m
9

Model Quantization: GPTQ, AWQ & GGUF

Understand how GPTQ, AWQ, and GGUF trade off accuracy, memory footprint, and portability when serving LLMs on GPUs or local hardware.

Hard36m
10

Local LLM Deployment

Plan local LLM deployment with model size, quantization, pruning and sparsity trade-offs, Docker packaging, runtime choice, and hardware budgets.

Hard17m
11

SLM Specialization & Edge Deployment

Distill large teachers into compact SLMs using MobileLLM architectures and Phi-style data recipes. Compile and run them on-device with MLC LLM, ONNX Runtime, Core ML, and ExecuTorch while respecting power, thermal, and strict privacy constraints.

Hard26m
12

Speculative Decoding

Reduce LLM inter-token latency by pairing cheap drafting with target-model verification. Learn the rejection-sampling proof, speedup model, method choices, and production rollout gates.

Hard34m
13

Long Context Window Management

Master long-context LLM engineering: KV-cache math, prefill-vs-decode bottlenecks, RoPE scaling, lost-in-the-middle behavior, and long-context vs. RAG trade-offs.

Hard35m
14

Context Engineering

Move past fitting tokens into the window and learn the discipline of context engineering: curating the smallest high-signal token set, fighting context rot, and applying write, select, compress, and isolate strategies plus tool-result pruning and sub-agent isolation.

Hard23m
15

Mixture of Experts Architecture

Master MoE routing and load balancing, and understand why models like Mixtral and DeepSeek deliver strong capacity-per-compute tradeoffs compared with dense architectures.

Hard35m
16

Mamba & State Space Models

Master linear-time sequence modeling: from S4 and HiPPO to Mamba's selective recurrence, Mamba-2's SSD framework, Mamba-3's inference-first refinements, and modern hybrid Transformer-SSM designs.

Hard35m
17

Reasoning & Test-Time Compute

Understand how reasoning models trade extra inference compute for better answers, and what that means for search, verifiers, KV cache pressure, and routing.

Hard41m
18

Advanced MLOps & DevOps for AI

Master advanced MLOps and DevOps patterns for LLM systems: GitOps for prompts and models, feature stores for embedding features, automated rollback on eval regression, shadow traffic, and production-grade model registries.

Hard22m
19

GPU Serving & Autoscaling

Master the design of GPU serving infrastructure for LLMs with autoscaling, continuous batching, and cost optimization.

Hard49m
20

A/B Testing for LLMs

Master the design of an A/B testing framework for LLM-powered features, including traffic routing, metric selection, sample sizing, and automated guardrails.

Hard45m
🏗️

System Design Capstones

End-to-end hard system design breakdowns for real AI products

1

Content Moderation System

Master the architecture of a real-time content moderation system using LLMs and specialized classifiers.

Hard36m
2

Code Completion System

Design a real-time code completion path with context construction, measured serving latency, privacy controls, and stale-result suppression.

Hard42m
3

Multi-Tenant LLM Platform

Design a shared LLM platform with tenant-scoped state, quota enforcement, adapter routing, KV accounting, and measured GPU utilization.

Hard35m
4

LLM-Powered Search Engine

Master the architecture of an end-to-end AI search engine, covering freshness routing, hybrid retrieval, evidence packing, citation verification, and streaming synthesis.

Hard34m
5

Vision-Language Models & CLIP

Master CLIP's contrastive pre-training, zero-shot classification, visual token budgets, and the architecture of modern published VLMs like LLaVA, BLIP-2, and Qwen-VL.

Hard40m
6

Multimodal LLM Architecture

Take a deep dive into multimodal LLM architecture, covering encoders, projector designs, fusion patterns, training recipes, visual token budgets, and serving constraints like KV cache growth.

Hard36m
7

Diffusion Models & Image Generation

Master diffusion models from the forward noising process and DDPM training to Classifier-Free Guidance, latent diffusion, DiT backbones, and fast sampling trade-offs.

Hard39m
8

Real-Time Voice AI Agent

Master real-time voice AI architecture: turn detection, streaming STT/LLM/TTS, native audio trade-offs, WebRTC transport, and barge-in state.

Hard38m
9

Reasoning & Test-Time Compute

Design a production reasoning agent that routes by difficulty, evaluates candidate work, requires evidence before release, and survives serving bottlenecks like key-value (KV) cache growth.

Hard43m
🎤

AI Lab Interviewing

Final interview practice for frontier AI labs: Python systems, design, behavioral evidence, and technical presentation

1

AI Lab Coding Interview: Python Systems

Practice production-shaped Python coding prompts: crawlers, in-memory stores, ledgers, schedulers, parsers, rate limiters, caches, and concurrency follow-ups.

Hard13m
2

AI Lab System Design Interview

Design AI lab systems with clear goals, scale math, APIs, data models, overload behavior, permissions, eval gates, and operational debugging paths.

Hard12m
3

AI Lab Behavioral Interview

Prepare behavioral answers for AI labs around judgment, humility, incident leadership, disagreement, safety mechanisms, ambiguity, and evidence of ownership.

Hard10m
4

AI Lab Technical Presentation

Prepare a technical project presentation that proves ownership, architecture taste, tradeoff judgment, rollout discipline, metrics, and depth under questioning.

Hard10m