Master the concepts that power modern AI systems. From foundational transformer architecture to production system design, structured to take you from basics to expert level.
Follow these modules in order. Each step builds directly on the previous one.
Tokenization, embeddings, attention, and the core mental models behind modern LLMs
Serving architecture, KV cache mechanics, batching strategies, and latency/cost trade-offs
Chunking, indexing, hybrid retrieval, GraphRAG, and enterprise data access controls
Prompting, tool calling, memory, orchestration, and guardrails for robust agents
Benchmarking, LLM-as-judge, online experiments, and reliability diagnostics
Caching, deployment, versioning, and the operational discipline for production LLM systems
Pre-training to post-training: data pipelines, alignment methods, and reasoning performance
End-to-end system design breakdowns for real-world AI applications
Tokenization, embeddings, attention, and the core mental models behind modern LLMs
Understand Sutton's Bitter Lesson: why general methods that leverage computation consistently outperform human-engineered heuristics, and how this principle shapes every modern AI architecture decision.
Compare tokenization algorithms, understand vocabulary size tradeoffs, analyze the multilingual tokenization tax, and handle Unicode edge cases in production.
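As a quick illustration of that tax: byte-level tokenizers (as used by GPT-style models) start from UTF-8 bytes, so non-Latin scripts pay more bytes, and usually more tokens, per character. A minimal sketch (the sample strings are arbitrary):

```python
# Byte-level BPE starts from UTF-8 bytes, so the byte count per character is
# a rough lower bound on tokenization cost for each script.
samples = {
    "english": "hello world",
    "german":  "größer",
    "thai":    "สวัสดี",
}
for name, text in samples.items():
    chars = len(text)
    utf8_bytes = len(text.encode("utf-8"))
    print(f"{name}: {chars} chars -> {utf8_bytes} UTF-8 bytes "
          f"({utf8_bytes / chars:.1f} bytes/char)")
```

ASCII text costs 1 byte per character, while Thai costs 3, before any merges are even applied.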
Trace the full evolution from count-based methods through Word2Vec/GloVe to contextual BERT/GPT representations. Understand the distributional hypothesis, embedding geometry, and when to use static vs contextual embeddings in production.
Master sentence embedding training with contrastive learning (InfoNCE), optimize retrieval with bi-encoder vs. cross-encoder architectures, and use modern advances like Matryoshka representations.
Compare PCA, t-SNE, and UMAP for visualizing and compressing embeddings, and learn when Matryoshka representation learning (MRL) and product quantization replace post-hoc reduction.
Master vector similarity (cosine vs dot product), optimize dimensions with Matryoshka learning, and implement scalar (INT8), product (PQ), and binary (BQ) quantization for billion-scale retrieval systems.
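A minimal sketch of symmetric scalar (INT8) quantization, showing that cosine similarity survives the round trip almost unchanged (the dimension and seed are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
v = rng.standard_normal(768).astype(np.float32)

# Symmetric scalar quantization: map [-max|v|, +max|v|] onto int8 [-127, 127].
scale = np.abs(v).max() / 127.0
q = np.clip(np.round(v / scale), -127, 127).astype(np.int8)
v_hat = q.astype(np.float32) * scale  # dequantize

cos = lambda a, b: float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
print(f"cosine(original, dequantized) = {cos(v, v_hat):.4f}")
```

The stored vector shrinks 4x (FP32 to INT8) while the similarity structure that retrieval depends on is nearly preserved.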
Derive the attention formula, justify the √d_k scaling factor, and implement multi-head attention. Analyze O(n²) complexity and understand the three attention variants (self, causal, cross) in modern architectures.
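The formula can be sketched in a few lines of NumPy; the (n, n) score matrix is where the O(n²) cost comes from (the shapes here are arbitrary):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V, causal=False):
    """softmax(Q K^T / sqrt(d_k)) V, optionally with a causal mask."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)          # (n, n) -- the O(n^2) term
    if causal:
        mask = np.tril(np.ones_like(scores, dtype=bool))
        scores = np.where(mask, scores, -np.inf)
    return softmax(scores) @ V

rng = np.random.default_rng(0)
n, d = 4, 8
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out = attention(Q, K, V, causal=True)
print(out.shape)  # (4, 8)
```

With the causal mask, position 0 can only attend to itself, so the first output row equals V[0] exactly, which is a handy sanity check.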
Understand why transformers need position info, derive sinusoidal encodings, explore how RoPE encodes relative position through rotation, compare ALiBi's linear bias approach, and analyze long-context extrapolation methods.
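A minimal RoPE sketch illustrating the key property: after rotation, the query-key dot product depends only on the relative offset between positions (the dimension and base below are illustrative):

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Rotate consecutive pairs of dims of x by position-dependent angles."""
    d = x.shape[-1]
    freqs = base ** (-np.arange(0, d, 2) / d)     # one frequency per pair
    angles = pos * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x1 * cos - x2 * sin
    out[1::2] = x1 * sin + x2 * cos
    return out

rng = np.random.default_rng(0)
q, k = rng.standard_normal(8), rng.standard_normal(8)
# The dot product depends only on the relative offset (here 2):
a = rope(q, 5) @ rope(k, 7)
b = rope(q, 100) @ rope(k, 102)
print(np.isclose(a, b))  # True
```

This is why RoPE encodes *relative* position: shifting both query and key by the same amount leaves attention scores unchanged.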
A deep dive into Layer Normalization mechanics: Pre-LN vs Post-LN gradient flow, representation collapse trade-offs, RMSNorm simplification, and modern innovations like QK-Norm and Peri-LN.
Compare decoding strategies for text generation: greedy, beam search, top-k, nucleus (top-p), and min-p sampling, with temperature scaling and repetition penalty.
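A minimal nucleus (top-p) filtering sketch over a toy distribution (the probabilities and threshold are arbitrary):

```python
import numpy as np

def top_p_filter(probs, p=0.9):
    """Keep the smallest set of tokens whose cumulative probability >= p."""
    order = np.argsort(probs)[::-1]           # sort tokens by probability
    cum = np.cumsum(probs[order])
    cutoff = np.searchsorted(cum, p) + 1      # first index where cum >= p
    keep = order[:cutoff]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()          # renormalize over the nucleus

probs = np.array([0.5, 0.3, 0.1, 0.06, 0.04])
print(top_p_filter(probs, p=0.85))
```

Unlike top-k, the size of the kept set adapts to the shape of the distribution: a peaked distribution keeps few tokens, a flat one keeps many.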
Derive perplexity from cross-entropy loss, understand bits-per-byte normalization, and navigate the modern LLM evaluation landscape including LLM-as-Judge and Arena Elo.
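The definition in a few lines, with a sanity check: a model that is uniform over a vocabulary of size V has per-token NLL of ln(V), and therefore perplexity exactly V:

```python
import math

def perplexity(token_nlls):
    """exp of the mean per-token negative log-likelihood (natural log)."""
    return math.exp(sum(token_nlls) / len(token_nlls))

# Sanity check: uniform over a GPT-2-sized vocab (50257 tokens).
V = 50257
nlls = [math.log(V)] * 10
print(perplexity(nlls))
```

This is why perplexity is read as an "effective branching factor": the model is as uncertain as if it were choosing uniformly among that many tokens.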
Serving architecture, KV cache mechanics, batching strategies, and latency/cost trade-offs
Understand the two-phase inference process (prefill vs decode), derive the KV cache memory formula, and learn production optimizations like chunked prefill and disaggregation.
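The memory formula can be sketched directly; the configuration below is a Llama-3-8B-style shape (32 layers, 8 KV heads via GQA, head dim 128) used purely for illustration:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch, dtype_bytes=2):
    """2 (K and V) x layers x KV heads x head_dim x seq_len x batch x bytes/elem."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * dtype_bytes

# FP16 cache for one 4k-token sequence:
gib = kv_cache_bytes(32, 8, 128, seq_len=4096, batch=1) / 2**30
print(f"{gib:.2f} GiB per 4k-token sequence")  # 0.50 GiB
```

Note that the cache grows linearly in both sequence length and batch size, which is why long contexts and large batches compete for the same GPU memory.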
Master the inference optimizations that make serving large models possible. Compare MHA, MQA, and GQA architectures and their impact on KV cache memory.
Understand KV cache storage strategies for multi-tenant LLM inference, including PagedAttention, memory fragmentation mitigation, and vLLM architecture.
Understand how FlashAttention achieves O(n) memory by tiling and online softmax, and analyze its IO complexity.
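The heart of the trick is the online softmax: a single streaming pass that maintains a running maximum and a rescaled partial sum, reproducing the full softmax normalizer without materializing all scores at once. A minimal sketch:

```python
import numpy as np

def online_softmax_denominator(scores):
    """Stream over scores keeping a running max m and a rescaled sum s."""
    m, s = -np.inf, 0.0
    for x in scores:
        m_new = max(m, x)
        s = s * np.exp(m - m_new) + np.exp(x - m_new)  # rescale old sum
        m = m_new
    return m, s  # softmax(x_i) = exp(x_i - m) / s

scores = np.array([2.0, 5.0, 3.0, 4.0])
m, s = online_softmax_denominator(scores)
direct = np.exp(scores - scores.max()).sum()
print(np.isclose(s, direct))  # True
```

FlashAttention applies this update tile by tile in SRAM, which is what lets it avoid writing the full n-by-n score matrix to HBM.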
Understand high-throughput request schedulers for LLM serving, focusing on continuous batching, prefill-decode disaggregation, and latency-aware scheduling.
Deep dive into LLM inference optimization: KV-cache management, continuous batching, PagedAttention, and speculative decoding.
Accelerate LLM inference 2-3x by decoupling drafting from verification. Learn the probability theory behind exact distribution matching and how to deploy speculative decoding in production.
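The exactness claim can be verified numerically: accepting a drafted token x with probability min(1, p(x)/q(x)) and resampling rejections from the normalized residual max(p - q, 0) recovers the target distribution p exactly. A sketch with toy distributions:

```python
import numpy as np

# Target distribution p (large model) and draft distribution q (small model).
p = np.array([0.5, 0.3, 0.15, 0.05])
q = np.array([0.25, 0.25, 0.25, 0.25])

accept = np.minimum(1.0, p / q)          # acceptance probability per token
prob_via_accept = q * accept             # = min(p, q): drafted AND accepted
reject_mass = 1.0 - prob_via_accept.sum()
residual = np.maximum(p - q, 0.0)
residual /= residual.sum()               # resampling distribution on reject

recovered = prob_via_accept + reject_mass * residual
print(np.allclose(recovered, p))  # True -- the target distribution is exact
```

The identity min(p, q) + max(p - q, 0) = p is the whole proof: speculative decoding changes only how tokens are produced, never which distribution they come from.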
Master long-context LLM engineering: from RoPE scaling and attention patterns to practical context management strategies, lost-in-the-middle effects, and chunking approaches for production systems.
Understand post-training quantization methods GPTQ, AWQ, and GGUF. Learn how to deploy 70B models on consumer GPUs with minimal quality loss.
Master MoE routing, load balancing, and understand why modern MoE models like DeepSeek-V3 achieve better compute-quality tradeoffs.
Master the linear-time alternative to transformers: from structured state spaces (S4) through Mamba's selective mechanism to hybrid architectures like Jamba that combine the best of both worlds.
Understand the shift from train-time to test-time compute scaling. Explore how reasoning models trade inference FLOPs for better logical deduction.
Chunking, indexing, hybrid retrieval, GraphRAG, and enterprise data access controls
Compare document chunking approaches for RAG: fixed-size, semantic, recursive, and their impact on retrieval quality.
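A minimal fixed-size chunker with overlap, the baseline the other strategies are measured against (the size and overlap values are arbitrary):

```python
def chunk_fixed(text, size=200, overlap=50):
    """Slide a fixed-size window with overlap: the simplest RAG chunker."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

doc = "x" * 500
chunks = chunk_fixed(doc, size=200, overlap=50)
print([len(c) for c in chunks])  # [200, 200, 200] -- starts at 0, 150, 300
```

The overlap trades index size for robustness: facts that straddle a chunk boundary still appear intact in at least one chunk.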
Master the internals of approximate nearest neighbor algorithms: HNSW, IVF, and Product Quantization. Understand the speed-recall-memory tradeoffs in production vector databases.
Understand how to build a hybrid retrieval system combining BM25 sparse search with dense vector embeddings for optimal recall.
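One common way to fuse the two result lists is Reciprocal Rank Fusion (RRF); a minimal sketch with made-up document IDs (k = 60 is the conventional constant):

```python
from collections import defaultdict

def rrf(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_ranking = ["d3", "d1", "d5"]    # sparse (lexical) results
dense_ranking = ["d1", "d2", "d3"]   # dense (embedding) results
print(rrf([bm25_ranking, dense_ranking]))  # ['d1', 'd3', 'd2', 'd5']
```

Because RRF only uses ranks, it sidesteps the problem of BM25 scores and cosine similarities living on incomparable scales.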
Understand the architecture of end-to-end RAG systems: retriever design, vector indices, chunking strategies, and hallucination mitigation.
Master advanced RAG techniques including query decomposition, HyDE, Self-RAG, and Corrective RAG (CRAG) to build robust retrieval pipelines.
How Microsoft's GraphRAG architecture uses community detection and graph structure to answer questions that pure vector search cannot.
Understand row-level security, document ACLs, and per-user filtering in vector stores to prevent RAG systems from leaking confidential data.
Prompting, tool calling, memory, orchestration, and guardrails for robust agents
Master Chain-of-Thought prompting, Self-Consistency, and Tree-of-Thought strategies. Learn when to trade inference compute for reasoning accuracy.
Master the techniques for guaranteeing structurally valid LLM outputs, from JSON mode and function calling schemas to grammar-guided decoding with finite state machines.
Understand how LLMs learn to call functions, parse structured output, and handle multi-step tool use chains.
Understand the Model Context Protocol (MCP) and emerging standards for agent-tool interaction, from protocol architecture and transport layers to security considerations and ecosystem integration.
Master the core patterns for autonomous agents: ReAct loops, Plan-and-Execute architectures, and multi-agent orchestration.
Master memory systems for LLM agents, from short-term working memory and conversation buffers to long-term semantic stores, episodic recall, and MemGPT's hierarchical memory management.
Designing systems that pause agent execution for human approval. From bank transfers to code deployment, building trust into autonomous AI.
Design input/output safety filters for a production LLM application with configurable policy enforcement.
Master prompt injection attacks, understand why they bypass safety filters, and design multi-layer defense strategies for production LLM systems.
Master the architecture of code generation agents, from the generate-execute-debug loop to secure sandboxing with gVisor and WebAssembly.
Implementing deterministic fallbacks, infinite-loop breakers, and graceful degradation when LLM agents hallucinate or get stuck.
Master multi-agent DAGs using LangGraph and AutoGen. Learn to implement shared state, message passing, conditional routing, and human-in-the-loop workflows for robust AI systems.
Design evaluation frameworks for AI agents, from task-completion benchmarks like SWE-bench and OSWorld to custom metrics for tool use accuracy, multi-step reasoning, and safety in agentic workflows.
Benchmarking, LLM-as-judge, online experiments, and reliability diagnostics
Understand major LLM benchmarks (MMLU, HumanEval, GPQA), measurement protocols (pass@k, Elo), and the impact of data contamination.
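The unbiased pass@k estimator popularized by the HumanEval/Codex work can be written directly; n is the number of samples drawn, c the number that pass:

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k: 1 - C(n - c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# 10 samples, 2 correct, k = 5:
print(pass_at_k(10, 2, 5))
```

The complement form avoids the numerical instability of naively multiplying many probabilities, and it is exact rather than a Monte Carlo estimate.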
Master the LLM-as-a-Judge approach, from designing effective rubrics to handling biases like position and verbosity.
Master the design of an A/B testing framework for LLM-powered features, including metric selection, sample size calculations, and automated guardrails.
Design an observability stack for LLM applications covering logging, metrics, tracing, and drift detection.
Master the taxonomy, detection methods, and mitigation strategies for LLM hallucinations, from statistical self-consistency checks to retrieval-grounded generation and chain-of-verification.
Master the taxonomy of LLM biases, implementation of fairness metrics, and end-to-end mitigation strategies from data curation to RLHF.
Caching, deployment, versioning, and the operational discipline for production LLM systems
Implementing semantic caches, request deduplication, and cost-aware routing to cut LLM API costs by 40-70% without quality loss.
Master the economics of LLM deployment. Learn token-level cost modeling, prompt optimization, caching strategies, model routing, and build-vs-buy decisions at scale.
Master the architecture of CI/CD pipelines for LLM deployments, covering model versioning, automated evaluation gates, and rollback strategies.
Master the design of GPU serving infrastructure for LLMs with autoscaling, continuous batching, and cost optimization.
Pre-training to post-training: data pipelines, alignment methods, and reasoning performance
Master the empirical power laws governing LLM performance, from Kaplan's original scaling results through Chinchilla-optimal ratios to modern inference-aware training strategies.
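Two rules of thumb from that literature, sketched as code: Chinchilla's roughly 20 training tokens per parameter, and the standard C ≈ 6ND FLOPs approximation (the 7B example is purely illustrative):

```python
def chinchilla_tokens(params):
    """Chinchilla rule of thumb: compute-optimal training ~ 20 tokens/param."""
    return 20 * params

def training_flops(params, tokens):
    """Standard approximation: C ~= 6 * N * D FLOPs for one training pass."""
    return 6 * params * tokens

N = 7e9                       # a 7B-parameter model
D = chinchilla_tokens(N)      # ~140B tokens
print(f"{D:.2e} tokens, {training_flops(N, D):.2e} FLOPs")
```

Inference-aware strategies deliberately overtrain past this ratio (far more than 20 tokens per parameter) because a smaller model is cheaper to serve for its lifetime.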
Understand the end-to-end data pipeline for pre-training a foundation model, including crawling, deduplication, quality filtering, and data mixing.
Master instruction tuning (SFT) and chat templates. Learn how raw base models are transformed into helpful assistants using structured data, loss masking, and sequence packing.
Understand FP16/BF16 training formats, the necessity of master weights, and how dynamic loss scaling prevents gradient underflow.
Master FSDP and DeepSpeed ZeRO strategies for training LLMs. Compare memory efficiency, communication overhead, and 3D parallelism techniques.
Move beyond manual prompt engineering. Learn to use DSPy's compiler to automatically optimize prompts, select few-shot examples, and improve LLM pipeline performance from data.
Master Recursive Language Models (RLMs), an inference-time approach that moves long context into a programmable environment so models can recurse over 10M+ token workloads with competitive quality and cost.
Master the mathematics of Low-Rank Adaptation (LoRA), adapter injection strategies, and memory/compute tradeoffs compared to full fine-tuning.
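The core of LoRA in a few lines: the frozen weight W is augmented with a low-rank update (alpha/r)·BA, with B initialized to zero so training starts exactly from the base model's behavior (the shapes and hyperparameters below are arbitrary):

```python
import numpy as np

d, k, r = 1024, 1024, 8                  # frozen weight is d x k, rank r
rng = np.random.default_rng(0)
W = rng.standard_normal((d, k))          # frozen pretrained weight
A = rng.standard_normal((r, k)) * 0.01   # trainable, r x k
B = np.zeros((d, r))                     # trainable, zero-initialized
alpha = 16

# LoRA forward pass: h = W x + (alpha / r) * B (A x); only A and B train.
x = rng.standard_normal(k)
h = W @ x + (alpha / r) * (B @ (A @ x))

trainable = A.size + B.size
print(f"trainable params: {trainable:,} vs full fine-tune {W.size:,} "
      f"({100 * trainable / W.size:.2f}%)")
```

Here the adapter trains about 1.6% of the parameters of a full fine-tune; because B starts at zero, h equals W·x at initialization.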
Understand the core mechanisms of knowledge distillation for LLMs. Master the techniques for compressing massive teacher models into efficient student models while preserving complex reasoning capabilities.
Master model merging techniques, from simple weight averaging and Task Arithmetic to TIES-Merging and DARE, including practical guidance on using mergekit for combining specialized models.
Master Constitutional AI's self-improvement loop and automated red teaming strategies for scalable model alignment.
Master the RLHF pipeline and DPO. Understand reward modeling, PPO mechanics, and the trade-offs between iterative reinforcement learning and direct preference optimization.
Understand RLVR (reinforcement learning with verifiable rewards), the training approach behind DeepSeek-R1's reasoning capabilities, which uses binary correctness signals instead of human preferences or reward model approximations.
End-to-end system design breakdowns for real-world AI applications
Architect a production-grade customer support agent with RAG, tool use, and human escalation capabilities.
Master the architecture of a real-time content moderation system using LLMs and specialized classifiers.
Master the architecture of an end-to-end AI search engine, covering multi-stage retrieval, hallucination verification, and streaming synthesis.
Master the design of a real-time code completion system like Copilot, including context construction, model serving, and low-latency UX.
Master the design of a multi-tenant platform that serves large language models with strict SLA guarantees, token-aware rate limiting, and accurate cost tracking.
Master the design of a production reasoning agent (like o1/DeepSeek-R1) that uses chain-of-thought, tree search, and test-time compute scaling for complex problem solving.
Architect a real-time voice AI agent with sub-500ms latency. Covers VAD, streaming STT/LLM/TTS pipelines, WebRTC transport, and handling interruptions.
Master CLIP's contrastive pre-training, zero-shot classification, and the architecture of modern VLMs like LLaVA and GPT-4V.
Learn how to design a multimodal LLM that processes text, images, and audio, covering projection strategies and cross-modal attention.
Master the mathematics and architecture of Diffusion Models, from the forward noising process to U-Net denoising, Classifier-Free Guidance, and Latent Diffusion scaling.