Advanced · 14 lessons

Training and Alignment

Follow the training stack from scaling laws and data pipelines through SFT, LoRA, RLHF, DPO, rewards, and distillation.

For readers moving from API usage into model adaptation, post-training, and training infrastructure.

By the end, you can explain the lifecycle of a model update and choose the right adaptation method for a product constraint.

1. Scaling Laws & Compute-Optimal Training
Learn the empirical power laws governing LLM performance, from Kaplan's parameter-heavy frontier through Chinchilla-optimal ratios to modern inference-aware training strategies.
Advanced Training & Adaptation · Hard · 36 min

2. Pre-training Data at Scale
Understand how web-scale pre-training data is extracted, filtered, deduplicated, mixed, tokenized, and packed into training-ready shards, including decontamination, late-stage annealing, and synthetic-data tradeoffs.
Advanced Training & Adaptation · Hard · 36 min

3. Build GPT from Scratch Lab
Build and train a tiny GPT end to end on Shakespeare: tokenize with GPT-style subwords, remap active token IDs, run causal self-attention, track validation loss, save a checkpoint, and sample text.
Advanced Training & Adaptation · Hard · 23 min

4. Continued Pretraining for Domain Shift
Learn when to keep the causal language-modeling objective and continue pretraining on domain text instead of jumping straight to SFT, and how to evaluate the trade-off against forgetting, cost, and downstream gain.
Advanced Training & Adaptation · Hard · 22 min

5. Synthetic Data Pipelines
Build synthetic post-training data pipelines with Self-Instruct, Evol-Instruct, calibrated judge signals, verifiers, preference pairs, diversity checks, and decontamination.
Advanced Training & Adaptation · Hard · 26 min

6. Supervised Fine-Tuning Pipeline
Run supervised fine-tuning as a real training system: choose the learning objective before the update surface, verify response-token loss and packing, track the real batch budget, save resumable checkpoints, and export on held-out behavior.
Advanced Training & Adaptation · Hard · 23 min

7. Distributed Training: FSDP & ZeRO
Understand ZeRO stages, current FSDP1 vs FSDP2 guidance, and when native PyTorch or DeepSpeed is the right choice for large-model training.
Advanced Training & Adaptation · Hard · 42 min

8. LoRA & Parameter-Efficient Tuning
Understand the mathematics of Low-Rank Adaptation (LoRA), modern adapter targeting strategies, and the real memory tradeoffs compared to full fine-tuning and QLoRA.
Advanced Training & Adaptation · Hard · 35 min

9. Reward Modeling from Preference Data
Train reward models as a first-class post-training stage: validate chosen/rejected pairs and splits, fit a scalar reward head with Bradley-Terry loss, audit generalization, and decide when explicit rewards are worth the extra complexity.
Advanced Training & Adaptation · Hard · 19 min

10. RLHF & DPO Alignment
Understand the RLHF pipeline and DPO, including reward modeling, PPO mechanics, and the trade-offs between iterative reinforcement learning and direct preference optimization.
Advanced Training & Adaptation · Hard · 38 min

11. Constitutional AI & Red Teaming
Understand how Constitutional AI reduces reliance on repeated human preference labeling through AI critique and ranking, and how automated red teaming stress-tests those safeguards.
Advanced Training & Adaptation · Hard · 33 min

12. RLVR & Verifiable Rewards
Understand RLVR, a post-training approach that uses programmatic verification instead of learned human-preference rewards to improve checked outcomes in math, code, and other contract-driven tasks.
Advanced Training & Adaptation · Hard · 40 min

13. Knowledge Distillation for LLMs
Understand the main forms of knowledge distillation for LLMs, from logit matching and response-based supervision to on-policy KD. Learn when distillation helps, where student capacity becomes the bottleneck, and how to implement a correct teacher-student training loop.
Advanced Training & Adaptation · Hard · 33 min

14. Model Merging and Weight Interpolation
Learn model merging techniques, from simple weight averaging and task arithmetic to TIES-Merging and DARE, including practical guidance on tokenizer compatibility, mergekit workflows, and evaluation.
Advanced Training & Adaptation · Hard · 35 min