LeetLLM

Your go-to resource for mastering AI & LLM systems.


© 2026 LeetLLM. All rights reserved.


Learn LLM Engineering

Master the concepts that power modern AI systems. From foundational transformer architecture to production system design, structured to take you from basics to expert level.

76 topics · 8 modules · 45h total content · 10 free

Step-by-step roadmap

Follow these modules in order. Each step builds directly on the previous one.

  1. 🧪 AI Engineering Foundations
     Tokenization, embeddings, attention, and the core mental models behind modern LLMs
     11 topics · ~7h
  2. ⚡ Inference Systems & Optimization
     Serving architecture, KV cache mechanics, batching strategies, and latency/cost trade-offs
     12 topics · ~8h
  3. 🔍 Advanced Retrieval & Enterprise Memory
     Chunking, indexing, hybrid retrieval, GraphRAG, and enterprise data access controls
     7 topics · ~4h
  4. 🤖 Agentic Architecture & Orchestration
     Prompting, tool calling, memory, orchestration, and guardrails for robust agents
     13 topics · ~7h
  5. 📊 Evaluation & Reliability
     Benchmarking, LLM-as-judge, online experiments, and reliability diagnostics
     6 topics · ~4h
  6. 🛠️ LLMOps & Production Engineering
     Caching, deployment, versioning, and the operational discipline for production LLM systems
     4 topics · ~2h
  7. 🧬 Training, Alignment & Reasoning
     Pre-training to post-training: data pipelines, alignment methods, and reasoning performance
     13 topics · ~7h
  8. 🏗️ System Design Case Studies
     End-to-end system design breakdowns for real-world AI applications
     10 topics · ~6h
🧪

AI Engineering Foundations

Tokenization, embeddings, attention, and the core mental models behind modern LLMs

The Bitter Lesson & Compute

Understand Sutton's Bitter Lesson, why general methods that use computation consistently outperform human-engineered heuristics, and how this principle shapes every modern AI architecture decision.

Medium · 25m

Tokenization: BPE & SentencePiece

Master tokenization algorithms (BPE, WordPiece, SentencePiece), understand vocabulary size tradeoffs, and analyze the multilingual tokenization tax.

Medium · 30m
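
For a quick feel for the mechanics before diving in, here is a toy sketch of one BPE training step (an illustration, not any tokenizer's production code): count adjacent symbol pairs across the corpus, then merge the most frequent pair into a new vocabulary symbol.

```python
# Toy illustration of one BPE training step: count adjacent symbol pairs in a
# corpus of symbol sequences, then merge the most frequent pair.
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs across all words (each word is a tuple of symbols)."""
    pairs = Counter()
    for word, freq in words.items():
        for a, b in zip(word, word[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get)

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with the fused symbol."""
    merged = {}
    for word, freq in words.items():
        out, i = [], 0
        while i < len(word):
            if i + 1 < len(word) and (word[i], word[i + 1]) == pair:
                out.append(word[i] + word[i + 1])
                i += 2
            else:
                out.append(word[i])
                i += 1
        merged[tuple(out)] = merged.get(tuple(out), 0) + freq
    return merged

# Word frequencies, with words split into characters plus an end-of-word marker.
vocab = {("l", "o", "w", "</w>"): 5, ("l", "o", "w", "e", "r", "</w>"): 2}
pair = most_frequent_pair(vocab)
vocab = merge_pair(vocab, pair)
```

Repeating this merge loop until the vocabulary reaches its target size is the core of BPE training; WordPiece swaps in a likelihood-based pair score, and SentencePiece changes the pre-tokenization (and also offers a separate unigram algorithm).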

Word to Contextual Embeddings

Trace the full evolution from count-based methods through Word2Vec/GloVe to contextual BERT/GPT representations. Understand the distributional hypothesis, embedding geometry, and when to use static vs contextual embeddings in production.

Hard · 30m

Sentence Embeddings & Contrastive Loss

PRO

Master sentence embedding training with contrastive learning (InfoNCE), optimize retrieval with bi-encoder vs. cross-encoder architectures, and use modern advances like Matryoshka representations.

Hard · 45m
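
To make the loss concrete, here is a minimal NumPy sketch of InfoNCE with in-batch negatives (illustrative only; real training uses an autograd framework and learned encoders). Row i of `queries` should score highest against row i of `positives`.

```python
# Minimal InfoNCE sketch: treat each row's matching positive as the "correct
# class" in a softmax over the whole batch of candidates.
import numpy as np

def info_nce_loss(queries, positives, temperature=0.05):
    # L2-normalize so the dot product is cosine similarity.
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = q @ p.T / temperature        # (batch, batch) similarity matrix
    # Row-wise log-softmax; the correct "class" for row i is column i.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 8))
perfect = info_nce_loss(emb, emb)         # aligned pairs -> low loss
shuffled = info_nce_loss(emb, emb[::-1])  # mismatched pairs -> higher loss
```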

Dimensionality Reduction for Embeddings

PRO

Compare PCA, t-SNE, and UMAP for visualizing and compressing embeddings, and learn when MRL and product quantization replace post-hoc reduction.

Medium · 45m

Embedding Similarity & Quantization

PRO

Master vector similarity (cosine vs dot product), optimize dimensions with Matryoshka learning, and implement scalar, product, and binary quantization for retrieval systems.

Hard · 45m

Scaled Dot-Product Attention

Master the scaled dot-product attention formula from first principles. Deep dive into the variance proof, multi-head parallelization, O(n²) memory complexity, and the three core attention variants.

Hard · 45m
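
The formula this topic builds up, softmax(QK^T / sqrt(d_k)) V, fits in a few lines of NumPy. A minimal single-head sketch (illustrative; no masking or batching):

```python
# Scaled dot-product attention: similarity logits, row-wise softmax,
# then a weighted sum over the value vectors.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (n, n) similarity logits
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # convex combination of values

rng = np.random.default_rng(0)
n, d = 4, 8
Q, K, V = rng.normal(size=(3, n, d))
out = scaled_dot_product_attention(Q, K, V)         # shape (n, d)
```

Each output row is a convex combination of the rows of V, which is why the n-by-n weight matrix is the O(n²) memory term the article analyzes.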

Positional Encoding: RoPE & ALiBi

Understand why transformers need position info, derive sinusoidal encodings, explore how RoPE encodes relative position through rotation, compare ALiBi's linear bias approach, and analyze long-context extrapolation methods.

Hard · 45m
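
The key RoPE property previewed here can be checked in a few lines: rotate each 2D slice of a vector by an angle proportional to its position, and dot products between rotated vectors depend only on relative position. The sketch below uses the interleaved-pair convention and the common theta = 10000^(-2i/d) frequency schedule.

```python
# RoPE sketch: per-pair 2D rotation whose angle grows linearly with position.
import numpy as np

def rope_rotate(x, position, base=10000.0):
    """Apply rotary position embedding to a 1D vector of even length."""
    d = x.shape[0]
    half = d // 2
    freqs = base ** (-np.arange(half) / half)   # one frequency per 2D pair
    angles = position * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[0::2], x[1::2]                   # interleaved (even, odd) pairs
    out = np.empty_like(x)
    out[0::2] = x1 * cos - x2 * sin             # standard 2D rotation per pair
    out[1::2] = x1 * sin + x2 * cos
    return out

q = np.array([1.0, 0.0, 1.0, 0.0])
k = np.array([0.0, 1.0, 1.0, 1.0])
# <rotate(q, m), rotate(k, n)> depends only on m - n (here m - n = 2 in both cases):
a = rope_rotate(q, 3) @ rope_rotate(k, 1)
b = rope_rotate(q, 7) @ rope_rotate(k, 5)
```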

Layer Normalization: Pre-LN vs Post-LN

PRO

Master Layer Normalization mechanics: Pre-LN vs Post-LN gradient flow, representation collapse trade-offs, RMSNorm simplification, and modern innovations like QK-Norm and Peri-LN.

Hard · 40m

Decoding Strategies: Greedy to Nucleus

PRO

Master decoding strategies for text generation: compare greedy, beam search, top-k, nucleus (top-p), and min-p sampling, with temperature scaling and repetition penalty.

Hard · 30m
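
As a taste of the material, a short sketch of nucleus (top-p) sampling: keep the smallest set of tokens whose cumulative probability reaches p, renormalize, and sample only from that set.

```python
# Nucleus (top-p) sampling over a toy vocabulary distribution.
import numpy as np

def nucleus_sample(probs, p=0.9, rng=None):
    rng = rng or np.random.default_rng()
    order = np.argsort(probs)[::-1]               # tokens by descending probability
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, p) + 1   # smallest prefix with mass >= p
    keep = order[:cutoff]
    renorm = probs[keep] / probs[keep].sum()      # renormalize inside the nucleus
    return int(rng.choice(keep, p=renorm))

vocab_probs = np.array([0.5, 0.3, 0.15, 0.04, 0.01])
# With p = 0.9, only tokens 0, 1, 2 (mass 0.95) can ever be sampled.
token = nucleus_sample(vocab_probs, p=0.9, rng=np.random.default_rng(0))
```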

Perplexity & Model Evaluation

PRO

Derive perplexity from cross-entropy loss, understand bits-per-byte normalization, and explore modern LLM evaluation methods including LLM-as-Judge and Arena Elo.

Medium · 25m
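
The derivation starts from one line of arithmetic: perplexity is the exponentiated mean cross-entropy (in nats) over the predicted tokens. A minimal sketch:

```python
# Perplexity = exp(mean negative log-likelihood) over observed tokens.
import math

def perplexity(token_probs):
    """token_probs: probability the model assigned to each observed token."""
    nll = [-math.log(p) for p in token_probs]   # per-token cross-entropy in nats
    return math.exp(sum(nll) / len(nll))        # exp of the mean

# A model that always assigns probability 0.25 has perplexity 4:
# it is "as confused as" a uniform choice among 4 tokens.
ppl = perplexity([0.25, 0.25, 0.25, 0.25])
```
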
⚡

Inference Systems & Optimization

Serving architecture, KV cache mechanics, batching strategies, and latency/cost trade-offs

Inference: TTFT, TPS & KV Cache

Understand the two-phase inference process (prefill vs decode), derive the KV cache memory formula, and learn production optimizations like chunked prefill and disaggregation.

Hard · 30m
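
The KV cache memory formula this topic derives is simple enough to sanity-check by hand: 2 (for K and V) × layers × KV heads × head dim × sequence length × bytes per value. The Llama-2-7B-class configuration below is an illustrative example.

```python
# Back-of-the-envelope KV cache sizing per sequence.
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_value=2):
    # 2 accounts for storing both the K and the V tensors.
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value

# 32 layers, 32 KV heads of dim 128, 4096-token context, FP16 (2-byte) values:
per_sequence = kv_cache_bytes(layers=32, kv_heads=32, head_dim=128, seq_len=4096)
gib = per_sequence / 2**30   # 2.0 GiB per sequence
```

This is exactly why GQA (fewer KV heads) and PagedAttention (less fragmentation) matter: the cache, not the weights, often dominates memory at high batch sizes.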

Multi-Query & Grouped-Query Attention

PRO

Master the inference optimizations that make serving large models possible. Compare MHA, MQA, and GQA architectures and their impact on KV cache memory.

Hard · 25m

KV Cache & PagedAttention

PRO

Understand KV cache storage strategies for multi-tenant LLM inference, including PagedAttention, memory fragmentation, and vLLM architecture.

Hard · 45m

FlashAttention & Memory Efficiency

PRO

Understand how FlashAttention achieves O(n) memory by tiling and online softmax, and analyze its IO complexity.

Hard · 45m

Continuous Batching & Scheduling

PRO

Understand high-throughput request schedulers for LLM serving, focusing on continuous batching, prefill-decode disaggregation, and latency-aware scheduling.

Hard · 35m

Scaling LLM Inference

PRO

Explore LLM inference optimization: KV cache management, continuous batching, PagedAttention, and speculative decoding.

Hard · 50m

Speculative Decoding

PRO

Accelerate LLM inference 2-3x by decoupling drafting from verification. Learn the probability theory behind exact distribution matching and how to deploy speculative decoding in production.

Hard · 45m

Long Context Window Management

PRO

Master long-context LLM engineering: from RoPE scaling and attention patterns to practical context management strategies, lost-in-the-middle effects, and chunking approaches for production systems.

Hard · 45m

Model Quantization: GPTQ, AWQ & GGUF

PRO

Understand post-training quantization methods GPTQ, AWQ, and GGUF. Learn how to deploy 72B models on consumer GPUs with minimal quality loss.

Hard · 50m

Mixture of Experts Architecture

PRO

Master MoE routing, load balancing, and understand why modern MoE models like DeepSeek-V2 achieve better compute-quality tradeoffs.

Hard · 40m

Mamba & State Space Models

PRO

Master the linear-time alternative to transformers: from structured state spaces (S4) and Mamba's selective mechanism to hybrid architectures like Jamba.

Hard · 45m

Reasoning & Test-Time Compute

PRO

Understand the shift from train-time to test-time compute scaling. Explore how reasoning models trade inference FLOPs for better logical deduction.

Hard · 45m

🔍

Advanced Retrieval & Enterprise Memory

Chunking, indexing, hybrid retrieval, GraphRAG, and enterprise data access controls

Chunking Strategies

Deep dive into document chunking approaches for RAG: fixed-size, semantic, recursive, and their impact on retrieval quality.

Medium · 20m
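
The fixed-size baseline the article compares against fits in a few lines. A sketch with overlap, using whitespace words as a stand-in for tokens (purely illustrative):

```python
# Fixed-size chunking with overlap: the baseline RAG chunking strategy.
def chunk_fixed(words, size=200, overlap=50):
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + size])
        if start + size >= len(words):   # last window already covers the tail
            break
    return chunks

doc = [f"w{i}" for i in range(500)]
chunks = chunk_fixed(doc, size=200, overlap=50)
# Consecutive chunks share 50 words, so content near a boundary appears in both.
```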

Vector DB Internals: HNSW & IVF

PRO

Master the internals of approximate nearest neighbor algorithms: HNSW, IVF, and Product Quantization. Understand the speed-recall-memory tradeoffs in production vector databases.

Hard · 45m

Hybrid Search: Dense + Sparse

PRO

Understand how to build a hybrid retrieval system combining BM25 sparse search with dense vector embeddings for optimal recall.

Medium · 35m
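
One simple way to combine the two rankings is reciprocal rank fusion (RRF); this is an illustrative sketch, and production hybrid search may instead fuse normalized scores or use engine-native hybrid queries.

```python
# Reciprocal rank fusion: documents ranked well by either retriever rise to the top.
def rrf_fuse(rankings, k=60):
    """rankings: list of ranked doc-id lists (best first). Returns fused ranking."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            # k = 60 is the commonly used damping constant.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["d3", "d1", "d7"]     # sparse (keyword) ranking
dense_hits = ["d1", "d9", "d3"]    # dense (embedding) ranking
fused = rrf_fuse([bm25_hits, dense_hits])   # d1 and d3, found by both, lead
```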

Production RAG Pipelines

PRO

Understand the architecture of end-to-end RAG systems: retriever design, vector indices, chunking strategies, and hallucination mitigation.

Medium · 47m

Advanced RAG: HyDE & Self-RAG

PRO

Master advanced RAG techniques including query decomposition, HyDE, Self-RAG, and Corrective RAG (CRAG) to build robust retrieval pipelines.

Hard · 25m

GraphRAG & Knowledge Graphs

PRO

Understand how Microsoft's GraphRAG architecture uses community detection and graph structure to answer questions that pure vector search can't.

Hard · 35m

RAG Security & Access Control

PRO

Understand row-level security, document ACLs, and per-user filtering in vector stores to prevent RAG systems from leaking confidential data.

Hard · 35m
🤖

Agentic Architecture & Orchestration

Prompting, tool calling, memory, orchestration, and guardrails for robust agents

CoT, ToT & Self-Consistency Prompting

PRO

Master Chain-of-Thought prompting, Self-Consistency, and Tree-of-Thought strategies. Learn when to trade inference compute for reasoning accuracy.

Medium · 20m

Structured Output Generation

PRO

Master the techniques for guaranteeing structurally valid LLM outputs, from JSON mode and function calling schemas to grammar-guided decoding with finite state machines.

Hard · 25m

Function Calling & Tool Use

PRO

Understand how LLMs learn to call functions, parse structured output, and handle multi-step tool use chains.

Medium · 30m

MCP & Tool Protocol Standards

PRO

Understand the Model Context Protocol (MCP) and emerging standards for agent-tool interaction, from protocol architecture and transport layers to security considerations and ecosystem integration.

Medium · 28m

ReAct & Plan-and-Execute

PRO

Master the core patterns for autonomous agents: ReAct loops, Plan-and-Execute architectures, and multi-agent orchestration.

Hard · 45m

Agent Memory & Persistence

PRO

Master memory systems for LLM agents, from short-term working memory and conversation buffers to long-term semantic stores, episodic recall, and MemGPT's hierarchical memory management.

Hard · 30m

Human-in-the-Loop Agents

PRO

Design systems that pause agent execution for human approval. From bank transfers to code deployment, build trust into autonomous AI.

Hard · 38m

Guardrails & Safety Filters

PRO

Master the design of input and output safety filters for production LLM applications with configurable policy enforcement.

Hard · 40m

Prompt Injection Defense

PRO

Master prompt injection attacks, understand why they bypass safety filters, and design multi-layer defense strategies for production LLM systems.

Medium · 30m

Code Generation & Sandboxing

PRO

Master the architecture of code generation agents, from the generate-execute-debug loop to secure sandboxing with gVisor and WebAssembly.

Hard · 30m

Agent Failure & Recovery

PRO

Master deterministic fallbacks, infinite-loop breakers, and graceful degradation for when LLM agents hallucinate or get stuck.

Hard · 35m

Multi-Agent Orchestration

PRO

Master multi-agent DAGs using LangGraph and AutoGen. Learn to implement shared state, message passing, conditional routing, and human-in-the-loop workflows for robust AI systems.

Hard · 40m

AI Agent Evaluation and Benchmarking

PRO

Master the design of evaluation frameworks for AI agents, from task-completion benchmarks like SWE-bench and OSWorld to custom metrics for tool use accuracy, multi-step reasoning, and safety.

Medium · 20m
📊

Evaluation & Reliability

Benchmarking, LLM-as-judge, online experiments, and reliability diagnostics

LLM Benchmarks & Limitations

Master major LLM benchmarks (MMLU, HumanEval, GPQA, SWE-bench), measurement protocols (pass@k, Elo), analyze data contamination, and learn 2026 selection strategies including agentic benchmarks and cost-to-quality ratios.

Medium · 32m

LLM-as-a-Judge Evaluation

PRO

Master the LLM-as-a-Judge approach, from designing effective rubrics to handling biases like position and verbosity.

Medium · 22m

A/B Testing for LLMs

PRO

Master the design of an A/B testing framework for LLM-powered features, including metric selection, sample size calculations, and automated guardrails.

Hard · 42m

LLM Observability & Monitoring

PRO

Master the design of an observability stack for LLM applications, covering logging, metrics, tracing, and drift detection.

Medium · 25m

Hallucination Detection & Mitigation

PRO

Master the taxonomy, detection methods, and mitigation strategies for LLM hallucinations. Covers everything from SelfCheckGPT and semantic entropy to specialized detectors like Lynx, token-level probing, and cutting-edge prevention techniques including contrastive decoding.

Medium · 55m

Bias & Fairness in LLMs

PRO

Master the taxonomy of LLM biases, implementation of fairness metrics, and end-to-end mitigation strategies from data curation to RLHF.

Medium · 38m
🛠️

LLMOps & Production Engineering

Caching, deployment, versioning, and the operational discipline for production LLM systems

Semantic Caching & Cost Optimization

PRO

Master semantic caching, request deduplication, and cost-aware routing to cut LLM API costs by 40-70% without quality loss.

Medium · 25m
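
The core idea is small enough to sketch: reuse a stored answer when a new prompt's embedding is close enough (by cosine similarity) to a cached prompt. The embeddings below are stand-in vectors; a real system would call an embedding model and a vector index.

```python
# Toy semantic cache keyed by embedding similarity rather than exact text match.
import numpy as np

class SemanticCache:
    def __init__(self, threshold=0.95):
        self.threshold = threshold
        self.entries = []                 # list of (unit embedding, answer)

    def put(self, embedding, answer):
        v = embedding / np.linalg.norm(embedding)
        self.entries.append((v, answer))

    def get(self, embedding):
        v = embedding / np.linalg.norm(embedding)
        for cached, answer in self.entries:
            if float(cached @ v) >= self.threshold:  # cosine similarity
                return answer                        # hit: skip the LLM call
        return None                                  # miss: call the LLM, then put()

cache = SemanticCache(threshold=0.95)
cache.put(np.array([1.0, 0.1, 0.0]), "Paris")
hit = cache.get(np.array([1.0, 0.12, 0.0]))   # near-duplicate prompt
miss = cache.get(np.array([0.0, 1.0, 0.0]))   # unrelated prompt
```

The threshold is the whole game in practice: too low and you serve wrong cached answers, too high and the hit rate (and the cost savings) collapses.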

LLM Cost Engineering & Token Economics

PRO

Master the economics of LLM deployment. Learn token-level cost modeling, prompt optimization, caching strategies, model routing, and build-vs-buy decisions at scale.

Medium · 30m
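
Token-level cost modeling starts as simple arithmetic. In the sketch below the per-million-token prices are made-up placeholders; substitute your provider's actual rates.

```python
# Illustrative token-level cost model with separate input/output pricing.
def request_cost(input_tokens, output_tokens,
                 usd_per_m_input=3.0, usd_per_m_output=15.0):
    return (input_tokens * usd_per_m_input + output_tokens * usd_per_m_output) / 1e6

# 2,000-token prompt, 500-token completion, 100k requests/day:
per_request = request_cost(2_000, 500)   # $0.0135 per request
daily = per_request * 100_000            # $1,350/day before caching or routing
```

Notice the asymmetry: at these placeholder rates the 500 output tokens cost more than the 2,000 input tokens, which is why prompt trimming alone rarely dominates the bill.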

Model Versioning & Deployment

PRO

Master the architecture of CI/CD pipelines for LLM deployments, covering model versioning, automated evaluation gates, and rollback strategies.

Medium · 25m

GPU Serving & Autoscaling

PRO

Master the design of GPU serving infrastructure for LLMs with autoscaling, continuous batching, and cost optimization.

Hard · 40m
🧬

Training, Alignment & Reasoning

Pre-training to post-training: data pipelines, alignment methods, and reasoning performance

Scaling Laws & Compute-Optimal Training

PRO

Master the empirical power laws governing LLM performance, from Kaplan's original scaling results through Chinchilla-optimal ratios to modern inference-aware training strategies.

Hard · 35m

Pre-training Data at Scale

PRO

Understand the complete data pipeline for pre-training a foundation model, including crawling, deduplication, quality filtering, data mixing, sequence packing, data annealing, and synthetic data generation.

Medium · 35m

Instruction Tuning & Chat Templates

PRO

Master instruction tuning (SFT) and chat templates. Learn how raw base models are transformed into helpful assistants using structured data, loss masking, and sequence packing.

Medium · 30m
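
Loss masking, one of the techniques named above, can be sketched in a few lines: cross-entropy is computed only on assistant-response tokens, never on prompt or template tokens. The token values are illustrative; -100 as the "ignore" marker matches the common PyTorch/Hugging Face convention, though this sketch is framework-free.

```python
# Loss masking for instruction tuning (SFT): mask prompt positions in the labels.
IGNORE_INDEX = -100   # positions with this label contribute nothing to the loss

def build_labels(token_ids, response_start):
    """Copy token_ids as labels, masking everything before the response."""
    return [IGNORE_INDEX] * response_start + token_ids[response_start:]

# Illustrative sequence: [prompt/template tokens | assistant response tokens]
tokens = [101, 7592, 2129, 2024, 102, 3398, 999, 102]
labels = build_labels(tokens, response_start=5)
# Only the last three positions contribute to the training loss.
```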

Mixed Precision Training

PRO

Understand FP16/BF16 training formats, the necessity of master weights, and how dynamic loss scaling prevents gradient underflow.

Medium · 30m

Distributed Training: FSDP & ZeRO

PRO

Master FSDP and DeepSpeed ZeRO strategies for training LLMs. Compare memory efficiency, communication overhead, and 3D parallelism techniques.

Hard · 45m

Prompt Optimization with DSPy

PRO

Move beyond manual prompt engineering. Master DSPy's compiler to automatically optimize prompts, select few-shot examples, and improve LLM pipeline performance from data.

Hard · 35m

Recursive Language Models (RLM)

PRO

Master Recursive Language Models (RLMs), an inference-time approach that moves long context into a programmable environment so models can recurse over 10M+ token workloads with competitive quality and cost.

Hard · 30m

LoRA & Parameter-Efficient Tuning

Master the mathematics of Low-Rank Adaptation (LoRA), adapter injection strategies, and memory/compute tradeoffs compared to full fine-tuning.

Hard · 45m
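
The central equation, a frozen weight W plus a low-rank update B @ A of rank r, is easy to sketch in NumPy (dimensions here are illustrative; real LoRA injects this into attention projections inside an autograd framework).

```python
# LoRA forward pass sketch: frozen base weight plus a scaled rank-r update.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r = 64, 64, 4

W = rng.normal(size=(d_out, d_in))       # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01    # trainable, small random init
B = np.zeros((d_out, r))                 # trainable, zero init -> update starts at 0

def lora_forward(x, alpha=8.0):
    # Base path plus the scaled low-rank path (alpha/r scaling as in the paper).
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
assert np.allclose(lora_forward(x), W @ x)   # with B = 0, identical to base model

full_params = d_in * d_out                   # parameters in a full fine-tune of W
lora_params = r * (d_in + d_out)             # trainable parameters with LoRA
```

At rank 4 this layer trains 512 parameters instead of 4,096, and the gap widens quadratically with layer width, which is the memory argument the article formalizes.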

Knowledge Distillation for LLMs

PRO

Understand the core mechanisms of knowledge distillation for LLMs. Master the techniques for compressing massive teacher models into efficient student models while preserving complex reasoning capabilities.

Hard · 30m

Model Merging and Weight Interpolation

PRO

Master model merging techniques, from simple weight averaging and Task Arithmetic to TIES-Merging and DARE, including practical guidance on using mergekit for combining specialized models.

Hard · 25m

Constitutional AI & Red Teaming

PRO

Understand how Constitutional AI replaces human feedback with AI self-supervision, and explore automated red teaming strategies for scalable model alignment.

Hard · 35m

RLHF & DPO Alignment

PRO

Master the RLHF pipeline and DPO. Understand reward modeling, PPO mechanics, and the trade-offs between iterative reinforcement learning and direct preference optimization.

Hard · 45m

RLVR & Verifiable Rewards

PRO

Understand RLVR (the training approach that produced DeepSeek-R1's reasoning capabilities) using binary correctness signals instead of human preferences or reward model approximations.

Hard · 25m
๐Ÿ—๏ธ

System Design Case Studies

End-to-end system design breakdowns for real-world AI applications

Design an Automated Support Agent

PRO

Master the architecture of a production-grade customer support agent, including RAG, tool use, and stateful human escalation.

Medium · 30m

Content Moderation System

Master the architecture of a real-time content moderation system using LLMs and specialized classifiers.

Hard · 25m

LLM-Powered Search Engine

PRO

Master the architecture of an end-to-end AI search engine, covering multi-stage retrieval, hallucination verification, and streaming synthesis.

Hard · 45m

Code Completion System

PRO

Master the design of a real-time code completion system like Copilot, including context construction, model serving, and low-latency UX.

Hard · 50m

Multi-Tenant LLM Platform

PRO

Master the design of a multi-tenant platform that serves large language models with strict SLA guarantees, token-aware rate limiting, and accurate cost tracking.

Hard · 42m

Reasoning & Test-Time Compute

PRO

Master how to design a production reasoning agent (like o1/DeepSeek-R1) that uses chain-of-thought, tree search, and test-time compute scaling for complex problem solving.

Hard · 35m

Real-Time Voice AI Agent

PRO

Master the architecture of a real-time voice AI agent with sub-500ms latency. Covers VAD, streaming STT/LLM/TTS pipelines, WebRTC transport, and handling interruptions.

Hard · 30m

Vision-Language Models & CLIP

PRO

Master CLIP's contrastive pre-training, zero-shot classification, and the architecture of modern published VLMs like LLaVA, BLIP-2, and Qwen-VL.

Hard · 35m

Multimodal LLM Architecture

PRO

Deep dive into multimodal LLM architecture covering encoders, projection strategies, fusion techniques, three-stage training with DPO, MoE for efficient inference, and adaptive thinking modes.

Hard · 28m

Diffusion Models & Image Generation

PRO

Master the mathematics and architecture of Diffusion Models, from the forward noising process to U-Net denoising, Classifier-Free Guidance, and Latent Diffusion scaling.

Hard · 45m

Ready to unlock everything?

Get full access to all 76 articles with detailed breakdowns, architecture diagrams, model answers, and scoring rubrics.