Distributions and Sampling

A beginner-first guide to distributions as recipes for randomness, with seeded NumPy simulations for clicks, latency, counts, and tail behavior.


A distribution is a recipe for randomness.

That sounds abstract, so start with product behavior. Some events are yes/no: a user clicked or didn't click. Some values are counts: a request used 14 tool calls. Some values are positive and skewed: latency is usually small, but occasionally huge.

Those shapes are different. If you simulate them all with the same bell curve, your code will teach you the wrong lesson.

What You Learn

This chapter teaches how to choose a simple distribution, draw samples, and use simulation as a safe practice field before real traffic arrives.[1][2][3]

[Figure: mini-charts comparing Bernoulli, normal, and lognormal samples drawn with a fixed random seed in a repeated simulation loop.]
Visual anchor: each mini-chart is a different shape of randomness. The seed makes samples repeatable so debugging stays possible.

Step Map

| Step | Question | What you should be able to do |
| --- | --- | --- |
| 1 | What kind of value is this? | Match value type to distribution shape. |
| 2 | What parameters control it? | Name probability, mean, spread, or rate. |
| 3 | Can we sample it? | Generate repeatable fake data. |
| 4 | What summary matters? | Compare mean, percentile, or count. |
| 5 | What would break? | Notice wrong distribution choices. |

Statistics taught you that samples wobble. Distributions explain the shape of that wobble.


Tiny Story

Imagine you are designing an LLM support bot.

Before launch, you want fake traffic for a small load test.

You need to simulate:

| Product behavior | Value type | Good beginner distribution |
| --- | --- | --- |
| User clicked "thumbs up" | yes/no | Bernoulli |
| User intent | category | Categorical |
| Number of tool calls | count | Poisson |
| Model latency | positive, skewed number | Lognormal |
| Average eval score noise | centered continuous number | Normal |

This chapter isn't trying to turn you into a probability theorist. It teaches one habit: stop pretending all randomness has one shape.

Vocabulary

  • random variable: a quantity whose value is uncertain before observation.
  • distribution: a rule for which values can happen and how often.
  • sample: one draw from a distribution.
  • parameter: a number that controls the distribution, such as click probability.
  • seed: a number that makes a random run repeatable.
  • Monte Carlo: estimating behavior by running many random samples.

Analogy: a distribution is like a vending machine with rules. You don't know the next item, but you know which items are possible and how likely each one is.

Worked Example

For a thumbs-up click, the value is yes/no.

Use a Bernoulli distribution:

| Outcome | Numeric value | Probability |
| --- | --- | --- |
| no click | 0 | 0.92 |
| click | 1 | 0.08 |

If you simulate 1,000 users, the click rate won't be exactly 0.08 every time. It should land near 0.08.
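
To see that wobble directly, here is a minimal sketch (not part of the code lab below) that repeats the 1,000-user simulation with different seeds:

```python
import numpy as np

# Same probability, different seeds: the click rate wobbles around 0.08
# but rarely lands on it exactly.
for seed in range(5):
    rng = np.random.default_rng(seed)
    clicks = rng.binomial(n=1, p=0.08, size=1000)
    print(f"seed={seed} click_rate={clicks.mean():.3f}")
```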

For latency, the value can't be negative. A normal distribution can generate negative numbers, so it's a poor first choice for latency. A lognormal distribution is often a better teaching example because it stays positive and creates a right tail.

That tail matters. Users often feel the 95th percentile, not the average.
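
A quick check makes the negative-value problem concrete. This is a hedged sketch with illustrative parameters, not a recommendation for modeling any specific system:

```python
import numpy as np

rng = np.random.default_rng(7)

# A normal with mean 8 and spread 4 puts real probability below zero,
# which is impossible for latency.
normal_latency = rng.normal(loc=8.0, scale=4.0, size=1000)
print("negative_normal_draws", int((normal_latency < 0).sum()))

# A lognormal stays positive and keeps a long right tail.
lognormal_latency = rng.lognormal(mean=2.0, sigma=0.4, size=1000)
print("min_lognormal", round(float(lognormal_latency.min()), 2))
```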

Code Lab

Put this in distributions_demo.py:

```python
import numpy as np

rng = np.random.default_rng(7)

clicks = rng.binomial(n=1, p=0.08, size=1000)
latency = rng.lognormal(mean=2.0, sigma=0.4, size=1000)

print("click_rate", round(clicks.mean(), 3))
print("mean_latency", round(latency.mean(), 2))
print("p95_latency", round(np.percentile(latency, 95), 2))
```

Read it line by line:

  • default_rng(7) creates repeatable randomness.
  • binomial(n=1, p=0.08, size=1000) creates 1,000 yes/no clicks.
  • lognormal(...) creates positive latency-like values.
  • percentile(latency, 95) finds the latency that only the slowest 5 percent of requests exceed.

The exact numbers matter less than the pattern:

```text
click_rate near 0.08
p95_latency greater than mean_latency
```

Second Worked Example: Categories

Now simulate routing for the same support bot.

Each request has one intent:

| Intent | Probability |
| --- | --- |
| billing | 0.25 |
| bug | 0.35 |
| account | 0.30 |
| other | 0.10 |

Use a categorical sample:

```python
intents = np.array(["billing", "bug", "account", "other"])
probabilities = np.array([0.25, 0.35, 0.30, 0.10])

sampled = rng.choice(intents, size=20, p=probabilities)
print(sampled[:10])
```

This isn't a classifier yet. It is fake traffic with a controlled shape.

Read the line p=probabilities as a contract: each sampled request must choose one of the listed intents, and the long-run frequencies should match the probabilities.
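
To check that contract, here is a small sketch that continues the same distributions_demo.py session (the sample size of 100,000 is arbitrary, chosen only to let the frequencies settle):

```python
# Draw a large sample and compare observed shares to the contract.
big_sample = rng.choice(intents, size=100_000, p=probabilities)
for intent, p in zip(intents, probabilities):
    share = (big_sample == intent).mean()
    print(f"{intent}: target={p:.2f} observed={share:.3f}")
```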

Third Worked Example: Counts

Tool calls are counts: 0, 1, 2, 3, and so on.

A Poisson distribution is a common first model for counts in a fixed window:

```python
tool_calls = rng.poisson(lam=2.5, size=1000)

print("mean_tool_calls", round(tool_calls.mean(), 2))
print("max_tool_calls", tool_calls.max())
```

Read The Parameter

The parameter lam=2.5 is the expected count. It doesn't mean every request has 2.5 tool calls. Counts are whole numbers, and individual requests still wobble.
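
One way to see that wobble, continuing the same script, is to tally how many requests used each count (a small illustrative sketch):

```python
# np.bincount tallies how many requests used 0, 1, 2, ... tool calls.
counts = np.bincount(tool_calls)
for k, n in enumerate(counts):
    print(f"{k} tool calls: {n} requests")
```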

Use the output to ask product questions:

| Question | Summary to inspect |
| --- | --- |
| What is normal load? | mean tool calls |
| How often are requests expensive? | percentage above 5 calls |
| What should rate limits protect? | max or high percentile count |

Here is one more useful line:

```python
print("share_above_5", round((tool_calls > 5).mean(), 3))
```

That line turns simulation into a planning tool. It estimates how often a request crosses an operational threshold.

Why Seeded Randomness Matters

Random code without a seed is hard to debug.

If your teammate can't reproduce your sample, they can't tell whether a behavior changed because of your code or because randomness picked a different draw.

Named Generator Pattern

Use a named generator:

```python
rng = np.random.default_rng(7)
print(rng.integers(0, 10, size=3))
```

Then pass rng around instead of calling global random functions from everywhere.
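
As a sketch of that pattern (the function name simulate_clicks is illustrative, not from the chapter's script), make the generator an explicit argument:

```python
import numpy as np

def simulate_clicks(rng: np.random.Generator, p: float, n: int) -> np.ndarray:
    """Draw n yes/no clicks from the generator the caller passes in."""
    return rng.binomial(n=1, p=p, size=n)

# The caller owns the seed, so every simulation in a run is reproducible.
rng = np.random.default_rng(7)
clicks = simulate_clicks(rng, p=0.08, n=1000)
print(round(clicks.mean(), 3))
```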

Draw The Shape Before Trusting It

Before using a distribution, write a tiny shape card:

| Value | Can be negative? | Can have a long tail? | Example distribution |
| --- | --- | --- | --- |
| Click | no | no | Bernoulli |
| Intent | no | no | Categorical |
| Tool calls | no | sometimes | Poisson |
| Latency | no | yes | Lognormal |
| Score noise | yes | usually no | Normal |

This table is more useful than memorizing names. It forces the key question:

What values are possible?

If the distribution can produce impossible values, the model is already teaching you nonsense.
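
You can turn a shape card into assertions. A hedged sketch, assuming the clicks, tool_calls, and latency arrays from the earlier examples are in scope:

```python
# Shape-card checks: fail loudly if a sample contains impossible values.
assert clicks.min() >= 0 and clicks.max() <= 1, "clicks must be 0 or 1"
assert (tool_calls >= 0).all(), "counts cannot be negative"
assert (latency > 0).all(), "latency must be positive"
```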

Common Distributions

| Distribution | Use when | Example |
| --- | --- | --- |
| Bernoulli | one yes/no trial | one answer passed or failed |
| Binomial | count successes across trials | 17 passes out of 100 tasks |
| Categorical | one label from many labels | route to billing, bug, or account |
| Poisson | count events in a window | number of tool calls in a request |
| Normal | symmetric noise around a mean | average embedding score noise |
| Lognormal | positive value with long tail | latency or cost per request |
| Beta | uncertain probability between 0 and 1 | estimated click rate after few samples |

The table is a starting point, not a law. Real data can be messier. Always compare simulated summaries against observed summaries when you have real data.
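
The Beta row deserves one quick illustration. A hedged sketch, assuming 8 observed clicks in 100 trials and a uniform prior, which updates to a Beta(9, 93) distribution over the unknown click rate:

```python
import numpy as np

rng = np.random.default_rng(7)

# 8 clicks out of 100 trials; a uniform prior updates to Beta(1 + 8, 1 + 92).
clicks_seen, trials = 8, 100
plausible_rates = rng.beta(a=1 + clicks_seen, b=1 + trials - clicks_seen, size=10_000)

print("point_estimate", clicks_seen / trials)
print("rate_5th_pct", round(np.percentile(plausible_rates, 5), 3))
print("rate_95th_pct", round(np.percentile(plausible_rates, 95), 3))
```

The spread between those percentiles is the honest answer to "what is the click rate?" after only 100 trials.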

Common Trap

The common trap is using averages everywhere.

For latency, average can hide user pain.

Example:

| Metric | Meaning |
| --- | --- |
| Mean latency | typical load on system |
| p95 latency | slow experience for worst 5 percent |
| Max latency | worst observed request |

If the p95 is high, users can feel a bad product even when the mean looks fine.
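
Here is a hedged sketch, with parameters chosen only to make the point, of two latency samples whose means look similar while their tails do not:

```python
import numpy as np

rng = np.random.default_rng(7)

# Similar means, very different tails.
calm = rng.lognormal(mean=2.0, sigma=0.2, size=10_000)
spiky = rng.lognormal(mean=1.9, sigma=0.6, size=10_000)

for name, sample in [("calm", calm), ("spiky", spiky)]:
    print(name, "mean", round(sample.mean(), 2), "p95", round(np.percentile(sample, 95), 2))
```

A dashboard that only shows the mean would call these two systems roughly equal.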

Mini Test

Add tests like these:

```python
import numpy as np

def test_click_rate_is_near_probability():
    rng = np.random.default_rng(7)
    clicks = rng.binomial(n=1, p=0.08, size=1000)
    assert 0.05 < clicks.mean() < 0.11

def test_latency_is_positive():
    rng = np.random.default_rng(7)
    latency = rng.lognormal(mean=2.0, sigma=0.4, size=1000)
    assert latency.min() > 0
```

These tests are intentionally wide. Random samples wobble. The test should catch broken assumptions, not demand exact values.

Practice

  1. Change click probability from 0.08 to 0.12.
  2. Predict whether click_rate should move up or down.
  3. Increase sigma in the lognormal sample.
  4. Compare mean latency and p95 latency.
  5. Write one sentence that starts: "Averages hide tail behavior when..."

Production Check

For simulations, record:

  • distribution choice
  • parameter values
  • random seed
  • sample size
  • summary metrics
  • real data used for comparison, if available

If the simulation drives a launch decision, also record what would make it invalid. For example: "This latency simulation is invalid if production requests include file uploads, because file uploads weren't sampled."
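
One lightweight way to follow this checklist is to save the record next to the results. A sketch; the field names are illustrative, not a standard, and it assumes the clicks array from the code lab is in scope:

```python
import json

# Hypothetical record layout for the click simulation above.
simulation_record = {
    "distribution": "bernoulli",
    "parameters": {"p": 0.08},
    "seed": 7,
    "sample_size": 1000,
    "summary": {"click_rate": float(clicks.mean())},
    "compared_against_real_data": False,
}

with open("simulation_record.json", "w") as f:
    json.dump(simulation_record, f, indent=2)
```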

Next, continue to Hypothesis Tests and pass@k. You can now simulate noisy results. The next step is deciding whether one noisy result is evidence of a real improvement.

Evaluation Rubric

  1. Explains distributions as rules for generating possible outcomes.
  2. Uses seeded NumPy simulations to compare Bernoulli, categorical, Poisson, beta, and lognormal behavior.
  3. Connects distribution choice to product behavior like clicks, classes, counts, and latency tails.
Common Pitfalls
  • Sampling without a seed makes debugging hard. Sampling from the wrong distribution makes the simulation answer the wrong question.
  • A normal distribution can generate negative values, so it's a poor first model for latency or token counts.
  • A simulation that matches the mean can still miss the tail behavior that breaks production.
Key Concepts Tested
Bernoulli, categorical, Gaussian, Poisson, beta, Dirichlet, Monte Carlo
References

[1] Murphy, K. P. (2012). Machine Learning: A Probabilistic Perspective.
[2] Harris, C. R., et al. (2020). Array programming with NumPy. Nature.
[3] Bishop, C. M. (2006). Pattern Recognition and Machine Learning.
