© 2026 LeetLLM. All rights reserved.

AI Lab Coding Interview: Python Systems

Practice production-shaped Python coding prompts: crawlers, in-memory stores, ledgers, schedulers, parsers, rate limiters, caches, and concurrency follow-ups.

Frontier AI lab coding rounds often look less like isolated puzzle drills and more like small production systems. You may get one base prompt, then a sequence of staged requirements: add TTLs, add concurrency, add cancellation, add rate limits, preserve deterministic output, or explain why your state model will not corrupt itself.

This article is the coding session for the final interview-prep section. The goal is to become fast at practical Python: clear state, small APIs, local tests, and honest concurrency invariants.

[Figure: Python systems coding interview loop with clarify, implement, test, extend, add concurrency, and explain invariants.]
AI lab coding rounds reward staged implementation: ship a correct base version, test it, then extend the same small state model without losing invariants.

The operating model

Use this loop for every prompt:

  1. Restate input, output, and failure behavior.
  2. Ship version 1 with the smallest correct state model.
  3. Add table-driven tests before adding stage 2.
  4. Isolate shared mutable state before adding threads.
  5. End by naming complexity, race risks, and production hardening.

Good interview code is not the most abstract code. It is code whose invariants can be defended while requirements change.

Python tools to know cold

Know these without documentation:

| Need | Python building block |
| --- | --- |
| FIFO work queue | collections.deque, queue.Queue, asyncio.Queue |
| Counts and top errors | collections.Counter |
| LRU cache | collections.OrderedDict |
| Deadlines and TTL | time.monotonic, injected now function |
| Priority scheduling | heapq, queue.PriorityQueue |
| Thread safety | threading.Lock, threading.RLock, threading.Condition, threading.Event |
| Worker fanout | concurrent.futures.ThreadPoolExecutor, as_completed |
| Parsing | splitlines, re, explicit state machines |

Do not wait for a test framework. Write a small run_tests() and use plain assert.
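
As a rough illustration of that habit, the sketch below pairs a tiny LRU cache built on collections.OrderedDict with a table-driven run_tests(). The LRUCache class, its method names, and the shape of the test table are assumptions made for this example, not a prescribed interview answer.

```python
from collections import OrderedDict
from typing import Optional


class LRUCache:
    """Hypothetical test target: a minimal LRU cache for this sketch."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items: OrderedDict[str, int] = OrderedDict()

    def get(self, key: str) -> Optional[int]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: str, value: int) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used key


def run_tests() -> None:
    # Each case: (operations, expected results of the "get" operations).
    cases = [
        ([("put", "a", 1), ("get", "a")], [1]),
        ([("get", "missing")], [None]),
        # "a" is touched before "c" arrives, so "b" is the eviction victim.
        ([("put", "a", 1), ("put", "b", 2), ("get", "a"), ("put", "c", 3),
          ("get", "b"), ("get", "a"), ("get", "c")], [1, None, 1, 3]),
    ]
    for ops, expected in cases:
        cache = LRUCache(capacity=2)
        got = []
        for op in ops:
            if op[0] == "put":
                cache.put(op[1], op[2])
            else:
                got.append(cache.get(op[1]))
        assert got == expected, f"ops={ops} got={got} expected={expected}"
    print("all tests passed")


run_tests()
```

Adding a follow-up stage then means appending rows to the table rather than rewriting the harness.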

Prompt bank

Use these as drills. Do not memorize wording. Learn the patterns.

| Prompt | Base implementation | Follow-ups |
| --- | --- | --- |
| Same-host web crawler | BFS/DFS with visited set | concurrency, per-host rate limit, timeouts, cancellation |
| In-memory key/value DB | set, get, delete, scan | TTL, compare-and-set, transactions, snapshots |
| Banking ledger | accounts, deposit, withdraw, transfer | idempotency, reversals, scheduled transfers, deadlock avoidance |
| Task scheduler | dependency graph and ready queue | cycle detection, retries, worker pool, cancellation |
| Log parser | multiline event grouping | malformed lines, rolling windows, top errors |
| Rate limiter | fixed or sliding window | token bucket, multi-dimensional quotas, retry-after |
| LRU/TTL cache | capacity eviction | TTL, thread safety, metrics, stale cleanup |
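
To make one row concrete, here is a minimal sketch of the in-memory key/value prompt with the TTL follow-up. The names TTLStore and FakeClock are hypothetical, and two design choices are assumptions rather than requirements: a key counts as expired at exactly its deadline, and expired keys are removed lazily on read.

```python
import time
from typing import Callable, Optional


class FakeClock:
    """Hypothetical test clock: time only moves when advance() is called."""

    def __init__(self, start: float = 0.0) -> None:
        self.t = start

    def __call__(self) -> float:
        return self.t

    def advance(self, seconds: float) -> None:
        self.t += seconds


class TTLStore:
    def __init__(self, now: Callable[[], float] = time.monotonic) -> None:
        self._now = now  # injected so tests never sleep
        self._data: dict[str, tuple[str, Optional[float]]] = {}

    def set(self, key: str, value: str, ttl: Optional[float] = None) -> None:
        expires_at = None if ttl is None else self._now() + ttl
        self._data[key] = (value, expires_at)

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._now() >= expires_at:
            del self._data[key]  # lazy expiry on read
            return None
        return value

    def delete(self, key: str) -> bool:
        existed = self.get(key) is not None  # expired keys count as absent
        self._data.pop(key, None)
        return existed

    def scan(self, prefix: str) -> list[str]:
        # Copy the keys first because get() may delete expired entries.
        keys = [k for k in list(self._data) if k.startswith(prefix)]
        return sorted(k for k in keys if self.get(k) is not None)


clock = FakeClock()
store = TTLStore(now=clock)
store.set("session:1", "alice", ttl=10.0)
store.set("session:2", "bob")
clock.advance(9.0)
print(store.get("session:1"))
clock.advance(1.0)
print(store.get("session:1"))
print(store.scan("session:"))
```

With these inputs the three prints should show alice, None, and ['session:2']. Lazy expiry keeps the base version small; a background sweep for idle keys is a memory follow-up, not a correctness requirement.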

Drill 1: token bucket with deterministic time

Rate limiters show up because they combine state, boundary conditions, and production behavior. Retry policies often need jitter to avoid synchronized retry storms, so this prompt is really about overload control as much as counters.[1] A strong implementation injects time so tests do not sleep.

token-bucket.py

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    capacity: float
    refill_per_second: float
    tokens: float
    updated_at: float


class TokenBucketLimiter:
    def __init__(self, capacity: int, refill_per_second: float) -> None:
        self.capacity = float(capacity)
        self.refill_per_second = float(refill_per_second)
        self._buckets: dict[str, Bucket] = {}

    def allow(self, key: str, now: float, cost: float = 1.0) -> tuple[bool, float]:
        bucket = self._buckets.get(key)
        if bucket is None:
            # New keys start with a full bucket.
            bucket = Bucket(self.capacity, self.refill_per_second, self.capacity, now)
            self._buckets[key] = bucket

        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = max(0.0, now - bucket.updated_at)
        bucket.tokens = min(bucket.capacity, bucket.tokens + elapsed * bucket.refill_per_second)
        bucket.updated_at = now

        if bucket.tokens >= cost:
            bucket.tokens -= cost
            return True, 0.0

        # Denied: report how long until enough tokens have refilled.
        missing = cost - bucket.tokens
        retry_after = missing / bucket.refill_per_second
        return False, retry_after


limiter = TokenBucketLimiter(capacity=3, refill_per_second=1.0)
print([limiter.allow("org-a", now=0.0)[0] for _ in range(4)])
print(limiter.allow("org-a", now=0.5))
print(limiter.allow("org-a", now=1.0))
```

Output

```
[True, True, True, False]
(False, 0.5)
(True, 0.0)
```

What to say out loud:

  • The key can be a user, organization, endpoint, or model.
  • now is injected for deterministic tests.
  • Cleanup for idle buckets is a production memory concern, not a correctness requirement for the base prompt.
  • A thread-safe version needs a lock around _buckets and bucket mutation.
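
The last point can be sketched as follows. This is one possible shape, not the only correct one: ThreadSafeTokenBucket and hammer are hypothetical names, the bucket is simplified to a (tokens, updated_at) tuple, and a single lock guards both the dictionary and every bucket update.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ThreadSafeTokenBucket:
    def __init__(self, capacity: float, refill_per_second: float) -> None:
        self.capacity = float(capacity)
        self.refill_per_second = float(refill_per_second)
        self._lock = threading.Lock()
        # key -> (tokens, updated_at); only touched while holding self._lock
        self._buckets: dict[str, tuple[float, float]] = {}

    def allow(self, key: str, now: float, cost: float = 1.0) -> bool:
        # Read, refill, check, and write back happen as one critical section.
        with self._lock:
            tokens, updated_at = self._buckets.get(key, (self.capacity, now))
            elapsed = max(0.0, now - updated_at)
            tokens = min(self.capacity, tokens + elapsed * self.refill_per_second)
            allowed = tokens >= cost
            if allowed:
                tokens -= cost
            # max() keeps updated_at from moving backwards if callers race on time.
            self._buckets[key] = (tokens, max(updated_at, now))
            return allowed


limiter = ThreadSafeTokenBucket(capacity=100, refill_per_second=1.0)


def hammer(worker_id: int) -> int:
    # Every worker asks 50 times at the same injected instant.
    return sum(limiter.allow("org-a", now=0.0) for _ in range(50))


with ThreadPoolExecutor(max_workers=8) as pool:
    granted = sum(pool.map(hammer, range(8)))

print(granted)  # 100: capacity is never oversold across 400 concurrent attempts
```

A single lock is the simplest defensible answer. Per-key locks reduce contention, but then the dictionary insert needs its own protection.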

Drill 2: thread-safe same-host crawler shape

A crawler prompt tests graph traversal plus concurrency. The invariant is simple: each URL is claimed once before it is fetched or enqueued.

single-process-crawler-core.py

```python
from collections import deque
from urllib.parse import urlparse, urljoin

# In-memory stand-in for the network: page URL -> links found on that page.
PAGES = {
    "https://lab.example/start": ["/a", "/b", "https://other.example/x"],
    "https://lab.example/a": ["/b", "/c"],
    "https://lab.example/b": ["/c"],
    "https://lab.example/c": [],
}


def get_urls(url: str) -> list[str]:
    return PAGES.get(url, [])


def same_host_crawl(start_url: str) -> list[str]:
    start_host = urlparse(start_url).netloc
    queue = deque([start_url])
    visited = {start_url}
    ordered: list[str] = []

    while queue:
        url = queue.popleft()
        ordered.append(url)
        for raw_link in get_urls(url):
            link = urljoin(url, raw_link)  # resolve relative links
            if urlparse(link).netloc != start_host:
                continue
            if link in visited:
                continue
            # Claim the URL before enqueueing so it is never queued twice.
            visited.add(link)
            queue.append(link)

    return ordered


print(same_host_crawl("https://lab.example/start"))
```

Output

```
['https://lab.example/start', 'https://lab.example/a', 'https://lab.example/b', 'https://lab.example/c']
```

Concurrency follow-up:

  • Protect visited with a lock.
  • Claim a URL while holding the lock, before scheduling a worker.
  • Treat output order as unspecified unless the prompt explicitly asks for deterministic order.
  • Add timeouts and failed-fetch handling without retry storms.
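
A minimal sketch of those follow-ups, assuming an in-memory get_urls stub in place of real network calls. The function name concurrent_crawl and the choice to return a sorted list plus an error map are illustrative. A real fetch would also carry a timeout, which this stub cannot show.

```python
import threading
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from urllib.parse import urljoin, urlparse

# Hypothetical in-memory site standing in for the network.
PAGES = {
    "https://lab.example/start": ["/a", "/b", "https://other.example/x"],
    "https://lab.example/a": ["/b", "/c"],
    "https://lab.example/b": ["/c"],
    "https://lab.example/c": [],
}


def get_urls(url: str) -> list[str]:
    return PAGES.get(url, [])


def concurrent_crawl(start_url: str, max_workers: int = 4) -> tuple[list[str], dict[str, str]]:
    start_host = urlparse(start_url).netloc
    visited = {start_url}            # shared state, guarded by visited_lock
    visited_lock = threading.Lock()
    errors: dict[str, str] = {}      # only written by the coordinating thread

    def fetch(url: str) -> list[str]:
        """Worker: fetch links and claim the unseen same-host ones."""
        claimed = []
        for raw_link in get_urls(url):
            link = urljoin(url, raw_link)
            if urlparse(link).netloc != start_host:
                continue
            with visited_lock:
                if link in visited:
                    continue
                visited.add(link)    # claim happens under the lock
            claimed.append(link)
        return claimed

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pending = {pool.submit(fetch, start_url): start_url}
        while pending:
            done, _ = wait(list(pending), return_when=FIRST_COMPLETED)
            for future in done:
                url = pending.pop(future)
                try:
                    new_links = future.result()
                except Exception as exc:
                    errors[url] = repr(exc)  # record the failure, do not retry
                    continue
                for link in new_links:       # already claimed, so schedule once
                    pending[pool.submit(fetch, link)] = link

    # Sorting is only for stable display; crawl order itself is unspecified.
    return sorted(visited), errors


print(concurrent_crawl("https://lab.example/start"))
```

Because each link is added to visited under the lock before it is handed to the pool, two workers that discover the same URL cannot both schedule it.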

Drill 3: ledger with idempotency

Ledger prompts test whether you can keep money-like state consistent. Use append-only events when possible; if you maintain balances, update balance and event together.

ledger-idempotency.py

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    idempotency_key: str
    account: str
    delta: int


class Ledger:
    def __init__(self) -> None:
        self.balance: dict[str, int] = {}
        self.events: list[Event] = []
        self.seen: set[str] = set()

    def apply(self, key: str, account: str, delta: int) -> int:
        # A repeated key is a no-op that returns the current balance.
        if key in self.seen:
            return self.balance.get(account, 0)
        new_balance = self.balance.get(account, 0) + delta
        if new_balance < 0:
            raise ValueError("insufficient funds")
        # Update balance, event log, and seen keys together.
        self.balance[account] = new_balance
        self.events.append(Event(key, account, delta))
        self.seen.add(key)
        return new_balance


ledger = Ledger()
print(ledger.apply("deposit-1", "acct", 100))
print(ledger.apply("deposit-1", "acct", 100))
print(ledger.apply("withdraw-1", "acct", -30))
print(ledger.balance["acct"], len(ledger.events))
```

Output

```
100
100
70
70 2
```

Transfer follow-up:

  • Use one idempotency key for the whole transfer.
  • Lock account IDs in sorted order to avoid deadlock.
  • Record both debit and credit events together.
  • Define whether external side effects happen before or after durable commit.
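
One way the first three bullets might look in code is sketched below. TransferLedger, its lock names, and the boolean return value are assumptions for illustration. The sketch also assumes that a given idempotency key always names the same pair of accounts, and that unknown accounts are out of scope.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    idempotency_key: str
    account: str
    delta: int


class TransferLedger:
    def __init__(self, opening: dict[str, int]) -> None:
        self.balance = dict(opening)
        self.events: list[Event] = []
        self._seen: set[str] = set()
        self._account_locks = {name: threading.Lock() for name in opening}
        self._log_lock = threading.Lock()  # guards events and _seen

    def transfer(self, key: str, src: str, dst: str, amount: int) -> bool:
        """Return True if applied now, False if the key was already applied."""
        if amount <= 0 or src == dst:
            raise ValueError("amount must be positive and accounts must differ")
        # Always lock in sorted order so opposite transfers cannot deadlock.
        first, second = sorted((src, dst))
        with self._account_locks[first], self._account_locks[second]:
            with self._log_lock:
                if key in self._seen:
                    return False
            if self.balance[src] < amount:
                raise ValueError("insufficient funds")
            self.balance[src] -= amount
            self.balance[dst] += amount
            with self._log_lock:
                # Debit and credit are recorded together under one key.
                self.events.append(Event(key, src, -amount))
                self.events.append(Event(key, dst, amount))
                self._seen.add(key)
            return True


ledger = TransferLedger({"alice": 1000, "bob": 1000})


def worker(direction: int) -> None:
    src, dst = ("alice", "bob") if direction == 0 else ("bob", "alice")
    for i in range(200):
        ledger.transfer(f"t-{direction}-{i}", src, dst, 1)


threads = [threading.Thread(target=worker, args=(d,)) for d in (0, 1)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(ledger.balance, len(ledger.events))  # {'alice': 1000, 'bob': 1000} 800
print(ledger.transfer("t-0-0", "alice", "bob", 1))  # False: replayed key is a no-op
```

The two workers move money in opposite directions, which is the case that deadlocks when each thread locks its source account first.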

Concurrency answer template

When asked to make a solution concurrent, say this before writing code:

  1. Shared state is X.
  2. The lock protects X.
  3. A work item is claimed at this point.
  4. Worker shutdown happens through this sentinel, event, or executor lifecycle.
  5. Failed work records an error and does not corrupt shared state.

This sounds mechanical because it should. Concurrency interview failures usually come from vague ownership.
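
The five statements map onto code fairly directly. The sketch below is a hypothetical worker pool (run_pool, process, and _STOP are names invented here) with comments marking where each statement applies.

```python
import queue
import threading

_STOP = object()  # sentinel: one per worker signals shutdown


def process(item: int) -> int:
    """Hypothetical unit of work that fails on negative input."""
    if item < 0:
        raise ValueError("negative input")
    return item * item


def run_pool(items: list[int], num_workers: int = 3) -> tuple[dict[int, int], dict[int, str]]:
    work: queue.Queue = queue.Queue()
    results: dict[int, int] = {}     # 1. shared state is results and errors
    errors: dict[int, str] = {}
    state_lock = threading.Lock()    # 2. the lock protects both dicts

    def worker() -> None:
        while True:
            item = work.get()        # 3. claim point: each item goes to one worker
            if item is _STOP:        # 4. shutdown through a sentinel
                return
            try:
                value = process(item)
            except Exception as exc:
                with state_lock:     # 5. failure is recorded, state stays consistent
                    errors[item] = repr(exc)
                continue
            with state_lock:
                results[item] = value

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for item in items:
        work.put(item)
    for _ in threads:
        work.put(_STOP)
    for t in threads:
        t.join()
    return results, errors


results, errors = run_pool([1, 2, -3, 4])
print(sorted(results.items()))  # [(1, 1), (2, 4), (4, 16)]
print(errors)                   # {-3: "ValueError('negative input')"}
```

The work function runs outside the lock, so only the brief dictionary writes are serialized.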

Common pitfalls

  • Solving version 1, then adding TTL, transactions, or threads without restating the invariant.
  • Using wall-clock sleeps in tests instead of injected time.
  • Making crawler output order part of correctness after adding concurrent workers.
  • Treating idempotency as "retry the operation" instead of recording the request key and result semantics.
  • Adding a global lock around everything without explaining throughput, deadlock, and fairness tradeoffs.

Mastery checklist

  • Implement a rate limiter with deterministic time and boundary tests.
  • Implement a same-host crawler and explain where to place the visited lock.
  • Implement a ledger with idempotency and insufficient-funds behavior.
  • Explain how to add TTLs to a key/value store without sleeping in tests.
  • Build a dependency scheduler with cycle detection and a ready queue.
  • Parse multiline logs with malformed-line handling.
  • State complexity and memory growth for each solution.
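
For the dependency scheduler item, one possible base version is a ready queue driven by remaining-dependency counts (Kahn's algorithm), sketched below. The schedule function and its input shape (task name mapped to the tasks it depends on) are assumptions; ties are broken alphabetically only to keep output deterministic.

```python
from collections import deque


def schedule(deps: dict[str, list[str]]) -> list[str]:
    """Return a run order in which every task follows its dependencies.

    Raises ValueError if a cycle prevents some tasks from ever becoming ready.
    """
    tasks = set(deps)
    for prereqs in deps.values():
        tasks.update(prereqs)  # prerequisites that have no entry of their own

    remaining = {task: 0 for task in tasks}
    dependents: dict[str, list[str]] = {task: [] for task in tasks}
    for task, prereqs in deps.items():
        for prereq in set(prereqs):  # ignore duplicate edges
            remaining[task] += 1
            dependents[prereq].append(task)

    ready = deque(sorted(t for t, n in remaining.items() if n == 0))
    order: list[str] = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for child in sorted(dependents[task]):
            remaining[child] -= 1
            if remaining[child] == 0:
                ready.append(child)

    if len(order) != len(tasks):
        blocked = sorted(t for t in tasks if remaining[t] > 0)
        raise ValueError(f"cycle detected; blocked tasks: {blocked}")
    return order


print(schedule({"deploy": ["build", "test"], "test": ["build"], "build": []}))
# ['build', 'test', 'deploy']
```

Time and memory are both linear in tasks plus edges, apart from the sorting used for deterministic ties. Retries and a worker pool would extend the same ready queue.
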
Next Step
Continue to AI Lab System Design Interview

You will turn the same building blocks into end-to-end AI/backend designs with scale, overload behavior, permissions, rollout gates, and observability.

References

[1] Brooker, M. (AWS). Exponential Backoff And Jitter. 2015.