

AI Lab Coding Interview: Python Systems

Practice production-shaped Python coding prompts: crawlers, in-memory stores, ledgers, schedulers, parsers, rate limiters, caches, and concurrency follow-ups.


After the system design capstones, the final section turns architecture knowledge into interview execution. The first test is whether you can build small, correct systems while requirements change.

AI lab coding formats vary by team. Public frontier-lab guidance emphasizes well-designed solutions, tests, live coding, debugging, standard-library fluency, experience, motivation, communication, and tradeoff analysis [1][2]. (References: [1] Interview guide, https://openai.com/interview-guide/; [2] Careers, https://www.anthropic.com/careers.) Practice that bar with one base prompt followed by staged requirements: add TTLs, add concurrency, add cancellation, add rate limits, preserve deterministic output, or explain why your state model won't corrupt itself.

The final interview-prep section ends with a coding session. Build speed at practical Python: clear state, small APIs, local tests, and honest concurrency invariants.

Figure: Python systems coding interview loop (clarify, implement, test, extend, add concurrency, explain invariants). AI lab coding rounds reward staged implementation: ship a correct base version, test it, then extend the same small state model without losing invariants.

Four-round prep map

Treat the final prep section as one packet. Each round tests a different artifact, but the artifacts should agree with each other.

Round	Pattern families	Proof artifact
Coding	traversal, stores, TTL, ordering, parsing, concurrency, validation	passing tests plus an invariant you can explain under follow-up
System design	model gateway, retrieval, scheduler, coding agent, eval rollout	API, data model, scale math, overload plan, permission boundary
Behavioral	motivation, risk judgment, disagreement, ambiguity, failure, growth	bank of five stories with metrics, artifacts, and honest reflections
Technical presentation	architecture narrative, tradeoffs, metrics, AI bridge, Q&A defense	15-minute talk, 5-minute version, appendix, ownership boundary

Use the map to avoid uneven prep. A strong candidate doesn't only solve Python prompts; they can also explain why a design boundary exists, what failure changed their judgment, and which project artifact proves depth.

The operating model

Use this loop for every prompt:

Restate input, output, and failure behavior.
Ship version 1 with the smallest correct state model.
Add table-driven tests before adding stage 2.
Isolate shared mutable state before adding threads.
End by naming complexity, race risks, and production hardening.

Good interview code isn't the most abstract code. It's code whose invariants can be defended while requirements change.

Python tools to know cold

Know these without documentation:

Need	Python building block
FIFO work queue	collections.deque, queue.Queue, asyncio.Queue
Counts and top errors	collections.Counter
LRU cache	collections.OrderedDict
Deadlines and TTL	time.monotonic, injected `now` function
Priority scheduling	heapq, queue.PriorityQueue
Thread safety	threading.Lock, threading.RLock, threading.Condition, threading.Event
Worker fanout	concurrent.futures.ThreadPoolExecutor, as_completed
Parsing	splitlines, re, explicit state machines

Don't wait for a test framework. Write a small run_tests() and use plain assert.

Prompt bank

Use these as drills. Don't memorize wording. Learn the patterns.

Prompt pattern	Base implementation	Follow-ups
Same-host web crawler	BFS/DFS with visited set	concurrency, per-host rate limit, timeouts, cancellation
In-memory key/value DB	`set`, `get`, `delete`, `scan`	TTL, compare-and-set, transactions, snapshots
Banking ledger	accounts, deposit, withdraw, transfer	idempotency, reversals, scheduled transfers, deadlock avoidance
Task scheduler	dependency graph and ready queue	cycle detection, retries, worker pool, cancellation
Log parser	multiline event grouping	malformed lines, rolling windows, top errors
Rate limiter	fixed or sliding window	token bucket, multi-dimensional quotas, retry-after
LRU/TTL cache	capacity eviction	TTL, thread safety, metrics, stale cleanup
LFU/cache policy	frequency buckets and recency tie-break	TTL, capacity 0, update semantics, thread safety
Stream assembler	chunks, sequence IDs, end markers	out-of-order chunks, duplicate chunks, timeout, memory cap
Schema validator	nested dict/list validation	defaults, unknown fields, coercion, error paths
Batch scheduler	queue by arrival and priority	fairness, deadlines, retries, starvation
Secret redactor	token scanning and replacement	overlapping matches, allowlists, multiline logs

Frontier live-practice set

Use this as the main circuit. Each linked drill carries a full prompt, clarification questions, sample tests, hidden tests, solution guide, and validated Python plus Java paths. Run each problem twice: first in Python for speed, then in Java to force explicit classes, maps, and boundary checks.

Arena	Practice drills	Interview signal
Repository and codebase systems	Repository Hash Tree, Repository Diff, Duplicate File Groups, Repository Ignore Filter, File Patch Applier, Patch Conflict Detector, Codebase Symbol Index, In-memory Filesystem, Stack Trace Reconstructor	Code review agents, workspace snapshots, deterministic patches, symbol lookup, upload filtering, incremental state
Stateful services	TTL Key/Value Store, Transactional KV Store, Idempotent Ledger, LRU TTL Cache, LFU TTL Cache, Token Budget Ledger	Durable APIs, atomic updates, idempotency, eviction policy, TTL boundaries
Scheduling and workers	Dependency Scheduler With Retries, Workflow State Machine, Multi-tenant Job Scheduler, Bounded Worker Queue, Worker Lease Registry, Priority Deadline Scheduler, Batch Inference Scheduler	Run orchestration, backpressure, retries, leases, fairness, cancellation follow-ups
Streaming and parsers	Streaming Markdown Parser, SSE Event Parser, Streaming Token Assembler, Log Error Parser, Longest-match Tokenizer, Prompt Section Extractor, Secret Redactor	Chunked state, final flush, malformed records, overlapping matches, transcript safety
Agents and webhooks	Tool Call Schema Validator, Webhook Idempotency Receiver, PR Readiness Gate, Model Fallback Router, Retry Backoff Planner, Circuit Breaker State	Tool execution, dedupe, provider fallback, proof artifacts, retry policy, outage behavior
Retrieval and context	RAG Chunk Selector, Context Window Packer, Document Permission Filter	Ranking, budget packing, access control, explainable inclusion
Evals and telemetry	Eval Failure Aggregator, Canary Metric Judge, Experiment Traffic Splitter, Agent Event Timeline, Command Log Classifier, Sliding Window Error Counter	Rollout judgment, failure clustering, reviewable proof, redacted logs, stable assignment, moving windows
Systems basics under frontier wording	Same-host Crawler, Token Bucket Rate Limiter, Notification Rate Limiter	Graph traversal, duplicate suppression, host policy, quota math, noisy automation control

Live drill contract

For every practice run, write these artifacts before opening the guide:

Problem proposition in your own words: API, input shape, output shape, failure behavior.
Three clarification questions that can change tests.
Invariant sentence: state owner, allowed transition, and cheap lookup.
Public tests: happy path, duplicate input, missing input, boundary.
Hidden tests: malformed input, ordering tie-break, expiry equality, retry exhaustion, or conflict.
Python implementation with plain assert tests.
Java implementation with explicit classes and deterministic collections.
Follow-up answer: concurrency, persistence, scale, observability, or rollback.

That routine makes LeetLLM practice closer to a real frontier coding loop: define semantics, ship correct code, prove boundaries, then extend without rewriting the state model.

Question pattern taxonomy

Most prompts are one of these shapes. Classify the prompt before writing code.

Pattern	Recognition signal	First state model	High-value tests
Traversal	URLs, graph nodes, dependencies, neighbors	`visited` plus queue or stack	cycle, duplicate edge, malformed node, deterministic order
Mutable store	`set`, `get`, `delete`, `scan`, inventory operations	dict from key to record	missing key, overwrite, delete, scan ordering, expired key
Time boundary	TTL, deadline, rate limit, retry-after	injected `now`, deadline fields	equality boundary, refill edge, expired-on-read, no `sleep`
Ordering policy	priority, deadline, LRU, LFU, ready queue	heap, deque, `OrderedDict`, frequency buckets	tie-break, stale heap entry, capacity 0, promotion
Idempotent write	request ID, retry, ledger, external action	request key to stored result	duplicate success, duplicate failure, conflicting retry
Parser	logs, events, chunks, sections, streaming lines	explicit current record or buffer	malformed line, continuation, empty input, final flush
Concurrency	workers, thread-safe, fanout, cancellation	lock-protected claim point plus queue	duplicate claim, shutdown, exception, partial result
Validation	schema, payload, permissions, filters	recursive validator or rule table	nested failure, path reporting, unknown field, missing required

Use this sentence when stuck:

The invariant is X; this data structure makes X cheap to maintain; these tests prove the boundary cases.
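
One way to make that sentence concrete, using the parser row as the example: the invariant is that every continuation line belongs to exactly one open record, a single `current` variable makes that cheap, and the tests cover malformed lines, orphan continuations, empty input, and the final flush. The log format, the HEADER pattern, and the function name parse_events are assumptions for this sketch, not part of any prompt above.

```python
import re

# Assumed log format for illustration: "<timestamp> <LEVEL> <message>".
HEADER = re.compile(r"^(\S+) (DEBUG|INFO|WARN|ERROR) (.*)$")


def parse_events(text: str) -> tuple[list[dict], list[str]]:
    """Group header lines with their indented continuation lines."""
    events: list[dict] = []
    malformed: list[str] = []
    current: dict | None = None  # the one open record that owns continuations

    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            if current is not None:
                events.append(current)  # a new header closes the open record
            current = {"ts": match[1], "level": match[2], "lines": [match[3]]}
        elif line.startswith((" ", "\t")) and current is not None:
            current["lines"].append(line.strip())
        elif line.strip():
            malformed.append(line)  # orphan continuation or unknown shape

    if current is not None:
        events.append(current)  # final flush: the last record has no successor
    return events, malformed


log = "t1 ERROR boom\n  at f()\n  at g()\ngarbage\nt2 INFO ok"
events, bad = parse_events(log)
print([event["level"] for event in events], bad)
```

In this sketch a malformed line is reported but does not close the open record; whether it should is exactly the kind of clarification question that changes tests.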

Answer rhythm

Use the same rhythm for every timed solution:

Clarify semantics that change tests: ordering, equality boundary, failure behavior, and concurrency expectation.
Name invariant and data structures before code.
Implement smallest single-threaded version.
Add sample tests while code is still small.
Add follow-up by changing one boundary: time, ordering, retries, or locking.
Explain complexity and the first production hardening step.

For a frontier-lab coding round, communication is part of correctness. Say when you're making an assumption, say which invariant you're protecting, and say what test you'll write before you write it.

Pattern drills by session

Session	Drill set	Goal
1	crawler, filesystem, tokenizer	traversal and parsing without losing order
2	TTL store, token bucket, retry planner	deterministic time and boundary tests
3	LRU, LFU, priority scheduler	eviction, tie-breaks, and stale heap entries
4	ledger, workflow state machine, schema validator	idempotency, transitions, and explainable errors
5	worker queue, batch scheduler, stream assembler	concurrency, backpressure, and partial results
6	two random problems under 35 minutes each	speed, tests, and clean explanation

Levelled prompt ladder

Many frontier coding rounds feel like one product-shaped problem that grows across levels. Practice each ladder by shipping level 1 quickly, then preserving the same invariant as requirements change.

Prompt family	Level 1	Level 2	Level 3	Level 4
GPU quota reservation system	add, remove, lookup	reserve/release capacity	expirations, over-allocation prevention	concurrency or audit log
Key/value store	set, get, delete	prefix scan and ordering	TTL and compare-and-set	transactions or snapshots
Crawler	same-host BFS	URL normalization	retries and rate limits	worker fanout and cancellation
Scheduler	dependencies and ready queue	cycle detection	retries and deadlines	workers, cancellation, fairness
Cache	get/put capacity eviction	update semantics	TTL or LFU	metrics and thread safety
Chat or event router	register handlers	route messages	priorities or filters	replay, idempotency, backpressure
Log processor	parse records	multiline events	top errors and windows	malformed input and streaming
Permission filter	include/exclude resources	groups and inheritance	deny precedence	audit why each item passed
Stream assembler	append chunks	sequence ordering	duplicate/missing chunks	timeout and memory cap
Experiment splitter	assign users	stable hashing	ramp percentages	sticky overrides and rollback

Use this timing target:

Time	Target
0-5 min	clarify ordering, failure behavior, and mutable state
5-20 min	level 1 complete with tests
20-32 min	level 2 or 3 complete without rewriting
32-38 min	edge tests and complexity
final minutes	name next follow-up design, race risk, and production hardening

Coding readiness rubric

Score yourself after each drill.

Signal	Not ready	Ready	Strong
Clarification	starts coding before semantics	asks about order, missing inputs, equality boundary	predicts which answer changes tests
Data model	grows incidental state	names records and invariants	can extend same model across levels
Tests	only happy path	missing, duplicate, boundary, and sample tests	tests one future follow-up before code
Standard library	reinvents queues, counters, heaps, caches	uses obvious standard tool	explains why that tool fits invariant
Debugging	stares at failure	narrows with print/assert/repro	states expected vs actual before editing
Follow-up handling	rewrites from scratch	changes one boundary at a time	names tradeoff and reversal signal
Communication	quiet implementation	narrates assumptions and complexity	keeps interviewer aligned through decisions

When a drill scores "not ready," repeat the same prompt family with different nouns. Train transfer, not memorization.

Pattern testcase checklist

State the tests before coding the follow-up. That keeps the interviewer aligned and prevents late rewrites.

Pattern	Minimum public tests	Private-edge tests to rehearse
Crawler or graph traversal	start node, duplicate edge, off-host or invalid neighbor	cycles, relative links, malformed URL, empty graph, deterministic order
TTL store or rate limiter	set/get, expiry, scan, refill	equality boundary, zero TTL, time moving backward, cleanup during read
LRU or LFU cache	capacity eviction, update existing key, get promotion	capacity 0, tie-break by recency, expired entry, stale heap record
Scheduler	independent tasks, dependency release, retry	cycle, missing dependency, permanent failure, blocked dependents
Ledger	deposit, withdraw, transfer	duplicate idempotency key, conflicting retry, insufficient funds, lock order
Parser	one record, multiline continuation, malformed line	orphan continuation, empty input, final flush, very large record
Validator	required field, unknown field, nested list	error path, default value, type coercion, repeated failure
Stream assembler	in-order chunks, out-of-order chunks	duplicate chunk, missing end marker, timeout, memory cap

Use this sentence before running tests:

I expect these tests to fail if I lose the invariant. The invariant is X; the edge case most likely to break it is Y.

Debugging protocol

When a test fails, narrate the smallest useful investigation:

Read the assertion and say expected versus actual.
Reproduce with one smaller fixture.
Print or inspect the state that owns the invariant.
Fix the state transition, not the symptom.
Add one regression test before moving to the next follow-up.

Common debug pivots:

Symptom	First thing to inspect
Duplicate output	claim point or `visited` insertion timing
Missing output	enqueue condition, prefix filter, or failure policy
Wrong order	queue/heap tie-break and where sorting happens
Expired key returned	read path skipped cleanup
Retry ran too many times	attempt counter increment point
Blocked task marked failed	dependency failure policy mixed with task execution failure
Threaded result flakes	shared state changed outside the lock

Strong candidates aren't bug-free. They recover fast because they can name which invariant is broken.

Live coding narration bank

Practice short narration so you don't go silent while thinking. Use these sentences as scaffolding, then replace the placeholders with prompt-specific details.

Moment	What to say
Before coding	"I will first ship the single-threaded version, then add `follow-up` without changing the core invariant."
Choosing data structures	"The cheap operation needs to be `operation`, so I will store `state` as `structure`."
Before tests	"I want tests for happy path, duplicate input, boundary condition, and malformed input."
On failure	"Expected `X`, got `Y`. The likely owner is `state transition`, so I will inspect that before editing."
Adding TTL or deadlines	"I will inject time so expiry tests don't sleep and the equality boundary is explicit."
Adding concurrency	"The shared state is `X`; the claim point is `Y`; work runs outside the lock."
Running out of time	"The base invariant is correct. The next production hardening step is `race/cleanup/backpressure`."

Follow-up triage map

When the interviewer adds a follow-up, classify it before changing code.

Follow-up type	First move	Common trap
Time	inject `now`, store deadline, clean on read	sleeping in tests
Ordering	choose deque, heap, sort, or insertion order	mixing policy into unrelated state
Capacity	evict by explicit rule	hidden off-by-one at capacity 0 or 1
Retry	store attempt count and final state	retrying external writes without idempotency
Snapshot	version records or copy-on-write	mutating data a snapshot should freeze
Compare-and-set	lock compare and write together	calling `get` then `set` as two operations
Thread safety	protect claim point and shared maps	holding lock during slow fetch or execution
Cancellation	durable flag plus cooperative checks	stopping new work while workers keep publishing children

Good follow-up handling sounds calm because the state boundary is still visible.

Mock coding prompts

Use these as timed practice. Read the prompt, write a small solution and tests, then open the solution guide.

Prompt 1: same-host crawler

You're given a starting URL and a function get_links(url) -> list[str]. Implement a crawler that visits only URLs on the same host as the start URL.

Requirements:

Return visited URLs in deterministic breadth-first order for the single-threaded version.
Don't visit the same URL twice.
Ignore malformed URLs and off-host URLs.
Follow-up: add worker fanout without corrupting visited.
Follow-up: add a per-host rate limit and cancellation event.

Clarifying questions to ask:

Should URLs be normalized for fragments, trailing slashes, query strings, and redirects?
Should failed fetches be retried, skipped, or returned separately?
Does deterministic order still matter after worker fanout, or only for the single-threaded version?

Solution guide

Use a FIFO work queue for the base version because the prompt asks for breadth-first order. Put URL normalization and host checks before enqueueing. The invariant is: a URL enters the queue only after it has been accepted into visited.

same-host-crawler.py

from collections import deque
from urllib.parse import urlparse, urljoin

PAGES = {
    "https://repo.example/start": ["/tests", "/ci", "https://other.example/x"],
    "https://repo.example/tests": ["/ci"],
    "https://repo.example/ci": [],
}

def crawl(start_url: str, get_links) -> list[str]:
    start = urlparse(start_url)
    if not start.scheme or not start.netloc:
        return []

    visited = {start_url}
    queue = deque([start_url])
    ordered: list[str] = []

    while queue:
        url = queue.popleft()
        ordered.append(url)

        for raw_link in get_links(url):
            candidate = urljoin(url, raw_link)
            parsed = urlparse(candidate)
            if parsed.scheme not in {"http", "https"}:
                continue
            if parsed.netloc != start.netloc:
                continue
            if candidate in visited:
                continue
            visited.add(candidate)
            queue.append(candidate)

    return ordered

print(crawl("https://repo.example/start", lambda url: PAGES.get(url, [])))

Output

['https://repo.example/start', 'https://repo.example/tests', 'https://repo.example/ci']

Follow-up guide

For worker fanout, say explicitly that deterministic return order becomes expensive once fetches run in parallel, so keep it only if the prompt still requires it. Protect the claim step, not the whole fetch:

crawler-claim-step.py

from threading import Lock

visited = {"https://repo.example/start"}
visited_lock = Lock()

def claim(candidate: str) -> bool:
    with visited_lock:
        if candidate in visited:
            return False
        visited.add(candidate)
        return True

print(claim("https://repo.example/tests"))
print(claim("https://repo.example/tests"))
print(sorted(visited))

Output

True
False
['https://repo.example/start', 'https://repo.example/tests']

For rate limiting, add a per-host token bucket or next-allowed timestamp before fetch. For cancellation, check a threading.Event before scheduling new work, before fetching, and after each fetch before enqueueing children. The full answer should name all three behaviors: duplicate prevention, bounded fetch concurrency, and graceful stop.
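
The rate-limit and cancellation behaviors can be sketched in the single-threaded crawler before any threads are added. This version assumes a next-allowed timestamp for the one host being crawled, with `now` and `wait` injected so tests never sleep. The function name crawl_limited and its parameters are hypothetical.

```python
from collections import deque
from threading import Event
from urllib.parse import urljoin, urlparse


def crawl_limited(start_url, get_links, now, wait, min_interval: float, cancel: Event) -> list[str]:
    host = urlparse(start_url).netloc
    visited = {start_url}
    queue = deque([start_url])
    ordered: list[str] = []
    next_allowed = now()  # earliest time the next fetch may start

    while queue and not cancel.is_set():  # check before taking new work
        url = queue.popleft()
        delay = next_allowed - now()
        if delay > 0:
            wait(delay)  # injected: real code could pass time.sleep
        if cancel.is_set():  # check before fetching
            break
        next_allowed = now() + min_interval
        links = get_links(url)
        ordered.append(url)
        if cancel.is_set():  # check after the fetch, before enqueueing children
            break
        for raw in links:
            candidate = urljoin(url, raw)
            if urlparse(candidate).netloc != host or candidate in visited:
                continue
            visited.add(candidate)
            queue.append(candidate)
    return ordered


clock = {"now": 0.0}
waits: list[float] = []

def fake_wait(seconds: float) -> None:
    waits.append(seconds)
    clock["now"] += seconds  # advancing the fake clock stands in for sleeping

pages = {"https://h.example/a": ["/b", "/c"]}
print(crawl_limited("https://h.example/a", lambda u: pages.get(u, []),
                    lambda: clock["now"], fake_wait, 1.0, Event()), waits)
```

With worker fanout, the next-allowed timestamp becomes shared state and would need the same kind of lock as the visited set.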

Prompt 2: TTL key/value store

Implement an in-memory store with set(key, value, ttl=None), get(key), delete(key), and scan(prefix).

Requirements:

ttl is measured in seconds.
Expired keys behave as missing.
Tests must not call sleep.
Follow-up: add compare-and-set.
Follow-up: add thread safety.

Clarifying questions to ask:

Should scan(prefix) return keys, values, or key/value pairs?
Is a key expired when expires_at == now, or only when expires_at < now?
Should compare-and-set treat an expired key as missing?

Solution guide

Inject now so tests can advance time directly. Store expires_at beside each value. The invariant is: every public read path either returns a non-expired value or removes the expired key.

ttl-store.py

from dataclasses import dataclass
from typing import Callable

@dataclass
class Entry:
    value: str
    expires_at: float | None

class Store:
    def __init__(self, now: Callable[[], float]) -> None:
        self.now = now
        self.items: dict[str, Entry] = {}

    def set(self, key: str, value: str, ttl: float | None = None) -> None:
        expires_at = None if ttl is None else self.now() + ttl
        self.items[key] = Entry(value, expires_at)

    def get(self, key: str) -> str | None:
        entry = self.items.get(key)
        if entry is None:
            return None
        if entry.expires_at is not None and entry.expires_at <= self.now():
            self.items.pop(key, None)
            return None
        return entry.value

    def delete(self, key: str) -> None:
        self.items.pop(key, None)

    def scan(self, prefix: str) -> list[str]:
        return sorted(key for key in list(self.items) if key.startswith(prefix) and self.get(key) is not None)

clock = {"now": 10.0}
store = Store(lambda: clock["now"])
store.set("task:1", "running", ttl=5.0)
store.set("task:2", "ready")
print(store.get("task:1"), store.scan("task:"))
clock["now"] = 15.0
print(store.get("task:1"), store.scan("task:"))

Output

running ['task:1', 'task:2']
None ['task:2']

Follow-up guide

For compare-and-set, clean any expired value and compare while holding the same lock. The compact example below omits TTL so the atomic boundary is easy to see:

compare-and-set.py

from threading import RLock

class AtomicStore:
    def __init__(self) -> None:
        self.items: dict[str, str] = {}
        self.lock = RLock()

    def compare_and_set(self, key: str, expected: str | None, value: str) -> bool:
        with self.lock:
            if self.items.get(key) != expected:
                return False
            self.items[key] = value
            return True

store = AtomicStore()
print(store.compare_and_set("task:1", None, "running"))
print(store.compare_and_set("task:1", None, "ready"))
print(store.compare_and_set("task:1", "running", "ready"))

Output

True
False
True

For the TTL store, wrap every public method with the same lock. If compare_and_set() calls get() and set() while holding that lock, use an RLock or split the internal helpers so the lock is acquired once. The important answer is atomicity: no other thread can change the key between compare and set.

Prompt 3: task scheduler with retries

Implement a scheduler for tasks with dependencies. A task becomes runnable when all dependencies have completed.

Requirements:

Detect dependency cycles before running.
Run ready tasks in priority order.
Retry failed tasks up to max_attempts.
Return completed tasks, permanently failed tasks, and blocked dependents separately.
Follow-up: add worker fanout.

Clarifying questions to ask:

Does a lower priority number run first, or does a higher number run first?
If a dependency permanently fails, should dependents be marked failed, skipped, or blocked?
Are retries immediate, delayed, or scheduled with backoff?

Solution guide

Represent the graph explicitly: dependents[task] points to tasks released by this task, and remaining[task] counts unmet dependencies. Use heapq for priority. The invariant is: a task enters the heap only when remaining[task] == 0.

dependency-scheduler.py

import heapq
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Task:
    name: str
    priority: int
    deps: tuple[str, ...] = ()

def schedule(tasks: list[Task], run, max_attempts: int = 2) -> tuple[list[str], list[str], list[str]]:
    if max_attempts <= 0:
        raise ValueError("max_attempts must be positive")
    by_name = {task.name: task for task in tasks}
    if len(by_name) != len(tasks):
        raise ValueError("duplicate task name")
    dependents: dict[str, list[str]] = defaultdict(list)
    remaining = {task.name: len(task.deps) for task in tasks}

    for task in tasks:
        for dep in task.deps:
            if dep not in by_name:
                raise ValueError(f"unknown dependency: {dep}")
            dependents[dep].append(task.name)

    state: dict[str, int] = {}

    def visit(name: str) -> None:
        marker = state.get(name, 0)
        if marker == 1:
            raise ValueError("cycle")
        if marker == 2:
            return
        state[name] = 1
        for dep in by_name[name].deps:
            visit(dep)
        state[name] = 2

    for name in by_name:
        visit(name)

    ready = [(task.priority, task.name) for task in tasks if remaining[task.name] == 0]
    heapq.heapify(ready)
    attempts = defaultdict(int)
    completed: list[str] = []
    failed: list[str] = []

    while ready:
        _, name = heapq.heappop(ready)
        attempts[name] += 1
        if not run(name):
            if attempts[name] < max_attempts:
                heapq.heappush(ready, (by_name[name].priority, name))
            else:
                failed.append(name)
            continue

        completed.append(name)
        for child in dependents[name]:
            remaining[child] -= 1
            if remaining[child] == 0:
                heapq.heappush(ready, (by_name[child].priority, child))

    blocked = sorted(set(by_name) - set(completed) - set(failed))
    return completed, failed, blocked

tasks = [
    Task("pack", priority=2),
    Task("ship", priority=3, deps=("pack",)),
    Task("label", priority=1),
]
fail_once = {"pack"}

def run_with_one_retry(name: str) -> bool:
    if name in fail_once:
        fail_once.remove(name)
        return False
    return True

print(schedule(tasks, run_with_one_retry))
print(schedule([Task("pack", 1), Task("ship", 2, ("pack",))], lambda _: False, max_attempts=1))

Output

(['label', 'pack', 'ship'], [], [])
([], ['pack'], ['ship'])

Follow-up guide

Keep permanently failed tasks separate from blocked dependents. A blocked task never ran; callers may skip it, surface the failed prerequisite, or retry after repair.

For fanout, keep graph construction and cycle detection single-threaded. Then protect only shared scheduler state: ready heap, attempts, completed, failed, and remaining dependency counts. Worker threads can run tasks outside the lock, then reacquire the lock to publish success/failure and release dependents.

If asked about retries, say whether retries preserve priority or use backoff. A concise full answer: "Ready tasks are claimed under a condition variable, execution happens outside the lock, and completion updates notify workers when new tasks become ready."
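
The quoted answer can be sketched with threading.Condition. This version assumes the graph was already validated single-threaded, retries immediately at the same priority, and treats an exception from `run` as a failed attempt. The function name run_parallel and its dict-based inputs are hypothetical simplifications of the Task dataclass above.

```python
import heapq
import threading


def run_parallel(priority: dict[str, int], deps: dict[str, list[str]], run,
                 workers: int = 2, max_attempts: int = 2) -> tuple[list[str], list[str]]:
    """Assumes no cycles and no unknown dependencies (validated beforehand)."""
    remaining = {name: len(deps.get(name, [])) for name in priority}
    dependents: dict[str, list[str]] = {name: [] for name in priority}
    for name, parents in deps.items():
        for parent in parents:
            dependents[parent].append(name)

    ready = [(priority[n], n) for n in priority if remaining[n] == 0]
    heapq.heapify(ready)
    attempts = {name: 0 for name in priority}
    completed: list[str] = []
    failed: list[str] = []
    in_flight = 0
    cond = threading.Condition()

    def worker() -> None:
        nonlocal in_flight
        while True:
            with cond:
                while not ready and in_flight > 0:
                    cond.wait()  # a running task may still release new work
                if not ready:
                    return  # nothing ready and nothing running: done
                _, name = heapq.heappop(ready)  # claim under the lock
                attempts[name] += 1
                in_flight += 1
            try:
                ok = bool(run(name))  # slow work runs outside the lock
            except Exception:
                ok = False
            with cond:
                in_flight -= 1
                if ok:
                    completed.append(name)
                    for child in dependents[name]:
                        remaining[child] -= 1
                        if remaining[child] == 0:
                            heapq.heappush(ready, (priority[child], child))
                elif attempts[name] < max_attempts:
                    heapq.heappush(ready, (priority[name], name))
                else:
                    failed.append(name)
                cond.notify_all()  # wake waiters for new work or for shutdown

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return completed, failed


done, bad = run_parallel({"pack": 2, "ship": 3, "label": 1}, {"ship": ["pack"]}, lambda name: True)
print(sorted(done), bad)
```

Blocked dependents are the names in neither list. The completion order across workers is not deterministic, but a task still appears after all of its dependencies.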

Drill 1: token bucket with deterministic time

Rate limiters show up because they combine state, boundary conditions, and production behavior. Retry policies often need jitter to avoid synchronized retry storms, so this prompt is really about overload control as much as counters [3]. (Reference: [3] Exponential Backoff And Jitter, https://aws.amazon.com/blogs/architecture/exponential-backoff-and-jitter/.) A strong implementation injects time so tests don't sleep.

token-bucket.py

import math
from dataclasses import dataclass

@dataclass
class Bucket:
    capacity: float
    refill_per_second: float
    tokens: float
    updated_at: float

class TokenBucketLimiter:
    def __init__(self, capacity: int, refill_per_second: float) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        if not math.isfinite(refill_per_second) or refill_per_second <= 0:
            raise ValueError("refill_per_second must be positive and finite")
        self.capacity = float(capacity)
        self.refill_per_second = float(refill_per_second)
        self._buckets: dict[str, Bucket] = {}

    def allow(self, key: str, now: float, cost: float = 1.0) -> tuple[bool, float]:
        if not math.isfinite(now):
            raise ValueError("now must be finite")
        if not math.isfinite(cost) or not 0 < cost <= self.capacity:
            raise ValueError("cost must be greater than zero and no larger than capacity")
        bucket = self._buckets.get(key)
        if bucket is None:
            bucket = Bucket(self.capacity, self.refill_per_second, self.capacity, now)
            self._buckets[key] = bucket
        if now < bucket.updated_at:
            raise ValueError("now must not move backwards")

        elapsed = max(0.0, now - bucket.updated_at)
        bucket.tokens = min(bucket.capacity, bucket.tokens + elapsed * bucket.refill_per_second)
        bucket.updated_at = now

        if bucket.tokens >= cost:
            bucket.tokens -= cost
            return True, 0.0

        missing = cost - bucket.tokens
        retry_after = missing / bucket.refill_per_second
        return False, retry_after

limiter = TokenBucketLimiter(capacity=3, refill_per_second=1.0)
print([limiter.allow("org-a", now=0.0)[0] for _ in range(4)])
print(limiter.allow("org-a", now=0.5))
print(limiter.allow("org-a", now=1.0))
try:
    limiter.allow("org-a", now=2.0, cost=4.0)
except ValueError as error:
    print(error)

Output

[True, True, True, False]
(False, 0.5)
(True, 0.0)
cost must be greater than zero and no larger than capacity

What to say out loud:

The key can be a user, organization, endpoint, or model.
now is injected for deterministic tests.
Impossible costs fail immediately instead of returning retry advice that can never succeed.
Cleanup for idle buckets is a production memory concern, not a correctness requirement for the base prompt.
A thread-safe version needs a lock around _buckets and bucket mutation.
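
The last point can be sketched compactly. This is a simplified limiter, not the TokenBucketLimiter above: it omits input validation and retry-after, and it clamps a backward `now` instead of raising, on the assumption that timestamps from different threads may arrive slightly out of order. The class name LockedBucket is hypothetical.

```python
import threading


class LockedBucket:
    """Hypothetical compact limiter: one lock guards the map and every bucket."""

    def __init__(self, capacity: float, refill_per_second: float) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._state: dict[str, tuple[float, float]] = {}  # key -> (tokens, updated_at)
        self._lock = threading.Lock()

    def allow(self, key: str, now: float, cost: float = 1.0) -> bool:
        with self._lock:  # read, refill, and spend form one atomic step
            tokens, updated_at = self._state.get(key, (self.capacity, now))
            elapsed = max(0.0, now - updated_at)  # assumption: clamp late timestamps
            tokens = min(self.capacity, tokens + elapsed * self.refill_per_second)
            allowed = tokens >= cost
            if allowed:
                tokens -= cost
            self._state[key] = (tokens, max(now, updated_at))
            return allowed


limiter = LockedBucket(capacity=3, refill_per_second=1.0)
results: list[bool] = []
results_lock = threading.Lock()

def hit() -> None:
    ok = limiter.allow("org-a", now=0.0)
    with results_lock:
        results.append(ok)

threads = [threading.Thread(target=hit) for _ in range(8)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
print(sum(results))  # 3: only the bucket capacity is admitted at one instant
```

A single lock is the simplest correct choice; per-key locks reduce contention but add a second invariant about how locks are created.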

Drill 2: thread-safe same-host crawler shape

A crawler prompt tests graph traversal plus concurrency. The invariant is simple: each URL is claimed once before it's fetched or enqueued. The core shown here is single-threaded so the claim point is easy to see before threads are added.

single-process-crawler-core.py

from collections import deque
from urllib.parse import urlparse, urljoin

PAGES = {
    "https://lab.example/start": ["/a", "/b", "https://other.example/x"],
    "https://lab.example/a": ["/b", "/c"],
    "https://lab.example/b": ["/c"],
    "https://lab.example/c": [],
}

def get_urls(url: str) -> list[str]:
    return PAGES.get(url, [])

def same_host_crawl(start_url: str) -> list[str]:
    start_host = urlparse(start_url).netloc
    queue = deque([start_url])
    visited = {start_url}
    ordered: list[str] = []

    while queue:
        url = queue.popleft()
        ordered.append(url)
        for raw_link in get_urls(url):
            link = urljoin(url, raw_link)
            if urlparse(link).netloc != start_host:
                continue
            if link in visited:
                continue
            # Claim the URL before enqueueing so it can never be queued twice.
            visited.add(link)
            queue.append(link)

    return ordered

print(same_host_crawl("https://lab.example/start"))

Output

['https://lab.example/start', 'https://lab.example/a', 'https://lab.example/b', 'https://lab.example/c']

Concurrency follow-up:

Protect visited with a lock.
Claim a URL while holding the lock, before scheduling a worker.
Treat output order as nondeterministic unless the prompt explicitly asks for deterministic order.
Add timeouts and failed-fetch handling without retry storms.
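One possible shape for the first two points, as a sketch under stated assumptions: the function name `concurrent_same_host_crawl`, the `fetch_and_claim` helper, and the coordinator loop built on `concurrent.futures.wait` are illustrative choices, not the only valid design. Workers claim links under the lock, and the coordinator schedules only links that were already claimed. Timeouts and failed-fetch handling are left out to keep the invariant visible.

```python
import threading
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from urllib.parse import urljoin, urlparse

PAGES = {
    "https://lab.example/start": ["/a", "/b", "https://other.example/x"],
    "https://lab.example/a": ["/b", "/c"],
    "https://lab.example/b": ["/c"],
    "https://lab.example/c": [],
}


def get_urls(url: str) -> list[str]:
    return PAGES.get(url, [])


def concurrent_same_host_crawl(start_url: str, max_workers: int = 4) -> set[str]:
    start_host = urlparse(start_url).netloc
    visited = {start_url}
    visited_lock = threading.Lock()  # protects `visited`

    def fetch_and_claim(url: str) -> list[str]:
        """Fetch one page and return only the links this worker claimed."""
        claimed: list[str] = []
        for raw_link in get_urls(url):
            link = urljoin(url, raw_link)
            if urlparse(link).netloc != start_host:
                continue
            with visited_lock:
                if link in visited:
                    continue
                visited.add(link)  # check and claim are one atomic step
            claimed.append(link)
        return claimed

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pending = {pool.submit(fetch_and_claim, start_url)}
        while pending:
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                # Every link returned here is already claimed, so it is
                # scheduled exactly once.
                for link in future.result():
                    pending.add(pool.submit(fetch_and_claim, link))
    return visited


print(sorted(concurrent_same_host_crawl("https://lab.example/start")))
```

The function returns a set because completion order depends on thread scheduling. Sorting at the call site gives stable output for tests without making order part of the crawler's contract.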

Drill 3: ledger with idempotency

Ledger prompts test whether you can keep money-like state consistent. Use append-only events when possible; if you maintain balances, update balance and event together.

ledger-idempotency.py

from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    idempotency_key: str
    account: str
    delta: int
    balance_after: int

class Ledger:
    def __init__(self) -> None:
        self.balance: dict[str, int] = {}
        self.events: list[Event] = []
        self.results: dict[str, Event] = {}

    def apply(self, key: str, account: str, delta: int) -> int:
        if key in self.results:
            # Replay: return the stored result, but reject a key that is
            # reused for a different operation.
            prior = self.results[key]
            if (prior.account, prior.delta) != (account, delta):
                raise ValueError("idempotency key reused with different operation")
            return prior.balance_after
        new_balance = self.balance.get(account, 0) + delta
        if new_balance < 0:
            raise ValueError("insufficient funds")
        self.balance[account] = new_balance
        event = Event(key, account, delta, new_balance)
        self.events.append(event)
        self.results[key] = event
        return new_balance

ledger = Ledger()
print(ledger.apply("deposit-1", "acct", 100))
print(ledger.apply("withdraw-1", "acct", -30))
print(ledger.apply("deposit-1", "acct", 100))
print(ledger.balance["acct"], len(ledger.events))
try:
    ledger.apply("deposit-1", "acct", 200)
except ValueError as error:
    print(error)

Output

100
70
100
70 2
idempotency key reused with different operation

A replay with the same key and operation returns the stored result, even if the account balance has changed since the original request: the third call above returns 100 although the balance is already 70. Reusing a key for a different operation fails loudly instead of silently returning an unrelated balance.

Transfer follow-up:

Use one idempotency key for the whole transfer.
Lock account IDs in sorted order to avoid deadlock.
Record both debit and credit events together.
Define whether external side effects happen before or after durable commit.

A reusable lock-order helper makes deadlock prevention concrete:

ordered-account-locks.py

from contextlib import ExitStack
from threading import Lock

locks = {"acct-a": Lock(), "acct-b": Lock()}

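def lock_accounts(*account_ids: str) -> ExitStack:
    # Sorted order gives every caller the same global lock order.
    with ExitStack() as stack:
        for account_id in sorted(set(account_ids)):
            stack.enter_context(locks[account_id])
        # pop_all() hands the held locks to the caller. If a lookup above
        # fails, the with block releases any locks already acquired.
        return stack.pop_all()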

with lock_accounts("acct-b", "acct-a"):
    print("locked:", sorted({"acct-b", "acct-a"}))

Output

locked: ['acct-a', 'acct-b']
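The transfer follow-up can combine the idempotency check with ordered locking. The sketch below is in-memory only and makes several assumptions for illustration: the module-level `balances`, `events`, and `transfer_results` structures, the `results_lock` that is always taken after the account locks, and the `transfer` signature are all hypothetical. It doesn't address durability or external side effects.

```python
from contextlib import ExitStack
from threading import Lock

balances = {"acct-a": 100, "acct-b": 0}
locks = {account_id: Lock() for account_id in balances}
events: list[tuple[str, str, int]] = []  # (idempotency key, account, delta)
transfer_results: dict[str, tuple[str, str, int]] = {}
results_lock = Lock()  # protects events and transfer_results


def transfer(key: str, source: str, target: str, amount: int) -> None:
    if amount <= 0 or source == target:
        raise ValueError("transfer needs a positive amount and two distinct accounts")
    request = (source, target, amount)
    with ExitStack() as stack:
        # Account locks in sorted order, then results_lock last: every caller
        # follows the same global order, so no lock cycle can form.
        for account_id in sorted({source, target}):
            stack.enter_context(locks[account_id])
        stack.enter_context(results_lock)

        if key in transfer_results:
            if transfer_results[key] != request:
                raise ValueError("idempotency key reused with different operation")
            return  # replay: the transfer was already applied
        if balances[source] < amount:
            raise ValueError("insufficient funds")

        # Debit, credit, both events, and the stored result change together
        # while all locks are held.
        balances[source] -= amount
        balances[target] += amount
        events.append((key, source, -amount))
        events.append((key, target, amount))
        transfer_results[key] = request


transfer("transfer-1", "acct-a", "acct-b", 40)
transfer("transfer-1", "acct-a", "acct-b", 40)  # replay, no second debit
print(balances, len(events))
```

The replayed call leaves the balances at 60 and 40 with two recorded events, one debit and one credit, under a single key.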

Drill 4: multiline log parser

Log parsers test string handling and explicit malformed-input policy. Group indented continuation lines under the previous valid event. Ignore orphan continuations and malformed records in this base version; a production parser should count or collect rejects.

multiline-log-parser.py

import json

def parse_events(text: str) -> list[dict[str, str]]:
    events: list[dict[str, str]] = []
    current: dict[str, str] | None = None

    for line in text.splitlines():
        if line.startswith(" "):
            if current is not None:
                current["message"] += f" {line.strip()}"
            continue

        parts = line.split("|", maxsplit=2)
        if len(parts) != 3:
            current = None
            continue

        timestamp, level, message = parts
        current = {"timestamp": timestamp, "level": level, "message": message}
        events.append(current)

    return events

LOGS = """  orphan continuation
2026-06-02T12:00:00Z|ERROR|request failed
  timeout while calling model
malformed line
2026-06-02T12:01:00Z|INFO|retry queued"""

print(json.dumps(parse_events(LOGS), indent=2))

Output

[
  {
    "timestamp": "2026-06-02T12:00:00Z",
    "level": "ERROR",
    "message": "request failed timeout while calling model"
  },
  {
    "timestamp": "2026-06-02T12:01:00Z",
    "level": "INFO",
    "message": "retry queued"
  }
]
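If the interviewer asks for the production policy, one way to collect rejects is sketched below. The `ParseResult` dataclass and the `parse_events_with_rejects` name are assumptions for illustration. The grouping rules are the same as in the base version; the only change is that dropped lines are recorded with their line numbers.

```python
from dataclasses import dataclass, field


@dataclass
class ParseResult:
    events: list[dict[str, str]] = field(default_factory=list)
    rejects: list[tuple[int, str]] = field(default_factory=list)  # (line number, raw line)


def parse_events_with_rejects(text: str) -> ParseResult:
    result = ParseResult()
    current: dict[str, str] | None = None

    for line_number, line in enumerate(text.splitlines(), start=1):
        if line.startswith(" "):
            if current is not None:
                current["message"] += f" {line.strip()}"
            else:
                # Orphan continuation: there is no valid event to attach it to.
                result.rejects.append((line_number, line))
            continue

        parts = line.split("|", maxsplit=2)
        if len(parts) != 3:
            current = None
            result.rejects.append((line_number, line))
            continue

        timestamp, level, message = parts
        current = {"timestamp": timestamp, "level": level, "message": message}
        result.events.append(current)

    return result


LOGS = """  orphan continuation
2026-06-02T12:00:00Z|ERROR|request failed
  timeout while calling model
malformed line
2026-06-02T12:01:00Z|INFO|retry queued"""

parsed = parse_events_with_rejects(LOGS)
print(len(parsed.events), parsed.rejects)
```

Keeping the line number with each reject lets a caller report where the input went wrong without re-reading the file.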

Concurrency answer template

When asked to make a solution concurrent, say this before writing code:

Shared state is X.
The lock protects X.
A work item is claimed at this point.
Worker shutdown happens through this sentinel, event, or executor lifecycle.
Failed work records an error and doesn't corrupt shared state.

This sounds mechanical because it should. Concurrency interview failures usually come from vague ownership.
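The template maps onto code fairly directly. The sketch below is one hedged illustration using a sentinel for shutdown; `run_workers`, `handle`, and the squaring workload are hypothetical stand-ins for whatever the prompt actually asks for.

```python
import queue
import threading


def run_workers(items: list[int], worker_count: int = 3) -> tuple[dict[int, int], dict[int, str]]:
    tasks: queue.Queue = queue.Queue()
    results: dict[int, int] = {}   # shared state
    errors: dict[int, str] = {}    # shared state
    state_lock = threading.Lock()  # the lock protects results and errors
    stop = object()                # sentinel: one per worker signals shutdown

    def handle(item: int) -> int:
        if item < 0:
            raise ValueError("negative input")
        return item * item

    def worker() -> None:
        while True:
            item = tasks.get()  # a work item is claimed here, by exactly one worker
            if item is stop:
                return
            try:
                value = handle(item)
            except Exception as error:
                # Failed work records an error and leaves results untouched.
                with state_lock:
                    errors[item] = str(error)
            else:
                with state_lock:
                    results[item] = value

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for item in items:
        tasks.put(item)
    for _ in threads:
        tasks.put(stop)
    for thread in threads:
        thread.join()
    return results, errors


results, errors = run_workers([1, 2, -3, 4])
print(sorted(results.items()), errors)
```

Each comment corresponds to one line of the template, which makes the ownership story easy to narrate while typing.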

Common pitfalls

Solving version 1, then adding TTL, transactions, or threads without restating the invariant.
Using wall-clock sleeps in tests instead of injected time.
Making crawler output order part of correctness after adding concurrent workers.
Treating idempotency as "retry the operation" instead of recording the request key and result semantics.
Adding a global lock around everything without explaining throughput, deadlock, and fairness tradeoffs.

Mastery checklist

Implement a rate limiter with deterministic time and boundary tests.
Implement a same-host crawler and explain where to place the visited lock.
Implement a ledger with idempotency and insufficient-funds behavior.
Explain how to add TTLs to a key/value store without sleeping in tests.
Build a dependency scheduler with cycle detection and a ready queue.
Parse multiline logs with malformed-line handling.
State complexity and memory growth for each solution.
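For the dependency-scheduler item, a minimal sketch using Kahn's algorithm is shown below. The `schedule` function and its input shape (task name mapped to a list of prerequisites) are assumptions for illustration, and sorting the ready tasks is an extra choice made only to keep the output deterministic.

```python
from collections import deque


def schedule(dependencies: dict[str, list[str]]) -> list[str]:
    """Return a run order in which every task follows its prerequisites."""
    tasks = set(dependencies)
    for prerequisites in dependencies.values():
        tasks.update(prerequisites)

    remaining = {task: 0 for task in tasks}  # unmet prerequisite count
    dependents: dict[str, list[str]] = {task: [] for task in tasks}
    for task, prerequisites in dependencies.items():
        for prerequisite in set(prerequisites):
            remaining[task] += 1
            dependents[prerequisite].append(task)

    # The ready queue holds tasks whose prerequisites are all complete.
    ready = deque(sorted(task for task in tasks if remaining[task] == 0))
    order: list[str] = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for dependent in sorted(dependents[task]):
            remaining[dependent] -= 1
            if remaining[dependent] == 0:
                ready.append(dependent)

    # Tasks that never became ready are on a cycle or depend on one.
    if len(order) != len(tasks):
        stuck = sorted(tasks - set(order))
        raise ValueError(f"dependency cycle among: {stuck}")
    return order


print(schedule({"deploy": ["test"], "test": ["build"], "build": [], "lint": []}))
```

Time and memory are both linear in the number of tasks plus dependency edges, apart from the sorting added for stable output.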

Next Step

Continue to AI Lab System Design Interview

You'll turn the same building blocks into end-to-end AI/backend designs with scale, overload behavior, permissions, rollout gates, and observability.


References

[1] Interview guide. OpenAI, 2026. https://openai.com/interview-guide/
[2] Careers. Frontier AI lab, 2026. https://www.anthropic.com/careers
[3] Brooker, M. (AWS). Exponential Backoff And Jitter. 2015.
