LeetLLM
LearnPracticeFeaturesBlog
LeetLLM

Your go-to resource for mastering AI & LLM systems.

Product

  • Learn
  • Practice
  • Features
  • Blog

Legal

  • Terms of Service
  • Privacy Policy

© 2026 LeetLLM. All rights reserved.

All Topics
Your Progress
0%

0 of 155 articles completed

🛠️Computing Foundations0/6
NumPy and Tensor ShapesCUDA for ML TrainingMPS & Metal for ML on MacData Structures for AISQL and Data ModelingAlgorithms for ML Engineers
📊Math & Statistics0/8
Gradients and BackpropVectors, Matrices & TensorsLinear Algebra for MLAdam, Momentum, SchedulersProbability for Machine LearningStatistics and UncertaintyDistributions and SamplingHypothesis Tests, Intervals, and pass@k
📚Preparation & Prerequisites0/13
Neural Networks from ScratchCNNs from ScratchTraining & BackpropagationSoftmax, Cross-Entropy & OptimizationRNNs, LSTMs, GRUs, and Sequence ModelingAutoencoders and VAEsThe Transformer Architecture End-to-EndLanguage Modeling & Next TokensFrom GPT to Modern LLMsPrompt Engineering FundamentalsCalling LLM APIs in ProductionFirst AI App End-to-EndThe LLM Lifecycle
🧮ML Algorithms & Evaluation0/11
Linear Regression from ScratchLogistic Regression and MetricsDecision Trees, Forests, and BoostingReinforcement Learning BasicsValidation and LeakageClustering and PCACore Retrieval AlgorithmsDecoding AlgorithmsExperiment Design and A/B TestingPyTorch Training LoopsDataset Pipelines and Data Quality
📦Production ML Systems0/6
Feature Engineering for Production MLBatch and Streaming Feature PipelinesGradient Boosted Trees in ProductionRanking and Recommendation SystemsForecasting and Anomaly DetectionMonitoring Predictive Models
🧪Core LLM Foundations0/8
The Bitter Lesson & ComputeBPE, WordPiece, and SentencePieceStatic to Contextual EmbeddingsPerplexity & Model EvaluationFile Ingestion for AIChunking StrategiesLLM Benchmarks & LimitationsInstruction Tuning & Chat Templates
🧰Applied LLM Engineering0/23
Dimensionality Reduction for EmbeddingsCoT, ToT & Self-Consistency PromptingFunction Calling & Tool UseMCP & Tool Protocol StandardsPrompt Injection DefenseResponsible AI GovernanceData Labeling and Human FeedbackEvaluating AI AgentsProduction RAG PipelinesHybrid Search: Dense + SparseReranking and Cross-Encoders for RAGRAG Evaluation for Reliable AnswersLLM-as-a-Judge EvaluationBias & Fairness in LLMsHallucination Detection & MitigationLLM Observability & MonitoringExperiment Tracking with MLflow and W&BMixed Precision TrainingModel Versioning & DeploymentSemantic Caching & Cost OptimizationLLM Cost Engineering & Token EconomicsModel Gateways, Routing, and FallbacksDesign an Automated Support Agent
🎓Portfolio Capstones0/9
Capstone: Delivery ETA PredictionCapstone: Product RankingCapstone: Demand ForecastingCapstone: Image Damage ClassifierCapstone: Production ML PipelineCapstone: Document QACapstone: Eval DashboardCapstone: Fine-Tuned ClassifierCapstone: Production Agent
🧠Transformer Deep Dives0/8
Sentence Embeddings & Contrastive LossEmbedding Similarity & QuantizationScaled Dot-Product AttentionVision Transformers and Image EncodersPositional Encoding: RoPE & ALiBiLayer Normalization: Pre-LN vs Post-LNMechanistic InterpretabilityDecoding Strategies: Greedy to Nucleus
🧬Advanced Training & Adaptation0/16
Scaling Laws & Compute-Optimal TrainingPre-training Data at ScaleBuild GPT from Scratch LabContinued Pretraining for Domain ShiftSynthetic Data PipelinesSupervised Fine-Tuning PipelineDistributed Training: FSDP & ZeROLoRA & Parameter-Efficient TuningReward Modeling from Preference DataRLHF & DPO AlignmentConstitutional AI & Red TeamingRLVR & Verifiable RewardsKnowledge Distillation for LLMsModel Merging and Weight InterpolationPrompt Optimization with DSPyRecursive Language Models (RLM)
🤖Advanced Agents & Retrieval0/14
Vector DB Internals: HNSW & IVFAdvanced RAG: HyDE & Self-RAGGraphRAG & Knowledge GraphsRAG Security & Access ControlStructured Output GenerationReAct & Plan-and-ExecuteGuardrails & Safety FiltersCode Generation & SandboxingComputer-Use / GUI / Browser AgentsHuman-in-the-Loop Agent ArchitectureAI Coding Workflow with AgentsAgent Memory & PersistenceAgent Failure & RecoveryMulti-Agent Orchestration
⚡Inference & Production Scale0/20
Inference: TTFT, TPS & KV CacheMulti-Query & Grouped-Query AttentionKV Cache & PagedAttentionPrefix Caching and Prompt CachingFlashAttention & Memory EfficiencyContinuous Batching & SchedulingScaling LLM InferenceModel Parallelism for LLM InferenceModel Quantization: GPTQ, AWQ & GGUFLocal LLM DeploymentSLM Specialization & Edge DeploymentSpeculative DecodingLong Context Window ManagementContext EngineeringMixture of Experts ArchitectureMamba & State Space ModelsReasoning & Test-Time ComputeAdvanced MLOps & DevOps for AIGPU Serving & AutoscalingA/B Testing for LLMs
🏗️System Design Capstones0/9
Content Moderation SystemCode Completion SystemMulti-Tenant LLM PlatformLLM-Powered Search EngineVision-Language Models & CLIPMultimodal LLM ArchitectureDiffusion Models & Image GenerationReal-Time Voice AI AgentReasoning & Test-Time Compute
🎤AI Lab Interviewing0/4
AI Lab Coding Interview: Python SystemsAI Lab System Design InterviewAI Lab Behavioral InterviewAI Lab Technical Presentation
Back to Topics
LearnComputing FoundationsData Structures for AI
⚙️EasyMLOps & Deployment

Data Structures for AI

A beginner-first data-structures chapter that starts with a list scan, then teaches inverted indexes, heaps, queues, and caches through one support-search story.

15 min read
Learning path
Step 4 of 155 in the full curriculum
MPS & Metal for ML on MacSQL and Data Modeling

Data structures decide what an AI system can find, update, rank, and reuse quickly. If the structure matches the operation, a support bot can jump to the right help article, a background worker can process jobs in order, and a retrieval cache can reuse ranked document IDs without serving stale results.

Imagine a customer opens a support chat and types:

refund email

Behind the scenes, the company has thousands of help articles. The system can't read each article from start to finish under a tight latency budget. It needs a way to jump straight to the handful of articles that mention those words.

That jump is the job of a data structure. The NumPy lesson taught you that arrays and tensors are good when position is the whole story: if you know row 2, column 1, you can land on the value instantly. The CUDA and MPS chapters then showed where those tensors execute. This chapter answers the next systems question: what if your lookup doesn't start with a position? What if it starts with a word like refund, a repeated query like refund email, or a stream of jobs waiting to be processed?

Production AI products don't use one data structure for every job. A lexical retriever can index terms, an ingestion worker can queue jobs, and a retrieval cache can reuse ranked IDs only while their source documents remain current. Each structure exists because a particular operation repeats.

Picking a structure by the operation it must serve

Keep one question in mind: what operation must this structure serve?

  • Do I need ordered items?
  • Do I need exact lookup by key?
  • Do I need uniqueness?
  • Do I need the best few results?
  • Do I need first-in, first-out (FIFO) work?
  • Do I need to reuse previous results safely?

That is the real skill behind data structures. You are not memorizing names. You are matching an operation to a shape.

Support-search overview showing documents flowing through an inverted index into top-k ranking, with FIFO queue and versioned cache state beside the query path. Support-search overview showing documents flowing through an inverted index into top-k ranking, with FIFO queue and versioned cache state beside the query path.
The support-search path narrows work before ranking: documents become posting lists, posting lists produce a small candidate set, and top-k keeps the best few IDs.

The previous NumPy lesson focused on position-based access. Here the query starts with a term, a top-k request, or a work queue, so the shape has to change.

One running example

We will use one tiny support-search story from start to finish. The documents are small enough to trace by hand, but the ideas scale to the retrieval layer inside a retrieval-augmented generation (RAG) system, where a large language model (LLM) answers with retrieved evidence.

doc idtext
0refund policy details
1account password reset
2refund status email

User query:

refund email

We will use this story five times:

  1. A list scan to show the smallest correct solution.
  2. A dictionary of sets to make term lookup fast.
  3. A heap to keep the top results.
  4. A queue to show first-in, first-out work.
  5. A cache to show safe reuse.

The list scan: the smallest working version

A list is the right starting point because the data is tiny and the order of documents is stable.

If you scan each document by hand, the match counts look like this:

doc idcontains refundcontains emailtotal matches
0101
1000
2112

Document 2 should rank above document 0.

This matters for beginners because a list scan is not "wrong." It is the smallest working version. The mistake is staying with a full scan after the question has changed.

When the corpus grows from 3 documents to 3 million, a query like refund email shouldn't force the system to reread each document. A full scan touches every document, so its cost grows in proportion to the corpus size. A dictionary lookup, by contrast, jumps to the value in roughly constant time on average regardless of how big the dictionary gets, because it hashes the key directly to its slot.[1] That doesn't make the whole search constant-time: the retriever still reads the posting lists for the requested terms and ranks the resulting candidates. The next chapter formalizes this gap with Big-O notation; for now the intuition is enough: the data structure must let the query enter the system in its natural form, by term, not by document position.

Building an inverted index

The query starts with a term such as refund. That means the system should jump from a term to the matching documents.

This is what an inverted index does.

An inverted index maps each term to its posting list. A posting records a matching document ID; larger indexes can store additional information such as term positions or term frequency. In this first version, a Python set[int] is enough because our score asks only whether each query term appears in each document.[2]

Build the index by hand

Split each document into terms and store the matching document IDs:

termdocument IDs
refund{0, 2}
policy{0}
details{0}
account{1}
password{1}
reset{1}
status{2}
email{2}

Now the query can enter through the index:

query termposting list
refund{0, 2}
email{2}

Turn those posting lists into match counts:

doc idmatches from refundmatches from emailfinal score
0101
2112

Document 2 wins because it matches both terms. Document 0 still matters because it matches one term.

Why a set lives inside the dictionary

The dictionary solves exact term lookup. The set solves uniqueness.

Suppose document 2 were refund refund email. The posting list for refund should still contain document 2 once, not twice. That is why the value is a set of document IDs instead of a list of document IDs.

This is a good beginner habit:

  • Dictionary asks: "given this key, where is the value?"
  • Set asks: "have I already seen this item?"

Production lexical indexes commonly keep ordered, compressed posting lists and richer statistics. The Python set is a teaching choice: it makes membership and duplicate prevention visible before you optimize storage or intersections.[2]

Here is the smallest proof. The third support article says refund twice, but membership in the refund posting set still contributes one document ID.

posting-list-membership.py
1from collections import defaultdict 2 3docs_with_repeat = [ 4 "refund policy details", 5 "account password reset", 6 "refund refund email", 7] 8postings: dict[str, set[int]] = defaultdict(set) 9 10for doc_id, text in enumerate(docs_with_repeat): 11 for term in text.split(): 12 postings[term].add(doc_id) 13 14print("refund postings", sorted(postings["refund"])) 15print("document 2 counted once", len(postings["refund"]) == 2)
Output
1refund postings [0, 2] 2document 2 counted once True

Ranking with a heap

Once documents have scores, the system needs the best few results. It doesn't need a full sorted report about the corpus.

That is the moment for a heap.

For our tiny example, scores are:

doc idscore
22
01

A heap keeps the top k results easy to retrieve. In retrieval systems, that matters because you often want the best 5, 10, or 20 candidates, rather than a fully sorted candidate list.

For a stream of scored candidates, keep a min-heap capped at k items. The weakest retained score stays at the root. When a stronger candidate arrives, replace that weakest item instead of sorting every candidate seen so far.

Real rankers also need an explicit tie-break rule. The retriever lab below prefers the lower document ID when scores tie so its output stays deterministic.

bounded-top-k-heap.py
1import heapq 2 3candidate_scores = [("doc-0", 1), ("doc-2", 2), ("doc-3", 3)] 4k = 2 5top_k: list[tuple[int, str]] = [] 6 7for doc_id, score in candidate_scores: 8 candidate = (score, doc_id) 9 if len(top_k) < k: 10 heapq.heappush(top_k, candidate) 11 elif candidate > top_k[0]: 12 evicted = heapq.heapreplace(top_k, candidate) 13 print("evicted", evicted) 14 15print("top k", sorted(top_k, reverse=True))
Output
1evicted (1, 'doc-0') 2top k [(3, 'doc-3'), (2, 'doc-2')]

The retriever pipeline

Here is how the pieces fit together in a single flow:

Lexical retrieval path showing query terms becoming posting lists, document match counts, and a top-k heap that returns the best support articles. Lexical retrieval path showing query terms becoming posting lists, document match counts, and a top-k heap that returns the best support articles.
The query enters as terms, the index returns posting lists, scores accumulate per document, and the heap keeps only the best candidates.

The query enters as plain text. Terms split apart. The index turns each term into a posting list. Scores accumulate per document. The heap returns the best k document IDs. This same pattern sits inside RAG systems, where a language model reads only the best supporting chunks instead of the whole corpus.

Code lab: a tiny retriever

Start with the slow version so you can see what the index is replacing. Then add the index and heap.

Put this in data_structures_demo.py. This first tokenizer lowercases and splits on whitespace. Production tokenizers also handle punctuation and language-specific rules, but keeping normalization tiny makes the structure visible. Returning a set means this example scores binary term membership: each distinct query term contributes at most once.

data_structures_demo.py
1from collections import defaultdict 2import heapq 3 4docs = [ 5 "refund policy details", 6 "account password reset", 7 "refund status email", 8] 9 10def tokenize(text: str) -> set[str]: 11 return set(text.lower().split()) 12 13def slow_search(query: str) -> list[tuple[int, int]]: 14 terms = tokenize(query) 15 matches: list[tuple[int, int]] = [] 16 for doc_id, text in enumerate(docs): 17 words = tokenize(text) 18 score = sum(term in words for term in terms) 19 if score: 20 matches.append((doc_id, score)) 21 return matches 22 23index: dict[str, set[int]] = defaultdict(set) 24for doc_id, text in enumerate(docs): 25 for term in tokenize(text): 26 index[term].add(doc_id) 27 28def search(query: str, k: int = 2) -> list[int]: 29 if not query.strip(): 30 return [] 31 32 scores: dict[int, int] = defaultdict(int) 33 for term in tokenize(query): 34 for doc_id in index.get(term, set()): 35 scores[doc_id] += 1 36 37 best = heapq.nlargest( 38 k, 39 scores.items(), 40 key=lambda item: (item[1], -item[0]), 41 ) 42 return [doc_id for doc_id, _score in best] 43 44print("slow scan", slow_search("refund email")) 45print("posting list", sorted(index["refund"])) 46print("search", search("refund email"))
Output
1slow scan [(0, 1), (2, 2)] 2posting list [0, 2] 3search [2, 0]

Read the data movement

JobStructureWhy it fits
keep the original corpus orderlistthe documents start as an ordered sequence
jump from term to matching docsdictionary of setsthe query starts with a term
count how many terms matcheddictionary of integerseach document needs a score
keep the best few resultsheapthe system wants top-k rather than a full sorted list

The heap key uses score first and negative document ID second. Higher scores win; lower IDs win ties. That small rule makes repeated runs predictable.

That table is more important than the syntax.

What happens beyond retrieval: queues and caches

Retrieval is one place where data structures show up in AI systems.

Queues for background work

Imagine new documents arrive and need to be split into chunks, then encoded as embedding vectors before they can enter a semantic index. For one class of equal-priority jobs, arrival order is a clean baseline. Production queues may add priorities, retries, and dead-letter handling, but they still need an explicit ordering contract.

That is a queue.

queues-for-background-work.py
1from collections import deque 2 3embedding_jobs = deque(["doc-3", "doc-4", "doc-5"]) 4next_job = embedding_jobs.popleft() 5 6print("next job", next_job) 7print("remaining", list(embedding_jobs))
Output
1next job doc-3 2remaining ['doc-4', 'doc-5']

The rule is simple:

  • append new work on one side
  • pop the oldest work from the other side

That is first-in, first-out behavior. If you accidentally process newest work first, users may see random delays or starvation in old jobs.

This failure case makes the ordering bug visible. New jobs belong at the tail; putting them at the front lets recent uploads jump ahead of old work.

queue-order-failure.py
1from collections import deque 2 3fifo_jobs = deque(["oldest-ticket"]) 4fifo_jobs.extend(["new-ticket-a", "new-ticket-b"]) 5 6newest_first_jobs = deque(["oldest-ticket"]) 7newest_first_jobs.appendleft("new-ticket-a") 8newest_first_jobs.appendleft("new-ticket-b") 9 10print("FIFO runs", fifo_jobs.popleft()) 11print("wrong order runs", newest_first_jobs.popleft())
Output
1FIFO runs oldest-ticket 2wrong order runs new-ticket-b

Caches for repeated queries

If users ask the same question repeatedly, recomputing retrieval results per request wastes work.

A retrieval cache stores prior ranked document IDs so the system can reuse them.

caches-for-repeated-queries.py
1cache: dict[tuple[int, str], list[int]] = {} 2corpus_version = 1 3query = "refund email" 4 5cache_key = (corpus_version, query) 6if cache_key not in cache: 7 cache[cache_key] = search(query) 8 9print("cached", cache[cache_key])
Output
1cached [2, 0]

Why the key includes version

cache key partwhy it exists
queryseparates one repeated request from another
corpus_versionforces refresh after the document collection changes

If document 2 stops mentioning refund, the cached ranking [2, 0] becomes fast and wrong. A version number, timestamp, or explicit invalidation rule tells the system when old results must be dropped.

Run the failure before relying on the fix. A text-only key finds yesterday's ranking after an update; a versioned key misses and forces recomputation.

cache-invalidation-failure.py
1query = "refund email" 2old_version = 1 3new_version = 2 4 5text_only_cache = {query: [2, 0]} 6versioned_cache = {(old_version, query): [2, 0]} 7 8print("text-only stale hit", query in text_only_cache) 9print("versioned stale hit", (new_version, query) in versioned_cache)
Output
1text-only stale hit True 2versioned stale hit False

Common mistakes

Beginners often don't choose the wrong structure because they lack vocabulary. They choose it because the first working code feels good enough.

SymptomLikely causeFix
query gets slower as documents growthe code still scans the whole list per querybuild an index keyed by the query field
same document appears more than oncea list was used where uniqueness mattersstore posting lists as sets
old jobs starvework is being popped in the wrong orderuse a FIFO queue and test the order explicitly
cached result looks right until documents changecache has no version or invalidation ruletie the cache key to corpus updates
ranking feels mysteriousscores are hidden inside loopsprint one query's per-document score table

The fastest way to debug is still the same: trace one tiny example by hand.

Protect the promises with tests

These tests protect the structures, rather than the final answer alone:

protect-the-promises-with-tests.py
1from collections import deque 2 3def test_refund_lookup_has_two_docs(): 4 assert index["refund"] == {0, 2} 5 6def test_empty_query_returns_empty_list(): 7 assert search("") == [] 8 9def test_refund_email_ranks_doc_2_first(): 10 assert search("refund email")[0] == 2 11 12def test_search_normalizes_case(): 13 assert search("Refund EMAIL") == [2, 0] 14 15def test_queue_is_fifo(): 16 jobs = deque(["doc-3", "doc-4"]) 17 assert jobs.popleft() == "doc-3" 18 19test_refund_lookup_has_two_docs() 20test_empty_query_returns_empty_list() 21test_refund_email_ranks_doc_2_first() 22test_search_normalizes_case() 23test_queue_is_fifo() 24print("five structural tests passed")
Output
1five structural tests passed

Each test checks one promise:

  • the index maps terms to the right documents
  • empty input is predictable
  • ranking follows visible score logic
  • query normalization is consistent
  • queue order stays first-in, first-out

Systems drill: simulate a stateful fulfillment lane

Production AI infrastructure is full of small systems with state, events, and invariants. Request queues, cache entries, retry state, tool side effects, and rollout gates all need explicit state transitions. If you can build one of these cleanly at small scale, you can transfer the pattern to larger LLM systems.

Here is a tiny fulfillment simulator. It processes events in order and refuses transitions that would break the system's promises.

systems-drill-simulate-a-stateful.py
1from collections import defaultdict 2 3inventory = {"SKU-7": 3} 4reserved: dict[str, int] = defaultdict(int) 5shipped: dict[str, int] = defaultdict(int) 6errors: list[str] = [] 7 8events = [ 9 ("reserve", "order-1", "SKU-7", 2), 10 ("ship", "order-1", "SKU-7", 1), 11 ("ship", "order-1", "SKU-7", 2), 12 ("reserve", "order-2", "SKU-7", 1), 13] 14 15def apply_event(kind: str, order_id: str, sku: str, qty: int) -> None: 16 if kind == "reserve": 17 if inventory[sku] < qty: 18 errors.append(f"{order_id}: not enough inventory") 19 return 20 inventory[sku] -= qty 21 reserved[order_id] += qty 22 return 23 24 if kind == "ship": 25 if reserved[order_id] - shipped[order_id] < qty: 26 errors.append(f"{order_id}: cannot ship more than reserved") 27 return 28 shipped[order_id] += qty 29 return 30 31 errors.append(f"{order_id}: unknown event {kind}") 32 33for event in events: 34 apply_event(*event) 35 36print("inventory", dict(inventory)) 37print("reserved", dict(reserved)) 38print("shipped", dict(shipped)) 39print("errors", errors)
Output
1inventory {'SKU-7': 0} 2reserved {'order-1': 2, 'order-2': 1} 3shipped {'order-1': 1} 4errors ['order-1: cannot ship more than reserved']

Invariant

text
1shipped[order] <= reserved[order]

If you can name the invariant, write the state transition, and produce a failing case that proves the guard works, you can handle many "build a moderately complex system quickly" prompts. The same pattern applies to token schedulers, retry queues, tool-call ledgers, and evaluation runners.

Retries need a set too

Queue workers retry failed events. If a successful shipment event is delivered twice, applying it twice silently corrupts state. A completed-event ID set turns an accidental replay into a visible skip. Record the ID only after the side effect succeeds; a failed attempt must remain retryable.

deduplicate-retried-events.py
1completed_event_ids: set[str] = set() 2shipped_units = 0 3delivery_attempts = [("ship-91", 1), ("ship-91", 1), ("ship-92", 2)] 4 5for event_id, quantity in delivery_attempts: 6 if event_id in completed_event_ids: 7 print("skipped retry", event_id) 8 continue 9 shipped_units += quantity 10 completed_event_ids.add(event_id) 11 12print("shipped units", shipped_units)
Output
1skipped retry ship-91 2shipped units 3
A set is not a transaction

This in-memory set makes sequential replay visible, but it doesn't make an external shipment atomic. If a worker ships the package and crashes before adding the event ID, a retry can ship it again. Production code needs a durable idempotency key enforced together with the side effect, or a downstream write API that enforces the same contract.

That crash window is one reason the next SQL chapter matters. Durable rows, uniqueness constraints, and transactions let a system protect state across process failure.

Practice

Try these before you look at the solution sketches.

  1. Add a fourth document: email support refund.
  2. Predict sorted(index["refund"]).
  3. Predict search("refund email").
  4. Run search("invoice") and explain why it shouldn't crash.
  5. Change document 2 so it no longer contains refund, then explain what must happen to the cache.

Solution sketches

Use these to check your reasoning, not to skip the exercise.

Practice itemWhat a good answer should show
Add email support refundrefund and email both gain the new document ID.
Predict sorted(index["refund"])[0, 2, 3] after the new document is added.
Predict search("refund email")[2, 3]. Documents 2 and 3 tie with score 2, so the lower document ID wins the deterministic tie-break.
Query invoiceindex.get(term, set()) returns an empty set, so the function returns [] instead of crashing.
Change document 2Rebuild the affected posting lists and drop cache entries tied to the old corpus version.

If your answer feels fuzzy, go back to the score table. Good data-structure reasoning should look inspectable on paper before it looks clever in code.

What to remember

Most beginner confusion disappears once you ask one concrete question:

What operation must stay fast and predictable?

Use that question to pick the structure:

If the job is...Reach for...
keep things in their original orderlist
jump from key to valuedictionary
guarantee uniqueness or fast membershipset
keep the best few resultsheap
process work in arrival orderqueue
reuse prior results safelycache

This chapter intentionally stayed small. It doesn't try to cover all classic structures in one pass. That is a feature, not a missing piece. A beginner needs a handful of structures they can trace end to end before moving to bigger systems.

Later retrieval systems may combine exact term postings with approximate nearest-neighbor (ANN) indexes over embedding vectors. Hierarchical navigable small world (HNSW) is a layered graph index for ANN search; the Faiss work describes optimized GPU similarity search for brute-force, approximate, and compressed-domain setups. The core lesson doesn't change: store data in a structure that matches the lookup you need.[3][4]

Three later patterns reuse the same ideas at larger scale:

  • Beam search: a decoder needs the best few partial generations from a huge candidate set. A heap or top-k selection routine gives you candidates without sorting the full vocabulary at each step.
  • Vector indexes: HNSW stores embeddings in a multi-layer graph so search can move from long-range links toward local neighbors. Its authors report logarithmic complexity scaling for their design, but ANN search still trades recall, latency, build time, and memory against one another.[3]
  • The key-value (KV) cache: during text generation, an LLM stores key and value vectors already computed for earlier tokens, then appends new entries as it decodes. It avoids recalculating those earlier key/value vectors, but the new query still attends over stored prefix state. The cache grows with sequence length and concurrent requests, which can limit how many requests a serving system can batch efficiently.[5][6]

Handoff to the next chapter

Data structures decide which candidates enter the model's field of view.

That is why this chapter sits before databases. First you need to know which structure makes a lookup fast. Then you are ready to put those structures behind durable tables, joins, transactions, and vector search.

Mastery check

Key concepts

  • lists
  • dictionaries
  • queues
  • heaps
  • indexes
  • sets
  • caches

Evaluation rubric

  • Foundational: Explains what kind of operation each structure serves: ordered scan, exact lookup, uniqueness, top-k selection, FIFO work, or safe reuse
  • Developing: Traces the support-search example by hand and predicts document scores before running code
  • Intermediate: Builds the inverted index and shows why dictionary plus set is better than rescanning the full document list
  • Applied: Uses a heap for top-k retrieval, a queue for background work, and a versioned cache for repeated queries
  • Advanced: Writes tests for empty input, duplicate prevention, FIFO behavior, ranking logic, and cache invalidation after corpus changes, then explains why a durable write still needs idempotency

Follow-up questions

Common pitfalls

  • A query slows down as the corpus grows. Cause: every request still scans the full list. Fix: build an index keyed by the field the query uses.
  • Ranking counts feel too high or inconsistent. Cause: duplicates leaked into posting lists. Fix: store matching document IDs in sets when membership, not repetition, is the real state.
  • Old embedding jobs never seem to finish. Cause: queue order was reversed by pushing or popping from the wrong side. Fix: test FIFO explicitly with a tiny trace.
  • A cached ranking looks correct until the documents change. Cause: the cache key doesn't include version or invalidation state. Fix: tie cache entries to corpus updates.
  • Heap output feels magical during debugging. Cause: scores stayed hidden inside loops. Fix: print one score table and compare heap output to the visible counts.
  • A replay guard works in one process but duplicate writes still appear after crashes. Cause: completion marker and side effect are not one durable operation. Fix: enforce idempotency at the write boundary.
Complete the lesson

Mastery Check

Answer every question, then check your score. Score above 75% to mark this lesson complete.

1.A support-search system stores millions of articles. Queries begin with terms such as refund, and the retriever must jump from each term to the unique document IDs containing it instead of scanning every article. Which data structure matches this lookup operation?
2.An index was built from document 5, refund refund email, and document 6, refund status. The retriever uses binary term membership: each distinct query term can add at most 1 point per document. For query refund email, which score table is correct if posting lists store document IDs as sets?
3.An indexed retriever has documents 0: refund policy details, 1: account password reset, 2: refund status email, and 3: email support refund. It scores one point for each distinct query term in a document, returns k=2 IDs, and breaks score ties by lower document ID. For query refund email, which result is correct?
4.You are keeping the top 3 scored chunks with a capped min-heap. It currently retains A at 0.52, B at 0.68, and C at 0.91, so A is the weakest retained item. A new candidate D arrives with score 0.74. What should happen?
5.An ingestion deque starts from left to right as doc-10, doc-11. A developer inserts doc-12 with appendleft() and takes the next job with popleft(). Which outcome follows?
6.A retriever caches the ranked IDs for query reset password as [4, 1] while corpus_version is 7. Later document 4 is edited and the corpus becomes version 8. Which cache design avoids serving the old ranking as a valid hit?
7.A fulfillment lane tracks reserved[order] and shipped[order] with the invariant shipped[order] <= reserved[order]. For order-1, reserved is 2 and shipped is 1. A ship event for quantity 2 arrives. What should the transition do?
8.A queue worker keeps an in-memory completed_event_ids set. It ships package event ship-91, then crashes before adding ship-91 to the set. The broker retries ship-91. Why can the retry duplicate the shipment, and what fixes this at the production boundary?

8 questions remaining.

Next Step
Continue to SQL, Databases, and Data Modeling for AI

You can now choose in-memory structures for term lookup, top-k ranking, ordered work, and safe reuse. The next chapter moves those access patterns into durable tables, constraints, transactions, permissions, and vector-aware queries that can serve more than one process.

PreviousMPS & Metal for ML on Mac
Share this article
XFacebookLinkedInBlueskyRedditHacker NewsEmail
References

Designing Data-Intensive Applications.

Kleppmann, M. · 2017

Introduction to Information Retrieval.

Manning, C. D., Raghavan, P., Schutze, H. · 2008 · Cambridge University Press

Efficient and Robust Approximate Nearest Neighbor Using Hierarchical Navigable Small World Graphs.

Malkov, Y. A., & Yashunin, D. A. · 2018 · IEEE Transactions on Pattern Analysis and Machine Intelligence

Billion-scale similarity search with GPUs.

Johnson, J., Douze, M., & Jégou, H. · 2017 · arXiv preprint

Efficiently Scaling Transformer Inference.

Pope, R., et al. · 2023 · arXiv preprint

Efficient Memory Management for Large Language Model Serving with PagedAttention

Kwon, W., et al. · 2023 · SOSP 2023