LeetLLM
LearnTracksPracticeBlog
LeetLLM

Your go-to resource for mastering AI & LLM systems.

Product

  • Learn
  • Tracks
  • Practice
  • Blog
  • RSS

Legal

  • Terms of Service
  • Privacy Policy

© 2026 LeetLLM. All rights reserved.

All Topics
Your Progress
0%

0 of 158 articles completed

🛠️Computing Foundations0/9
Git, Shell, Linux for AIDocker for Reproducible AIPython for AI EngineeringNumPy and Tensor ShapesCUDA for ML TrainingMPS & Metal for ML on MacData Structures for AISQL and Data ModelingAlgorithms for ML Engineers
📊Math & Statistics0/8
Gradients and BackpropVectors, Matrices & TensorsLinear Algebra for MLAdam, Momentum, SchedulersProbability for Machine LearningStatistics and UncertaintyDistributions and SamplingHypothesis Tests, Intervals, and pass@k
📚Preparation & Prerequisites0/13
Neural Networks from ScratchCNNs from ScratchTraining & BackpropagationSoftmax, Cross-Entropy & OptimizationRNNs, LSTMs, GRUs, and Sequence ModelingAutoencoders and VAEsThe Transformer Architecture End-to-EndLanguage Modeling & Next TokensFrom GPT to Modern LLMsPrompt Engineering FundamentalsCalling LLM APIs in ProductionFirst AI App End-to-EndThe LLM Lifecycle
🧮ML Algorithms & Evaluation0/11
Linear Regression from ScratchLogistic Regression and MetricsDecision Trees, Forests, and BoostingReinforcement Learning BasicsValidation and LeakageClustering and PCACore Retrieval AlgorithmsDecoding AlgorithmsExperiment Design and A/B TestingPyTorch Training LoopsDataset Pipelines and Data Quality
📦Production ML Systems0/6
Feature Engineering for Production MLBatch and Streaming Feature PipelinesGradient Boosted Trees in ProductionRanking and Recommendation SystemsForecasting and Anomaly DetectionMonitoring Predictive Models
🧪Core LLM Foundations0/8
The Bitter Lesson & ComputeBPE, WordPiece, and SentencePieceStatic to Contextual EmbeddingsPerplexity & Model EvaluationFile Ingestion for AIChunking StrategiesLLM Benchmarks & LimitationsInstruction Tuning & Chat Templates
🧰Applied LLM Engineering0/23
Dimensionality Reduction for EmbeddingsCoT, ToT & Self-Consistency PromptingFunction Calling & Tool UseMCP & Tool Protocol StandardsPrompt Injection DefenseResponsible AI GovernanceData Labeling and Human FeedbackEvaluating AI AgentsProduction RAG PipelinesHybrid Search: Dense + SparseReranking and Cross-Encoders for RAGRAG Evaluation for Reliable AnswersLLM-as-a-Judge EvaluationBias & Fairness in LLMsHallucination Detection & MitigationLLM Observability & MonitoringExperiment Tracking with MLflow and W&BMixed Precision TrainingModel Versioning & DeploymentSemantic Caching & Cost OptimizationLLM Cost Engineering & Token EconomicsModel Gateways, Routing, and FallbacksDesign an Automated Support Agent
🎓Portfolio Capstones0/9
Capstone: Delivery ETA PredictionCapstone: Product RankingCapstone: Demand ForecastingCapstone: Image Damage ClassifierCapstone: Production ML PipelineCapstone: Document QACapstone: Eval DashboardCapstone: Fine-Tuned ClassifierCapstone: Production Agent
🧠Transformer Deep Dives0/8
Sentence Embeddings & Contrastive LossEmbedding Similarity & QuantizationScaled Dot-Product AttentionVision Transformers and Image EncodersPositional Encoding: RoPE & ALiBiLayer Normalization: Pre-LN vs Post-LNMechanistic InterpretabilityDecoding Strategies: Greedy to Nucleus
🧬Advanced Training & Adaptation0/16
Scaling Laws & Compute-Optimal TrainingPre-training Data at ScaleBuild GPT from Scratch LabContinued Pretraining for Domain ShiftSynthetic Data PipelinesSupervised Fine-Tuning PipelineDistributed Training: FSDP & ZeROLoRA & Parameter-Efficient TuningReward Modeling from Preference DataRLHF & DPO AlignmentConstitutional AI & Red TeamingRLVR & Verifiable RewardsKnowledge Distillation for LLMsModel Merging and Weight InterpolationPrompt Optimization with DSPyRecursive Language Models (RLM)
🤖Advanced Agents & Retrieval0/14
Vector DB Internals: HNSW & IVFAdvanced RAG: HyDE & Self-RAGGraphRAG & Knowledge GraphsRAG Security & Access ControlStructured Output GenerationReAct & Plan-and-ExecuteGuardrails & Safety FiltersCode Generation & SandboxingComputer-Use / GUI / Browser AgentsHuman-in-the-Loop Agent ArchitectureAI Coding Workflow with AgentsAgent Memory & PersistenceAgent Failure & RecoveryMulti-Agent Orchestration
⚡Inference & Production Scale0/20
Inference: TTFT, TPS & KV CacheMulti-Query & Grouped-Query AttentionKV Cache & PagedAttentionPrefix Caching and Prompt CachingFlashAttention & Memory EfficiencyContinuous Batching & SchedulingScaling LLM InferenceModel Parallelism for LLM InferenceModel Quantization: GPTQ, AWQ & GGUFLocal LLM DeploymentSLM Specialization & Edge DeploymentSpeculative DecodingLong Context Window ManagementContext EngineeringMixture of Experts ArchitectureMamba & State Space ModelsReasoning & Test-Time ComputeAdvanced MLOps & DevOps for AIGPU Serving & AutoscalingA/B Testing for LLMs
🏗️System Design Capstones0/9
Content Moderation SystemCode Completion SystemMulti-Tenant LLM PlatformLLM-Powered Search EngineVision-Language Models & CLIPMultimodal LLM ArchitectureDiffusion Models: Images & TextReal-Time Voice AI AgentReasoning & Test-Time Compute
🎤AI Lab Interviewing0/4
AI Lab Coding Interview: Python SystemsAI Lab System Design InterviewAI Lab Behavioral InterviewAI Lab Technical Presentation
Back to Topics
LearnComputing FoundationsSQL and Data Modeling
⚙️EasyMLOps & Deployment

SQL and Data Modeling

Turn an in-memory support retriever into durable SQL tables. Create rows and keys, query with parameters and joins, enforce permissions, roll back failed work, deduplicate retries, inspect indexes, and see where pgvector fits.

14 min read
Learning path
Step 8 of 158 in the full curriculum
Data Structures for AIAlgorithms for ML Engineers

A Python dictionary retriever works while one process holds all the facts. Close that process and the index is gone. Put two support agents on it and you also need shared truth: which workspace owns a document, which chunk may be shown to which agent, and which source supports an answer.

Structured Query Language (SQL) is the language for asking those questions of a relational database. A database stores tables. A table has named columns, and each stored fact becomes a row. Instead of trusting one program's memory, you state durable rules about data and ask the database to enforce them.

The lab builds a tiny support-policy retriever for Acme Cloud. It answers "How do I rotate my API key?" only from chunks that belong to the active workspace and are visible to the active support agent. SQLite and Python's standard-library sqlite3 interface keep the first version local. At the end, the same model extends to PostgreSQL and pgvector for semantic search.

QuestionDatabase feature that answers it
Does this document still exist after a restart?A table row saved in a database
Can two chunks claim the same identity?A primary key or uniqueness constraint
Which document produced this quoted chunk?A foreign key plus a JOIN
Is this agent allowed to retrieve it?A permission row plus a filter
What if chunk insertion fails halfway through?A transaction that rolls back

From Python objects to rows

In memory, you might keep documents in a dictionary keyed by document_id. A relational model turns that implicit convention into explicit tables:

TableOne row meansImportant link
workspacesone customer organizationid identifies tenant
documentsone uploaded source documentworkspace_id refers to owner
chunksone retrievable paragraphdocument_id refers to source
chunk_permissionsone allowed principal for one chunkchunk_id refers to content

A primary key identifies one row in its table. A foreign key states that a value must refer to a row in another table. Here, a chunk can't point to a document that doesn't exist. A permission can't point to a chunk that doesn't exist.

Support retrieval data model flowing from documents in Python to SQLite tables, a joined permission-aware query, and an allowed cited paragraph. Support retrieval data model flowing from documents in Python to SQLite tables, a joined permission-aware query, and an allowed cited paragraph.
A database makes hidden application assumptions visible: identity, ownership, permissions, and citation links become rows that each retrieval query must respect.

Create the smallest useful schema

The first executable example creates four tables and inserts a tiny corpus. PRAGMA foreign_keys = ON asks SQLite to enforce foreign-key rules in this connection. SQLite requires that setting per connection, so applications shouldn't rely on a default. We commit the seed data so later examples can deliberately open and roll back new transactions.

01_create_schema.py
1import sqlite3 2 3conn = sqlite3.connect(":memory:") 4conn.execute("PRAGMA foreign_keys = ON") 5 6conn.executescript( 7 """ 8 CREATE TABLE workspaces ( 9 id TEXT PRIMARY KEY, 10 name TEXT NOT NULL 11 ); 12 13 CREATE TABLE documents ( 14 id TEXT PRIMARY KEY, 15 workspace_id TEXT NOT NULL REFERENCES workspaces(id), 16 title TEXT NOT NULL 17 ); 18 19 CREATE TABLE chunks ( 20 id TEXT PRIMARY KEY, 21 document_id TEXT NOT NULL REFERENCES documents(id), 22 chunk_text TEXT NOT NULL, 23 UNIQUE (document_id, chunk_text) 24 ); 25 26 CREATE TABLE chunk_permissions ( 27 chunk_id TEXT NOT NULL REFERENCES chunks(id), 28 principal TEXT NOT NULL, 29 PRIMARY KEY (chunk_id, principal) 30 ); 31 """ 32) 33 34conn.executemany( 35 "INSERT INTO workspaces (id, name) VALUES (?, ?)", 36 [("w01", "Acme Cloud"), ("w02", "Other Workspace")], 37) 38conn.executemany( 39 "INSERT INTO documents (id, workspace_id, title) VALUES (?, ?, ?)", 40 [ 41 ("d7", "w01", "API Key Policy"), 42 ("d9", "w01", "Account Help"), 43 ("d99", "w02", "Private Token Policy"), 44 ], 45) 46conn.executemany( 47 "INSERT INTO chunks (id, document_id, chunk_text) VALUES (?, ?, ?)", 48 [ 49 ("c7a", "d7", "API keys rotate within seven days."), 50 ("c7b", "d7", "Email security with your workspace ID for key rotation."), 51 ("c9a", "d9", "Reset passwords from account settings."), 52 ("c99", "d99", "Token exception for another workspace."), 53 ], 54) 55conn.executemany( 56 "INSERT INTO chunk_permissions (chunk_id, principal) VALUES (?, ?)", 57 [ 58 ("c7a", "agent:u1"), 59 ("c7a", "agent:u3"), 60 ("c7b", "agent:u1"), 61 ("c7b", "agent:u3"), 62 ("c9a", "agent:u1"), 63 ("c99", "agent:u3"), 64 ], 65) 66conn.commit() 67 68counts = [ 69 conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0] 70 for table in ("workspaces", "documents", "chunks", "chunk_permissions") 71] 72print("rows", counts)
Output
1rows [2, 3, 4, 6]

The expected output confirms that the database holds two workspaces, three documents, four chunks, and six permission rows. The schema keeps ownership at the document level and visibility at the chunk level. That's intentional. Two paragraphs in one document may later have different visibility, and every retrieved paragraph can still lead back to its source document.

Select rows with parameters

SELECT chooses columns from rows. WHERE narrows the rows. ORDER BY makes output deterministic.

Never join user input into SQL text with string formatting. Use placeholders and pass values separately. Python's SQLite documentation recommends placeholders for binding values because constructing query text from input opens SQL injection vulnerabilities.

02_select_with_parameters.py
1workspace_id = "w01" 2 3rows = conn.execute( 4 """ 5 SELECT id, title 6 FROM documents 7 WHERE workspace_id = ? 8 ORDER BY id 9 """, 10 (workspace_id,), 11).fetchall() 12 13print(rows)
Output
1[('d7', 'API Key Policy'), ('d9', 'Account Help')]

The SQL statement has a ? where a value belongs, and the tuple (workspace_id,) contains the value. That separation lets the database receive code and data separately.

This pattern stays important in AI applications. A chat message is still untrusted input, even when another model wrote it. Don't let it rewrite query structure.

Binding keeps a value from becoming SQL syntax. It doesn't change the meaning of the SQL operation: inside LIKE, % and _ in a bound value still act as wildcard characters. Escape them when the product needs literal substring matching, or use a search index with the tokenization rules your product expects.

Join a chunk to its source

The answer generator needs more than matching text. It needs the title or document ID that lets the UI cite the evidence. A join combines rows whose keys match.

Here, chunks.document_id refers to documents.id. The query can return a paragraph and its source title in one result set:

03_join_chunks_to_documents.py
1term = "%key%" 2 3citations = conn.execute( 4 """ 5 SELECT c.id, d.title, c.chunk_text 6 FROM chunks AS c 7 JOIN documents AS d ON d.id = c.document_id 8 WHERE d.workspace_id = ? 9 AND LOWER(c.chunk_text) LIKE ? 10 ORDER BY c.id 11 """, 12 ("w01", term), 13).fetchall() 14 15for chunk_id, title, text in citations: 16 print(f"{chunk_id} [{title}] {text}")
Output
1c7a [API Key Policy] API keys rotate within seven days. 2c7b [API Key Policy] Email security with your workspace ID for key rotation.

LIKE is enough for this tiny vocabulary demonstration. It isn't semantic retrieval: it will miss paraphrases such as "rotate credentials" when no row contains "key." The production extension later adds vector similarity without discarding tenant and permission checks.

Constraints catch broken relationships

Code can be buggy. An ingestion job can create a chunk for a document ID that never arrived. Without a constraint, the orphaned chunk may appear in retrieval results with no trustworthy source.

The foreign key in chunks turns that silent corruption into a visible failure. Under Python's default sqlite3 transaction handling, a data-changing statement opens a transaction implicitly. A failed statement reverts its own change, and rollback() closes the failed unit before the lab continues:

04_foreign_key_guard.py
1try: 2 conn.execute( 3 "INSERT INTO chunks (id, document_id, chunk_text) VALUES (?, ?, ?)", 4 ("broken", "missing-document", "This row has no source."), 5 ) 6except sqlite3.IntegrityError: 7 print("blocked orphan chunk") 8 conn.rollback() 9 10missing = conn.execute( 11 "SELECT COUNT(*) FROM chunks WHERE id = ?", 12 ("broken",), 13).fetchone()[0] 14print("orphan rows stored", missing)
Output
1blocked orphan chunk 2orphan rows stored 0

Primary keys catch duplicate identities. Foreign keys catch broken references. A NOT NULL column catches absent required values. These rules belong in the schema because every writer must obey them, including the current Python script.

Make permissions queryable data

An access-control list says which principals may view a resource. Instead of hiding that list in application logic, our beginner schema stores one permitted (chunk_id, principal) pair per row. That gives us one simple rule:

Return a chunk only when its document belongs to the active workspace and a permission row exists for the active principal.

Alice (agent:u1) may read key and account help. Bob (agent:u3) may read key paragraphs for Acme, but not Acme account-help content.

05_authorized_lookup.py
1def permitted_matches(workspace_id: str, principal: str, word: str) -> list[str]: 2 return [ 3 row[0] 4 for row in conn.execute( 5 """ 6 SELECT c.id 7 FROM chunks AS c 8 JOIN documents AS d ON d.id = c.document_id 9 JOIN chunk_permissions AS p ON p.chunk_id = c.id 10 WHERE d.workspace_id = ? 11 AND p.principal = ? 12 AND LOWER(c.chunk_text) LIKE ? 13 ORDER BY c.id 14 """, 15 (workspace_id, principal, f"%{word.lower()}%"), 16 ).fetchall() 17 ] 18 19print("Bob key", permitted_matches("w01", "agent:u3", "key")) 20print("Bob password", permitted_matches("w01", "agent:u3", "password")) 21print("Alice password", permitted_matches("w01", "agent:u1", "password"))
Output
1Bob key ['c7a', 'c7b'] 2Bob password [] 3Alice password ['c9a']
Authorization failure and fix: a key query without workspace scope exposes another workspace chunk, while adding workspace and permission predicates returns only allowed Acme evidence. Authorization failure and fix: a key query without workspace scope exposes another workspace chunk, while adding workspace and permission predicates returns only allowed Acme evidence.
A relevant result isn't automatically allowed. Retrieval needs both the workspace predicate and the permission predicate on every path that returns context.

Break it: forget the workspace predicate

Bob also has permission to read one row in w02. That's useful for showing the failure rather than assuming the rule is correct.

Suppose the active support screen is displaying Acme (w01), but a programmer filters only by user and query word. Bob's key result now includes another workspace's private rule:

06_missing_tenant_filter.py
1unsafe = conn.execute( 2 """ 3 SELECT c.id 4 FROM chunks AS c 5 JOIN chunk_permissions AS p ON p.chunk_id = c.id 6 WHERE p.principal = ? 7 AND LOWER(c.chunk_text) LIKE ? 8 ORDER BY c.id 9 """, 10 ("agent:u3", "%key%"), 11).fetchall() 12 13safe = permitted_matches("w01", "agent:u3", "key") 14 15print("unsafe", [row[0] for row in unsafe]) 16print("safe", safe)
Output
1unsafe ['c7a', 'c7b'] 2safe ['c7a', 'c7b']

The safe query isn't safe because its chunks sound relevant. It's safe because both authorization dimensions are present: current workspace and current principal.

For a production PostgreSQL application, row-level security (RLS) can make a database policy enforce an additional boundary. RLS isn't automatic: once it's enabled for a table, normal access must satisfy a policy, and an enabled table with no policy uses default deny. Superusers and roles with BYPASSRLS always bypass those policies. Table owners normally bypass them too unless the table uses FORCE ROW LEVEL SECURITY. Run application queries with a role that's subject to the policy, then test that role directly.[1]Reference 1PostgreSQL Row Security Policieshttps://www.postgresql.org/docs/current/ddl-rowsecurity.html

Use a transaction for all-or-nothing ingestion

A transaction groups changes into one unit. Either every change commits, or a failure rolls them back. That matters when a new document and its chunks must appear together. A visible document with missing chunks is confusing; visible chunks with no source are worse.

The next example uses with conn: so Python commits an open transaction on success or rolls it back when the body raises an exception. The context manager doesn't open a transaction by itself. With Python's default sqlite3 handling, the first INSERT inside the block opens one. The second INSERT intentionally points at a missing source, so its foreign-key error rolls back the document insertion too.

07_transaction_rollback.py
1try: 2 with conn: 3 conn.execute( 4 "INSERT INTO documents (id, workspace_id, title) VALUES (?, ?, ?)", 5 ("d20", "w01", "Incident Policy"), 6 ) 7 conn.execute( 8 "INSERT INTO chunks (id, document_id, chunk_text) VALUES (?, ?, ?)", 9 ("c20", "not-d20", "Incidents are reviewed in two days."), 10 ) 11except sqlite3.IntegrityError: 12 print("transaction rolled back") 13 14visible = conn.execute( 15 "SELECT COUNT(*) FROM documents WHERE id = ?", 16 ("d20",), 17).fetchone()[0] 18print("partial document visible", bool(visible))
Output
1transaction rolled back 2partial document visible False

When an embedding service is involved, you have two reasonable designs. You can compute embeddings before the database transaction and commit document plus chunks together, or write a pending ingestion record and publish a new complete version only after all chunks are ready. Don't expose half an index to retrieval without making that state deliberate.

Keep retries from duplicating completed work

A transaction makes one attempt all-or-nothing. It doesn't tell a worker whether the request already succeeded before a timeout. If the worker commits new rows, crashes before sending its response, and retries the same request, a second write attempt can duplicate work.

For retried writes, store a stable idempotency key in the same transaction as the rows created for that request. A uniqueness constraint lets the database accept the first attempt and recognize later repeats even after the original process disappears:

08_idempotent_ingestion.py
1conn.execute( 2 """ 3 CREATE TABLE ingestion_requests ( 4 request_id TEXT PRIMARY KEY, 5 document_id TEXT NOT NULL 6 ) 7 """ 8) 9conn.commit() 10 11def ingest_document_once( 12 request_id: str, 13 document_id: str, 14 title: str, 15 chunk_id: str, 16 chunk_text: str, 17) -> str: 18 with conn: 19 created = conn.execute( 20 """ 21 INSERT INTO ingestion_requests (request_id, document_id) 22 VALUES (?, ?) 23 ON CONFLICT(request_id) DO NOTHING 24 """, 25 (request_id, document_id), 26 ).rowcount 27 if created == 0: 28 existing_document_id = conn.execute( 29 "SELECT document_id FROM ingestion_requests WHERE request_id = ?", 30 (request_id,), 31 ).fetchone()[0] 32 if existing_document_id != document_id: 33 raise ValueError("idempotency key reused for a different document") 34 return "skipped retry" 35 36 conn.execute( 37 "INSERT INTO documents (id, workspace_id, title) VALUES (?, ?, ?)", 38 (document_id, "w01", title), 39 ) 40 conn.execute( 41 "INSERT INTO chunks (id, document_id, chunk_text) VALUES (?, ?, ?)", 42 (chunk_id, document_id, chunk_text), 43 ) 44 return "inserted" 45 46print( 47 "first", 48 ingest_document_once("req-30", "d30", "Key Rotation FAQ", "c30", "Temporary keys expire after 30 days."), 49) 50print( 51 "retry", 52 ingest_document_once("req-30", "d30", "Key Rotation FAQ", "c30", "Temporary keys expire after 30 days."), 53) 54stored = conn.execute( 55 "SELECT COUNT(*) FROM documents WHERE id = ?", 56 ("d30",), 57).fetchone()[0] 58print("d30 rows", stored)
Output
1first inserted 2retry skipped retry 3d30 rows 1

The first call stores the request marker, document, and chunk together. The retry sees the same request key and skips the write. This example also rejects reuse of one key for a different document_id. Production APIs often store a request fingerprint too, so the same key can't silently accept a changed payload.

This pattern works when the protected side effect lives in the same database transaction. A local marker can't atomically cover a deployment API or another remote service. For an external side effect, pass the same key to a downstream API that enforces it, or use a workflow such as an outbox with an idempotent consumer.

Index the predicates you run often

Without an index, a database may need to inspect many rows to answer a filter. An index is an additional data structure that lets the engine reach selected rows more directly, at the cost of storage and write work.

Our authorized query repeatedly filters permissions by principal and follows document ownership. Add indexes for those paths, then ask SQLite for its query plan:

09_add_indexes_and_explain.py
1conn.executescript( 2 """ 3 CREATE INDEX permission_principal_idx 4 ON chunk_permissions (principal, chunk_id); 5 CREATE INDEX document_workspace_idx 6 ON documents (workspace_id, id); 7 """ 8) 9 10authorized_sql = """ 11SELECT c.id, d.title, c.chunk_text 12FROM chunks AS c 13JOIN documents AS d ON d.id = c.document_id 14JOIN chunk_permissions AS p ON p.chunk_id = c.id 15WHERE d.workspace_id = ? 16 AND p.principal = ? 17 AND LOWER(c.chunk_text) LIKE ? 18ORDER BY c.id 19""" 20 21plan = conn.execute( 22 "EXPLAIN QUERY PLAN " + authorized_sql, 23 ("w01", "agent:u3", "%key%"), 24).fetchall() 25details = " | ".join(row[3] for row in plan) 26 27print("uses permission index", "permission_principal_idx" in details) 28print("plan steps", len(plan))
Output
1uses permission index True 2plan steps 4

The exact plan text can vary as engines and statistics change. The engineering habit is stable: inspect the query plan for the production query, with realistic tenant and permission selectivity, rather than assuming an index solved a latency problem.

Schema layers showing relational source and permission tables used by an exact SQLite query, followed by an optional PostgreSQL pgvector embedding column for semantic ranking. Schema layers showing relational source and permission tables used by an exact SQLite query, followed by an optional PostgreSQL pgvector embedding column for semantic ranking.
First make identity, ownership, and authorization correct with relational rows. Semantic ranking is an added retrieval capability, not a replacement for those guarantees.

Build a small authorized retriever

Now assemble a function the product could call. It retrieves permitted matching paragraphs and returns document titles beside them. A later answer generator can quote or cite those rows rather than inventing policy text.

10_authorized_retriever.py
1def search_support( 2 workspace_id: str, 3 principal: str, 4 query: str, 5) -> list[tuple[str, str]]: 6 pattern = f"%{query.lower()}%" 7 return conn.execute( 8 authorized_sql, 9 (workspace_id, principal, pattern), 10 ).fetchall() 11 12for chunk_id, title, text in search_support("w01", "agent:u3", "key"): 13 print(f"{chunk_id} | {title} | {text}")
Output
1c7a | API Key Policy | API keys rotate within seven days. 2c7b | API Key Policy | Email security with your workspace ID for key rotation.

This is lexical search: it looks for a word fragment. For a tiny demo, that keeps attention on durable data rules. A retrieval-augmented generation (RAG) system usually also retrieves semantically related text before a large language model (LLM) drafts an answer. That retrieval path should reuse the same source and permission model.

Test boundary rows, not happy paths alone

Tests for retrieval should include forbidden results. A system that finds the best paragraph and leaks it isn't correct.

11_authorization_tests.py
1bob_key = [row[0] for row in search_support("w01", "agent:u3", "key")] 2bob_password = search_support("w01", "agent:u3", "password") 3wrong_workspace = [row[0] for row in search_support("w01", "agent:u3", "token")] 4unknown_user = search_support("w01", "agent:unknown", "key") 5 6assert bob_key == ["c7a", "c7b"] 7assert bob_password == [] 8assert wrong_workspace == [] 9assert unknown_user == [] 10 11print("four retrieval boundary tests passed")
Output
1four retrieval boundary tests passed

That final test is short, but it guards the central failure mode: context must be relevant and authorized before it reaches an LLM prompt.

Extend the model with PostgreSQL and pgvector

SQLite gave us a fully runnable lab. Production systems often move shared workloads to PostgreSQL. When keyword matching is insufficient, the pgvector extension adds a vector column and distance operators. A vector is a list of numbers representing an embedding; nearby embeddings can capture semantic similarity even when exact words differ.

This sketch uses three dimensions only so you can read it. In a real system, choose the dimension required by your embedding model and store/query vectors of that same dimension.

postgres_pgvector_extension.sql
1CREATE EXTENSION vector; 2 3ALTER TABLE chunks ADD COLUMN embedding vector(3); 4 5-- Optional once exact scans no longer meet the latency target: 6CREATE INDEX chunks_embedding_hnsw 7 ON chunks USING hnsw (embedding vector_cosine_ops); 8 9SELECT c.id, 10 d.title, 11 c.chunk_text, 12 1 - (c.embedding <=> $1) AS cosine_similarity 13FROM chunks AS c 14JOIN documents AS d ON d.id = c.document_id 15WHERE d.workspace_id = $2 16 AND c.embedding IS NOT NULL 17 AND EXISTS ( 18 SELECT 1 19 FROM chunk_permissions AS p 20 WHERE p.chunk_id = c.id 21 AND p.principal = $3 22 ) 23ORDER BY c.embedding <=> $1 24LIMIT 3;

In pgvector, <=> means cosine distance, so 1 - distance is cosine similarity. The query works without HNSW: pgvector performs exact nearest-neighbor search by default. Add an HNSW index when exact scans no longer meet the latency target, then measure the recall tradeoff introduced by approximate search. The IS NOT NULL guard keeps chunks that are still waiting for embeddings out of semantic ranking; pgvector doesn't index null vectors.[2]Reference 2pgvectorhttps://github.com/pgvector/pgvector

Filtered search has one caveat. With an approximate vector index, filtering conditions can be applied after the index scans candidates. If tenant or permission filters reject many candidates, a query may return fewer rows than requested even when allowed matches exist. pgvector documents iterative index scans for continuing the search when filtering removes too many candidates; that capability arrived in version 0.8.0.[2]Reference 2pgvectorhttps://github.com/pgvector/pgvector[3]Reference 3pgvector 0.8.0 releasedhttps://www.postgresql.org/about/news/pgvector-080-released-2952/

The rule is precise:

  1. Authorization predicates must remain in every retrieval query.
  2. Approximate vector retrieval still needs plan inspection and recall testing under restrictive filters.
  3. A database policy such as PostgreSQL RLS can reinforce authorization, but it doesn't remove the need to design and test retrieval behavior.

What to remember

You started with four tables, not a production vector stack:

SkillWhy it matters in an AI system
Model documents, chunks, and permissions as rowsContext has ownership and provenance
Use primary and foreign keysBroken citation links fail early
Bind parameters rather than formatting SQLInputs can't change query structure
Join source and permission tablesRetrieval returns only attributable, allowed text
Wrap multi-row ingestion in a transactionPartially written context stays hidden
Store a unique idempotency key with retried writesCompleted work doesn't run twice after a timeout
Inspect indexes and plansPerformance is measured, not assumed
Add vectors after boundaries are correctSemantic rank preserves access rules

Mastery check

Evaluation rubric

  • Foundational: Explains a table, row, primary key, and foreign key using the support-policy schema.
  • Developing: Writes a parameterized SELECT and a JOIN that returns chunk text beside its source document.
  • Intermediate: Demonstrates why both workspace scope and principal permission belong in an authorized retrieval query.
  • Applied: Uses a transaction, durable idempotency key, and indexes to keep ingestion consistent and inspect a real query path.
  • Advanced: Extends the authorized schema with vector similarity while preserving permission filters and testing filtered approximate retrieval.

Follow-up questions

Common pitfalls

  • A query returns a relevant paragraph from another tenant. Cause: retrieval filters by principal or similarity but omits active workspace_id. Fix: keep both ownership and permission predicates in every context query.
  • A retrieved chunk has no document to cite. Cause: ingestion allowed orphaned chunk rows. Fix: require a foreign key from chunks.document_id to documents.id.
  • A new document appears with only some chunks available. Cause: multi-row ingestion was committed in fragments. Fix: publish a complete version inside a transaction or after staged ingestion completes.
  • A timed-out ingestion creates duplicate rows when retried. Cause: each attempt commits independently with no durable request identity. Fix: store a stable idempotency key in the same transaction and enforce uniqueness.
  • A chat message changes query behavior. Cause: application code formatted input into SQL text. Fix: bind user and model-produced values as parameters.
  • A filtered vector query returns too few results. Cause: approximate candidates were removed by tenant or permission filtering. Fix: test recall and use appropriate pgvector scan tuning while preserving authorization.
Complete the lesson

Mastery Check

Answer every question, then check your score. Score above 75% to mark this lesson complete.

1.An ingestion path must reject both a second chunk with the same id and a chunk whose document does not exist. Which SQLite setup enforces both rules?
2.A support search box receives the text key%' OR 1=1 -- from a chat message. Which query-building pattern prevents that text from changing the SQL operation?
3.A citation query must return each chunk's text beside the title of its source document. Given that chunks.document_id refers to documents.id, which join follows that relationship?
4.Bob is answering an Acme workspace request. A matching key chunk belongs to workspace w02, and Bob has permission to read it in another workspace. Should the Acme retrieval query return it?
5.An ingestion transaction inserts document d20, then insertion of its first chunk fails a foreign-key constraint. What should be visible after rollback?
6.A worker commits document d30, times out before replying, and receives the same request again. Why isn't a transaction alone enough to prevent duplicate work?
7.An authorized lookup is slow. To reduce latency without changing its authorization semantics, what should the engineer do?
8.You add an HNSW vector index, keep tenant and permission filters, and a LIMIT 3 query returns only one authorized chunk even though more exist. What should you investigate?

8 questions remaining.

Next Step
Continue to Algorithms for ML Engineers

You can now build durable retrieval queries that preserve ownership, visibility, and citations. The next chapter measures how runtime and memory grow as retrieval and model operations scale.

PreviousData Structures for AI
Share this article
XFacebookLinkedInBlueskyRedditHacker NewsEmail
References

PostgreSQL Row Security Policies

PostgreSQL Global Development Group · 2026

pgvector

pgvector contributors · 2026 · GitHub

pgvector 0.8.0 released

PostgreSQL Global Development Group · 2024