Turn an in-memory support retriever into durable SQL tables. Create rows and keys, query with parameters and joins, enforce permissions, roll back failed work, deduplicate retries, inspect indexes, and see where pgvector fits.
A Python dictionary retriever works while one process holds all the facts. Close that process and the index is gone. Put two support agents on it and you also need shared truth: which workspace owns a document, which chunk may be shown to which agent, and which source supports an answer.
Structured Query Language (SQL) is the language for asking those questions of a relational database. A database stores tables. A table has named columns, and each stored fact becomes a row. Instead of trusting one program's memory, you state durable rules about data and ask the database to enforce them.
The lab builds a tiny support-policy retriever for Acme Cloud. It answers "How do I rotate my API key?" only from chunks that belong to the active workspace and are visible to the active support agent. SQLite and Python's standard-library sqlite3 interface keep the first version local. At the end, the same model extends to PostgreSQL and pgvector for semantic search.
| Question | Database feature that answers it |
|---|---|
| Does this document still exist after a restart? | A table row saved in a database |
| Can two chunks claim the same identity? | A primary key or uniqueness constraint |
| Which document produced this quoted chunk? | A foreign key plus a JOIN |
| Is this agent allowed to retrieve it? | A permission row plus a filter |
| What if chunk insertion fails halfway through? | A transaction that rolls back |
In memory, you might keep documents in a dictionary keyed by document_id. A relational model turns that implicit convention into explicit tables:
| Table | One row means | Important link |
|---|---|---|
workspaces | one customer organization | id identifies tenant |
documents | one uploaded source document | workspace_id refers to owner |
chunks | one retrievable paragraph | document_id refers to source |
chunk_permissions | one allowed principal for one chunk | chunk_id refers to content |
A primary key identifies one row in its table. A foreign key states that a value must refer to a row in another table. Here, a chunk can't point to a document that doesn't exist. A permission can't point to a chunk that doesn't exist.
The first executable example creates four tables and inserts a tiny corpus. PRAGMA foreign_keys = ON asks SQLite to enforce foreign-key rules in this connection. SQLite requires that setting per connection, so applications shouldn't rely on a default. We commit the seed data so later examples can deliberately open and roll back new transactions.
1import sqlite3
2
3conn = sqlite3.connect(":memory:")
4conn.execute("PRAGMA foreign_keys = ON")
5
6conn.executescript(
7 """
8 CREATE TABLE workspaces (
9 id TEXT PRIMARY KEY,
10 name TEXT NOT NULL
11 );
12
13 CREATE TABLE documents (
14 id TEXT PRIMARY KEY,
15 workspace_id TEXT NOT NULL REFERENCES workspaces(id),
16 title TEXT NOT NULL
17 );
18
19 CREATE TABLE chunks (
20 id TEXT PRIMARY KEY,
21 document_id TEXT NOT NULL REFERENCES documents(id),
22 chunk_text TEXT NOT NULL,
23 UNIQUE (document_id, chunk_text)
24 );
25
26 CREATE TABLE chunk_permissions (
27 chunk_id TEXT NOT NULL REFERENCES chunks(id),
28 principal TEXT NOT NULL,
29 PRIMARY KEY (chunk_id, principal)
30 );
31 """
32)
33
34conn.executemany(
35 "INSERT INTO workspaces (id, name) VALUES (?, ?)",
36 [("w01", "Acme Cloud"), ("w02", "Other Workspace")],
37)
38conn.executemany(
39 "INSERT INTO documents (id, workspace_id, title) VALUES (?, ?, ?)",
40 [
41 ("d7", "w01", "API Key Policy"),
42 ("d9", "w01", "Account Help"),
43 ("d99", "w02", "Private Token Policy"),
44 ],
45)
46conn.executemany(
47 "INSERT INTO chunks (id, document_id, chunk_text) VALUES (?, ?, ?)",
48 [
49 ("c7a", "d7", "API keys rotate within seven days."),
50 ("c7b", "d7", "Email security with your workspace ID for key rotation."),
51 ("c9a", "d9", "Reset passwords from account settings."),
52 ("c99", "d99", "Token exception for another workspace."),
53 ],
54)
55conn.executemany(
56 "INSERT INTO chunk_permissions (chunk_id, principal) VALUES (?, ?)",
57 [
58 ("c7a", "agent:u1"),
59 ("c7a", "agent:u3"),
60 ("c7b", "agent:u1"),
61 ("c7b", "agent:u3"),
62 ("c9a", "agent:u1"),
63 ("c99", "agent:u3"),
64 ],
65)
66conn.commit()
67
68counts = [
69 conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
70 for table in ("workspaces", "documents", "chunks", "chunk_permissions")
71]
72print("rows", counts)1rows [2, 3, 4, 6]The expected output confirms that the database holds two workspaces, three documents, four chunks, and six permission rows. The schema keeps ownership at the document level and visibility at the chunk level. That's intentional. Two paragraphs in one document may later have different visibility, and every retrieved paragraph can still lead back to its source document.
SELECT chooses columns from rows. WHERE narrows the rows. ORDER BY makes output deterministic.
Never join user input into SQL text with string formatting. Use placeholders and pass values separately. Python's SQLite documentation recommends placeholders for binding values because constructing query text from input opens SQL injection vulnerabilities.
1workspace_id = "w01"
2
3rows = conn.execute(
4 """
5 SELECT id, title
6 FROM documents
7 WHERE workspace_id = ?
8 ORDER BY id
9 """,
10 (workspace_id,),
11).fetchall()
12
13print(rows)1[('d7', 'API Key Policy'), ('d9', 'Account Help')]The SQL statement has a ? where a value belongs, and the tuple (workspace_id,) contains the value. That separation lets the database receive code and data separately.
This pattern stays important in AI applications. A chat message is still untrusted input, even when another model wrote it. Don't let it rewrite query structure.
Binding keeps a value from becoming SQL syntax. It doesn't change the meaning of the SQL operation: inside LIKE, % and _ in a bound value still act as wildcard characters. Escape them when the product needs literal substring matching, or use a search index with the tokenization rules your product expects.
The answer generator needs more than matching text. It needs the title or document ID that lets the UI cite the evidence. A join combines rows whose keys match.
Here, chunks.document_id refers to documents.id. The query can return a paragraph and its source title in one result set:
1term = "%key%"
2
3citations = conn.execute(
4 """
5 SELECT c.id, d.title, c.chunk_text
6 FROM chunks AS c
7 JOIN documents AS d ON d.id = c.document_id
8 WHERE d.workspace_id = ?
9 AND LOWER(c.chunk_text) LIKE ?
10 ORDER BY c.id
11 """,
12 ("w01", term),
13).fetchall()
14
15for chunk_id, title, text in citations:
16 print(f"{chunk_id} [{title}] {text}")1c7a [API Key Policy] API keys rotate within seven days.
2c7b [API Key Policy] Email security with your workspace ID for key rotation.LIKE is enough for this tiny vocabulary demonstration. It isn't semantic retrieval: it will miss paraphrases such as "rotate credentials" when no row contains "key." The production extension later adds vector similarity without discarding tenant and permission checks.
Code can be buggy. An ingestion job can create a chunk for a document ID that never arrived. Without a constraint, the orphaned chunk may appear in retrieval results with no trustworthy source.
The foreign key in chunks turns that silent corruption into a visible failure. Under Python's default sqlite3 transaction handling, a data-changing statement opens a transaction implicitly. A failed statement reverts its own change, and rollback() closes the failed unit before the lab continues:
1try:
2 conn.execute(
3 "INSERT INTO chunks (id, document_id, chunk_text) VALUES (?, ?, ?)",
4 ("broken", "missing-document", "This row has no source."),
5 )
6except sqlite3.IntegrityError:
7 print("blocked orphan chunk")
8 conn.rollback()
9
10missing = conn.execute(
11 "SELECT COUNT(*) FROM chunks WHERE id = ?",
12 ("broken",),
13).fetchone()[0]
14print("orphan rows stored", missing)1blocked orphan chunk
2orphan rows stored 0Primary keys catch duplicate identities. Foreign keys catch broken references. A NOT NULL column catches absent required values. These rules belong in the schema because every writer must obey them, including the current Python script.
An access-control list says which principals may view a resource. Instead of hiding that list in application logic, our beginner schema stores one permitted (chunk_id, principal) pair per row. That gives us one simple rule:
Return a chunk only when its document belongs to the active workspace and a permission row exists for the active principal.
Alice (agent:u1) may read key and account help. Bob (agent:u3) may read key paragraphs for Acme, but not Acme account-help content.
1def permitted_matches(workspace_id: str, principal: str, word: str) -> list[str]:
2 return [
3 row[0]
4 for row in conn.execute(
5 """
6 SELECT c.id
7 FROM chunks AS c
8 JOIN documents AS d ON d.id = c.document_id
9 JOIN chunk_permissions AS p ON p.chunk_id = c.id
10 WHERE d.workspace_id = ?
11 AND p.principal = ?
12 AND LOWER(c.chunk_text) LIKE ?
13 ORDER BY c.id
14 """,
15 (workspace_id, principal, f"%{word.lower()}%"),
16 ).fetchall()
17 ]
18
19print("Bob key", permitted_matches("w01", "agent:u3", "key"))
20print("Bob password", permitted_matches("w01", "agent:u3", "password"))
21print("Alice password", permitted_matches("w01", "agent:u1", "password"))1Bob key ['c7a', 'c7b']
2Bob password []
3Alice password ['c9a']
Bob also has permission to read one row in w02. That's useful for showing the failure rather than assuming the rule is correct.
Suppose the active support screen is displaying Acme (w01), but a programmer filters only by user and query word. Bob's key result now includes another workspace's private rule:
1unsafe = conn.execute(
2 """
3 SELECT c.id
4 FROM chunks AS c
5 JOIN chunk_permissions AS p ON p.chunk_id = c.id
6 WHERE p.principal = ?
7 AND LOWER(c.chunk_text) LIKE ?
8 ORDER BY c.id
9 """,
10 ("agent:u3", "%key%"),
11).fetchall()
12
13safe = permitted_matches("w01", "agent:u3", "key")
14
15print("unsafe", [row[0] for row in unsafe])
16print("safe", safe)1unsafe ['c7a', 'c7b']
2safe ['c7a', 'c7b']The safe query isn't safe because its chunks sound relevant. It's safe because both authorization dimensions are present: current workspace and current principal.
For a production PostgreSQL application, row-level security (RLS) can make a database policy enforce an additional boundary. RLS isn't automatic: once it's enabled for a table, normal access must satisfy a policy, and an enabled table with no policy uses default deny. Superusers and roles with BYPASSRLS always bypass those policies. Table owners normally bypass them too unless the table uses FORCE ROW LEVEL SECURITY. Run application queries with a role that's subject to the policy, then test that role directly.[1]
A transaction groups changes into one unit. Either every change commits, or a failure rolls them back. That matters when a new document and its chunks must appear together. A visible document with missing chunks is confusing; visible chunks with no source are worse.
The next example uses with conn: so Python commits an open transaction on success or rolls it back when the body raises an exception. The context manager doesn't open a transaction by itself. With Python's default sqlite3 handling, the first INSERT inside the block opens one. The second INSERT intentionally points at a missing source, so its foreign-key error rolls back the document insertion too.
1try:
2 with conn:
3 conn.execute(
4 "INSERT INTO documents (id, workspace_id, title) VALUES (?, ?, ?)",
5 ("d20", "w01", "Incident Policy"),
6 )
7 conn.execute(
8 "INSERT INTO chunks (id, document_id, chunk_text) VALUES (?, ?, ?)",
9 ("c20", "not-d20", "Incidents are reviewed in two days."),
10 )
11except sqlite3.IntegrityError:
12 print("transaction rolled back")
13
14visible = conn.execute(
15 "SELECT COUNT(*) FROM documents WHERE id = ?",
16 ("d20",),
17).fetchone()[0]
18print("partial document visible", bool(visible))1transaction rolled back
2partial document visible FalseWhen an embedding service is involved, you have two reasonable designs. You can compute embeddings before the database transaction and commit document plus chunks together, or write a pending ingestion record and publish a new complete version only after all chunks are ready. Don't expose half an index to retrieval without making that state deliberate.
A transaction makes one attempt all-or-nothing. It doesn't tell a worker whether the request already succeeded before a timeout. If the worker commits new rows, crashes before sending its response, and retries the same request, a second write attempt can duplicate work.
For retried writes, store a stable idempotency key in the same transaction as the rows created for that request. A uniqueness constraint lets the database accept the first attempt and recognize later repeats even after the original process disappears:
1conn.execute(
2 """
3 CREATE TABLE ingestion_requests (
4 request_id TEXT PRIMARY KEY,
5 document_id TEXT NOT NULL
6 )
7 """
8)
9conn.commit()
10
11def ingest_document_once(
12 request_id: str,
13 document_id: str,
14 title: str,
15 chunk_id: str,
16 chunk_text: str,
17) -> str:
18 with conn:
19 created = conn.execute(
20 """
21 INSERT INTO ingestion_requests (request_id, document_id)
22 VALUES (?, ?)
23 ON CONFLICT(request_id) DO NOTHING
24 """,
25 (request_id, document_id),
26 ).rowcount
27 if created == 0:
28 existing_document_id = conn.execute(
29 "SELECT document_id FROM ingestion_requests WHERE request_id = ?",
30 (request_id,),
31 ).fetchone()[0]
32 if existing_document_id != document_id:
33 raise ValueError("idempotency key reused for a different document")
34 return "skipped retry"
35
36 conn.execute(
37 "INSERT INTO documents (id, workspace_id, title) VALUES (?, ?, ?)",
38 (document_id, "w01", title),
39 )
40 conn.execute(
41 "INSERT INTO chunks (id, document_id, chunk_text) VALUES (?, ?, ?)",
42 (chunk_id, document_id, chunk_text),
43 )
44 return "inserted"
45
46print(
47 "first",
48 ingest_document_once("req-30", "d30", "Key Rotation FAQ", "c30", "Temporary keys expire after 30 days."),
49)
50print(
51 "retry",
52 ingest_document_once("req-30", "d30", "Key Rotation FAQ", "c30", "Temporary keys expire after 30 days."),
53)
54stored = conn.execute(
55 "SELECT COUNT(*) FROM documents WHERE id = ?",
56 ("d30",),
57).fetchone()[0]
58print("d30 rows", stored)1first inserted
2retry skipped retry
3d30 rows 1The first call stores the request marker, document, and chunk together. The retry sees the same request key and skips the write. This example also rejects reuse of one key for a different document_id. Production APIs often store a request fingerprint too, so the same key can't silently accept a changed payload.
This pattern works when the protected side effect lives in the same database transaction. A local marker can't atomically cover a deployment API or another remote service. For an external side effect, pass the same key to a downstream API that enforces it, or use a workflow such as an outbox with an idempotent consumer.
Without an index, a database may need to inspect many rows to answer a filter. An index is an additional data structure that lets the engine reach selected rows more directly, at the cost of storage and write work.
Our authorized query repeatedly filters permissions by principal and follows document ownership. Add indexes for those paths, then ask SQLite for its query plan:
1conn.executescript(
2 """
3 CREATE INDEX permission_principal_idx
4 ON chunk_permissions (principal, chunk_id);
5 CREATE INDEX document_workspace_idx
6 ON documents (workspace_id, id);
7 """
8)
9
10authorized_sql = """
11SELECT c.id, d.title, c.chunk_text
12FROM chunks AS c
13JOIN documents AS d ON d.id = c.document_id
14JOIN chunk_permissions AS p ON p.chunk_id = c.id
15WHERE d.workspace_id = ?
16 AND p.principal = ?
17 AND LOWER(c.chunk_text) LIKE ?
18ORDER BY c.id
19"""
20
21plan = conn.execute(
22 "EXPLAIN QUERY PLAN " + authorized_sql,
23 ("w01", "agent:u3", "%key%"),
24).fetchall()
25details = " | ".join(row[3] for row in plan)
26
27print("uses permission index", "permission_principal_idx" in details)
28print("plan steps", len(plan))1uses permission index True
2plan steps 4The exact plan text can vary as engines and statistics change. The engineering habit is stable: inspect the query plan for the production query, with realistic tenant and permission selectivity, rather than assuming an index solved a latency problem.
Now assemble a function the product could call. It retrieves permitted matching paragraphs and returns document titles beside them. A later answer generator can quote or cite those rows rather than inventing policy text.
1def search_support(
2 workspace_id: str,
3 principal: str,
4 query: str,
5) -> list[tuple[str, str]]:
6 pattern = f"%{query.lower()}%"
7 return conn.execute(
8 authorized_sql,
9 (workspace_id, principal, pattern),
10 ).fetchall()
11
12for chunk_id, title, text in search_support("w01", "agent:u3", "key"):
13 print(f"{chunk_id} | {title} | {text}")1c7a | API Key Policy | API keys rotate within seven days.
2c7b | API Key Policy | Email security with your workspace ID for key rotation.This is lexical search: it looks for a word fragment. For a tiny demo, that keeps attention on durable data rules. A retrieval-augmented generation (RAG) system usually also retrieves semantically related text before a large language model (LLM) drafts an answer. That retrieval path should reuse the same source and permission model.
Tests for retrieval should include forbidden results. A system that finds the best paragraph and leaks it isn't correct.
1bob_key = [row[0] for row in search_support("w01", "agent:u3", "key")]
2bob_password = search_support("w01", "agent:u3", "password")
3wrong_workspace = [row[0] for row in search_support("w01", "agent:u3", "token")]
4unknown_user = search_support("w01", "agent:unknown", "key")
5
6assert bob_key == ["c7a", "c7b"]
7assert bob_password == []
8assert wrong_workspace == []
9assert unknown_user == []
10
11print("four retrieval boundary tests passed")1four retrieval boundary tests passedThat final test is short, but it guards the central failure mode: context must be relevant and authorized before it reaches an LLM prompt.
SQLite gave us a fully runnable lab. Production systems often move shared workloads to PostgreSQL. When keyword matching is insufficient, the pgvector extension adds a vector column and distance operators. A vector is a list of numbers representing an embedding; nearby embeddings can capture semantic similarity even when exact words differ.
This sketch uses three dimensions only so you can read it. In a real system, choose the dimension required by your embedding model and store/query vectors of that same dimension.
1CREATE EXTENSION vector;
2
3ALTER TABLE chunks ADD COLUMN embedding vector(3);
4
5-- Optional once exact scans no longer meet the latency target:
6CREATE INDEX chunks_embedding_hnsw
7 ON chunks USING hnsw (embedding vector_cosine_ops);
8
9SELECT c.id,
10 d.title,
11 c.chunk_text,
12 1 - (c.embedding <=> $1) AS cosine_similarity
13FROM chunks AS c
14JOIN documents AS d ON d.id = c.document_id
15WHERE d.workspace_id = $2
16 AND c.embedding IS NOT NULL
17 AND EXISTS (
18 SELECT 1
19 FROM chunk_permissions AS p
20 WHERE p.chunk_id = c.id
21 AND p.principal = $3
22 )
23ORDER BY c.embedding <=> $1
24LIMIT 3;In pgvector, <=> means cosine distance, so 1 - distance is cosine similarity. The query works without HNSW: pgvector performs exact nearest-neighbor search by default. Add an HNSW index when exact scans no longer meet the latency target, then measure the recall tradeoff introduced by approximate search. The IS NOT NULL guard keeps chunks that are still waiting for embeddings out of semantic ranking; pgvector doesn't index null vectors.[2]
Filtered search has one caveat. With an approximate vector index, filtering conditions can be applied after the index scans candidates. If tenant or permission filters reject many candidates, a query may return fewer rows than requested even when allowed matches exist. pgvector documents iterative index scans for continuing the search when filtering removes too many candidates; that capability arrived in version 0.8.0.[2][3]
The rule is precise:
You started with four tables, not a production vector stack:
| Skill | Why it matters in an AI system |
|---|---|
| Model documents, chunks, and permissions as rows | Context has ownership and provenance |
| Use primary and foreign keys | Broken citation links fail early |
| Bind parameters rather than formatting SQL | Inputs can't change query structure |
| Join source and permission tables | Retrieval returns only attributable, allowed text |
| Wrap multi-row ingestion in a transaction | Partially written context stays hidden |
| Store a unique idempotency key with retried writes | Completed work doesn't run twice after a timeout |
| Inspect indexes and plans | Performance is measured, not assumed |
| Add vectors after boundaries are correct | Semantic rank preserves access rules |
SELECT and a JOIN that returns chunk text beside its source document.workspace_id. Fix: keep both ownership and permission predicates in every context query.chunks.document_id to documents.id.Answer every question, then check your score. Score above 75% to mark this lesson complete.
8 questions remaining.