Build a production-grade support-ticket RAG system on Postgres + pgvector. Design schemas that mix normalized merchant data with denormalized chunks and embeddings, enforce ACLs and metadata filters, choose the right indexes, and run hybrid SQL + vector queries from Python.
Imagine a support agent at Acme Co. (merchant ID m01) receives a live chat: "I ordered the blue widget on Tuesday and the refund never arrived. Order #4821."
The agent needs the merchant's exact refund policy, the specific ticket history for that customer, the order status from the transactional system, and only the documents and tickets that this agent (or the end customer) is allowed to read. A naive LIKE '%refund%' scan across every row in a 4-million-ticket corpus takes 800 ms and returns garbage. The LLM that will draft the reply needs the right three paragraphs in the right order, with citations it can trust, in under 120 ms.
That is the job of a carefully designed SQL schema with vector search.
Modern AI applications are hybrid systems. The merchant's orders, payments, and user accounts must remain strongly consistent (ACID). The retrieval layer that feeds the LLM (RAG) must be fast, filterable by tenant, and permission-aware. The data model has to satisfy both workloads at the same time.[1]
This chapter teaches you exactly how to do it with Postgres + pgvector, one of the most widely deployed combinations for production RAG systems.
| Course position | What to carry |
|---|---|
| Before this | You understand in-memory data structures (lists, dicts of posting lists, heaps, caches) from the previous lesson and can write simple Python that talks to a database. |
| This chapter | You will design durable tables that mix normalized merchant records with denormalized chunks + 1536-dimensional embeddings, enforce ACLs, pick the right indexes, and query them from Python. |
| Next chapter | Algorithms & Complexity for ML Engineers teaches you how to reason about the time, space, and memory-bandwidth costs of the retrieval and inference code that will sit on top of the tables you just designed. |
We will use a single merchant (m01 = Acme) with two source documents and three historical support tickets. All numbers are tiny so you can compute everything by hand before we scale to production.
Source documents (the knowledge base the merchant uploads)
| doc_id | title | full_text (abbreviated) |
|---|---|---|
| d7 | Refund Policy | "Refunds are granted within 30 days for unused items... Email [email protected] with order number..." |
| d9 | Account & Password Help | "To reset your password visit /account/reset. Two-factor is required for Pro plans..." |
Support tickets (the conversational history)
| ticket_id | merchant_id | customer_id | text | status |
|---|---|---|---|---|
| t1 | m01 | u42 | "Where is my refund for order 4821?" | open |
| t3 | m01 | u7 | "Password reset link expired again" | resolved |
| t5 | m01 | u19 | "Can I get a refund on the blue widget?" | open |
Two agents:
- Alice (u1) works for Acme and can see everything.
- Bob (u3) is a contractor who can only see refund-related tickets (ACL restricted).

We will build a schema that lets Alice ask "show me recent refund questions" and get three perfectly cited chunks in < 15 ms, while Bob asking the same question only sees the chunks he is allowed to read.
Start with the tables that must be correct for billing, compliance, and order integrity.
```sql
CREATE TABLE merchants (
    id          TEXT PRIMARY KEY,
    name        TEXT NOT NULL,
    plan        TEXT NOT NULL CHECK (plan IN ('starter','pro','enterprise')),
    created_at  TIMESTAMPTZ DEFAULT now()
);

CREATE TABLE support_documents (
    id          TEXT PRIMARY KEY,
    merchant_id TEXT REFERENCES merchants(id) ON DELETE CASCADE,
    title       TEXT NOT NULL,
    full_text   TEXT NOT NULL,
    updated_at  TIMESTAMPTZ DEFAULT now()
);

CREATE TABLE tickets (
    id          TEXT PRIMARY KEY,
    merchant_id TEXT REFERENCES merchants(id),
    customer_id TEXT NOT NULL,
    text        TEXT NOT NULL,
    status      TEXT NOT NULL,
    created_at  TIMESTAMPTZ DEFAULT now()
);
```
This is classic third-normal form. If Acme changes its name, one UPDATE fixes it for every ticket and document. If a merchant is deleted, its tickets and documents disappear (or are archived) automatically. The transactional side of the business is happy.
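If you want to compute everything by hand, here is a minimal seed script for the toy dataset from the tables earlier (the `plan` value is an assumption; the document texts are abbreviated exactly as shown above):

```sql
INSERT INTO merchants (id, name, plan) VALUES ('m01', 'Acme Co.', 'pro');

INSERT INTO support_documents (id, merchant_id, title, full_text) VALUES
  ('d7', 'm01', 'Refund Policy',
   'Refunds are granted within 30 days for unused items... Email [email protected] with order number...'),
  ('d9', 'm01', 'Account & Password Help',
   'To reset your password visit /account/reset. Two-factor is required for Pro plans...');

INSERT INTO tickets (id, merchant_id, customer_id, text, status) VALUES
  ('t1', 'm01', 'u42', 'Where is my refund for order 4821?',      'open'),
  ('t3', 'm01', 'u7',  'Password reset link expired again',       'resolved'),
  ('t5', 'm01', 'u19', 'Can I get a refund on the blue widget?',  'open');
```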
An LLM cannot read a 12-page refund policy in one shot, and the policy is only useful when the right paragraph is retrieved. We therefore split documents into chunks (typically 200–800 tokens) and store an embedding for each chunk.
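As an illustration, here is a minimal chunker sketch. It uses fixed word windows as a stand-in for real token counting (a production system would count tokens, e.g. with tiktoken); the `chunk_words` and `overlap` parameters are illustrative:

```python
def chunk_document(full_text: str, chunk_words: int = 300, overlap: int = 50) -> list[str]:
    """Split a document into overlapping fixed-size word windows.

    Word windows approximate token windows; overlap keeps sentences
    from being cut cleanly in half at chunk boundaries.
    """
    words = full_text.split()
    chunks, start = [], 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + chunk_words]))
        start += chunk_words - overlap
    return chunks
```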
We could have put the embedding column on support_documents, but then we would be forced to re-embed the entire document every time any sentence changes, and we would lose the ability to cite the exact paragraph that answered the question.
Hence the document_chunks table.
```sql
CREATE EXTENSION IF NOT EXISTS vector;  -- pgvector

CREATE TABLE document_chunks (
    id           BIGSERIAL PRIMARY KEY,
    document_id  TEXT REFERENCES support_documents(id) ON DELETE CASCADE,
    merchant_id  TEXT NOT NULL,               -- denormalized for fast filtering
    chunk_index  INT NOT NULL,
    chunk_text   TEXT NOT NULL,
    embedding    VECTOR(1536) NOT NULL,       -- OpenAI text-embedding-3-small size
    metadata     JSONB NOT NULL DEFAULT '{}', -- { "category": "refund", "source": "policy", "last_embedded": "..." }
    acl          TEXT[] NOT NULL DEFAULT '{}',-- user IDs or role strings that may read this chunk
    created_at   TIMESTAMPTZ DEFAULT now(),
    UNIQUE (document_id, chunk_index)         -- required by the ON CONFLICT upsert below
);

-- Critical indexes (we explain each in the next section)
CREATE INDEX idx_chunks_merchant ON document_chunks (merchant_id);
CREATE INDEX idx_chunks_acl_gin ON document_chunks USING GIN (acl);

-- HNSW is a strong default for high-recall, low-latency vector search
CREATE INDEX idx_chunks_embedding ON document_chunks
    USING hnsw (embedding vector_cosine_ops);
```
Key modeling decisions:
- merchant_id is duplicated (denormalized) into the chunks table. A tenant-scoped vector query usually wants "only this tenant's data". A join at query time can be much slower and may prevent the planner from using the vector index well.
- metadata is JSONB. You can add category, language, version, page_number, or any future field without ALTER TABLE.
- acl is a text array. A Postgres GIN index on arrays gives you fast acl @> ARRAY['u1'] checks (see the query sketch just after this list).
- embedding is the real vector(1536) type from pgvector. You get the distance operators for free.
This single table now serves both the RAG retrieval path and the permission system.
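For example, here are hypothetical filter queries against this table (the 'contractor' role string is an assumption; the IDs and the refund category come from the toy dataset):

```sql
-- Containment check: every listed entry must be present in the ACL
SELECT id FROM document_chunks
WHERE merchant_id = 'm01'
  AND acl @> ARRAY['u1'];              -- served by the GIN index

-- Overlap check: the reader matches if ANY of their IDs/roles appear
SELECT id FROM document_chunks
WHERE merchant_id = 'm01'
  AND acl && ARRAY['u3', 'contractor'];

-- JSONB metadata filter: only refund-related chunks
SELECT id, chunk_text FROM document_chunks
WHERE merchant_id = 'm01'
  AND metadata->>'category' = 'refund';
```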
| Workload | Normalized (join) | Denormalized (dupe merchant_id + metadata) | Winner for AI |
|---|---|---|---|
| Update merchant name | One row | Must update N chunk rows or use trigger | Normalized |
| "Give me all refund chunks for merchant m01" | Requires join + filter | Direct btree filter on merchant_id | Denormalized hot path, usually easier to optimize |
| Change ACL on a policy paragraph | Update one document row, trigger updates chunks | Same, but you control the trigger | Either |
| Vector similarity query + tenant filter + ACL filter | Planner may choose bad join order | All filters on the same table; HNSW + btree used together | Denormalized |
| Storage | Lower | ~15–25 % higher | Normalized |
Rule of thumb for serious RAG systems:
- Keep the transactional source-of-truth tables (merchants, documents, tickets) normalized.
- Copy the hot filter columns (merchant_id, acl, category, last_updated) into the chunks table.
The combination that works:
- A btree index on merchant_id - filters ~99 % of the corpus down to a single merchant before the vector index even runs.
- A GIN index on acl - fast array containment and overlap checks.
- An HNSW index on embedding - approximate nearest-neighbor search over what remains.

```sql
-- You can also combine them in one composite index in newer Postgres + pgvector,
-- but separate indexes + bitmap AND in the planner is often clearer.
```
Never create only the vector index. The planner will fall back to a sequential scan on any query that also filters by merchant_id or acl.
When a new refund policy document arrives:
```sql
BEGIN;
INSERT INTO support_documents ...;
INSERT INTO document_chunks (document_id, merchant_id, chunk_index, chunk_text, embedding, metadata, acl)
VALUES (...), (...), ...;
COMMIT;
```
Generate embeddings before opening the transaction if the external call is slow or flaky, then write the document row and chunk rows in one database transaction. Postgres + pgvector then guarantee that either the whole policy (with its vectors) is visible or none of it is.
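A sketch of that ordering, assuming the schema above (embed() stands in for whatever embedding API wrapper you use; it is a hypothetical name):

```python
# 1. Slow, fallible network calls happen OUTSIDE the transaction
vectors = [embed(text) for text in chunk_texts]  # hypothetical embed() wrapper

# 2. The database writes happen atomically: all rows or none
with conn:
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO support_documents (id, merchant_id, title, full_text) "
            "VALUES (%s, %s, %s, %s)",
            (doc_id, merchant_id, title, full_text))
        for i, (text, vec) in enumerate(zip(chunk_texts, vectors)):
            cur.execute(
                "INSERT INTO document_chunks "
                "(document_id, merchant_id, chunk_index, chunk_text, embedding) "
                "VALUES (%s, %s, %s, %s, %s)",
                (doc_id, merchant_id, i, text, vec))
```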
For the "re-embed everything because we switched to a better embedding model" job you can relax:
- Add a corpus_version column to document_chunks.
- Backfill the new embeddings with corpus_version = 2 while the old rows keep serving reads.
- The retrieval query filters with AND corpus_version = (SELECT max FROM ...) or uses a view.

This is the classic "strong consistency on the write path, eventual on the refresh path" pattern.
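A sketch of that pattern against the schema above (the view name is illustrative):

```sql
-- 1. Add the version column; existing rows default to version 1
ALTER TABLE document_chunks ADD COLUMN corpus_version INT NOT NULL DEFAULT 1;

-- 2. The re-embedding job writes the new model's vectors as version 2
--    while the version-1 rows keep serving reads.

-- 3. Point readers at one version via a view. In production you would
--    flip to the new version only after the backfill completes, e.g. by
--    tracking the active version in a one-row settings table.
CREATE VIEW current_chunks AS
SELECT *
FROM document_chunks
WHERE corpus_version = (SELECT max(corpus_version) FROM document_chunks);
```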
Install the official pgvector helper (it registers the vector type correctly):
```bash
pip install "psycopg2-binary>=2.9" pgvector
```
Python loader sketch (the exact script you will adapt in your repo):
```python
import os

import numpy as np
import psycopg2
from psycopg2.extras import Json              # adapts Python dicts to JSONB
from pgvector.psycopg2 import register_vector

conn = psycopg2.connect(
    host="localhost", dbname="acme_rag", user="rag_app",
    password=os.environ["DB_PASSWORD"],
)
register_vector(conn)  # teaches psycopg2 to adapt np.ndarray <-> vector(1536)


def upsert_chunks(chunks: list[dict]):
    """chunks = [{"doc_id": "d7", "merchant_id": "m01", "chunk_index": 0,
                  "text": "...", "embedding": np.array([...], dtype=np.float32),
                  "metadata": {"category": "refund"}, "acl": ["u1", "u3"]}, ...]"""
    with conn:  # one transaction: all chunks become visible together or not at all
        with conn.cursor() as cur:
            for c in chunks:
                cur.execute("""
                    INSERT INTO document_chunks
                        (document_id, merchant_id, chunk_index, chunk_text, embedding, metadata, acl)
                    VALUES (%s, %s, %s, %s, %s, %s, %s)
                    ON CONFLICT (document_id, chunk_index) DO UPDATE
                        SET embedding = EXCLUDED.embedding,
                            metadata  = EXCLUDED.metadata,
                            acl       = EXCLUDED.acl
                """, (c["doc_id"], c["merchant_id"], c["chunk_index"], c["text"],
                      c["embedding"], Json(c["metadata"]), c["acl"]))


def retrieve(merchant_id: str, user_acls: list[str], query_vec: np.ndarray, k: int = 5):
    with conn.cursor() as cur:
        cur.execute("""
            SELECT chunk_text,
                   metadata,
                   document_id,
                   chunk_index,
                   1 - (embedding <=> %s) AS cosine_sim
            FROM document_chunks
            WHERE merchant_id = %s
              AND acl && %s           -- overlap: at least one ACL entry matches
            ORDER BY embedding <=> %s
            LIMIT %s
        """, (query_vec, merchant_id, user_acls, query_vec, k))
        return cur.fetchall()


print("document_chunks loader and retrieval helpers are ready")
```
The <=> operator is cosine distance (1 − cosine similarity). The && operator on arrays is the fast overlap check that uses the GIN index.
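A quick sanity check you can run in psql, using 3-dimensional toy vectors so the arithmetic is obvious:

```sql
-- Orthogonal vectors: cosine similarity 0, so cosine distance 1
SELECT '[1,0,0]'::vector(3) <=> '[0,1,0]'::vector(3);  -- 1
-- Same direction: cosine similarity 1, so cosine distance 0
SELECT '[1,0,0]'::vector(3) <=> '[2,0,0]'::vector(3);  -- 0
-- Array overlap, as used for the ACL check
SELECT ARRAY['u1','u3'] && ARRAY['u3'];                -- true
```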
The query returns the exact text, the metadata you stored, and the source document + chunk index so the LLM can emit citations like [Policy §2, Ticket t1].
Run EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) on your retrieve query. You should see:
```text
Limit  (cost=12.34..45.67 rows=5 width=...)
  ->  Index Scan using idx_chunks_embedding on document_chunks
        Order By: (embedding <=> $1)
        Filter: ((merchant_id = $2) AND (acl && $3))
        Rows Removed by Filter: 124
```
If you see Seq Scan instead of Index Scan, you are missing one of the three indexes or the planner statistics are stale (ANALYZE document_chunks;).
Depending on selectivity, a good plan either walks the HNSW index in distance order and post-filters (as in the plan above), or uses the btree/GIN indexes first to shrink the candidate set to a few thousand rows before ranking. Either is fine; what you are avoiding is an unfiltered sequential scan over every chunk.
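Two knobs worth knowing while you read plans (hnsw.ef_search is pgvector's HNSW search-width setting; its default is 40):

```sql
-- Refresh planner statistics after a bulk load
ANALYZE document_chunks;

-- Widen the HNSW candidate list when post-filtering discards many rows
-- (higher = better recall, more latency)
SET hnsw.ef_search = 100;
```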
In your local repo (the same one you created in the Git, Docker, and Python foundation chapters):
1. Add pgvector to your docker-compose.yml Postgres service.
2. Create a migrations/003_rag_schema.sql file with the CREATE EXTENSION and the three tables + indexes above.
3. Write scripts/load_support_rag.py that reads the .md files from fixtures/support_docs/, chunks them, calls an embedding function that returns 1536-dim vectors (a stub, sketched below, or sentence-transformers if you have it), and loads everything with upsert_chunks.
4. Write scripts/query_rag.py that takes --merchant m01 --user u1 "where is my refund" and prints the top-3 chunks with similarity and source.
5. Run docker compose up and show that Bob (u3) cannot see the password-reset chunk while Alice can.

Expected output: Alice sees three refund or policy chunks with citation IDs, while Bob sees zero chunks for the restricted password-reset document. If Bob sees it, the solution failed the ACL overlap check. If Alice sees no rows, inspect the merchant filter, vector dimension, and register_vector(conn) call before blaming the LLM.
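If you don't want to call a real embedding model in the lab, a deterministic stub is enough to exercise the schema, indexes, and ACL filters. This sketch hashes the text into a random seed, so identical text always maps to the same vector (purely illustrative; it carries no semantic meaning):

```python
import hashlib

import numpy as np


def stub_embed(text: str, dim: int = 1536) -> np.ndarray:
    """Deterministic fake embedding: same text -> same vector.

    Good enough to test schema, indexes, and ACLs;
    useless for real semantic similarity.
    """
    seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")
    rng = np.random.default_rng(seed)
    vec = rng.standard_normal(dim).astype(np.float32)
    return vec / np.linalg.norm(vec)  # unit-normalize for cosine distance
```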
When scripts/query_rag.py returns the correct three paragraphs with citations and the ACL test passes for both agents, you have built the exact data layer that every later RAG, agent, and evaluation chapter depends on.
- Forgetting register_vector(conn) → psycopg2 fails with can't adapt type 'numpy.ndarray'.
- Using a float[] column instead of vector(1536) → the distance operator falls back to Python and is 1000× slower.
- Missing the btree index → WHERE merchant_id = ... becomes a sequential scan.
- Storing ACLs as a comma-separated string instead of text[] → the GIN index doesn't help and @> degrades to a slow LIKE.
- The core pattern to remember: a single denormalized document_chunks table with merchant_id, acl[], metadata jsonb, and vector for the read-heavy retrieval path.

You can now look at any AI product and say: "Show me the schema for the retrieval layer" and immediately know whether it will scale, whether it will leak data between tenants, and whether a failed embedding job will corrupt answers.
You can now:

- Write a psycopg2 + pgvector query that returns authorized, ranked chunks for a given merchant and user.
- Read an EXPLAIN ANALYZE plan and point to the three indexes that must exist.

You now understand the durable, permissioned, vector-augmented data layer that stores every training example, every RAG corpus, and every evaluation label your models will ever see. The next chapter teaches you how to measure the exact time, space, and memory-bandwidth cost of every query, every cache, and every forward pass that will run against those tables. Together they let you build retrieval and inference code that is both correct and efficient.