Turn the access-request scorer into a portable Docker image with .dockerignore, multi-stage builds, volume-mounted eval data, runtime secrets, docker compose, and a gate that reproduces the same 0.667 score before later GPU work.
Most AI engineering still fails at the boundary between "my laptop" and "any other machine". Git can keep files in sync, but it can't pin the Python version, OS packages, runtime users, filesystem layout, or how secrets arrive. Docker matters because it turns those machine assumptions into one portable runtime contract that teammates, CI, and production can all run the same way.
You finished the Git chapter. The repo now has a tiny starter shape: eval/access_requests.jsonl, shell guards around repo hygiene, and a visible rule: three access-request rows go in, exact-match accuracy should come out as 0.667. You don't need to understand each line of the scorer yet. The later Python chapter will build that code carefully from scratch. Here, the job is narrower: make sure the same command runs in the same environment on each machine.
Files plus shell guards still leave one gap: runtime drift. Another engineer can clone the same repo and hit a different Python version, a missing dependency, a leaked .env, or a volume permission error. Later, the same project will add embeddings, a vector database, and a serving API. If the runtime isn't pinned early, later chapters get harder to reproduce.
Docker adds the next contract: the repo now states both the files and the runtime that runs them.[1][2]
Check your reasoning: if two people run the same files with different Python versions and different installed packages, Git did its job but the project still isn't reproducible.
Git locked the files. Docker locks Python version, packages, users, mounts, and secrets so the same command runs the same way elsewhere. The later Python chapter will harden the implementation and the checks around it.
A useful Docker setup answers six questions before a teammate has to ask them.
| Boundary | Question | Repo contract |
|---|---|---|
| Base image | Which Python and OS run the code? | A pinned base image such as python:3.12-slim-trixie |
| Dependencies | Which packages are installed? | requirements.txt copied before application code |
| Build context | Which files enter the image build? | .dockerignore that excludes secrets, caches, model weights, and virtualenvs |
| Runtime user | Who owns files inside the container? | A non-root user and explicit writable directories |
| Data | Where does eval data live? | Tiny fixtures can be copied; changing data and model caches should be mounted |
| Secrets | How does the API key arrive? | Runtime --env-file or Compose env_file, not ARG or COPY .env |
This is the same idea as the previous chapter. Git made the files reproducible. Docker makes the environment portable. The later Python chapter will make the implementation and tests explicit.
Create .dockerignore at the root of access-rag/, next to .gitignore:
```
# .dockerignore - keep the build context tiny and secret-free
# Patterns without **/ only match at the context root, so files that can
# appear in subdirectories need a recursive **/ glob.
.env
.env.*
**/*.pem
secrets/
.git/
.github/
.venv/
**/__pycache__/
**/*.py[cod]
**/*.egg-info/
node_modules/
models/
**/*.gguf
**/*.safetensors
**/*.bin
chroma/
faiss_index/
**/*.db
runs/
eval_cache/
wandb/
mlruns/
**/.DS_Store
.idea/
.vscode/
**/*.swp
```

Docker reads this file before sending the build context to the builder. Without it, every file in the project directory is sent, and a broad COPY . /app can bake .env, cached model files, local indexes, notebooks, and virtualenvs into the image. That makes images slower to build and easier to leak.
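A cheap way to keep the secret-related lines from silently disappearing is a repo check that runs in CI or a pre-commit hook. The helper below is a hypothetical sketch, not part of the course repo: it only checks that the required patterns appear verbatim as lines, and it does not reimplement Docker's pattern matching.

```python
# Hypothetical repo-hygiene helper (not part of the course repo).
# Assumption: required patterns must appear verbatim as .dockerignore lines;
# this does NOT reimplement Docker's full pattern matching.
from __future__ import annotations

REQUIRED_PATTERNS = (".env", ".env.*", ".git/", ".venv/")


def missing_patterns(dockerignore_text: str, required=REQUIRED_PATTERNS) -> list[str]:
    """Return required patterns that are absent from the .dockerignore text."""
    present = {
        line.strip()
        for line in dockerignore_text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")  # skip blanks and comments
    }
    return [pattern for pattern in required if pattern not in present]
```

Call it as missing_patterns(Path(".dockerignore").read_text(encoding="utf-8")) and fail the job when the returned list is non-empty.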
The Dockerfile below expects a dependency file and an importable scorer module. Create both before you build the image.
For now, requirements.txt can be empty. The file still matters because the Docker layer cache will treat dependency changes separately from code changes.
```bash
touch requirements.txt
mkdir -p scripts eval
```

Add the three-row fixture if it isn't already present from the previous chapter:
```bash
cat > eval/access_requests.jsonl << 'EOF'
{"prompt": "Access request 101 status?", "expected": "approved", "prediction": "approved"}
{"prompt": "Access request 102 status?", "expected": "blocked", "prediction": "escalated"}
{"prompt": "Access request 103 status?", "expected": "restored", "prediction": "restored"}
EOF
```

Now add a small starter scorer. The Python chapter later in the course will turn this into a cleaner typed module; this version exists so Docker has something real to run today.
```python
# scripts/score.py
from __future__ import annotations

import json
import sys
from pathlib import Path

def normalize(label: str) -> str:
    # Compare labels case- and whitespace-insensitively.
    return label.strip().lower()

def main() -> int:
    # The eval file can be passed as the first argument; default to the fixture.
    path = Path(sys.argv[1] if len(sys.argv) > 1 else "eval/access_requests.jsonl")
    rows = [json.loads(line) for line in path.read_text(encoding="utf-8").splitlines() if line.strip()]
    if not rows:
        raise SystemExit("No evaluation rows found")

    print(f"Loading {len(rows)} evaluation rows from {path}")

    correct = 0
    for index, row in enumerate(rows, start=1):
        expected = normalize(row["expected"])
        prediction = normalize(row["prediction"])
        if expected == prediction:
            correct += 1
            print(f"Row {index}: exact match")
        else:
            print(f"Row {index}: mismatch (expected {expected}, got {prediction})")

    accuracy = correct / len(rows)
    print(f"Exact-match accuracy: {accuracy:.3f} ({correct}/{len(rows)})")
    # Gate on integer counts, not the rounded float: this fixture must stay at 2 of 3.
    if correct != 2 or len(rows) != 3:
        raise SystemExit("Eval regression detected")

    print("Gate passed.")
    return 0

if __name__ == "__main__":
    raise SystemExit(main())
```

Start with a CPU image. That choice is deliberate: the three-row scorer doesn't need a GPU, and a beginner should be able to prove the container contract on a normal laptop before adding NVIDIA runtime setup.
Create this Dockerfile:
```dockerfile
# syntax=docker/dockerfile:1
FROM python:3.12-slim-trixie AS builder

ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1 \
    PIP_NO_CACHE_DIR=1 \
    PIP_DISABLE_PIP_VERSION_CHECK=1

# Install dependencies into a self-contained virtualenv.
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

# Copy only the dependency list first so code edits don't invalidate this layer.
COPY requirements.txt /tmp/requirements.txt
RUN python -m pip install --upgrade pip setuptools wheel \
    && python -m pip install --no-cache-dir -r /tmp/requirements.txt

FROM python:3.12-slim-trixie AS runtime

ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1 \
    PATH="/opt/venv/bin:$PATH"

# Unprivileged user with a fixed UID and no login shell.
RUN useradd --create-home --no-log-init --user-group --uid 10001 --shell /usr/sbin/nologin appuser

WORKDIR /app

# Take the finished virtualenv from the builder stage, then add the app files.
COPY --from=builder /opt/venv /opt/venv
COPY scripts/ scripts/
COPY eval/ eval/

USER appuser

ENTRYPOINT ["python", "-m", "scripts.score"]
CMD ["eval/access_requests.jsonl"]
```

The first stage installs dependencies into /opt/venv. The second stage starts from a clean Python image, copies the virtualenv out of the builder stage with COPY --from=builder, adds the tiny app files, and runs as appuser instead of root.
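To confirm that the USER line took effect, ask the process for its effective user ID. The helper below is a hypothetical sketch, not part of the course repo; os.geteuid exists only on POSIX systems, which covers the Linux container.

```python
# Hypothetical sanity check (not part of the course repo): is this process root?
# os.geteuid() is POSIX-only, which is fine inside a Linux container.
from __future__ import annotations

import os


def running_as_root(euid: int | None = None) -> bool:
    """Return True when the effective user ID is 0 (root)."""
    if euid is None:
        euid = os.geteuid()
    return euid == 0
```

Against the built image, docker run --rm --entrypoint python access-rag:local -c "import os; print(os.geteuid())" should print 10001, the UID passed to useradd in the Dockerfile.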
The 3.12 tag pins the Python minor version, and the -slim-trixie suffix pins the OS variant: slim drops build tooling and docs for a smaller surface, and trixie names Debian 13, the current base for the official Python images since its release in August 2025.[1] Pin the distribution explicitly rather than relying on a bare python:3.12-slim; a floating tag silently follows whichever Debian release is newest and can shift system libraries such as OpenSSL and glibc under your code. The 3.12 tag itself still moves across patch releases, so for reproducibility that must survive months, go one step further and pin the image digest (python:3.12-slim-trixie@sha256:...) so the same bytes resolve on every machine and in CI.
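A small lint can enforce that pinning rule in CI. This is a hypothetical helper, not a Docker tool. It assumes the only distribution codenames this repo accepts are bookworm and trixie, treats any reference carrying an @sha256: digest as pinned, and skips FROM lines that refer to an earlier build stage.

```python
# Hypothetical lint helper (not a Docker tool): list FROM images that are
# pinned neither to a Debian codename nor to an @sha256 digest.
from __future__ import annotations

import re

# Assumption: the codenames this repo accepts as explicit distro pins.
KNOWN_CODENAMES = ("bookworm", "trixie")
FROM_RE = re.compile(
    r"^\s*FROM\s+(?:--platform=\S+\s+)?(\S+)(?:\s+AS\s+(\S+))?", re.IGNORECASE
)


def unpinned_images(dockerfile_text: str) -> list[str]:
    """Return image references whose OS layer can drift with upstream tags."""
    stages: set[str] = set()
    flagged: list[str] = []
    for line in dockerfile_text.splitlines():
        match = FROM_RE.match(line)
        if match is None:
            continue
        image, alias = match.group(1), match.group(2)
        is_stage_ref = image.lower() in stages  # e.g. FROM builder
        if alias:
            stages.add(alias.lower())
        if is_stage_ref or "@sha256:" in image:
            continue  # earlier stage, or exact bytes pinned by digest
        name = image.rsplit("/", 1)[-1]  # drop any registry/namespace prefix
        tag = name.split(":", 1)[1] if ":" in name else "latest"
        if not tag.endswith(KNOWN_CODENAMES):
            flagged.append(image)
    return flagged
```

Run against the Dockerfile above it returns an empty list, while FROM python:3.12-slim is flagged.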
Because requirements.txt is copied and installed before any application code, later dependency additions such as FastAPI, Chroma, tokenizers, or PyTorch rebuild only from the dependency layer onward. Editing scripts/score.py reuses the cached dependency layer and won't force a full package reinstall.
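To see why the ordering matters, here is a deliberately simplified model of layer caching. It is not Docker's real cache algorithm, which tracks more than this. It assumes only that each layer's key depends on the previous layer's key, the instruction text, and the bytes of any files the instruction copies, and it flattens the build into one linear chain of layers.

```python
# Toy model of layer caching, NOT Docker's real algorithm: each key hashes
# the previous key, the instruction text, and the bytes of copied files.
import hashlib


def layer_keys(steps, files):
    """Return one cache key per (instruction, copied_paths) step."""
    keys = []
    previous = ""
    for instruction, copied in steps:
        digest = hashlib.sha256()
        digest.update(previous.encode())
        digest.update(instruction.encode())
        for path in copied:
            digest.update(path.encode())
            digest.update(files[path])
        previous = digest.hexdigest()  # chain: a changed layer changes all later ones
        keys.append(previous)
    return keys


STEPS = [
    ("COPY requirements.txt /tmp/requirements.txt", ["requirements.txt"]),
    ("RUN pip install -r /tmp/requirements.txt", []),
    ("COPY scripts/ scripts/", ["scripts/score.py"]),
]

before = {"requirements.txt": b"", "scripts/score.py": b"print('v1')\n"}
after_code_edit = {**before, "scripts/score.py": b"print('v2')\n"}
after_dep_edit = {**before, "requirements.txt": b"fastapi\n"}

base = layer_keys(STEPS, before)
code = layer_keys(STEPS, after_code_edit)
deps = layer_keys(STEPS, after_dep_edit)

# Editing code keeps both dependency layers; editing requirements.txt changes every key.
print([a == b for a, b in zip(base, code)])  # [True, True, False]
print([a == b for a, b in zip(base, deps)])  # [False, False, False]
```

Swap the order of the steps so the scripts COPY comes first, and a code edit changes every key, which is the "each code edit reinstalls dependencies" failure.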
From the project root:
```bash
docker build -t access-rag:local .
```

Run the image without any host mount first:

```bash
docker run --rm access-rag:local
```

Expected output:
```text
Loading 3 evaluation rows from eval/access_requests.jsonl
Row 1: exact match
Row 2: mismatch (expected blocked, got escalated)
Row 3: exact match
Exact-match accuracy: 0.667 (2/3)
Gate passed.
```

The image ID isn't the evidence. The evidence is that the same three rows produce the same 0.667 after the code has moved into a pinned container that any machine can rebuild.
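The scorer already gates itself through its exit code, but a CI job may also want to record the counts as evidence. The sketch below is hypothetical (parse_accuracy and score_in_container are illustrative names, not part of the repo). It pulls the accuracy line out of the container's stdout and returns the exact counts rather than the rounded float.

```python
# Hypothetical CI helper: extract (correct, total) from the scorer's stdout.
from __future__ import annotations

import re
import subprocess

ACCURACY_RE = re.compile(r"Exact-match accuracy: (\d+\.\d+) \((\d+)/(\d+)\)")


def parse_accuracy(output: str) -> tuple[int, int]:
    """Return (correct, total) from scorer output, or raise ValueError."""
    match = ACCURACY_RE.search(output)
    if match is None:
        raise ValueError("no accuracy line found in scorer output")
    return int(match.group(2)), int(match.group(3))


def score_in_container(image: str = "access-rag:local") -> tuple[int, int]:
    """Run the scorer image; check=True turns a failed gate into an exception."""
    result = subprocess.run(
        ["docker", "run", "--rm", image],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_accuracy(result.stdout)
```

With the image built, assert score_in_container() == (2, 3) pins the same contract the expected output shows.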
Work the tiny eval by hand before trusting the tool:
| Row | Expected label | Predicted label | Result |
|---|---|---|---|
| 1 | approved | approved | exact match |
| 2 | blocked | escalated | mismatch |
| 3 | restored | restored | exact match |
That gives 2 correct rows out of 3. The arithmetic is:
```text
accuracy = correct / total
accuracy = 2 / 3
accuracy = 0.666...
printed as 0.667
```

This small distinction matters. A bad gate might check accuracy >= 0.667 and fail, because 2 / 3 is 0.666... and only becomes 0.667 when it is rounded for printing. A better gate checks the count (correct >= 2) or compares against the exact fraction (accuracy >= 2 / 3). The container doesn't fix bad scoring logic. It makes bad scoring logic easier to reproduce.
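The three gate styles can be compared directly. This is a minimal sketch with illustrative function names; it uses the standard-library fractions module for the exact comparison.

```python
# Compare three ways to gate the 2-of-3 result (function names are illustrative).
from fractions import Fraction


def naive_gate(correct: int, total: int) -> bool:
    # BAD: compares the float against the rounded value that gets printed.
    return correct / total >= 0.667


def count_gate(correct: int, total: int) -> bool:
    # Integer comparison: pin the fixture size and require at least 2 matches.
    return total == 3 and correct >= 2


def fraction_gate(correct: int, total: int) -> bool:
    # Exact rational comparison: no floating-point rounding involved.
    return Fraction(correct, total) >= Fraction(2, 3)


print(f"{2 / 3:.3f}")       # 0.667
print(naive_gate(2, 3))     # False
print(count_gate(2, 3))     # True
print(fraction_gate(2, 3))  # True
```

The printed 0.667 and the failing naive gate come from the same float; only the comparison changes.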
| Contract check | What proves it |
|---|---|
| Same files are present | scripts/score.py and eval/access_requests.jsonl are copied into /app |
| Same command runs | ENTRYPOINT ["python", "-m", "scripts.score"] |
| Same data can be supplied later | -v "$(pwd)/eval:/app/eval:ro" overrides /app/eval with host data |
| Same result appears | both runs print Exact-match accuracy: 0.667 (2/3) |
Now run it with the host eval directory mounted read-only (:ro). The same image should score data supplied at runtime:
```bash
docker run --rm \
  -v "$(pwd)/eval:/app/eval:ro" \
  access-rag:local \
  eval/access_requests.jsonl
```

The output should still be 0.667. If it isn't, you have a real reproducibility bug: either the mounted file differs from the copied fixture, the scorer depends on host state, or the command points at the wrong path.
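To rule out the first cause, fingerprint the fixture. The helper below is a hypothetical, cross-platform sketch using hashlib; it reads the file in chunks and returns the same lowercase hex digest that sha256sum prints.

```python
# Hypothetical diagnostic helper: fingerprint an eval file so the host copy
# can be compared with the copy baked into the image.
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: str | Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Compare sha256_of("eval/access_requests.jsonl") on the host with the hash column from docker run --rm --entrypoint sha256sum access-rag:local eval/access_requests.jsonl, run without the -v mount so it reads the image's own copy.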
After the first successful run, make one small mistake and predict the symptom before you rerun the command.
| Change | Prediction | Why |
|---|---|---|
| Rename eval/access_requests.jsonl on the host | the mounted run fails with a missing-file error | the bind mount replaces the image's /app/eval directory |
| Remove requirements.txt from the build context | the build fails at the COPY requirements.txt step | Docker can only copy files that exist in the build context |
| Add env_file: .env to Compose before creating .env | Compose reports the missing env file | declared env files must exist locally |
| Run on a machine without Docker Compose v2 | docker compose version fails with an unknown-command error | Docker Engine and the Compose plugin are separate packages on some Linux installs |
These failures are useful. Each one proves which part of the contract you were relying on.
For one service, docker run is fine. As soon as the project adds a vector database, API, worker, or model cache, you want one checked-in Compose file so each engineer starts the same stack.
Start with a runnable docker-compose.yml for the scorer:
```yaml
services:
  scorer:
    build:
      context: .
      dockerfile: Dockerfile
    image: access-rag:local
    volumes:
      - ./eval:/app/eval:ro
    command: ["eval/access_requests.jsonl"]
```

Validate the file before running it:
```bash
if docker compose version >/dev/null 2>&1; then
  docker compose config --quiet
  docker compose run --rm scorer
else
  echo "Docker Compose v2 plugin missing. Install docker-compose-plugin before using compose."
fi
```

Docker Desktop includes Compose v2. Some Linux installs need the docker-compose-plugin package first.[1] If docker compose version fails, the Dockerfile is still usable through docker run, but the Compose workflow isn't installed yet. Two modern conventions apply: the old top-level version: key is obsolete (Compose v2 ignores it with a warning, so omit it as above), and the preferred filename is now compose.yaml, though docker-compose.yml still works. Always use the space-separated docker compose command; the legacy hyphenated docker-compose v1 binary is end-of-life.
When the RAG stack grows, add vector-db, api, and worker services to this same file. Don't add fake services before they exist. A Compose file that starts today is better than an impressive YAML file that fails on the first command.
Don't pass real secrets with ARG:
```dockerfile
# BAD: the value can leak through image history and layers
ARG OPENAI_API_KEY
RUN echo "$OPENAI_API_KEY" > /tmp/key.txt
```

Use runtime environment instead:
```bash
docker run --rm \
  --env-file .env \
  -v "$(pwd)/eval:/app/eval:ro" \
  access-rag:local \
  eval/access_requests.jsonl
```

Compose uses the same idea:
```yaml
services:
  scorer:
    # Add this after you have a local .env file.
    env_file:
      - .env
```

The secret is available to the process when the container runs. It isn't copied into the image, pushed to the registry, or shown by docker history. It is still visible in docker inspect output for the container, so limit who can reach the Docker daemon.
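Because the secret now arrives at run time, the process should fail fast when it is missing and never echo its value. The sketch below is hypothetical: OPENAI_API_KEY mirrors the name in the ARG example, and the current scorer doesn't read it yet.

```python
# Hypothetical startup check: fail fast on missing runtime secrets without
# ever printing their values. OPENAI_API_KEY is only an example name.
from __future__ import annotations

import os
from collections.abc import Mapping


def missing_env(names: list[str], environ: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names that are unset or empty in the environment."""
    return [name for name in names if not environ.get(name)]


def describe(name: str, environ: Mapping[str, str] = os.environ) -> str:
    """Report presence and length only, never the value itself."""
    value = environ.get(name, "")
    return f"{name}: set ({len(value)} chars)" if value else f"{name}: missing"
```

A later API entrypoint could call missing_env(["OPENAI_API_KEY"]) at startup and exit with a clear message, so a forgotten --env-file shows up as a config error instead of a failed request.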
The CPU-first image is the right base contract here. It works on Linux, macOS, CI runners, and most developer laptops. A CUDA image is different: it targets Linux hosts with NVIDIA drivers and the NVIDIA Container Toolkit, which installs a runtime hook so the container can see the host GPU. You configure it once with sudo nvidia-ctk runtime configure --runtime=docker, restart the Docker daemon (for example, sudo systemctl restart docker), then pass --gpus at run time.[2]
When a later PyTorch or inference chapter needs GPU acceleration, keep the same contracts and change the platform-specific pieces:
| Contract | CPU scorer now | GPU workload later |
|---|---|---|
| Base image | python:3.12-slim-trixie | official PyTorch CUDA image or NVIDIA CUDA runtime image matched to the torch wheel |
| Runtime check | scorer returns 0.667 | python -c "import torch; assert torch.cuda.is_available()" |
| Run command | docker run access-rag:local | docker run --gpus all --shm-size=2g ... |
| Portability claim | same Python runtime on normal machines | same image on compatible Linux NVIDIA machines |
Don't promise that one CUDA image gives Apple Silicon and NVIDIA parity. On Apple Silicon, use the CPU image for this foundation scorer, or a separate native Metal/MPS path for PyTorch; containers on macOS run inside a Linux VM that can't reach the Apple GPU. For production GPU serving, test the Linux NVIDIA image on a host that has the NVIDIA runtime.
The first GPU smoke test is:
```bash
docker run --rm --gpus all nvidia/cuda:12.9.2-base-ubuntu24.04 nvidia-smi
```

If that fails, the Dockerfile isn't the problem yet: the host can't expose the GPU to containers.
That CUDA tag is illustrative; NVIDIA publishes new CUDA versions often, so pin to whatever release your PyTorch wheel targets rather than copying the number here. One subtle trap: the container's CUDA runtime must not be newer than the host's NVIDIA driver supports, or the container fails with CUDA driver version is insufficient for CUDA runtime version. That mismatch lives on the host, not in your Dockerfile.
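You can check for that mismatch before pulling a large image. The sketch below is hypothetical. It assumes you read the driver's maximum supported CUDA version from the "CUDA Version: X.Y" field in the nvidia-smi header, and the runtime version from the image tag you plan to use. It is a conservative check: CUDA's minor-version compatibility can relax the rule within one major version, so treat False as "verify on the host", not as a guaranteed failure.

```python
# Hypothetical pre-flight check: is the image's CUDA runtime newer than the
# host driver supports? Conservative: ignores CUDA minor-version compatibility.
from __future__ import annotations

import re


def parse_version(text: str) -> tuple[int, ...]:
    return tuple(int(part) for part in text.split("."))


def driver_max_cuda(nvidia_smi_output: str) -> tuple[int, ...]:
    """Read the highest CUDA version the driver reports in the nvidia-smi header."""
    match = re.search(r"CUDA Version:\s*(\d+\.\d+)", nvidia_smi_output)
    if match is None:
        raise ValueError("no 'CUDA Version' field in nvidia-smi output")
    return parse_version(match.group(1))


def runtime_supported(runtime: str, driver_max: tuple[int, ...]) -> bool:
    """True when the runtime's major.minor is not newer than the driver's maximum."""
    return parse_version(runtime)[:2] <= driver_max[:2]
```

For the smoke-test tag, runtime_supported("12.9.2", driver_max_cuda(header_text)) compares 12.9 against whatever the host driver reports.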
Add a local or CI gate that proves the image still builds and the score still matches:
```bash
#!/usr/bin/env bash
set -euo pipefail

# Rebuild the image and rerun the scorer; a non-zero exit fails the gate.
docker build -t access-rag:check .
docker run --rm \
  -v "$PWD/eval:/app/eval:ro" \
  access-rag:check \
  eval/access_requests.jsonl

# Validate Compose syntax only when the v2 plugin is installed.
if docker compose version >/dev/null 2>&1; then
  docker compose config --quiet
else
  echo "Docker Compose v2 plugin missing. Install docker-compose-plugin before enabling the compose gate."
fi
```

This catches broken COPY paths, missing files, invalid Compose syntax, and scorer drift before a teammate pulls the repo. If your machine doesn't have the Compose plugin yet, keep the docker build and docker run lines as the mandatory gate, then install Compose before using the Compose part.
| Symptom | Most common cause | Fix that belongs in the repo |
|---|---|---|
| ModuleNotFoundError inside the container | dependency missing from requirements.txt | add it to requirements.txt, rebuild, and keep the dependency install before COPY scripts/ |
| .env ends up inside the image | .dockerignore is missing the .env and .env.* patterns | add both patterns, rebuild, and confirm with docker run --rm --entrypoint ls access-rag:local -la /app before pushing |
| Permission denied on a mounted directory | container user can't read or write the host path | mount eval fixtures with :ro, use named volumes for writable data, or document the host permissions |
| Each code edit reinstalls dependencies | COPY . /app happens before the dependency install | copy requirements.txt first, install dependencies, then copy application code |
| could not select device driver with capabilities: [[gpu]] | NVIDIA Container Toolkit missing or host isn't NVIDIA Linux | document host GPU setup and keep the CPU scorer path working |
| Cloud Run starts but can't read config | local .env was assumed to exist in production | use Secret Manager or the deployment platform's secret mechanism, not a copied file |
The important habit isn't memorizing each Docker flag. It's making the repo contain the diagnosis, the command, and the prevention.
The runtime contract is explicit:
- git clone brings the code, the three-row eval, the scorer, the Dockerfile, .dockerignore, and docker-compose.yml.
- docker build -t access-rag:local . produces a pinned Python runtime.
- docker run --rm access-rag:local returns the expected 0.667 from the starter scorer.
- docker compose run --rm scorer gives teammates one stable local command.

The next chapter opens the Python scorer instead of leaving it hidden behind a command. Docker matters there because code bugs are hard enough without also wondering whether two machines are running different Python environments.