© 2026 LeetLLM. All rights reserved.

Docker for Reproducible AI

Turn the access-request scorer into a portable Docker image with .dockerignore, multi-stage builds, volume-mounted eval data, runtime secrets, docker compose, and a gate that reproduces the same 0.667 score before later GPU work.


Most AI engineering still fails at the boundary between "my laptop" and "any other machine". Git can keep files in sync, but it can't pin Python version, OS packages, runtime users, filesystem layout, or how secrets arrive. Docker matters because it turns those machine assumptions into one portable runtime contract that teammates, CI, and production can all run the same way.

You finished the Git chapter. The repo now has a tiny starter shape: eval/access_requests.jsonl, shell guards around repo hygiene, and a visible rule: three access-request rows go in, exact-match accuracy should come out as 0.667. You don't need to understand each line of the scorer yet. The later Python chapter will build that code carefully from scratch. Here, the job is narrower: make sure the same command runs in the same environment on each machine.

  • Git keeps files aligned.
  • Docker keeps runtime aligned.

Files plus shell guards still leave one gap: runtime drift. Another engineer can clone the same repo and hit a different Python version, a missing dependency, a leaked .env, or a volume permission error. Later, the same project will add embeddings, a vector database, and a serving API. If the runtime isn't pinned early, later chapters get harder to reproduce.

Docker adds the next contract: the repo now states both the files and the runtime that runs them.[1][2]

Check your reasoning: if two people run the same files with different Python versions and different installed packages, Git did its job but the project still isn't reproducible.

Figure: Docker reproducibility contract for the access-request scorer: repo files enter a builder stage, become a small runtime image, run with mounted eval data and runtime secrets, and reproduce the same 0.667 score with GPU handled as a later platform-specific extension.
The container turns the access-request scorer into a portable artifact. The base lesson is CPU-first so each student can run it; the GPU path keeps the same contracts but uses a platform-specific image and NVIDIA runtime checks.

Git locked the files. Docker locks Python version, packages, users, mounts, and secrets so the same command runs the same way elsewhere. The later Python chapter will harden the implementation and the checks around it.

What a container must guarantee

A useful Docker setup answers six questions before a teammate has to ask them.

| Boundary | Question | Repo contract |
| --- | --- | --- |
| Base image | Which Python and OS run the code? | A pinned base image such as python:3.12-slim-trixie |
| Dependencies | Which packages are installed? | requirements.txt copied before application code |
| Build context | Which files enter the image build? | .dockerignore that excludes secrets, caches, model weights, and virtualenvs |
| Runtime user | Who owns files inside the container? | A non-root user and explicit writable directories |
| Data | Where does eval data live? | Tiny fixtures can be copied; changing data and model caches should be mounted |
| Secrets | How does the API key arrive? | Runtime --env-file or Compose env_file, not ARG or COPY .env |

This is the same idea as the previous chapter. Git made the files reproducible. Docker makes the environment portable. The later Python chapter will make the implementation and tests explicit.

Figure: Container boundary map showing what is baked, mounted, or injected at runtime.
A good container setup answers the six boundary questions up front: what gets baked into the image, what stays on the host and mounts in, and what must only arrive at runtime.

Start with .dockerignore

Create .dockerignore at the root of access-rag/, next to .gitignore:

```
# .dockerignore - keep the build context tiny and secret-free
# Patterns are matched from the context root; a leading **/ matches at any depth.
.env
.env.*
*.pem
secrets/
.git/
.github/
.venv/
**/__pycache__/
**/*.py[cod]
*.egg-info/
node_modules/
models/
*.gguf
*.safetensors
*.bin
chroma/
faiss_index/
*.db
runs/
eval_cache/
wandb/
mlruns/
**/.DS_Store
.idea/
.vscode/
**/*.swp
```

Docker reads this file before sending the build context to the daemon. Without it, a broad COPY . /app can accidentally send .env, cached model files, local indexes, notebooks, and virtualenvs into the build. That makes images slower to build and easier to leak.
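One detail is easy to miss: unlike .gitignore, Docker matches .dockerignore patterns from the root of the build context, so a bare `*.pem` only covers files at the top level. The sketch below is a rough Python approximation of that rooted matching, meant for reasoning about a pattern list rather than replacing Docker's matcher. It deliberately ignores `**` and `!` negation, and the example paths (`.env.local`, `models/base.safetensors`, `certs/server.pem`) are hypothetical.

```python
from fnmatch import fnmatchcase

# A few patterns copied from the .dockerignore in this chapter.
PATTERNS = [".env", ".env.*", "*.pem", ".venv/", "models/", "*.safetensors"]


def is_excluded(path: str, patterns: list[str]) -> bool:
    """Return True if the leading components of `path` match a rooted pattern.

    Simplified on purpose: no `**`, no `!` negation, forward slashes only.
    """
    path_parts = path.split("/")
    for pattern in patterns:
        pattern_parts = pattern.strip("/").split("/")
        if len(pattern_parts) > len(path_parts):
            continue
        # zip stops at the shorter list, so only the path's leading
        # components are compared against the pattern's components.
        if all(fnmatchcase(part, pat) for part, pat in zip(path_parts, pattern_parts)):
            return True
    return False


# Hypothetical paths, chosen to show one hit and one miss per idea.
candidates = [
    ".env",
    ".env.local",
    "scripts/score.py",
    ".venv/bin/python",
    "models/base.safetensors",
    "certs/server.pem",
]
for path in candidates:
    verdict = "excluded" if is_excluded(path, PATTERNS) else "sent to the daemon"
    print(f"{path}: {verdict}")
```

In this model `certs/server.pem` is still sent to the daemon, because `*.pem` is only tested against the first path component. In a real .dockerignore, `**/*.pem` is the pattern that covers nested files.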

Add the tiny files the image expects

The Dockerfile below expects a dependency file and an importable scorer module. Create both before you build the image.

For now, requirements.txt can be empty. The file still matters because the Docker layer cache will treat dependency changes separately from code changes.

```bash
touch requirements.txt
mkdir -p scripts eval
```

Add the three-row fixture if it isn't already present from the previous chapter:

```bash
cat > eval/access_requests.jsonl << 'EOF'
{"prompt": "Access request 101 status?", "expected": "approved", "prediction": "approved"}
{"prompt": "Access request 102 status?", "expected": "blocked", "prediction": "escalated"}
{"prompt": "Access request 103 status?", "expected": "restored", "prediction": "restored"}
EOF
```

Now add a small starter scorer. The Python chapter later in the course will turn this into a cleaner typed module; this version exists so Docker has something real to run today.

```python
# scripts/score.py
from __future__ import annotations

import json
import sys
from pathlib import Path

def normalize(label: str) -> str:
    # Compare labels case-insensitively and ignore surrounding whitespace.
    return label.strip().lower()

def main() -> int:
    # The first CLI argument overrides the default fixture path.
    path = Path(sys.argv[1] if len(sys.argv) > 1 else "eval/access_requests.jsonl")
    rows = [json.loads(line) for line in path.read_text(encoding="utf-8").splitlines() if line.strip()]
    if not rows:
        raise SystemExit("No evaluation rows found")

    print(f"Loading {len(rows)} evaluation rows from {path}")

    correct = 0
    for index, row in enumerate(rows, start=1):
        expected = normalize(row["expected"])
        prediction = normalize(row["prediction"])
        if expected == prediction:
            correct += 1
            print(f"Row {index}: exact match")
        else:
            print(f"Row {index}: mismatch (expected {expected}, got {prediction})")

    accuracy = correct / len(rows)
    print(f"Exact-match accuracy: {accuracy:.3f} ({correct}/{len(rows)})")
    # Starter gate: the three-row fixture must score exactly 2 of 3.
    if correct != 2 or len(rows) != 3:
        raise SystemExit("Eval regression detected")

    print("Gate passed.")
    return 0

if __name__ == "__main__":
    raise SystemExit(main())
```

Build the smallest runnable scorer image

Start with a CPU image. That choice is deliberate. The three-row scorer doesn't need a GPU, and a beginner should be able to prove the container contract on a normal laptop before adding NVIDIA runtime setup.

Create this Dockerfile:

```dockerfile
# syntax=docker/dockerfile:1
FROM python:3.12-slim-trixie AS builder

ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1 \
    PIP_NO_CACHE_DIR=1 \
    PIP_DISABLE_PIP_VERSION_CHECK=1

RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

COPY requirements.txt /tmp/requirements.txt
RUN python -m pip install --upgrade pip setuptools wheel \
    && python -m pip install --no-cache-dir -r /tmp/requirements.txt

FROM python:3.12-slim-trixie AS runtime

ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1 \
    PATH="/opt/venv/bin:$PATH"

RUN useradd --create-home --no-log-init --user-group --uid 10001 --shell /usr/sbin/nologin appuser

WORKDIR /app

COPY --from=builder /opt/venv /opt/venv
COPY --chown=appuser:appuser scripts/ scripts/
COPY --chown=appuser:appuser eval/ eval/

USER appuser

ENTRYPOINT ["python", "-m", "scripts.score"]
CMD ["eval/access_requests.jsonl"]
```

The first stage installs dependencies into /opt/venv. The second stage starts from a clean Python image, copies the virtualenv plus the tiny app files, and runs as appuser instead of root.

The -slim-trixie suffix pins both the Python version and the OS: slim drops build tooling and docs for a smaller surface, and trixie names Debian 13, the current base for the official Python images since its release in August 2025.[1] Pin the distribution explicitly rather than relying on a bare python:3.12-slim; a floating tag silently follows whatever Debian release is newest and can shift system libraries (OpenSSL, glibc, compilers) under your code. For reproducibility that must survive months, go one step further and pin the image digest (python:3.12-slim-trixie@sha256:...) so the same bytes resolve on every machine and in CI.

Later, when you add FastAPI, Chroma, tokenizers, or PyTorch, dependency changes invalidate the dependency layer instead of the code layer. Editing scripts/score.py won't force a full package reinstall.
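To see why the ordering matters, here is a toy model of layer caching in Python. It assumes each layer's cache key depends on its parent key, the instruction text, and a digest of the files that instruction reads. Real BuildKit cache keys are more involved, and the digest strings (`reqs-v1`, `code-v1`, `code-v2`) are made-up placeholders.

```python
import hashlib

Step = tuple[str, str]  # (instruction text, digest of the files it reads)


def layer_keys(steps: list[Step]) -> list[str]:
    """Chain one cache key per step from the parent key, instruction, and input."""
    keys = []
    parent = ""
    for instruction, input_digest in steps:
        parent = hashlib.sha256(f"{parent}|{instruction}|{input_digest}".encode()).hexdigest()
        keys.append(parent)
    return keys


def reused_layers(before: list[Step], after: list[Step]) -> int:
    """Count the leading steps whose cache key is unchanged between two builds."""
    count = 0
    for old, new in zip(layer_keys(before), layer_keys(after)):
        if old != new:
            break
        count += 1
    return count


def deps_first(code_version: str) -> list[Step]:
    # Mirrors the chapter's Dockerfile: requirements, install, then code.
    return [
        ("COPY requirements.txt", "reqs-v1"),
        ("RUN pip install -r requirements.txt", ""),
        ("COPY scripts/", code_version),
    ]


def copy_all_first(code_version: str) -> list[Step]:
    # The anti-pattern: one broad COPY whose input includes the code.
    return [
        ("COPY . /app", f"reqs-v1+{code_version}"),
        ("RUN pip install -r requirements.txt", ""),
    ]


print(reused_layers(deps_first("code-v1"), deps_first("code-v2")))          # 2
print(reused_layers(copy_all_first("code-v1"), copy_all_first("code-v2")))  # 0
```

With dependencies copied first, a code edit reuses both the COPY requirements.txt layer and the install layer. With a broad COPY at the top, nothing is reused, so the install runs again.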

Build and run it

From the project root:

```bash
docker build -t access-rag:local .
```

Run the image without any host mount first:

```bash
docker run --rm access-rag:local
```

Expected output:

```text
Loading 3 evaluation rows from eval/access_requests.jsonl
Row 1: exact match
Row 2: mismatch (expected blocked, got escalated)
Row 3: exact match
Exact-match accuracy: 0.667 (2/3)
Gate passed.
```

Worked reasoning: why the score survives the image

The image ID isn't the evidence. The evidence is that the same three rows produce the same 0.667 after the code has moved into a pinned container that any machine can rebuild.

Work the tiny eval by hand before trusting the tool:

| Row | Expected label | Predicted label | Result |
| --- | --- | --- | --- |
| 1 | approved | approved | exact match |
| 2 | blocked | escalated | mismatch |
| 3 | restored | restored | exact match |

That gives 2 correct rows out of 3. The arithmetic is:

```text
accuracy = correct / total
accuracy = 2 / 3
accuracy = 0.666...
printed as 0.667
```

This small distinction matters. A bad gate might compare the raw float to 0.667 and fail because 2 / 3 is mathematically 0.666.... A better gate checks the count (correct >= 2) or compares against the exact fraction (accuracy >= 2 / 3). The container doesn't fix bad scoring logic. It makes bad scoring logic easier to reproduce.
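The difference between the raw float and its rounded display is quick to confirm in Python. This is a minimal illustration of the comparisons discussed above, not the course's gate implementation.

```python
from fractions import Fraction

correct, total = 2, 3
accuracy = correct / total

# Bad gate: the raw float 0.666... is not equal to the literal 0.667.
print(accuracy == 0.667)                           # False

# Only the rounded display string matches 0.667.
print(f"{accuracy:.3f}" == "0.667")                # True

# Better gates: compare counts, or compare against the exact fraction.
print(correct >= 2 and total == 3)                 # True
print(accuracy >= 2 / 3)                           # True
print(Fraction(correct, total) == Fraction(2, 3))  # True
```

Count-based checks avoid floating-point comparison entirely, which is why the starter scorer gates on `correct` and `len(rows)` rather than on the printed number.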

| Contract check | What proves it |
| --- | --- |
| Same files are present | scripts/score.py and eval/access_requests.jsonl are copied into /app |
| Same command runs | ENTRYPOINT ["python", "-m", "scripts.score"] |
| Same data can be supplied later | -v "$(pwd)/eval:/app/eval:ro" overrides /app/eval with host data |
| Same result appears | both runs print Exact-match accuracy: 0.667 (2/3) |

Now run it with the eval directory mounted with :ro. The same image should score data supplied at runtime:

```bash
docker run --rm \
  -v "$(pwd)/eval:/app/eval:ro" \
  access-rag:local \
  eval/access_requests.jsonl
```

The output should still be 0.667. If it isn't, you have a real reproducibility bug: either the mounted file differs from the copied fixture, the scorer depends on host state, or the command points at the wrong path.
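One way to rule out the first cause is to compare content digests of the two fixture copies. The sketch below is a hypothetical helper, not part of the course repo. It uses two temporary files to stand in for the fixture baked into the image and the fixture mounted from the host.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex digest of a file's bytes; equal digests mean identical content."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def same_fixture(baked: Path, mounted: Path) -> bool:
    """True when both fixture copies contain exactly the same bytes."""
    return sha256_of(baked) == sha256_of(mounted)


ROW = '{"prompt": "Access request 102 status?", "expected": "blocked", "prediction": "escalated"}\n'

with tempfile.TemporaryDirectory() as tmp:
    baked = Path(tmp, "baked.jsonl")      # stands in for the copy inside the image
    mounted = Path(tmp, "mounted.jsonl")  # stands in for the host file
    baked.write_text(ROW, encoding="utf-8")
    mounted.write_text(ROW, encoding="utf-8")
    print(same_fixture(baked, mounted))   # True

    # Simulate someone editing the host fixture after the image was built.
    mounted.write_text(ROW.replace("escalated", "blocked"), encoding="utf-8")
    print(same_fixture(baked, mounted))   # False
```

If the digests match and the scores still differ, the remaining suspects are host state leaking into the scorer or a wrong path in the command.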

Practice: break one thing on purpose

After the first successful run, make one small mistake and predict the symptom before you rerun the command.

| Change | Prediction | Why |
| --- | --- | --- |
| Rename eval/access_requests.jsonl on the host | mounted run fails with a missing file | the :ro mount replaces the image's /app/eval directory |
| Remove requirements.txt from the build context | build fails at the COPY requirements.txt step | Docker can only copy files that exist in the build context |
| Add env_file: .env to Compose before creating .env | Compose reports the missing env file | declared runtime files must exist locally |
| Run on a machine without Docker Compose v2 | docker compose version fails because compose is not a recognized docker command | Docker Engine and the Compose plugin are separate on some Linux installs |

These failures are useful. Each one proves which part of the contract you were relying on.

Compose keeps the local command stable

For one service, docker run is fine. As soon as the project adds a vector database, API, worker, or model cache, you want one checked-in Compose file so each engineer starts the same stack.

Start with a runnable docker-compose.yml for the scorer:

```yaml
services:
  scorer:
    build:
      context: .
      dockerfile: Dockerfile
    image: access-rag:local
    volumes:
      - ./eval:/app/eval:ro
    command: ["eval/access_requests.jsonl"]
```

Validate the file before running it:

```bash
if docker compose version >/dev/null 2>&1; then
  docker compose config --quiet
  docker compose run --rm scorer
else
  echo "Docker Compose v2 plugin missing. Install docker-compose-plugin before using compose."
fi
```

Docker Desktop includes Compose v2. Some Linux installs need the docker-compose-plugin package first.[1] If docker compose version fails, the Dockerfile is still usable through docker run, but the Compose workflow isn't installed yet. Note two modern conventions. First, the old top-level version: key is obsolete and Compose v2 ignores it, so omit it, as above. Second, the preferred filename is now compose.yaml, though docker-compose.yml still works. Always use the hyphen-free docker compose command; the legacy docker-compose v1 binary is end-of-life.

When the RAG stack grows, add vector-db, api, and worker services to this same file. Don't add fake services before they exist. A Compose file that starts today is better than an impressive YAML file that fails on the first command.

Secrets belong at runtime

Don't pass real secrets with ARG:

```dockerfile
# BAD: the value can leak through image history and layers
ARG OPENAI_API_KEY
RUN echo "$OPENAI_API_KEY" > /tmp/key.txt
```

Use runtime environment instead:

```bash
docker run --rm \
  --env-file .env \
  -v "$(pwd)/eval:/app/eval:ro" \
  access-rag:local \
  eval/access_requests.jsonl
```

Compose uses the same idea:

```yaml
services:
  scorer:
    # Add this after you have a local .env file.
    env_file:
      - .env
```

The secret is available to the process when the container runs. It isn't copied into the image, pushed to the registry, or shown by docker history.
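Inside the container, the process reads the key from its environment. The sketch below shows one way that could look in Python. It assumes the variable is named OPENAI_API_KEY, as in the example above. `load_api_key` and `redact` are hypothetical helpers, and the key value is fake.

```python
import os
from collections.abc import Mapping


def load_api_key(env: Mapping[str, str] = os.environ, name: str = "OPENAI_API_KEY") -> str:
    """Read a secret from the process environment and fail fast if it is missing."""
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; pass it with --env-file or Compose env_file")
    return value


def redact(secret: str) -> str:
    """Keep only the last four characters so logs never contain the full key."""
    return "*" * max(len(secret) - 4, 0) + secret[-4:]


# Fake value for the demo; a real run would call load_api_key() with no arguments.
key = load_api_key({"OPENAI_API_KEY": "sk-example-1234"})
print(redact(key))  # ***********1234
```

Failing fast with a message that names the variable turns a missing --env-file into an obvious startup error instead of a confusing API failure later.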

When the GPU enters the story

The CPU-first image is the right base contract here. It works on Linux, macOS, CI runners, and most developer laptops. A CUDA image is different: it targets Linux hosts with NVIDIA drivers and the NVIDIA Container Toolkit, which installs a runtime hook so the container can see the host GPU. You configure it once with sudo nvidia-ctk runtime configure --runtime=docker, restart the Docker daemon, then pass --gpus at run time.[2]

When a later PyTorch or inference chapter needs GPU acceleration, keep the same contracts and change the platform-specific pieces:

| Contract | CPU scorer now | GPU workload later |
| --- | --- | --- |
| Base image | python:3.12-slim-trixie | official PyTorch CUDA image or NVIDIA CUDA runtime image matched to the torch wheel |
| Runtime check | scorer returns 0.667 | python -c "import torch; assert torch.cuda.is_available()" |
| Run command | docker run access-rag:local | docker run --gpus all --shm-size=2g ... |
| Portability claim | same Python runtime on normal machines | same image on compatible Linux NVIDIA machines |

Don't promise that one CUDA image gives Apple Silicon and NVIDIA parity. On Apple Silicon, use the CPU image for this foundation scorer or a separate Metal/MPS path for PyTorch. For production GPU serving, test the Linux NVIDIA image on a host that has the NVIDIA runtime.

The first GPU smoke test is:

```bash
docker run --rm --gpus all nvidia/cuda:12.9.2-base-ubuntu24.04 nvidia-smi
```

If that fails, the Dockerfile isn't the problem yet. The host can't expose the GPU to containers.

That CUDA tag is illustrative; NVIDIA publishes new CUDA versions often, so pin to whatever release your PyTorch wheel targets rather than copying the number here. One subtle trap: the container's CUDA runtime must not be newer than the host's NVIDIA driver supports, or the container fails with CUDA driver version is insufficient for CUDA runtime version. That mismatch lives on the host, not in your Dockerfile.

The gate that protects the contract

Add a local or CI gate that proves the image still builds and the score still matches:

```bash
#!/usr/bin/env bash
set -euo pipefail

docker build -t access-rag:check .
docker run --rm \
  -v "$PWD/eval:/app/eval:ro" \
  access-rag:check \
  eval/access_requests.jsonl
if docker compose version >/dev/null 2>&1; then
  docker compose config --quiet
else
  echo "Docker Compose v2 plugin missing. Install docker-compose-plugin before enabling the compose gate."
fi
```

This catches broken COPY paths, missing files, invalid Compose syntax, and scorer drift before a teammate pulls the repo. If your machine doesn't have the Compose plugin yet, keep the docker build and docker run lines as the mandatory gate, then install Compose before using the Compose part.
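A CI job may also want to assert on the scorer's output rather than trust the exit code alone. The sketch below is a hypothetical helper that parses the accuracy line printed by the starter scorer and compares the counts, not the rounded float. The regular expression assumes the exact output format shown earlier in this chapter.

```python
import re

# Matches e.g. "Exact-match accuracy: 0.667 (2/3)" on a line of its own.
ACCURACY_LINE = re.compile(r"^Exact-match accuracy: (\d+\.\d+) \((\d+)/(\d+)\)$", re.MULTILINE)


def parse_counts(output: str) -> tuple[int, int]:
    """Extract (correct, total) from the scorer's printed output."""
    match = ACCURACY_LINE.search(output)
    if match is None:
        raise ValueError("accuracy line not found in scorer output")
    return int(match.group(2)), int(match.group(3))


def gate_passes(output: str, expected_correct: int = 2, expected_total: int = 3) -> bool:
    """Compare integer counts so float rounding can never flip the gate."""
    return parse_counts(output) == (expected_correct, expected_total)


sample = """Loading 3 evaluation rows from eval/access_requests.jsonl
Row 1: exact match
Row 2: mismatch (expected blocked, got escalated)
Row 3: exact match
Exact-match accuracy: 0.667 (2/3)
Gate passed."""

print(gate_passes(sample))  # True
```

In a real pipeline the `sample` string would be the captured stdout of the docker run command, for example from `subprocess.run(..., capture_output=True, text=True, check=True)`.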

Failure modes worth memorizing

| Symptom | Most common cause | Fix that belongs in the repo |
| --- | --- | --- |
| ModuleNotFoundError inside the container | dependency missing from requirements.txt | add it to requirements.txt, rebuild, and keep dependency install before COPY scripts/ |
| .env appears in the image context | .dockerignore forgot .env* | add .env*, rebuild, and check docker history before pushing |
| Permission denied on a mounted directory | container user can't read or write the host path | mount eval fixtures with :ro, use named volumes for writable data, or document the host permissions |
| each code edit reinstalls dependencies | COPY . /app happens before dependency install | copy requirements.txt first, install dependencies, then copy application code |
| could not select device driver with capabilities: [[gpu]] | NVIDIA Container Toolkit missing or host isn't NVIDIA Linux | document host GPU setup and keep the CPU scorer path working |
| Cloud Run starts but can't read config | local .env was assumed to exist in production | use Secret Manager or the deployment platform's secret mechanism, not a copied file |

The important habit isn't memorizing each Docker flag. It's making the repo contain the diagnosis, the command, and the prevention.

The runtime contract

The runtime contract is explicit:

  1. git clone brings the code, the three-row eval, the scorer, the Dockerfile, .dockerignore, and docker-compose.yml.
  2. docker build -t access-rag:local . produces a pinned Python runtime.
  3. docker run --rm access-rag:local returns the expected 0.667 from the starter scorer.
  4. docker compose run --rm scorer gives teammates one stable local command.
  5. Later GPU images, vector databases, API services, workers, and Cloud Run deployments must preserve the same habit: pinned runtime, mounted data, runtime secrets, and a gate that proves the expected output.

The next chapter opens the Python scorer instead of leaving it hidden behind a command. Docker matters there because code bugs are hard enough without also wondering whether two machines are running different Python environments.

Mastery check

Key concepts

  • multi-stage Dockerfile (builder vs runtime) for a slim Python image
  • .dockerignore patterns for AI projects
  • volume mounts (-v) and bind mounts for eval fixtures
  • docker compose for a stable local command
  • secrets management (env_file, --env-file, no ARG or baked secrets)
  • non-root users and layer cache optimization
  • reproducible build across laptops and CI
  • GPU container failures as a later platform-specific extension

Evaluation rubric

  • Foundational: Writes a correct multi-stage Dockerfile and .dockerignore that builds a slim runtime image containing the three-row eval scorer and produces 0.667 when run with or without the eval volume mount
  • Intermediate: Authors a docker-compose.yml that validates with docker compose config --quiet, runs the scorer through one stable local command, and keeps secrets in runtime env files
  • Advanced: Diagnoses and prevents common 'container worked on my laptop' failures: missing dependencies, secret leaks, volume permission problems, layer cache thrash, and premature GPU assumptions

Common pitfalls

  • Using a single-stage Dockerfile that leaves build tools and the entire context in the final image. The container is larger than necessary and harder to inspect.
  • COPY . /app before installing requirements.txt so each code change invalidates the dependency layer; teammates wait for package installs on each rebuild.
  • Passing secrets with --build-arg OPENAI_API_KEY or baking .env into the image; the key now lives in each layer and in the registry, and docker history exposes it.
  • Running the container as root and mounting host directories; permission denied errors or root-owned output files appear on teammate machines.
  • Making the first Docker lesson require NVIDIA runtime setup even though the scorer runs on CPU; the beginner doesn't prove the basic container contract.
Mastery Check

1. Two engineers have identical repo files, but one laptop uses a different Python minor version and lacks packages. Which setup directly turns the runtime assumptions into a portable contract?
2. A project root contains .env, .venv/, model weights, a local vector index, and a Dockerfile with a broad COPY . /app. What should be added before building?
3. Editing scripts/score.py makes pip install run again on every rebuild because the Dockerfile copies the whole repo before installing dependencies. Which structure fixes that?
4. An image contains /app/eval, but a run bind-mounts host ./eval over /app/eval:ro. The host rows are approved/approved, blocked/escalated, and restored/restored. What should the gate prove?
5. Only the CPU scorer exists today, but the project will later add an API, worker, vector database, and GPU workloads. Which Compose workflow should be checked in now?
6. A Dockerfile uses ARG OPENAI_API_KEY and writes the value into a file during build. How should the API key be supplied for the scorer container instead?
7. A later PyTorch image fails the smoke test docker run --rm --gpus all nvidia/cuda:... nvidia-smi with could not select device driver with capabilities: [[gpu]]. What is the most likely next step?
8. The container runs as non-root appuser and gets Permission denied when writing to a bind-mounted host directory. Which fix belongs in the repo's container workflow?


Next Step
Continue to Python for AI Engineering

You now have a portable runtime contract (multi-stage image, `.dockerignore`, runtime secrets via env_file, compose, non-root user) that guarantees the same Python environment on any machine. The next chapter opens the scorer implementation and writes the actual Python code (dataclasses, validation, CLI entrypoint, tests) that runs inside that image and scores the exact same three-row `access_requests.jsonl` fixture.

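References

[1] Docker Inc. Docker Documentation. 2026. Official documentation. https://docs.docker.com/

[2] NVIDIA. NVIDIA PyTorch Container Release Notes. 2025. https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-25-06.html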