LeetLLM

Git, Shell, Linux for AI

Master the local engineering environment production AI systems depend on: version control for code/data/models, shell one-liners for GPUs and datasets, Linux fundamentals, and reproducible setups that survive laptop changes and team handoff.


Before Python, Docker, or tests, you need a repo that survives a fresh clone. This baseline includes safe Git defaults, one reproducible eval command, shell checks that tell you what machine and dataset you're using, and Linux habits that keep long jobs alive.

One tiny access-request eval file runs through the whole lesson. Another machine should be able to clone the repo, run one command, and get the same 0.667 result instead of "it worked on my laptop." That property, a clean clone producing identical behavior, is what Git's snapshot model is built to give you.[1]

Figure: Reproducible AI project workflow showing a clean clone, tracked repo contract, eval gate, machine probe, and matching 0.667 score.
Follow the three-row access-request eval.jsonl. On the laptop it becomes a committed, gated artifact. On the GPU box the same three rows produce the same 0.667 because the environment contract (gitignore, activation script, shell helpers) traveled with the repo.

Start with the smallest useful repo

Create a new directory and initialize Git as you would for any real AI project.

terminal
mkdir access-rag && cd access-rag
git init

The .git directory is the repository's memory. Everything that follows will be tracked or explicitly ignored.

The .gitignore that protects AI work

Create .gitignore with the patterns that real LLM projects need:

.gitignore
# Python
__pycache__/
*.py[cod]
*$py.class
.venv/
env/
ENV/

# Environment & secrets (do not commit these)
.env
.env.local
*.pem
secrets/

# Large model and vector artifacts
models/
*.gguf
*.bin
*.safetensors
*.pt
*.pth
chroma/
faiss_index/
*.db
*.sqlite3

# OS and editor noise
.DS_Store
.idea/
.vscode/
*.swp

# Evaluation caches that should be regenerated
eval_cache/
runs/
wandb/
mlruns/
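
One way to confirm that these patterns behave as intended is `git check-ignore`, which reports whether Git would ignore a given path. The sketch below wraps it in a small helper. The name `check_ignored` and the sample paths are illustrative assumptions, not part of the lesson's repo, and the helper must run inside a Git repository.

```shell
# Sketch: report whether Git would ignore each path, based on the ignore
# rules visible from the current directory. `check_ignored` is a hypothetical
# helper name. `git check-ignore -q` exits 0 when a path is ignored.
check_ignored() {
  local path
  for path in "$@"; do
    if git check-ignore -q -- "$path"; then
      echo "ignored: $path"
    else
      echo "not ignored: $path"
    fi
  done
}
```

Running `check_ignored .env weights.safetensors eval/access_requests.jsonl` in this repo should report the first two paths as ignored and the eval fixture as not ignored. If the fixture shows up as ignored, a pattern is too broad.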

Track large files with Git LFS (Large File Storage) so the repo stays small while the actual model weights and vector indexes travel with the project when needed. GitHub warns on files over 50 MiB and hard-blocks any single file over 100 MiB; LFS replaces the file in history with a small pointer and stores the bytes separately.[2] Git LFS is a separate tool, so check for it first. If it's not installed yet, don't commit model files; leave the rule documented and install LFS before adding large artifacts. Note that the .gitignore above also ignores these extensions, so adding a weight file on purpose takes git add -f or a negation rule for that path.

terminal-2
cat > .gitattributes << 'EOF'
# Install Git LFS before committing model weights:
# git lfs track "*.gguf" "*.safetensors" "*.bin"
EOF

if command -v git-lfs >/dev/null 2>&1; then
  git lfs install
  git lfs track "*.gguf" "*.safetensors" "*.bin"
else
  echo "Git LFS is not installed. Safe for now: do not commit model weights yet."
fi

git add .gitattributes

Commit the skeleton.

terminal-3
git add .gitignore .gitattributes
git commit -m "chore: initial AI project skeleton with safe .gitignore and LFS"

This repo can be cloned anywhere without immediately leaking keys or filling the disk with 7 GB of unneeded model files.
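
Before committing, it can help to list any files that are already past the 50 MiB warning threshold mentioned above. The following is a minimal sketch using `find`. The helper name `large_files` and its default threshold are assumptions for illustration.

```shell
# Sketch: list files strictly larger than a threshold (default 50 MiB) so they
# can be moved to LFS or ignored before a commit. `large_files` is a
# hypothetical helper name. The .git directory is skipped.
large_files() {
  local dir="${1:-.}" min_mib="${2:-50}"
  find "$dir" -path "$dir/.git" -prune -o -type f -size +"${min_mib}M" -print
}
```

Call it as `large_files . 50` from the repo root. An empty result means nothing in the working tree is over the threshold.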

The eval that must travel with the code

Place the three-row access-request evaluation file that the rest of the curriculum will reuse.

terminal-4
mkdir -p eval
cat > eval/access_requests.jsonl << 'EOF'
{"prompt": "Access request 101 status?", "expected": "approved", "prediction": "approved"}
{"prompt": "Access request 102 status?", "expected": "blocked", "prediction": "escalated"}
{"prompt": "Access request 103 status?", "expected": "restored", "prediction": "restored"}
EOF

This tiny file is the contract. Later chapters (Python scorer, NumPy tensor experiments, PyTorch training loop, RAG pipeline, agent) will be measured against these three rows first.
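
To see where 0.667 comes from, the exact-match score can be recomputed from the rows themselves. The sketch below does that with `awk`. It is not the lesson's scorer (that is built in Python later), the name `exact_match` is hypothetical, and the parsing only handles flat one-object-per-line rows whose string values contain no escaped quotes.

```shell
# Sketch: exact-match accuracy for a flat JSONL fixture with "expected" and
# "prediction" string fields. `exact_match` is a hypothetical helper name.
# This is regex extraction, not a JSON parser.
exact_match() {
  awk '
    {
      e = $0; p = $0
      if (!match(e, /"expected": *"[^"]*"/)) next
      e = substr(e, RSTART, RLENGTH); sub(/^"expected": *"/, "", e); sub(/"$/, "", e)
      if (!match(p, /"prediction": *"[^"]*"/)) next
      p = substr(p, RSTART, RLENGTH); sub(/^"prediction": *"/, "", p); sub(/"$/, "", p)
      total++
      if (e == p) hits++
    }
    END {
      if (total == 0) { print "no scorable rows" > "/dev/stderr"; exit 1 }
      printf "%.3f (%d/%d)\n", hits / total, hits, total
    }
  ' "$1"
}
```

On the three rows above, `exact_match eval/access_requests.jsonl` prints `0.667 (2/3)`: rows 101 and 103 match, row 102 does not.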

The pre-commit gate that protects the score

Create a tiny executable that the pre-commit hook and clean-clone reproduction command will run.

terminal-5
mkdir -p scripts
cat > scripts/run_eval.sh << 'EOF'
#!/usr/bin/env bash
set -euo pipefail

EVAL_FILE="eval/access_requests.jsonl"
if [[ ! -f "$EVAL_FILE" ]]; then
  echo "ERROR: $EVAL_FILE missing. Did you forget to commit the fixture or pull the latest repo?"
  exit 1
fi

# Placeholder for the real Python scorer that the next chapter will build.
# For now we count lines and print a deterministic "score".
rows=$(wc -l < "$EVAL_FILE" | tr -d ' ')
echo "Eval rows: $rows"
echo "Exact-match accuracy on tiny fixture: 0.667 (2/3)"
echo "Gate passed. You may commit."
EOF
chmod +x scripts/run_eval.sh

Create a repo-local reproduction command. This is important: shell aliases and Git hooks are local machine state, but repro.sh travels with the repo.

terminal-6
cat > repro.sh << 'EOF'
#!/usr/bin/env bash
set -euo pipefail

./scripts/run_eval.sh
EOF
chmod +x repro.sh

Now wire the same gate as a pre-commit hook. Don't try to commit .git/hooks/pre-commit; files under .git/ are Git internals, not normal tracked project files. Commit the hook source under scripts/, then install it into .git/hooks/ on each clone.

terminal-7
cat > scripts/pre-commit-ai-eval.sh << 'EOF'
#!/usr/bin/env bash
set -euo pipefail

echo "Running AI eval gate before commit..."
./scripts/run_eval.sh
echo "Eval gate passed."
EOF

cat > scripts/install_hooks.sh << 'EOF'
#!/usr/bin/env bash
set -euo pipefail

mkdir -p .git/hooks
cp scripts/pre-commit-ai-eval.sh .git/hooks/pre-commit
chmod +x .git/hooks/pre-commit
echo "Installed .git/hooks/pre-commit"
EOF

chmod +x scripts/pre-commit-ai-eval.sh scripts/install_hooks.sh
./scripts/install_hooks.sh

Test it.

terminal-8
./repro.sh
git add scripts/run_eval.sh scripts/pre-commit-ai-eval.sh scripts/install_hooks.sh repro.sh eval/access_requests.jsonl
git commit -m "feat: add three-row eval and pre-commit gate that protects 0.667"
Output
Eval rows: 3
Exact-match accuracy on tiny fixture: 0.667 (2/3)
Gate passed. You may commit.
Running AI eval gate before commit...
Eval rows: 3
Exact-match accuracy on tiny fixture: 0.667 (2/3)
Gate passed. You may commit.
Eval gate passed.

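If the fixture file disappears, the hook exits non-zero and the commit is rejected. The placeholder prints a fixed score, so it can't catch a regression yet; once the real scorer replaces it, a drop in the score should fail the gate the same way. This is the first concrete "production check" in the curriculum.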
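
A regression check of the kind described here could be sketched as a comparison against a committed baseline. The helper name `score_gate` and the default baseline of 0.667 are assumptions for illustration; the lesson's own scripts don't include this step.

```shell
# Sketch: exit non-zero when a score falls below a baseline.
# `score_gate` is a hypothetical helper name. awk does the comparison because
# POSIX shell arithmetic is integer-only.
score_gate() {
  local score="$1" baseline="${2:-0.667}"
  if awk -v s="$score" -v b="$baseline" 'BEGIN { exit !(s + 0 >= b + 0) }'; then
    echo "Gate passed: $score >= $baseline"
  else
    echo "Gate failed: $score < $baseline" >&2
    return 1
  fi
}
```

Because a pre-commit hook rejects the commit whenever it exits non-zero, calling `score_gate "$score"` as the last step of the eval script would be enough to block a commit that lowers the score.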

Shell one-liners that make the invisible visible

Add a few functions to your ~/.zshrc or ~/.bashrc that you'll use in each AI project for the rest of your career.

terminal-9
# GPU snapshot (works on NVIDIA, falls back gracefully)
# --query-gpu + --format=csv is the scriptable, stable nvidia-smi interface. [3]
gpu() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=index,name,memory.used,memory.total,utilization.gpu --format=csv,noheader
  else
    echo "No NVIDIA GPU or nvidia-smi not in PATH"
  fi
}

# Dataset size at a glance
ds() {
  du -sh "${1:-.}" 2>/dev/null | awk '{print $1 " " $2}'
  echo "JSONL rows: $(find "${1:-.}" -name '*.jsonl' -exec wc -l {} + 2>/dev/null | tail -1 | awk '{print $1}')"
}

# One-command reproduction of the current eval
repro() {
  if [[ -x ./repro.sh ]]; then
    ./repro.sh
  elif [[ -x ./scripts/run_eval.sh ]]; then
    ./scripts/run_eval.sh
  elif [[ -x ./reproduce.sh ]]; then
    ./reproduce.sh
  else
    echo "No reproducible entrypoint found (looked for repro.sh, scripts/run_eval.sh, or reproduce.sh)"
    return 1
  fi
}

After sourcing, gpu, ds, and repro become muscle memory. You type one word and immediately know whether the machine has the resources the workload expects. In a clean clone, use ./repro.sh; aliases should make the common path faster, not hide the real entry point.
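
The CSV that `nvidia-smi --query-gpu=... --format=csv,noheader` emits is easy to post-process. As a sketch, the hypothetical helper `gpu_free` below reads those lines on stdin and prints free memory per GPU. It assumes the five-field order used in the `gpu` function (index, name, memory.used, memory.total, utilization.gpu) and a GPU name without commas.

```shell
# Sketch: compute free memory per GPU from nvidia-smi CSV lines on stdin,
# e.g. "0, <name>, 1024 MiB, 16384 MiB, 12 %". `gpu_free` is a hypothetical
# helper name. Adding 0 makes awk keep only the numeric prefix of "1024 MiB".
gpu_free() {
  awk -F', ' '{ printf "GPU %s: %d MiB free of %d MiB\n", $1, ($4 + 0) - ($3 + 0), $4 + 0 }'
}
```

Pipe the query into it directly: `nvidia-smi --query-gpu=index,name,memory.used,memory.total,utilization.gpu --format=csv,noheader | gpu_free`.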

Inspect data and processes without crashing the box

Two shell reflexes separate AI engineers from people who guess. The first is inspecting a dataset that's too big to open. Never run cat train.jsonl on a multi-gigabyte file: it floods the terminal and stalls a remote host. Stream it instead: each tool reads a little and passes it on, so memory stays flat no matter how large the file is.

terminal-inspect
head -n 1 eval/access_requests.jsonl             # peek at the schema of one row
wc -l eval/access_requests.jsonl                 # count rows without loading the file
grep -c '"expected"' eval/access_requests.jsonl  # how many rows have the field

The pipe | chains these into one pass. grep '"restored"' eval/access_requests.jsonl | wc -l filters, then counts, without ever holding the whole file in memory.
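
The same streaming idea extends to a quick label distribution. The sketch below counts how often each "expected" value occurs, one line at a time. The helper name `label_counts` is hypothetical, and the extraction assumes flat rows with single-word string values.

```shell
# Sketch: count occurrences of each "expected" value in a JSONL file using a
# streaming pipeline. `label_counts` is a hypothetical helper name.
label_counts() {
  grep -o '"expected": *"[^"]*"' "$1" \
    | sed 's/.*: *"//; s/"$//' \
    | sort \
    | uniq -c \
    | awk '{ print $2, $1 }'
}
```

On the three-row fixture this prints `approved 1`, `blocked 1`, and `restored 1`, one per line. On a large file, only `sort` buffers data, and it buffers the extracted labels, not the full rows.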

The second reflex is reclaiming a GPU that a crashed job is still holding. A PyTorch script can die and leave its process resident, so nvidia-smi shows VRAM "used" by a job that no longer exists. Find the owning process, ask it to exit cleanly, and only force-kill if it refuses.

terminal-kill
nvidia-smi   # read the PID in the bottom "Processes" table
kill 12345   # SIGTERM (15): let the process flush and release VRAM
sleep 5      # give it a few seconds to shut down before checking
kill -0 12345 2>/dev/null && kill -9 12345   # escalate to SIGKILL only if still alive

Reaching for kill -9 first is a common interview tell. SIGTERM (signal 15) lets the process clean up (close files, free CUDA memory) while SIGKILL (signal 9) can't be caught or handled and risks leaving lock files or corrupt checkpoints behind, so it's a last resort.[4]
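
The terminate-then-escalate sequence can be wrapped in a helper so the grace period is explicit. This is a minimal sketch: the name `stop_pid` and the default 10-second grace period are assumptions, not something the lesson prescribes.

```shell
# Sketch: send SIGTERM, wait up to a grace period, then fall back to SIGKILL.
# `stop_pid` is a hypothetical helper name.
stop_pid() {
  local pid="$1" grace="${2:-10}" waited=0
  kill -TERM "$pid" 2>/dev/null || return 0   # nothing to stop (or not ours)
  while kill -0 "$pid" 2>/dev/null; do        # kill -0 only tests for existence
    if [ "$waited" -ge "$grace" ]; then
      echo "PID $pid still alive after ${grace}s, sending SIGKILL" >&2
      kill -KILL "$pid" 2>/dev/null
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
}
```

Usage would be `stop_pid 12345 15`, with the PID taken from the nvidia-smi process table. A job that checkpoints on SIGTERM may need a longer grace period than the default.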

Linux fundamentals that keep long jobs alive

AI work is often measured in hours, not seconds. You need to know how to:

  • Detach a process that survives logout: nohup python train.py > train.log 2>&1 &
  • Manage sessions across SSH disconnects: tmux new -s training, tmux attach -t training, Ctrl-b d to detach.
  • Give the job lower priority so your laptop remains usable: nice -n 10 python train.py
  • Find the files using disk right now: du -ah /workspace | sort -rh | head -20
  • Check which Python process is using the GPU: nvidia-smi + ps aux | grep python

Of these, tmux, nohup, nice, and the nvidia-smi + ps dance prevent the majority of "my training died when I closed the laptop" and "I have no idea which process is eating 40 GB of VRAM" disasters; the du one-liner finds what is filling the disk.
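
The nohup and nice habits combine naturally into one launcher. The sketch below starts a command detached, at lower priority, with output sent to a log file, and prints the PID so it can be checked later. The name `run_detached` and the niceness of 10 are illustrative assumptions.

```shell
# Sketch: launch a long job that survives logout, runs at lower priority, and
# logs to a file. Prints the PID of the job. `run_detached` is a hypothetical
# helper name.
run_detached() {
  local log="$1"
  shift
  nohup nice -n 10 "$@" > "$log" 2>&1 &
  echo "$!"
}
```

For example, `pid=$(run_detached train.log python train.py)` followed by `tail -f train.log` to watch progress and `kill -0 "$pid"` to confirm the job is still alive. A tmux session remains the better fit when the job needs an interactive terminal.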

The reproducible activation contract

Create an activation script that each teammate and each CI job can source. This first chapter doesn't require PyTorch yet, so the script reports CUDA status only when torch is already installed.

terminal-10
cat > requirements.txt << 'EOF'
# Empty in this first chapter.
# Later chapters will add pinned runtime packages here.
EOF

cat > activate.sh << 'EOF'
#!/usr/bin/env bash

# This file is sourced, so a failure must stop the script without closing the
# interactive shell: `return` works when sourced, and `exit` is the fallback
# when the file is executed directly.
fail() {
  echo "ERROR: $1" >&2
}

PYTHON_BIN="${PYTHON_BIN:-python3}"
if ! command -v "$PYTHON_BIN" >/dev/null 2>&1; then
  PYTHON_BIN="python"
fi
if ! command -v "$PYTHON_BIN" >/dev/null 2>&1; then
  fail "install python3 or set PYTHON_BIN=/path/to/python"
  return 1 2>/dev/null || exit 1
fi

# 1. Create or reuse a local virtualenv
if [[ ! -d .venv ]]; then
  "$PYTHON_BIN" -m venv .venv || { fail "could not create .venv"; return 1 2>/dev/null || exit 1; }
fi
source .venv/bin/activate || { fail "could not activate .venv"; return 1 2>/dev/null || exit 1; }

# 2. Install what this repo needs (skipped while requirements.txt has only comments)
pip install --upgrade pip || { fail "could not upgrade pip"; return 1 2>/dev/null || exit 1; }
if grep -Ev '^[[:space:]]*(#|$)' requirements.txt >/dev/null 2>&1; then
  pip install -r requirements.txt || { fail "could not install requirements.txt"; return 1 2>/dev/null || exit 1; }
fi

# 3. Print the local environment without requiring GPU packages yet
python - << 'PY' || { fail "environment probe failed"; return 1 2>/dev/null || exit 1; }
import os, sys
print("Python:", sys.version.split()[0])
try:
    import torch
except ModuleNotFoundError:
    print("PyTorch: not installed yet (OK for this chapter)")
else:
    print("PyTorch:", torch.__version__)
    print("CUDA available:", torch.cuda.is_available())
    if torch.cuda.is_available():
        print("GPU:", torch.cuda.get_device_name(0))
print("HF_HOME:", os.environ.get("HF_HOME", "(default ~/.cache/huggingface)"))
PY

echo "Environment ready. Run './repro.sh' to execute the eval gate."
EOF
chmod +x activate.sh

Document it in README.md:

README.md
## Quick start

git clone git@<your-git-host>:your-org/access-rag.git
cd access-rag
./scripts/install_hooks.sh
if command -v git-lfs >/dev/null 2>&1; then
  git lfs pull
fi
source activate.sh
./repro.sh

Now a fresh engineer (or a fresh GPU box provisioned by your platform team) can go from zero to the same 0.667 result in under two minutes.

Figure: Activation contract for reproducible AI repos: tracked files create the environment, probe machine state, and run the same eval gate on any clone.
The activation contract is small but strict. Tracked files create the environment, print the machine's actual state, and route every clone to the same repro command instead of hidden laptop state.

The failure modes you'll see in real life

| Symptom | Most common cause | Fix that belongs in the repo |
| --- | --- | --- |
| CUDA not found on the GPU box | CUDA_VISIBLE_DEVICES is misconfigured or the base image has no CUDA | Explicit torch.cuda.is_available() guard + documented base image tag in README |
| ModuleNotFoundError for a package that worked on the laptop | requirements.txt is incomplete or uses unpinned versions | pip freeze > requirements.txt after a clean pip install -e . and commit the exact pins |
| Eval returns 0.000 because the three-row JSONL is missing | The fixture lived only in an untracked or ignored directory on the author's disk and was never committed | Move the fixture to eval/ and add the directory to the committed tree; don't rely on "I had it in my downloads folder" |
| Pre-commit hook doesn't run on a fresh clone | Hooks under .git/hooks/ are local machine files, not tracked project files | Commit scripts/pre-commit-ai-eval.sh and scripts/install_hooks.sh, then run the installer after cloning |
| Pre-commit hook fails with "permission denied" | The hook script was installed without chmod +x or the clone was on a filesystem that strips execute bits | chmod +x scripts/*.sh .git/hooks/* + a one-line check in the installer |
| "It worked yesterday" after a git pull | A teammate committed a new large model without LFS or changed the expected schema of the eval file | LFS tracking + a schema validation step in the scorer + reviewing incoming changes to data files (git fetch, then git diff against the remote branch) before merging |

These have happened to many AI engineers. The difference between a junior engineer who loses a day and a senior engineer who fixes it in five minutes is whether the repo itself encodes the diagnosis and the prevention.
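
The schema validation step mentioned as a fix can start very small. The sketch below flags any row of the fixture that lacks one of the three keys used in this lesson. The name `validate_schema` is hypothetical, and the check is a substring test on flat rows, not a JSON parse.

```shell
# Sketch: flag JSONL rows that lack "prompt", "expected", or "prediction".
# `validate_schema` is a hypothetical helper name. Exits 1 if any row fails.
validate_schema() {
  awk '
    NF == 0 { next }   # ignore blank lines
    !/"prompt":/ || !/"expected":/ || !/"prediction":/ {
      printf "line %d: missing required key\n", NR
      bad = 1
    }
    END { exit bad + 0 }
  ' "$1"
}
```

Running `validate_schema eval/access_requests.jsonl` before the scoring step would turn a silent schema change into an explicit failure with a line number.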

The repo contract

The first real engineering loop is in place:

  1. git clone + source activate.sh produces a working environment on any machine with the declared dependencies.
  2. repro (or the pre-commit hook) guarantees that the tiny contract (three rows, 0.667) is still satisfied after each change.
  3. gpu, ds, and the Linux session commands let you see what the hardware is doing.
  4. The .gitignore + LFS rules + activation script travel with the code, so the next person doesn't have to reverse-engineer your laptop.

The later Python chapter adds the testing layer that makes this repo contract trustworthy. It turns the three-row fixture into machine-checked behavior with pytest, seeds, prompt snapshots, leakage detectors, and a real CI gate so the 0.667 survives future changes.

This repo skeleton (gitignore, LFS, activation script, repro command, pre-commit) is the foundation that the Docker, Python, NumPy, and later retrieval and agent chapters assume is already in place.

Self check: clone the repo into a fresh temporary directory and run ./scripts/install_hooks.sh && source activate.sh && ./repro.sh. The expected output is the same 0.667 score plus a visible environment summary. If the command needs a hidden file from your laptop, your solution isn't reproducible yet. A strong answer names the missing contract, adds it to .env.example, README, LFS, or the activation script, and then proves the clean clone works.
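
One way to make a missing secret visible without committing it is to list the required variable names in .env.example and check them at activation time. The sketch below assumes a hypothetical .env.example containing lines of the form `NAME=` with no values; neither that file nor the helper `missing_env` exists in the repo built so far.

```shell
# Sketch: report variables named in .env.example that are not exported in the
# current environment. `missing_env` is a hypothetical helper name.
missing_env() {
  local example="${1:-.env.example}" line key rc=0
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      ''|'#'*) continue ;;   # skip blank lines and comments
    esac
    key="${line%%=*}"        # text before the first "="
    if ! printenv "$key" >/dev/null; then
      echo "missing: $key"
      rc=1
    fi
  done < "$example"
  return "$rc"
}
```

A clean clone then fails with a named variable instead of an opaque error, and the secret itself never enters Git history.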

Mastery check

Key concepts

  • git init/clone/commit with optional LFS
  • .gitignore for models, .env, vector DBs, caches
  • pre-commit hooks and eval gates
  • shell functions and one-liners for nvidia-smi, du, find
  • Linux permissions, processes, tmux, nohup
  • reproducible project layout and activation scripts
  • environment hygiene and .env.example contracts
  • failure diagnosis across machines

Evaluation rubric

  • Foundational: Sets up a new AI project repo with proper .gitignore, LFS tracking, and a tiny eval gate that fails the commit when the scorer reports regression
  • Intermediate: Writes shell helpers and Linux commands that inspect GPU memory, dataset sizes, and running processes without leaving the terminal
  • Advanced: Diagnoses 'it worked on my laptop' failures caused by untracked secrets, missing CUDA, wrong Python, or permission drift, and prevents them with docs and scripts

Common pitfalls

  • Committing .env or API keys because the model 'worked once' on the laptop. The next clone or CI run fails silently or leaks secrets.
  • Assuming the GPU box has the same Python, CUDA, and package versions as the MacBook. Without an activation script and explicit environment summary, the eval passes locally and dies remotely.
  • Treating the shell as a scratchpad. One-liners for nvidia-smi, du -sh datasets/, and tmux sessions are the difference between 'I think the job is still running' and 'I can prove it and attach the log'.

Mastery Check

1. A teammate clones the access-rag repo on a new machine. The repo includes scripts/install_hooks.sh, activate.sh, requirements.txt, eval/access_requests.jsonl, scripts/run_eval.sh, and repro.sh. Which command sequence proves the clone is not relying on hidden laptop state?
2. A repo contains .env with API keys, models/access.gguf at several GB, chroma/ vector index files, runs/ experiment logs, and eval/access_requests.jsonl. Which Git policy fits a reproducible AI project?
3. After a fresh clone, a teammate can commit without running the AI eval gate. You discover the original author only copied a hook into .git/hooks/pre-commit on their own machine. What repo change fixes the handoff?
4. You need to inspect a multi-GB JSONL dataset over SSH, see one example row, count total rows, and count rows containing "restored" without loading the file into memory. Which command sequence is appropriate?
5. nvidia-smi shows PID 12345 still holding GPU VRAM after a PyTorch job crashed. You want to release the memory while minimizing the chance of corrupting checkpoints or leaving messy state. What should you do?
6. A training run will take hours over SSH, and you need to survive disconnects and later reattach to the session. Which command pattern fits that goal?
7. A fresh GPU box reports ModuleNotFoundError for a package that worked on the author's laptop after source activate.sh. What repo change addresses the real cause?
8. A clean clone's eval only succeeds after copying the author's untracked .env file with a required API key. Which change preserves reproducibility without leaking the secret?
9. A fresh GPU box reports "CUDA not found." The repo performs no CUDA availability check and does not document its required base image. Which repo change provides a reproducible diagnosis and machine contract?

Next Step
Continue to Docker and Containerization for Reproducible AI

The Git, shell, and Linux foundation you built gives you a clean repo, `.gitignore` + LFS rules, `activate.sh`, `repro.sh`, a protected three-row `eval/access_requests.jsonl` contract, and a pre-commit gate. The next chapter pins the runtime itself with a portable image, `.dockerignore`, runtime secrets, and one stable command that works on teammate and CI machines.

References

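[1] Chacon, S. & Straub, B. (2014). Pro Git (2nd ed.). https://git-scm.com/book/en/v2

[2] GitHub (2026). About large files on GitHub. https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-large-files-on-github

[3] NVIDIA (2026). nvidia-smi documentation. https://docs.nvidia.com/deploy/nvidia-smi/index.html

[4] The Open Group (2018). The Open Group Base Specifications: signal.h. https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/signal.h.html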