LeetLLM

First AI App End-to-End

Ship one traceable rotation-decision workflow: validated input, model boundary, stored status, clear UI states, failure tests, and deploy checks.

Alex flags a stale service-account key for account acct_10234. Your earlier wrapper can ask a model whether policy line P-7 supports a rotation, but a customer can't use a wrapper by itself. They need a form, a clear result, and a useful error when the call fails.

Now turn that checked model call into a small application. One request travels from browser input to a server route, through the model boundary, into a trace record, and back to the screen. The app decides eligibility only; creating a rotation job remains a separate idempotent action.

Vertical slice for stale service-account key acct_10234. Browser states send a typed request into a deterministic server boundary around the probabilistic P-7 model call. Checked results and named failures update a redacted trace ledger, while tests and deploy checks form proof rails below. Rotation-job creation remains locked.
Wrap one probabilistic decision in deterministic rails: typed input, policy verification, explicit UI states, durable redacted traces, boundary tests, and deploy checks. Rotation job creation stays outside this workflow.

Define the one job your app does

A first AI app doesn't need chat history, autonomous tools, or a retrieval system. It needs one user problem with an observable answer:

Given a credential report for account acct_10234, decide whether policy line P-7 supports a rotation, cite that line in the result, and never create a rotation job as a hidden side effect.

That sentence fixes the app boundary before you choose a framework:

| Layer | Responsibility in this app | What it must not do |
| --- | --- | --- |
| Browser | Collect account ID and credential report; show state. | Call a model with a secret key. |
| API route | Validate request; return stable response fields. | Embed provider-specific prompt logic. |
| Decision service | Call the checked wrapper; translate outcome into status. | Issue a rotation job. |
| Trace store | Save status, evidence, latency, and redacted input fingerprint. | Store unnecessary customer text. |
| Tests | Prove completion, rejection, timeout, and health behavior. | Spend money on model calls. |

Start with data the browser and server agree on

The route receives a report and returns a decision. Before a model call exists, write those shapes down as schemas. Pydantic validates Python data against declared fields and constraints; FastAPI can use the same models at the HTTP boundary later.[1]

This first runnable cell rejects missing identifiers and constrains the response to the decisions your UI understands:

define-rotation-contract.py
```python
from typing import Literal

from pydantic import BaseModel, Field

class RotationReport(BaseModel):
    account_id: str = Field(pattern=r"^acct_\d{5}$")
    item: str = Field(min_length=3, max_length=80)
    credential_report: str = Field(min_length=10, max_length=500)

class RotationDecision(BaseModel):
    decision: Literal["eligible", "not_eligible", "needs_review"]
    rotation_window_days: int = Field(ge=0, le=30)
    source_line_ids: list[str] = Field(min_length=1)

report = RotationReport(
    account_id="acct_10234",
    item="service-account key",
    credential_report="Service-account key is older than policy.",
)
decision = RotationDecision(
    decision="eligible",
    rotation_window_days=30,
    source_line_ids=["P-7"],
)

print(report.account_id, "=>", decision.decision, decision.source_line_ids)
```
Validated contract
```text
acct_10234 => eligible ['P-7']
```

The schema proves field shape, not truth. An output can match the schema and still cite the wrong policy or claim a window your policy doesn't allow.
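
To make that distinction concrete, here is a minimal sketch that reuses the same field constraints. It assumes Pydantic v2, and `shape_errors` and the policy line `P-99` are hypothetical names introduced only for illustration.

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError


class RotationReport(BaseModel):
    account_id: str = Field(pattern=r"^acct_\d{5}$")
    credential_report: str = Field(min_length=10, max_length=500)


class RotationDecision(BaseModel):
    decision: Literal["eligible", "not_eligible", "needs_review"]
    rotation_window_days: int = Field(ge=0, le=30)
    source_line_ids: list[str] = Field(min_length=1)


def shape_errors(model: type[BaseModel], payload: dict) -> list[str]:
    """Hypothetical helper: names of the fields that fail validation."""
    try:
        model.model_validate(payload)
    except ValidationError as error:
        return sorted(str(item["loc"][0]) for item in error.errors())
    return []


# Shape problems are caught: bad ID format and a report that is too short.
bad_report = {"account_id": "wrong", "credential_report": "old"}
print(shape_errors(RotationReport, bad_report))

# A wrong citation still has a valid shape, so the schema accepts it.
wrong_citation = {
    "decision": "eligible",
    "rotation_window_days": 30,
    "source_line_ids": ["P-99"],  # Assumed nonexistent policy line.
}
print(shape_errors(RotationDecision, wrong_citation))
```

The first call names both failing fields, while the second returns an empty list even though the citation is wrong. That gap is what the business-rule check has to close.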

The application therefore performs business-rule checks after parsing. Here, P-7 says stale service-account keys may be rotated within 30 days. The browser doesn't get to claim how old a credential is; the service reads that fact from a trusted account record:

check-policy-evidence.py
```python
from typing import Literal

from pydantic import BaseModel, Field

class RotationDecision(BaseModel):
    decision: Literal["eligible", "not_eligible", "needs_review"]
    rotation_window_days: int = Field(ge=0, le=90)
    source_line_ids: list[str] = Field(min_length=1)

def verify_p7(decision: RotationDecision, *, credential_age_days: int) -> RotationDecision:
    if "P-7" not in decision.source_line_ids:
        raise ValueError("decision lacks policy evidence")
    if decision.rotation_window_days != 30:
        raise ValueError("result conflicts with P-7")
    if decision.decision == "eligible" and credential_age_days > decision.rotation_window_days:
        raise ValueError("eligible result exceeds P-7 window")
    return decision

checked = verify_p7(
    RotationDecision(
        decision="eligible",
        rotation_window_days=30,
        source_line_ids=["P-7"],
    ),
    credential_age_days=12,  # Trusted account record for acct_10234.
)

print("checked:", checked.decision, "under", checked.source_line_ids[0])

try:
    verify_p7(checked, credential_age_days=45)
except ValueError as error:
    print("rejected:", error)
```
Policy check
```text
checked: eligible under P-7
rejected: eligible result exceeds P-7 window
```

This guardrail catches contradictions the server can prove from trusted data. It doesn't prove that the free-form report text describes a genuinely stale key, so ambiguous reports can still become needs_review.

Give every request a trace

If an answer looks wrong tomorrow, you need more than the text shown on screen. A trace record is a small stored account of one request: its identifier, status, prompt version, policy evidence, and failure class when something went wrong.

Don't save raw customer descriptions merely because storage is easy. This local record stores a fingerprint of the report, enough to match repeat test inputs without retaining its text:

create-trace-record.py
```python
from dataclasses import asdict, dataclass
import hmac
from hashlib import sha256

FINGERPRINT_KEY = b"local-demo-key"  # Demo only; inject a secret in production.

def fingerprint(text: str) -> str:
    return hmac.new(FINGERPRINT_KEY, text.encode("utf-8"), sha256).hexdigest()[:12]

@dataclass(frozen=True)
class TraceRecord:
    trace_id: str
    account_id: str
    status: str
    input_fingerprint: str
    prompt_version: str
    policy_line_ids: tuple[str, ...]

record = TraceRecord(
    trace_id="trace_acct_10234_01",
    account_id="acct_10234",
    status="completed",
    input_fingerprint=fingerprint("Service-account key is older than policy."),
    prompt_version="rotation_decision@2",
    policy_line_ids=("P-7",),
)

stored = asdict(record)
assert "Service-account key is older than policy." not in str(stored)
print(stored["trace_id"], stored["status"], stored["input_fingerprint"])
```
Redacted trace record
```text
trace_acct_10234_01 completed c50985cbe665
```

A keyed fingerprint makes trivial offline guessing harder than a plain hash, but it isn't anonymization. Keep the key in server-side secret storage, choose retention and access rules for your risk, and omit the fingerprint when you don't need it for debugging or evaluation.

Make status durable

A Python dictionary vanishes when a process restarts. SQLite is enough for a local lab because it makes the transition from running to completed tangible without adding infrastructure.

This store writes the lifecycle your UI will show:

persist-status-transitions.py
```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    """
    CREATE TABLE tasks (
        trace_id TEXT PRIMARY KEY,
        account_id TEXT NOT NULL,
        status TEXT NOT NULL,
        decision TEXT,
        error_code TEXT
    )
    """
)

db.execute(
    "INSERT INTO tasks VALUES (?, ?, ?, ?, ?)",
    ("trace_acct_10234_01", "acct_10234", "running", None, None),
)
db.execute(
    "UPDATE tasks SET status = ?, decision = ? WHERE trace_id = ?",
    ("completed", "eligible", "trace_acct_10234_01"),
)

row = db.execute(
    "SELECT trace_id, status, decision FROM tasks WHERE trace_id = ?",
    ("trace_acct_10234_01",),
).fetchone()
print(row)
```
Stored lifecycle
```text
('trace_acct_10234_01', 'completed', 'eligible')
```
Trace lifecycle for account acct_10234. Browser input is validated, written as a running trace, routed through the P-7 decision boundary, and resolved exactly once into either completed with eligible plus P-7 evidence or failed with provider_timeout. The UI mirrors those terminal states, and neither branch creates a rotation job.
Write `running` before the uncertain call, then persist exactly one terminal snapshot. `completed` carries checked evidence; `failed` carries a named error. Both records drive honest UI states without creating a rotation job.

Plug in the model boundary you already built

The previous chapter handled provider credentials, timeouts, structured parsing, policy evidence, and retry rules. This app shouldn't duplicate that code inside a route. Instead, it asks for a small dependency with one method, decide.

A fixture provider makes the contract runnable without a key or network access:

inject-decision-provider.py
```python
from typing import Literal, Protocol

from pydantic import BaseModel

class RotationReport(BaseModel):
    account_id: str
    item: str
    credential_report: str

class RotationDecision(BaseModel):
    decision: Literal["eligible", "not_eligible", "needs_review"]
    rotation_window_days: int
    source_line_ids: list[str]

class DecisionProvider(Protocol):
    def decide(self, report: RotationReport) -> RotationDecision: ...

class FixtureProvider:
    def decide(self, report: RotationReport) -> RotationDecision:
        assert report.account_id == "acct_10234"
        return RotationDecision(
            decision="eligible",
            rotation_window_days=30,
            source_line_ids=["P-7"],
        )

CREDENTIAL_AGE_DAYS = {"acct_10234": 12}  # Trusted account records.

def evaluate_report(report: RotationReport, provider: DecisionProvider) -> RotationDecision:
    decision = provider.decide(report)
    if "P-7" not in decision.source_line_ids or decision.rotation_window_days != 30:
        raise ValueError("decision conflicts with P-7")
    if decision.decision == "eligible" and CREDENTIAL_AGE_DAYS[report.account_id] > decision.rotation_window_days:
        raise ValueError("eligible result exceeds P-7 window")
    return decision

result = evaluate_report(
    RotationReport(
        account_id="acct_10234",
        item="service-account key",
        credential_report="Service-account key is stale.",
    ),
    FixtureProvider(),
)
print(result.model_dump())
```
Injected boundary
```text
{'decision': 'eligible', 'rotation_window_days': 30, 'source_line_ids': ['P-7']}
```

Later, a provider-backed implementation can satisfy the same interface. The service still verifies business evidence after parsing. Unit tests remain fast because they inject a fixture or a deliberate failure instead of calling a remote model.

Orchestrate completed and failed requests

The application service sits between route and model boundary. It creates a running trace, calls the provider, verifies evidence, and records exactly one terminal result: completed or failed.

Watch both branches in one local example:

record-success-and-timeout.py
```python
from dataclasses import dataclass

@dataclass
class Report:
    account_id: str
    credential_report: str

class GoodProvider:
    def decide(self, report: Report) -> dict:
        return {"decision": "eligible", "rotation_window_days": 30, "source_line_ids": ["P-7"]}

class TimeoutProvider:
    def decide(self, report: Report) -> dict:
        raise TimeoutError("provider deadline exceeded")

CREDENTIAL_AGE_DAYS = {"acct_10234": 12}  # Trusted account records.

def process(report: Report, provider) -> dict:
    trace = {"trace_id": "trace_acct_10234_01", "status": "running"}
    try:
        result = provider.decide(report)
        if "P-7" not in result["source_line_ids"] or result["rotation_window_days"] != 30:
            raise ValueError("missing evidence")
        if result["decision"] == "eligible" and CREDENTIAL_AGE_DAYS[report.account_id] > result["rotation_window_days"]:
            raise ValueError("eligible result exceeds P-7 window")
        trace.update(status="completed", decision=result["decision"])
    except TimeoutError:
        trace.update(status="failed", error_code="provider_timeout")
    except ValueError:
        trace.update(status="failed", error_code="invalid_decision")
    return trace

report = Report("acct_10234", "Service-account key is stale.")
print(process(report, GoodProvider())["status"])
print(process(report, TimeoutProvider())["error_code"])
```
Terminal statuses
```text
completed
provider_timeout
```

A timeout isn't an empty answer, and it isn't permission to retry a side effect. The UI can tell the customer to retry the decision, while a later rotation-job creation route applies its own idempotency rule.
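
This lesson doesn't build the job-creation route, but a rough sketch shows what "its own idempotency rule" could mean. Everything here is hypothetical: the `create_rotation_job` function, the key format, and the in-memory `JOBS` dictionary, which stands in for a durable table with a unique constraint on the key.

```python
JOBS: dict[str, dict] = {}  # Stand-in for a durable table with a unique key.


def create_rotation_job(idempotency_key: str, account_id: str) -> dict:
    """Return the existing job for a repeated key instead of creating another."""
    existing = JOBS.get(idempotency_key)
    if existing is not None:
        return existing
    job = {"job_id": f"job_{len(JOBS) + 1:03d}", "account_id": account_id}
    JOBS[idempotency_key] = job
    return job


# A retried request reuses the same key, so only one job exists afterwards.
first = create_rotation_job("rotate:acct_10234:trace_acct_10234_01", "acct_10234")
retry = create_rotation_job("rotate:acct_10234:trace_acct_10234_01", "acct_10234")
print(first["job_id"], retry["job_id"], len(JOBS))
```

Retrying the decision is safe because it has no side effect. Retrying a job creation is safe only when the repeated request carries the same key and the store refuses to create a second row for it.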

Put a thin HTTP route in front

FastAPI is useful here because the same declared models validate incoming JSON and shape outgoing JSON.[1] The route below has no prompt text and no API key. It handles HTTP concerns and delegates the decision.

post-rotation-decision.py
```python
from itertools import count
from typing import Literal

from fastapi import FastAPI
from fastapi.testclient import TestClient
from pydantic import BaseModel, Field

app = FastAPI()
TASKS: dict[str, dict] = {}
TRACE_IDS = count(1)

class RotationReport(BaseModel):
    account_id: str = Field(pattern=r"^acct_\d{5}$")
    credential_report: str = Field(min_length=10, max_length=500)

class DecisionResponse(BaseModel):
    trace_id: str
    status: Literal["completed", "failed"]
    decision: Literal["eligible", "not_eligible", "needs_review"] | None = None
    source_line_ids: list[str] = Field(default_factory=list)
    error_code: str | None = None
    error: str | None = None

def decide_rotation(report: RotationReport) -> tuple[Literal["eligible"], list[str]]:
    assert report.account_id == "acct_10234"
    return "eligible", ["P-7"]

@app.post("/rotation/decide", response_model=DecisionResponse)
def rotation_decide(report: RotationReport) -> DecisionResponse:
    trace_id = f"trace_{report.account_id}_{next(TRACE_IDS):02d}"
    TASKS[trace_id] = {"status": "running"}
    decision, evidence = decide_rotation(report)
    TASKS[trace_id] = {
        "status": "completed",
        "decision": decision,
        "source_line_ids": evidence,
    }
    return DecisionResponse(
        trace_id=trace_id,
        status="completed",
        decision=decision,
        source_line_ids=evidence,
    )

client = TestClient(app)
response = client.post(
    "/rotation/decide",
    json={"account_id": "acct_10234", "credential_report": "Service-account key is stale."},
)
print(response.status_code, response.json()["decision"], TASKS["trace_acct_10234_01"]["status"])
```
Route response
```text
200 eligible completed
```

The stable response contract is what a browser consumes:

eligible-response.json
```json
{
  "trace_id": "trace_acct_10234_01",
  "status": "completed",
  "decision": "eligible",
  "source_line_ids": ["P-7"],
  "error_code": null,
  "error": null
}
```

No raw model text enters the page. That matters because model output is untrusted content: OWASP's LLM application risks include prompt injection and improper output handling.[2]

count() keeps this local output predictable while assigning a different trace ID to each request. A production deployment needs opaque, collision-resistant IDs and durable storage because process-local counters and dictionaries don't survive multiple workers or restarts.
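
As one possible direction for opaque IDs, the sketch below uses random UUIDs from the standard library. The `new_trace_id` helper and the `trace_` prefix are illustrative assumptions; the point is only that the ID is random and reveals neither the account nor the request order.

```python
import uuid


def new_trace_id() -> str:
    """Opaque ID: random, not derived from the account ID or a counter."""
    return f"trace_{uuid.uuid4().hex}"


# Generating many IDs in one process shows no collisions in practice.
ids = {new_trace_id() for _ in range(1000)}
print(len(ids), len(next(iter(ids))))
```

Because each worker generates IDs independently, no shared counter is needed. The trade-off is that the ID no longer hints at the account, so the trace record must store `account_id` as its own field.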

Make the screen honest about state

A result card that displays only success will turn latency or failures into a mystery. Even with a synchronous route, the browser experiences four states:

| UI state | Trigger | Message | Action allowed |
| --- | --- | --- | --- |
| Idle | Page opened | Describe stale service-account key. | Submit report. |
| Running | Request submitted | Checking policy P-7... | Prevent duplicate submit. |
| Completed | Checked response arrived | Eligible under P-7. | Review rotation job step. |
| Failed | Named error arrived | Decision unavailable; retry safely. | Retry decision only. |

You can unit-test the display rule without running a browser:

render-ui-state.py
```python
def view_text(state: str, payload: dict | None = None) -> str:
    if state == "idle":
        return "Describe stale service-account key"
    if state == "running":
        return "Checking policy P-7..."
    if state == "completed":
        return f"Eligible under {payload['source_line_ids'][0]}"
    return "Decision unavailable; retry safely"

success = {"source_line_ids": ["P-7"]}
print(view_text("running"))
print(view_text("completed", success))
print(view_text("failed", {"error_code": "provider_timeout"}))
```
Rendered states
```text
Checking policy P-7...
Eligible under P-7
Decision unavailable; retry safely
```

Notice what the completed state doesn't say: it doesn't claim a rotation job was created or an approval was granted.
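
The allowed actions for each state can be tested the same way. This is a small sketch rather than a prescribed design: the event names (`submit`, `response_ok`, `response_error`) and the `dispatch` helper are hypothetical, chosen to mirror the four states described above.

```python
# (current state, event) -> next state. Any pair not listed is ignored.
TRANSITIONS = {
    ("idle", "submit"): "running",
    ("running", "response_ok"): "completed",
    ("running", "response_error"): "failed",
    ("failed", "submit"): "running",  # Retry the decision only.
}


def dispatch(state: str, event: str) -> tuple[str, bool]:
    """Return (new_state, send_request) for a UI event."""
    new_state = TRANSITIONS.get((state, event), state)
    send_request = event == "submit" and new_state != state
    return new_state, send_request


print(dispatch("idle", "submit"))
print(dispatch("running", "submit"))  # Duplicate submit: no second request.
print(dispatch("running", "response_error"))
print(dispatch("failed", "submit"))
```

A second click while the request is running leaves the state unchanged and sends nothing, which is the "prevent duplicate submit" rule expressed as data instead of scattered conditionals.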

Test failures at your boundary

Tests shouldn't ask a hosted model to behave consistently. They should verify your deterministic code: validation, storage, provider-failure handling, and HTTP responses.

This route injects a service. A timeout fixture proves the customer gets an explicit failure while the stored trace remains useful:

test-provider-timeout.py
```python
from typing import Literal

from fastapi import FastAPI
from fastapi.testclient import TestClient
from pydantic import BaseModel, Field

app = FastAPI()
TASKS: dict[str, dict] = {}

class RotationReport(BaseModel):
    account_id: str
    credential_report: str

class DecisionResponse(BaseModel):
    trace_id: str
    status: Literal["completed", "failed"]
    decision: Literal["eligible", "not_eligible", "needs_review"] | None = None
    source_line_ids: list[str] = Field(default_factory=list)
    error_code: str | None = None
    error: str | None = None

class TimeoutService:
    def decide(self, report: RotationReport) -> dict:
        raise TimeoutError("provider deadline exceeded")

SERVICE = TimeoutService()

@app.post("/rotation/decide", response_model=DecisionResponse)
def rotation_decide(report: RotationReport) -> DecisionResponse:
    trace_id = "trace_acct_10234_timeout"
    TASKS[trace_id] = {"status": "running"}
    try:
        return SERVICE.decide(report)
    except TimeoutError:
        TASKS[trace_id] = {"status": "failed", "error_code": "provider_timeout"}
        error_message = "Decision unavailable; retry safely."
        return DecisionResponse(
            trace_id=trace_id,
            status="failed",
            error_code="provider_timeout",
            error=error_message,
        )

client = TestClient(app)
payload = client.post(
    "/rotation/decide",
    json={"account_id": "acct_10234", "credential_report": "Service-account key is stale."},
).json()

assert payload["status"] == "failed"
assert TASKS[payload["trace_id"]]["error_code"] == "provider_timeout"
print(payload)
```
Timeout test
```text
{'trace_id': 'trace_acct_10234_timeout', 'status': 'failed', 'decision': None, 'source_line_ids': [], 'error_code': 'provider_timeout', 'error': 'Decision unavailable; retry safely.'}
```

Input rejection deserves a separate test because it occurs before a model boundary should run:

test-invalid-report.py
```python
from fastapi import FastAPI
from fastapi.testclient import TestClient
from pydantic import BaseModel, Field

app = FastAPI()
calls_to_model = 0

class RotationReport(BaseModel):
    account_id: str = Field(pattern=r"^acct_\d{5}$")
    credential_report: str = Field(min_length=10)

@app.post("/rotation/decide")
def rotation_decide(report: RotationReport) -> dict:
    global calls_to_model
    calls_to_model += 1
    return {"status": "completed"}

client = TestClient(app)
response = client.post(
    "/rotation/decide",
    json={"account_id": "wrong", "credential_report": "stale key"},
)

assert response.status_code == 422
assert calls_to_model == 0
print(response.status_code, "model_calls=", calls_to_model)
```
Input rejection
```text
422 model_calls= 0
```

Add a health check and a tiny eval

A deployed service needs a cheap answer to "is the web process alive?" A health endpoint shouldn't call a paid or rate-limited model dependency. It only checks that the application can respond:

health-check-does-not-call-model.py
```python
from fastapi import FastAPI
from fastapi.testclient import TestClient

app = FastAPI()
model_calls = 0

@app.get("/healthz")
def healthz() -> dict:
    return {"status": "ok"}

client = TestClient(app)
response = client.get("/healthz")

assert response.status_code == 200
assert model_calls == 0
print(response.json(), "model_calls=", model_calls)
```
Health response
```text
{'status': 'ok'} model_calls= 0
```

This lab's /healthz route is a liveness check: it proves the process responds. A production platform may also need a separate readiness check before sending traffic, for example while the app opens a required local file or database connection. Neither probe should call a paid hosted model.
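
The difference between the two probes can be sketched with plain functions before wiring them to routes. The `readiness` function, its status strings, and the database paths below are assumptions for illustration; a real platform defines its own probe contract.

```python
import sqlite3


def liveness() -> dict:
    """Process-level check: no dependencies touched."""
    return {"status": "ok"}


def readiness(db_path: str) -> dict:
    """Dependency check: can the trace store answer a trivial query?"""
    try:
        connection = sqlite3.connect(db_path)
        try:
            connection.execute("SELECT 1").fetchone()
        finally:
            connection.close()
    except sqlite3.Error:
        return {"status": "unavailable"}
    return {"status": "ready"}


print(liveness(), readiness(":memory:"))
# A path inside a directory that doesn't exist can't be opened.
print(readiness("/nonexistent-dir/traces.db"))
```

Liveness stays `ok` in both cases because the process itself is fine. Readiness turns `unavailable` when the store can't be opened, and neither function touches the model provider.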

Liveness doesn't prove the decision still follows policy. A generic classification smoke test isn't enough either. For example, this old-style check runs successfully, but it says nothing about rotation eligibility or policy evidence:

spot-an-inadequate-eval.py
```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Ticket:
    category: str

text = "The app crashes when I upload rotation evidence."
ticket = Ticket(category="general")

print(text[:20].strip(), "=>", ticket.category)
```
Insufficient smoke check
```text
The app crashes when => general
```

The output is stable, yet it hasn't tested P-7, an eligibility decision, or a stored trace. Keep a tiny eval set with expected evidence and outcomes instead. Start with obvious cases, then add real failures as you encounter them:

run-rotation-decision-eval.py
```python
def fixture_decide(credential_age_days: int, stale: bool) -> tuple[str, list[str]]:
    if stale and credential_age_days <= 30:
        return "eligible", ["P-7"]
    return "needs_review", ["P-7"]

cases = [
    {"name": "stale in window", "days": 12, "stale": True, "expected": "eligible"},
    {"name": "stale after window", "days": 45, "stale": True, "expected": "needs_review"},
    {"name": "not marked stale", "days": 12, "stale": False, "expected": "needs_review"},
]

passed = 0
for case in cases:
    decision, evidence = fixture_decide(case["days"], case["stale"])
    ok = decision == case["expected"] and evidence == ["P-7"]
    passed += int(ok)
    print(case["name"], "=>", decision, "PASS" if ok else "FAIL")

assert passed == len(cases)
print("passed:", passed, "/", len(cases))
```
Pre-deploy eval
```text
stale in window => eligible PASS
stale after window => needs_review PASS
not marked stale => needs_review PASS
passed: 3 / 3
```
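
As the eval set grows, the cases can move out of the code and into a data file. The sketch below assumes a JSON Lines layout (one JSON object per line) with the same field names as the cases above; the inline `JSONL` string stands in for a file on disk, and `load_cases` is a hypothetical helper.

```python
import json

# Assumed file contents; a project would read this text from a .jsonl file.
JSONL = """\
{"name": "stale in window", "days": 12, "stale": true, "expected": "eligible"}
{"name": "stale after window", "days": 45, "stale": true, "expected": "needs_review"}
{"name": "not marked stale", "days": 12, "stale": false, "expected": "needs_review"}
"""


def load_cases(text: str) -> list[dict]:
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def fixture_decide(credential_age_days: int, stale: bool) -> tuple[str, list[str]]:
    if stale and credential_age_days <= 30:
        return "eligible", ["P-7"]
    return "needs_review", ["P-7"]


# Collect the names of cases whose decision or evidence differs from expected.
failures = [
    case["name"]
    for case in load_cases(JSONL)
    if fixture_decide(case["days"], case["stale"]) != (case["expected"], ["P-7"])
]
print("failures:", failures)
```

Keeping cases as data means a newly observed failure becomes one appended line, reviewed like any other change, without editing the eval runner.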

Prepare a deployable artifact

The learning artifact now has a clear contract:

| Artifact | Evidence it should contain |
| --- | --- |
| app.py | /rotation/decide and /healthz routes with typed input/output. |
| service.py | Provider boundary injection and P-7 verification. |
| store.py | Trace status updates with redacted credential data. |
| tests/ | Invalid input, provider timeout, completed decision, and health check. |
| evals/rotations.jsonl | Small set of expected decisions and evidence lines. |
| README.md | Startup command, configuration, sample request, and known limits. |

Docker can package a Python web app and its startup command into a repeatable container image, but it doesn't replace tests, secret injection, logging, or rollback planning.[3]

Before exposing the app to users, answer these questions:

| Check | Evidence |
| --- | --- |
| Secrets | Provider credentials enter only server runtime configuration. |
| Startup | A clean checkout starts with one documented command. |
| Liveness | /healthz returns 200 without a model request. |
| Traceability | You can find a failed trace_id and its named error. |
| Behavior | Tiny eval cases pass before deploy. |
| Limits | README states that eligibility isn't rotation-job creation or approval execution. |

Production machine-learning systems accumulate debt when model behavior, data dependencies, and serving code aren't tracked together.[4][5] Your first app is small enough to build that habit correctly from the start.

Diagnose a broken version

Suppose a teammate ships this alternative:

```text
The browser calls the hosted model directly with a public key.
The response text is inserted into a green "Rotation approved" card.
No trace is written when the provider times out.
The only test asserts that the page loads.
```

Write down four defects before reading the table:

| Defect | Consequence | Repair |
| --- | --- | --- |
| Browser holds the model key. | Credential can be exposed and abused. | Send request to a server route; keep secret server-side. |
| Raw output becomes approval. | Model text can trigger an unsupported customer promise. | Validate decision and policy evidence; keep actions separate. |
| Timeout leaves no record. | Team can't distinguish slow provider from broken UI. | Persist a failed trace with named error code. |
| Page-load test only. | Core behavior can regress unnoticed. | Test invalid input, timeout, completed result, health, and eval cases. |

The repair isn't more infrastructure. It's a smaller contract with explicit states and evidence.

Mastery check

Key concepts

  • Smallest useful AI workflow from browser input to checked response
  • FastAPI request and response schemas
  • Injected model boundary rather than provider code in routes
  • Redacted trace records and terminal statuses
  • Failure tests, health checks, and pre-deploy eval cases

Evaluation rubric

  • Foundational: Defines a useful AI task that doesn't hide side effects
  • Foundational: Validates incoming report data and outgoing decision data before display
  • Foundational: Explains idle, running, completed, and failed user states
  • Intermediate: Stores a redacted trace with prompt version, status, evidence, and named errors
  • Intermediate: Tests validation and provider failure without a live model dependency
  • Intermediate: Separates service health from model-behavior evaluation before deployment

Common pitfalls

| Symptom | Cause | Fix |
| --- | --- | --- |
| Customer sees an unsupported rotation promise. | UI renders model text as an action. | Render only validated decision fields and keep jobs/approvals separate. |
| Bug report has no reproduction path. | Request status, evidence, or prompt version wasn't stored. | Write a redacted trace for completed and failed outcomes. |
| Unit tests are slow or flaky. | They call a hosted model. | Inject fixture providers and test deterministic app behavior. |
| Service is "healthy" while decisions regress. | Health check was treated as an eval. | Run both liveness checks and policy-decision eval cases. |
| Secrets appear in browser tooling. | Provider calls were made client-side. | Put the model boundary behind the server route. |

Mastery Check

1. A page sends a credential report directly to a hosted model using a browser key, then inserts the generated text into a "Rotation approved" card. Which repair addresses both defects without creating a rotation job as a side effect?
2. An input-validation test posts {"account_id": "wrong", "credential_report": "old"} to a route whose schema requires account_id to match acct_ plus five digits and credential_report to be at least 10 characters long. What outcome should the test assert?
3. Assume P-7 allows stale service-account keys to rotate within 30 days. A parsed model response for account acct_10234 is decision="eligible", rotation_window_days=30, and source_line_ids=["P-7"], but the trusted account record says the credential is 45 days old. What should the service do before showing a result?
4. After a process restart, the team must investigate a failed request without retaining the raw credential report. Which trace design fits that requirement?
5. During /rotation/decide, the service creates trace_acct_10234_01 with status running. The provider then raises TimeoutError("provider deadline exceeded"). What should be persisted and returned?
6. A rotation-decision request is submitted, remains pending, and then returns provider_timeout. How should the interface behave?
7. Which unit-test setup verifies the application's responsibilities without paying for hosted model calls?
8. Before deployment, the app has a /healthz route. Which check design correctly separates process liveness from model behavior?

You have shipped one traceable app request around a model boundary. Next, zoom out to understand how the models behind that boundary are trained, adapted, evaluated, and served.

References

[1] FastAPI Project. FastAPI Documentation. Official documentation, 2026.
[2] OWASP Foundation. OWASP Top 10 for Large Language Model Applications. 2025.
[3] Docker Inc. Docker Documentation. Official documentation, 2026.
[4] Sculley et al. Hidden Technical Debt in Machine Learning Systems. 2015.
[5] Paleyes, A., Urma, R. G., & Lawrence, N. D. Challenges in Deploying Machine Learning: a Survey of Case Studies. ACM Computing Surveys, 2022.