Ship one traceable rotation-decision workflow: validated input, model boundary, stored status, clear UI states, failure tests, and deploy checks.
Alex files a report about a stale service-account key for account acct_10234. Your earlier wrapper can ask a model whether policy line P-7 supports a rotation, but a customer can't use a wrapper by itself. They need a form, a clear result, and a useful error when the call fails.
Now turn that checked model call into a small application. One request travels from browser input to a server route, through the model boundary, into a trace record, and back to the screen. The app decides eligibility only; creating a rotation job remains a separate idempotent action.
A first AI app doesn't need chat history, autonomous tools, or a retrieval system. It needs one user problem with an observable answer:
Given a credential report for account acct_10234, decide whether policy line P-7 supports a rotation, cite that line in the result, and never create a rotation job as a hidden side effect.
That sentence fixes the app boundary before you choose a framework:
| Layer | Responsibility in this app | What it must not do |
|---|---|---|
| Browser | Collect account ID and credential report; show state. | Call a model with a secret key. |
| API route | Validate request; return stable response fields. | Embed provider-specific prompt logic. |
| Decision service | Call the checked wrapper; translate outcome into status. | Issue a rotation job. |
| Trace store | Save status, evidence, latency, and redacted input fingerprint. | Store unnecessary customer text. |
| Tests | Prove completion, rejection, timeout, and health behavior. | Spend money on model calls. |
The route receives a report and returns a decision. Before a model call exists, write those shapes down as schemas. Pydantic validates Python data against declared fields and constraints; FastAPI can use the same models at the HTTP boundary later.[1]
This first runnable cell rejects missing identifiers and constrains the response to the decisions your UI understands:
```python
from typing import Literal

from pydantic import BaseModel, Field

class RotationReport(BaseModel):
    account_id: str = Field(pattern=r"^acct_\d{5}$")
    item: str = Field(min_length=3, max_length=80)
    credential_report: str = Field(min_length=10, max_length=500)

class RotationDecision(BaseModel):
    decision: Literal["eligible", "not_eligible", "needs_review"]
    rotation_window_days: int = Field(ge=0, le=30)
    source_line_ids: list[str] = Field(min_length=1)

report = RotationReport(
    account_id="acct_10234",
    item="service-account key",
    credential_report="Service-account key is older than policy.",
)
decision = RotationDecision(
    decision="eligible",
    rotation_window_days=30,
    source_line_ids=["P-7"],
)

print(report.account_id, "=>", decision.decision, decision.source_line_ids)
```

```text
acct_10234 => eligible ['P-7']
```

The schema proves field shape, not truth. An output can match the schema and still cite the wrong policy or claim a window your policy doesn't allow.
The application therefore performs business-rule checks after parsing. Here, P-7 says stale service-account keys may be rotated within 30 days. The browser doesn't get to claim how old a credential is; the service reads that fact from a trusted account record:
```python
from typing import Literal

from pydantic import BaseModel, Field

class RotationDecision(BaseModel):
    decision: Literal["eligible", "not_eligible", "needs_review"]
    rotation_window_days: int = Field(ge=0, le=90)
    source_line_ids: list[str] = Field(min_length=1)

def verify_p7(decision: RotationDecision, *, credential_age_days: int) -> RotationDecision:
    if "P-7" not in decision.source_line_ids:
        raise ValueError("decision lacks policy evidence")
    if decision.rotation_window_days != 30:
        raise ValueError("result conflicts with P-7")
    if decision.decision == "eligible" and credential_age_days > decision.rotation_window_days:
        raise ValueError("eligible result exceeds P-7 window")
    return decision

checked = verify_p7(
    RotationDecision(
        decision="eligible",
        rotation_window_days=30,
        source_line_ids=["P-7"],
    ),
    credential_age_days=12,  # Trusted account record for acct_10234.
)

print("checked:", checked.decision, "under", checked.source_line_ids[0])

try:
    verify_p7(checked, credential_age_days=45)
except ValueError as error:
    print("rejected:", error)
```

```text
checked: eligible under P-7
rejected: eligible result exceeds P-7 window
```

This guardrail catches contradictions the server can prove from trusted data. It doesn't prove that the free-form report describes the credential accurately, so ambiguous reports can still become needs_review.
If an answer looks wrong tomorrow, you need more than the text shown on screen. A trace record is a small stored account of one request: its identifier, status, prompt version, policy evidence, and failure class when something went wrong.
Don't save raw customer descriptions merely because storage is easy. This local record stores a fingerprint of the report, enough to match repeat test inputs without retaining its text:
```python
from dataclasses import asdict, dataclass
import hmac
from hashlib import sha256

FINGERPRINT_KEY = b"local-demo-key"  # Demo only; inject a secret in production.

def fingerprint(text: str) -> str:
    return hmac.new(FINGERPRINT_KEY, text.encode("utf-8"), sha256).hexdigest()[:12]

@dataclass(frozen=True)
class TraceRecord:
    trace_id: str
    account_id: str
    status: str
    input_fingerprint: str
    prompt_version: str
    policy_line_ids: tuple[str, ...]

record = TraceRecord(
    trace_id="trace_acct_10234_01",
    account_id="acct_10234",
    status="completed",
    input_fingerprint=fingerprint("Service-account key is older than policy."),
    prompt_version="rotation_decision@2",
    policy_line_ids=("P-7",),
)

stored = asdict(record)
assert "Service-account key is older than policy." not in str(stored)
print(stored["trace_id"], stored["status"], stored["input_fingerprint"])
```

```text
trace_acct_10234_01 completed c50985cbe665
```

A keyed fingerprint makes trivial offline guessing harder than a plain hash, but it isn't anonymization. Keep the key in server-side secret storage, choose retention and access rules for your risk, and omit the fingerprint when you don't need it for debugging or evaluation.
A Python dictionary vanishes when a process restarts. SQLite is enough for a local lab because it makes the transition from running to completed tangible without adding infrastructure.
This store writes the lifecycle your UI will show:
```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    """
    CREATE TABLE tasks (
        trace_id TEXT PRIMARY KEY,
        account_id TEXT NOT NULL,
        status TEXT NOT NULL,
        decision TEXT,
        error_code TEXT
    )
    """
)

db.execute(
    "INSERT INTO tasks VALUES (?, ?, ?, ?, ?)",
    ("trace_acct_10234_01", "acct_10234", "running", None, None),
)
db.execute(
    "UPDATE tasks SET status = ?, decision = ? WHERE trace_id = ?",
    ("completed", "eligible", "trace_acct_10234_01"),
)

row = db.execute(
    "SELECT trace_id, status, decision FROM tasks WHERE trace_id = ?",
    ("trace_acct_10234_01",),
).fetchone()
print(row)
```

```text
('trace_acct_10234_01', 'completed', 'eligible')
```
The previous chapter handled provider credentials, timeouts, structured parsing, policy evidence, and retry rules. This app shouldn't duplicate that code inside a route. Instead, it asks for a small dependency with one method, decide.
A fixture provider makes the contract runnable without a key or network access:
```python
from typing import Literal, Protocol

from pydantic import BaseModel

class RotationReport(BaseModel):
    account_id: str
    item: str
    credential_report: str

class RotationDecision(BaseModel):
    decision: Literal["eligible", "not_eligible", "needs_review"]
    rotation_window_days: int
    source_line_ids: list[str]

class DecisionProvider(Protocol):
    def decide(self, report: RotationReport) -> RotationDecision: ...

class FixtureProvider:
    def decide(self, report: RotationReport) -> RotationDecision:
        assert report.account_id == "acct_10234"
        return RotationDecision(
            decision="eligible",
            rotation_window_days=30,
            source_line_ids=["P-7"],
        )

CREDENTIAL_AGE_DAYS = {"acct_10234": 12}  # Trusted account records.

def evaluate_report(report: RotationReport, provider: DecisionProvider) -> RotationDecision:
    decision = provider.decide(report)
    if "P-7" not in decision.source_line_ids or decision.rotation_window_days != 30:
        raise ValueError("decision conflicts with P-7")
    if decision.decision == "eligible" and CREDENTIAL_AGE_DAYS[report.account_id] > decision.rotation_window_days:
        raise ValueError("eligible result exceeds P-7 window")
    return decision

result = evaluate_report(
    RotationReport(
        account_id="acct_10234",
        item="service-account key",
        credential_report="Service-account key is stale.",
    ),
    FixtureProvider(),
)
print(result.model_dump())
```

```text
{'decision': 'eligible', 'rotation_window_days': 30, 'source_line_ids': ['P-7']}
```

Later, a provider-backed implementation can satisfy the same interface. The service still verifies business evidence after parsing. Unit tests remain fast because they inject a fixture or a deliberate failure instead of calling a remote model.
The application service sits between route and model boundary. It creates a running trace, calls the provider, verifies evidence, and records exactly one terminal result: completed or failed.
Watch both branches in one local example:
```python
from dataclasses import dataclass

@dataclass
class Report:
    account_id: str
    credential_report: str

class GoodProvider:
    def decide(self, report: Report) -> dict:
        return {"decision": "eligible", "rotation_window_days": 30, "source_line_ids": ["P-7"]}

class TimeoutProvider:
    def decide(self, report: Report) -> dict:
        raise TimeoutError("provider deadline exceeded")

CREDENTIAL_AGE_DAYS = {"acct_10234": 12}  # Trusted account records.

def process(report: Report, provider) -> dict:
    trace = {"trace_id": "trace_acct_10234_01", "status": "running"}
    try:
        result = provider.decide(report)
        if "P-7" not in result["source_line_ids"] or result["rotation_window_days"] != 30:
            raise ValueError("missing evidence")
        if result["decision"] == "eligible" and CREDENTIAL_AGE_DAYS[report.account_id] > result["rotation_window_days"]:
            raise ValueError("eligible result exceeds P-7 window")
        trace.update(status="completed", decision=result["decision"])
    except TimeoutError:
        trace.update(status="failed", error_code="provider_timeout")
    except ValueError:
        trace.update(status="failed", error_code="invalid_decision")
    return trace

report = Report("acct_10234", "Service-account key is stale.")
print(process(report, GoodProvider())["status"])
print(process(report, TimeoutProvider())["error_code"])
```

```text
completed
provider_timeout
```

A timeout isn't an empty answer, and it isn't permission to retry a side effect. The UI can tell the customer to retry the decision, while a later rotation-job creation route uses its own idempotency rule.
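To make the idea of a separate idempotency rule concrete, here is a minimal sketch of what that later route might rely on. Everything in it is an assumption for illustration: the `create_rotation_job` function, the in-memory `JOBS` dictionary, and the choice of the trace ID as the idempotency key are hypothetical and not part of this app, which still never creates a job.

```python
JOBS: dict[str, dict] = {}  # Hypothetical in-memory job store, keyed by idempotency key.


def create_rotation_job(idempotency_key: str, account_id: str) -> tuple[dict, bool]:
    """Return (job, created). A repeated key returns the first job unchanged."""
    existing = JOBS.get(idempotency_key)
    if existing is not None:
        return existing, False
    job = {"job_id": f"job_{len(JOBS) + 1:03d}", "account_id": account_id}
    JOBS[idempotency_key] = job
    return job, True


first, created_first = create_rotation_job("trace_acct_10234_01", "acct_10234")
retry, created_retry = create_rotation_job("trace_acct_10234_01", "acct_10234")
print(first["job_id"], created_first)
print(retry["job_id"], created_retry)
```

Both calls should print the same job ID, with only the first reporting that a job was created. A process-local dictionary isn't safe across workers or restarts; a durable store with a uniqueness constraint on the key would be needed outside a lab.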
FastAPI is useful here because the same declared models validate incoming JSON and shape outgoing JSON.[1] The route below has no prompt text and no API key. It handles HTTP concerns and delegates the decision.
```python
from itertools import count
from typing import Literal

from fastapi import FastAPI
from fastapi.testclient import TestClient
from pydantic import BaseModel, Field

app = FastAPI()
TASKS: dict[str, dict] = {}
TRACE_IDS = count(1)

class RotationReport(BaseModel):
    account_id: str = Field(pattern=r"^acct_\d{5}$")
    credential_report: str = Field(min_length=10, max_length=500)

class DecisionResponse(BaseModel):
    trace_id: str
    status: Literal["completed", "failed"]
    decision: Literal["eligible", "not_eligible", "needs_review"] | None = None
    source_line_ids: list[str] = Field(default_factory=list)
    error_code: str | None = None
    error: str | None = None

def decide_rotation(report: RotationReport) -> tuple[Literal["eligible"], list[str]]:
    assert report.account_id == "acct_10234"
    return "eligible", ["P-7"]

@app.post("/rotation/decide", response_model=DecisionResponse)
def rotation_decide(report: RotationReport) -> DecisionResponse:
    trace_id = f"trace_{report.account_id}_{next(TRACE_IDS):02d}"
    TASKS[trace_id] = {"status": "running"}
    decision, evidence = decide_rotation(report)
    TASKS[trace_id] = {
        "status": "completed",
        "decision": decision,
        "source_line_ids": evidence,
    }
    return DecisionResponse(
        trace_id=trace_id,
        status="completed",
        decision=decision,
        source_line_ids=evidence,
    )

client = TestClient(app)
response = client.post(
    "/rotation/decide",
    json={"account_id": "acct_10234", "credential_report": "Service-account key is stale."},
)
print(response.status_code, response.json()["decision"], TASKS["trace_acct_10234_01"]["status"])
```

```text
200 eligible completed
```

The stable response contract is what a browser consumes:
```json
{
  "trace_id": "trace_acct_10234_01",
  "status": "completed",
  "decision": "eligible",
  "source_line_ids": ["P-7"],
  "error_code": null,
  "error": null
}
```

No raw model text enters the page. That matters because model output is untrusted content: OWASP's LLM application risks include prompt injection and improper output handling.[2]
count() keeps this local output predictable while assigning a different trace ID to each request. A production deployment needs opaque, collision-resistant IDs and durable storage because process-local counters and dictionaries don't survive multiple workers or restarts.
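As a rough illustration of an opaque, collision-resistant ID, the sketch below uses the standard library's `uuid.uuid4`. The `new_trace_id` helper and the `trace_` prefix are assumptions for this example; the lesson's own code keeps the counter so its printed output stays predictable.

```python
import uuid


def new_trace_id() -> str:
    """Return an opaque trace ID that does not embed the account ID."""
    return f"trace_{uuid.uuid4().hex}"


# Generate a batch and count distinct values; random UUIDs should not repeat.
ids = {new_trace_id() for _ in range(1000)}
print(len(ids), len(new_trace_id()))
```

Unlike `trace_acct_10234_01`, an ID built this way reveals neither the account nor how many requests came before it. The trade-off is that tests can no longer hard-code the expected ID and have to read it from the response.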
A result card that displays only success will turn latency or failures into a mystery. Even with a synchronous route, the browser experiences four states:
| UI state | Trigger | Message | Action allowed |
|---|---|---|---|
| Idle | Page opened | Describe stale service-account key. | Submit report. |
| Running | Request submitted | Checking policy P-7... | Prevent duplicate submit. |
| Completed | Checked response arrived | Eligible under P-7. | Review rotation job step. |
| Failed | Named error arrived | Decision unavailable; retry safely. | Retry decision only. |
You can unit-test the display rule without running a browser:
```python
def view_text(state: str, payload: dict | None = None) -> str:
    if state == "idle":
        return "Describe stale service-account key"
    if state == "running":
        return "Checking policy P-7..."
    if state == "completed":
        return f"Eligible under {payload['source_line_ids'][0]}"
    return "Decision unavailable; retry safely"

success = {"source_line_ids": ["P-7"]}
print(view_text("running"))
print(view_text("completed", success))
print(view_text("failed", {"error_code": "provider_timeout"}))
```

```text
Checking policy P-7...
Eligible under P-7
Decision unavailable; retry safely
```

Notice what the completed state doesn't say: it doesn't claim a rotation job was created or an approval was granted.
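The same restraint can be applied to the card text itself. The sketch below is one possible approach, not the lesson's implementation: a hypothetical `render_result` helper maps only the three known decision values to fixed labels, escapes the cited line IDs with the standard library's `html.escape`, and falls back to the failure message for anything it doesn't recognize.

```python
from html import escape

# Fixed labels for the only decisions the response contract allows.
ALLOWED_DECISIONS = {
    "eligible": "Eligible",
    "not_eligible": "Not eligible",
    "needs_review": "Needs review",
}


def render_result(payload: dict) -> str:
    """Build card text from validated fields only; unknown values get a safe fallback."""
    decision = payload.get("decision")
    label = ALLOWED_DECISIONS.get(decision) if isinstance(decision, str) else None
    evidence = payload.get("source_line_ids") or []
    if payload.get("status") != "completed" or label is None or not evidence:
        return "Decision unavailable; retry safely"
    cited = ", ".join(escape(str(line_id)) for line_id in evidence)
    return f"{label} under {cited}"


print(render_result({"status": "completed", "decision": "eligible", "source_line_ids": ["P-7"]}))
print(render_result({"status": "completed", "decision": "<b>approved</b>", "source_line_ids": ["P-7"]}))
```

The first call should produce the eligible message and the second the fallback, because the unexpected decision string never reaches the page.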
Tests shouldn't ask a hosted model to behave consistently. They should verify your deterministic code: validation, storage, provider-failure handling, and HTTP responses.
This route injects a service. A timeout fixture proves the customer gets an explicit failure while the stored trace remains useful:
```python
from typing import Literal

from fastapi import FastAPI
from fastapi.testclient import TestClient
from pydantic import BaseModel, Field

app = FastAPI()
TASKS: dict[str, dict] = {}

class RotationReport(BaseModel):
    account_id: str
    credential_report: str

class DecisionResponse(BaseModel):
    trace_id: str
    status: Literal["completed", "failed"]
    decision: Literal["eligible", "not_eligible", "needs_review"] | None = None
    source_line_ids: list[str] = Field(default_factory=list)
    error_code: str | None = None
    error: str | None = None

class TimeoutService:
    def decide(self, report: RotationReport) -> dict:
        raise TimeoutError("provider deadline exceeded")

SERVICE = TimeoutService()

@app.post("/rotation/decide", response_model=DecisionResponse)
def rotation_decide(report: RotationReport) -> DecisionResponse:
    trace_id = "trace_acct_10234_timeout"
    TASKS[trace_id] = {"status": "running"}
    try:
        result = SERVICE.decide(report)
    except TimeoutError:
        TASKS[trace_id] = {"status": "failed", "error_code": "provider_timeout"}
        error_message = "Decision unavailable; retry safely."
        return DecisionResponse(
            trace_id=trace_id,
            status="failed",
            error_code="provider_timeout",
            error=error_message,
        )
    # Success path: wrap the service's dict in the stable response contract.
    TASKS[trace_id] = {"status": "completed", "decision": result["decision"]}
    return DecisionResponse(
        trace_id=trace_id,
        status="completed",
        decision=result["decision"],
        source_line_ids=result["source_line_ids"],
    )

client = TestClient(app)
payload = client.post(
    "/rotation/decide",
    json={"account_id": "acct_10234", "credential_report": "Service-account key is stale."},
).json()

assert payload["status"] == "failed"
assert TASKS[payload["trace_id"]]["error_code"] == "provider_timeout"
print(payload)
```

```text
{'trace_id': 'trace_acct_10234_timeout', 'status': 'failed', 'decision': None, 'source_line_ids': [], 'error_code': 'provider_timeout', 'error': 'Decision unavailable; retry safely.'}
```

Input rejection deserves a separate test because it happens before the model boundary runs:
```python
from fastapi import FastAPI
from fastapi.testclient import TestClient
from pydantic import BaseModel, Field

app = FastAPI()
calls_to_model = 0

class RotationReport(BaseModel):
    account_id: str = Field(pattern=r"^acct_\d{5}$")
    credential_report: str = Field(min_length=10)

@app.post("/rotation/decide")
def rotation_decide(report: RotationReport) -> dict:
    global calls_to_model
    calls_to_model += 1
    return {"status": "completed"}

client = TestClient(app)
response = client.post(
    "/rotation/decide",
    json={"account_id": "wrong", "credential_report": "stale key"},
)

assert response.status_code == 422
assert calls_to_model == 0
print(response.status_code, "model_calls=", calls_to_model)
```

```text
422 model_calls= 0
```

A deployed service needs a cheap answer to "is the web process alive?" A health endpoint shouldn't call a paid or rate-limited model dependency. It only checks that the application can respond:
```python
from fastapi import FastAPI
from fastapi.testclient import TestClient

app = FastAPI()
model_calls = 0

@app.get("/healthz")
def healthz() -> dict:
    return {"status": "ok"}

client = TestClient(app)
response = client.get("/healthz")

assert response.status_code == 200
assert model_calls == 0
print(response.json(), "model_calls=", model_calls)
```

```text
{'status': 'ok'} model_calls= 0
```

This lab's /healthz route is a liveness check: it proves the process responds. A production platform may also need a separate readiness check before sending traffic, for example while the app opens a required local file or database connection. Neither probe should call a paid hosted model.
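A readiness check can stay just as cheap as liveness. The sketch below assumes the trace store is a SQLite file and uses a hypothetical `readiness` function that runs a trivial query against it; the function name, the one-second timeout, and the returned status strings are illustrative choices, and wiring it to a route is left out.

```python
import sqlite3
from contextlib import closing


def readiness(db_path: str) -> dict:
    """Report whether the trace store answers a trivial query; never calls a model."""
    try:
        with closing(sqlite3.connect(db_path, timeout=1)) as db:
            db.execute("SELECT 1").fetchone()
    except sqlite3.Error:
        return {"status": "unavailable"}
    return {"status": "ready"}


print(readiness(":memory:"))
# A path inside a directory that does not exist cannot be opened.
print(readiness("/nonexistent-dir-for-readiness-demo/traces.db"))
```

Keeping this separate from /healthz lets a platform restart a dead process without also restarting a live one that is only waiting for its database.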
Liveness doesn't prove the decision still follows policy. A generic classification smoke test isn't enough either. For example, this old-style check runs successfully, but it says nothing about rotation eligibility or policy evidence:
```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Ticket:
    category: str

text = "The app crashes when I upload rotation evidence."
ticket = Ticket(category="general")

print(text[:20].strip(), "=>", ticket.category)
```

```text
The app crashes when => general
```

The output is stable, yet it hasn't tested P-7, an eligibility decision, or a stored trace. Keep a tiny eval set with expected evidence and outcomes instead. Start with obvious cases, then add real failures as you encounter them:
```python
def fixture_decide(credential_age_days: int, stale: bool) -> tuple[str, list[str]]:
    if stale and credential_age_days <= 30:
        return "eligible", ["P-7"]
    return "needs_review", ["P-7"]

cases = [
    {"name": "stale in window", "days": 12, "stale": True, "expected": "eligible"},
    {"name": "stale after window", "days": 45, "stale": True, "expected": "needs_review"},
    {"name": "not marked stale", "days": 12, "stale": False, "expected": "needs_review"},
]

passed = 0
for case in cases:
    decision, evidence = fixture_decide(case["days"], case["stale"])
    ok = decision == case["expected"] and evidence == ["P-7"]
    passed += int(ok)
    print(case["name"], "=>", decision, "PASS" if ok else "FAIL")

assert passed == len(cases)
print("passed:", passed, "/", len(cases))
```

```text
stale in window => eligible PASS
stale after window => needs_review PASS
not marked stale => needs_review PASS
passed: 3 / 3
```

The learning artifact now has a clear contract:
| Artifact | Evidence it should contain |
|---|---|
| app.py | /rotation/decide and /healthz routes with typed input/output. |
| service.py | Provider boundary injection and P-7 verification. |
| store.py | Trace status updates with redacted credential data. |
| tests/ | Invalid input, provider timeout, completed decision, and health check. |
| evals/rotations.jsonl | Small set of expected decisions and evidence lines. |
| README.md | Startup command, configuration, sample request, and known limits. |
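The eval file in that table could be as simple as one JSON object per line. The sketch below is a guess at a workable shape, not a prescribed format: the field names (`name`, `days`, `stale`, `expected`, `evidence`) and the `run_evals` helper are assumptions, and the cases are held in a string so the example runs without a file on disk.

```python
import json

# Hypothetical contents of evals/rotations.jsonl: one JSON object per line.
EVAL_LINES = """\
{"name": "stale in window", "days": 12, "stale": true, "expected": "eligible", "evidence": ["P-7"]}
{"name": "stale after window", "days": 45, "stale": true, "expected": "needs_review", "evidence": ["P-7"]}
"""


def fixture_decide(credential_age_days: int, stale: bool) -> tuple[str, list[str]]:
    """Deterministic stand-in for the decision service, as in the eval cell above."""
    if stale and credential_age_days <= 30:
        return "eligible", ["P-7"]
    return "needs_review", ["P-7"]


def run_evals(lines: str) -> list[str]:
    """Return the names of failing cases; an empty list means every case passed."""
    failures = []
    for line in lines.splitlines():
        if not line.strip():
            continue  # Tolerate blank lines in the file.
        case = json.loads(line)
        decision, evidence = fixture_decide(case["days"], case["stale"])
        if decision != case["expected"] or evidence != case["evidence"]:
            failures.append(case["name"])
    return failures


print("failures:", run_evals(EVAL_LINES))
```

Returning the failing names rather than a count makes a broken deploy check easier to read: the log says which case regressed.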
Docker can package a Python web app and its startup command into a repeatable container image, but it doesn't replace tests, secret injection, logging, or rollback planning.[3]
Before exposing the app to users, answer these questions:
| Check | Evidence |
|---|---|
| Secrets | Provider credentials enter only server runtime configuration. |
| Startup | A clean checkout starts with one documented command. |
| Liveness | /healthz returns 200 without a model request. |
| Traceability | You can find a failed trace_id and its named error. |
| Behavior | Tiny eval cases pass before deploy. |
| Limits | README states that eligibility isn't rotation-job creation or approval execution. |
Production machine-learning systems accumulate debt when model behavior, data dependencies, and serving code aren't tracked together.[4][5] Your first app is small enough to build that habit correctly from the start.
Suppose a teammate ships this alternative:
1. The browser calls the hosted model directly with a public key.
2. The response text is inserted into a green "Rotation approved" card.
3. No trace is written when the provider times out.
4. The only test asserts that the page loads.

Write down four defects before reading the table:
| Defect | Consequence | Repair |
|---|---|---|
| Browser holds the model key. | Credential can be exposed and abused. | Send request to a server route; keep secret server-side. |
| Raw output becomes approval. | Model text can trigger an unsupported customer promise. | Validate decision and policy evidence; keep actions separate. |
| Timeout leaves no record. | Team can't distinguish slow provider from broken UI. | Persist a failed trace with named error code. |
| Page-load test only. | Core behavior can regress unnoticed. | Test invalid input, timeout, completed result, health, and eval cases. |
The repair isn't more infrastructure. It's a smaller contract with explicit states and evidence.
| Symptom | Cause | Fix |
|---|---|---|
| Customer sees an unsupported rotation promise. | UI renders model text as an action. | Render only validated decision fields and keep jobs/approvals separate. |
| Bug report has no reproduction path. | Request status, evidence, or prompt version wasn't stored. | Write a redacted trace for completed and failed outcomes. |
| Unit tests are slow or flaky. | They call a hosted model. | Inject fixture providers and test deterministic app behavior. |
| Service is "healthy" while decisions regress. | Health check was treated as an eval. | Run both liveness checks and rotation-policy eval cases. |
| Secrets appear in browser tooling. | Provider calls were made client-side. | Put the model boundary behind the server route. |
[1] FastAPI Project. FastAPI Documentation. 2026. Official documentation.
[2] OWASP Foundation. OWASP Top 10 for Large Language Model Applications. 2025.
[3] Docker Inc. Docker Documentation. 2026. Official documentation.
[4] Sculley et al. Hidden Technical Debt in Machine Learning Systems. 2015.
[5] Paleyes, A., Urma, R. G., & Lawrence, N. D. Challenges in Deploying Machine Learning: a Survey of Case Studies. ACM Computing Surveys, 2022.