Turn an audited cost contract into a model gateway that preserves data, schema, review, and budget requirements across routing and fallback.
Your CodeAssist developer assistant can price each generated answer. The budget contract says an evaluated answer may cost at most 0.004570 USD and must still keep its evidence. A production app faces the harder routing question: which large language model (LLM) lane may answer each request, and what happens when that lane is unavailable?
A model gateway makes that decision. The app submits one request and one set of requirements. The gateway selects a compatible lane, invokes a model adapter, records why it chose that lane, and either finds a compatible fallback or escalates safely after failure.
The important word is compatible. A private, high-risk break-glass access request needs privacy, human review, cited structured output, and a budget ceiling together. A small gateway can enforce that full contract so code can't discard one requirement while satisfying another.
In the previous lesson, the cost ledger measured model generation after cache decisions had already been made. The gateway consumes that budget contract alongside product and safety requirements. It doesn't decide whether a stored semantic-cache answer is trustworthy; it decides where a request that still requires generation may run.
Use three terms precisely:
| Term | Job | Example here |
|---|---|---|
| Adapter | Translates a stable application request into one provider's API shape | Send messages and parse a structured reply |
| Lane | Names a model path with measured capabilities and cost | local-private-cited-review |
| Gateway | Compiles requirements, filters lanes, selects or falls back, and logs the decision | Reject a cheap lane that drops citations |
An OpenAI-shaped adapter isn't proof that two lanes have the same contract. Anthropic's OpenAI SDK compatibility documentation describes that path as a comparison/testing option rather than its usual production path, and documents limitations including unsupported prompt caching and an ignored strict tool-calling parameter. Native structured outputs are required for guaranteed schema conformance there.[1] A gateway therefore records capabilities explicitly instead of assuming interface similarity means behavior similarity.
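To make the adapter and lane distinction concrete, here is a minimal sketch. `ModelAdapter`, `EchoAdapter`, and `LaneCapabilities` are hypothetical names chosen for illustration; they are not part of any provider SDK, and the stand-in adapter returns a canned reply instead of calling a network API.

```python
from dataclasses import dataclass
from typing import Protocol


class ModelAdapter(Protocol):
    """Hypothetical stable interface the application codes against."""

    def complete(self, messages: list[dict[str, str]]) -> str: ...


@dataclass(frozen=True)
class LaneCapabilities:
    """Capabilities are declared per lane, never inferred from the adapter type."""

    guaranteed_schema: bool
    prompt_caching: bool


class EchoAdapter:
    """Stand-in adapter: echoes the last message instead of calling a provider."""

    def __init__(self, label: str) -> None:
        self.label = label

    def complete(self, messages: list[dict[str, str]]) -> str:
        return f"[{self.label}] {messages[-1]['content']}"


# Two lanes share one adapter shape but declare different capabilities.
lanes: dict[str, tuple[ModelAdapter, LaneCapabilities]] = {
    "native": (EchoAdapter("native"), LaneCapabilities(True, True)),
    "compat": (EchoAdapter("compat"), LaneCapabilities(False, False)),
}

# Only lanes that declare guaranteed schema conformance survive this check.
schema_safe = sorted(name for name, (_, caps) in lanes.items() if caps.guaranteed_schema)
print(schema_safe)
```

Both lanes accept the same `complete` call, yet only one may serve a request that needs guaranteed schema conformance. The capability record, not the call signature, carries that fact.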
The first lab cell defines the artifact arriving from cost engineering, the request facts the app knows, the contract the gateway must compile, and a registry of candidate lanes. The dollar figures are teaching fixtures measured before promotion, not live provider prices. They are preflight evidence, not the final bill: after generation, the gateway still records provider-reported usage and sends it back through the cost ledger.
```python
from dataclasses import dataclass
from decimal import Decimal
from enum import Enum
import json

COST_RELEASE_ID = "assistant-release-2026-05-cost-v1"
GATEWAY_POLICY_ID = "gateway-policy-v1"
MAX_GENERATED_ANSWER_USD = Decimal("0.004570")
MAX_GENERATION_ATTEMPTS = 2
REQUEST_DEADLINE_MS = 2_500

class DataClass(str, Enum):
    PUBLIC = "public"
    TENANT_PRIVATE = "tenant_private"

# Facts the application knows about one request.
@dataclass(frozen=True)
class GatewayRequest:
    request_id: str
    task: str
    data_class: DataClass
    context_tokens: int
    risk_amount_cents: int = 0
    requires_citations: bool = False
    requires_schema: bool = True

# Hard requirements the gateway compiles from a request.
@dataclass(frozen=True)
class RouteContract:
    request_id: str
    data_class: DataClass
    context_tokens: int
    needs_citations: bool
    needs_schema: bool
    needs_human_review: bool
    max_answer_cost_usd: Decimal

# One candidate model path with measured capabilities and cost.
@dataclass(frozen=True)
class Lane:
    name: str
    provider: str
    allowed_data_classes: frozenset[DataClass]
    max_context_tokens: int
    supports_schema: bool
    supports_citations: bool
    supports_human_review: bool
    evaluated_answer_cost_usd: Decimal
    expected_latency_ms: int

LANES = [
    Lane(
        "fast-public-json", "hosted-fast", frozenset({DataClass.PUBLIC}),
        16_000, True, False, False, Decimal("0.001100"), 260,
    ),
    Lane(
        "public-cited-review", "hosted-cited", frozenset({DataClass.PUBLIC}),
        64_000, True, True, True, Decimal("0.003800"), 860,
    ),
    Lane(
        "primary-private-cited-review", "hosted-private", frozenset({DataClass.TENANT_PRIVATE}),
        64_000, True, True, True, Decimal("0.004200"), 940,
    ),
    Lane(
        "local-private-cited-review", "local-private", frozenset({DataClass.TENANT_PRIVATE}),
        32_000, True, True, True, Decimal("0.004500"), 1_100,
    ),
    Lane(
        "regional-private-cited-review", "regional-private", frozenset({DataClass.TENANT_PRIVATE}),
        64_000, True, True, True, Decimal("0.004560"), 1_300,
    ),
    Lane(
        "cheap-text-fallback", "hosted-cheap", frozenset({DataClass.PUBLIC, DataClass.TENANT_PRIVATE}),
        32_000, False, False, False, Decimal("0.001500"), 420,
    ),
]

print(f"cost_release={COST_RELEASE_ID}")
print(f"gateway_policy={GATEWAY_POLICY_ID}")
print(f"max_generated_answer_usd={MAX_GENERATED_ANSWER_USD}")
print(f"max_generation_attempts={MAX_GENERATION_ATTEMPTS}")
print(f"request_deadline_ms={REQUEST_DEADLINE_MS}")
print(f"registered_lanes={len(LANES)}")
```

```text
cost_release=assistant-release-2026-05-cost-v1
gateway_policy=gateway-policy-v1
max_generated_answer_usd=0.004570
max_generation_attempts=2
request_deadline_ms=2500
registered_lanes=6
```

A tempting router is a stack of early returns:
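```python
# Anti-pattern: the first matching branch wins, so later requirements are never checked.
def tempting_route(request):
    if request.data_class == "tenant_private":
        return "local-private"
    if request.risk_amount_cents >= 50_000:
        return "human-review"
```

For a private break-glass access request with 900 USD of risk, that code returns from the first branch and silently drops the second requirement. The safe sequence is different: compile every requirement into one contract, filter lanes against that whole contract, and only then choose among the lanes that survive.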
The next cell compiles the requirements. A break-glass access request at or above the lesson's policy threshold needs human review; private incident notes independently require a private data lane. Neither check erases the other.
```python
HIGH_RISK_ACCESS_CENTS = 50_000

def compile_contract(request: GatewayRequest) -> RouteContract:
    # Every requirement is evaluated independently; none short-circuits another.
    return RouteContract(
        request_id=request.request_id,
        data_class=request.data_class,
        context_tokens=request.context_tokens,
        needs_citations=request.requires_citations,
        needs_schema=request.requires_schema,
        needs_human_review=request.risk_amount_cents >= HIGH_RISK_ACCESS_CENTS,
        max_answer_cost_usd=MAX_GENERATED_ANSWER_USD,
    )

private_access = GatewayRequest(
    request_id="access-R900",
    task="prod_access_decision",
    data_class=DataClass.TENANT_PRIVATE,
    context_tokens=24_000,
    risk_amount_cents=90_000,
    requires_citations=True,
)
private_contract = compile_contract(private_access)

print(f"request={private_contract.request_id}")
print(f"data_class={private_contract.data_class.value}")
print(f"needs_citations={private_contract.needs_citations}")
print(f"needs_human_review={private_contract.needs_human_review}")
print(f"max_answer_cost_usd={private_contract.max_answer_cost_usd}")
```

```text
request=access-R900
data_class=tenant_private
needs_citations=True
needs_human_review=True
max_answer_cost_usd=0.004570
```

Now the filter can explain every rejection. This is more useful than returning a Boolean: if no lane survives, an operator needs to know whether the missing capability is private-data handling, context length, citations, review, or budget.
```python
def contract_violations(lane: Lane, contract: RouteContract) -> list[str]:
    # Collect every failed requirement instead of stopping at the first one.
    failures: list[str] = []
    if contract.data_class not in lane.allowed_data_classes:
        failures.append("data_boundary")
    if lane.max_context_tokens < contract.context_tokens:
        failures.append("context_length")
    if contract.needs_schema and not lane.supports_schema:
        failures.append("schema")
    if contract.needs_citations and not lane.supports_citations:
        failures.append("citations")
    if contract.needs_human_review and not lane.supports_human_review:
        failures.append("human_review")
    if lane.evaluated_answer_cost_usd > contract.max_answer_cost_usd:
        failures.append("budget")
    return failures

def compatible_lanes(contract: RouteContract) -> list[Lane]:
    return [lane for lane in LANES if not contract_violations(lane, contract)]

for lane in LANES:
    failures = contract_violations(lane, private_contract)
    status = "compatible" if not failures else "reject=" + ",".join(failures)
    print(f"{lane.name}: {status}")
```

```text
fast-public-json: reject=data_boundary,context_length,citations,human_review
public-cited-review: reject=data_boundary
primary-private-cited-review: compatible
local-private-cited-review: compatible
regional-private-cited-review: compatible
cheap-text-fallback: reject=schema,citations,human_review
```

Three private cited review lanes survive. The gateway can now choose among them by a soft preference without weakening a hard requirement. Here it chooses the lowest measured answer cost, then the lowest expected latency as a deterministic tie-breaker.
```python
@dataclass(frozen=True)
class RouteDecision:
    request_id: str
    lane: str | None
    action: str
    reasons: tuple[str, ...]

def choose_primary(contract: RouteContract) -> RouteDecision:
    feasible = compatible_lanes(contract)
    if not feasible:
        return RouteDecision(contract.request_id, None, "escalate", ("no_compatible_lane",))
    # Soft preference: lowest evaluated cost, then lowest expected latency.
    lane = min(feasible, key=lambda candidate: (candidate.evaluated_answer_cost_usd, candidate.expected_latency_ms))
    reasons = (
        f"data={contract.data_class.value}",
        f"review={str(contract.needs_human_review).lower()}",
        f"citations={str(contract.needs_citations).lower()}",
        f"budget<={contract.max_answer_cost_usd}",
    )
    return RouteDecision(contract.request_id, lane.name, "generate", reasons)

status_request = GatewayRequest(
    "docs-Q102", "deploy_policy", DataClass.PUBLIC, 2_000,
    requires_citations=False,
)

for request in (status_request, private_access):
    decision = choose_primary(compile_contract(request))
    print(f"{decision.request_id} -> {decision.lane} action={decision.action}")
    print(" " + " ".join(decision.reasons))
```

```text
docs-Q102 -> fast-public-json action=generate
 data=public review=false citations=false budget<=0.004570
access-R900 -> primary-private-cited-review action=generate
 data=tenant_private review=true citations=true budget<=0.004570
```

The filter above is deliberately rule-based. Data boundaries, schema requirements, review requirements, and approved cost ceilings are policy constraints. Letting a probabilistic classifier relax any of them would make a plausible prediction more important than authorization.
Learned routing becomes useful inside the feasible set. RouteLLM trains routers from preference data to choose between a stronger and a weaker LLM; its paper reports cost reductions greater than two times in some evaluated settings without a reported response-quality loss on those settings.[2] That result supports experimenting with quality-cost routing. It doesn't turn a learned router into a privacy or authorization check.
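The idea of ranking only inside the feasible set can be sketched as follows. This isn't RouteLLM's API: `rank_within_feasible`, the quality threshold, and the `fake_scores` table are hypothetical stand-ins for a learned scorer, and the feasible list is assumed to be already filtered by the hard contract and sorted cheapest-first.

```python
from typing import Callable, Optional


def rank_within_feasible(
    feasible: list[str],
    quality_score: Callable[[str], float],
    min_quality: float,
) -> Optional[str]:
    """Return the cheapest feasible lane whose predicted quality clears the bar.

    The scorer can only reorder or skip lanes that are already in `feasible`;
    it has no way to add a lane that the contract filter rejected.
    """
    for lane in feasible:
        if quality_score(lane) >= min_quality:
            return lane
    # No lane clears the bar: take the best-scoring feasible lane, if any.
    return max(feasible, key=quality_score, default=None)


# Hypothetical predicted-quality scores standing in for a learned router.
fake_scores = {
    "primary-private-cited-review": 0.62,
    "local-private-cited-review": 0.81,
    "cheap-text-fallback": 0.99,  # scored highly, but never in the feasible set
}
feasible_lanes = ["primary-private-cited-review", "local-private-cited-review"]

choice = rank_within_feasible(feasible_lanes, lambda name: fake_scores.get(name, 0.0), 0.75)
print(choice)
```

Even though the stand-in scorer rates `cheap-text-fallback` highest, that lane can't be chosen because it never enters the list the ranker sees.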
This flow diagram shows the control boundary. A soft router may rank already-compatible candidates; a failure re-enters the same contract filter rather than jumping directly to a convenient provider.
A fallback is a second route attempt after the first attempt fails. It isn't permission to drop requirements. If a cited access answer must be private, structured, reviewable, and within budget before an outage, it must remain so during an outage.
Gateway frameworks expose fallback mechanisms, but configuration doesn't prove semantic compatibility. For example, LiteLLM documents ordered fallbacks and separate regular, context-window, and content-policy fallback settings.[3] Those mechanisms decide when another lane may be attempted. Your contract still decides which replacement lane is acceptable.
Failure type matters before the gateway even looks for a candidate. A rate limit or timeout before any output is shown can be retried against a compatible lane. Once a model has streamed visible text, silently switching models can produce a contradictory continuation. The safer product behavior is to stop that answer and ask the user to retry or hand it to an operator.
This fixture routes answer generation only. It doesn't replay side-effecting tool calls. A write action such as granting break-glass access needs its own authorization boundary and idempotency key; a gateway must not silently execute it again because generation failed. A context rejection also stops here: the filter already checked context capacity, so a provider rejection signals stale registry data or another mismatch that needs investigation rather than a hidden downgrade.
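The idempotency-key idea for write actions can be illustrated with a small in-memory sketch. `GrantLedger` and the key format are hypothetical; a production system would persist keys in a durable store and pair this with its own authorization check, which this sketch omits.

```python
class GrantLedger:
    """Hypothetical in-memory store that applies each write at most once per key."""

    def __init__(self) -> None:
        self._results: dict[str, str] = {}
        self.executions = 0

    def grant_access(self, idempotency_key: str, user: str) -> str:
        # A replayed key returns the recorded result instead of writing again.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.executions += 1
        result = f"granted:{user}"
        self._results[idempotency_key] = result
        return result


ledger = GrantLedger()
first = ledger.grant_access("access-R900:grant", "oncall-engineer")
# A retry after a generation failure reuses the same key and doesn't grant twice.
replayed = ledger.grant_access("access-R900:grant", "oncall-engineer")
print(first, replayed, ledger.executions)
```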
The next cell classifies failures by whether a transparent fallback is still allowed.
```python
class FailureKind(str, Enum):
    RATE_LIMIT_BEFORE_OUTPUT = "rate_limit_before_output"
    TIMEOUT_BEFORE_OUTPUT = "timeout_before_output"
    CONTEXT_REJECTED = "context_rejected"
    MID_STREAM_DROP = "mid_stream_drop"
    SCHEMA_INVALID = "schema_invalid"

def may_retry_transparently(failure: FailureKind) -> bool:
    # Only failures that happen before any visible output may fall back silently.
    return failure in {
        FailureKind.RATE_LIMIT_BEFORE_OUTPUT,
        FailureKind.TIMEOUT_BEFORE_OUTPUT,
    }

for failure in FailureKind:
    action = "try_compatible_fallback" if may_retry_transparently(failure) else "stop_or_escalate"
    print(f"{failure.value}: {action}")
```

```text
rate_limit_before_output: try_compatible_fallback
timeout_before_output: try_compatible_fallback
context_rejected: stop_or_escalate
mid_stream_drop: stop_or_escalate
schema_invalid: stop_or_escalate
```

For the private break-glass access request, suppose the lower-cost primary lane times out before emitting output. The fallback selector excludes the failed lane, applies the same contract filter, and then chooses from what remains.
```python
def choose_fallback(contract: RouteContract, failed_lane: str) -> RouteDecision:
    # Same contract filter as the primary route, minus the lane that just failed.
    candidates = [
        lane for lane in compatible_lanes(contract)
        if lane.name != failed_lane
    ]
    if not candidates:
        return RouteDecision(contract.request_id, None, "escalate", ("fallback_contract_unmet",))
    lane = min(candidates, key=lambda candidate: (candidate.evaluated_answer_cost_usd, candidate.expected_latency_ms))
    return RouteDecision(
        contract.request_id,
        lane.name,
        "fallback_generate",
        (f"primary_failed={failed_lane}", "contract_preserved=true"),
    )

primary = choose_primary(private_contract)
fallback = choose_fallback(private_contract, primary.lane or "")
print(f"primary={primary.lane}")
print(f"fallback={fallback.lane} action={fallback.action}")
print("cheap_text_rejected=" + ",".join(contract_violations(LANES[-1], private_contract)))
```

```text
primary=primary-private-cited-review
fallback=local-private-cited-review action=fallback_generate
cheap_text_rejected=schema,citations,human_review
```

Both candidate lanes fit the per-answer admission ceiling, but a failed primary attempt can still consume billable provider work before the timeout reaches the gateway. That means max_answer_cost_usd isn't a guarantee on total request spend during an incident. Production policy also needs a retry budget: a maximum attempt count, one wall-clock deadline shared across attempts, and post-call pricing for usage reported by every provider attempt. This lab allows at most one fallback (MAX_GENERATION_ATTEMPTS = 2) and carries a 2.5-second request deadline into the exported policy. The simulator doesn't advance a clock, so the downstream runtime must enforce that deadline.
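A retry budget of that shape might look like the sketch below. `RetryBudget` is a hypothetical helper, not part of the lab code; it assumes a monotonic clock in seconds and accepts an injected `now` so the behavior can be shown without sleeping.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RetryBudget:
    """Hypothetical budget: an attempt cap plus one deadline shared by all attempts."""

    max_attempts: int
    deadline_ms: int
    started_at: float = field(default_factory=time.monotonic)
    attempts_used: int = 0

    def remaining_ms(self, now: Optional[float] = None) -> float:
        current = time.monotonic() if now is None else now
        return self.deadline_ms - (current - self.started_at) * 1000.0

    def try_acquire(self, now: Optional[float] = None) -> bool:
        """Reserve one attempt only if both the count and the deadline allow it."""
        if self.attempts_used >= self.max_attempts or self.remaining_ms(now) <= 0:
            return False
        self.attempts_used += 1
        return True


budget = RetryBudget(max_attempts=2, deadline_ms=2_500, started_at=0.0)
print(budget.try_acquire(now=0.1))  # primary attempt
print(budget.try_acquire(now=1.2))  # the single allowed fallback
print(budget.try_acquire(now=1.3))  # refused: attempt count exhausted

late = RetryBudget(max_attempts=2, deadline_ms=2_500, started_at=0.0)
print(late.try_acquire(now=3.0))  # refused: shared deadline already passed
```

Passing `remaining_ms()` to each provider call as its timeout is one way to keep the second attempt from outliving the request deadline.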
Fallback can make an outage worse if every request keeps trying an unhealthy primary before moving to the backup. A circuit breaker tracks recent failures for an upstream target. Once that target reaches a failure threshold, its circuit opens and new requests skip it during a cooldown period. After cooldown, one probe is allowed through; success closes the circuit, while failure opens it again.
This compact implementation models those three states: closed, open, and half_open. Its key is provider because each fixture provider names one upstream target. A production registry may need a provider, endpoint, deployment, or lane key depending on failure scope, plus bounded retry attempts, backoff, and a deadline for the half-open probe.
```python
class CircuitStatus(str, Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"

@dataclass
class CircuitState:
    status: CircuitStatus = CircuitStatus.CLOSED
    failures: int = 0
    opened_until: float = 0.0

class CircuitBreaker:
    def __init__(self, threshold: int = 2, cooldown_seconds: float = 10.0) -> None:
        self.threshold = threshold
        self.cooldown_seconds = cooldown_seconds
        self.states: dict[str, CircuitState] = {}

    def permit(self, provider: str, now: float) -> bool:
        state = self.states.setdefault(provider, CircuitState())
        if state.status == CircuitStatus.OPEN:
            if now < state.opened_until:
                return False
            # Cooldown elapsed: let exactly one probe through.
            state.status = CircuitStatus.HALF_OPEN
            return True
        # A half-open circuit already has a probe in flight, so only closed permits.
        return state.status == CircuitStatus.CLOSED

    def failure(self, provider: str, now: float) -> None:
        state = self.states.setdefault(provider, CircuitState())
        state.failures += 1
        if state.status == CircuitStatus.HALF_OPEN or state.failures >= self.threshold:
            state.status = CircuitStatus.OPEN
            state.opened_until = now + self.cooldown_seconds

    def success(self, provider: str) -> None:
        self.states[provider] = CircuitState()

breaker = CircuitBreaker()
breaker.failure("hosted-private", 100.0)
breaker.failure("hosted-private", 101.0)
print(f"during_cooldown={breaker.permit('hosted-private', 105.0)}")
print(f"probe_after_cooldown={breaker.permit('hosted-private', 112.0)}")
breaker.success("hosted-private")
print(f"after_success={breaker.states['hosted-private'].status.value}")
```

```text
during_cooldown=False
probe_after_cooldown=True
after_success=closed
```

Routing decisions are operational evidence. An audit row should carry the gateway-policy identifier, inherited cost-release identifier, selected lane, fallback event, evaluated cost, and every hard requirement that caused selection. That record lets a later incident review distinguish a model-quality problem from a bad policy decision.
The next cell simulates several attempts for the private break-glass access request. It consults circuit state before attempting the primary lane and records a failure whenever an attempt breaks. A timeout before output uses the compatible local fallback. A mid-stream drop escalates because changing generators after output begins would hide an inconsistent user experience. By the final attempt, the circuits for both the primary and the local provider are open, so that attempt shows the gateway scanning ranked contract-compatible fallbacks until it finds a permitted regional lane.
```python
@dataclass(frozen=True)
class RouteEvent:
    request_id: str
    policy_id: str
    cost_release_id: str
    action: str
    lane: str | None
    contract_summary: str
    reason: str
    evaluated_cost_usd: Decimal

def lane_by_name(name: str) -> Lane:
    return next(lane for lane in LANES if lane.name == name)

def summarize_contract(contract: RouteContract) -> str:
    return (
        f"data={contract.data_class.value};"
        f"schema={str(contract.needs_schema).lower()};"
        f"citations={str(contract.needs_citations).lower()};"
        f"review={str(contract.needs_human_review).lower()};"
        f"budget<={contract.max_answer_cost_usd}"
    )

def permitted_fallback_lane(contract: RouteContract, failed_lane: str, now: float) -> Lane | None:
    # Rank contract-compatible lanes, then return the first whose circuit permits traffic.
    candidates = sorted(
        (lane for lane in compatible_lanes(contract) if lane.name != failed_lane),
        key=lambda candidate: (candidate.evaluated_answer_cost_usd, candidate.expected_latency_ms),
    )
    for lane in candidates:
        if breaker.permit(lane.provider, now):
            return lane
    return None

def execute_with_failure(request: GatewayRequest, failure: FailureKind | None, now: float = 200.0) -> RouteEvent:
    contract = compile_contract(request)
    contract_summary = summarize_contract(contract)
    primary_decision = choose_primary(contract)
    if primary_decision.lane is None:
        return RouteEvent(request.request_id, GATEWAY_POLICY_ID, COST_RELEASE_ID, "escalate", None, contract_summary, "no_primary_lane", Decimal("0"))
    primary_lane = lane_by_name(primary_decision.lane)
    # Skip the primary entirely while its circuit is open.
    if not breaker.permit(primary_lane.provider, now):
        fallback_lane = permitted_fallback_lane(contract, primary_lane.name, now)
        if fallback_lane is None:
            return RouteEvent(request.request_id, GATEWAY_POLICY_ID, COST_RELEASE_ID, "escalate", None, contract_summary, "no_healthy_safe_fallback", Decimal("0"))
        breaker.success(fallback_lane.provider)
        return RouteEvent(
            request.request_id, GATEWAY_POLICY_ID, COST_RELEASE_ID, "served_fallback", fallback_lane.name,
            contract_summary, "primary_circuit_open;contract_preserved", fallback_lane.evaluated_answer_cost_usd,
        )
    if failure is None:
        breaker.success(primary_lane.provider)
        return RouteEvent(
            request.request_id, GATEWAY_POLICY_ID, COST_RELEASE_ID, "served", primary_lane.name,
            contract_summary, "primary_contract_match", primary_lane.evaluated_answer_cost_usd,
        )
    # The simulated primary attempt failed: record it before deciding what to do next.
    breaker.failure(primary_lane.provider, now)
    if not may_retry_transparently(failure):
        return RouteEvent(
            request.request_id, GATEWAY_POLICY_ID, COST_RELEASE_ID, "escalate", None,
            contract_summary, f"primary_{failure.value}", Decimal("0"),
        )
    fallback_lane = permitted_fallback_lane(contract, primary_lane.name, now)
    if fallback_lane is None:
        return RouteEvent(request.request_id, GATEWAY_POLICY_ID, COST_RELEASE_ID, "escalate", None, contract_summary, "no_healthy_safe_fallback", Decimal("0"))
    breaker.success(fallback_lane.provider)
    return RouteEvent(
        request.request_id, GATEWAY_POLICY_ID, COST_RELEASE_ID, "served_fallback", fallback_lane.name,
        contract_summary, f"primary_{failure.value};contract_preserved", fallback_lane.evaluated_answer_cost_usd,
    )

breaker = CircuitBreaker()
before_output = execute_with_failure(private_access, FailureKind.TIMEOUT_BEFORE_OUTPUT, now=200.0)
mid_stream = execute_with_failure(private_access, FailureKind.MID_STREAM_DROP, now=201.0)
# Open the local provider's circuit so the last attempt must look past it.
breaker.failure("local-private", 201.0)
breaker.failure("local-private", 202.0)
skip_open_local = execute_with_failure(private_access, None, now=203.0)

print(f"before_output={before_output.action} lane={before_output.lane} reason={before_output.reason}")
print(f"audit_policy={before_output.policy_id} cost_release={before_output.cost_release_id}")
print(f"audit_contract={before_output.contract_summary}")
print(f"mid_stream={mid_stream.action} lane={mid_stream.lane} reason={mid_stream.reason}")
print(f"circuit_after_failures={breaker.states['hosted-private'].status.value}")
print(f"open_local_skipped={skip_open_local.action} lane={skip_open_local.lane} reason={skip_open_local.reason}")
```

```text
before_output=served_fallback lane=local-private-cited-review reason=primary_timeout_before_output;contract_preserved
audit_policy=gateway-policy-v1 cost_release=assistant-release-2026-05-cost-v1
audit_contract=data=tenant_private;schema=true;citations=true;review=true;budget<=0.004570
mid_stream=escalate lane=None reason=primary_mid_stream_drop
circuit_after_failures=open
open_local_skipped=served_fallback lane=regional-private-cited-review reason=primary_circuit_open;contract_preserved
```

A gateway policy shouldn't be promoted because three hand-picked examples look sensible. Replay an evaluated set that represents low-risk traffic, high-risk cases, private data, outage paths, and unsupported contracts. For every generated response, check that the selected lane preserves the contract and stays inside the inherited budget.
This small replay adds a request whose private context is too large for any approved private lane. It also opens the primary circuit on a simulated timeout and confirms that the next request uses a contract-preserving fallback during cooldown. An honest gateway escalates unsupported work rather than truncating evidence or routing private incident notes somewhere unapproved.
```python
too_large_private_request = GatewayRequest(
    "access-long-context", "prod_access_decision", DataClass.TENANT_PRIVATE, 70_000,
    risk_amount_cents=90_000, requires_citations=True,
)

# threshold=1 opens the primary circuit on the first simulated timeout.
breaker = CircuitBreaker(threshold=1, cooldown_seconds=10.0)
replay_cases = [
    (status_request, None),
    (private_access, None),
    (private_access, FailureKind.TIMEOUT_BEFORE_OUTPUT),
    (private_access, None),
    (too_large_private_request, None),
]
events = [
    execute_with_failure(request, failure, now=300.0 + index)
    for index, (request, failure) in enumerate(replay_cases)
]

served = []
for (request, _), event in zip(replay_cases, events):
    if event.action.startswith("served"):
        # Every served event must still satisfy the full contract and budget.
        lane = lane_by_name(event.lane or "")
        assert not contract_violations(lane, compile_contract(request))
        assert event.evaluated_cost_usd <= MAX_GENERATED_ANSWER_USD
        served.append(event)

for event in events:
    print(f"{event.request_id}: {event.action} lane={event.lane}")
print(f"generated_with_contract={len(served)}/{len(events)}")
print(f"unsafe_generation_events=0")
```

```text
docs-Q102: served lane=fast-public-json
access-R900: served lane=primary-private-cited-review
access-R900: served_fallback lane=local-private-cited-review
access-R900: served_fallback lane=local-private-cited-review
access-long-context: escalate lane=None
generated_with_contract=4/5
unsafe_generation_events=0
```

The replay isn't an answer-quality evaluation by itself. Its cost assertion checks preflight lane evidence, not the final bill. After generation, record provider-reported usage and price it with the inherited rate card in the ledger from the previous lesson. Before promotion, also join these route events with the answer-correctness and citation checks from the evaluation lessons. A routing policy can satisfy the mechanical contract and still send too many difficult public questions to a weak but formally compatible lane.
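The gap between preflight evidence and the final bill can be made concrete with a small pricing sketch. The rate card, token counts, and the assumption that a timed-out attempt is billed for its input tokens are all hypothetical teaching values, not real provider prices or billing rules.

```python
from decimal import Decimal

# Hypothetical rate card in USD per 1,000 tokens; not real provider prices.
RATE_CARD_USD_PER_1K = {
    "hosted-private": {"input": Decimal("0.00009"), "output": Decimal("0.00050")},
    "local-private": {"input": Decimal("0.00010"), "output": Decimal("0.00040")},
}


def price_attempt(provider: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Price one provider attempt from its reported token usage."""
    rates = RATE_CARD_USD_PER_1K[provider]
    token_cost = input_tokens * rates["input"] + output_tokens * rates["output"]
    return token_cost / Decimal(1000)


# Assumed usage reports: a timed-out primary that still billed input tokens,
# followed by a successful fallback.
attempts = [
    ("hosted-private", 24_000, 0),
    ("local-private", 24_000, 600),
]
ceiling = Decimal("0.004570")

per_attempt = [price_attempt(*attempt) for attempt in attempts]
total_spend = sum(per_attempt, Decimal("0"))

print([str(cost) for cost in per_attempt])
print(f"total={total_spend} over_ceiling={total_spend > ceiling}")
```

In this made-up incident each attempt is under the per-answer ceiling on its own, yet the request as a whole exceeds it, which is why the ledger needs usage from every attempt rather than only the one that served the answer.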
Only after hard requirements pass should you test learned or heuristic ranking for quality and cost. Useful comparison metrics are generated-answer correctness, citation correctness, human-escalation rate, fallback success rate, p95 latency, and actual spend by lane.
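A few of those comparison metrics can be computed from joined replay rows, as in the sketch below. The row layout and every number in `rows` are hypothetical; the p95 here uses the simple nearest-rank method, which is one of several common percentile definitions.

```python
import math
from collections import defaultdict
from decimal import Decimal

# Hypothetical joined rows: (action, lane, latency_ms, actual_cost_usd).
rows = [
    ("served", "fast-public-json", 250, Decimal("0.0010")),
    ("served", "primary-private-cited-review", 900, Decimal("0.0041")),
    ("served_fallback", "local-private-cited-review", 1150, Decimal("0.0046")),
    ("served_fallback", "local-private-cited-review", 1080, Decimal("0.0044")),
    ("escalate", None, 0, Decimal("0")),
]

served = [row for row in rows if row[0].startswith("served")]

# Share of all requests that were handed to a human or stopped.
escalation_rate = sum(1 for row in rows if row[0] == "escalate") / len(rows)

# Nearest-rank p95 over served requests only.
latencies = sorted(row[2] for row in served)
p95_latency_ms = latencies[math.ceil(0.95 * len(latencies)) - 1]

# Actual spend grouped by the lane that produced the answer.
spend_by_lane: dict[str, Decimal] = defaultdict(Decimal)
for _, lane, _, cost in served:
    spend_by_lane[lane] += cost

print(f"escalation_rate={escalation_rate}")
print(f"p95_latency_ms={p95_latency_ms}")
for lane, spend in sorted(spend_by_lane.items()):
    print(f"{lane}: {spend}")
```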
The next lesson assembles a complete developer assistant. It shouldn't re-create routing rules inside orchestration code. The gateway should export a small, versioned policy artifact that states what has been approved and what must escalate.
The final cell emits the artifact from the lesson: an inherited cost contract, two demonstrated routes, and failure behavior that refuses unsafe degradation.
```python
policy_artifact = {
    "policy_id": GATEWAY_POLICY_ID,
    "cost_release_id": COST_RELEASE_ID,
    "max_generated_answer_usd": str(MAX_GENERATED_ANSWER_USD),
    "retry_limits": {
        "max_generation_attempts": MAX_GENERATION_ATTEMPTS,
        "request_deadline_ms": REQUEST_DEADLINE_MS,
    },
    "approved_examples": {
        "public_deploy_policy": "fast-public-json",
        "private_high_risk_access": "primary-private-cited-review",
        "private_high_risk_access_fallback": "local-private-cited-review",
    },
    "escalate_when": [
        "no lane preserves all contract fields",
        "failure occurs after visible output begins",
        "approved private context capacity is exceeded",
        "retry attempts or request deadline are exhausted",
    ],
}

print(json.dumps(policy_artifact, indent=2))
```

```text
{
  "policy_id": "gateway-policy-v1",
  "cost_release_id": "assistant-release-2026-05-cost-v1",
  "max_generated_answer_usd": "0.004570",
  "retry_limits": {
    "max_generation_attempts": 2,
    "request_deadline_ms": 2500
  },
  "approved_examples": {
    "public_deploy_policy": "fast-public-json",
    "private_high_risk_access": "primary-private-cited-review",
    "private_high_risk_access_fallback": "local-private-cited-review"
  },
  "escalate_when": [
    "no lane preserves all contract fields",
    "failure occurs after visible output begins",
    "approved private context capacity is exceeded",
    "retry attempts or request deadline are exhausted"
  ]
}
```
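One way downstream orchestration might consume the exported artifact, rather than re-creating routing rules, is to parse it and fail closed when it is incomplete. `load_policy` and `REQUIRED_KEYS` are hypothetical names, and the inline JSON is a trimmed copy of the artifact used only to keep the sketch self-contained.

```python
import json

# Keys this hypothetical consumer refuses to run without.
REQUIRED_KEYS = {"policy_id", "cost_release_id", "retry_limits", "escalate_when"}


def load_policy(raw_json: str) -> dict:
    """Parse the exported artifact and reject it if required fields are missing."""
    policy = json.loads(raw_json)
    missing = REQUIRED_KEYS - policy.keys()
    if missing:
        raise ValueError(f"policy artifact missing keys: {sorted(missing)}")
    if policy["retry_limits"]["max_generation_attempts"] < 1:
        raise ValueError("max_generation_attempts must be at least 1")
    return policy


exported = json.dumps({
    "policy_id": "gateway-policy-v1",
    "cost_release_id": "assistant-release-2026-05-cost-v1",
    "retry_limits": {"max_generation_attempts": 2, "request_deadline_ms": 2500},
    "escalate_when": ["no lane preserves all contract fields"],
})

policy = load_policy(exported)
print(policy["policy_id"], policy["retry_limits"]["request_deadline_ms"])
```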