Human-in-the-Loop Agent Architecture

Build approval gates, durable checkpoints, and guarded resumes for agent actions that change real-world state.

Computer-use agents can click buttons, fill forms, and operate real software. That makes the next production question unavoidable: what happens when the agent is about to spend money, change customer data, send an external message, or deploy code?

Human-in-the-Loop (HITL) architecture adds a hard execution boundary for those moments. An agent can propose a side effect, but a downstream policy service pauses it until the required reviewer authorizes the exact action.

This addresses the opaque execution problem: even capable models can produce confident but incorrect outputs, and in e-commerce operations, one bad refund or misdirected shipment can cost real money and customer trust. HITL doesn't make an action correct by itself. It creates a point where policy, evidence, authorization, and final-state checks can stop a bad action.

This is also a practical control for one of the top-ranked agent risks. The OWASP Top 10 for LLM Applications (2025) lists LLM06: Excessive Agency, which it breaks into excessive functionality, excessive permissions, and excessive autonomy. Requiring human approval for high-impact actions is one recommended mitigation for the excessive-autonomy facet, ideally enforced in a downstream system rather than left to the model to decide. [1] Read this article as the design pattern that implements that control.

Like a warehouse system that can retrieve inventory records but must route a money movement to an authorized reviewer, a HITL agent runs only within a defined risk boundary. Unlike a chatbot that waits for input at every turn, this pattern supervises actions according to their possible effect.

To make this concrete, we'll follow Riley, an order-support agent for an online electronics store. Riley can look up authorized order data and draft suggested responses. Riley can also propose a refund, an address change, or a customer email, but a downstream gate controls whether any of those side effects run. The dollar thresholds in this lesson are illustrative policy choices, not universal rules.

By the end, you'll be able to design Riley's risk policy, implement checkpoint/resume logic that lets Riley wait durably for review, and build an approval UI that gives reviewers enough evidence without exposing unnecessary customer data. We'll assume you already know LLM tool calling, state graphs, guardrail policy, and browser-agent risk. If those terms are fuzzy, review Function Calling & Tool Use, Agentic Architectures, Guardrails & Safety Filters, and Computer Use & Browser Agents first.

The three pillars of human-agent interaction

Not all human oversight is created equal. Production systems distinguish between three distinct interaction models based on risk tolerance and the nature of the task. Understanding these distinctions helps you design appropriate governance frameworks.

Human-in-the-Loop (HITL): Active gatekeeping

In this model, the agent can't proceed with a gated action without explicit human approval. Use it for actions whose impact isn't acceptable to discover after execution, such as money movement, production deployment, or a customer-record change. The agent proposes an action, pauses execution, and waits for one of three decisions before continuing: approve, reject, or modify. A modified proposal is a new proposal, so it still needs validation before it runs.

Human-on-the-Loop (HOTL): Supervisory control

Here the agent performs tasks autonomously while a human monitors the process and retains the ability to interrupt or override. This fits actions whose effect can be stopped or repaired after detection, such as routing internal draft suggestions into a queue. It isn't a sufficient gate for sending an incorrect customer message or issuing a refund: the human may see the mistake only after the effect happened. HITL blocks before execution, while HOTL can only interrupt execution that is already allowed.

Human-out-of-the-Loop (HOOTL): Full autonomy

At the far end of the spectrum, some actions are automated with no real-time human oversight. This applies only to bounded actions with clear authorization and acceptable failure modes. An authorized order-status read or a draft created in an internal queue may fit; an unrestricted database read or web action doesn't become low risk simply because it doesn't write data. Some actions stay HITL because their effect, policy, or applicable duties demand a pre-execution gate.

The trust spectrum

Not all agent actions carry the same risk. Designing a HITL system begins by classifying every tool and action along a trust spectrum. This principle drives the level of friction required before execution.

A naive implementation treats all actions equally: either everything is autonomous (dangerous) or everything requires approval (tedious). A production-grade system defines granular policies:

| Risk Level | Description | Examples | Interaction Model |
| --- | --- | --- | --- |
| Low | Bounded, authorized read or isolated draft | Look up permitted order fields, Check inventory, Draft reply without sending | Auto-Execute: Run within least-privilege access. |
| Medium | Internal staged change with a recoverable effect | Queue review packet, Add a permitted internal classification label | Policy Decision: Audit and notify, or require review if policy says so. |
| High | External or monetary side effect | Send customer email, Create return shipment, Process any refund | Approve: Pause and wait for an authorized reviewer. |
| Critical | Destructive, unusually high-impact, or legally constrained | Delete record subject to retention rules, Process unusually large refund, Change address after shipment | Escalate or Block: Require stronger authorization or deny. |

Risk classification system

We can implement this policy using a RiskLevel enum and a policy lookup table. This decouples the agent's logic ("I want to do X") from the governance logic ("Should I allow X?").

Figure: Risk-tier flow for human approval policies, from low-risk auto-run actions to critical escalated actions with runtime escalation and timeout notes.
Start with static risk tiers, then add runtime escalation and timeout rules. Bounded reads and drafts can stay direct, while external side effects pause before execution.

The lookup table below maps each available tool to a risk tier. It serves as the core policy engine: given a tool's name and its runtime arguments, it determines the authorization flow required before execution.

risk-classification-system.py

```python
from enum import Enum
from dataclasses import dataclass

class RiskLevel(Enum):
    AUTO = "auto"          # Execute immediately, no human needed
    NOTIFY = "notify"      # Execute and notify human asynchronously
    APPROVE = "approve"    # Pause and wait for human approval
    ESCALATE = "escalate"  # Route to senior human + require 2 approvals

@dataclass
class ToolPolicy:
    tool_name: str
    risk_level: RiskLevel
    escalate_above_amount: float | None = None
    requires_reason: bool = False
    timeout_minutes: int = 60  # Auto-reject after timeout

TOOL_POLICIES = {
    # Safe actions run automatically
    "look_up_order": ToolPolicy("look_up_order", RiskLevel.AUTO),
    "check_inventory": ToolPolicy("check_inventory", RiskLevel.AUTO),

    # Customer communication requires justification
    "send_email": ToolPolicy("send_email", RiskLevel.APPROVE, requires_reason=True),

    # Refunds create a monetary side effect, so they always begin at APPROVE.
    "process_refund": ToolPolicy(
        "process_refund",
        RiskLevel.APPROVE,
        escalate_above_amount=500.0,
        timeout_minutes=30,
    ),

    # Destructive or high-value actions require escalation
    "delete_customer": ToolPolicy("delete_customer", RiskLevel.ESCALATE),
    "change_shipped_address": ToolPolicy("change_shipped_address", RiskLevel.ESCALATE),
}

RISK_PRIORITY = {
    RiskLevel.AUTO: 0,
    RiskLevel.NOTIFY: 1,
    RiskLevel.APPROVE: 2,
    RiskLevel.ESCALATE: 3,
}

def escalate_risk(current: RiskLevel, target: RiskLevel) -> RiskLevel:
    return target if RISK_PRIORITY[target] > RISK_PRIORITY[current] else current

print("look_up_order:", TOOL_POLICIES["look_up_order"].risk_level.value)
print("refund escalation threshold:", TOOL_POLICIES["process_refund"].escalate_above_amount)
print("critical wins:", escalate_risk(RiskLevel.ESCALATE, RiskLevel.APPROVE).value)
```

Output

```
look_up_order: auto
refund escalation threshold: 500.0
critical wins: escalate
```

Risk is not static. A "send email" tool might be Low Risk when emailing an internal test address, but High Risk when emailing an external domain. Production policies are dynamic, checking arguments at runtime.
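
As a rough illustration of argument-aware policy, the sketch below starts from a static tier and only ever raises it based on runtime arguments. The `Risk` enum, the `resolve_risk` helper, the `internal.example` domain, the static tiers, and the 500.0 threshold are assumptions made for this example, not part of the policy table above.

```python
from enum import IntEnum
from typing import Any


class Risk(IntEnum):
    """Ordered tiers, so max() always picks the stricter one."""
    AUTO = 0
    NOTIFY = 1
    APPROVE = 2
    ESCALATE = 3


# Hypothetical policy inputs for this sketch.
INTERNAL_DOMAINS = {"internal.example"}
REFUND_ESCALATION_THRESHOLD = 500.0
STATIC_TIERS = {"send_email": Risk.NOTIFY, "process_refund": Risk.APPROVE}


def resolve_risk(tool: str, args: dict[str, Any]) -> Risk:
    """Start from the static tier, then raise it (never lower it) from arguments."""
    # Unknown tools fail closed at the strictest tier.
    risk = STATIC_TIERS.get(tool, Risk.ESCALATE)

    if tool == "send_email":
        # A missing or malformed recipient is treated as external.
        domain = str(args.get("to", "")).rpartition("@")[2].lower()
        if domain not in INTERNAL_DOMAINS:
            risk = max(risk, Risk.APPROVE)

    if tool == "process_refund":
        if float(args.get("amount", 0.0)) > REFUND_ESCALATION_THRESHOLD:
            risk = max(risk, Risk.ESCALATE)

    return risk


print(resolve_risk("send_email", {"to": "qa@internal.example"}).name)
print(resolve_risk("send_email", {"to": "casey@example.com"}).name)
print(resolve_risk("process_refund", {"amount": 40.0}).name)
print(resolve_risk("process_refund", {"amount": 899.0}).name)
```

Because the function only applies `max()`, a runtime check can tighten the tier but can never relax what the static policy already requires.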

Architecture: the Checkpoint/Resume pattern

The fundamental architectural challenge of HITL is the pause. When an agent pauses for approval, it can't hold the wait in process memory, whether by sleeping a thread (time.sleep()) or by blocking on an in-memory call.

Figure: Checkpoint and resume flow for a human-in-the-loop agent, showing pause, persist, review, guarded resume, and final execution.
Pause before side effects, persist state, collect a reviewer decision, then resume with a guarded write so stale clicks cannot trigger duplicate execution.

Why in-memory blocking fails

  1. Durability: If the server restarts or crashes while waiting for approval (which could take hours), the agent's state is lost.
  2. Resource Usage: Holding a thread open for a human response wastes compute resources.
  3. Scalability: You can't scale to thousands of concurrent agents if each one is blocking a thread.

The solution is the Checkpoint/Resume pattern. When an approval is needed, the agent persists its graph state (messages, structured variables, and pending task metadata) to durable storage and returns control to the caller. When the human approves, the runtime reloads that state and re-enters the waiting node.

To see what gets saved, here is what Riley's state might look like when paused:

why-in-memory-blocking-fails.json

```json
{
  "thread_id": "order_78291",
  "request_summary": "Customer reports laptop not received after a delivery scan.",
  "evidence": ["order total: $899", "carrier status: delivered"],
  "pending_action": {
    "tool": "process_refund",
    "args": {"order_id": "78291", "amount": 899.00, "reason": "not_received"}
  },
  "approval": {
    "status": "pending",
    "policy_rule": "refund.requires_review",
    "action_hash": "sha256:5e73...",
    "version": 3,
    "expires_at": "2026-08-22T19:00:00Z"
  }
}
```

When an authorized reviewer clicks Approve, the runtime resolves this pending decision with a guarded write, reloads the checkpoint, rechecks current state and arguments, and only then attempts the action. If the reviewer clicks Reject, Riley can draft a response without running the refund. Checkpoint state is ordinary inspectable data, but emergency changes should still go through an authenticated, versioned, audited path rather than an ad hoc database edit.

Before building a workflow engine, you can test the persistence boundary with ordinary data. The checkpoint below stores a redacted summary and an action hash, but not the customer's raw message.

persist-minimized-checkpoint.py

```python
from hashlib import sha256
import json

raw_message = "I'm Casey (casey@example.com). My laptop didn't arrive."
pending_action = {
    "tool": "process_refund",
    "args": {"order_id": "78291", "amount": 899.00},
}
# sort_keys makes the serialized action stable, so the hash is reproducible.
action_bytes = json.dumps(pending_action, sort_keys=True).encode()
checkpoint = {
    "request_summary": "Customer reports delivery problem for order 78291.",
    "pending_action": pending_action,
    "action_hash": sha256(action_bytes).hexdigest()[:12],
    "status": "pending",
}
stored = json.dumps(checkpoint)

print("status:", checkpoint["status"])
print("action hash present:", bool(checkpoint["action_hash"]))
print("raw message stored:", raw_message in stored)
```

Output

```
status: pending
action hash present: True
raw message stored: False
```

Here is a comparison of two popular approaches for implementing this pattern:

| Feature | LangGraph (interrupt) | Temporal |
| --- | --- | --- |
| Best for | Graph-based agents with checkpointed interrupts | Workflow orchestration with timers, retries, and cross-service activities |
| State Persistence | Durable checkpointer (Postgres, Redis, etc.) | Built-in durable event history |
| Resume Mechanism | API calls invoking Command(resume=...) | Signals or Updates |
| Operational shape | Agent runtime plus durable checkpointer | Workflow service plus activity workers and message handlers |

Using LangGraph checkpointing

LangGraph provides first-class support for this via interrupt(). [2] When you compile the graph with a checkpointer, LangGraph persists checkpoints for the thread and pauses execution until the graph is invoked again with Command(resume=...). [3] One subtle detail matters in production: when execution resumes, LangGraph reruns the node from the top, so any code before interrupt() must be idempotent. [2]
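
To see why rerun-from-the-top matters, here is a toy model of those semantics in plain Python. It is not LangGraph code: the `run_node` runtime, the `Interrupt` exception, and both node functions are invented for this sketch, and the only behavior they imitate is that a resumed node executes again from its first line.

```python
class Interrupt(Exception):
    """Raised by the toy runtime to unwind a node that is waiting for review."""


def run_node(node, resume_value=None):
    """Run a node; with no resume value, the interrupt call pauses it."""
    def interrupt(payload):
        if resume_value is None:
            raise Interrupt(payload)
        return resume_value

    try:
        return node(interrupt)
    except Interrupt:
        return "paused"


side_effects = {"unsafe": 0, "safe": 0}
seen_keys: set[str] = set()


def unsafe_node(interrupt):
    # Runs on the first pass and again on resume, so the effect happens twice.
    side_effects["unsafe"] += 1
    return interrupt({"tool": "process_refund"})


def safe_node(interrupt):
    # Guarding the effect with a stable key makes the rerun harmless.
    if "draft:78291" not in seen_keys:
        seen_keys.add("draft:78291")
        side_effects["safe"] += 1
    return interrupt({"tool": "process_refund"})


for node in (unsafe_node, safe_node):
    first = run_node(node)              # pauses at the interrupt call
    second = run_node(node, "approve")  # resume: the node starts over
    print(node.__name__, first, second)

print(side_effects)
```

The unguarded node records its side effect twice, once per pass, while the keyed version records it once. In a real graph the key set would live in durable storage rather than in process memory.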

using-langgraph-checkpointing.py

```python
from langgraph.graph import StateGraph
from langgraph.checkpoint.postgres import PostgresSaver
from langgraph.types import interrupt, Command
from langchain_core.messages import AIMessage
from typing import TypedDict

# TOOL_POLICIES and RiskLevel come from risk-classification-system.py.
# plan_node, execute_tool, the validate_* helpers, and action_idempotency_key
# are application code assumed to be defined elsewhere.

Message = dict[str, object]
ToolAction = dict[str, object]

class AgentState(TypedDict):
    messages: list[Message]
    pending_action: ToolAction | None
    approval_status: str | None

def execute_action_node(state: AgentState) -> dict:
    """Execute a tool call, pausing for approval if needed."""
    action = state["pending_action"]
    if action is None:
        return {"messages": [], "pending_action": None, "approval_status": None}

    policy = TOOL_POLICIES.get(action["tool"])
    if policy is None:
        return {
            "messages": [AIMessage(content="Action blocked: tool has no policy.")],
            "pending_action": None,
            "approval_status": "rejected",
        }
    execution_args = action["args"]

    # Check if we need to pause
    if policy.risk_level in (RiskLevel.APPROVE, RiskLevel.ESCALATE):
        # PAUSE execution here.
        # The graph state is automatically saved to Postgres by the checkpointer.
        # The runtime handles the pause; don't catch interrupt() in try/except.
        human_response = interrupt({
            "action": action,
            "risk_level": policy.risk_level.value,
            "reason": f"Agent wants to {action['tool']} with args: {action['args']}",
            "requires_reason": policy.requires_reason,
        })

        # This code runs ONLY after the human resumes execution
        if human_response["decision"] == "reject":
            return {
                "messages": [AIMessage(content="Action was rejected by human reviewer.")],
                "pending_action": None,
                "approval_status": "rejected",
            }

        # An approved edit is a new proposal. Validate it before execution.
        if human_response["decision"] == "modify":
            execution_args = validate_modified_args(
                action["tool"],
                human_response["modified_args"],
                policy=policy,
            )
        else:
            execution_args = validate_current_args(
                action["tool"],
                action["args"],
                policy=policy,
            )
    else:
        execution_args = validate_autonomous_args(
            action["tool"],
            action["args"],
            policy=policy,
        )

    # Bind execution to an idempotency key so retries can't repeat a side effect.
    result = execute_tool(
        action["tool"],
        execution_args,
        idempotency_key=action_idempotency_key(action),
    )

    return {
        "messages": [AIMessage(content=f"Action completed: {result}")],
        "pending_action": None,
        "approval_status": "executed",
    }

# Setup the graph with persistence
graph = StateGraph(AgentState)
graph.add_node("plan", plan_node)
graph.add_node("execute", execute_action_node)
# ... define edges ...

# The checkpointer provides durable HITL state.
# The first time you use a Postgres checkpointer, call setup() to create tables.
with PostgresSaver.from_conn_string("postgresql://...") as checkpointer:
    checkpointer.setup()
    app = graph.compile(checkpointer=checkpointer)
```
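
The listing above leaves its idempotency-key helper undefined. One possible way to derive such a key is to hash a canonical serialization of the action together with the thread it belongs to. The `stable_idempotency_key` name, the `idem_` prefix, and the choice to scope by thread ID are assumptions for this sketch, not the helper the listing calls.

```python
import json
from hashlib import sha256


def stable_idempotency_key(thread_id: str, action: dict) -> str:
    """Derive the same key for the same action, however its keys are ordered."""
    # Canonical JSON: sorted keys and fixed separators remove formatting noise.
    canonical = json.dumps(action, sort_keys=True, separators=(",", ":"))
    digest = sha256(f"{thread_id}:{canonical}".encode()).hexdigest()
    return f"idem_{digest[:32]}"


refund = {"tool": "process_refund", "args": {"order_id": "78291", "amount": 899.0}}
reordered = {"args": {"amount": 899.0, "order_id": "78291"}, "tool": "process_refund"}
changed = {"tool": "process_refund", "args": {"order_id": "78291", "amount": 900.0}}

key = stable_idempotency_key("order_78291", refund)
print("retry reuses key:", key == stable_idempotency_key("order_78291", reordered))
print("changed amount gets new key:", key != stable_idempotency_key("order_78291", changed))
```

A retry of the same approved action maps to the same key, so the downstream system can return the recorded outcome, while any change to the arguments produces a different key and cannot piggyback on the earlier execution.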

The approval flow

The flow separates the Agent Runtime (the Python application, often powered by a Large Language Model (LLM)) from the Approval Interface (Web/Slack). In production, they coordinate through persisted state plus a thin resume endpoint, not a long-lived in-memory call stack. The checkpoint figure above is the source of truth for the sequence: pause, persist state, create a versioned approval record, notify the reviewer, validate the decision, reload the checkpoint, execute, and audit.

REST API for approval queue

To build the "Resume API" component, we need an endpoint that looks up the suspended thread, records the reviewer's decision, and schedules the resume. The endpoint takes a thread ID plus a versioned approval decision payload, validates the thread's state and the approval record, and enqueues a resume job. A worker then invokes Command(resume=...) to wake the agent with the reviewer's instructions.

rest-api-for-approval-queue.py

```python
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from typing import Literal
from datetime import datetime, timezone

app = FastAPI()
JsonDict = dict[str, object]

# Assume langgraph_app is compiled elsewhere
# langgraph_app = graph.compile(...)

class ApprovalDecision(BaseModel):
    approval_id: str
    expected_version: int
    action_hash: str
    decision: Literal["approve", "reject", "modify"]
    reason: str | None = None
    modified_args: JsonDict | None = None

@app.post("/api/approvals/{thread_id}/decide")
async def decide_approval(thread_id: str, decision: ApprovalDecision):
    """
    Human approves, rejects, or modifies the pending action.
    This wakes up the dormant agent.
    """
    # 1. Verify the thread currently has an outstanding interrupt
    config = {"configurable": {"thread_id": thread_id}}
    state = langgraph_app.get_state(config)
    if not any(task.interrupts for task in state.tasks):
        raise HTTPException(400, "Agent isn't waiting for input")

    # 2. Resolve the approval row with compare-and-swap semantics
    # load_pending_approval()/try_resolve_approval() are persistence helpers.
    approval = load_pending_approval(thread_id, decision.approval_id)
    if approval is None or approval["status"] != "pending":
        raise HTTPException(404, "Approval request not found")
    if approval["version"] != decision.expected_version:
        raise HTTPException(409, "Approval request is stale")
    if approval["action_hash"] != decision.action_hash:
        raise HTTPException(409, "Proposed action changed; request fresh review")
    if approval["expires_at"] <= datetime.now(timezone.utc):
        raise HTTPException(409, "Approval request expired")
    require_reviewer_permission(current_reviewer(), approval["required_role"])

    updated = try_resolve_approval(
        approval_id=decision.approval_id,
        expected_version=decision.expected_version,
        expected_action_hash=decision.action_hash,
        next_status="rejected" if decision.decision == "reject" else "authorized",
    )
    if not updated:
        raise HTTPException(409, "Approval was already resolved")

    # 3. Enqueue an idempotent resume job after recording the decision.
    # A worker invokes Command(resume=...) and retries safely if execution fails.
    enqueue_resume_job(
        thread_id=thread_id,
        approval_id=decision.approval_id,
        resume_payload={
            "approval_id": decision.approval_id,
            "expected_version": decision.expected_version,
            "action_hash": decision.action_hash,
            "decision": decision.decision,
            "reason": decision.reason,
            "modified_args": decision.modified_args,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        },
    )
    return {"status": "decision_recorded"}
```

The compare-and-swap check records at most one reviewer decision. Store each approval request with fields such as status, version, action_hash, expires_at, required_role, and resolved_at, then resolve it with a guarded update such as WHERE id = ? AND status = 'pending' AND version = ? AND action_hash = ? AND expires_at > now(). Record authorized separately from executed: an approval may be accepted and then fail during execution. A durable resume worker must recheck current business state and use an idempotency key so a retry can't issue the same refund twice.

This small executable model shows those two separate transitions. It takes a versioned decision and an idempotency key as input, then shows that a stale click is rejected and an execution retry returns the previously recorded outcome.

resolve-once-execute-once.py

```python
from dataclasses import dataclass

@dataclass
class Approval:
    status: str = "pending"
    version: int = 3
    action_hash: str = "sha256:refund-78291-899"

completed_effects: dict[str, str] = {}

def record_decision(
    approval: Approval,
    *,
    expected_version: int,
    action_hash: str,
) -> str:
    if approval.status != "pending" or approval.version != expected_version:
        return "blocked: stale decision"
    if approval.action_hash != action_hash:
        return "blocked: action changed"
    approval.status = "authorized"
    approval.version += 1
    return "authorized"

def execute_once(approval: Approval, *, idempotency_key: str) -> str:
    if idempotency_key in completed_effects:
        return completed_effects[idempotency_key]
    if approval.status != "authorized":
        return "blocked: missing authorization"
    result = "refund recorded once"
    completed_effects[idempotency_key] = result
    approval.status = "executed"
    return result

pending = Approval()
print("decision:", record_decision(
    pending,
    expected_version=3,
    action_hash="sha256:refund-78291-899",
))
print("first execution:", execute_once(pending, idempotency_key="apr_123"))
print("retry:", execute_once(pending, idempotency_key="apr_123"))
print("second click:", record_decision(
    pending,
    expected_version=3,
    action_hash="sha256:refund-78291-899",
))
```

Output

```
decision: authorized
first execution: refund recorded once
retry: refund recorded once
second click: blocked: stale decision
```
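
The same guard can be pushed into the database so that two concurrent clicks can't both win. The sketch below runs the guarded update against an in-memory SQLite table. The table layout and this `try_resolve_approval` signature are assumptions for illustration, and it relies on timestamps being stored as UTC ISO-8601 strings in one fixed format so that string comparison matches time order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE approvals ("
    "id TEXT PRIMARY KEY, status TEXT, version INTEGER, "
    "action_hash TEXT, expires_at TEXT)"
)
conn.execute(
    "INSERT INTO approvals VALUES (?, ?, ?, ?, ?)",
    ("apr_123", "pending", 3, "sha256:refund-78291-899", "2026-08-22T19:00:00Z"),
)
conn.commit()


def try_resolve_approval(
    approval_id: str,
    expected_version: int,
    expected_action_hash: str,
    next_status: str,
    now: str,
) -> bool:
    """Resolve the row only if every guard still holds; report whether we won."""
    cursor = conn.execute(
        "UPDATE approvals SET status = ?, version = version + 1 "
        "WHERE id = ? AND status = 'pending' AND version = ? "
        "AND action_hash = ? AND expires_at > ?",
        (next_status, approval_id, expected_version, expected_action_hash, now),
    )
    conn.commit()
    # Exactly one updated row means this caller recorded the decision.
    return cursor.rowcount == 1


click = ("apr_123", 3, "sha256:refund-78291-899", "authorized", "2026-08-22T18:30:00Z")
print("first click:", try_resolve_approval(*click))
print("second click:", try_resolve_approval(*click))
```

The first call updates one row and returns `True`. The second call matches no row, because the status is no longer pending and the version has moved on, so it returns `False` without changing anything.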

An expiry guard is separate from version matching. This example takes the current time and expiry time as inputs, then rejects a decision whose review window has already passed.

reject-expired-approval.py

```python
from datetime import datetime, timezone

def decision_allowed(*, expires_at: str, now: datetime) -> str:
    expiry = datetime.fromisoformat(expires_at.replace("Z", "+00:00"))
    return "accepted" if now < expiry else "blocked: approval expired"

clock = datetime(2026, 8, 22, 19, 0, tzinfo=timezone.utc)
print("fresh:", decision_allowed(
    expires_at="2026-08-22T19:01:00Z",
    now=clock,
))
print("stale:", decision_allowed(
    expires_at="2026-08-22T18:59:00Z",
    now=clock,
))
```

Output

```
fresh: accepted
stale: blocked: approval expired
```

Designing the approval interface

A "Yes/No" button is rarely enough for production systems. The approval interface must provide context and control. When Riley asks "Can I refund order #78291 for $899?", the human needs to know what data will change and why.

The approval UI should show the authorized reviewer the smallest evidence packet needed for the decision: a redacted request summary, the policy trigger, source references the reviewer is allowed to inspect, and the exact arguments Riley proposes to execute. Dumping an entire customer conversation into every review card creates unnecessary privacy exposure and makes the meaningful change harder to see.

The approval payload

Riley should pause with a structured payload that the UI can render clearly. This payload takes authorized evidence and proposed tool arguments as input and structures them into a redacted typed object for the reviewer. For data mutations, this should ideally be a visual diff rather than raw JSON.

the-approval-payload.ts

```typescript
// Frontend type + sample payload
type ApprovalRequest = {
  id: string;
  agentId: string;
  requestVersion: number;
  actionHash: string;
  expiresAt: string;
  action: {
    tool: "update_user_record";
    reason: string;
    policyRule: string;
    args: {
      user_id: string;
      updates: { address: string };
    };
    riskLevel: "HIGH" | "CRITICAL";
  };
  context: {
    chatHistorySummary: string;
  };
};

const pendingApproval: ApprovalRequest = {
  id: "apr_123",
  agentId: "agent_42",
  requestVersion: 4,
  actionHash: "sha256:address-change-u_123-v4",
  expiresAt: "2026-08-22T19:00:00Z",
  action: {
    tool: "update_user_record",
    reason: "User requested address change via support chat",
    policyRule: "customer_profile.write_requires_review",
    args: {
      user_id: "u_123",
      updates: { address: "123 New St" }, // Render this as a diff in the UI
    },
    riskLevel: "HIGH",
  },
  context: {
    chatHistorySummary: "User authenticated via 2FA...",
  },
};
```
Figure: Approval review packet showing permitted evidence, exact mutation diff, and stale-click guards such as action hash, version, and expiry.
Reviewers need permitted evidence, exact mutation, and hash/version/expiry guards in one place. A loose summary can't bind approval to one effect.

By rendering this payload, the Human-in-the-Loop UI becomes a useful decision surface. The reviewer can compare Riley's proposed effect with permitted evidence before authorizing the tool call. For complex actions, consider adding a dry-run diff that shows what would happen if the action were approved. Version, expiry, and action hash fields matter too: they let the backend reject stale clicks instead of replaying an approval long after the underlying state changed.
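
A dry-run diff can be as simple as comparing the current record with the proposed updates and listing only the fields that would change. The sketch below assumes flat records; the `dry_run_diff` helper and the sample values, including the old address, are invented for illustration.

```python
def dry_run_diff(current: dict, updates: dict) -> list[dict]:
    """List the fields that would change, without applying anything."""
    changes = []
    for field, new_value in sorted(updates.items()):
        old_value = current.get(field)
        if old_value != new_value:
            changes.append({"field": field, "before": old_value, "after": new_value})
    return changes


# Hypothetical current record and proposed update for user u_123.
current_record = {"user_id": "u_123", "address": "9 Old Rd", "email_opt_in": True}
proposed_updates = {"address": "123 New St", "email_opt_in": True}

for change in dry_run_diff(current_record, proposed_updates):
    print(f"{change['field']}: {change['before']!r} -> {change['after']!r}")
```

Fields whose proposed value equals the current one are left out, so the reviewer sees only the address change rather than the full record.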

The packet builder is also a data-minimization boundary. This executable example takes a customer note and exact action as input, redacts the email address, and emits a stable hash that binds the review card to the proposed mutation.

build-redacted-review-packet.py
1from hashlib import sha256 2import json 3import re 4 5def build_packet(note: str, action: dict[str, object]) -> dict[str, object]: 6 redacted_note = re.sub(r"[\w.+-]+@[\w.-]+", "[email redacted]", note) 7 action_hash = sha256( 8 json.dumps(action, sort_keys=True).encode() 9 ).hexdigest()[:12] 10 return {"evidence": redacted_note, "action": action, "action_hash": action_hash} 11 12packet = build_packet( 13 "Contact [email protected] only after review.", 14 {"tool": "process_refund", "order_id": "78291", "amount": 899.00}, 15) 16print("evidence:", packet["evidence"]) 17print("has action hash:", bool(packet["action_hash"]))
Output
evidence: Contact [email redacted] only after review.
has action hash: True

Give the reviewer a concise rationale, the exact tool arguments, a diff, and the policy rule that triggered review. Don't make raw chain-of-thought your approval primitive. Explanations can be unfaithful, and they may reveal internal reasoning or sensitive context you didn't intend to surface. [4]

Also keep approval state compact. Persist a data-minimized audit record with access controls and retention policy, then inject only structured fields such as decision, reason, modified_args, or a short reviewer summary back into the runtime. Otherwise long-lived threads accumulate reviewer chatter that burns context window on approval metadata instead of task state.

Using Temporal for long-running workflows

For workflows that coordinate several durable activities, timers, retries, and approval messages, Temporal can be a better fit than a hand-rolled checkpoint table. LangGraph checkpoints can also wait durably; the choice isn't simply "short wait versus long wait." A Temporal Workflow Execution records an event history, runs side effects through Activities, and communicates with callers through message-passing primitives such as Signals, Queries, and Updates. [5]

Signals are a good default when the approval service can fire-and-forget. If the UI needs synchronous confirmation that the workflow accepted the decision, or you want the runtime to reject a stale approval before it lands in history, an Update is usually cleaner. [6]

Temporal also gives you durable state, timers, retries, and event history out of the box. That means you don't have to build the orchestration layer for "pause, notify, wait, resume" yourself.

This approach is useful for multi-step approval chains (for example, waiting for both a support lead and finance sign-off). The workflow can wait without keeping a worker blocked, evaluate partial approvals, and continue waiting for the remaining required decision.
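The partial-approval logic itself can stay small and deterministic. The sketch below is plain Python rather than Temporal code; `REQUIRED_ROLES` and `chain_status` are hypothetical names, and the rule that any required role's rejection ends the chain is an assumption for illustration.

```python
REQUIRED_ROLES = frozenset({"support_lead", "finance"})  # assumed sign-off roles

def chain_status(decisions: dict[str, str]) -> tuple[str, list[str]]:
    """Return (status, roles still awaited) for one pending action."""
    # Ignore decisions from roles that are not part of this chain.
    relevant = {role: d for role, d in decisions.items() if role in REQUIRED_ROLES}
    if any(d == "reject" for d in relevant.values()):
        return "rejected", []
    approved = {role for role, d in relevant.items() if d == "approve"}
    missing = sorted(REQUIRED_ROLES - approved)
    if missing:
        return "waiting", missing
    return "approved", []

print(chain_status({}))
print(chain_status({"support_lead": "approve"}))
print(chain_status({"support_lead": "approve", "finance": "approve"}))
print(chain_status({"support_lead": "approve", "finance": "reject"}))
```

Because the function depends only on its input, a workflow could call it inside a wait condition and resume once the status leaves "waiting".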

The Temporal workflow below uses an Update for the approval decision because the UI needs to learn whether the action ID and version still match the pending request. The workflow then calls a validation activity before any external side effect.

using-temporal-for-long-running-workflows.py
from temporalio import workflow

ApprovalPayload = dict[str, object]

# Assume activities are defined elsewhere
# plan_actions, notify_human, validate_for_execution, execute_action, is_risky = ...
# Each trailing `...` below stands for activity options such as start_to_close_timeout.

@workflow.defn
class AgentWorkflow:
    def __init__(self) -> None:
        self._pending_action_id: str | None = None
        self._pending_version: int | None = None
        self._human_decision: ApprovalPayload | None = None

    @workflow.update
    def decide(self, decision: ApprovalPayload) -> str:
        self._human_decision = decision
        return "accepted"

    @decide.validator
    def validate_decision(self, decision: ApprovalPayload) -> None:
        """Reject stale decisions before the Update is accepted."""
        if decision["action_id"] != self._pending_action_id:
            raise ValueError("stale action id")
        if decision["expected_version"] != self._pending_version:
            raise ValueError("stale action version")
        # A second click for the same action must not overwrite the first decision.
        if self._human_decision is not None:
            raise ValueError("decision already recorded")

    @workflow.run
    async def run(self, task: str):
        # Step 1: Plan
        plan = await workflow.execute_activity(plan_actions, task, ...)

        for action in plan.actions:
            if is_risky(action):
                self._pending_action_id = action["id"]
                self._pending_version = action["version"]
                self._human_decision = None

                # Send notification (Slack/Email)
                await workflow.execute_activity(
                    notify_human,
                    {"action_id": action["id"], "action": action},
                    ...,
                )

                # Wait durably until a validated decision Update arrives.
                await workflow.wait_condition(
                    lambda: self._human_decision is not None
                )

                decision = self._human_decision
                self._pending_action_id = None
                self._pending_version = None
                self._human_decision = None

                if decision["decision"] == "reject":
                    continue

                # The reviewer decision isn't enough: validate any modified
                # arguments and current business state in an Activity.
                action = await workflow.execute_activity(
                    validate_for_execution,
                    {"action": action, "decision": decision},
                    ...,
                )

            # Step 2: Execute with an idempotency key carried by the action.
            await workflow.execute_activity(execute_action, action, ...)

Keep policy code inside the workflow deterministic. If is_risky() depends on live balances, anomaly services, or a policy database, fetch that data in an Activity first and pass the result into workflow state. Temporal replays workflow code against event history, so non-deterministic logic inside the workflow will break replay. [5]

If a notification doesn't need a response, a Signal may still be appropriate. Approval buttons need synchronous accept/reject feedback, so an Update with validation is a stronger fit for this decision path. [6]

For approval chains that can run for months or accumulate a large event history, periodically use Continue-As-New to cap history size while carrying forward unresolved approval state. [5]

Advanced patterns

Building a resilient Human-in-the-Loop system goes beyond simple pause-and-resume mechanics. Production environments require granular controls to handle complex, high-volume, or ambiguous scenarios effectively. By implementing advanced patterns, we can reduce friction for human reviewers while maintaining strict safety boundaries.

Dynamic risk escalation

Static policies need runtime context. Sending one customer email is already an external side effect that may require approval; attempting hundreds of emails in a short interval should move into a stricter queue or be blocked outright. The figure below starts from a static policy floor and conditionally escalates the required authorization layer.

[Figure: Dynamic risk escalation flow moving from proposed tool call through static policy and runtime checks to a stricter final oversight tier.]
Static policy sets the floor. Runtime context can promote a request to stricter oversight, but should not silently relax a known high-risk action.

Dynamic policies evaluate context by taking the proposed action and operational metadata as input, applying rule-based checks, and outputting an escalated risk tier if anomalies are detected. The following function combines the static base policy with runtime conditions. It checks the transaction amount, recent failure count, and time window, elevating the required authorization level when thresholds are crossed:

dynamic-risk-escalation.py
from dataclasses import dataclass
from enum import Enum
from typing import cast

class RiskLevel(Enum):
    AUTO = "auto"
    NOTIFY = "notify"
    APPROVE = "approve"
    ESCALATE = "escalate"

@dataclass
class ToolPolicy:
    tool_name: str
    risk_level: RiskLevel

TOOL_POLICIES = {
    "look_up_order": ToolPolicy("look_up_order", RiskLevel.AUTO),
    "process_refund": ToolPolicy("process_refund", RiskLevel.APPROVE),
    "change_shipped_address": ToolPolicy("change_shipped_address", RiskLevel.ESCALATE),
}

RISK_PRIORITY = {
    RiskLevel.AUTO: 0,
    RiskLevel.NOTIFY: 1,
    RiskLevel.APPROVE: 2,
    RiskLevel.ESCALATE: 3,
}

def escalate_risk(current: RiskLevel, target: RiskLevel) -> RiskLevel:
    # Only ever move to a stricter tier; never relax the current one.
    return target if RISK_PRIORITY[target] > RISK_PRIORITY[current] else current

ToolArgs = dict[str, float | str | bool]
ToolAction = dict[str, object]
RuntimeContext = dict[str, object]

def is_business_hours(context: RuntimeContext) -> bool:
    hour = int(context.get("local_hour", 12))
    return 8 <= hour < 18

def calculate_dynamic_risk(action: ToolAction, context: RuntimeContext) -> RiskLevel:
    """Riley's risk increases with refund amount, recent anomalies, and time of day."""
    tool_name = str(action["tool"])
    args = cast(ToolArgs, action.get("args", {}))
    base_risk = TOOL_POLICIES[tool_name].risk_level

    # 1. Value-based escalation: large refunds are critical
    if float(args.get("amount", 0)) > 500:
        base_risk = escalate_risk(base_risk, RiskLevel.ESCALATE)

    # 2. Anomaly-based escalation: many recent failures suggest something is wrong
    if context.get("recent_failures", 0) > 3:
        base_risk = escalate_risk(base_risk, RiskLevel.APPROVE)

    # 3. Temporal escalation: off-hours refunds get extra scrutiny
    if tool_name == "process_refund" and not is_business_hours(context):
        base_risk = escalate_risk(base_risk, RiskLevel.APPROVE)

    return base_risk

print("refund $899:", calculate_dynamic_risk(
    {"tool": "process_refund", "args": {"amount": 899.00}},
    {"recent_failures": 0, "local_hour": 14},
).value)
print("read after failures:", calculate_dynamic_risk(
    {"tool": "look_up_order", "args": {}},
    {"recent_failures": 4, "local_hour": 14},
).value)
print("shipped address:", calculate_dynamic_risk(
    {"tool": "change_shipped_address", "args": {}},
    {"recent_failures": 0, "local_hour": 10},
).value)
Output
refund $899: escalate
read after failures: approve
shipped address: escalate

Approval with modification

A useful HITL pattern lets the human modify a proposed action rather than merely reject it. Instead of a binary yes/no choice, the human can correct arguments, such as changing a full refund proposal to a partial one or correcting an order ID. That edit is a new proposal, not a guarantee of safety.

Build the approval UI as a form, not a button. Populate the form with the agent's proposed arguments (e.g., email body, SQL query) and allow the human to edit them before hitting "Approve".

For example, if Riley proposes a full refund: process_refund(order_id="78291", amount=899.00), a human reviewer can modify it to process_refund(order_id="78291", amount=449.50, partial=True) to offer a partial refund instead. Before execution, the host must validate schema, reviewer permission, applicable policy, action version, and current order state. The edit doesn't train Riley and doesn't bypass a stronger approval tier.

Batch approvals

If Riley needs to propose 50 return-label creations at the end of a holiday weekend, asking for separate decisions for each one creates reviewer fatigue. Reviewers who face repetitive requests may stop inspecting individual effects. A batch review can group related proposals without hiding the identity, destination, cost, or policy status of each item.

Instead of creating one alert per proposed operation, the orchestrator collects pending actions over a bounded window or groups them by a shared task identifier. The UI can then present: "Riley proposes 50 return labels (review all items)." The approval record must bind to the exact item list or item hashes, so a label inserted after approval can't ride along in the batch.

Implementing batch approvals requires per-item state and idempotency. If a batch contains 50 actions and the reviewer authorizes the fixed set, the orchestrator must track each action separately. If action 42 fails, already-completed labels aren't automatically undone unless the operation provides a compensation path. Retry only the failed authorized item with its idempotency key, and re-request review if its inputs or destination changed.
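A minimal sketch of that per-item tracking follows. The `run_batch` helper, the key format, and the simulated carrier failure are all hypothetical, and the in-memory `status` dict stands in for durable per-item rows. In a real system the idempotency key would also be passed to the downstream API so that the service itself deduplicates retries.

```python
from typing import Callable

def run_batch(
    items: list[dict[str, str]],
    create_label: Callable[[dict[str, str]], None],
    status: dict[str, str],
) -> dict[str, str]:
    """Execute each authorized item at most once, tracked by idempotency key."""
    for item in items:
        key = item["idempotency_key"]
        if status.get(key) == "done":
            continue  # already executed; never repeat a completed effect
        try:
            create_label(item)
            status[key] = "done"
        except RuntimeError:
            status[key] = "failed"  # completed items are left as they are
    return status

calls: list[str] = []
flaky = {"label_2"}  # hypothetical item that fails on its first attempt

def create_label(item: dict[str, str]) -> None:
    calls.append(item["id"])
    if item["id"] in flaky:
        flaky.discard(item["id"])
        raise RuntimeError("carrier timeout")

items = [
    {"id": "label_1", "idempotency_key": "batch_7:label_1"},
    {"id": "label_2", "idempotency_key": "batch_7:label_2"},
    {"id": "label_3", "idempotency_key": "batch_7:label_3"},
]
status: dict[str, str] = {}
run_batch(items, create_label, status)  # first pass: label_2 fails
first_pass = dict(status)
run_batch(items, create_label, status)  # retry: only label_2 runs again

print("first pass:", first_pass)
print("after retry:", status)
print("calls:", calls)
```

The retry touches only the failed item, so the two labels that already succeeded are not created a second time.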

Bind a batch approval to the exact ordered items shown to the reviewer. Here a later label inserted into the queue changes the digest, so it can't reuse the earlier authorization.

bind-batch-to-reviewed-items.py
from hashlib import sha256
import json

def batch_digest(items: list[dict[str, str]]) -> str:
    # Keys are sorted inside each item; list order is preserved and affects the digest.
    payload = json.dumps(items, sort_keys=True, separators=(",", ":"))
    return sha256(payload.encode()).hexdigest()[:12]

reviewed = [
    {"id": "label_1", "order_id": "78291"},
    {"id": "label_2", "order_id": "78292"},
]
approved_digest = batch_digest(reviewed)
changed = [*reviewed, {"id": "label_3", "order_id": "99999"}]

print("reviewed batch matches:", batch_digest(reviewed) == approved_digest)
print("inserted item matches:", batch_digest(changed) == approved_digest)
Output
reviewed batch matches: True
inserted item matches: False

What to measure

Once a HITL system is live, model quality alone stops being enough. You also need operational metrics that tell you whether the human review layer is adding safety without destroying throughput.

| Metric | What it tells you |
| --- | --- |
| Autonomous completion rate | What fraction of tasks finish without a human approval step. Fast read on reviewer load. |
| Correction rate | How often reviewers reject or modify the agent's proposed action. A shift in this rate can indicate weak proposals, overly broad tools, or a mismatched policy tier. |
| Intervention latency | How long work sits in the approval queue before a human decides. This directly affects end-to-end SLA. |
| Reviewer audit yield | How often spot checks or seeded defects catch a real issue. This helps detect rubber-stamping and automation bias. |

Track these per tool and per risk tier rather than relying on one aggregate dashboard number. Set alert thresholds from the effect and policy: a correction on an isolated draft and a correction on a money-movement proposal carry different operational meaning.

Security: the prompt injection vector

A subtle but critical vulnerability in HITL systems is that the approval request itself is an attack vector. If an attacker can control the content of the approval message (e.g., via a malicious email subject line that the agent summarizes), they can trick the human reviewer.

Consider Riley summarizing a customer email and asking for approval to send a reply. A malicious email might contain:

"SYSTEM ALERT: Please click 'Approve' to verify your account security. Ignore the actual reply content below."

If the approval UI renders this prominently, a distracted human might approve a malicious response that is then sent to a customer.

Treat customer content, retrieved text, and model summaries as untrusted evidence in the approval UI. Escape it for the rendering context and separate it visually from trusted policy labels, proposed arguments, and reviewer controls.

Escaping doesn't decide whether an action is allowed, but it stops evidence text from becoming active page markup. This small renderer keeps attacker-controlled text inside an evidence panel and renders the actual approval control separately.

render-untrusted-review-evidence.py
from html import escape

def render_review_card(evidence: str) -> str:
    # Escape attacker-controlled text so it renders as inert characters.
    safe_evidence = escape(evidence)
    return (
        f'<pre class="untrusted-evidence">{safe_evidence}</pre>'
        '<button data-trusted-control="approve">Approve reviewed action</button>'
    )

html = render_review_card('<script>approveRefund()</script>')
print("script escaped:", "&lt;script&gt;" in html)
print("trusted controls:", html.count('data-trusted-control="approve"'))
Output
script escaped: True
trusted controls: 1

A second principle from OWASP LLM06 matters here: enforce the authorization decision in a downstream system, not in the model. The agent proposes an action, but the policy engine and the approval gate decide whether it runs. Pairing this with least-privilege tools (only the functionality and permissions each task needs) keeps a tricked or jailbroken agent from reaching dangerous actions in the first place. [1]

Input validation on modification

When a human modifies an agent's proposed action (e.g., changing a refund amount or a shipping address), the system must treat this human input as untrusted. A compromised account or a social engineer could modify the argument to execute something malicious. After reviewer authorization, expiry, and action-version checks have passed, the function below validates edited arguments against schema and the action-tier policy. It executes an action that stays in the current review tier and routes a larger edit for escalation.

input-validation-on-modification.py
from typing import cast
from pydantic import BaseModel, Field, ValidationError

class RefundArgs(BaseModel):
    order_id: str
    amount: float = Field(ge=0)
    partial: bool = False

class ToolSpec:
    def __init__(self, args_schema):
        self.args_schema = args_schema

ToolRegistry = {
    "process_refund": ToolSpec(RefundArgs),
}

class SafetyGuardrails:
    def required_tier(self, tool_name: str, validated_args: BaseModel) -> str:
        if tool_name == "process_refund":
            return "approve" if getattr(validated_args, "amount", 0) <= 500 else "escalate"
        return "reject"

safety_guardrails = SafetyGuardrails()

JsonDict = dict[str, object]
ActionState = dict[str, object]
Modification = dict[str, JsonDict]

def reject_action(state: ActionState, reason: str) -> JsonDict:
    return {"status": "rejected", "reason": reason, "state": state}

def execute_tool(tool_name: str, validated_args: BaseModel) -> JsonDict:
    return {
        "status": "executed",
        "tool": tool_name,
        "args": validated_args.model_dump(),
    }

def resume_with_modification(
    state: ActionState,
    modification: Modification,
) -> JsonDict:
    """
    Resume agent after human modified the tool arguments.
    CRITICAL: Re-validate the new arguments against safety policies.
    """
    new_args = modification["new_args"]
    pending_action = cast(JsonDict, state["pending_action"])
    tool_name = cast(str, pending_action["tool"])

    # 1. Syntax Validation (Pydantic)
    try:
        validated_args = ToolRegistry[tool_name].args_schema(**new_args)
    except ValidationError as e:
        return reject_action(state, reason=f"Invalid modification: {e}")

    # 2. The same policy is applied to the edited arguments.
    required_tier = safety_guardrails.required_tier(tool_name, validated_args)
    if required_tier == "escalate":
        return {"status": "needs_escalation", "args": validated_args.model_dump()}
    if required_tier == "reject":
        return reject_action(state, reason="Modification violated safety policy")

    # 3. Resume execution with safe arguments
    return execute_tool(tool_name, validated_args)

state = {"pending_action": {"tool": "process_refund"}}

approved = resume_with_modification(
    state,
    {"new_args": {"order_id": "78291", "amount": 449.50, "partial": True}},
)

escalated = resume_with_modification(
    state,
    {"new_args": {"order_id": "78291", "amount": 899.00}},
)

print("approved:", approved["status"], approved["args"]["amount"])
print("larger edit:", escalated["status"])
Output
approved: executed 449.5
larger edit: needs_escalation

Scaling oversight with AI triage

As agent volume grows, human review can become the primary operational bottleneck. A rules engine or reviewer model can help prioritize the queue and reject proposals that clearly violate policy, provided it never turns an approval-required effect into autonomous execution.

This introduces an "AI-in-the-loop" filter that can automatically reject obvious policy violations and fast-path actions already classified as autonomous by explicit policy. The human reviewer is still required for actions at or above the approval floor.

A practical design combines three layers: hard policy rules, anomaly features, and an evaluator that emits a queue score plus rationale. The evaluator shouldn't silently override an approval-marked or critical action. Its job is triage, not final authority. Reviewer decisions may later support evaluation or training, but only after access control, purpose limitation, redaction, label-quality review, and leakage-safe dataset splitting.

Treat the policy tier as a floor. This router takes an explicit policy and a model suggestion as inputs; it lets a draft stay autonomous, but refuses to downgrade a refund proposal or destructive action.

preserve-policy-floor-during-triage.py
RANK = {"auto": 0, "approve": 1, "escalate": 2}
POLICY_FLOOR = {
    "draft_reply": "auto",
    "process_refund": "approve",
    "delete_customer": "escalate",
}

def routed_tier(tool: str, evaluator_suggestion: str) -> str:
    floor = POLICY_FLOOR[tool]
    # The evaluator may raise the tier but never drop below the policy floor.
    if RANK[evaluator_suggestion] < RANK[floor]:
        return floor
    return evaluator_suggestion

print("draft:", routed_tier("draft_reply", "auto"))
print("refund suggested auto:", routed_tier("process_refund", "auto"))
print("delete suggested approve:", routed_tier("delete_customer", "approve"))
Output
draft: auto
refund suggested auto: approve
delete suggested approve: escalate

That human approval record is also part of your governance story. NIST AI RMF frames governance, measurement, and operational controls as lifecycle responsibilities across design, deployment, and management. [7] If a deployment is classified as a high-risk AI system under the EU AI Act, Article 14 requires effective human oversight proportionate to its risk, including abilities related to understanding limitations, automation bias, interpretation of output, and intervention or stopping the system. [8] For any consequential workflow, retain authorized, data-minimized evidence of who decided, which version and effect were reviewed, what executed, and why the outcome was recorded.

Practice: design Riley's weekend policy

Suppose Riley runs unattended on Saturday night when the warehouse is closed. Design a dynamic risk policy with these rules:

  • Automatically perform authorized inventory lookups and package-status reads
  • Require approval for every refund proposal
  • Escalate any refund over $50 to a senior reviewer
  • Escalate for address changes on already-shipped orders
  • Auto-reject any request after 2:00 AM if the same customer already requested two refunds in the past hour

Write the ToolPolicy table and the calculate_dynamic_risk function. Then identify the edge case: what happens if a customer makes three $49 refund requests in one hour to bypass the senior-review threshold?

This is a split-transaction attack. Dynamic policies must look at aggregate customer behavior, rather than single transaction amounts. A useful fix is a rolling-sum check: if a customer requests more than $100 in total refunds within 24 hours, escalate regardless of individual transaction size.
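A minimal sketch of the rolling-sum check, using the $50 single-refund threshold and the $100 per 24 hours aggregate limit from this exercise. The `refund_tier` function and the `(timestamp, amount)` history shape are hypothetical; a production version would read the customer's refund history from a durable store.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=24)
SINGLE_LIMIT = 50.00    # senior review above this single amount
ROLLING_LIMIT = 100.00  # senior review above this total within the window

def refund_tier(history: list[tuple[datetime, float]], amount: float, now: datetime) -> str:
    """Escalate on a large single refund or a large rolling total per customer."""
    recent_total = sum(amt for ts, amt in history if now - ts <= WINDOW)
    if amount > SINGLE_LIMIT or recent_total + amount > ROLLING_LIMIT:
        return "escalate"
    return "approve"

now = datetime(2026, 1, 3, 23, 0)
history: list[tuple[datetime, float]] = []
tiers = []
# Three $49 requests, 20 minutes apart, each under the single-refund threshold.
for minutes in (0, 20, 40):
    when = now + timedelta(minutes=minutes)
    tiers.append(refund_tier(history, 49.00, when))
    history.append((when, 49.00))

print(tiers)
```

The first two requests stay in the normal approval tier, while the third pushes the rolling total to $147 and is routed to a senior reviewer.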

Key takeaways

  • Risk Classification: Categorize every tool into Auto, Notify, Approve, or Escalate.

  • Durability: Use the Checkpoint/Resume pattern (LangGraph, Temporal) so approvals can be asynchronous and survive restarts.

  • Modification: Allow reviewers to edit proposals, then validate the edited action again before execution.

  • Dynamic Policy: Context matters. Escalate risk based on velocity, time of day, and dollar amounts.

  • Batching: Group repetitive actions to prevent alert fatigue.

The point of HITL isn't to keep humans clicking "Approve" forever. It's to put human judgment where failure is expensive or irreversible, while keeping policy-required gates in place even as lower-risk automation improves.

You now understand how to classify agent actions by risk, pause execution with durable checkpoints, resume safely with compare-and-swap semantics, and design approval UIs that give humans real context. You also know the common traps: in-memory blocking, stale approvals, alert fatigue, and prompt injection through the approval message itself.

The next step is applying those approval patterns to AI-assisted software work. Coding agents can create useful patches, but they need the same risk tiers, review gates, data-minimized audit evidence, and human ownership before their changes reach a repository or deployment pipeline.

Mastery check

Key concepts

  • HITL, HOTL, and HOOTL oversight modes
  • Risk tiers: Auto, Notify, Approve, and Escalate
  • Durable checkpoint and resume instead of in-memory waiting
  • Approval packets, compare-and-swap guards, and stale-click protection
  • Dynamic escalation, safe modification, and reviewer throughput metrics

Evaluation rubric

  • Picks the right oversight mode for each tool action instead of blocking everything
  • Explains why durable execution and guarded resume writes are required
  • Designs reviewer UX with context, diffs, expiry, and validation
  • Balances safety and throughput with escalation logic, batching, and metrics

Common pitfalls

Symptom: The review queue grows and reviewers start blindly approving requests. Cause: Everything is routed through the same approval path, including safe or repetitive work. Fix: Add risk tiers, bounded batches, and triage while keeping pre-execution review for effects that policy requires humans to authorize.

Symptom: Approved actions disappear after a deploy or restart. Cause: Approval state was stored in memory or tied to a live request thread. Fix: Persist checkpoints and approval rows durably, then resume from stored state instead of sleeping a worker.

Symptom: Two reviewers approve the same request and the agent runs the action twice. Cause: Approval resolution did not use version checks or compare-and-swap semantics. Fix: Guard writes on status and version, then reject stale clicks on resume.

Symptom: Reviewer edits create unsafe tool arguments even though the model's proposal looked safe. Cause: Human modifications were trusted without schema, authorization, or business-rule validation. Fix: Re-validate modified arguments exactly like model-generated arguments before execution.

Symptom: Approval UI becomes a prompt-injection surface. Cause: Attacker-controlled text or raw model summaries are rendered like trusted system instructions. Fix: Separate untrusted content visually, show structured diffs, and keep authorization logic downstream from the model.
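The guarded write for the double-approval pitfall can be sketched with SQLite from the standard library. The table layout, column names, and the `resolve` helper are assumptions for illustration; the important part is that the UPDATE matches only a row that is still pending at the version the reviewer saw.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE approvals ("
    "id TEXT PRIMARY KEY, status TEXT, version INTEGER, decided_by TEXT)"
)
conn.execute("INSERT INTO approvals VALUES ('apr_123', 'pending', 4, NULL)")
conn.commit()

def resolve(approval_id: str, expected_version: int, reviewer: str) -> bool:
    """Return True only for the single click that wins the guarded write."""
    cursor = conn.execute(
        "UPDATE approvals "
        "SET status = 'approved', version = version + 1, decided_by = ? "
        "WHERE id = ? AND status = 'pending' AND version = ?",
        (reviewer, approval_id, expected_version),
    )
    conn.commit()
    # rowcount is 1 when the guard matched and 0 when the row already changed.
    return cursor.rowcount == 1

first = resolve("apr_123", 4, "reviewer_a")
second = resolve("apr_123", 4, "reviewer_b")
decided_by = conn.execute(
    "SELECT decided_by FROM approvals WHERE id = 'apr_123'"
).fetchone()[0]

print("first click wins:", first)
print("second click wins:", second)
print("decided by:", decided_by)
```

Only the caller that gets `True` should resume the agent, so the second reviewer's click is rejected instead of triggering a second execution.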

Next Step
Continue to AI Coding Workflow with Agents

Human-in-the-loop gives you the risk classification, durable checkpoints, and approval gates. The next article shows how to apply those same patterns to AI-assisted software development: scoping coding tasks, running agents inside branches with tests, and requiring human review on risky changes before they reach a repository.

References

[1] OWASP Top 10 for Large Language Model Applications. OWASP Foundation · 2025

[2] LangGraph Interrupts. LangChain · 2024

[3] LangGraph Persistence. LangChain · 2026

[4] Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting. Miles Turpin, Julian Michael, Ethan Perez, Samuel R. Bowman · 2023

[5] Temporal Workflow Execution Overview. Temporal Technologies · 2026

[6] Temporal Python SDK: Workflow message passing. Temporal Technologies · 2024

[7] Artificial Intelligence Risk Management Framework (AI RMF 1.0). National Institute of Standards and Technology · 2023

[8] EU AI Act: Regulation laying down harmonised rules on artificial intelligence. European Parliament and Council of the European Union · 2024