Human-in-the-Loop Agent Architecture

Build approval gates, durable checkpoints, and guarded resumes for agent actions that change external state.

Computer-use agents can click buttons, fill forms, and operate real software. That makes the next production question unavoidable: what happens when the agent is about to spend money, change user data, send an external message, or deploy code?

Human-in-the-Loop (HITL) architecture adds a hard execution boundary for those moments. An agent can propose a side effect, but a downstream policy service pauses it until the required reviewer authorizes the exact action.

This addresses the opaque execution problem: even capable models can produce confident but incorrect outputs, and in production AI systems a single bad model promotion or rollback can degrade service for real users. HITL doesn't make an action correct by itself. It creates a point where policy, evidence, authorization, and final-state checks can stop a bad action.

HITL is also a practical control for one of the top agent risks. The OWASP Top 10 for LLM Applications lists LLM06: Excessive Agency, which it breaks into excessive functionality, excessive permissions, and excessive autonomy. Requiring human approval for high-impact actions is one recommended mitigation for the excessive-autonomy facet, ideally enforced in a downstream system rather than left to the model to decide [1] (OWASP Top 10 for Large Language Model Applications, https://genai.owasp.org/llm-top-10/). Treat the pattern below as one implementation of that control.

Like a release bot that can read evaluation results but must route a production traffic change to an authorized reviewer, a HITL agent runs only within a defined permission boundary. Unlike a chatbot that waits for input at every turn, this pattern supervises actions according to their possible effect.

Follow Vega, a model-release assistant for an internal AI platform. Vega can read authorized evaluation runs, inspect deployment metrics, and draft release notes. Vega can also propose a model promotion, traffic shift, rollback, or incident update, but a downstream gate controls whether any of those side effects run. The traffic thresholds below are illustrative policy choices, not universal rules.

In this lesson you will design Vega's risk policy, implement checkpoint/resume logic that lets Vega wait durably for review, and build an approval UI that gives reviewers enough evidence without exposing unnecessary deployment data. The lesson assumes you already know large language model (LLM) tool calling, state graphs, guardrail policy, and browser-agent risk. If those terms are fuzzy, review Function Calling & Tool Use, Agentic Architectures, Guardrails & Safety Filters, and Computer Use & Browser Agents first.

The three pillars of human-agent interaction

Not all human oversight is created equal. Production systems distinguish between three distinct interaction models based on risk tolerance and the nature of the task. Those distinctions determine where approval, supervision, and audit controls belong.

Human-in-the-Loop (HITL): Active gatekeeping

In this model, the agent can't proceed with a gated action without explicit human approval. Use it for actions whose impact isn't acceptable to discover after execution, such as money movement, production deployment, or user-record change. The agent proposes an action, pauses execution, and waits for an approve, reject, or modified proposal decision before continuing. Modified proposals still need validation.

Human-on-the-Loop (HOTL): Supervisory control

Here the agent performs tasks autonomously while a human monitors the process and retains the ability to interrupt or override. This fits actions whose effect can be stopped or repaired after detection, such as routing internal draft suggestions into a queue. It isn't a sufficient gate for shifting production traffic or publishing an incident update: the human may see the mistake only after the effect happened. HITL blocks before execution, while HOTL can only interrupt execution that's already allowed.

Human-out-of-the-Loop (HOOTL): Full autonomy

At the far end of the spectrum, some actions are automated with no real-time human oversight. This applies only to bounded actions with clear authorization and acceptable failure modes. An authorized evaluation read or a draft created in an internal queue may fit; an unrestricted database read or web action doesn't become low risk merely because it doesn't write data. Some actions stay HITL because their effect, policy, or applicable duties demand a pre-execution gate.

The trust spectrum

Not all agent actions carry the same risk. Designing a HITL system begins by classifying every tool and action along a trust spectrum. This principle drives the level of friction required before execution.

A naive implementation treats all actions equally: either everything is autonomous (dangerous) or everything requires approval (tedious). A usable HITL system defines granular policies:

Risk Level	Description	Examples	Interaction Model
Low	Bounded, authorized read or isolated draft	Read permitted eval metrics, Inspect canary health, Draft release note without publishing	Auto-Execute: Run within least-privilege access.
Medium	Internal staged change with a recoverable effect	Queue review packet, Add a permitted internal risk label	Policy Decision: Audit and notify, or require review if policy says so.
High	External or production side effect	Publish incident update, Promote model to canary, Shift endpoint traffic	Approve: Pause and wait for an authorized reviewer.
Critical	Destructive, unusually high-impact, or legally constrained	Delete retained eval evidence, Promote despite failing gate, Disable required safety filter	Escalate or Block: Require stronger authorization or deny.

Risk classification system

Start with a RiskLevel enum and a policy lookup table. That decouples the agent's logic ("I want to do X") from the governance logic ("Should I allow X?").

Human-approval policy board that routes actions from auto-run reads through audit, approval, and escalated review as side effects become stronger or more irreversible. — Start with static tiers. Reads and drafts can run directly, internal changes stay auditable, writes pause for approval, and regulated or irreversible actions escalate.

To implement this, map each available tool to a risk tier in a policy lookup table. The table is the static core of the policy engine: given a tool's name, it returns the default authorization flow, and per-tool fields such as a traffic threshold let runtime arguments raise the tier before execution.

risk-classification-system.py

from enum import Enum
from dataclasses import dataclass

class RiskLevel(Enum):
    AUTO = "auto"           # Execute immediately, no human needed
    NOTIFY = "notify"       # Execute and notify human asynchronously
    APPROVE = "approve"     # Pause and wait for human approval
    ESCALATE = "escalate"   # Route to senior reviewer or stronger authorization

@dataclass
class ToolPolicy:
    tool_name: str
    risk_level: RiskLevel
    escalate_above_traffic_percent: float | None = None
    requires_reason: bool = False
    timeout_minutes: int = 60  # Auto-reject after timeout

TOOL_POLICIES = {
    # Safe actions run automatically
    "read_eval_run": ToolPolicy("read_eval_run", RiskLevel.AUTO),
    "inspect_service_metrics": ToolPolicy("inspect_service_metrics", RiskLevel.AUTO),

    # External communication requires justification
    "publish_incident_update": ToolPolicy(
        "publish_incident_update",
        RiskLevel.APPROVE,
        requires_reason=True,
    ),

    # Model promotions affect production users, so they always begin at APPROVE.
    "promote_model": ToolPolicy(
        "promote_model",
        RiskLevel.APPROVE,
        escalate_above_traffic_percent=25.0,
        timeout_minutes=30,
    ),

    # Destructive or high-impact actions require escalation
    "delete_eval_evidence": ToolPolicy("delete_eval_evidence", RiskLevel.ESCALATE),
    "disable_safety_filter": ToolPolicy("disable_safety_filter", RiskLevel.ESCALATE),
}

RISK_PRIORITY = {
    RiskLevel.AUTO: 0,
    RiskLevel.NOTIFY: 1,
    RiskLevel.APPROVE: 2,
    RiskLevel.ESCALATE: 3,
}

def escalate_risk(current: RiskLevel, target: RiskLevel) -> RiskLevel:
    return target if RISK_PRIORITY[target] > RISK_PRIORITY[current] else current

print("read_eval_run:", TOOL_POLICIES["read_eval_run"].risk_level.value)
print(
    "promotion traffic threshold:",
    TOOL_POLICIES["promote_model"].escalate_above_traffic_percent,
)
print("critical wins:", escalate_risk(RiskLevel.ESCALATE, RiskLevel.APPROVE).value)

Output

read_eval_run: auto
promotion traffic threshold: 25.0
critical wins: escalate

Risk isn't static. A "send email" tool might be Low Risk when emailing an internal test address, but High Risk when emailing an external domain. Production policies are dynamic, checking arguments at runtime.
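A minimal sketch of argument-aware classification follows. The function names, the `to` argument key, and the `corp.example` allowlist are assumptions made for illustration, and the 25% threshold mirrors the illustrative policy above. Both classifiers fail closed: a missing or malformed argument gets the stricter tier.

```python
from enum import Enum


class RiskLevel(Enum):
    AUTO = 0
    NOTIFY = 1
    APPROVE = 2
    ESCALATE = 3


# Hypothetical allowlist of domains treated as internal for this sketch.
INTERNAL_DOMAINS = {"corp.example"}


def classify_send_email(args: dict[str, object]) -> RiskLevel:
    """Return the risk tier for one send_email call based on its recipient."""
    recipient = str(args.get("to", ""))
    local, sep, domain = recipient.rpartition("@")
    if sep and local and domain.lower() in INTERNAL_DOMAINS:
        return RiskLevel.AUTO  # Internal test address: the Low tier in the text.
    # Unknown, malformed, or external recipients fail closed to review.
    return RiskLevel.APPROVE


def classify_promote_model(args: dict[str, object], threshold: float = 25.0) -> RiskLevel:
    """Promotions start at APPROVE and escalate above the traffic threshold."""
    traffic = args.get("traffic_percent")
    if isinstance(traffic, bool) or not isinstance(traffic, (int, float)):
        return RiskLevel.ESCALATE  # Missing or malformed argument: fail closed.
    return RiskLevel.ESCALATE if traffic > threshold else RiskLevel.APPROVE


print("internal email:", classify_send_email({"to": "qa@corp.example"}).name)
print("external email:", classify_send_email({"to": "customer@other.example"}).name)
print("25% promotion:", classify_promote_model({"traffic_percent": 25}).name)
print("50% promotion:", classify_promote_model({"traffic_percent": 50}).name)
```

The same tool name can therefore land in different tiers on different calls, which is why the classifier has to run on the actual arguments immediately before execution rather than once at registration time.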

Architecture: the Checkpoint/Resume pattern

The fundamental architectural challenge of HITL is the pause. When an agent pauses for approval, it can't sleep the thread (time.sleep()) or block in memory; it has to persist its state and release the process. When the decision arrives later, a guarded compare-and-swap (CAS) update ensures that only one current approval resolves the pending row.

Human-in-the-loop checkpoint and resume trace showing proposal, approval pause, durable checkpoint, reviewer decision, guarded resume, and final execution. Pause before side effects, persist durable state, then resume only through a guarded decision path so stale clicks can't replay execution.

Why in-memory blocking fails

Durability: If the server restarts or crashes while waiting for approval (which could take hours), the agent's state is lost.
Resource Usage: Holding a thread open for a human response wastes compute resources.
Scalability: You can't scale to thousands of concurrent agents if each one is blocking a thread.

Use the Checkpoint/Resume pattern. When an approval is needed, the agent persists its graph state (messages, structured variables, and pending task metadata) to durable storage and returns control to the caller. After approval, the runtime reloads that state and re-enters the waiting node.

Vega's paused state might look like this:

why-in-memory-blocking-fails.json

{
  "thread_id": "release_reranker_v17",
  "request_summary": "Promote recommendation-reranker-v17 from shadow to 25% traffic.",
  "evidence": ["offline eval pass: 0.842 nDCG@10", "canary error rate: 0.21%"],
  "pending_action": {
    "tool": "promote_model",
    "args": {"model_id": "recommendation-reranker-v17", "traffic_percent": 25}
  },
  "approval": {
    "status": "pending",
    "policy_rule": "model_release.requires_review",
    "action_hash": "sha256:53031c1110b357d59740606f7d211b5e876014720b38f5e3ad34582a37d827eb",
    "version": 3,
    "expires_at": "2026-08-22T19:00:00Z"
  }
}

When an authorized reviewer clicks Approve, the runtime resolves this pending decision with a guarded write, reloads the checkpoint, rechecks current state and arguments, and only then attempts the action. If the reviewer clicks Reject, Vega can draft a release note without promoting the model. Checkpoint state is ordinary inspectable data, but emergency changes should still go through an authenticated, versioned, audited path rather than an ad hoc database edit.

Before building a workflow engine, you can test the persistence boundary with ordinary data. The checkpoint below stores a redacted summary and an action hash, but not the raw request message.

persist-minimized-checkpoint.py

from hashlib import sha256
import json

raw_message = "Promote reranker-v17. Ask release-owner@example.com to verify rollout notes."
pending_action = {
    "tool": "promote_model",
    "args": {"model_id": "recommendation-reranker-v17", "traffic_percent": 25},
}
action_bytes = json.dumps(pending_action, sort_keys=True).encode()
checkpoint = {
    "request_summary": "Release assistant proposes 25% traffic for reranker-v17.",
    "pending_action": pending_action,
    "action_hash": f"sha256:{sha256(action_bytes).hexdigest()}",
    "status": "pending",
}
stored = json.dumps(checkpoint)

print("status:", checkpoint["status"])
print("action hash present:", bool(checkpoint["action_hash"]))
print("stored digest length:", len(checkpoint["action_hash"].removeprefix("sha256:")))
print("action hash for display:", checkpoint["action_hash"][:19] + "...")
print("raw message stored:", raw_message in stored)

Output

status: pending
action hash present: True
stored digest length: 64
action hash for display: sha256:53031c1110b3...
raw message stored: False
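The stored hash is also what makes the later recheck possible. The sketch below shows one way a resume path might recompute the digest from the reloaded checkpoint and compare it with the digest the reviewer approved. The helper names `action_digest` and `safe_to_resume` are hypothetical, and the hashing scheme simply mirrors the example above.

```python
from hashlib import sha256
import hmac
import json


def action_digest(action: dict[str, object]) -> str:
    """Hash a key-sorted JSON form so dict ordering cannot change the digest."""
    action_bytes = json.dumps(action, sort_keys=True).encode()
    return f"sha256:{sha256(action_bytes).hexdigest()}"


def safe_to_resume(checkpoint: dict[str, dict[str, object]], approved_hash: str) -> bool:
    """Allow execution only if the reloaded action still matches the approved digest."""
    current = action_digest(checkpoint["pending_action"])
    return hmac.compare_digest(current, approved_hash)


approved_action = {
    "tool": "promote_model",
    "args": {"model_id": "recommendation-reranker-v17", "traffic_percent": 25},
}
approved_hash = action_digest(approved_action)

# Same action with keys in a different order: the digest is unchanged.
reloaded_same = {
    "pending_action": {
        "args": {"traffic_percent": 25, "model_id": "recommendation-reranker-v17"},
        "tool": "promote_model",
    }
}
# Arguments changed after review: the digest no longer matches.
reloaded_changed = {
    "pending_action": {
        "tool": "promote_model",
        "args": {"model_id": "recommendation-reranker-v17", "traffic_percent": 50},
    }
}

print("same action:", safe_to_resume(reloaded_same, approved_hash))
print("changed action:", safe_to_resume(reloaded_changed, approved_hash))
```

A hash match only proves the proposal is the one that was reviewed. Rechecking current production state is a separate step.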

Compare two popular approaches for implementing this pattern:

Feature	LangGraph (`interrupt`)	Temporal
Best for	Graph-based agents with checkpointed interrupts	Workflow orchestration with timers, retries, and cross-service activities
State Persistence	Durable checkpointer (Postgres, Redis, etc.)	Built-in durable event history
Resume Mechanism	API calls invoking `Command(resume=...)`	Signals or Updates
Operational shape	Agent runtime plus durable checkpointer	Workflow service plus activity workers and message handlers

Using LangGraph checkpointing

LangGraph provides first-class support for this via interrupt() [2] (LangGraph Interrupts, https://docs.langchain.com/oss/python/langgraph/interrupts). When you compile the graph with a checkpointer, LangGraph persists checkpoints for the thread and pauses execution until the graph is invoked again with Command(resume=...) [3] (LangGraph Persistence, https://docs.langchain.com/oss/python/langgraph/persistence). One subtle detail matters in production: when execution resumes, LangGraph reruns the node from the top, so any code before interrupt() must be idempotent [2].

using-langgraph-checkpointing.py

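from typing import Annotated, TypedDict

from langchain_core.messages import AIMessage, AnyMessage
from langgraph.checkpoint.postgres import PostgresSaver
from langgraph.graph import StateGraph
from langgraph.graph.message import add_messages
from langgraph.types import interrupt, Command

# RiskLevel and TOOL_POLICIES come from risk-classification-system.py above.
# plan_node, execute_tool, action_idempotency_key, and the validate_* helpers
# are assumed to be defined elsewhere in the application.

ToolAction = dict[str, object]

class AgentState(TypedDict):
    # add_messages appends returned messages instead of overwriting the history.
    messages: Annotated[list[AnyMessage], add_messages]
    pending_action: ToolAction | None
    approval_status: str | None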

def execute_action_node(state: AgentState) -> dict:
    """Execute a tool call, pausing for approval if needed."""
    action = state["pending_action"]
    if action is None:
        return {"messages": [], "pending_action": None, "approval_status": None}

    policy = TOOL_POLICIES.get(action["tool"])
    if policy is None:
        return {
            "messages": [AIMessage(content="Action blocked: tool has no policy.")],
            "pending_action": None,
            "approval_status": "rejected",
        }
    execution_args = action["args"]

    # Check if this tier requires a pause (policy is known to be non-None here)
    if policy.risk_level in (RiskLevel.APPROVE, RiskLevel.ESCALATE):
        # PAUSE execution here.
        # The graph state is automatically saved to Postgres by the checkpointer.
        # The runtime handles the pause; don't catch interrupt() in try/except.
        human_response = interrupt({
            "action": action,
            "risk_level": policy.risk_level.value,
            "reason": f"Agent wants to {action['tool']} with args: {action['args']}",
            "requires_reason": policy.requires_reason,
        })

        # This code runs ONLY after the human resumes execution
        if human_response["decision"] == "reject":
            return {
                "messages": [AIMessage(content="Action was rejected by human reviewer.")],
                "pending_action": None,
                "approval_status": "rejected",
            }

        # An approved edit is a new proposal. Validate it before execution.
        if human_response["decision"] == "modify":
            execution_args = validate_modified_args(
                action["tool"],
                human_response["modified_args"],
                policy=policy,
            )
        else:
            execution_args = validate_current_args(
                action["tool"],
                action["args"],
                policy=policy,
            )
    else:
        execution_args = validate_autonomous_args(
            action["tool"],
            action["args"],
            policy=policy,
        )

    # Bind execution to an idempotency key so retries can't repeat a side effect.
    result = execute_tool(
        action["tool"],
        execution_args,
        idempotency_key=action_idempotency_key(action),
    )

    return {
        "messages": [AIMessage(content=f"Action completed: {result}")],
        "pending_action": None,
        "approval_status": "executed",
    }

# Setup the graph with persistence
graph = StateGraph(AgentState)
graph.add_node("plan", plan_node)
graph.add_node("execute", execute_action_node)
# ... define edges ...

# The checkpointer provides durable HITL state.
# The first time you use a Postgres checkpointer, call setup() to create tables.
with PostgresSaver.from_conn_string("postgresql://...") as checkpointer:
    checkpointer.setup()
    app = graph.compile(checkpointer=checkpointer)

The approval flow

The flow separates the Agent Runtime (the Python application, often powered by an LLM) from the Approval Interface (Web/Slack). In production, they coordinate through persisted state plus a thin resume endpoint, not a long-lived in-memory call stack. The checkpoint figure above is the source of truth for the sequence: pause, persist state, create a versioned approval record, notify the reviewer, validate the decision, reload the checkpoint, execute, and audit.

REST API for approval queue

To build the "Resume API" component, we need an endpoint that looks up the suspended thread and records the reviewer's decision. The endpoint takes a thread ID plus a versioned approval decision payload as input, validates the thread's state and the approval record, and writes a resume job to an outbox so a worker can wake the agent with the provided instructions after the transaction commits.

rest-api-for-approval-queue.py

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from typing import Literal
from datetime import datetime, timezone

app = FastAPI()
JsonDict = dict[str, object]

# Assume langgraph_app is compiled elsewhere
# langgraph_app = graph.compile(...)

class ApprovalDecision(BaseModel):
    approval_id: str
    expected_version: int
    action_hash: str
    decision: Literal["approve", "reject", "modify"]
    reason: str | None = None
    modified_args: JsonDict | None = None

@app.post("/api/approvals/{thread_id}/decide")
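def decide_approval(thread_id: str, decision: ApprovalDecision):
    """
    Record a reviewer's approve, reject, or modify decision.
    The handler is a plain def so FastAPI runs its blocking calls in a threadpool.
    An outbox worker resumes the dormant agent after the transaction commits.
    """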
    # 1. Verify the thread currently has an outstanding interrupt
    config = {"configurable": {"thread_id": thread_id}}
    state = langgraph_app.get_state(config)
    if not any(task.interrupts for task in state.tasks):
        raise HTTPException(400, "Agent isn't waiting for input")

    # 2. Resolve the approval and write its outbox row atomically.
    # The transaction commits both writes or rolls back both writes.
    with approval_store.transaction() as tx:
        approval = load_pending_approval(tx, thread_id, decision.approval_id)
        if approval is None or approval["status"] != "pending":
            raise HTTPException(404, "Approval request not found")
        if approval["version"] != decision.expected_version:
            raise HTTPException(409, "Approval request is stale")
        if approval["action_hash"] != decision.action_hash:
            raise HTTPException(409, "Proposed action changed; request fresh review")
        if approval["expires_at"] <= datetime.now(timezone.utc):
            raise HTTPException(409, "Approval request expired")
        require_reviewer_permission(current_reviewer(), approval["required_role"])

        updated = try_resolve_approval(
            tx,
            approval_id=decision.approval_id,
            expected_version=decision.expected_version,
            expected_action_hash=decision.action_hash,
            next_status="rejected" if decision.decision == "reject" else "authorized",
        )
        if not updated:
            raise HTTPException(409, "Approval was already resolved")

        # 3. Insert, rather than publish, an idempotent resume job in the same transaction.
        # An outbox worker publishes it after commit and retries safely if delivery fails.
        insert_resume_outbox(
            tx,
            thread_id=thread_id,
            approval_id=decision.approval_id,
            resume_payload={
                "approval_id": decision.approval_id,
                "expected_version": decision.expected_version,
                "action_hash": decision.action_hash,
                "decision": decision.decision,
                "reason": decision.reason,
                "modified_args": decision.modified_args,
                "timestamp": datetime.now(timezone.utc).isoformat(),
            },
        )
    return {"status": "decision_recorded"}

The compare-and-swap check records at most one reviewer decision. Store each approval request with fields such as status, version, action_hash, expires_at, required_role, and resolved_at, then resolve it with a guarded update such as WHERE id = ? AND status = 'pending' AND version = ? AND action_hash = ? AND expires_at > now(). The stored and compared action hash must contain the full digest; a shortened form belongs only in the UI. Record authorized separately from executed: an approval may be accepted and then fail during execution. The guarded approval update and outbox insert must commit in one database transaction; otherwise, a crash after the approval update but before the outbox insert can strand an authorized action. A reconciler that detects authorized approvals without an outbox row is an additional defense, not a substitute for atomic persistence. Finally, a durable resume worker must recheck current production state and use an idempotency key so a retry can't promote the same model twice.
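As a concrete illustration of the guarded update, here is a minimal sketch using an in-memory SQLite database. The table and column names are assumptions for illustration, timestamps are compared as same-format ISO 8601 strings, and a fixed clock replaces now() so the result is deterministic. A production store would more likely be Postgres with real timestamp columns.

```python
import sqlite3
from hashlib import sha256

ACTION_HASH = f"sha256:{sha256(b'promote:reranker-v17:25pct').hexdigest()}"
NOW = "2026-08-22T18:30:00Z"  # Fixed clock so the sketch is deterministic.

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE approvals (
    id TEXT PRIMARY KEY, status TEXT, version INTEGER,
    action_hash TEXT, expires_at TEXT
);
CREATE TABLE resume_outbox (approval_id TEXT PRIMARY KEY, decision TEXT);
""")
conn.execute(
    "INSERT INTO approvals VALUES ('apr_123', 'pending', 3, ?, '2026-08-22T19:00:00Z')",
    (ACTION_HASH,),
)
conn.commit()


def resolve(approval_id: str, version: int, action_hash: str, decision: str, now: str) -> bool:
    """Resolve a pending approval and enqueue its resume job in one transaction."""
    next_status = "rejected" if decision == "reject" else "authorized"
    with conn:  # Commits on success; rolls back both writes if either raises.
        cursor = conn.execute(
            "UPDATE approvals SET status = ?, version = version + 1 "
            "WHERE id = ? AND status = 'pending' AND version = ? "
            "AND action_hash = ? AND expires_at > ?",
            (next_status, approval_id, version, action_hash, now),
        )
        if cursor.rowcount != 1:
            return False  # Stale, expired, changed, or already resolved.
        conn.execute(
            "INSERT INTO resume_outbox VALUES (?, ?)", (approval_id, decision)
        )
    return True


print("first click:", resolve("apr_123", 3, ACTION_HASH, "approve", NOW))
print("second click:", resolve("apr_123", 3, ACTION_HASH, "approve", NOW))
print("outbox rows:", conn.execute("SELECT COUNT(*) FROM resume_outbox").fetchone()[0])
```

The row count of the UPDATE is the whole decision: exactly one affected row means this request won, and anything else means another click, an expiry, or a changed action got there first.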

This small executable model shows those two separate transitions. It takes a versioned decision and an idempotency key as input, then shows that a stale click is rejected and an execution retry returns the previously recorded outcome.

resolve-once-execute-once.py

from dataclasses import dataclass
from hashlib import sha256

ACTION_HASH = f"sha256:{sha256(b'promote:reranker-v17:25pct').hexdigest()}"

@dataclass
class Approval:
    status: str = "pending"
    version: int = 3
    action_hash: str = ACTION_HASH

completed_effects: dict[str, str] = {}

def record_decision(
    approval: Approval,
    *,
    expected_version: int,
    action_hash: str,
) -> str:
    if approval.status != "pending" or approval.version != expected_version:
        return "blocked: stale decision"
    if approval.action_hash != action_hash:
        return "blocked: action changed"
    approval.status = "authorized"
    approval.version += 1
    return "authorized"

def execute_once(approval: Approval, *, idempotency_key: str) -> str:
    if idempotency_key in completed_effects:
        return completed_effects[idempotency_key]
    if approval.status != "authorized":
        return "blocked: missing authorization"
    result = "promotion recorded once"
    completed_effects[idempotency_key] = result
    approval.status = "executed"
    return result

pending = Approval()
print("decision:", record_decision(
    pending,
    expected_version=3,
    action_hash=ACTION_HASH,
))
print("first execution:", execute_once(pending, idempotency_key="apr_123"))
print("retry:", execute_once(pending, idempotency_key="apr_123"))
print("second click:", record_decision(
    pending,
    expected_version=3,
    action_hash=ACTION_HASH,
))

Output

decision: authorized
first execution: promotion recorded once
retry: promotion recorded once
second click: blocked: stale decision

An expiry guard is separate from version matching. This example takes the current time and expiry time as inputs, then rejects a decision whose review window has already passed.

reject-expired-approval.py

from datetime import datetime, timezone

def decision_allowed(*, expires_at: str, now: datetime) -> str:
    expiry = datetime.fromisoformat(expires_at.replace("Z", "+00:00"))
    return "accepted" if now < expiry else "blocked: approval expired"

clock = datetime(2026, 8, 22, 19, 0, tzinfo=timezone.utc)
print("fresh:", decision_allowed(
    expires_at="2026-08-22T19:01:00Z",
    now=clock,
))
print("stale:", decision_allowed(
    expires_at="2026-08-22T18:59:00Z",
    now=clock,
))

Output

fresh: accepted
stale: blocked: approval expired

Designing the approval interface

A "Yes/No" button is rarely enough for production systems. The approval interface must provide context and control. When Vega asks "Can I route 25% traffic to reranker-v17?", the human needs to know what data will change and why.

The approval UI should show the authorized reviewer the smallest evidence packet needed for the decision: a redacted request summary, the policy trigger, source references the reviewer is allowed to inspect, and the exact arguments Vega proposes to execute. Dumping an entire incident thread or model-card history into every review card creates unnecessary exposure and makes the meaningful change harder to see.

The approval payload

Vega should pause with a structured payload that the UI can render for review. The payload combines authorized evidence and the proposed tool arguments into a redacted, typed object for the reviewer. For data mutations, the UI should ideally render a visual diff rather than raw JSON.

the-approval-payload.ts

import { createHash } from "node:crypto";

type JsonValue = null | boolean | number | string | JsonValue[] | { [key: string]: JsonValue };

function canonicalJson(value: JsonValue): string {
  if (Array.isArray(value)) {
    return `[${value.map(canonicalJson).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    return `{${Object.keys(value).sort().map(
      (key) => `${JSON.stringify(key)}:${canonicalJson(value[key])}`,
    ).join(",")}}`;
  }
  return JSON.stringify(value) as string;
}

function hashAction(action: ApprovalRequest["action"]): string {
  return `sha256:${createHash("sha256").update(canonicalJson(action)).digest("hex")}`;
}

type ApprovalRequest = {
  id: string;
  agentId: string;
  requestVersion: number;
  actionHash: string;
  expiresAt: string;
  action: {
    tool: "promote_model";
    reason: string;
    policyRule: string;
    args: {
      model_id: string;
      from_stage: "shadow" | "canary";
      to_stage: "canary" | "production";
      traffic_percent: number;
    };
    riskLevel: "HIGH" | "CRITICAL";
  };
  context: {
    evidenceSummary: string;
  };
};

const pendingApprovalAction: ApprovalRequest["action"] = {
  tool: "promote_model",
  reason: "Shadow and canary checks passed release threshold",
  policyRule: "model_release.write_requires_review",
  args: {
    model_id: "recommendation-reranker-v17",
    from_stage: "shadow",
    to_stage: "canary",
    traffic_percent: 25, // Render this as a diff in the UI
  },
  riskLevel: "HIGH",
};

const pendingApproval: ApprovalRequest = {
  id: "apr_123",
  agentId: "agent_42",
  requestVersion: 4,
  actionHash: hashAction(pendingApprovalAction),
  expiresAt: "2026-08-22T19:00:00Z",
  action: pendingApprovalAction,
  context: {
    evidenceSummary: "Eval pass, 30-minute canary clean, rollback plan attached.",
  },
};

Approval packet with frozen evidence, exact mutation diff, reviewer decision, and stale-click guards such as action hash and expiry. The review packet must freeze intent: the reviewer sees permitted evidence, the exact mutation, and guard fields that let the backend reject stale clicks.

By rendering this payload, the Human-in-the-Loop UI becomes a useful decision surface. The reviewer can compare Vega's proposed effect with permitted evidence before authorizing the tool call. For complex actions, consider adding a dry-run diff that shows what would happen if the action were approved. Version, expiry, and action hash fields matter too: they let the backend reject stale clicks instead of replaying an approval long after the underlying state changed. The backend stores and compares the full digest even if the card displays an abbreviated label.
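A dry-run diff can start as a plain field-by-field comparison. The sketch below assumes the current deployment state is available as a flat dictionary; the `current_state` values and the `diff_args` helper are hypothetical and only illustrate the shape a review card might render.

```python
def diff_args(current: dict[str, object], proposed: dict[str, object]) -> list[str]:
    """Return one human-readable line per field that the action would change."""
    lines = []
    for key in sorted(current.keys() | proposed.keys()):
        before = current.get(key, "<unset>")
        after = proposed.get(key, "<unset>")
        if before != after:
            lines.append(f"{key}: {before} -> {after}")
    return lines


# Hypothetical current state; a real system would read it from the deployment API.
current_state = {
    "model_id": "recommendation-reranker-v17",
    "stage": "shadow",
    "traffic_percent": 0,
}
proposed_state = {
    "model_id": "recommendation-reranker-v17",
    "stage": "canary",
    "traffic_percent": 25,
}

for line in diff_args(current_state, proposed_state):
    print(line)
```

Unchanged fields are omitted on purpose, so the reviewer's attention goes to the mutation rather than to the full record.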

The packet builder is also a data-minimization boundary. This executable example takes a deployment note and exact action as input, redacts the email address, and emits a stable hash that binds the review card to the proposed mutation.

build-redacted-review-packet.py

from hashlib import sha256
import json
import re

def build_packet(note: str, action: dict[str, object]) -> dict[str, object]:
    redacted_note = re.sub(r"[\w.+-]+@[\w.-]+", "[email redacted]", note)
    canonical_action = json.dumps(action, sort_keys=True, separators=(",", ":"))
    action_hash = f"sha256:{sha256(canonical_action.encode()).hexdigest()}"
    return {"evidence": redacted_note, "action": action, "action_hash": action_hash}

pending_approval_action = {
    "tool": "promote_model",
    "reason": "Shadow and canary checks passed release threshold",
    "policyRule": "model_release.write_requires_review",
    "args": {
        "model_id": "recommendation-reranker-v17",
        "from_stage": "shadow",
        "to_stage": "canary",
        "traffic_percent": 25,
    },
    "riskLevel": "HIGH",
}
packet = build_packet(
    "Page oncall@example.com only if canary error budget burns.",
    pending_approval_action,
)
print("evidence:", packet["evidence"])
print("has action hash:", bool(packet["action_hash"]))
print("stored digest length:", len(packet["action_hash"].removeprefix("sha256:")))
print("action hash for display:", packet["action_hash"][:19] + "...")

Output

evidence: Page [email redacted] only if canary error budget burns.
has action hash: True
stored digest length: 64
action hash for display: sha256:04f7578c5e22...

Give the reviewer a concise rationale, the exact tool arguments, a diff, and the policy rule that triggered review. Don't make raw chain-of-thought your approval primitive. Explanations can be unfaithful, and they may reveal internal reasoning or sensitive context you didn't intend to surface [4] (Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting, https://arxiv.org/abs/2305.04388).

Also keep approval state compact. Persist a data-minimized audit record with access controls and retention policy, then inject only structured fields such as decision, reason, modified_args, or a short reviewer summary back into the runtime. Otherwise long-lived threads accumulate reviewer chatter that burns context window on approval metadata instead of task state.
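One way to enforce that compactness is an explicit allowlist between the review record and the resume payload. In this sketch the field names follow the ones listed above, while `MAX_REASON_CHARS`, `compact_resume_payload`, and the extra review fields are assumptions for illustration.

```python
ALLOWED_RESUME_FIELDS = ("decision", "reason", "modified_args")
MAX_REASON_CHARS = 200  # Hypothetical cap so a long reason cannot flood the context.


def compact_resume_payload(review: dict[str, object]) -> dict[str, object]:
    """Keep only allowlisted, non-empty fields and truncate the free-text reason."""
    payload = {
        key: review[key]
        for key in ALLOWED_RESUME_FIELDS
        if review.get(key) is not None
    }
    reason = payload.get("reason")
    if isinstance(reason, str):
        payload["reason"] = reason[:MAX_REASON_CHARS]
    return payload


review_record = {
    "decision": "modify",
    "reason": "Start smaller while the canary dashboard warms up.",
    "modified_args": {"traffic_percent": 10},
    "comment_thread": ["long reviewer discussion ..."],  # Stays in the audit store.
    "reviewer_email": "lead@example.com",  # Never injected into the runtime.
}

print(compact_resume_payload(review_record))
```

The full review record still goes to the access-controlled audit store; only the compact payload re-enters the agent's state.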

Using Temporal for long-running workflows

For workflows that coordinate several durable activities, timers, retries, and approval messages, Temporal can be a better fit than a hand-rolled checkpoint table. LangGraph checkpoints can also wait durably; the choice isn't only "short wait versus long wait." Temporal's workflow execution records event history and communicates with callers through activities plus message-passing primitives such as Signals and Updates [5] (Temporal Workflow Execution Overview, https://docs.temporal.io/workflow-execution).

Signals are a good default when the approval service can fire-and-forget. If the UI needs synchronous confirmation that the workflow accepted the decision, or you want the runtime to reject a stale approval before it lands in history, an Update is usually cleaner [6] (Temporal Python SDK: Workflow message passing, https://docs.temporal.io/develop/python/workflows/message-passing).

Temporal also gives you durable state, timers, retries, and event history out of the box. That means you don't have to build the orchestration layer for "pause, notify, wait, resume" yourself.

This approach is useful for multi-step approval chains (for example, waiting for both a support lead and finance sign-off). The workflow can wait without keeping a worker blocked, evaluate partial approvals, and continue waiting for the remaining required decision.
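The partial-approval logic can be kept as a small pure function, independent of any orchestrator. This is a sketch under assumptions: the role names in `REQUIRED_ROLES` and the `"approve"`/`"reject"` decision strings are hypothetical.

```python
REQUIRED_ROLES = frozenset({"support_lead", "finance"})  # assumed approval chain


def remaining_roles(approvals: dict[str, str]) -> set[str]:
    """Required roles that have not approved yet."""
    approved = {role for role, decision in approvals.items() if decision == "approve"}
    return set(REQUIRED_ROLES) - approved


def chain_status(approvals: dict[str, str]) -> str:
    """Any required rejection ends the chain; otherwise wait for every role."""
    if any(d == "reject" for role, d in approvals.items() if role in REQUIRED_ROLES):
        return "rejected"
    return "approved" if not remaining_roles(approvals) else "waiting"


print(chain_status({}))                                                  # waiting
print(chain_status({"support_lead": "approve"}))                         # waiting
print(chain_status({"support_lead": "approve", "finance": "approve"}))   # approved
print(chain_status({"support_lead": "approve", "finance": "reject"}))    # rejected
```

Because the function is deterministic and reads only workflow state, a workflow could use it as the predicate for a durable wait, resuming only once the status is no longer `"waiting"`.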

The Temporal workflow below uses an Update for the approval decision because the UI needs to learn whether the action ID and version still match the pending request. The workflow then calls a validation activity before any external side effect.

using-temporal-for-long-running-workflows.py

from temporalio import workflow

ApprovalPayload = dict[str, object]

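# Assume these Activities are defined elsewhere:
# plan_actions, notify_human, validate_for_execution, execute_action
# is_risky is a deterministic helper called inside the workflow, not an Activity.
# The "..." arguments stand for activity options such as timeouts.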

@workflow.defn
class AgentWorkflow:
    def __init__(self) -> None:
        self._pending_action_id: str | None = None
        self._pending_version: int | None = None
        self._human_decision: ApprovalPayload | None = None

    @workflow.update
    def decide(self, decision: ApprovalPayload) -> str:
        self._human_decision = decision
        return "accepted"

    @decide.validator
    def validate_decision(self, decision: ApprovalPayload) -> None:
        """Reject stale decisions before the Update is accepted."""
        if decision["action_id"] != self._pending_action_id:
            raise ValueError("stale action id")
        if decision["expected_version"] != self._pending_version:
            raise ValueError("stale action version")
        if self._human_decision is not None:
            raise ValueError("decision already recorded")

    @workflow.run
    async def run(self, task: str):
        # Step 1: Plan
        plan = await workflow.execute_activity(plan_actions, task, ...)

        for action in plan.actions:
            if is_risky(action):
                self._pending_action_id = action["id"]
                self._pending_version = action["version"]
                self._human_decision = None

                # Send notification (Slack/Email)
                await workflow.execute_activity(
                    notify_human,
                    {"action_id": action["id"], "action": action},
                    ...,
                )

                # Wait durably until a validated decision Update arrives.
                await workflow.wait_condition(
                    lambda: self._human_decision is not None
                )

                decision = self._human_decision
                self._pending_action_id = None
                self._pending_version = None
                self._human_decision = None

                if decision["decision"] == "reject":
                    continue

                # The reviewer decision isn't enough: validate any modified
                # arguments and current business state in an Activity.
                action = await workflow.execute_activity(
                    validate_for_execution,
                    {"action": action, "decision": decision},
                    ...,
                )

            # Step 2: Execute with an idempotency key carried by the action.
            await workflow.execute_activity(execute_action, action, ...)

The validator rejects stale versions and a second decision while the first accepted Update is waiting to be consumed, so two fast clicks can't overwrite one another.

Keep policy code inside the workflow deterministic. If is_risky() depends on live balances, anomaly services, or a policy database, fetch that data in an Activity first and pass the result into workflow state. Temporal replays workflow code against event history, so non-deterministic logic inside the workflow will break replay. ^{[5]Reference 5Temporal Workflow Execution Overviewhttps://docs.temporal.io/workflow-execution}
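One way to keep the check deterministic is to make it a pure function of the action and a snapshot that an Activity fetched earlier. The sketch below is illustrative only: `PolicySnapshot` and its fields are hypothetical, and this two-argument `is_risky` is a variant of the single-argument helper used in the workflow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicySnapshot:
    """Hypothetical live data fetched once by an Activity, then stored in workflow state."""
    recent_failures: int
    max_auto_traffic_percent: float


def is_risky(args: dict[str, float], snapshot: PolicySnapshot) -> bool:
    """Pure function: no clocks, network calls, or randomness, so replay is stable."""
    traffic = args.get("traffic_percent", 0.0)
    return traffic > snapshot.max_auto_traffic_percent or snapshot.recent_failures > 3


snapshot = PolicySnapshot(recent_failures=0, max_auto_traffic_percent=5.0)
print(is_risky({"traffic_percent": 10.0}, snapshot))  # True
print(is_risky({"traffic_percent": 1.0}, snapshot))   # False
```

The snapshot becomes part of the recorded Activity result, so a replay evaluates the same inputs even if the live policy database has changed since.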

If a notification doesn't need a response, a Signal may still be appropriate. Approval buttons need synchronous accept/reject feedback, so an Update with validation is a stronger fit for this decision path. ^{[6]Reference 6Temporal Python SDK: Workflow message passinghttps://docs.temporal.io/develop/python/workflows/message-passing}

For approval chains that can run for months or accumulate a large event history, periodically use Continue-As-New to cap history size while carrying forward unresolved approval state. ^{[5]Reference 5Temporal Workflow Execution Overviewhttps://docs.temporal.io/workflow-execution}

Advanced patterns

Pause and resume is the core mechanism, but production queues also need dynamic escalation, safe modifications, bounded batches, and reviewer-load metrics. These controls reduce needless review work without weakening policy floors.

Dynamic risk escalation

Static policies need runtime context. Publishing one external incident update is already a side effect that may require approval; attempting hundreds of updates in a short interval should move into a stricter queue or be blocked outright. This visual starts from a static policy floor and conditionally escalates the required authorization layer.

The numbered cells match the examples below. A 50% traffic shift moves from APPROVE to ESCALATE; an otherwise autonomous eval read moves to APPROVE after an anomaly spike; and disabling a safety filter stays at ESCALATE even when runtime conditions look quiet. This monotonic rule is the visual form of escalate_risk().

Dynamic policies evaluate context by taking the proposed action and operational metadata as input, applying rule-based checks, and outputting an escalated risk tier if anomalies are detected. This function combines the static base policy with runtime conditions. It checks traffic percentage, recent failure count, and time window, elevating the required authorization level when thresholds are crossed:

dynamic-risk-escalation.py

from dataclasses import dataclass
from enum import Enum
from typing import cast

class RiskLevel(Enum):
    AUTO = "auto"
    NOTIFY = "notify"
    APPROVE = "approve"
    ESCALATE = "escalate"

@dataclass
class ToolPolicy:
    tool_name: str
    risk_level: RiskLevel

TOOL_POLICIES = {
    "read_eval_run": ToolPolicy("read_eval_run", RiskLevel.AUTO),
    "promote_model": ToolPolicy("promote_model", RiskLevel.APPROVE),
    "disable_safety_filter": ToolPolicy("disable_safety_filter", RiskLevel.ESCALATE),
}

RISK_PRIORITY = {
    RiskLevel.AUTO: 0,
    RiskLevel.NOTIFY: 1,
    RiskLevel.APPROVE: 2,
    RiskLevel.ESCALATE: 3,
}

def escalate_risk(current: RiskLevel, target: RiskLevel) -> RiskLevel:
    return target if RISK_PRIORITY[target] > RISK_PRIORITY[current] else current

ToolArgs = dict[str, float | str | bool]
ToolAction = dict[str, object]
RuntimeContext = dict[str, object]

def is_business_hours(context: RuntimeContext) -> bool:
    hour = int(context.get("local_hour", 12))
    return 8 <= hour < 18

def calculate_dynamic_risk(action: ToolAction, context: RuntimeContext) -> RiskLevel:
    """Vega's risk increases with traffic size, recent anomalies, and time of day."""
    tool_name = str(action["tool"])
    args = cast(ToolArgs, action.get("args", {}))
    base_risk = TOOL_POLICIES[tool_name].risk_level

    # 1. Blast-radius escalation: large traffic shifts are critical
    if float(args.get("traffic_percent", 0)) > 25:
        base_risk = escalate_risk(base_risk, RiskLevel.ESCALATE)

    # 2. Anomaly-based escalation: many recent failures suggest something is wrong
    if context.get("recent_failures", 0) > 3:
        base_risk = escalate_risk(base_risk, RiskLevel.APPROVE)

    # 3. Temporal escalation: off-hours promotions need senior review
    if tool_name == "promote_model" and not is_business_hours(context):
        base_risk = escalate_risk(base_risk, RiskLevel.ESCALATE)

    return base_risk

print("promote 50%:", calculate_dynamic_risk(
    {"tool": "promote_model", "args": {"traffic_percent": 50}},
    {"recent_failures": 0, "local_hour": 14},
).value)
print("read after failures:", calculate_dynamic_risk(
    {"tool": "read_eval_run", "args": {}},
    {"recent_failures": 4, "local_hour": 14},
).value)
print("disable filter:", calculate_dynamic_risk(
    {"tool": "disable_safety_filter", "args": {}},
    {"recent_failures": 0, "local_hour": 10},
).value)
print("off-hours promote:", calculate_dynamic_risk(
    {"tool": "promote_model", "args": {"traffic_percent": 5}},
    {"recent_failures": 0, "local_hour": 2},
).value)

Output

promote 50%: escalate
read after failures: approve
disable filter: escalate
off-hours promote: escalate

Approval with modification

A useful HITL pattern lets the human modify a proposed action rather than merely reject it. Instead of a binary yes/no choice, the human can correct arguments, such as reducing a traffic percentage or changing a rollout stage. That edit is a new proposal, not a guarantee of safety.

Build the approval UI as a form, not a button. Populate the form with the agent's proposed arguments (e.g., email body, SQL query) and allow the human to edit them before hitting "Approve".

For example, if Vega proposes promote_model(model_id="reranker-v17", traffic_percent=50), a human reviewer can modify it to promote_model(model_id="reranker-v17", traffic_percent=10) for a smaller canary. Before execution, the host must validate schema, reviewer permission, applicable policy, action version, and current deployment state. The edit doesn't train Vega and doesn't bypass a stronger approval tier.
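The checks that do not depend on the edited arguments (expiry, action version, reviewer permission) can run first. A minimal sketch, assuming a hypothetical `PendingAction` record and role names; schema, policy, and deployment-state validation would follow as separate steps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PendingAction:
    """Hypothetical stored record for an action awaiting review."""
    action_id: str
    version: int
    required_role: str
    expires_at: float  # epoch seconds


def precheck_edit(
    pending: PendingAction,
    reviewer_roles: set[str],
    expected_version: int,
    now: float,
) -> str:
    """Return "ok" only if the edit targets a live, current, authorized request."""
    if now >= pending.expires_at:
        return "expired"
    if expected_version != pending.version:
        return "stale_version"
    if pending.required_role not in reviewer_roles:
        return "not_authorized"
    return "ok"


pending = PendingAction("act_1", version=3, required_role="release_manager", expires_at=1000.0)
print(precheck_edit(pending, {"release_manager"}, 3, now=500.0))   # ok
print(precheck_edit(pending, {"release_manager"}, 2, now=500.0))   # stale_version
print(precheck_edit(pending, {"viewer"}, 3, now=500.0))            # not_authorized
print(precheck_edit(pending, {"release_manager"}, 3, now=1000.0))  # expired
```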

Batch approvals

If Vega needs to propose 50 low-risk release-note publications after a migration freeze, asking for a separate decision on each one creates reviewer fatigue. Reviewers who face repetitive requests may stop inspecting individual effects. A batch review can group related proposals without hiding the service, audience, destination, or policy status of each item.

Instead of creating one alert per proposed operation, the orchestrator collects pending actions over a bounded window or groups them by a shared task identifier. The UI can then present: "Vega proposes 50 release-note publishes (review all items)." The approval record must bind to the exact item list or item hashes, so an item inserted after approval can't ride along in the batch.

Implementing batch approvals requires per-item state and idempotency. If a batch contains 50 actions and the reviewer authorizes the fixed set, the orchestrator must track each action separately. If action 42 fails, already-completed publishes aren't automatically undone unless the operation provides a compensation path. Retry only the failed authorized item with its idempotency key, and re-request review if its inputs or destination changed.
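Per-item tracking can be sketched with a small status field on each authorized item. Everything here is an assumption for illustration: the `BatchItem` shape, the status names, and the `publish` stub that fails once for one item.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class BatchItem:
    item_id: str
    idempotency_key: str
    status: str = "authorized"  # authorized -> done | failed


def run_batch(items: list[BatchItem], publish: Callable[[str, str], None]) -> list[str]:
    """Run every item that is not done yet; return the IDs that failed this pass."""
    failed = []
    for item in items:
        if item.status == "done":
            continue  # never re-run a completed publish
        try:
            publish(item.item_id, item.idempotency_key)
            item.status = "done"
        except RuntimeError:
            item.status = "failed"
            failed.append(item.item_id)
    return failed


calls: list[str] = []
flaky = {"note_2"}  # hypothetical destination that fails once


def publish(item_id: str, key: str) -> None:
    calls.append(item_id)
    if item_id in flaky:
        flaky.discard(item_id)
        raise RuntimeError("destination unavailable")


items = [BatchItem(f"note_{i}", f"batch-7:note_{i}") for i in range(1, 4)]
print("first pass failed:", run_batch(items, publish))  # ['note_2']
print("retry failed:", run_batch(items, publish))       # []
print("publish calls:", calls)  # ['note_1', 'note_2', 'note_3', 'note_2']
```

The retry reuses the same idempotency key, so a destination that had in fact processed the first attempt can deduplicate it. This sketch does not re-check whether an item's inputs changed; that comparison would decide whether to retry or send the item back to review.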

Bind a batch approval to the exact items shown to the reviewer. Here a later publish inserted into the queue changes the digest, so it can't reuse the earlier authorization.

bind-batch-to-reviewed-items.py

from hashlib import sha256
import json

def batch_digest(items: list[dict[str, str]]) -> str:
    payload = json.dumps(items, sort_keys=True, separators=(",", ":"))
    return sha256(payload.encode()).hexdigest()

reviewed = [
    {"id": "note_1", "service": "search"},
    {"id": "note_2", "service": "recommendations"},
]
approved_digest = batch_digest(reviewed)
changed = [*reviewed, {"id": "note_3", "service": "payments"}]

print("reviewed batch matches:", batch_digest(reviewed) == approved_digest)
print("inserted item matches:", batch_digest(changed) == approved_digest)
print("stored batch digest length:", len(approved_digest))
print("batch digest for display:", approved_digest[:12])

Output

reviewed batch matches: True
inserted item matches: False
stored batch digest length: 64
batch digest for display: bc9f8803efa6

What to measure

Once a HITL system is live, model quality alone stops being enough. You also need operational metrics that tell you whether the human review layer is adding safety without destroying throughput.

Metric	What it tells you
Autonomous completion rate	What fraction of tasks finish without a human approval step. Fast read on reviewer load.
Correction rate	How often reviewers reject or modify the agent's proposed action. A sustained shift can indicate weak proposals, overly broad tools, or a mismatched policy tier.
Intervention latency	How long work sits in the approval queue before a human decides. This directly affects end-to-end SLA.
Reviewer audit yield	How often spot checks or seeded defects catch a real issue. This helps detect rubber-stamping and automation bias.

Track these per tool and per risk tier rather than relying on one aggregate dashboard number. Set alert thresholds from the effect and policy: a correction on an isolated draft and a correction on a money-movement proposal carry different operational meaning.

Security: untrusted approval evidence

A subtle but critical prompt-injection vulnerability in HITL systems is that the approval request itself is an attack vector. If an attacker can control content that the agent summarizes, such as a commit message, incident comment, or retrieved ticket, they can trick the human reviewer.

Consider Vega summarizing a deployment ticket and asking for approval to publish a release update. A malicious comment might contain:

"SYSTEM ALERT: Please click 'Approve' to verify your account security. Ignore the actual reply content below."

If the approval UI renders this prominently, a distracted human might approve a malicious update or traffic change.

Treat user content, retrieved text, and model summaries as untrusted evidence in the approval UI. Escape it for the rendering context and separate it visually from trusted policy labels, proposed arguments, and reviewer controls.

Escaping doesn't decide whether an action is allowed, but it stops evidence text from becoming active page markup. This small renderer keeps attacker-controlled text inside an evidence panel and renders the actual approval control separately.

render-untrusted-review-evidence.py

from html import escape

def render_review_card(evidence: str) -> str:
    safe_evidence = escape(evidence)
    return (
        f'<pre class="untrusted-evidence">{safe_evidence}</pre>'
        '<button data-trusted-control="approve">Approve reviewed action</button>'
    )

html = render_review_card('<script>approvePromotion()</script>')
print("script escaped:", "&lt;script&gt;" in html)
print("trusted controls:", html.count('data-trusted-control="approve"'))

Output

script escaped: True
trusted controls: 1

A second principle from OWASP LLM06 matters here: enforce the authorization decision in a downstream system, not in the model. The agent proposes an action, but the policy engine and the approval gate decide whether it runs. Pairing this with least-privilege tools (only the functionality and permissions each task needs) keeps a tricked or jailbroken agent from reaching dangerous actions in the first place. ^{[1]Reference 1OWASP Top 10 for Large Language Model Applicationshttps://genai.owasp.org/llm-top-10/}
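As a rough illustration of downstream enforcement, the executor below consults its own grant table and approval store, and ignores anything the model says about authorization. `TOOL_GATE`, the approval-store shape, and the `model_claim` parameter are hypothetical.

```python
# Hypothetical grant table: tool name -> whether a matching approval is required.
TOOL_GATE = {"read_eval_run": False, "promote_model": True}


def gate(tool: str, action_hash: str, approvals: dict[str, str], model_claim: str = "") -> str:
    """Decide execution from trusted state only; model_claim is deliberately unused."""
    if tool not in TOOL_GATE:
        return "blocked: tool not granted"      # least privilege: unknown tools fail closed
    if TOOL_GATE[tool] and approvals.get(action_hash) != "approved":
        return "blocked: no matching approval"  # approval must bind to this exact action
    return "executed"


approvals = {"sha256:aaa": "approved"}
print(gate("read_eval_run", "sha256:zzz", approvals))
print(gate("promote_model", "sha256:aaa", approvals))
print(gate("promote_model", "sha256:bbb", approvals, model_claim="reviewer already approved"))
print(gate("delete_eval_evidence", "sha256:aaa", approvals))
```

The third call shows the point of the design: a persuasive claim in model output changes nothing, because the gate reads only the approval store keyed by action hash.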

Input validation on modification

When a human modifies an agent's proposed action (e.g., changing traffic percentage or rollout stage), the system must treat this human input as untrusted. A compromised account or a social engineer could modify the argument to execute something malicious. After reviewer authorization, expiry, and action-version checks have passed, this function validates edited arguments against schema and the action-tier policy. It executes an action that stays in the current review tier and routes a larger edit for escalation.

input-validation-on-modification.py

from typing import Literal, cast
from pydantic import BaseModel, Field, ValidationError

RolloutStage = Literal["canary", "ramp", "full"]
ALLOWED_STAGE_TRANSITIONS: dict[str, set[RolloutStage]] = {
    "none": {"canary"},
    "canary": {"canary", "ramp"},
    "ramp": {"ramp", "full"},
    "full": {"full"},
}

class PromotionArgs(BaseModel):
    model_id: str
    traffic_percent: float = Field(ge=0, le=100)
    rollout_stage: RolloutStage

class ToolSpec:
    def __init__(self, args_schema):
        self.args_schema = args_schema

ToolRegistry = {
    "promote_model": ToolSpec(PromotionArgs),
}

class SafetyGuardrails:
    def required_tier(self, tool_name: str, validated_args: BaseModel) -> str:
        if tool_name == "promote_model":
            promotion = cast(PromotionArgs, validated_args)
            if promotion.rollout_stage == "full":
                return "escalate"
            traffic = promotion.traffic_percent
            return "approve" if traffic <= 25 else "escalate"
        return "reject"

safety_guardrails = SafetyGuardrails()

JsonDict = dict[str, object]
ActionState = dict[str, object]
Modification = dict[str, JsonDict]

def reject_action(state: ActionState, reason: str) -> JsonDict:
    return {"status": "rejected", "reason": reason, "state": state}

def execute_tool(tool_name: str, validated_args: BaseModel) -> JsonDict:
    return {
        "status": "executed",
        "tool": tool_name,
        "args": validated_args.model_dump(),
    }

def resume_with_modification(
    state: ActionState,
    modification: Modification,
) -> JsonDict:
    """
    Resume agent after human modified the tool arguments.
    CRITICAL: Re-validate the new arguments against safety policies.
    """
    new_args = modification["new_args"]
    pending_action = cast(JsonDict, state["pending_action"])
    tool_name = cast(str, pending_action["tool"])

    # 1. Syntax Validation (Pydantic)
    try:
        validated_args = ToolRegistry[tool_name].args_schema(**new_args)
    except ValidationError as e:
        return reject_action(state, reason=f"Invalid modification: {e}")

    # 2. Check the requested stage against current deployment state.
    current_stage = cast(str, state["current_rollout_stage"])
    requested_stage = cast(PromotionArgs, validated_args).rollout_stage
    if requested_stage not in ALLOWED_STAGE_TRANSITIONS.get(current_stage, set()):
        return reject_action(
            state,
            reason=f"Invalid rollout transition: {current_stage} -> {requested_stage}",
        )

    # 3. The same policy is applied to the edited arguments.
    required_tier = safety_guardrails.required_tier(tool_name, validated_args)
    if required_tier == "escalate":
        return {"status": "needs_escalation", "args": validated_args.model_dump()}
    if required_tier == "reject":
        return reject_action(state, reason="Modification violated safety policy")

    # 4. Resume execution with safe arguments
    return execute_tool(tool_name, validated_args)

state = {
    "pending_action": {"tool": "promote_model"},
    "current_rollout_stage": "canary",
}

approved = resume_with_modification(
    state,
    {
        "new_args": {
            "model_id": "reranker-v17",
            "traffic_percent": 10,
            "rollout_stage": "canary",
        }
    },
)

escalated = resume_with_modification(
    state,
    {
        "new_args": {
            "model_id": "reranker-v17",
            "traffic_percent": 50,
            "rollout_stage": "canary",
        }
    },
)

print("approved:", approved["status"], approved["args"]["traffic_percent"])
print("larger edit:", escalated["status"])

Output

approved: executed 10.0
larger edit: needs_escalation

Scaling oversight with AI triage

As agent volume grows, human review can become the primary operational bottleneck. A rules engine or reviewer model can help prioritize the queue and reject proposals that violate policy, provided it never turns an approval-required effect into autonomous execution.

This introduces an "AI-in-the-loop" filter that can automatically reject obvious policy violations and fast-path actions already classified as autonomous by explicit policy. The human reviewer is still required for actions at or above the approval floor.

A practical design combines three layers: hard policy rules, anomaly features, and an evaluator that emits a queue score plus rationale. The evaluator shouldn't silently override an approval-marked or critical action. Its job is triage, not final authority. Reviewer decisions may later support evaluation or training, but only after access control, purpose limitation, redaction, label-quality review, and leakage-safe dataset splitting.

Treat the policy tier as a floor. This router takes an explicit policy and a model suggestion as inputs; it lets a draft stay autonomous, but refuses to downgrade a model-promotion proposal or destructive action.

preserve-policy-floor-during-triage.py

RANK = {"auto": 0, "approve": 1, "escalate": 2}
POLICY_FLOOR = {
    "draft_release_note": "auto",
    "promote_model": "approve",
    "delete_eval_evidence": "escalate",
}

def routed_tier(tool: str, evaluator_suggestion: str) -> str:
    floor = POLICY_FLOOR[tool]
    if RANK[evaluator_suggestion] < RANK[floor]:
        return floor
    return evaluator_suggestion

print("draft:", routed_tier("draft_release_note", "auto"))
print("promotion suggested auto:", routed_tier("promote_model", "auto"))
print("delete suggested approve:", routed_tier("delete_eval_evidence", "approve"))

Output

draft: auto
promotion suggested auto: approve
delete suggested approve: escalate

The human approval record is also part of your governance story. NIST AI RMF frames governance, measurement, and operational controls as lifecycle responsibilities across design, deployment, and management. ^{[7]Reference 7Artificial Intelligence Risk Management Framework (AI RMF 1.0)https://www.nist.gov/itl/ai-risk-management-framework} If a deployment is classified as a high-risk AI system under the EU AI Act, Article 14 requires effective human oversight proportionate to its risk, including abilities related to understanding limitations, automation bias, interpretation of output, and intervention or stopping the system. ^{[8]Reference 8EU AI Act: Regulation laying down harmonised rules on artificial intelligencehttps://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:32024R1689} For any consequential workflow, retain access-controlled, data-minimized evidence of who decided, which version and effect they reviewed, what executed, and the recorded reason for the outcome.

Practice: design Vega's weekend policy

Suppose Vega runs unattended on Saturday night during a migration freeze. Design a dynamic risk policy with these rules:

Automatically perform authorized evaluation reads and deployment-health checks
Require approval for every model-promotion proposal
Escalate any promotion over 10% traffic to a senior reviewer
Escalate any proposal that disables a required safety filter
Auto-reject any request after 2:00 AM if the same service already had two failed rollouts in the past hour

Write the ToolPolicy table and the calculate_dynamic_risk function. Then identify the edge case: what happens if Vega proposes three 9% traffic shifts in one hour to bypass the senior-review threshold?

This is a split-transaction attack. Dynamic policies must look at aggregate behavior per service, not only the size of a single action. A useful fix is a rolling-sum check: if one service accumulates more than 10% new traffic within 24 hours, escalate regardless of individual step size.
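The rolling-sum check can be sketched with a per-service deque of timestamped proposals. The class name, the choice to count every proposal (not only executed shifts), and the in-memory store are assumptions; a production version would need durable, shared state.

```python
from collections import deque

WINDOW_S = 24 * 60 * 60   # 24-hour window
CUMULATIVE_LIMIT = 10.0   # percent of new traffic per service within the window


class RollingTraffic:
    """Tracks proposed traffic shifts per service inside a sliding time window."""

    def __init__(self) -> None:
        self._events: dict[str, deque[tuple[float, float]]] = {}

    def tier_for(self, service: str, percent: float, now: float) -> str:
        events = self._events.setdefault(service, deque())
        # Drop proposals that have aged out of the window.
        while events and now - events[0][0] > WINDOW_S:
            events.popleft()
        total = sum(p for _, p in events) + percent
        events.append((now, percent))  # conservative: count every proposal
        return "escalate" if total > CUMULATIVE_LIMIT else "approve"


tracker = RollingTraffic()
print(tracker.tier_for("search", 9.0, now=0))        # approve  (9% total)
print(tracker.tier_for("search", 9.0, now=1200))     # escalate (18% total)
print(tracker.tier_for("search", 9.0, now=2400))     # escalate (27% total)
print(tracker.tier_for("search", 9.0, now=90_000))   # approve  (earlier shifts aged out)
```

Each 9% step passes the single-action threshold, but the second and third cross the cumulative limit, which is what the per-action check alone misses.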

HITL controls to keep

Categorize every tool into Auto, Notify, Approve, or Escalate.
Use the Checkpoint/Resume pattern (LangGraph, Temporal) so approvals can be asynchronous and survive restarts.
Allow reviewers to edit proposals, then validate the edited action again before execution.
Escalate risk based on velocity, time of day, and dollar amounts.
Group repetitive actions to prevent alert fatigue.

The point of HITL isn't to keep humans clicking "Approve" forever. It's to put human judgment where failure is expensive or irreversible, while keeping policy-required gates in place even as lower-risk automation improves.

A safe HITL design has four moving pieces: risk classification, durable checkpoints, compare-and-swap resume semantics, and approval UIs that give reviewers real context. The common traps are in-memory blocking, stale approvals, alert fatigue, and prompt injection through the approval message itself.

The next step is applying those approval patterns to AI-assisted software work. Coding agents can create useful patches, but they need the same risk tiers, review checks, data-minimized audit evidence, and human ownership before their changes reach a repository or deployment pipeline.

Mastery check

Key concepts

HITL, HOTL, and HOOTL oversight modes
Risk tiers: Auto, Notify, Approve, and Escalate
Durable checkpoint and resume instead of in-memory waiting
Approval packets, compare-and-swap guards, and stale-click protection
Dynamic escalation, safe modification, and reviewer throughput metrics

Evaluation rubric

Picks the right oversight mode for each tool action instead of blocking everything
Explains why durable execution and guarded resume writes are required
Designs reviewer UX with context, diffs, expiry, and validation
Balances safety and throughput with escalation logic, batching, and metrics

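Follow-up questions

How does HITL handle stale approvals where the underlying state changes while waiting for a human?
Why is in-memory suspension insufficient for production HITL systems?
What is the difference between Human-in-the-Loop and Human-on-the-Loop?
How do you prevent alert fatigue at high volume?
What are the security implications of allowing humans to modify tool arguments?
Which metrics show whether HITL improves safety without becoming a bottleneck?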

Common pitfalls

Symptom: The review queue grows and reviewers start blindly approving requests.
Cause: Everything is routed through the same approval path, including safe or repetitive work.
Fix: Add risk tiers, bounded batches, and triage while keeping pre-execution review for effects that policy requires humans to authorize.

Symptom: Approved actions disappear after a deploy or restart.
Cause: Approval state was stored in memory or tied to a live request thread.
Fix: Persist checkpoints and approval rows durably, then resume from stored state instead of sleeping a worker.

Symptom: Two reviewers approve the same request and the agent runs the action twice.
Cause: Approval resolution did not use version checks or compare-and-swap semantics.
Fix: Guard writes on status and version, then reject stale clicks on resume.

Symptom: Reviewer edits create unsafe tool arguments even though the model's proposal looked safe.
Cause: Human modifications were trusted without schema, authorization, or business-rule validation.
Fix: Re-validate modified arguments exactly like model-generated arguments before execution.

Symptom: The approval UI becomes a prompt-injection surface.
Cause: Attacker-controlled text or raw model summaries are rendered like trusted system instructions.
Fix: Separate untrusted content visually, show structured diffs, and keep authorization logic downstream from the model.

Next Step

Continue to AI Coding Workflow with Agents

Human-in-the-loop gives you the risk classification, durable checkpoints, and approval gates. The next article shows how to apply those same patterns to AI-assisted software development: scoping coding tasks, running agents inside branches with tests, and requiring human review on risky changes before they reach a repository.


References

[1] OWASP Top 10 for Large Language Model Applications. OWASP Foundation, 2025.
[2] LangGraph Interrupts. LangChain, 2024.
[3] LangGraph Persistence. LangChain, 2026.
[4] Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting. Miles Turpin, Julian Michael, Ethan Perez, Samuel R. Bowman, 2023.
[5] Temporal Workflow Execution Overview. Temporal Technologies, 2026.
[6] Temporal Python SDK: Workflow message passing. Temporal Technologies, 2024.
[7] Artificial Intelligence Risk Management Framework (AI RMF 1.0). National Institute of Standards and Technology, 2023.
[8] EU AI Act: Regulation laying down harmonised rules on artificial intelligence. European Parliament and Council of the European Union, 2024.

