Scope coding-agent tasks, isolate execution, keep patches on branches, verify behavior, and preserve human merge ownership.
Human-in-the-loop systems pause agents before high-risk actions. A coding agent creates two kinds of risk: it can run commands during its work, and it can propose source changes that become consequential once reviewed, merged, or deployed. A small patch can alter auth, expose a secret, break billing, or change deployment behavior.
AI coding agents can write a pull request while you work on something else.
That sounds useful until the diff touches the wrong files, skips the failing test, updates a dependency for no reason, and hides a security problem behind a confident summary.
Adding an agent means adding an automated code-generation worker to your engineering workflow. If you tell it to "fix the billing webhook" and leave the task unbounded, it may return code that looks plausible but changes the wrong behavior. The engineering skill isn't "ask for code and trust the summary." It's designing a workflow where the agent can help without bypassing execution, review, or deployment controls.
Two earlier topics, Code Generation & Sandboxing and Human-in-the-Loop Agent Architecture, covered the two controls this workflow needs: generated code runs in isolation, and risky agent actions require approval gates. Here, you'll apply those controls to a loop of isolated workspace, branch, tests, review, and deploy.
The workflow below is the operating model for the rest of the article: the agent works in a restricted execution environment and writes only to a review branch. Tests, redacted evidence, review, and merge ownership remain explicit.
First, separate two things people often blur together. A coding assistant suggests code as you type, inside your editor. A coding agent takes a scoped task, reads the repository, plans changes, edits files, runs commands, and produces a branch or pull request with little keystroke-by-keystroke input from you. The agent loop needs the strongest controls because it can change files and run commands.
A coding agent, then, is an AI system that can inspect a repository, plan changes, edit files, run commands, and produce a branch or patch.
Modern tools expose this across several surfaces, and their features change quickly. GitHub documents Copilot coding agent as working in an ephemeral GitHub Actions-powered environment where it can make changes, execute tests and linters, and open a pull request for review.[1] Anthropic's Claude Code docs describe an agentic tool that reads a codebase, edits files, and runs commands across terminal and other surfaces.[2] OpenAI's Codex app documentation describes local, worktree, and cloud task modes, including isolated worktrees for parallel tasks.[3] Product surfaces differ; the stable lesson is to control execution, bound diffs, collect evidence, and keep merge authority outside the agent.
A useful workflow is plan-then-edit: let the agent inspect the necessary code, ask it for a bounded plan, correct that plan, and only then authorize an edit pass. A plan isn't proof of correctness, but catching the wrong approach before a large diff is much cheaper than untangling unrelated changes afterward.
The core loop is familiar from agent design. In coding work, every transition creates a reviewable artifact.
An agent can move through many steps, but the human still owns the merge.
A coding agent doesn't write code in a single shot the way a human might type an email. It works in a loop: read the problem, plan a step, take an action, observe the result, and decide what to do next.
This loop is ReAct (Reason + Act).[4] The agent might read a test file, notice the failure is in a validation function, edit the function, run the tests again, see a new error, and fix that too. Each round gives the agent local feedback, which is why bounded tasks with fast tests work so well.
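The read-act-observe cycle can be shown as a minimal sketch under loud assumptions: `decide` is a hypothetical stand-in for the model's reasoning step, `run_tests` is a toy verifier, and the "repository" is a single validator string. A real agent would call a model and a sandboxed tool runner instead.

```python
# Toy ReAct-style loop. Everything here is hypothetical: `decide` stands in for
# the model, and `run_tests` stands in for a real test runner.


def run_tests(code: str) -> str:
    # Toy verifier: blank names must be rejected, so the check needs strip().
    return "pass" if "strip()" in code else "fail: blank name accepted"


def decide(observation: str) -> tuple[str, str, str]:
    # Stand-in for the model: map the latest observation to (thought, action, argument).
    if observation.startswith("task"):
        return ("get a baseline", "run_tests", "")
    if observation.startswith("fail"):
        return ("validator ignores whitespace", "edit", "return len(name.strip()) > 0")
    if observation == "edited validator":
        return ("check the edit", "run_tests", "")
    if observation == "pass":
        return ("verified", "finish", "")
    return ("unexpected observation", "finish", "")


def react_loop(code: str, max_steps: int = 6) -> tuple[list[str], str]:
    trace: list[str] = []
    observation = "task: reject blank names"
    for step in range(1, max_steps + 1):
        thought, action, argument = decide(observation)  # reason
        trace.append(f"{step}. {action}: {thought}")
        if action == "finish":
            break
        if action == "run_tests":  # act, then observe the result
            observation = run_tests(code)
        elif action == "edit":
            code = argument
            observation = "edited validator"
    return trace, code


trace, final_code = react_loop("return len(name) > 0")
for line in trace:
    print(line)
```

The test result, not the model's confidence, ends the loop; `max_steps` bounds it if the policy keeps cycling.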
The thing that closes this loop is a verifier: an automatic check the agent can run after each edit to decide whether it got closer. In coding work, tests, type checkers, and linters are the cheapest, most reliable verifiers you have. A failing test that the agent can rerun turns a vague "is this right?" into a concrete pass or fail signal. No verifier means the agent is guessing, and you're reviewing guesses.
A verifier loop also needs a stopping rule. This tiny simulation takes successive test outcomes as input and returns either the first verified patch or an escalation after its repair budget is exhausted.
```python
def run_repair_loop(test_results: list[str], max_attempts: int = 3) -> str:
    # Consume at most max_attempts verifier results; stop at the first pass.
    for attempt, result in enumerate(test_results[:max_attempts], start=1):
        if result == "pass":
            return f"verified on attempt {attempt}"
    # Budget exhausted: escalate instead of letting the agent keep guessing.
    return "stop: request human diagnosis"


print("recovering patch:", run_repair_loop(["fail", "pass"]))
print("stuck patch:", run_repair_loop(["fail", "fail", "fail", "pass"]))
```

```text
recovering patch: verified on attempt 2
stuck patch: stop: request human diagnosis
```

A second pattern, Reflection, lets an agent critique its own draft before showing it to you.[5] After producing a diff, the agent can run a second pass to check for security risks, missing tests, or style violations. That's the difference between a single oversized agent that tries to do everything at once and a workflow where separate passes handle exploration, implementation, review, and verification.
Modern agents pull repository context through tools like the Model Context Protocol (MCP), which lets them query files, documentation, and APIs without you copy-pasting text into a chat window.[6] The shift from writing better prompts to giving the agent better data is called context engineering. A well-scoped task with precise file ownership and a clean test suite gives the agent far more signal than a giant prompt full of instructions.
Bad task:
Improve the app and clean up anything you find.
That prompt invites scope drift.
Better task:
Fix the account settings save bug.
Scope:
- Own files under `web/src/app/settings/` and `web/src/lib/settings.ts`.
- Don't edit auth, billing, deployment, or unrelated styling.
- Work without production credentials or external side effects.
Acceptance:
- Reproduce failing save behavior with a test.
- Fix the bug.
- Run `pnpm test --run settings` and `pnpm lint`.
- Summarize files changed and remaining risk.
Execution:
- Use the restricted project runner.
- Don't install dependencies, run deploys, or call production systems.
That prompt gives the agent a useful box.
| Prompt field | Why it helps |
|---|---|
| Goal | Defines the behavior and the code area. |
| Scope | Limits file ownership and blast radius. |
| Non-goals | Prevents surprise cleanup. |
| Acceptance checks | Gives the agent a finish line. |
| Commands | Makes verification explicit. |
| Execution policy | Prevents the patch task from gaining unrelated capabilities. |
| Risk notes | Tells review where to look. |
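These prompt fields can also be checked mechanically before an agent starts. A minimal sketch, assuming a hypothetical `TaskContract` structure whose field names mirror the table; the field values are illustrative, loosely taken from the settings task.

```python
from dataclasses import dataclass, fields


@dataclass
class TaskContract:
    # Hypothetical structure; one field per row of the prompt-field table.
    goal: str = ""
    scope: tuple[str, ...] = ()
    non_goals: tuple[str, ...] = ()
    acceptance_checks: tuple[str, ...] = ()
    commands: tuple[str, ...] = ()
    execution_policy: tuple[str, ...] = ()
    risk_notes: tuple[str, ...] = ()


def missing_fields(contract: TaskContract) -> list[str]:
    # An empty string or empty tuple counts as a missing field.
    return [f.name for f in fields(contract) if not getattr(contract, f.name)]


vague = TaskContract(goal="Improve the app and clean up anything you find.")
bounded = TaskContract(
    goal="Fix the account settings save bug.",
    scope=("web/src/app/settings/", "web/src/lib/settings.ts"),
    non_goals=("auth", "billing", "deployment", "unrelated styling"),
    acceptance_checks=("repro test fails before fix", "repro test passes after fix"),
    commands=("pnpm test --run settings", "pnpm lint"),
    execution_policy=("restricted runner", "no installs", "no deploys"),
    risk_notes=("settings save path",),  # illustrative value
)
print("vague missing:", missing_fields(vague))
print("bounded missing:", missing_fields(bounded))
```

A contract with empty fields is a signal to rescope the task before any agent runs.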
Bounded tasks reduce merge conflicts and make review possible.
Don't use one vague agent pass for everything.
You define the database schema and API contracts first, then assign a bounded worker to fill in the implementation while you review every change at each step. If you hand an agent a vague task like "Build me an access-control service" and disappear, it's likely to fail.
Use roles with clear ownership boundaries:
| Role | Good use | Bad use |
|---|---|---|
| Explorer | Read code and answer one question. | Making broad edits without ownership. |
| Worker | Implement a bounded change. | Changing architecture without review. |
| Reviewer | Find bugs, risks, and missing tests. | Rubber-stamping its own diff. |
| Verifier | Run checks and inspect output. | Treating command success as product success. |
This separation matters because agents are good at producing plausible work. A reviewer agent shouldn't be asked to defend the same implementation it just wrote.
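One way to enforce that separation is to record which agent instance wrote a diff and refuse to assign it as reviewer. A minimal sketch; the permission names and agent IDs are hypothetical, derived from the role table above.

```python
from typing import Optional

# Hypothetical permission map derived from the role table.
ROLE_PERMISSIONS = {
    "explorer": {"read"},
    "worker": {"read", "write_owned"},
    "reviewer": {"read", "comment"},
    "verifier": {"read", "run_checks"},
}


def can_perform(role: str, action: str) -> bool:
    # Unknown roles get no permissions.
    return action in ROLE_PERMISSIONS.get(role, set())


def assign_reviewer(author: str, candidates: list[str]) -> Optional[str]:
    # A reviewer must be a different agent instance from the one that wrote the diff.
    for agent in candidates:
        if agent != author:
            return agent
    return None  # no independent reviewer available: escalate to a human


print("explorer may write:", can_perform("explorer", "write_owned"))
print("worker may write:", can_perform("worker", "write_owned"))
print("reviewer:", assign_reviewer("worker-1", ["worker-1", "reviewer-2"]))
print("no independent reviewer:", assign_reviewer("worker-1", ["worker-1"]))
```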
Even when one tool provides one interface, you can still run the workflow in separate passes for exploration, implementation, review, and verification.
A controlled workflow separates two boundaries: a restricted environment for commands executed while producing a patch, and version control for reviewing the proposed source changes.
The branch is an integration boundary. It makes proposed changes inspectable before they enter a protected branch.
The review wrapper provides:
It doesn't stop an agent command from reading local credentials, making network calls, installing a dependency, or altering external infrastructure. That requires a restricted runner: an ephemeral workspace, least-privilege credentials, constrained network and command policy, and explicit review before privileged operations. If an agent edits directly on main, you also lose much of the integration boundary.
The task contract should state what the agent may execute as well as what it may edit. For a billing webhook test fix, an agent may need to run unit tests and read local source files; it doesn't need production secrets, package publishing, infrastructure changes, or outbound calls to production systems.
| Surface | Default rule for a bounded patch |
|---|---|
| Filesystem | Write only to working tree or worktree for assigned branch. |
| Credentials | Provide no production tokens; use fixtures or scoped test credentials only when needed. |
| Network | Deny by default or allow only approved package/test endpoints. |
| Commands | Allow repository test, lint, and format commands; review installs, migrations, and generated scripts. |
| External effects | Block deploys, releases, emails, payment writes, and infrastructure mutations. |
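The credentials row can be partly enforced when the runner starts a command: build the child environment from an allowlist so production tokens are never inherited. A minimal sketch using Python's standard `subprocess` module; `ALLOWED_ENV_KEYS`, `runner_env`, and `run_in_runner` are hypothetical names, and the example assumes a POSIX host (on Windows a child process usually also needs variables such as `SYSTEMROOT`).

```python
import os
import subprocess
import sys

# Hypothetical allowlist: the only parent variables a test run is assumed to need.
ALLOWED_ENV_KEYS = ("PATH", "HOME", "LANG")


def runner_env(base_env: dict[str, str]) -> dict[str, str]:
    # Allowlist rather than denylist: unknown credentials are dropped by default.
    return {key: base_env[key] for key in ALLOWED_ENV_KEYS if key in base_env}


def run_in_runner(argv: list[str], base_env: dict[str, str],
                  timeout_s: float = 60.0) -> subprocess.CompletedProcess:
    # argv is a list and shell=False (the default), so no shell parses the command.
    return subprocess.run(
        argv,
        env=runner_env(base_env),
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=False,
    )


# Simulate a parent process that holds a production credential.
parent_env = dict(os.environ, PROD_API_KEY="secret-abc123")
probe = [sys.executable, "-c", "import os; print('PROD_API_KEY' in os.environ)"]
result = run_in_runner(probe, parent_env)
print("child sees credential:", result.stdout.strip())
```

Scrubbing the environment is one layer only. The child can still read files and open sockets; filesystem, network, and process isolation have to come from the operating-system sandbox (a container, VM, or similar), which this sketch does not provide.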
This admission check is intentionally small. It takes an argument vector rather than an interpolated shell string, then admits only commands named in the task contract. In a real runner, execute the approved argument vector without shell concatenation and keep the operating-system sandbox in place.
```python
# Commands the task contract names explicitly, stored as argument vectors.
CONTRACT_COMMANDS = {
    ("pnpm", "test", "--run", "billing"),
    ("pnpm", "lint"),
    ("pnpm", "exec", "tsc", "--noEmit"),
    ("git", "diff"),
}
# Programs that install, publish, reach the network, or mutate infrastructure.
BLOCKED_PROGRAMS = {
    "curl", "npm", "pip", "publish", "terraform", "uv",
}


def admit_command(argv: tuple[str, ...], available_secrets: tuple[str, ...]) -> str:
    # A runner with mounted credentials is already outside the bounded-patch policy.
    if available_secrets:
        return "blocked: credentials mounted"
    if not argv:
        return "blocked: empty command"
    program = argv[0].removeprefix("./")
    if program in BLOCKED_PROGRAMS:
        return "blocked: privileged or expanding command"
    # Exact match only: extra flags or arguments fall through to review.
    if argv in CONTRACT_COMMANDS:
        return "allowed in restricted runner"
    return "needs explicit review"


print("test:", admit_command(("pnpm", "test", "--run", "billing"), ()))
print("infra:", admit_command(("terraform", "apply"), ()))
print("secreted test:", admit_command(("pnpm", "test", "--run", "billing"), ("PROD_API_KEY",)))
```

```text
test: allowed in restricted runner
infra: blocked: privileged or expanding command
secreted test: blocked: credentials mounted
```

For parallel agent work, write ownership down.
```text
Worker A owns:
- `web/src/app/settings/page.tsx`
- `web/src/lib/settings.ts`
- `web/src/lib/settings.test.ts`

Worker B owns:
- `web/src/app/billing/page.tsx`
- `web/src/lib/billing.ts`
- `web/src/lib/billing.test.ts`

Both workers:
- Must not edit shared auth middleware.
- Must not change package versions.
- Must run targeted tests before reporting done.
```

This style prevents agents from overwriting each other and makes it obvious when a diff goes outside scope.
Ownership can be checked before two patches are combined. This example takes proposed changed files from two workers and detects the shared file that neither handoff resolved.
```python
# Changed files proposed by each worker's patch.
worker_a = {"web/src/lib/settings.ts", "web/src/lib/settings.test.ts"}
worker_b = {"web/src/lib/billing.ts", "web/src/lib/settings.ts"}

# Any file both patches touch needs an explicit handoff before combining.
overlap = sorted(worker_a & worker_b)
print("merge status:", "blocked" if overlap else "ready")
print("overlap:", ", ".join(overlap) if overlap else "none")
```

```text
merge status: blocked
overlap: web/src/lib/settings.ts
```

Agent-generated code often looks correct.
That doesn't mean behavior is correct.
Benchmarks such as SWE-bench evaluate agents on real GitHub issues because repository work is full of hidden context, failing tests, and edge cases that don't appear in isolated coding tasks.[7] SWE-agent studies how the design of the agent-computer interface affects performance on those tasks,[8] and Agentless shows that a simpler pipeline of fault localization, repair, and patch validation can be competitive with more elaborate agents.[9]
Read SWE-bench Verified numbers carefully. Verified tasks come from public GitHub history, which creates contamination risk as code enters training sets. SWE-bench Pro includes substantially different, longer-horizon tasks and a private-repository subset intended to reduce that risk.[10] OpenAI reported in February 2026 that Verified was no longer a sound frontier evaluation for its models and recommended SWE-bench Pro for more informative measurement.[11] The practical reading is stable: a benchmark score isn't evidence that an agent fixed your bug. Your own failing test is.
For day-to-day engineering, the lesson is simple: require evidence.
| Change type | Minimum evidence |
|---|---|
| Bug fix | Repro test fails before and passes after. |
| Feature | Tests cover normal path and error path. |
| Refactor | Existing tests pass and behavior diff is explained. |
| UI change | Screenshot or Playwright smoke when practical. |
| API change | Contract test or curl example. |
| Security change | Threat-specific test or manual review note. |
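The minimum-evidence table translates directly into a lookup that a PR bot could run. A minimal sketch; the change-type keys and artifact names are hypothetical labels for the table's rows, and the agent's own summary deliberately does not count as an artifact.

```python
# Hypothetical encoding of the minimum-evidence table; artifact names are illustrative.
REQUIRED_EVIDENCE = {
    "bug_fix": {"repro_fails_before", "repro_passes_after"},
    "feature": {"normal_path_test", "error_path_test"},
    "refactor": {"existing_tests_pass", "behavior_diff_explained"},
    "ui_change": {"screenshot_or_smoke"},
    "api_change": {"contract_test_or_curl_example"},
    "security_change": {"threat_test_or_manual_review_note"},
}


def missing_evidence(change_type: str, attached: set[str]) -> list[str]:
    required = REQUIRED_EVIDENCE.get(change_type)
    if required is None:
        return [f"unknown change type: {change_type}"]
    # Anything attached beyond the requirement (such as a summary) is ignored.
    return sorted(required - attached)


print(missing_evidence("bug_fix", {"repro_passes_after"}))
print(missing_evidence("bug_fix", {"agent_summary"}))
print(missing_evidence("feature", {"normal_path_test", "error_path_test"}))
```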
The agent's summary isn't evidence.
The diff, tests, redacted logs, screenshots, and reproduced behavior are evidence.
A reproduction should demonstrate the reported defect before the patch and the intended invariant after it. Here the same billing webhook retry creates two invoices before idempotency handling and one after it.
```python
# The same webhook event delivered twice (a retry).
events = ["billing_event_78291", "billing_event_78291"]


def create_without_idempotency(retries: list[str]) -> list[str]:
    # Before the patch: every delivery creates an invoice.
    return [f"invoice_for_{retry}" for retry in retries]


def create_with_idempotency(retries: list[str]) -> list[str]:
    # After the patch: dict.fromkeys deduplicates event IDs while keeping order.
    return [f"invoice_for_{retry}" for retry in dict.fromkeys(retries)]


print("before patch invoices:", len(create_without_idempotency(events)))
print("after patch invoices:", len(create_with_idempotency(events)))
```

```text
before patch invoices: 2
after patch invoices: 1
```

Command output is useful only after it's safe to attach to a pull request. This redactor takes captured test output and removes a credential value before the evidence packet is shown to reviewers.
```python
import re


def redact_output(output: str) -> str:
    # Replace the value after one known variable name; other shapes pass through.
    return re.sub(r"PROD_API_KEY=\S+", "PROD_API_KEY=[redacted]", output)


captured = "billing webhook retry failed PROD_API_KEY=secret-abc123 webhook_id=78291"
review_output = redact_output(captured)
print(review_output)
print("secret visible:", "secret-abc123" in review_output)
```

```text
billing webhook retry failed PROD_API_KEY=[redacted] webhook_id=78291
secret visible: False
```

This example catches one known credential shape for teaching purposes. Production evidence pipelines should redact against the secrets mounted in the runner, apply a secret scanner, and cap attached output before it reaches a pull request.
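Redacting against mounted secret values, rather than one variable name, catches a credential wherever it appears in the log. A minimal sketch; `redact_mounted` and `MAX_ATTACHED_CHARS` are hypothetical, and value matching still misses transformed copies (for example base64- or URL-encoded secrets), which is why a secret scanner remains a separate step.

```python
MAX_ATTACHED_CHARS = 2_000  # hypothetical cap on evidence attached to a PR


def redact_mounted(output: str, mounted: dict[str, str],
                   limit: int = MAX_ATTACHED_CHARS) -> str:
    # Replace every mounted secret value, longest first, so a secret that
    # contains another secret is fully covered.
    for name, value in sorted(mounted.items(), key=lambda item: len(item[1]), reverse=True):
        if value:
            output = output.replace(value, f"[redacted:{name}]")
    # Cap the attachment after redaction so no secret survives in a truncated tail.
    if len(output) > limit:
        dropped = len(output) - limit
        output = output[:limit] + f"\n[truncated {dropped} chars]"
    return output


mounted = {"PROD_API_KEY": "secret-abc123", "DB_PASSWORD": "hunter2"}
log = "retry failed key=secret-abc123 db=hunter2 webhook_id=78291"
print(redact_mounted(log, mounted))
```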
Start review with these questions:
Then inspect dangerous patterns.
| Pattern | Risk |
|---|---|
| `eval`, shell commands, dynamic imports | Code execution risk. |
| New dependency | Supply chain and bundle risk. |
| Broad catch block | Hidden failures. |
| Silent fallback | Product behavior drifts without alerting. |
| Secret in code or captured output | Data exposure. |
| Tests that only assert rendering | Behavior may still be broken. |
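Part of this table can be surfaced by a first pass over the added lines of a unified diff. A minimal sketch; the regular expressions are illustrative heuristics, not a standard rule set, and they will both miss real problems and flag harmless lines. Silent fallbacks and rendering-only tests still need a human reader.

```python
import re

# Illustrative heuristics only; each hit is a prompt for a human to read the line.
RISK_PATTERNS = {
    "code execution": re.compile(r"\beval\(|\bexec\(|shell=True|child_process|\bimport\("),
    "broad catch": re.compile(r"except\s*(Exception)?\s*:|catch\s*\(\s*\w*\s*\)\s*\{\s*\}"),
    "possible secret": re.compile(r"(?i)(api_?key|secret|password)\s*[:=]\s*['\"][^'\"]+['\"]"),
}
DEPENDENCY_FILES = ("package.json", "pnpm-lock.yaml", "uv.lock")


def scan_diff(diff: str) -> list[str]:
    findings: list[str] = []
    current_file = ""
    for line in diff.splitlines():
        if line.startswith("+++ "):
            # New-file header of a unified diff, e.g. "+++ b/package.json".
            current_file = line[4:].removeprefix("b/")
            if current_file.rsplit("/", 1)[-1] in DEPENDENCY_FILES:
                findings.append(f"{current_file}: dependency change")
            continue
        if line.startswith("+"):
            # Only added lines are scanned; context and removed lines are ignored.
            added = line[1:]
            for label, pattern in RISK_PATTERNS.items():
                if pattern.search(added):
                    findings.append(f"{current_file}: {label}: {added.strip()}")
    return findings


sample_diff = """\
--- a/web/src/lib/settings.ts
+++ b/web/src/lib/settings.ts
@@ -1 +1,3 @@
+try { save(prefs) } catch (e) {}
+const apiKey = "sk-test-123"
 export function load() {}
--- a/package.json
+++ b/package.json
@@ -1,0 +2 @@
+  "left-pad": "^1.3.0",
"""

for finding in scan_diff(sample_diff):
    print("-", finding)
```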
OWASP's LLM security work identifies prompt injection and sensitive information disclosure as major categories for LLM applications.[12] Coding agents add another angle: untrusted issue text, docs, comments, and test data may influence code changes or attempt to make the agent execute a command. Treat repository text as evidence about the task, not authority to broaden permissions, expose secrets, or run privileged operations.
Instructions discovered inside a repository don't expand the task's authority. The authority check distinguishes a trusted task contract from a README instruction that attempts to authorize a privileged command.
```python
TASK_COMMANDS = {"pnpm test --run billing", "pnpm lint"}


def command_decision(command: str, source: str) -> str:
    # Only the task contract grants authority; repository text never does.
    if source != "task_contract":
        return "blocked: untrusted repository instruction"
    return "allowed" if command in TASK_COMMANDS else "blocked: outside contract"


print("task test:", command_decision("pnpm test --run billing", "task_contract"))
print("README infra:", command_decision("terraform apply", "README.md"))
```

```text
task test: allowed
README infra: blocked: untrusted repository instruction
```

Use this checklist on every agent-generated PR:
| Check | Pass condition |
|---|---|
| Scope | Changed files match task ownership. |
| Behavior | Diff directly maps to acceptance criteria. |
| Tests | At least one test would fail before fix. |
| Commands | Reported checks ran. |
| Security | No unsafe shell, file, dependency, or secret handling. |
| Observability | Logs and errors remain useful. |
| Simplicity | No broad rewrites unrelated to task. |
| Rollback | Reverting the PR returns to previous behavior. |
If a PR fails the scope check, stop early.
Don't spend an hour reviewing unrelated churn. Ask for a smaller diff.
Review intensity can be computed from proposed paths before a reviewer opens every line. This helper takes changed files as input and routes payment, auth, deployment, and workflow edits to focused review.
```python
# Path fragments that route a change to focused review. This is a coarse
# substring match: "auth" also matches paths such as "author.ts".
SENSITIVE = ("auth", "billing", "billing/webhook", ".github/workflows", "deploy")


def review_route(paths: tuple[str, ...]) -> str:
    touched = [path for path in paths if any(part in path for part in SENSITIVE)]
    return "focused security review" if touched else "standard review"


print("settings only:", review_route(("web/src/app/settings/page.tsx",)))
print("payment edit:", review_route(("web/src/billing/webhook/create.ts",)))
```

```text
settings only: standard review
payment edit: focused security review
```

You can encode part of this review check as a small policy rule before a human starts deep review. The check isn't a replacement for judgment; it catches obvious scope drift and weak evidence so reviewers don't waste time on unreviewable patches.
```python
from dataclasses import dataclass

SENSITIVE_MARKERS = ("/auth/", "/billing/", "/deploy/", ".github/workflows/")
DEPENDENCY_FILES = ("package.json", "pnpm-lock.yaml", "uv.lock")


@dataclass
class AgentTask:
    owned_paths: tuple[str, ...]
    requires_repro_test: bool = True


@dataclass
class PullRequestEvidence:
    changed_files: tuple[str, ...]
    added_tests: tuple[str, ...]
    reported_commands: tuple[str, ...]


def review_gate(task: AgentTask, evidence: PullRequestEvidence) -> list[str]:
    findings: list[str] = []

    def is_owned(path: str) -> bool:
        # A trailing slash grants a directory; otherwise ownership is one exact file.
        return any(
            path.startswith(owned) if owned.endswith("/") else path == owned
            for owned in task.owned_paths
        )

    out_of_scope = [
        path for path in evidence.changed_files
        if not is_owned(path)
    ]
    if out_of_scope:
        findings.append(f"scope_drift: {', '.join(out_of_scope)}")

    # Prefix "/" so top-level directories such as "auth/" still match "/auth/".
    sensitive = [
        path for path in evidence.changed_files
        if any(marker in f"/{path}" for marker in SENSITIVE_MARKERS)
    ]
    if sensitive:
        findings.append(f"sensitive_area: {', '.join(sensitive)}")

    dependencies = [
        path for path in evidence.changed_files
        if path.rsplit("/", 1)[-1] in DEPENDENCY_FILES
    ]
    if dependencies:
        findings.append(f"dependency_change: {', '.join(dependencies)}")

    if task.requires_repro_test and not evidence.added_tests:
        findings.append("missing_repro_test")

    if not evidence.reported_commands:
        findings.append("missing_verification_commands")

    return findings


task = AgentTask(owned_paths=(
    "web/src/app/settings/",
    "web/src/lib/settings.ts",
    "web/src/lib/settings.test.ts",
))
pr = PullRequestEvidence(
    changed_files=(
        "web/src/app/settings/page.tsx",
        "web/src/lib/settings.ts",
        "auth/middleware.ts",
        "pnpm-lock.yaml",
    ),
    added_tests=(),
    reported_commands=("pnpm test --run settings",),
)

findings = review_gate(task, pr)
print("BLOCK" if findings else "PASS")
for finding in findings:
    print("-", finding)
```

```text
BLOCK
- scope_drift: auth/middleware.ts, pnpm-lock.yaml
- sensitive_area: auth/middleware.ts
- dependency_change: pnpm-lock.yaml
- missing_repro_test
```

This tiny checker catches the same issue a senior reviewer would spot first: the settings fix may be real, but the PR also touched auth and the lockfile without ownership. A lockfile change isn't automatically unsafe, but it requires an intentional dependency review rather than hitching a ride in a settings fix. The next action is extraction or rescoping, not merge.
Agents are strongest when the task has clear boundaries and local feedback.
Good uses:
Risky uses:
Feedback changes the work.
If an agent can run a test and see whether it got closer, it has a useful loop. If the task needs product judgment, legal judgment, security judgment, or architecture taste, the agent can assist, but it shouldn't decide alone.
Consider a multi-agent refactor that starts with a messy but functional billing-webhook retry module and runs it through separate passes.
Step A - Exploration: Ask an explorer agent to read the module and list the highest-risk maintainability issues. It reports duplicated idempotency checks, one oversized handler, and missing error handling for payment-provider timeouts.
Step B - Architecture: Ask an architect agent to propose a modular structure. It suggests splitting the module into validateWebhook.ts, loadInvoiceState.ts, and handleProviderTimeout.ts, with clear interfaces between them.
Step C - Implementation: Ask a worker agent to execute the split, one file at a time, keeping existing tests green. It produces small commits with descriptive messages.
Step D - Verification: Ask a verifier agent to run the targeted tests, review the diff against the original behavior contract, and confirm the timeout failure path is now tested. It reports the exact commands and changed behavior instead of a confidence summary.
Each pass has a different role, a bounded scope, and a clear handoff. The human reviews the final diff before anything merges.
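The handoff discipline in these passes can be sketched as data: each pass names its role, whether it may edit files, and the artifact it must hand on. A minimal sketch with hypothetical artifact strings; it stops at the first pass that produced nothing, allows exactly one editing pass, and never merges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pass:
    role: str
    may_edit: bool
    produces: str  # artifact this pass must hand to the next one


# Hypothetical encoding of the four passes in the refactor example.
PIPELINE = (
    Pass("explorer", may_edit=False, produces="risk list"),
    Pass("architect", may_edit=False, produces="module plan"),
    Pass("worker", may_edit=True, produces="small commits"),
    Pass("verifier", may_edit=False, produces="commands and behavior report"),
)


def run_pipeline(pipeline: tuple[Pass, ...], outputs: dict[str, str]) -> str:
    # Exactly one pass may edit files; the others read, plan, or verify.
    editors = [step.role for step in pipeline if step.may_edit]
    if len(editors) != 1:
        return f"invalid pipeline: editors={editors}"
    handoff = ""
    for step in pipeline:
        artifact = outputs.get(step.role, "")
        if not artifact:
            return f"stopped at {step.role}: no {step.produces}"
        handoff = artifact
    # The pipeline never merges; it hands the final artifact to a human.
    return f"ready for human review: {handoff}"


outputs = {
    "explorer": "duplicated idempotency checks, oversized handler, no timeout handling",
    "architect": "validateWebhook.ts, loadInvoiceState.ts, handleProviderTimeout.ts",
    "worker": "small commits, existing tests green",
    "verifier": "targeted tests run, timeout path covered",
}
print(run_pipeline(PIPELINE, outputs))
print(run_pipeline(PIPELINE, dict(outputs, verifier="")))
```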
Start with an issue:
Users sometimes see duplicate invoices after replaying a billing webhook.
A weak agent task says:
Fix duplicate invoices.
The agent might add a frontend debounce. That may reduce clicks but won't fix duplicate server-side side effects.
A strong task says:
Investigate duplicate invoice creation in billing webhook retry flow.
Scope:
- Read billing webhook handler, invoice creation, and retry tests.
- Don't edit payment provider config.
- First report likely cause and proposed fix.
- Then add a failing test for duplicate retry.
- Fix with idempotency key on server-side invoice creation.
- Run billing webhook tests.
This task guides the agent toward the real class of bug: idempotency.
It also splits investigation from implementation. The human can stop a wrong plan before code changes land.
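Where the idempotency key is enforced matters. A check-then-insert in application code can still let two concurrent retries both pass the check; a uniqueness constraint in storage makes the database reject the second row. A minimal sketch using Python's built-in `sqlite3` module with a hypothetical in-memory `invoices` table:

```python
import sqlite3

# Hypothetical schema: one invoice per webhook idempotency key.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices ("
    " id INTEGER PRIMARY KEY,"
    " webhook_key TEXT NOT NULL UNIQUE)"
)


def create_invoice_once(conn: sqlite3.Connection, webhook_key: str) -> int:
    # INSERT OR IGNORE relies on the UNIQUE constraint: a replayed webhook
    # leaves the existing row in place instead of adding a second one.
    conn.execute("INSERT OR IGNORE INTO invoices (webhook_key) VALUES (?)", (webhook_key,))
    row = conn.execute(
        "SELECT id FROM invoices WHERE webhook_key = ?", (webhook_key,)
    ).fetchone()
    return row[0]


first = create_invoice_once(conn, "webhook_78291")
retry = create_invoice_once(conn, "webhook_78291")
count = conn.execute("SELECT COUNT(*) FROM invoices").fetchone()[0]
print("same invoice on retry:", first == retry)
print("invoice rows:", count)
```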
The implementation invariant is small enough to execute locally. Invoice creation accepts a webhook idempotency key and returns the existing invoice for a retry instead of inserting a second side effect.
```python
class InvoiceStore:
    def __init__(self) -> None:
        # Maps each webhook idempotency key to the invoice it created.
        self.by_webhook_key: dict[str, str] = {}

    def create_once(self, webhook_key: str) -> str:
        # First delivery inserts; a retry with the same key returns the existing invoice.
        if webhook_key not in self.by_webhook_key:
            next_id = len(self.by_webhook_key) + 1
            self.by_webhook_key[webhook_key] = f"inv_{next_id}"
        return self.by_webhook_key[webhook_key]


store = InvoiceStore()
first = store.create_once("webhook_78291")
retry = store.create_once("webhook_78291")
print("first:", first)
print("retry returns same invoice:", retry == first)
print("invoice count:", len(store.by_webhook_key))
```

```text
first: inv_1
retry returns same invoice: True
invoice count: 1
```

Teams should write down how AI coding agents are allowed to work.
Minimum policy:
This isn't anti-agent.
It's how agents become useful inside real engineering teams.
A mechanical gate can reject missing evidence, but it must not perform the merge. This example returns a candidate for human review only when branch, sandbox, tests, and sensitive-review requirements are satisfied.
```python
from dataclasses import dataclass


@dataclass
class Evidence:
    on_review_branch: bool
    restricted_runner: bool
    repro_test_passed: bool
    touches_sensitive_area: bool
    focused_review_complete: bool


def merge_candidate(evidence: Evidence) -> str:
    # Checks run from structural boundaries, to behavior, to sensitive-area review.
    if not evidence.on_review_branch or not evidence.restricted_runner:
        return "blocked: missing boundary"
    if not evidence.repro_test_passed:
        return "blocked: missing behavior evidence"
    if evidence.touches_sensitive_area and not evidence.focused_review_complete:
        return "blocked: focused review required"
    # Even a clean result is only a candidate: a human still decides to merge.
    return "candidate for human merge decision"


print("settings fix:", merge_candidate(Evidence(True, True, True, False, False)))
print("auth edit:", merge_candidate(Evidence(True, True, True, True, False)))
```

```text
settings fix: candidate for human merge decision
auth edit: blocked: focused review required
```

You're reviewing an agent PR. It claims:
Implemented requested settings fix. Tests pass.
You see:
What do you do?
Strong answer:
The settings fix may be real, but the PR isn't reviewable as-is.
[1] GitHub. "GitHub Copilot cloud agent." 2026.
[2] Anthropic. "Claude Code overview." 2026.
[3] OpenAI. "Features - Codex app." 2026.
[4] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., & Cao, Y. "ReAct: Synergizing Reasoning and Acting in Language Models." ICLR 2023.
[5] Shinn, N., et al. "Reflexion: Language Agents with Verbal Reinforcement Learning." 2023.
[6] Anthropic. "Introducing the Model Context Protocol." 2024.
[7] Jimenez et al. "SWE-bench: Can Language Models Resolve Real-World GitHub Issues?" ICLR 2024.
[8] Yang et al. "SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering." 2024.
[9] Xia et al. "Agentless: Demystifying LLM-based Software Engineering Agents." 2024.
[10] Scale AI. "SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?" 2025.
[11] OpenAI. "Why SWE-bench Verified no longer measures frontier coding capabilities." 2026.
[12] OWASP Foundation. "OWASP Top 10 for Large Language Model Applications." 2025.