The Two-Pass Review Workflow
Last updated: 2026-07-02 · 4 min read
The two-pass review workflow splits review into a machine pass and a human pass: automated verification per change first – types, tests, boundaries, spec comparison, enforced in CI – then human review on the pre-verified diff, focused on architecture and business logic. Machines take the decidable questions; humans keep the judgment and the merge decision.
Why one pass stopped working
Review used to be one activity because one person could carry both jobs: checking correctness and judging quality. AI volume split those jobs apart. Faros AI telemetry across 10,000+ developers shows high-AI teams merging ~98% more PRs with review time up 91%, and in Sonar’s State of Code developer survey 38% of developers find AI code harder to review than human-written code. When the same scarce minutes must carry style nits, type errors, scope checks, and architecture judgment, the mechanical work crowds out the judgment – which is the part humans were actually needed for.
JetBrains’ engineering blog states the fix bluntly: stop sending machine-catchable errors to human review at all. The two-pass workflow is that principle, systematized.
Pass one: the machine pre-check
Pass one runs on every push as a required status check, so nothing that fails it reaches a human:
- Deterministic gates. Formatting, lint, types, build, and the test suite – including tests written before the run, so the generating model did not author its own judge.
- Boundary check. The diff compared against the declared task boundaries – files and behavior that must not change. Off-scope edits fail the pass.
- Spec comparison. The spec-vs-implementation check against the task’s acceptance criteria, with the outcome recorded.
- Optional: AI pre-review as hypothesis filter. An AI reviewer flags likely issues across the diff. Cloudflare’s orchestration shows this works at scale – as a screening layer whose output humans act on, never as the verdict.
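To illustrate the boundary check from the list above, here is a minimal Python sketch. It assumes the task declares its boundaries as glob patterns for paths that must not change, and that CI supplies the list of changed paths (for example, the output of `git diff --name-only`). The function names and the pattern format are hypothetical and not taken from any specific tool. The sketch covers file-level boundaries only, not behavioral ones.

```python
from fnmatch import fnmatchcase


def off_scope_files(changed_files, protected_patterns):
    """Return the changed paths that match any protected glob pattern."""
    return [
        path
        for path in changed_files
        if any(fnmatchcase(path, pattern) for pattern in protected_patterns)
    ]


def boundary_check(changed_files, protected_patterns):
    """Fail the check as soon as one changed path falls inside a protected area."""
    violations = off_scope_files(changed_files, protected_patterns)
    return {"passed": not violations, "violations": violations}


if __name__ == "__main__":
    # Hypothetical task: billing may change, auth and migrations must not.
    result = boundary_check(
        changed_files=["src/billing/invoice.py", "src/auth/session.py"],
        protected_patterns=["src/auth/*", "migrations/*"],
    )
    print(result)
```

In a CI job, a failing result would translate into a non-zero exit code so that the required status check blocks the merge.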
Pass two: the human review, upgraded
The reviewer receives the diff plus the pass-one results: what was checked, what passed, what was skipped. The job changes from reconstructing intent to exercising judgment – is this the right approach, does it fit the architecture, should this exist at all. The clean division:
| Concern | Pass 1 – machine | Pass 2 – human |
|---|---|---|
| Formatting, lint, types, build | Enforced gate | Never sees it |
| Test results (incl. pre-written tests) | Enforced gate | Reads the summary |
| Scope vs. declared boundaries | Enforced gate | Judges intent of allowed changes |
| Acceptance criteria (spec comparison) | Checked + recorded | Spot-checks the record |
| Likely-bug hypotheses (AI review) | Screens + filters | Decides on flagged items |
| Architecture & system fit | — | Core job |
| Business logic sanity | — | Core job |
| Merge decision | Never | Always – with evidence in view |
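The pass-one results that travel with the diff (what was checked, what passed, what was skipped) could be carried by a small structured record. The following Python sketch is one possible shape. `CheckResult`, `Status`, and `build_handoff` are hypothetical names, and treating an empty result list as "not ready" is an assumption of this sketch.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PASSED = "passed"
    FAILED = "failed"
    SKIPPED = "skipped"


@dataclass(frozen=True)
class CheckResult:
    name: str
    status: Status
    note: str = ""


def build_handoff(results):
    """Group check names by status so the reviewer sees skipped checks explicitly."""
    results = list(results)
    names = {status: [] for status in Status}
    for result in results:
        names[result.status].append(result.name)
    return {
        # Assumption: no results at all means nothing was verified, so not ready.
        "ready_for_human_review": bool(results) and not names[Status.FAILED],
        "passed": names[Status.PASSED],
        "failed": names[Status.FAILED],
        "skipped": names[Status.SKIPPED],
    }
```

Listing skipped checks separately keeps a green summary from hiding the fact that, say, the spec comparison never ran because no task specification existed.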
Limits and typical mistakes
- Alibi automation. A green pass-one badge is an input to review, not a substitute for it. If approvals become reflexive, the workflow has failed quietly.
- Noisy pre-review. An AI reviewer that cries wolf trains the team to ignore pass one entirely. Tune for precision over recall; deterministic gates carry the authority.
- No written intent. Without a task specification, pass one shrinks to style and tests – real value, but the strongest checks (boundaries, criteria) need a written target state to compare against.
- Treating pass one as optional. A pre-check that can be skipped under deadline pressure will be – required status checks exist for a reason.
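The "precision over recall" advice for a noisy AI pre-review can be made concrete with a small calculation. The Python sketch below assumes the AI reviewer attaches a confidence score to each finding and that the team has a labeled history of past findings (confidence, and whether the finding turned out to be a real issue). Both are assumptions, as are the function names. It picks the lowest candidate threshold whose historical precision meets a team-set target.

```python
def precision_at(threshold, history):
    """Share of findings at or above the threshold that were real issues.

    history: list of (confidence, was_real_issue) pairs.
    Returns None when nothing would be flagged at this threshold.
    """
    flagged = [was_real for confidence, was_real in history if confidence >= threshold]
    if not flagged:
        return None
    return sum(flagged) / len(flagged)


def pick_threshold(history, target_precision, candidates):
    """Return the lowest candidate threshold that reaches the target precision."""
    for threshold in sorted(candidates):
        precision = precision_at(threshold, history)
        if precision is not None and precision >= target_precision:
            return threshold
    return None  # No candidate is precise enough; keep the AI reviewer advisory-only.


if __name__ == "__main__":
    # Hypothetical labeled history of past AI findings.
    history = [(0.9, True), (0.8, True), (0.6, False), (0.5, True), (0.3, False)]
    print(pick_threshold(history, target_precision=0.9, candidates=[0.3, 0.5, 0.8]))
```

Choosing the lowest threshold that still meets the target keeps as many true findings as possible without giving up the precision floor. The deterministic gates remain the authoritative part of pass one either way.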
Where Reality Graph fits
Reality Graph is a structured pass one with intent: task boundaries and acceptance criteria defined before the run, the diff checked against them after it, validation the model did not author, and an evidence report as the hand-off artifact into pass two – the missing piece between generic CI checks and the human reviewer, inside the broader verification loop.
The two-pass workflow gives you
- Human review minutes spent on judgment, not mechanics
- A consistent machine gate on every single change
- Reviewers who read pre-verified diffs with evidence attached
- Scale: pass one grows with CI, not with headcount
It does not
- Replace the human merge decision – ever
- Work as an alibi – green badges are inputs, not verdicts
- Reach full strength without written task intent
- Require any specific AI reviewer or vendor
FAQ
- What does an efficient review workflow for AI code look like?
  Two passes with different jobs: pass one is machine verification per change (formatting, types, tests, boundary and spec checks, optionally an AI pre-review as a hypothesis filter), enforced as a required status check. Pass two is human review on the pre-verified diff, focused on architecture, business logic, and the questions no checklist can ask. The human always makes the merge decision.
- What belongs in the machine pass and what stays human?
  Everything decidable belongs to the machine: style, types, test results, scope against declared boundaries, criteria from the task specification. Everything judgmental stays human: is this the right approach, does it fit the system, is the intent itself sensible. The dividing line is decidability, not difficulty.
- Is an AI reviewer the same as a machine pre-check?
  It is one ingredient, not the pass itself. An AI reviewer generates hypotheses about problems, which is useful for clearing mechanical noise, but it does not know what the change was supposed to do. The pre-check earns its authority from deterministic gates (tests, types, boundaries, spec comparison); the AI reviewer adds breadth on top, filtered by a human-set signal threshold.
- Doesn't a required pre-check just slow merging down?
  It moves waiting from humans to machines. The pre-check runs in CI minutes on every push; the human reviewer, the scarce resource, receives changes that already passed the mechanical layer. Cloudflare's engineering team describes exactly these economics at scale: orchestrated automated review so human attention lands where it changes the outcome.
- What is the most common way this workflow fails?
  Alibi automation: the pre-check exists, so humans stop reading. The mirror-image failure is a pre-check so noisy that everyone learns to ignore it. Both are calibration failures. The pass-one signal must stay high-precision (JetBrains' advice applies: don't send IDE-catchable errors to review at all), and pass two must stay a genuine review, not a rubber stamp on a green checkmark.
- Do we need a task specification for this to work?
  The workflow improves review without one, but its strongest check, comparing the change against declared intent and boundaries, needs a written task. Five minutes of specification per run unlocks the difference between 'the tests pass' and 'this is verifiably the change we asked for'.
Sources
- Cloudflare Engineering – Orchestrating AI code review at scale (2026)
- JetBrains – Stop sending IDE-catchable AI code errors to review (2026)
- Augment Code – Setting up AI code review in a CI/CD pipeline (2026)
- Faros AI telemetry (10,000+ developers): ~98% more merged PRs, review time +91% – summarized in 'The State of AI Code Review in 2026' (2026)
- Sonar – State of Code Developer Survey: 38% find reviewing AI code harder than human code (2026)