Your First Verification Report

Last updated: 2026-07-02 · 4 min read

A team lead can produce a first AI-code verification report in an afternoon, with no platform required: pick one workflow, write the task down, run your existing checks against it, and record the result in a file stored with the code. The value is the discipline, not the software – which is why the first report is a habit you can start today and grow into a team practice later.

Why start small, and why start now

The situation a team lead faces is measured: 96% of developers distrust AI code while only 48% consistently verify it – the intent is there, the practice is not. The trap is treating that gap as a tooling-procurement problem, which delays the fix by months. The opposite move works better: produce one real verification report by hand this week, prove the discipline pays, and let the tooling question answer itself once the manual version hits its limits. This page is the afternoon roadmap.

The afternoon roadmap

  1. Pick one workflow. One recurring AI-assisted change type – a typical feature PR, a migration, a bug fix. Not the whole codebase; one honest slice.
  2. Write the task. Goal in a sentence, the files the change may touch, two or three yes/no acceptance criteria. Three minutes, before the run.
  3. Run the checks you already have. Build, types, tests – the point is that they were not authored by the model that wrote the change (spec-vs-implementation).
  4. Record the result. A short file: task, scope, checks and outcomes, skips, approver. Stored with the code.
  5. Read it, then decide. The reviewer starts from the report, judges the trade-offs, and merges or not. That read is the report’s whole reason to exist.
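Step 3 can be sketched as a small script that runs checks you already have and records an outcome for each. This is a minimal sketch, not a prescribed tool: the `CheckResult` and `run_check` names are hypothetical, and the commands in `CHECKS` are placeholders that only print a line, so swap in your team's real build, type-check and test commands. A check whose tool is missing is recorded as skipped rather than failed, so it can be listed honestly in the report.

```python
from __future__ import annotations

import subprocess
import sys
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    status: str  # "pass", "fail", or "skipped"
    detail: str  # last line of the tool's output, or the reason for a skip


def run_check(name: str, command: list[str], timeout_s: int = 600) -> CheckResult:
    """Run one existing check and derive its status from the exit code."""
    try:
        proc = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_s
        )
    except FileNotFoundError:
        # The tool is not available here: record a skip, not a failure.
        return CheckResult(name, "skipped", f"{command[0]} not installed")
    except subprocess.TimeoutExpired:
        return CheckResult(name, "fail", f"timed out after {timeout_s}s")
    lines = (proc.stdout + proc.stderr).strip().splitlines()
    status = "pass" if proc.returncode == 0 else "fail"
    return CheckResult(name, status, lines[-1] if lines else "")


# Placeholder commands (assumptions): replace each with a check your team
# already runs and that the generating model did not author.
CHECKS = {
    "Build": [sys.executable, "-c", "print('build ok')"],
    "Types": [sys.executable, "-c", "print('types ok')"],
    "Tests": [sys.executable, "-c", "print('tests ok')"],
}

results = [run_check(name, command) for name, command in CHECKS.items()]
for result in results:
    print(f"{result.name}: {result.status} ({result.detail})")
```

The status comes from the exit code alone, which keeps the script independent of any particular build or test tool.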

What the first report looks like

verification-report.md

Example – adapt to your team
# Verification report · <change title> · <date>

## Task
Goal:       <one sentence>
Boundaries: touch only <files/dirs>
Criteria:   [ ] <yes/no criterion 1>
            [ ] <yes/no criterion 2>

## Change
Files:    <list> · +<n>/-<m> lines

## Validation (not authored by the generating model)
Build:    pass
Types:    pass
Tests:    <k> added/updated, all pass
Scope:    within boundaries? yes/no

## Skipped / uncertain
- <anything not checked, and why>

## Decision
Approved by <name> · <date>

That is a complete first report – shorter than most diffs, and already useful. It is the same structure the evidence reports article details, reduced to what one person can produce by hand on day one.
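If filling the template by hand gets tedious, the same shape can be rendered from a few fields. The sketch below is one possible way to do that, under stated assumptions: the `Report` class, `out_of_scope` and `render` are hypothetical names, boundaries are treated as glob patterns over forward-slash paths, and every value in the example is invented for illustration. The scope line is computed by comparing changed files against the written boundaries.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class Report:
    title: str
    day: str                    # ISO date, e.g. "2026-07-02"
    goal: str
    boundaries: list[str]       # glob patterns the change may touch
    criteria: dict[str, bool]   # yes/no criterion -> met?
    changed_files: list[str]
    lines_added: int
    lines_removed: int
    checks: dict[str, str]      # e.g. {"Build": "pass"}
    skipped: list[str] = field(default_factory=list)
    approver: str = ""


def out_of_scope(changed_files: list[str], boundaries: list[str]) -> list[str]:
    """Return the changed files that match none of the boundary patterns."""
    return [
        path for path in changed_files
        if not any(fnmatchcase(path, pattern) for pattern in boundaries)
    ]


def row(label: str, value: str) -> str:
    return (label + ":").ljust(12) + value


def render(r: Report) -> str:
    stray = out_of_scope(r.changed_files, r.boundaries)
    scope = "yes" if not stray else "no (" + ", ".join(stray) + ")"
    boxes = ["[" + ("x" if met else " ") + "] " + text
             for text, met in r.criteria.items()]
    files = ", ".join(r.changed_files)
    lines = [
        f"# Verification report · {r.title} · {r.day}", "",
        "## Task",
        row("Goal", r.goal),
        row("Boundaries", "touch only " + ", ".join(r.boundaries)),
    ]
    lines += [row("Criteria", box) if i == 0 else " " * 12 + box
              for i, box in enumerate(boxes)]
    lines += ["", "## Change",
              row("Files", f"{files} · +{r.lines_added}/-{r.lines_removed} lines"),
              "", "## Validation (not authored by the generating model)"]
    lines += [row(name, outcome) for name, outcome in r.checks.items()]
    lines += [row("Scope", "within boundaries? " + scope),
              "", "## Skipped / uncertain"]
    lines += ["- " + item for item in r.skipped] or ["- nothing skipped"]
    decision = f"Approved by {r.approver} · {r.day}" if r.approver else "Not yet approved"
    lines += ["", "## Decision", decision]
    return "\n".join(lines) + "\n"


# Invented example values, for illustration only.
report = Report(
    title="Invoice rounding", day="2026-07-02",
    goal="Round invoice totals to two decimals",
    boundaries=["src/billing/*", "tests/billing/*"],
    criteria={"Totals round half-up": True, "No public API change": True},
    changed_files=["src/billing/invoice.py", "tests/billing/test_invoice.py"],
    lines_added=42, lines_removed=7,
    checks={"Build": "pass", "Types": "pass", "Tests": "3 added/updated, all pass"},
    skipped=["Load test not run: no staging data"],
    approver="A. Reviewer",
)
print(render(report))
```

Writing the returned string to `verification-report.md` next to the change keeps the report stored with the code. Note that `fnmatchcase` lets `*` match across `/`, so `src/billing/*` also covers nested directories.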

Growing it into a practice – and the limits

From one report to a team habit is three unhurried steps: make it normal for the one workflow, review the trend in a weekly ten-minute look, and write the expectation into a one-page policy so it outlives the person who started it. What the first report does not do is prove ROI by itself – that needs the trend, via the four metrics – and it does not replace human judgment on architecture. It is a starting point that happens to be reachable today.
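The weekly look can stay at ten minutes if the two basic questions (are tasks getting written, are checks passing) are counted from the stored reports. A minimal sketch, assuming reports follow the template on this page and are saved under names matching `verification-report*.md`; the function names and the filename pattern are assumptions, not a fixed convention.

```python
from __future__ import annotations

import re
from pathlib import Path


def load_reports(root: Path) -> list[str]:
    """Read every stored report under root (filename pattern is an assumption)."""
    return [p.read_text(encoding="utf-8")
            for p in sorted(root.rglob("verification-report*.md"))]


def summarize(report_text: str) -> dict[str, bool]:
    """Answer the two weekly questions for a single report."""
    goal = re.search(r"^Goal:[ \t]*(.*)$", report_text, re.MULTILINE)
    goal_text = goal.group(1).strip() if goal else ""
    # A goal still holding the "<one sentence>" placeholder does not count.
    task_written = bool(goal_text) and not goal_text.startswith("<")

    outcomes = re.findall(
        r"^(?:Build|Types|Tests):[ \t]*(.*)$", report_text, re.MULTILINE
    )
    checks_pass = bool(outcomes) and all(
        "pass" in o.lower() and "fail" not in o.lower() for o in outcomes
    )
    return {"task_written": task_written, "checks_pass": checks_pass}


def weekly_trend(report_texts: list[str]) -> dict[str, int]:
    """Count how many reports have a written task and all checks passing."""
    rows = [summarize(text) for text in report_texts]
    return {
        "reports": len(rows),
        "task_written": sum(row["task_written"] for row in rows),
        "checks_pass": sum(row["checks_pass"] for row in rows),
    }


# Usage, assuming reports live somewhere under the current directory:
#   print(weekly_trend(load_reports(Path("."))))
```

These counts only show whether the habit is holding. They are not the ROI measurement, which still depends on the trend over time.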

Where Reality Graph fits

Everything above works with a Markdown file and your existing CI – deliberately, because the discipline is the point. Reality Graph is one way to stop producing that report by hand once the manual version becomes the bottleneck: it writes the task, runs the verification and generates the report as a byproduct of each run, local-first. It is in private beta; the roadmap here needs none of it to start.

This roadmap gives you

  • A first verification report reachable in an afternoon
  • A tool-agnostic path that starts with a Markdown file
  • A concrete report template to copy
  • A three-step route from one report to a team practice

It does not give you

  • A product tutorial or an onboarding-time promise
  • ROI proof from one report – the trend does that
  • A replacement for human review of architecture
  • A reason to wait for tooling before starting

FAQ

How quickly can a team produce its first verification report?
The first one is an afternoon's work, not a project – because the minimum viable version needs no special platform. Pick one recurring AI-assisted change, write its task down with boundaries and acceptance criteria, run your existing checks (build, types, tests) against it, and record what was intended, what changed, what passed and what was skipped in a plain file stored with the code. That file is a verification report. Everything after is refinement.

What is the smallest useful first step?
One written task on one real change. Before the next AI-assisted PR, spend three minutes writing the goal, the files the change may touch, and two or three yes/no acceptance criteria. Reviewing the result against that note – instead of against your memory of the prompt – is the whole method in miniature, and it works before any tooling exists.

Do we need to buy a tool to start?
No. The first report can be a Markdown file and your existing CI – the value is in the discipline (task before the run, checks the model did not author, a recorded result), not in software. Tooling earns its place later, when doing this by hand across many changes becomes the bottleneck; at that point a verification layer automates what you already proved worth doing.

Who on the team should own this?
The team lead sets the expectation and picks the first workflow; the developers running AI tools produce the reports as part of their normal change. It is deliberately not a separate role or a gate someone polices – it is a habit attached to the work, which is why it survives. The lead's job is to make the report a normal part of 'done', not to review every one personally.

What does a good first report actually contain?
Five things: the written task, the scope of the change (files touched), the validations run and their results, anything skipped or left uncertain, and who approved. That is enough to let a reviewer start from facts instead of reconstructing them, and enough to become an audit trail as the reports accumulate. Keep it short – a report longer than the diff defeats its purpose.

How do we grow from one report to a team practice?
Three steps over a few weeks: make the report normal for one workflow, review the trend in a short weekly look (are tasks getting written, are checks passing), then write the expectation into a one-page policy so it survives personnel changes. The measurement side – is this actually reducing rework – comes from the four verification-debt metrics, computable from git data you already have.
