
Compliance

Audit Trails for AI-Generated Code

Last updated: 2026-07-02 · 4 min read

An audit trail for AI-generated code records per change what no git log holds: which tool acted, on what written task, what changed, what was validated with which result, and who approved. The demand is assembling from several directions at once – NIS2 oversight duties, the new Product Liability Directive treating software as a product, ISO and sector audits – and it lands on the same artifact: structured evidence per run, stored with the code.


Why the question suddenly arrives in audits

Legal status: July 2, 2026. This article describes regulation for orientation – it is not legal advice; what your auditors accept is their call and your counsel’s.

No single law says “keep an AI code audit trail”. The pressure assembles: since December 2025, Germany’s NIS2 implementation puts development and supply-chain risk under documented management oversight for roughly 29,500 companies. From December 2026, the new Product Liability Directive treats software – standalone or as a service – as a product under strict liability, which makes “we can show how this was built and checked” a defense posture rather than a nicety. And certification audits have started asking the concrete question: what do your AI tools touch, and who checked it? With only 48% of developers consistently verifying AI code, the honest answer in most organizations today is: nobody can say.

What the trail must record - and where git ends

| Question | What to record | Does git hold it? |
| --- | --- | --- |
| Who/what acted? | Tool, model and version; triggering human | Partly - committer, not tool identity |
| On what mandate? | The written task: goal, boundaries, acceptance criteria | No |
| What changed? | Files, diff scope, migrations, dependencies added | Yes - git's home turf |
| What was checked? | Validations run, results, what was skipped and why | No |
| Who approved? | Reviewer/approver, decision, timestamp | Partly - PR approvals, if your platform keeps them |

The five questions an AI-code audit trail answers - git natively answers only one of them (status: July 2026).

The gap is structural, not a git flaw: version control records outcomes, not mandates and checks. Commit messages and PR threads hold fragments – unstructured, inconsistent, and scattered across platforms that may fall outside your retention policy. For an auditor, that is the difference between evidence and archaeology.

Building the trail without a bureaucracy project

  1. One written task per AI run. Goal, boundaries, acceptance criteria – in checkable form. This is the mandate the trail refers back to.
  2. One structured record per run. Tool and version, diff scope, validations with results, skips, uncertainties, approver – machine-readable, generated as a byproduct of the workflow, not typed up afterwards.
  3. Stored with the code. The repository is the one place that shares the code’s lifetime and retention – trails in chat threads and ticket comments do not survive tool migrations.
  4. Anchored in policy. A line in your AI coding policy that says agent changes carry evidence – so the trail is a rule, not a habit that erodes under deadline pressure.

The BSI/ANSSI recommendations point in the same direction: treat assistant output as unverified input, and make its checking visible.
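
Steps 2 and 3 above can be sketched in a few lines of Python. This is one possible shape for a per-run record, not a prescribed schema and not Reality Graph's report format: the field names, the `audit/runs/` directory, and the names `RunRecord`, `Validation` and `write_record` are all assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import List, Optional


@dataclass
class Validation:
    name: str        # e.g. "unit tests"
    result: str      # "passed", "failed" or "skipped"
    note: str = ""   # for skips: why the check was not run


@dataclass
class RunRecord:
    # All field names are illustrative assumptions, not a standard schema.
    run_id: str
    timestamp: str               # ISO 8601, UTC
    tool: str
    tool_version: str
    model: str
    triggered_by: str            # the human who started the run
    task_file: str               # path to the written task (the mandate)
    files_changed: List[str]
    validations: List[Validation]
    uncertainties: List[str] = field(default_factory=list)
    approved_by: Optional[str] = None   # filled in at review time
    approved_at: Optional[str] = None


def write_record(record: RunRecord, repo_root: Path) -> Path:
    """Serialize one run record as JSON under <repo_root>/audit/runs/."""
    out_dir = repo_root / "audit" / "runs"   # hypothetical location
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{record.run_id}.json"
    # sort_keys keeps the file layout stable, so diffs of records stay readable
    out_path.write_text(
        json.dumps(asdict(record), indent=2, sort_keys=True) + "\n",
        encoding="utf-8",
    )
    return out_path
```

Called at the end of an agent run, `write_record(record, Path("."))` leaves one JSON file per run in the working tree, to be committed alongside the change it describes. The approval fields start empty and are completed at review, which keeps "who approved" in the same record as "what was checked".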

Where Reality Graph fits

This artifact is what Reality Graph produces natively: each AI coding run is verified against its written task, and the outcome – changes, validations, results, skips, open points – lands in an evidence report stored with the code, local-first. Whether that record satisfies a specific auditor or regulation is their assessment and your counsel’s – Reality Graph supplies the documentation, not the verdict.

A per-run audit trail gives you

  • Answers to the five auditor questions, per change
  • Incident response that knows which tool touched what
  • A defense posture as software liability tightens
  • Evidence generated as a byproduct, not an afterthought

It does not give you

  • A legal duty checklist - the duties assemble case by case
  • A guarantee any auditor accepts the format - ask yours
  • Provenance for code written before the trail existed
  • A substitute for verification itself - it records checks, it is not one


FAQ

How do you prove to auditors what an AI tool did in your codebase?
With a per-change record that answers five questions: which tool and version acted, on what task with which boundaries, what changed, what was validated with which result, and who approved it. Git answers only the third question. The teams that answer all five keep a structured evidence artifact per AI run, stored with the code - the pattern auditors already know from build provenance and change management.
Which regulations actually demand an audit trail for AI code?
None names 'AI code audit trail' verbatim - the demand assembles from adjacent duties. NIS2 (in Germany via the implementation act in force since December 2025) requires managed, documented development and supply-chain risk; the new Product Liability Directive treats software as a product from December 2026, making 'we can show how this was built and checked' a defense posture; ISO 27001 and TISAX audits ask how tooling with repository access is controlled. Sector rules (DORA, IEC 62304, Automotive SPICE) add traceability explicitly.
Isn't git history enough?
Git records what changed and who committed - it does not record what the task was, what the tool was allowed to touch, what validation ran, what it found, or who reviewed the result. Commit messages and PR comments hold fragments of this, unstructured and inconsistent. For an auditor, that is the difference between evidence and archaeology.
What fields does a usable audit trail entry contain?
The working set: timestamp; tool and model version; the written task including boundaries; scope of the change (files, diff stats); validations run and their results, including what was skipped; open uncertainties; and the human who approved. Machine-readable, one record per run, stored with the repository. Teams that keep evidence reports per run get this as a byproduct instead of a project.
Do we have to disclose which code is AI-generated?
As of July 2026 there is no general legal duty to label AI-generated code in ordinary commercial software - transparency duties under the AI Act target AI systems interacting with people, not code provenance. Internal traceability is a different matter: knowing which changes came from which tool is what makes incident response, license review and audits workable. Whether specific contracts or sector rules require disclosure in your case is a counsel question.
How do we start without boiling the ocean?
Start where the risk is: agent runs that modify code. Require a written task per run, capture the diff scope and validation results in a structured record, and store it next to the code. That single habit converts your highest-volume, least-witnessed change source into your best-documented one - and it is the part auditors ask about first, because it is the part nobody can reconstruct later.
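
One way to keep that habit honest is a small completeness check that reports which of the five auditor questions a record leaves unanswered. The sketch below assumes records are plain dictionaries (for example, loaded from JSON); the field names and the mapping in `QUESTION_FIELDS` are hypothetical and would need to match whatever record format your team actually uses.

```python
from typing import Dict, List

# Hypothetical mapping from the five auditor questions to record fields.
QUESTION_FIELDS: Dict[str, List[str]] = {
    "who_acted": ["tool", "tool_version", "triggered_by"],
    "mandate": ["task_file"],
    "what_changed": ["files_changed"],
    "what_was_checked": ["validations"],
    "who_approved": ["approved_by", "approved_at"],
}


def is_empty(value) -> bool:
    """Treat missing values, empty strings and empty lists as unanswered."""
    return value is None or value == "" or value == []


def unanswered_questions(record: dict) -> Dict[str, List[str]]:
    """Return, per auditor question, the record fields that are still empty."""
    gaps: Dict[str, List[str]] = {}
    for question, fields in QUESTION_FIELDS.items():
        missing = [name for name in fields if is_empty(record.get(name))]
        if missing:
            gaps[question] = missing

    # A skipped validation only counts as recorded if it says why.
    unexplained = [
        v.get("name", "?")
        for v in record.get("validations") or []
        if v.get("result") == "skipped" and not v.get("note")
    ]
    if unexplained:
        gaps.setdefault("what_was_checked", []).append(
            "skip reason for: " + ", ".join(unexplained)
        )
    return gaps
```

An empty result means all five questions have an answer on file; it says nothing about whether the answers are good. Run in CI or a pre-merge hook, a non-empty result is a reason to stop and complete the record while the run is still fresh.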
