AI Code Churn
Last updated: 2026-07-02 · 4 min read
AI code churn – the share of merged code reworked within two weeks – drifted from a ~3.1% baseline toward 5.7% as AI assistance spread, per GitClear’s analysis of 211 million changed lines: roughly a doubling of the near-term rework share. Churn is the cheapest verification-debt signal you can compute, the lagging confirmation that code shipped before it was checked – and, priced honestly, the visible tip of a larger bill.
What the metric measures - and why two weeks
Churn counts merged lines that get modified or reverted within a window; the 14-day window is what makes it diagnostic. Rework that fast rarely means requirements changed – it means the change was wrong or incomplete when it merged and nobody caught it. That is why two-week churn functions as the lagging confirmation of verification debt: the leading indicators (unverified merges, falling review depth) predict it, and churn arrives two weeks later as the receipt.
What 211 million lines show
| Finding | Number | What it suggests |
|---|---|---|
| Two-week churn trend | ~3.1% (2020) toward 5.7% | Near-term rework share roughly doubling with AI volume |
| Duplicated code blocks | Rising sharply (~8x on copy-paste block measures) | Generation favors repeating code over reusing it |
| Moved/refactored code | Declining share | The refactor habit eroding as paste gets cheap |
| Context: PR volume | ~98% more merged PRs (Faros) | Churn percentage applies to a much larger base |
The last row matters for the money math: a roughly doubled churn rate applied to a roughly doubled change volume yields close to four times the churned lines, assuming merged-PR count is a fair proxy for changed lines. And the defect supply feeding it is measured elsewhere too – the ~45% of AI-generated samples that fail security tests (Veracode) is the same phenomenon viewed from the security angle.
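The rate-times-volume arithmetic can be sketched in a few lines. This is an illustrative calculation only: the function name is hypothetical, and it assumes the ~98% rise in merged PRs translates one-to-one into changed lines, which the sources do not establish.

```python
def churned_lines_multiplier(rate_before, rate_after, volume_growth):
    """Return how many times more churned lines result when the churn
    rate and the change volume both move.

    rate_before / rate_after: churn shares as fractions (0.031 = 3.1%).
    volume_growth: fractional growth in change volume (0.98 = 98% more).
    """
    if rate_before <= 0:
        raise ValueError("rate_before must be positive")
    rate_multiplier = rate_after / rate_before
    volume_multiplier = 1.0 + volume_growth
    return rate_multiplier * volume_multiplier


# Figures quoted in the table above; PR count stands in for line volume (assumption).
multiplier = churned_lines_multiplier(0.031, 0.057, 0.98)
print(f"{multiplier:.2f}x churned lines")  # prints "3.64x churned lines"
```

With the quoted figures the product lands somewhat under four, which is why "roughly four times" is a rounding of two approximate doublings rather than a measured result.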
The honest caveats
- Not all churn is waste. Fast iteration churns healthily; the debt signal is the delta above your own baseline and its trend, not the absolute number.
- Vendor-adjacent research. GitClear sells git analytics; the dataset is the largest public one and the method is documented, but the secondary-source caution applies - which is why the consistent direction across independent signals (review telemetry, security rates) carries the argument.
- Headlines inflate. Figures like “39% more churn” circulate without a traceable source; we use the numbers the dataset actually supports. Directional honesty beats dramatic precision.
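The first caveat, that the signal is the delta above your own baseline and its trend, can be made concrete. The following is a minimal sketch, assuming you already have one churn share per week; the function name, the four-week baseline, and the sample series are all hypothetical choices, not part of any cited method.

```python
from statistics import mean


def churn_signal(weekly_churn, baseline_weeks=4):
    """Return (delta, slope) for a series of weekly churn shares.

    delta: mean churn after the baseline period minus the baseline mean.
    slope: least-squares change in churn share per week over the whole series.
    """
    if baseline_weeks < 1 or len(weekly_churn) <= baseline_weeks:
        raise ValueError("need at least one week beyond the baseline period")

    baseline = mean(weekly_churn[:baseline_weeks])
    delta = mean(weekly_churn[baseline_weeks:]) - baseline

    # Ordinary least-squares slope with the week index as x.
    xs = range(len(weekly_churn))
    x_mean = mean(xs)
    y_mean = mean(weekly_churn)
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, weekly_churn))
    denominator = sum((x - x_mean) ** 2 for x in xs)
    return delta, numerator / denominator


# Hypothetical series: four baseline weeks, then four weeks after a tooling change.
sample = [0.030, 0.032, 0.031, 0.031, 0.040, 0.045, 0.050, 0.055]
delta, slope = churn_signal(sample)
print(f"delta above baseline: {delta:.4f}, weekly slope: {slope:.4f}")
```

A positive delta with a positive slope is the pattern worth investigating; a high but flat series may simply be how that team iterates.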
Measuring and reducing your own
Your churn is a git script away: for each merged change, count the lines modified again within 14 days. Compute it weekly and per repository, and read the trend rather than the absolute number. The full recipe sits in measuring verification debt, and the levers that move it are the verification loop itself: written tasks, checks before merge, validation the model did not author. Because the metric is cheap and weekly, it doubles as the before/after gauge for any pilot – and its euro translation lives in the cost calculation.
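As a rough illustration of the counting step, the sketch below computes two-week churn per repository and merge week from per-line records. It is not the recipe referenced above: the `MergedLine` record and `weekly_churn` function are hypothetical, and extracting such records from git history (for example by comparing blame data across commits) is deliberately left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class MergedLine:
    """One merged line (hypothetical record shape)."""
    repo: str
    merged_at: datetime
    reworked_at: Optional[datetime]  # first later modification or revert; None if untouched


def weekly_churn(lines, window_days=14):
    """Return {(repo, iso_year, iso_week): churn_share}, keyed by merge week."""
    window = timedelta(days=window_days)
    totals = {}
    churned = {}
    for line in lines:
        iso = line.merged_at.isocalendar()
        key = (line.repo, iso[0], iso[1])
        totals[key] = totals.get(key, 0) + 1
        if line.reworked_at is not None:
            age = line.reworked_at - line.merged_at
            # Only rework inside the window counts as churn.
            if timedelta(0) <= age <= window:
                churned[key] = churned.get(key, 0) + 1
    return {key: churned.get(key, 0) / total for key, total in totals.items()}


# Hypothetical data: four lines merged on the same day, one reworked within the window.
merged = datetime(2026, 1, 5)
records = [
    MergedLine("api", merged, merged + timedelta(days=5)),   # churned
    MergedLine("api", merged, merged + timedelta(days=36)),  # reworked too late to count
    MergedLine("api", merged, None),
    MergedLine("api", merged, None),
]
print(weekly_churn(records))
```

One design note: merge weeks younger than the window are undercounted, because their lines have not yet had 14 days to be reworked, so the most recent two weeks should be excluded or marked provisional when plotting the trend.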
Where Reality Graph fits
Reality Graph attacks churn at its dominant cause: changes merging unchecked against their task. Verification per run moves defects to the cheap side of the merge, and the evidence reports give churn investigations a starting point - which change, what was checked, what was skipped. We quote no reduction percentages; churn is cheap to measure, so measure it around your own pilot.
This analysis gives you
- The sourced churn numbers, with their caveats attached
- The rate-times-volume math most write-ups miss
- A weekly, script-cheap metric for your own codebase
- The bridge from churn to euros, via the cost model
It does not give you
- A universal churn multiplier - '39%'-style figures lack sources
- A verdict that all churn is waste - iteration churns too
- Certainty from one dataset - direction over gospel
- Reduction promises - measure your own before/after
FAQ
- How much does code churn rise with AI coding tools?
- The best public dataset is GitClear's analysis of 211 million changed lines: code reworked within two weeks of merge drifted from a ~3.1% baseline (2020) toward 5.7% as AI assistance spread - roughly a doubling of the near-term rework share. A precise universal multiplier does not exist and codebases differ; the direction and the magnitude class are what the data supports.
- What exactly does the two-week churn metric measure?
- The share of merged lines that are modified or reverted within 14 days. The window is the point: rework that fast usually means the change was wrong or incomplete when it merged - too soon for changed requirements to be the normal explanation. That makes two-week churn a proxy for 'shipped before it was verified', which is why it serves as the lagging confirmation of verification debt.
- Is all churn bad?
- No, and honest analysis prices only part of it as debt. Some churn is fast iteration working as intended - prototypes hardening, feedback landing. The signal is in the delta and the trend: a codebase whose churn doubles as AI volume grows is not iterating twice as healthily. GitClear's accompanying findings point the same direction - duplicated code rising sharply while moved (refactored) code declines, a copy-paste-over-refactor shift that is hard to read as health.
- How reliable is the GitClear research?
- It is the largest public dataset on the question and its method is documented - and it comes from a vendor of git analytics, which deserves the same secondary-source caution we apply everywhere. Its headline trend is consistent with independent signals (review-time telemetry, security-failure rates), which is why we treat it as directional evidence rather than gospel. The strongest move remains measuring your own churn, which takes a git script and an afternoon.
- What does churn cost in money?
- Churn is the bridge from quality talk to budget talk: each churned change costs the hours of its rework plus the review it consumed twice. In our worked example for a 12-person team, the churn delta above baseline prices out at roughly €1,000-1,500 per month - real, but notably smaller than the review-reconstruction line, which is why churn is the visible tip rather than the whole bill.
- How do we reduce AI code churn specifically?
- By attacking its dominant cause: changes merging before anyone checked them against what was actually asked. Written tasks with acceptance criteria, verification before merge, and validation the model did not author move defects from post-merge rework to pre-merge fixes. Teams see the effect in the metric itself - two-week churn is cheap to compute weekly, which makes it a fine before/after gauge for any verification pilot.
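The cost answer above (rework hours plus review consumed twice, priced only on the delta above baseline) can be written as a small formula. This is a sketch under stated assumptions, not the cost model's actual calculation: the function name is hypothetical, "review consumed twice" is read here as counting both review passes, and the example inputs are placeholders rather than the figures behind the 12-person worked example.

```python
def monthly_churn_cost(churned_changes, baseline_changes,
                       rework_hours, review_hours, hourly_rate_eur):
    """Price the churn delta above baseline for one month, in euros.

    churned_changes / baseline_changes: churned changes per month, now vs. baseline.
    rework_hours: average hours to rework one churned change.
    review_hours: average hours for one review pass (counted twice per churned change).
    """
    excess_changes = max(0.0, churned_changes - baseline_changes)
    hours_per_change = rework_hours + 2 * review_hours
    return excess_changes * hours_per_change * hourly_rate_eur


# Placeholder inputs for illustration only (all assumptions).
cost = monthly_churn_cost(
    churned_changes=30,
    baseline_changes=20,
    rework_hours=2.0,
    review_hours=0.5,
    hourly_rate_eur=50.0,
)
print(f"EUR {cost:,.0f} per month")
```

Clamping the excess at zero keeps a month that falls below baseline from showing up as negative cost; whether to credit such months as savings is a reporting choice.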
Sources
- GitClear – AI Copilot Code Quality: churn ~3.1% toward 5.7%, duplication up ~8x, refactoring declining, across 211M changed lines (2025)
- Faros AI telemetry: ~98% more merged PRs, review time per PR +91% - the volume behind the churn (2026)
- Veracode – GenAI Code Security Report: ~45% of samples fail security tests - the defect supply feeding rework (2025)