Concept

Comprehension Debt

Last updated: 2026-07-02 · 4 min read

Comprehension debt is the growing gap between the code a team ships and the mental model its people hold of it - the system’s theory hollowing out while its line count grows. AI coding accelerates it because generation no longer produces understanding as a by-product: the model writes, the human skims, and nobody builds the theory.

Where the idea comes from

The intellectual root is four decades old. In Programming as Theory Building (1985), Peter Naur argued that a program is not really its code - it is the theory in the heads of the people who built it: why the system is shaped this way, which changes it tolerates, where the bodies are buried. Code without that theory is an artifact nobody can confidently change.

The term comprehension debt attached itself to this idea as AI-generated code went mainstream, carried by essays from Addy Osmani and O’Reilly Radar and by a widely discussed framing of LLM code as a “ticking time bomb” of understanding. The mechanism is simple: writing code was how developers built Naur’s theory. Remove the writing, keep the shipping - and the theory stops being built while the system keeps growing.

Three debts, one family - and different repayments

AI coding produces more than one kind of debt, and teams that only measure one get surprised by the others:

| | Technical debt | Verification debt | Comprehension debt |
| --- | --- | --- | --- |
| Core question | Is the code well-built? | Was this change checked against intent? | Does anyone still understand the system? |
| Unit | The codebase | The individual change | The team's mental model |
| How it announces itself | Loudly: slow builds, painful changes | Quietly: 'looked right' incidents later | Very quietly: onboarding slows, one person 'knows it' |
| AI's effect | Mixed - more duplication, less refactoring | Accelerates it: generation outruns checking | Accelerates it most: generation without understanding |
| Repayment | Refactoring | Verification per change, recorded | Humans in the decision loop; decisions recorded with reasons |

Technical, verification, and comprehension debt compared: different questions, different warning signs, different repayments - a team can be fine on two and drowning in the third.

The debts compound: unverified changes (verification debt) become foundations nobody understands (comprehension debt), which makes the next verification harder still.

Why it breeds false confidence

Technical debt is at least honest - it hurts every time you touch the tangled module. Comprehension debt is invisible in the codebase: the code is clean, the tests are green, the linter is happy. What is missing exists only as an absence - the person who could say why the retry logic works this way, or what breaks if you change it. Teams discover the debt at the worst moments: during an incident, a security review, or the departure of the one person who still held the theory.

The academic framing generalizes this as cognitive and intent debt: not only is understanding thinning, but the original intent behind changes is never recorded at all - which is where comprehension debt and verification debt share a root cause and a remedy.

What keeps understanding alive

  • Written intent per change. A checkable task specification records what was meant - the raw material of theory, preserved even when a model wrote the lines.
  • Review as judgment, not skimming. The human in the loop builds theory only if the review engages with intent - which is what a machine pre-check frees them to do.
  • Decisions recorded with reasons. Session handoffs and evidence trails carry the “why” across time - the part of the theory that can be written down.
  • Ownership despite generation. A module can be AI-written and human-owned; it cannot be safely nobody-owned.
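
The recordable items in the list above (written intent, decisions with reasons, a named owner) can be kept as a small structured record per change. The following is a minimal sketch, not a prescribed format: the `ChangeRecord` type, its field names, and the `missing_theory` helper are all hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ChangeRecord:
    """Hypothetical per-change record; every field name is an assumption."""
    change_id: str
    intent: str  # what the change was meant to do
    # Each entry is a (decision, reason) pair.
    decisions: list[tuple[str, str]] = field(default_factory=list)
    owner: str = ""  # the human who owns the affected module


def missing_theory(record: ChangeRecord) -> list[str]:
    """Return the recordable parts of the theory that this record lacks."""
    gaps = []
    if not record.intent.strip():
        gaps.append("intent")
    if any(not reason.strip() for _, reason in record.decisions):
        gaps.append("decision without reason")
    if not record.owner.strip():
        gaps.append("owner")
    return gaps


# Example: the intent is written down, but one decision has no reason
# and nobody is named as owner.
record = ChangeRecord(
    change_id="example-1",
    intent="Retry failed uploads up to three times",
    decisions=[("use exponential backoff", "")],
)
print(missing_theory(record))  # ['decision without reason', 'owner']
```

A check like this only covers what can be written down; whether the reviewer actually engaged with the intent is not something a record can prove.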

Where Reality Graph fits

Reality Graph attacks the recordable half of comprehension debt: every run captures intent, boundaries, and verification outcomes in an evidence report stored with the code - so the “why” and “what was checked” survive even when no human wrote the lines. The unrecordable half - humans actually engaging with the system - remains yours.

Naming this debt gives you

  • A vocabulary for a risk dashboards don't show
  • Proxies to watch: onboarding time, bus factor, regenerate-instead-of-modify
  • A clean split from technical and verification debt
  • An argument for keeping humans in the decision loop

It does not mean

  • AI-written code is unmaintainable per se
  • Documentation alone repays it - theory can't be fully written down
  • Disposable code is always wrong - short-lived scripts differ
  • There is a standard metric yet - proxies only

FAQ

What is comprehension debt and how does it differ from verification debt?
Comprehension debt is the gap between the code a team ships and the mental model its people hold of that code - system knowledge hollowing out while the codebase grows. Verification debt is about a specific change: was it checked against intent before merge? Comprehension debt is about the system over time: does anyone still hold the theory of how it works? A team can verify every change and still lose the big picture - the two debts compound each other.
Where does the concept come from?
The term spread through the developer community from 2025 as AI-generated code became mainstream, popularized by essays from Addy Osmani and O'Reilly Radar among others. Its intellectual root is much older: Peter Naur's 1985 essay 'Programming as Theory Building', which argued that software is not really the code - it is the theory in the heads of the people who built it.
Why does AI coding accelerate comprehension debt?
Because writing code was how developers built their theory of it. When a model writes and a human skims, the code arrives without the understanding that used to come from producing it - and it arrives faster than anyone can absorb. Generation speed outruns theory-building speed, and the gap compounds with every merge.
Is comprehension debt measurable?
Less directly than verification debt, but proxies exist: how long onboarding takes, how often changes to a module require the one person who 'knows it', bus-factor analyses, and how frequently teams re-generate code because modifying it is harder than replacing it. The pattern to watch is all four moving the wrong way together: longer onboarding, more single-person dependencies, a shrinking bus factor, and more frequent regeneration.
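
One of these proxies, how concentrated knowledge of a module is in a single person, can be roughly approximated from version-control history. The sketch below assumes the history has already been reduced to `(module, author)` pairs; the function name `ownership_concentration` and the 0.8 threshold are illustrative assumptions, and commit counts are only a crude stand-in for understanding.

```python
from collections import Counter, defaultdict


def ownership_concentration(commits, threshold=0.8):
    """Flag modules where one author made at least `threshold` of the commits.

    `commits` is an iterable of (module, author) pairs. The threshold is an
    arbitrary illustrative value, not an established standard.
    """
    per_module = defaultdict(Counter)
    for module, author in commits:
        per_module[module][author] += 1

    flagged = {}
    for module, authors in per_module.items():
        top_author, top_count = authors.most_common(1)[0]
        share = top_count / sum(authors.values())
        if share >= threshold:
            flagged[module] = (top_author, round(share, 2))
    return flagged


# Made-up example data: "billing" is dominated by one author, "search" is not.
commits = (
    [("billing", "ana")] * 9
    + [("billing", "ben")]
    + [("search", "ana"), ("search", "ben"), ("search", "cy")]
)
print(ownership_concentration(commits))  # {'billing': ('ana', 0.9)}
```

A flagged module is a prompt for a conversation, not a verdict: a single prolific committer may have shared the theory widely, and an evenly spread history may still hide a module nobody understands.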
Does documentation solve comprehension debt?
It helps but does not substitute. Naur's point was precisely that the theory cannot be fully written down - documentation transmits facts, not the built-up judgment of why the system is shaped this way. What helps more: humans staying in the decision loop (reviewing with intent, not skimming), recorded decisions with reasons, and keeping module-level ownership even when AI writes the lines.
Can a team just accept comprehension debt and re-generate code instead of understanding it?
That is a real position ('disposable code'), and for short-lived scripts it can be rational. For systems that live years, the bet fails at the boundaries: incidents need someone who understands the system now, security reviews need intent, and regeneration without understanding reproduces the old assumptions - including the wrong ones - at scale.
