Method

Spec-Driven Development

Last updated: 2026-07-02 (4 min read)

Spec-driven development (SDD) writes the specification first – requirements, design, task breakdown – and lets AI agents implement against it, with the spec as the living reference. It is the structured answer to vibe coding’s drift problem, championed by tools like GitHub Spec Kit and AWS Kiro – strongest on large features, heaviest on small fixes.

Where SDD came from

SDD emerged in 2025 as the direct counter-movement to prompt-and-hope coding: agents produce plausible code that drifts from intent, invents APIs, and decays as projects scale. The proposed fix is old-fashioned and radical at once – agree on a written specification before generation, and keep it authoritative during implementation. By 2026 every major tool ecosystem ships an SDD flavor; GitHub’s Spec Kit alone has grown past 90,000 stars and supports 30+ coding agents.

The movement validates something this site argues from the verification side: the missing artifact in AI coding is written intent. SDD builds that artifact at feature scale; machine-checkable specifications build it at run scale. The philosophies meet in the middle.

How the workflow actually looks

Requirements. What the feature must do, phrased as user stories with acceptance criteria – in Kiro’s flow the first of three gated phases, in Spec Kit the /specify step.
Design. Architecture decisions, data models, interfaces – agreed before code exists, so the agent inherits decisions instead of making them silently.
Tasks. The design broken into small, ordered, individually reviewable work items – the unit an agent executes.
Implementation against the spec. The agent codes task by task; humans review against the agreed documents rather than reconstructing intent from diffs.
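The gating idea behind these four steps can be sketched in a few lines. This is an illustrative model only, not the API of Spec Kit or Kiro: the names `Phase`, `FeatureSpec`, `approve`, and `can_start` are hypothetical, and the only rule encoded is that a phase cannot be approved or started before every earlier phase has been approved.

```python
from dataclasses import dataclass, field
from enum import Enum


class Phase(Enum):
    # Ordered as in the workflow described above.
    REQUIREMENTS = 1
    DESIGN = 2
    TASKS = 3
    IMPLEMENTATION = 4


@dataclass
class FeatureSpec:
    """Hypothetical container tracking which phases of one feature are approved."""
    name: str
    approved: set = field(default_factory=set)

    def approve(self, phase: Phase) -> None:
        # A phase can only be approved once every earlier phase is approved.
        missing = [p for p in Phase if p.value < phase.value and p not in self.approved]
        if missing:
            names = ", ".join(p.name for p in missing)
            raise ValueError(f"cannot approve {phase.name}: {names} not approved yet")
        self.approved.add(phase)

    def can_start(self, phase: Phase) -> bool:
        # Work on a phase may begin only after all earlier phases are approved.
        return all(p in self.approved for p in Phase if p.value < phase.value)


spec = FeatureSpec("password-reset")
spec.approve(Phase.REQUIREMENTS)
print(spec.can_start(Phase.DESIGN))          # True
print(spec.can_start(Phase.IMPLEMENTATION))  # False: design and tasks are still open
```

The point of the gate is the one made in the Design step: decisions are agreed before the agent reaches them, so skipping ahead fails loudly instead of silently.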

The reported upside is substantial where the method fits: GitHub describes internal teams shipping with roughly an order of magnitude fewer “regenerate from scratch” cycles, and AWS documents customer cases of 40-hour features landing in under 8 hours of human time when authored spec-first.

The tooling, honestly compared

	GitHub Spec Kit	AWS Kiro	Lightweight per-run spec
Form	Open-source CLI + templates, agent-agnostic (30+ agents).	Agentic IDE with gated requirements → design → tasks flow.	A few lines per run: goal, boundaries, criteria.
Strongest at	Feature-scale work across mixed toolchains; community momentum.	Guided, integrated flow – the clearest mental model of the three.	Bug fixes and single-run tasks; zero setup.
Weakest at	Small fixes – the ceremony dwarfs the change.	Verbosity: practitioners report it 'way too verbose for a small bug'.	Multi-run features – no design memory between runs.
Verification story	Spec defines the plan; checking results against it stays your job.	Same – phases gate planning, not post-run verification.	Feeds directly into a spec-vs-implementation check.

Spec Kit, Kiro, and lightweight per-run specifications solve the same problem at different weights - the right choice depends on task size, not on ideology.
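To make the third column concrete, here is a minimal sketch of what a lightweight per-run specification might look like as data. The field names (`goal`, `allowed_paths`, `criteria`) and the `validate` helper are assumptions chosen for illustration; the article only says such a spec holds a goal, boundaries, and criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunSpec:
    """Hypothetical per-run spec: goal, boundaries, criteria."""
    goal: str
    allowed_paths: tuple  # glob patterns the run may touch (the boundaries)
    criteria: tuple       # statements that must hold after the run


def validate(spec: RunSpec) -> list:
    """Return the problems that would make this spec uncheckable."""
    problems = []
    if not spec.goal.strip():
        problems.append("goal is empty")
    if not spec.allowed_paths:
        problems.append("no boundaries: the run could touch any file")
    if not spec.criteria:
        problems.append("no criteria: nothing to verify afterwards")
    return problems


spec = RunSpec(
    goal="Fix off-by-one error in pagination",
    allowed_paths=("src/pagination/*", "tests/pagination/*"),
    criteria=(
        "last page returns the remaining items",
        "existing pagination tests pass",
    ),
)
print(validate(spec))  # []
```

A spec this small takes minutes to write, which is why it suits bug fixes; it also carries no design decisions, which is why it does not scale to multi-run features.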

Honest limits – from people who used it

Agents don’t reliably obey their own specs. The Fowler-team analysis notes that even with all the templates and checklists, “I frequently saw the agent ultimately not follow all the instructions” – a written plan is not an enforced plan.
Document review can exceed code review. Several practitioners report they would rather review the code than three markdown files describing it. SDD shifts effort upstream; it does not remove it.
Specs age. A spec that is not maintained becomes confidently wrong documentation – worse than none, because it looks authoritative.
SDD plans forward; it does not check backward. The workflow gates what gets built, but the question “does the merged code actually match the spec?” still needs an independent spec-vs-implementation check – especially given the first limit above.
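A backward check of the kind the last limit calls for can start very small. The sketch below assumes you already have the list of files a run changed (for example from `git diff --name-only`) and a pass/fail result per criterion; the function names and the report layout are hypothetical, and it is not a description of any particular tool's check.

```python
from fnmatch import fnmatch


def out_of_bounds(changed_files, allowed_patterns):
    """Return changed files that match none of the allowed glob patterns.

    Note: fnmatch's "*" also matches "/" so "src/pagination/*" covers
    nested directories as well.
    """
    return [
        path for path in changed_files
        if not any(fnmatch(path, pattern) for pattern in allowed_patterns)
    ]


def check_run(changed_files, allowed_patterns, criteria_results):
    """Combine boundary violations and failed criteria into one report."""
    violations = out_of_bounds(changed_files, allowed_patterns)
    failed = [name for name, passed in criteria_results.items() if not passed]
    return {
        "passed": not violations and not failed,
        "boundary_violations": violations,
        "failed_criteria": failed,
    }


report = check_run(
    changed_files=["src/pagination/pager.py", "src/auth/session.py"],
    allowed_patterns=["src/pagination/*", "tests/pagination/*"],
    criteria_results={"last page returns the remaining items": True},
)
print(report["passed"])               # False
print(report["boundary_violations"])  # ['src/auth/session.py']
```

Here every criterion passed, yet the run still fails because it touched a file outside the agreed boundary, which is exactly the kind of drift a forward-only workflow does not catch.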

Where Reality Graph fits

SDD and Reality Graph attack the same gap from opposite ends: SDD makes the intent explicit before generation, Reality Graph verifies the result against intent after it – with boundary checks, validation the model did not author, and an evidence report per run. Teams using Spec Kit or Kiro bring excellent target specifications; verification closes the loop those tools leave open.

SDD gives you

Agreed requirements and design before generation
Roughly an order of magnitude fewer regenerate-from-scratch cycles (GitHub's internal reports)
Reviewable task units instead of monolithic diffs
A shared reference for humans and agents

It does not

Make agents obey the spec - runs still need verification
Pay off on small fixes - right-size the ceremony
Keep itself up to date - specs age without care
Replace tests, review, or a human merge gate

FAQ

What is spec-driven development and how does it work with AI agents?
Spec-driven development (SDD) is a methodology where a written specification - requirements, design decisions, and a task breakdown - is created and agreed before an AI agent generates code, and the spec remains the living reference during implementation. Tools like GitHub Spec Kit and AWS Kiro structure this into explicit phases, so the agent executes an agreed plan instead of improvising from a prompt.

How is SDD different from just writing checkable task specifications?
Same philosophy, different weight class. A machine-checkable task specification is a lightweight per-run artifact - goal, boundaries, criteria - written in minutes. SDD formalizes the whole feature lifecycle with multi-document workflows and tooling. For a bug fix, the lightweight form wins; for a greenfield feature spanning many runs, SDD's structure pays for itself.

Does spec-driven development replace tests?
No. The spec defines what must be true; tests are one way of checking it. SDD tools generate task lists that usually include writing tests, but a spec that nobody validates against remains a plan, not a proof. The verification step - comparing what was built against what was specified - is separate work, whichever tool wrote the spec.

Is SDD worth it for small teams?
Selectively. The documented wins are on larger features and greenfield work - AWS reports cases where 40-hour features shipped in under 8 hours of human time when authored spec-first. For small fixes, practitioners consistently report the overhead exceeds the benefit; even Martin Fowler's team notes the method is 'way too verbose for a small bug'. Match the ceremony to the task size.

What are the main criticisms of SDD?
Three recur in practitioner reports: the document overhead dwarfs small tasks, agents do not reliably follow their own specs ('I frequently saw the agent ultimately not follow all the instructions'), and reviewing several markdown files can be more work than reviewing the code itself. None of these kill the idea - they argue for right-sizing it and for verifying results independently of the spec pipeline.

Which SDD tool should we try first?
If you want the community default, GitHub Spec Kit - open source, works alongside 30+ agents including Claude Code and Cursor. If you want an integrated IDE experience, AWS Kiro with its requirements-design-tasks flow. If you mainly need verifiable single runs rather than feature-scale planning, start with lightweight checkable specifications and grow into SDD where the task size justifies it.