3PM Track · Module 3 · 20 min

Sprints → Loops: What Adaptive AI Adds

Story → Brief, Sprint → Loop. The transition tables, with three-source convergence: Anthropic Engineering on harness discipline, GitHub on developer-AI patterns, DORA on delivery-loop telemetry.

Diagram: Sprints → Loops: What Adaptive AI Adds.

Try the harness on your machine

Use of the harness is free — closed-loop capabilities require a PM33 subscription.

install
curl -fsSL https://pm-33.com/install/pm33-init.sh | bash

Read the install docs first — script source, bundle contents, uninstall command all linked there.

In this module

  1. Learning objectives
  2. Key concept — the unit of work and the unit of time are changing together
  3. What the sprint actually did — and what's still doing it
  4. What the story actually did — and why it can't survive contact with an AI agent
  5. What's emerging in place of each — the Brief and the Loop
  6. The transition table — old artifact, new artifact, what carries over
  7. The empirical case — DORA 2024, Anthropic harness blog, GitHub Spec Kit
  8. Failure modes during the transition
  9. Sidebar: how PM33 implements the Brief/Loop pattern
  10. Discussion prompts

Learning objectives

After this module you should be able to:

  • Explain why the 2-week sprint is a coordination ritual without a delivery purpose in AI-augmented teams
  • Articulate what the sprint was doing that still needs doing (Lindsay's productive complication)
  • Describe the difference between a Story and a Brief in concrete terms
  • Identify the failure modes of partial transitions (e.g., keeping sprints but adopting Briefs)

Key concept

The 2-week sprint is the most visible artifact of the Agile era. The Story is the second-most-visible. Both are under pressure simultaneously, for the same underlying reason: AI agents change the cycle-time constants of execution.

When the executor changes from "a human team that needs 2 weeks to deliver, coordinated by daily standups" to "an agent that delivers in hours, coordinated by structured handoffs," the rituals built around the old executor become friction.

Critically, this does not mean "no sprints, no stories, no ceremonies." It means: the artifacts that were doing real work for the old executor need to be replaced by artifacts that do the equivalent work for the new executor. The transitions are not 1-to-1, and the rituals that remain often have different content even if they keep the same names.

What the sprint actually did

Before we talk about the sprint dying, we should be clear about what the sprint did. Sprints performed at least five distinct functions:

Function | What it produced | Still needed in Adaptive era?
Time-box for delivery | "We commit to N stories by Day 14" | NO: execution flows continuously when agents are involved
Cadence for stakeholder communication | Sprint demos, stakeholder review | YES, but the content changes (outcome attribution, not feature demo)
Cadence for learning | Retrospectives | YES, and intensifies. Lindsay (2026): "Faster execution does not eliminate the need for learning cycles; it increases it"
Coordination for dependencies | "We need eng A by Day 5 so eng B can start on Day 6" | YES, but the dependency graph runs continuously, not bounded by sprint walls
Forcing function for scope | "If it doesn't fit in 2 weeks, decompose" | YES, but the unit becomes "fits in one agent session" not "fits in 2 weeks"

The pattern: the sprint's delivery-throttle function is dying. The sprint's learning-cadence and forcing-function roles persist, often under different names.

Giles Lindsay's February 2026 piece makes this point precisely:

"AI didn't kill the sprint — it exposed what sprints were really for. The sprint was never meant to be a delivery throttle but was designed as a learning cadence." — Giles Lindsay (February 2026)

This is the productive complication that should make any "sprints are dead" argument better, not worse. The honest version of the claim is: the sprint as a time-boxed delivery commitment is breaking down. The sprint as a learning cadence is intensifying.

What the story actually did

A Story in classic Agile served three functions:

  1. Unit of estimation — "this is 3 story points"
  2. Unit of conversation — "let's discuss what this story actually means"
  3. Unit of commitment — "this is in the sprint"

For human teams, those three functions composed well. The same artifact could be estimated, debated, and committed. The implicit contract was that the engineer executing the story could ask clarifying questions, exercise judgment on edge cases, and surface concerns at code review.

For AI agents, the three functions decompose:

Story function | What an AI agent needs instead
Unit of estimation | A spec structured enough that delivery time is computable from history (AR(1) priors)
Unit of conversation | Either zero conversation (the spec is complete) or a structured ask-back mechanism
Unit of commitment | Independent of time: the agent commits to completion against verification criteria, not to a date

The Story doesn't disappear. It evolves into a more structured artifact with named fields. We use the term Brief for this evolved artifact, following Anthropic Engineering's language in their November 2025 harness blog.

"A comprehensive file of feature requirements expanding on the user's initial prompt" is the foundational input. Crucially: "Self-verify all features. Only mark features as 'passing' after careful testing." — Anthropic Engineering, "Effective harnesses for long-running agents" (November 2025)

The Brief is what the Story becomes when the executor can't ask clarifying questions and can't exercise human judgment about ambiguity.
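
To make the Story/Brief difference concrete, here is a minimal sketch of a Brief as a data structure. The class and field names (Brief, AcceptanceCriterion, validation_command, is_ready_for_agent) are hypothetical and chosen for illustration; they are not taken from Anthropic's post or from any particular tool. The only idea it encodes is the one above: a criterion that cannot be run mechanically still depends on a human being in the room.

```python
from dataclasses import dataclass, field


@dataclass
class AcceptanceCriterion:
    """One check that can be run without asking a human."""
    description: str
    # Hypothetical field: a command whose exit code decides pass/fail.
    # An empty string means the criterion is not machine-verifiable.
    validation_command: str = ""


@dataclass
class Brief:
    """Illustrative Brief shape; the field names are assumptions."""
    title: str
    specification: str
    acceptance_criteria: list[AcceptanceCriterion] = field(default_factory=list)

    def unverifiable_criteria(self) -> list[AcceptanceCriterion]:
        """Return the criteria that still rely on human judgment."""
        return [c for c in self.acceptance_criteria
                if not c.validation_command.strip()]

    def is_ready_for_agent(self) -> bool:
        """Ready only if there are criteria and every one is runnable."""
        return bool(self.acceptance_criteria) and not self.unverifiable_criteria()


# A Story wearing Brief vocabulary: the criterion cannot be executed.
story_like = Brief(
    title="Add CSV export",
    specification="Users can export reports.",
    acceptance_criteria=[AcceptanceCriterion("Works correctly for common cases")],
)

# A Brief: the ambiguity was resolved at spec time.
brief = Brief(
    title="Add CSV export to the reports endpoint",
    specification="The endpoint returns the report rows as CSV with a header row.",
    acceptance_criteria=[
        AcceptanceCriterion(
            "Export tests pass",
            validation_command="pytest tests/test_csv_export.py",  # example path
        )
    ],
)

print(story_like.is_ready_for_agent(), brief.is_ready_for_agent())
```

The design choice worth noticing is that readiness is a property of the artifact, checked before any agent starts, rather than something discovered at code review.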

What's emerging in place of each

Agile artifact | Adaptive-era replacement | What's the change?
Story | Brief: structured spec with machine-verifiable acceptance criteria | Ambiguity moves from execution time to spec time
Sprint (delivery time-box) | Loop: continuous flow with explicit learning gates | Time-box dies; learning cadence intensifies
Sprint planning meeting | Backlog grooming + scheduler review: capacity-aware prioritization runs continuously | The meeting shrinks; the analysis becomes ongoing
Sprint demo | Outcome attribution review: predicted vs. realized at the strategic-objective level | Output demo → outcome attribution
Retrospective | Continuous recalibration: automated where measurable (AR(1) priors), human where qualitative | Process retro → model retro + human retro
Daily standup | Asynchronous summary + targeted human conversation: the morning Pam summary replaces ~80% of standup content | Meeting → summary you read in 5 minutes
Story points | AR(1) confidence intervals: Bayesian forecast informed by workspace history | Point estimate → distribution
Velocity | Outcome attribution rate: % of work whose attributed metric movement matches prediction | Throughput metric → impact metric

A few things to note about this table:

  • Nothing on the left disappears entirely. The functions persist; the artifacts evolve.
  • The replacements aren't 1-for-1. The sprint demo becomes "outcome attribution review," which is a deeper and less frequent event than a per-sprint demo.
  • The new artifacts mostly require new infrastructure. You can't run AR(1) confidence intervals on the back of a napkin; they require a system that's tracking historical Brief throughput per workspace.
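
To show what "point estimate → distribution" means in practice, here is a rough sketch of an AR(1) forecast over historical Brief cycle times. It is an assumption-laden illustration: it fits x[t] = c + phi * x[t-1] + noise by ordinary least squares and builds an approximate 95% interval from the residual spread. It does not implement a Bayesian prior, ignores parameter uncertainty, and the function names and sample numbers are invented for the example.

```python
import math


def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1] + noise."""
    prev, curr = series[:-1], series[1:]
    n = len(prev)
    if n < 3:
        raise ValueError("need at least four observations")
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    var_prev = sum((p - mean_prev) ** 2 for p in prev)
    if var_prev == 0:
        raise ValueError("series has no variation to fit")
    cov = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(prev, curr))
    phi = cov / var_prev
    c = mean_curr - phi * mean_prev
    residuals = [q - (c + phi * p) for p, q in zip(prev, curr)]
    # Residual standard deviation with two fitted parameters (c and phi).
    sigma = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    return c, phi, sigma


def forecast_interval(series, z=1.96):
    """One-step-ahead forecast as (low, point, high), roughly 95% for z=1.96."""
    c, phi, sigma = fit_ar1(series)
    point = c + phi * series[-1]
    return point - z * sigma, point, point + z * sigma


# Made-up cycle times in hours for the last eight Briefs in one workspace.
cycle_hours = [6.0, 5.5, 7.0, 6.5, 5.0, 6.0, 7.5, 6.0]
low, point, high = forecast_interval(cycle_hours)
print(f"next Brief: about {point:.1f}h (roughly {low:.1f}h to {high:.1f}h)")
```

Even this toy version needs a stored history per workspace, which is the infrastructure point made above: the forecast is only as good as the record of past Briefs behind it.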

The empirical case for the transition

Three independent sources converge on the transition:

Anthropic Engineering — the harness pattern

Anthropic's November 2025 harness blog and June 2025 multi-agent research system piece describe an execution pattern that is structurally identical to what we're calling the Brief/Loop pattern:

"Each subagent needs an objective, an output format, guidance on the tools and sources to use, and clear task boundaries... Without detailed task descriptions, agents duplicate work, leave gaps, or fail to find necessary information."

The InfoQ coverage of Anthropic's three-agent harness (April 2026) quotes Anthropic's Prithvi Rajasekaran:

"Separating the agent doing the work from the agent judging it proves to be a strong lever."

That separation of generation from judgment is exactly what the Brief's verification gate enforces. Anthropic is independently arriving at the pattern this curriculum is teaching, from the engineering side.

GitHub Spec Kit — Spec → Plan → Tasks → Implement

GitHub's Spec Kit (Den Delimarsky, Principal PM, September 2025) productizes the same pattern:

"We treat coding agents like search engines when we should be treating them more like literal-minded pair programmers... A vague prompt like 'add photo sharing to my app' forces the model to guess at potentially thousands of unstated requirements."

The Spec Kit's structure (Spec → Plan → Tasks → Implement, with a validation checkpoint at each phase) is the same Brief/Loop pattern with different vocabulary.

DORA 2024 — the Vacuum Hypothesis is the empirical hammer

DORA's 2024 finding is the strongest single empirical argument for adopting the full pattern rather than a partial one:

"AI adoption was accompanied by an estimated decrease in delivery throughput by 1.5%, and an estimated reduction in delivery stability by 7.2%." — DORA 2024 State of DevOps Report

Why? The "Vacuum Hypothesis": reclaimed time from AI productivity gains gets absorbed by lower-value tasks if not redirected through disciplined flow practices. Adopting AI without adopting Briefs, continuous flow, and outcome attribution makes things worse.

This is the empirical answer to "can we just adopt AI tools and keep our sprint cadence?" The honest answer per DORA's data: no — you'll lose stability without gaining throughput, and worse, you won't notice because the velocity number won't change much.

Failure modes during the transition

The transition is hard. Common failure modes:

1. The Brief that's still a Story

A team adopts the Brief vocabulary but keeps writing Stories. The "acceptance criteria" field gets filled with "works correctly for common cases." This is the central failure mode of the Brief transition — it's why Anthropic's harness blog spends so much time on what constitutes a useful spec.

Diagnostic: pick 5 recent Briefs. Can each one be validated mechanically — by running a test command, checking a metric, parsing an output? If not, you have Stories in Brief clothing.
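
The diagnostic can itself be run mechanically. The sketch below assumes each Brief's acceptance criteria have been written as commands (argument lists) and treats exit code 0 as a pass; the function names and the sample data are hypothetical, and a real audit would point at your own test commands.

```python
import subprocess
import sys


def passes(validation_command, timeout_seconds=60):
    """Run one acceptance-criterion command; exit code 0 means it passed."""
    try:
        result = subprocess.run(
            validation_command,
            capture_output=True,
            timeout=timeout_seconds,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False  # a command that cannot start or hangs is not a pass
    return result.returncode == 0


def audit(briefs):
    """Map each Brief title to True only if it has commands and all pass."""
    return {
        title: bool(commands) and all(passes(cmd) for cmd in commands)
        for title, commands in briefs.items()
    }


# Illustrative data: one Brief with a runnable check, one with none.
sample = {
    "Brief with a runnable check": [[sys.executable, "-c", "assert 1 + 1 == 2"]],
    "Story in Brief clothing": [],  # e.g. "works correctly for common cases"
}
report = audit(sample)
for title, ok in report.items():
    print(f"{title}: {'mechanically validated' if ok else 'needs a human'}")
```

A Brief with an empty command list fails the audit by construction, which is the point: a criterion nobody can execute is a conversation, not a check.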

2. The Loop that's still a Sprint

A team eliminates the formal sprint commitment but keeps holding sprint planning meetings every 2 weeks. The meeting now has nothing to plan because work flows continuously. Attendees lose attention; the meeting becomes ritual.

Diagnostic: ask the team — "what decision was made in the last sprint planning meeting that wouldn't have been made anyway?" If the answer is none, the meeting is ritual.

3. Outcome attribution without recalibration

A team starts measuring outcome attribution (predicted vs. realized) but doesn't feed the gap back into priors. The reports get generated; nobody acts on them.

Diagnostic: look at your AR(1) prior history. When did it last update? If it hasn't updated in 6+ months, attribution is happening but recalibration isn't.
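
The gap between measuring and recalibrating can be sketched in a few lines. This is an illustration under stated assumptions, not a description of any product's attribution layer: "matches prediction" is defined here as landing within a relative tolerance of the predicted movement, and the recalibration step is plain exponential smoothing of the forecast bias, swapped in as a stand-in for updating an AR(1) prior. All names and numbers are invented.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    predicted: float  # predicted metric movement, e.g. +2.0 points
    realized: float   # measured movement at the end of the attribution window


def attribution_rate(outcomes, tolerance=0.25):
    """Share of outcomes whose realized movement lies within `tolerance`
    (as a fraction of the prediction) of the predicted movement."""
    if not outcomes:
        return 0.0
    hits = sum(
        1 for o in outcomes
        if abs(o.realized - o.predicted) <= tolerance * abs(o.predicted)
    )
    return hits / len(outcomes)


def recalibrate(prior_bias, outcomes, learning_rate=0.3):
    """Move the stored forecast bias toward the latest mean gap
    (realized minus predicted) using exponential smoothing."""
    if not outcomes:
        return prior_bias
    mean_gap = sum(o.realized - o.predicted for o in outcomes) / len(outcomes)
    return (1 - learning_rate) * prior_bias + learning_rate * mean_gap


outcomes = [
    Outcome(2.0, 1.8),
    Outcome(4.0, 1.0),  # a large miss
    Outcome(1.0, 1.2),
    Outcome(3.0, 3.0),
]
rate = attribution_rate(outcomes)
new_bias = recalibrate(0.0, outcomes)
print(f"attribution rate: {rate:.2f}, updated bias: {new_bias:+.3f}")
```

Failure mode 3 is a team that calls attribution_rate every quarter and never calls anything like recalibrate: the report exists, but the next forecast starts from the same bias as the last one.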

4. Keeping the velocity metric

A team adopts Briefs and continuous flow but still reports velocity to leadership. Now leadership measures throughput while the team optimizes for outcome attribution. The mismatch produces political friction and incentive misalignment.

Diagnostic: ask your skip-level what metric they look at for your team. If it's velocity (story points / time), the measurement layer hasn't transitioned.

5. AI without harness

A team adopts AI agents but treats them as "humans who type faster." No structured Briefs, no verification gates, no specialist routing. This is the DORA Vacuum Hypothesis playing out: stability drops and throughput does not rise to compensate.

Diagnostic: are AI-executed work items entering code review with consistent shape? Or is every PR a surprise? Inconsistent shape = no harness discipline.

Sidebar: how PM33 implements the Brief/Loop pattern

PM33's implementation is one instance of the pattern. The pattern stands without PM33.

  • The Brief is a first-class entity (pm33_create_brief MCP tool, /brief skill, BriefSchema enforcement). It has structured fields: title (imperative), specification (function signature, input/output contracts, dependencies), acceptance criteria (machine-verifiable: test file path, assertion shape, validation command), outcomeHook (predicted metric movement), specialist + LLM tier, TDD phases.

  • The Loop is the AutoDoneVerificationService — it observes verification signals (CI, tests, security gates, deploy events) and flips Brief status from in_progress to done only when all gates pass. Done is computed, not claimed.

  • Continuous flow — there is no sprint commitment in the core model. Briefs are scheduled by pm33_optimize_priorities based on capacity, alignment, AR(1) confidence, and dependencies. The "sprint" view exists as a coordination convenience for teams that want a 2-week rollup, but the underlying execution is continuous.

  • Outcome attribution: outcomeHook is the declared prediction; the attribution layer measures realized vs. predicted at the configured window and feeds back into the AR(1) prior.

  • Recalibration — the workspace-specific AR(1) priors update continuously. The recalibration history is visible in the attribution dashboard; nothing is hidden inside the model.
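
"Done is computed, not claimed" can be illustrated independently of any vendor. The sketch below is not PM33's AutoDoneVerificationService; the class, gate names, and methods are hypothetical. It shows only the core idea: status is derived from verification signals, so there is no code path that lets an agent or a person simply set a Brief to done.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    IN_PROGRESS = "in_progress"
    DONE = "done"


@dataclass
class BriefState:
    brief_id: str
    required_gates: frozenset
    passed_gates: set = field(default_factory=set)

    @property
    def status(self) -> Status:
        """Status is derived from gate signals; there is no setter."""
        if self.required_gates and self.required_gates <= self.passed_gates:
            return Status.DONE
        return Status.IN_PROGRESS

    def record_signal(self, gate: str, passed: bool) -> None:
        """Apply one verification signal; a later failure revokes a pass."""
        if gate not in self.required_gates:
            return  # ignore signals for gates this Brief does not require
        if passed:
            self.passed_gates.add(gate)
        else:
            self.passed_gates.discard(gate)


# Illustrative gate names; a real setup would map these to CI events.
state = BriefState("brief-1", frozenset({"ci", "tests", "security"}))
state.record_signal("ci", True)
state.record_signal("tests", True)
print(state.status)  # still in progress: the security gate has not reported
state.record_signal("security", True)
print(state.status)  # done: every required gate has passed
```

Because a failing signal removes a gate from the passed set, a regression after "done" moves the Brief back to in progress without anyone having to notice and reopen it.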

We are not the only people building this pattern. GitHub's Spec Kit (open source) implements a sibling version with different opinions. Anthropic's harness pattern is described in their engineering blog and is being independently re-implemented across the ecosystem (LangChain, Mastra, CrewAI, etc.). Pick the implementation that fits your stack; the pattern matters more than the vendor.

Discussion prompts

  1. Open your team's last 5 Stories or Briefs. Which would survive being handed to an AI agent with no human in the room? Which would produce expensive rework?
  2. If your team eliminated the sprint commitment, what would have to be true about your flow practices for that to be safe (per DORA 2024)?
  3. The Lindsay productive complication — what learning cadence does your team currently rely on? Would it survive losing the sprint container?
  4. The Vacuum Hypothesis: where in your team has reclaimed AI productivity been absorbed by low-value work? What was that time spent on, and where should it have been redirected?

Further reading