In this module
- Learning objectives
- The classic PM ↔ Engineering interface — what it was doing
- What changes when AI agents execute — and what doesn't
- The new shape of the interface — three handoff modes
- What PMs still owe engineering — even more, not less
- What's now automated that used to be human — capacity, dependencies
- Where human judgment remains load-bearing — the boundaries
- Sidebar: how PM33 makes the interface legible
- Discussion prompts
Learning objectives
After this module you should be able to:
- Describe the three handoff modes between planning and engineering in the loop era
- Articulate what PMs still owe engineering (and why it's more, not less)
- Identify what's automated (capacity-aware scheduling, dependency surfacing) and what isn't
- Locate the human-judgment boundary in your own team's interface
The classic PM ↔ Engineering interface
In the Agile era, the PM ↔ Engineering interface was a structured set of conversations across a sprint cycle:
| Conversation | Frequency | Primary content |
|---|---|---|
| Backlog grooming | Weekly | What's coming, is it ready, what's blocked |
| Sprint planning | Bi-weekly | What fits, who does what, dependencies |
| Daily standup | Daily | What's progressed, what's blocked |
| Sprint demo | Bi-weekly | What shipped, did it land |
| Retrospective | Bi-weekly | What hurt, what to improve |
| Ad-hoc (Slack, hallway) | Constant | Clarifications, scope changes, blocker escalation |
The interface was high-touch, high-bandwidth, and human-to-human throughout. PMs and engineers built relationships through the cadence. Trust was personal.
This worked well at human-team scale. When some of the team's work becomes AI-executed, the interface shape changes.
What changes when AI agents execute — and what doesn't
The biggest single change: conversations between PM and engineering become denser per occurrence and less frequent overall.
When a senior engineer reviews 12 stories in a sprint planning meeting, each gets about 5 minutes. When the same engineer reviews 3 Briefs that will be AI-executed, each gets 20+ minutes, because the spec has to be good enough to survive execution with no human judgment filling the gaps. Total review time stays at roughly an hour either way; it is concentrated on fewer, denser artifacts.
What changes:
| Aspect | Agile era | Loop era |
|---|---|---|
| Conversation frequency | High (multiple per week) | Lower (per Brief, not per sprint) |
| Conversation depth | Medium (humans fill gaps) | Higher (specs are load-bearing) |
| Status updates | Verbal (standup) | Lifecycle events (audit log) |
| Demo cadence | Sprint-bounded | Continuous (outcome attribution) |
| Decision authority for scope changes | PM (often) | PM, but with structural cost — scope change interrupts agent execution |
| Engineer-side "I'd interpret it this way" | Constant | Rare and explicit (asked via structured channels) |
What does NOT change:
- The PM still owns the "why" and the "what"
- The engineer still owns the "how" — including the harness, the Brief verification gates, and which specialists execute which Briefs
- Cross-functional decisions (design trade-offs, security concerns, accessibility) still require human conversations
- Customer escalations route through the PM first
- Engineering escalations (tech debt, incidents, infrastructure decisions) route through engineering leadership first
The shape of the relationship changes; the underlying division of responsibility doesn't.
The new shape of the interface — three handoff modes
In the loop era, PM ↔ Engineering handoffs split into three modes with different bandwidth and ceremony:
Mode 1: Structured handoff (most common)
The PM authors a Brief; engineering doesn't execute it directly, because an AI specialist picks it up. Engineering is involved when:
- The Brief is filed — engineering reviews structure, flags missing dependencies, signs off
- A verification gate fails — engineering investigates whether the gate is wrong or the Brief is wrong
- The PR lands — engineering does code review on the diff before merge
Bandwidth: low (structured artifact, audit log, code review). The engineer's time per Brief is concentrated at filing and at PR review.
Mode 2: Conversational handoff (load-bearing for novel work)
When a Brief involves genuinely novel architecture, novel data flow, or novel external integration, the PM and engineering have a conversation BEFORE the Brief is filed. The conversation produces:
- A specification depth the AI specialist can't be expected to fill from priors
- A specialist + LLM tier choice that reflects the novelty (often Opus, often a specific specialist class)
- A dependency map that the scheduler can use
Bandwidth: high during the conversation, low after. The conversation replaces what would have been mid-sprint scope thrash in the Agile era.
Mode 3: Escalation handoff (rare, important)
When something goes structurally wrong — repeated verification failures, AR(1) priors drifting, harness configuration breaking — the Loop Master escalates to engineering. This is genuinely rare in steady state. When it happens, it requires the highest-bandwidth conversation of the three modes.
Bandwidth: high, urgent, time-bounded. Treated like an incident.
The pattern: the interface gets stratified. Routine work has structured handoff. Novel work has conversational handoff at the front. System-level problems get escalation handoff. Each mode has its own bandwidth budget and shouldn't be conflated.
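The stratification can be written as a small decision function. This is a sketch, not a prescribed rule: the novelty categories come from the Mode 2 description, the escalation triggers from Mode 3, and the failure threshold and parameter names are hypothetical.

```python
from enum import Enum


class HandoffMode(Enum):
    STRUCTURED = 1      # Mode 1: Brief filed, reviewed, executed, code-reviewed
    CONVERSATIONAL = 2  # Mode 2: conversation before the Brief is filed
    ESCALATION = 3      # Mode 3: system-level problem, handled like an incident


NOVELTY_KINDS = {"architecture", "data_flow", "external_integration"}
GATE_FAILURE_LIMIT = 3  # hypothetical threshold for "repeated" failures


def handoff_mode(novelty: set[str], consecutive_gate_failures: int = 0,
                 prior_drift: bool = False, harness_broken: bool = False) -> HandoffMode:
    # Structural failures pre-empt everything else, so they are checked first.
    if (consecutive_gate_failures >= GATE_FAILURE_LIMIT
            or prior_drift or harness_broken):
        return HandoffMode.ESCALATION
    # Any genuinely novel element moves the conversation in front of the Brief.
    if novelty & NOVELTY_KINDS:
        return HandoffMode.CONVERSATIONAL
    return HandoffMode.STRUCTURED


print(handoff_mode({"external_integration"}))  # HandoffMode.CONVERSATIONAL
```

Keeping the three outcomes as distinct enum values mirrors the point above: each mode has its own bandwidth budget and shouldn't be conflated with the others.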
What PMs still owe engineering — even more, not less
A common misread of the Adaptive era is "AI does the work, so PMs can write looser specs." The opposite is true. The PM's spec quality matters MORE because the AI specialist can't fill ambiguity with judgment.
What PMs still owe engineering, and what's increased in the loop era:
| What's owed | Was-it-important-before | Loop-era importance |
|---|---|---|
| Clear acceptance criteria | Yes | Critical — machine-verifiable, not prose |
| Strategic context (why this Brief exists) | Yes | Critical — informs LLM tier choice and specialist routing |
| Explicit dependencies (UUIDs, not prose) | Helpful | Critical — the scheduler depends on it |
| Performance budget | Sometimes | Critical — sets the constraint the specialist optimizes against |
| Side-effect disclosure (auth, schema, audit) | Often missed | Critical — drives specialist policy override (security-auditor on auth Briefs) |
| Edge-case enumeration | Often missed | Critical — the agent will not enumerate them on its own |
| outcomeHook (predicted metric movement) | Rarely required | Important — closes the loop; engineering wants to see this too |
The shift: ambiguity that used to be resolved during execution now must be resolved during specification. That work doesn't disappear — it moves left. PMs do MORE specification work per Brief than they did per Story. They do it for FEWER Briefs (because each one carries more) but with HIGHER quality per spec.
Anthropic's engineering post on its multi-agent research system states this clearly:
"Without detailed task descriptions, agents duplicate work, leave gaps, or fail to find necessary information."
GitHub's Spec Kit (September 2025) makes the same point with a sharper edge:
"A vague prompt like 'add photo sharing to my app' forces the model to guess at potentially thousands of unstated requirements."
PMs who internalize this become indispensable. PMs who don't internalize it end up writing Stories in Brief clothing (Module 3's failure mode), which produces expensive rework that gets blamed on "the AI doesn't understand what we wanted."
What's now automated that used to be human
Several things that took PM ↔ Engineering bandwidth in the Agile era are now mechanized:
| Was human bandwidth | Now mechanized |
|---|---|
| "How much capacity does frontend have this sprint?" | Capacity-aware scheduler reads team config + leave + velocity history |
| "Does BRIEF-X depend on BRIEF-Y?" | Dependency graph surfaces this in the scheduler proposal |
| "Where is BRIEF-X right now?" | Lifecycle event audit log |
| "Did BRIEF-X land?" | AutoDoneVerificationService flips status based on verification signals |
| "What's the team's velocity this quarter?" | Outcome attribution dashboard (replaces velocity entirely) |
| "What's the AR(1) confidence on BRIEF-X landing this week?" | Computed automatically and reported per Brief in the scheduler proposal |
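The first two rows (capacity and dependencies) can be combined into one proposal that records a reason for every decision. This is a minimal greedy sketch, assuming capacity per team is already known (deriving it from team config, leave, and velocity history is out of scope) and that a dependency and its dependent may both fit in the same window. The names `PlannedBrief` and `propose` are hypothetical; only `graphlib.TopologicalSorter` is a real standard-library API.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class PlannedBrief:
    brief_id: str
    team: str
    estimate: float   # capacity units (hypothetical)
    alignment: float  # strategic alignment score, higher is better
    depends_on: tuple[str, ...] = ()


def propose(briefs: list[PlannedBrief],
            capacity: dict[str, float]) -> tuple[list[str], dict[str, str]]:
    """Greedy, dependency-ordered proposal that records a reason per Brief."""
    by_id = {b.brief_id: b for b in briefs}
    for b in briefs:
        unknown = [d for d in b.depends_on if d not in by_id]
        if unknown:
            raise ValueError(f"{b.brief_id} depends on unknown Briefs {unknown}")
    sorter = TopologicalSorter({b.brief_id: b.depends_on for b in briefs})
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    remaining = dict(capacity)
    scheduled: list[str] = []
    deferred: set[str] = set()
    reasons: dict[str, str] = {}
    while sorter.is_active():
        # Among Briefs whose dependencies are resolved, prefer higher alignment.
        ready = sorted(sorter.get_ready(), key=lambda i: (-by_id[i].alignment, i))
        for bid in ready:
            b = by_id[bid]
            blocked = [d for d in b.depends_on if d in deferred]
            if blocked:
                deferred.add(bid)
                reasons[bid] = f"deferred: depends on deferred {blocked}"
            elif remaining.get(b.team, 0) >= b.estimate:
                remaining[b.team] -= b.estimate
                scheduled.append(bid)
                reasons[bid] = f"scheduled: {b.team} has {remaining[b.team]} left"
            else:
                deferred.add(bid)
                reasons[bid] = f"deferred: {b.team} capacity exhausted"
            sorter.done(bid)
    return scheduled, reasons


briefs = [
    PlannedBrief("A", "backend", 5, 0.9),
    PlannedBrief("B", "frontend", 3, 0.8, ("A",)),
    PlannedBrief("C", "backend", 5, 0.7),
    PlannedBrief("D", "frontend", 2, 0.6, ("C",)),
]
scheduled, reasons = propose(briefs, {"backend": 8, "frontend": 5})
print(scheduled)     # ['A', 'B']
print(reasons["D"])  # deferred: depends on deferred ['C']
```

Note that D is deferred even though frontend has room: the reason string surfaces the dependency edge instead of leaving someone to ask "does BRIEF-D depend on BRIEF-C?" in a meeting.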
The pattern: factual questions are mechanized; judgment questions remain human. The PM and engineer don't spend time on "where are we?" anymore. They spend their saved time on the judgment questions that were always the high-value work.
Atlassian's productization signal (2024-2025) is worth noting: Jira itself is building probabilistic forecasting (Monte Carlo charts) as a built-in feature. The mechanization of factual delivery questions is becoming a tooling commodity.
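For readers who haven't seen a Monte Carlo delivery forecast, here is the general technique in a few lines: resample past weekly throughput many times and read off how many weeks a backlog takes at a chosen confidence level. This is a generic sketch of the method, not Jira's implementation; the function names and the illustrative history are hypothetical.

```python
import math
import random


def simulate_weeks(history: list[int], backlog: int,
                   trials: int = 10_000, seed: int = 0) -> list[int]:
    """For each trial, resample past weekly throughput until the backlog is done."""
    if not any(h > 0 for h in history):
        raise ValueError("need at least one week with nonzero throughput")
    rng = random.Random(seed)  # seeded so a forecast can be reproduced
    outcomes = []
    for _ in range(trials):
        done = weeks = 0
        while done < backlog:
            done += rng.choice(history)
            weeks += 1
        outcomes.append(weeks)
    return sorted(outcomes)


def percentile(sorted_values: list[int], p: float) -> int:
    """Nearest-rank percentile: smallest value covering at least p of trials."""
    rank = max(1, math.ceil(p * len(sorted_values)))
    return sorted_values[rank - 1]


weekly_done = [3, 5, 2, 4, 6, 3, 4]  # illustrative throughput history
outcomes = simulate_weeks(weekly_done, backlog=20)
print("85% confidence within", percentile(outcomes, 0.85), "weeks")
```

The output is a distribution rather than a single date, which is exactly why it answers a factual question mechanically: the judgment left over is which confidence level the business should plan against.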
Where human judgment remains load-bearing
Despite the mechanization above, several judgment surfaces remain stubbornly human:
- Should we build this at all? — strategic judgment, customer empathy, market timing. Not delegatable.
- Is this Brief specified well enough to ship? — the spec-quality call requires human judgment about what's worth saying explicitly.
- Is the outcomeHook the right metric? — measuring the wrong thing is worse than measuring nothing. Picking the right metric is taste.
- When to override the scheduler proposal? — three legitimate reasons (private information, missed dependency, strategic objective changed). Recognizing them requires context the scheduler doesn't have.
- Is a failing verification gate wrong, or is the work wrong? — investigating a Brief that keeps failing its gate requires human judgment about the gate's design.
- Cross-functional disagreements — eng vs. design vs. PM. Humans broker.
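The scheduler-override bullet can be made structural: an override record that accepts only the three legitimate reasons and refuses to be saved without the context the scheduler lacked. A minimal sketch; the class names and the free-text `context` field are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class OverrideReason(Enum):
    PRIVATE_INFORMATION = "PM or engineer knows something the scheduler doesn't"
    MISSED_DEPENDENCY = "the dependency graph missed an edge"
    STRATEGY_CHANGED = "the strategic objective changed"


@dataclass(frozen=True)
class SchedulerOverride:
    brief_id: str
    reason: OverrideReason
    context: str  # the human judgment, in words, for the audit log

    def __post_init__(self) -> None:
        # Only the three legitimate reasons are accepted...
        if not isinstance(self.reason, OverrideReason):
            raise TypeError("reason must be an OverrideReason")
        # ...and the context the scheduler lacked must be written down.
        if not self.context.strip():
            raise ValueError("an override must record its context")


o = SchedulerOverride("BRIEF-X", OverrideReason.MISSED_DEPENDENCY,
                      "Needs the billing schema change in BRIEF-Y first")
print(o.reason.value)
```

The recorded context matters twice: it justifies this override, and a recurring `MISSED_DEPENDENCY` on the same kind of Brief is a hint that specs are under-declaring dependencies.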
A useful mental model: mechanize the factual layer; defend the judgment layer. The PM ↔ Engineering interface in the loop era spends almost all its time in the judgment layer, because the factual layer is largely automated.
Sidebar — how PM33 makes the interface legible
PM33's design opinions on the PM ↔ Engineering interface:
- The Brief schema makes the spec contract explicit. Engineering can read a Brief and immediately know: what's the input contract, what's the output contract, what are the side effects, what's the performance budget, what's the dependency graph. There's no "ask the PM what they meant."
- Lifecycle events replace standup. When a Brief transitions states, a structured event fires. Engineering sees the audit log; PM sees the morning summary. The "where are we?" question is answered without a meeting.
- The scheduler shows its work. `pm33_optimize_priorities` returns a proposal with reasons: alignment scores, AR(1) confidence per Brief, capacity constraints surfaced, dependency edges visible. PMs and engineers can argue with the proposal because the inputs are legible.
- Verification gates are documented. When the AutoDoneVerificationService keeps a Brief in `in_progress`, the failing gate is named. Engineering doesn't have to ask "why isn't this done?"; the log says.
- Specialist routing is policy-driven, not ad hoc. The matrix (backend-architect at Opus, frontend-developer at Opus with the frontend-design skill, security-auditor on auth Briefs) is documented and enforceable. PMs and engineers agree on the matrix once; they don't renegotiate it per Brief.
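The "verification gates are documented" point can be sketched as a status function that returns the names of failing gates alongside the status. This is an illustration of the idea only, not PM33's AutoDoneVerificationService: the gate names, the signal keys, and the `verify` helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Gate:
    name: str
    passes: Callable[[dict], bool]


# Hypothetical gates over hypothetical verification signals.
DEFAULT_GATES = (
    Gate("ci_green", lambda s: s.get("ci_status") == "success"),
    Gate("pr_merged", lambda s: s.get("pr_merged") is True),
    Gate("acceptance_checks_pass", lambda s: s.get("acceptance_failures") == 0),
)


def verify(signals: dict, gates: tuple[Gate, ...] = DEFAULT_GATES) -> tuple[str, list[str]]:
    """Return the status to set and the names of any failing gates."""
    failing = [g.name for g in gates if not g.passes(signals)]
    return ("done" if not failing else "in_progress"), failing


status, failing = verify({"ci_status": "success", "pr_merged": False})
print(status, failing)  # in_progress ['pr_merged', 'acceptance_checks_pass']
```

A missing signal counts as a failure here, so a Brief never flips to done because nobody reported on a gate.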
These mechanisms aren't PM33-specific. Any closed-loop system that takes the Brief/Loop pattern seriously will need to make the interface legible in similar ways. PM33 just chose specific opinions.
Discussion prompts
- Pick the last 3 mid-sprint scope thrashes your team had. Which mode of handoff was missing (structured Brief? front-load conversation? escalation pathway)?
- What spec-quality habit would your PMs need to develop to make Mode 1 (structured handoff) work reliably?
- Of the six judgment surfaces above, which does your team get right today? Which is handled implicitly, with no named owner?
- The mechanization shift — what does your engineering team currently spend time on that has become mechanizable?
Further reading
- Anthropic Engineering — "Effective harnesses for long-running agents"
- Anthropic Engineering — "How we built our multi-agent research system"
- GitHub — "Spec-driven development with AI"
- Addy Osmani via O'Reilly Radar — "How to Write a Good Spec for AI Agents"
- Atlassian — "Agile Monte Carlo charts" — the mainstreaming signal
- Next module: PM Module 6 — Closed-Loop in Practice