Planning ↔ Engineering in the Loop Era

In this module

Learning objectives
The classic PM ↔ Engineering interface — what it was doing
What changes when AI agents execute — and what doesn't
The new shape of the interface — three handoff modes
What PMs still owe engineering — even more, not less
What's now automated that used to be human — capacity, dependencies
Where human judgment remains load-bearing — the boundaries
Sidebar: how PM33 makes the interface legible
Discussion prompts

Learning objectives

After this module you should be able to:

Describe the three handoff modes between planning and engineering in the loop era
Articulate what PMs still owe engineering (and why it's more, not less)
Identify what's automated (capacity-aware scheduling, dependency surfacing) and what isn't
Locate the human-judgment boundary in your own team's interface

The classic PM ↔ Engineering interface

In the Agile era, the PM ↔ Engineering interface was a structured set of conversations across a sprint cycle:

Conversation	Frequency	Primary content
Backlog grooming	Weekly	What's coming, is it ready, what's blocked
Sprint planning	Bi-weekly	What fits, who does what, dependencies
Daily standup	Daily	What's progressed, what's blocked
Sprint demo	Bi-weekly	What shipped, did it land
Retrospective	Bi-weekly	What hurt, what to improve
Ad-hoc (Slack, hallway)	Constant	Clarifications, scope changes, blocker escalation

The interface was high-touch, high-bandwidth, and human-to-human throughout. PMs and engineers built relationships through the cadence. Trust was personal.

This worked well at human-team scale. When some of the team's work becomes AI-executed, the interface shape changes.

What changes when AI agents execute — and what doesn't

The biggest single change: each conversation between PM and engineer becomes denser, and conversations happen less often overall.

When a senior engineer reviews 12 stories in a sprint planning meeting, each gets ~5 minutes. When the same engineer reviews 3 Briefs that will be AI-executed, each gets 20+ minutes, because the spec has to be good enough to survive execution by an agent that will not fill gaps with judgment. The total is roughly the same hour of review, concentrated on fewer, denser artifacts.

What changes:

Aspect	Agile era	Loop era
Conversation frequency	High (multiple per week)	Lower (per Brief, not per sprint)
Conversation depth	Medium (humans fill gaps)	Higher (specs are load-bearing)
Status updates	Verbal (standup)	Lifecycle events (audit log)
Demo cadence	Sprint-bounded	Continuous (outcome attribution)
Decision authority for scope changes	PM (often)	PM, but with structural cost — scope change interrupts agent execution
Engineer-side "I'd interpret it this way"	Constant	Rare and explicit (asked via structured channels)

What does NOT change:

The PM still owns the "why" and the "what"
The engineer still owns the "how" — including the harness, the Brief verification gates, and which specialists execute which Briefs
Cross-functional decisions (design trade-offs, security concerns, accessibility) still require human conversations
Customer escalations route through the PM first
Engineering escalations (tech debt, incidents, infrastructure decisions) route through engineering leadership first

The shape of the relationship changes; the underlying division of responsibility doesn't.

The new shape of the interface — three handoff modes

In the loop era, PM ↔ Engineering handoffs split into three modes with different bandwidth and ceremony:

Mode 1: Structured handoff (most common)

The PM authors a Brief and an AI specialist picks it up for execution, so engineering does not implement it directly. Engineering is involved at three points:

The Brief is filed — engineering reviews structure, flags missing dependencies, signs off
A verification gate fails — engineering investigates whether the gate is wrong or the Brief is wrong
The PR lands — engineering does code review on the diff before merge

Bandwidth: low (structured artifact, audit log, code review). The engineer's time per Brief is concentrated at filing and at PR review.

Mode 2: Conversational handoff (load-bearing for novel work)

When a Brief involves genuinely novel architecture, novel data flow, or novel external integration, the PM and engineering have a conversation BEFORE the Brief is filed. The conversation produces:

A specification depth the AI specialist can't be expected to fill from priors
A specialist + LLM tier choice that reflects the novelty (often Opus, often a specific specialist class)
A dependency map that the scheduler can use

Bandwidth: high during the conversation, low after. The conversation replaces what would have been mid-sprint scope thrash in the Agile era.

Mode 3: Escalation handoff (rare, important)

When something goes structurally wrong — repeated verification failures, AR(1) priors drifting, harness configuration breaking — the Loop Master escalates to engineering. This is genuinely rare in steady state. When it happens, it requires the highest-bandwidth conversation of the three modes.

Bandwidth: high, urgent, time-bounded. Treated like an incident.

The pattern: the interface gets stratified. Routine work has structured handoff. Novel work has conversational handoff at the front. System-level problems get escalation handoff. Each mode has its own bandwidth budget and shouldn't be conflated.
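The stratification above can be sketched as a small decision function. This is a minimal illustration, not PM33's implementation: the `WorkSignal` fields and the `choose_handoff_mode` name are hypothetical, chosen only to mirror the triggers named in the three modes.

```python
from dataclasses import dataclass
from enum import Enum


class HandoffMode(Enum):
    STRUCTURED = "structured"          # Mode 1: routine work
    CONVERSATIONAL = "conversational"  # Mode 2: novel work, talk before filing
    ESCALATION = "escalation"          # Mode 3: system-level problem


@dataclass
class WorkSignal:
    """Hypothetical inputs; field names are illustrative, not a real API."""
    novel_architecture: bool = False
    novel_data_flow: bool = False
    novel_external_integration: bool = False
    repeated_verification_failures: bool = False
    harness_config_broken: bool = False


def choose_handoff_mode(signal: WorkSignal) -> HandoffMode:
    # System-level problems take priority: they are treated like incidents.
    if signal.repeated_verification_failures or signal.harness_config_broken:
        return HandoffMode.ESCALATION
    # Any kind of novelty means PM and engineering talk before the Brief is filed.
    if (signal.novel_architecture or signal.novel_data_flow
            or signal.novel_external_integration):
        return HandoffMode.CONVERSATIONAL
    # Everything else is routine and flows through the structured artifact.
    return HandoffMode.STRUCTURED
```

Checking escalation first is a design choice in this sketch: a novel Brief that is also failing verification repeatedly is handled as a system-level problem, which keeps the modes from being conflated.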

What PMs still owe engineering — even more, not less

A common misread of the loop era is "AI does the work, so PMs can write looser specs." The opposite is true. The PM's spec quality matters MORE because the AI specialist can't fill ambiguity with judgment.

What PMs still owe engineering, and what's increased in the loop era:

What's owed	Important in the Agile era?	Importance in the loop era
Clear acceptance criteria	Yes	Critical — machine-verifiable, not prose
Strategic context (why this Brief exists)	Yes	Critical — informs LLM tier choice and specialist routing
Explicit dependencies (UUIDs, not prose)	Helpful	Critical — the scheduler depends on it
Performance budget	Sometimes	Critical — sets the constraint the specialist optimizes against
Side-effect disclosure (auth, schema, audit)	Often missed	Critical — drives specialist policy override (security-auditor on auth Briefs)
Edge-case enumeration	Often missed	Critical — the agent will not enumerate them on its own
outcomeHook (predicted metric movement)	Rarely required	Important — closes the loop; engineering wants to see this too

The shift: ambiguity that used to be resolved during execution now must be resolved during specification. That work doesn't disappear — it moves left. PMs do MORE specification work per Brief than they did per Story. They do it for FEWER Briefs (because each one carries more) but with HIGHER quality per spec.
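One way to make "resolve ambiguity during specification" concrete is a completeness check that runs before a Brief is filed. The sketch below is an assumption-heavy illustration: the `Brief` fields mirror the rows of the table above, but the field names, the millisecond unit, and the `spec_gaps` helper are hypothetical and do not describe PM33's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional
from uuid import UUID


@dataclass
class Brief:
    """Hypothetical Brief shape; fields mirror the table, not a real schema."""
    title: str
    strategic_context: str
    acceptance_criteria: List[str] = field(default_factory=list)
    dependencies: List[UUID] = field(default_factory=list)  # UUIDs, not prose
    performance_budget_ms: Optional[int] = None
    # None means "not disclosed"; an empty list means "disclosed: no side effects".
    side_effects: Optional[List[str]] = None
    edge_cases: List[str] = field(default_factory=list)
    outcome_hook: Optional[str] = None


def spec_gaps(brief: Brief) -> List[str]:
    """List the owed items that are still missing from a Brief."""
    gaps = []
    if not brief.strategic_context.strip():
        gaps.append("strategic context")
    if not brief.acceptance_criteria:
        gaps.append("acceptance criteria")
    if brief.performance_budget_ms is None:
        gaps.append("performance budget")
    if brief.side_effects is None:
        gaps.append("side-effect disclosure")
    if not brief.edge_cases:
        gaps.append("edge-case enumeration")
    if brief.outcome_hook is None:
        gaps.append("outcomeHook")
    return gaps
```

A check like this only catches missing fields. Whether the acceptance criteria are the right ones remains a human call, and an empty dependency list is indistinguishable here from a forgotten one.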

Anthropic's harness blog (November 2025) is the clearest statement of this:

"Without detailed task descriptions, agents duplicate work, leave gaps, or fail to find necessary information."

GitHub's Spec Kit (September 2025) makes the same point with a sharper edge:

"A vague prompt like 'add photo sharing to my app' forces the model to guess at potentially thousands of unstated requirements."

PMs who internalize this become indispensable. PMs who don't end up writing Stories in Brief clothing (Module 3's failure mode), which produces expensive rework that gets blamed on "the AI doesn't understand what we wanted."

What's now automated that used to be human

Several things that took PM ↔ Engineering bandwidth in the Agile era are now mechanized:

Was human bandwidth	Now mechanized
"How much capacity does frontend have this sprint?"	Capacity-aware scheduler reads team config + leave + velocity history
"Does BRIEF-X depend on BRIEF-Y?"	Dependency graph surfaces this in the scheduler proposal
"Where is BRIEF-X right now?"	Lifecycle event audit log
"Did BRIEF-X land?"	AutoDoneVerificationService flips status based on verification signals
"What's the team's velocity this quarter?"	Outcome attribution dashboard (replaces velocity entirely)
"What's the AR(1) confidence on BRIEF-X landing this week?"	Computed automatically

The pattern: factual questions are mechanized; judgment questions remain human. The PM and engineer don't spend time on "where are we?" anymore. They spend their saved time on the judgment questions that were always the high-value work.
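The first two rows of the table (capacity and dependencies) can be sketched with the standard library alone. This is a toy under stated assumptions, not the scheduler the module describes: `propose_schedule`, the effort-point unit, and the single flat weekly capacity are all invented for illustration, and a real scheduler would also read leave and velocity history.

```python
from graphlib import TopologicalSorter


def propose_schedule(effort, depends_on, weekly_capacity):
    """Assign each Brief to the earliest week that respects dependencies and capacity.

    effort: {brief_id: effort_points}
    depends_on: {brief_id: set of prerequisite brief_ids}
    Names and units are illustrative assumptions.
    """
    for brief, points in effort.items():
        if points > weekly_capacity:
            raise ValueError(f"{brief} needs {points} points; capacity is {weekly_capacity}")
    for brief, deps in depends_on.items():
        unknown = set(deps) - set(effort)
        if unknown:
            raise ValueError(f"{brief} depends on unknown Briefs: {sorted(unknown)}")

    # static_order() yields prerequisites first and raises CycleError on a cycle.
    graph = {brief: depends_on.get(brief, set()) for brief in effort}
    week_of, used = {}, {}
    for brief in TopologicalSorter(graph).static_order():
        # A Brief starts no earlier than the week after its latest prerequisite.
        week = max((week_of[d] + 1 for d in graph[brief]), default=0)
        while used.get(week, 0) + effort[brief] > weekly_capacity:
            week += 1
        week_of[brief] = week
        used[week] = used.get(week, 0) + effort[brief]
    return week_of
```

For example, with `effort = {"A": 3, "B": 3, "C": 2}`, `depends_on = {"B": {"A"}}`, and a capacity of 5, A and C land in week 0 and B in week 1. A circular dependency surfaces as an exception instead of a mid-sprint surprise.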

Atlassian's productization signal (2024-2025) is worth noting: Jira itself is building probabilistic forecasting (Monte Carlo charts) as a built-in feature. The mechanization of factual delivery questions is becoming a tooling commodity.

Where human judgment remains load-bearing

Despite the mechanization above, several judgment surfaces remain stubbornly human:

Should we build this at all? — strategic judgment, customer empathy, market timing. Not delegatable.
Is this Brief specified well enough to ship? — the spec-quality call requires human judgment about what's worth saying explicitly.
Is the outcomeHook the right metric? — measuring the wrong thing is worse than measuring nothing. Picking the right metric is taste.
When to override the scheduler proposal? — three legitimate reasons (private information, missed dependency, strategic objective changed). Recognizing them requires context the scheduler doesn't have.
Is the verification gate wrong, or is the work wrong? Investigating a Brief that won't reach done status requires human judgment about the gate's design.
Cross-functional disagreements — eng vs. design vs. PM. Humans broker.

A useful mental model: mechanize the factual layer; defend the judgment layer. The PM ↔ Engineering interface in the loop era spends almost all its time in the judgment layer, because the factual layer is largely automated.

Sidebar: how PM33 makes the interface legible

PM33's design opinions on the PM ↔ Engineering interface:

The Brief schema makes the spec contract explicit. Engineering can read a Brief and immediately know: what's the input contract, what's the output contract, what are the side effects, what's the performance budget, what's the dependency graph. There's no "ask the PM what they meant."
Lifecycle events replace standup. When a Brief transitions states, a structured event fires. Engineering sees the audit log; PM sees the morning summary. The "where are we?" question is answered without a meeting.
The scheduler shows its work. pm33_optimize_priorities returns a proposal with reasons — alignment scores, AR(1) confidence per Brief, capacity constraints surfaced, dependency edges visible. PMs and engineers can argue with the proposal because the inputs are legible.
Verification gates are documented. When the AutoDoneVerificationService keeps a Brief in in_progress, the gate that's failing is named. Engineering doesn't have to ask "why isn't this done?" — the log says.
Specialist routing is policy-driven, not ad hoc. The matrix (backend-architect at Opus, frontend-developer at Opus with frontend-design skill, security-auditor on auth Briefs) is documented and enforceable. PMs and engineers agree on the matrix once; they don't re-negotiate it per Brief.
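The routing point in the last item can be illustrated as a lookup table plus one override rule. The specialist names and tiers come from the examples in the text. Everything else is assumed for the sketch: the `Route` type, the area labels, and the `route_brief` function are hypothetical, and the tier for the security-auditor is left unset because the text does not state it.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class Route:
    specialist: str
    llm_tier: Optional[str]
    skills: tuple = ()


# Illustrative matrix; the keys are hypothetical area labels.
ROUTING_MATRIX = {
    "backend": Route("backend-architect", "Opus"),
    "frontend": Route("frontend-developer", "Opus", ("frontend-design",)),
}


def route_brief(area: str, side_effects: Set[str]) -> List[Route]:
    """Return the specialists for a Brief; auth side effects add a security-auditor."""
    if area not in ROUTING_MATRIX:
        # No ad hoc routing: an unknown area means the matrix needs a new entry.
        raise KeyError(f"no routing policy for area {area!r}")
    routes = [ROUTING_MATRIX[area]]
    if "auth" in side_effects:
        # Policy override named in the text: security-auditor on auth Briefs.
        routes.append(Route("security-auditor", None))
    return routes
```

Because the override keys off the disclosed side effects, this also shows why side-effect disclosure is owed in the Brief: an undisclosed auth change would never trigger the auditor.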

These mechanisms aren't PM33-specific. Any closed-loop system that takes the Brief/Loop pattern seriously will need to make the interface legible in similar ways. PM33 just chose specific opinions.
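As one example of how a system other than PM33 might make gates legible, the sketch below keeps a Brief in `in_progress` until every gate passes and reports the failing gates by name. The gate names, the signal keys, and the `evaluate` function are invented for illustration and do not describe AutoDoneVerificationService.

```python
from typing import Callable, Dict, List, Tuple

# A gate is a name plus a check over observed verification signals.
# Gate names and signal keys are hypothetical.
Gate = Tuple[str, Callable[[Dict], bool]]

GATES: List[Gate] = [
    ("tests_pass", lambda s: bool(s.get("tests_passed", False))),
    ("pr_merged", lambda s: bool(s.get("pr_merged", False))),
    # A missing measurement or a missing budget fails the gate.
    ("p95_within_budget",
     lambda s: s.get("p95_ms", float("inf")) <= s.get("budget_ms", 0)),
]


def evaluate(signals: Dict) -> Tuple[str, List[str]]:
    """Return (status, failing_gate_names); 'done' only when every gate passes."""
    failing = [name for name, check in GATES if not check(signals)]
    status = "done" if not failing else "in_progress"
    return status, failing
```

Returning the names of the failing gates is the point of the sketch: the log can then answer "why isn't this done?" without a conversation, and engineers can focus on whether the named gate or the work is at fault.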

Discussion prompts

Pick the last 3 mid-sprint scope thrashes your team had. Which mode of handoff was missing (structured Brief? front-loaded conversation? escalation pathway)?
What spec-quality habit would your PMs need to develop to make Mode 1 (structured handoff) work reliably?
Of the six judgment surfaces above, which does your team get right today? Which is handled implicitly, with no named owner?
The mechanization shift — what does your engineering team currently spend time on that has become mechanizable?