Letter No. 01 · 3 min

1,000 story points. One engineer. One sprint.

We crossed a line this week we weren't sure we'd hit this year. What 1,000 SP/contributor/sprint means, and the structural bet that made it possible — connecting AI agents to real work items.

To: Engineering leaders, AI-curious technologists
From: Steve Saper

We crossed a line this week we weren't sure we'd hit this year.

The number

1,000 story points shipped by a single contributor in one sprint.

Not lines of code. Not commits. Story points of estimated work, delivered. Industry baseline for a single engineer is 10–20 SP/sprint; call it 15. So 1,000 is roughly 67× a typical sprint for one engineer (1,000 ÷ 15 ≈ 66.7), or about what a 10-person team would normally ship across a full quarter (10 engineers × 15 SP × roughly 6.5 sprints ≈ 975 SP, assuming two-week sprints). One engineer hit it because the constraint stopped being "typing" and became "deciding what to ship."
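The comparison above can be checked with a few lines of arithmetic. This is a minimal sketch: the 15 SP baseline and the 1,000 SP figure come from the letter, while the two-week sprint and 13-week quarter are assumptions made here for illustration.

```python
# Figures from the letter.
BASELINE_SP_PER_SPRINT = 15   # midpoint of the 10-20 SP/sprint range
OBSERVED_SP = 1_000           # one contributor, one sprint
TEAM_SIZE = 10

# Assumption (not stated in the letter): 13-week quarter, two-week sprints.
SPRINTS_PER_QUARTER = 13 / 2


def multiple_of_baseline(observed_sp, baseline_sp):
    """How many baseline sprints the observed output corresponds to."""
    return observed_sp / baseline_sp


def team_quarter_output(team_size, baseline_sp, sprints_per_quarter):
    """Expected story points for a whole team over one quarter at baseline."""
    return team_size * baseline_sp * sprints_per_quarter


ratio = multiple_of_baseline(OBSERVED_SP, BASELINE_SP_PER_SPRINT)
team_quarter = team_quarter_output(
    TEAM_SIZE, BASELINE_SP_PER_SPRINT, SPRINTS_PER_QUARTER
)

print(f"{ratio:.1f}x baseline")           # 66.7x baseline
print(f"{team_quarter:.0f} SP per team-quarter")  # 975 SP per team-quarter
```

Under a different sprint length the team-quarter figure shifts, so the "full quarter" comparison holds only approximately.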

Sprint output · one engineer
Industry baseline: 15 SP (10–20 SP/sprint is the typical max for one engineer)
This sprint (loop): 1,000 SP (closed loop end-to-end · agents on routed Briefs)

That's the lift the closed loop is for. Briefs are drafted overnight from voice-of-customer (VOC) input, routed by capacity, shipped against machine-verifiable acceptance criteria (AC), and attributed against the outcome metric they promised. The engineer doesn't author the Brief, doesn't manually translate it to Jira, doesn't ping the reviewer, doesn't write the release note. The system does. They make the decisions and ship the code.

The bet — connecting agents to real work items

PM33 is now the first platform where AI agents act on the same work items your team does. That sounds like marketing copy. It isn't. Here's what it means structurally:

Not a coding assistant. Coding assistants take your prompt and write code. They don't know what a Brief is, why it exists, what metric it promised, or whether it shipped. The work item — the thing your team is graded on — is not in the loop.

Not a PM tool with AI bolted on. Most "AI in PM" today is summarization + drafting. The AI suggests; a human carries the work through Jira; the AI has no idea what shipped.

What we built is different. Agents pick up routed Briefs. A Brief is a real work item — capacity-aware, scored against strategic objectives, bound to an outcome metric, audited end-to-end. The agent ships against the Brief's machine-verifiable AC, the PR carries the Brief link, the metric moves or doesn't. Same routing, same review gates, same audit chain whether a human or an agent does the work.

Why that matters:

  • Outcome attribution survives the transition. If a Brief promised that latency drops 30% and the agent ships the PR, you know whether latency dropped — same data shape as a human shipping it. McKinsey's finding that 5.5% of orgs ship measurable AI value is a measurement problem, not an AI capability problem. Connecting agents to real work items is how the measurement happens.
  • The substrate compounds. Briefs from VOC, routed to capacity, shipped, attributed. The next batch of Briefs is drafted from what attribution taught us. The loop tightens each week.
  • Trust is built Brief-by-Brief. Agents earn the right to ship higher-stakes Briefs by track record, the same way humans do. Bad work gets caught at the same gates.
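To make the "same data shape" point concrete, here is a minimal sketch of outcome attribution for a Brief. The `Brief` fields, the `attribute` function, and the sample numbers are all hypothetical, chosen for illustration; they are not PM33's actual schema or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Brief:
    """Hypothetical shape of a Brief bound to an outcome metric."""
    brief_id: str
    metric: str
    promised_change: float  # relative change, e.g. -0.30 for "drops 30%"
    shipped_by: str         # "human" or "agent"


def attribute(brief, before, after):
    """Compare the observed relative change against what the Brief promised."""
    if before == 0:
        raise ValueError("baseline measurement must be non-zero")
    observed = (after - before) / before
    if brief.promised_change < 0:
        met = observed <= brief.promised_change   # promised a decrease
    else:
        met = observed >= brief.promised_change   # promised an increase
    return {
        "brief_id": brief.brief_id,
        "metric": brief.metric,
        "shipped_by": brief.shipped_by,
        "promised_change": brief.promised_change,
        "observed_change": observed,
        "met": met,
    }


# Illustrative values only: p95 latency in milliseconds before and after.
agent_brief = Brief("B-1", "p95_latency_ms", -0.30, "agent")
human_brief = Brief("B-2", "p95_latency_ms", -0.30, "human")

agent_result = attribute(agent_brief, before=200.0, after=130.0)
human_result = attribute(human_brief, before=200.0, after=180.0)
print(agent_result["met"], human_result["met"])
```

The record has the same keys whoever shipped the work, so agent and human results can be compared directly.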

The risk we're holding: agents shipping bad work fast = a lot of bad work fast. The mitigation is structural — human-reviewed Briefs upstream, code review + CI downstream, audit log throughout. If the gates work, the loop works. If they don't, we'll see it before the trust accrues.

What's also shipping

  • Autopilot exits preview — end-to-end loop rolling out to pilot teams this week
  • 4 multi-week harnesses closed — Sprint Platform Unification, Pam Quality, VOC Auto-Triage, Autopilot Loop Pillar
  • A painful one: a 52-feature pillar harness deleted 25 marketing curriculum files outside its declared scope. We call it "absorption." Pre-merge diff verification against a declared file manifest is the structural fix. (Deserves its own letter — soon.)
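Pre-merge diff verification against a declared file manifest could look roughly like the sketch below. The function names, glob patterns, and file paths are hypothetical; in practice the list of changed files might come from something like `git diff --name-only`, which is an assumption here, not a description of the actual harness.

```python
from fnmatch import fnmatchcase


def out_of_scope(changed_files, manifest_patterns):
    """Return changed paths that match none of the declared manifest patterns."""
    return [
        path
        for path in changed_files
        if not any(fnmatchcase(path, pattern) for pattern in manifest_patterns)
    ]


def verify_pre_merge(changed_files, manifest_patterns):
    """Raise if the diff touches anything outside the declared scope."""
    violations = out_of_scope(changed_files, manifest_patterns)
    if violations:
        raise ValueError(f"out-of-scope changes: {violations}")


# Hypothetical manifest and diff, for illustration only.
manifest = ["src/autopilot/*", "tests/autopilot/*"]
changed = [
    "src/autopilot/loop.py",
    "tests/autopilot/test_loop.py",
    "marketing/curriculum/lesson-01.md",  # deleted outside the declared scope
]

print(out_of_scope(changed, manifest))
# ['marketing/curriculum/lesson-01.md']
```

The check treats additions, edits, and deletions alike: any path outside the manifest blocks the merge, which is the failure mode "absorption" describes.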

The bet for the next two weeks

The question 1,000 SP/contributor/sprint poses: is it a one-time peak or a new baseline?

If three more contributors hit it in the next 30 days, it's structural — the platform did the work. If only one engineer keeps hitting it, it's a person, and we mis-attributed. We'll know.
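The decision rule above can be written down as a small check. This is a sketch under stated assumptions: the function name, the contributor names, and the "inconclusive" outcome for one or two additional contributors are introduced here for illustration; the letter only defines the two end cases.

```python
def classify_result(best_sprint_totals, original, threshold=1_000, needed=3):
    """Classify the 1,000 SP result from each contributor's best sprint.

    best_sprint_totals maps contributor name to their best sprint total
    within the observation window (30 days in the letter).
    """
    others = {
        name
        for name, story_points in best_sprint_totals.items()
        if name != original and story_points >= threshold
    }
    if len(others) >= needed:
        return "structural"    # the platform did the work
    if not others:
        return "person"        # only the original engineer hits it
    return "inconclusive"      # assumption: the letter does not cover this case


# Hypothetical data, for illustration only.
window = {"eng_a": 1_050, "eng_b": 1_010, "eng_c": 1_000, "eng_d": 1_200, "eng_e": 40}
print(classify_result(window, original="eng_a"))  # structural
```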

— Steve

Steve Saper