Stripe's Coding Agents Ship 1,300 PRs EVERY Week - Here's How They Do It

Study Guide

Overview

Stripe now ships over 1,300 pull requests every week that are entirely AI-written, with humans reviewing but not writing any of the code. They accomplish this through an internal agent harness called Minions, which combines agentic AI nodes with deterministic workflow steps to achieve reliability at scale. This video breaks down how Minions works, why the pattern of structured AI workflows matters, and how you can apply the same principles to your own coding practice.

Why Stripe Is a Compelling Case Study

Complex codebase: Stripe runs a Ruby backend with a vast number of homegrown libraries that LLMs are generally unfamiliar with.
High stakes: They process over $1 trillion in payment volume per year, so code quality cannot be compromised.
Scale: With 3,400+ engineers shipping 8,000+ PRs per week, the 1,300 AI-written PRs are at most about 16% of total output (1,300 of 8,000+), a growing but still minor share.

If Stripe can build a system reliable enough for their environment, the pattern is applicable to virtually any company.

The Industry Trend: Structured AI Workflow Engines

Stripe is not alone in building these systems. Several major companies have developed their own structured AI workflow harnesses:

Shopify created Roast, an open-source structured AI workflow engine.
Airbnb is using similar approaches, especially for test migrations.
AWS has internal tooling they have begun sharing publicly.

The common thread: these companies are moving away from giving agents full autonomy and toward systems that control the agents.

The High-Level Pattern: Workflows vs. Agents

Key Distinction

Stripe draws a critical distinction between two orchestration primitives:

Workflow: An LLM system that operates via a fixed graph of steps. The model is invoked inside steps the system defines, rather than choosing the steps itself. More deterministic and predictable.
Agent: A simpler loop with tools where the agent defines the workflow in real time by deciding what tool calls to make. More flexible but less predictable.
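
The difference between the two primitives can be sketched in a few lines. This is a minimal illustration, not Stripe's code: the step names, run_workflow, run_agent, and forgetful_policy are all hypothetical, and the "policy" is a plain function standing in for a model.

```python
from typing import Callable, List, Optional

# Hypothetical step names; none of these come from Stripe's tooling.
FIXED_STEPS = ["write_code", "run_lint", "run_tests"]


def run_workflow() -> List[str]:
    """Workflow: the sequence of steps is fixed in code and always runs in full."""
    trace: List[str] = []
    for step in FIXED_STEPS:
        trace.append(step)  # a real system would execute the step here
    return trace


def run_agent(choose_next: Callable[[List[str]], Optional[str]],
              max_steps: int = 10) -> List[str]:
    """Agent: a policy (standing in for the model) picks each next step."""
    trace: List[str] = []
    for _ in range(max_steps):
        step = choose_next(trace)
        if step is None:  # the policy decides it is done
            break
        trace.append(step)
    return trace


def forgetful_policy(trace: List[str]) -> Optional[str]:
    """Stand-in for a model that skips linting."""
    plan = ["write_code", "run_tests"]
    return plan[len(trace)] if len(trace) < len(plan) else None


workflow_trace = run_workflow()
agent_trace = run_agent(forgetful_policy)
```

Here the workflow trace always contains run_lint because the loop is written in code, while the agent trace contains only what the policy chose to do.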

The Core Insight

"The agent isn't controlling the system. The system is controlling the agent."

By combining deterministic nodes (guaranteed steps like linting, type checking, test execution) with agentic nodes (code generation, bug fixing), you get reliability that pure agentic approaches cannot match. The system forces the agent to retry whenever validation fails, rather than trusting the agent to self-validate.

How Stripe Minions Works

Entry Point

Engineers interact with Minions primarily through Slack. They start a thread with detailed context about the issue, including the relevant parts of the codebase the agent should examine. Rich upfront context is critical for one-shot success.

Blueprints: The Workflow Definition

The core concept in Minions is blueprints - workflow definitions that combine agent and deterministic nodes. Stripe describes them as "a collection of agent skills interwoven with deterministic code so that particular sub-tasks can be handled more appropriately."
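
One way to picture a blueprint is as an ordered list of nodes, each tagged as deterministic or agentic. The sketch below is an assumption about the shape of such a structure, not Stripe's implementation: Node, BLUEPRINT, run_blueprint, and the three step functions are hypothetical names, and the agentic step is a stub where a real system would call an LLM agent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, object]


@dataclass
class Node:
    name: str
    kind: str  # "deterministic" or "agentic"
    run: Callable[[State], State]


def curate_context(state: State) -> State:
    # Deterministic: plain code, no model call.
    return {**state, "context": f"docs for: {state['prompt']}"}


def implement(state: State) -> State:
    # Agentic: a real blueprint would invoke an LLM agent here (stubbed).
    return {**state, "diff": f"patch using {state['context']}"}


def lint(state: State) -> State:
    # Deterministic: always runs, whatever the agent did.
    return {**state, "lint_ok": bool(state.get("diff"))}


BLUEPRINT: List[Node] = [
    Node("curate_context", "deterministic", curate_context),
    Node("implement", "agentic", implement),
    Node("lint", "deterministic", lint),
]


def run_blueprint(blueprint: List[Node], prompt: str) -> State:
    """Run every node in order, recording which nodes were visited."""
    state: State = {"prompt": prompt, "visited": []}
    for node in blueprint:
        state = node.run(state)
        state["visited"] = [*state["visited"], node.name]
    return state


result = run_blueprint(BLUEPRINT, "fix flaky refund test")
```

The order of nodes lives in BLUEPRINT, so the agentic step cannot skip or reorder the deterministic ones around it.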

Step 1: Deterministic Context Curation

Before the agent ever runs, a deterministic node handles context curation with two functions:

Documentation gathering: MCP tools are used programmatically (not by an agent) to search tickets, documentation, and relevant information, combining it with the Slack prompt.
Tool selection: From Stripe's massive internal MCP server called Tool Shed (with ~500 tools for internal systems and SaaS platforms), the system selects a relevant subset of tools to give the agent, avoiding tool overload.
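
The tool-selection half of this step can be illustrated with a simple ranking function. The source does not say how Stripe picks the subset, so keyword overlap is used here purely as a stand-in technique; the catalog entries, tokenize, and select_tools are hypothetical.

```python
import re
from typing import Dict, List, Set

# Hypothetical catalog: names and descriptions invented for illustration.
TOOL_CATALOG: Dict[str, str] = {
    "search_tickets": "search issue tickets by keyword",
    "search_docs": "search internal documentation pages",
    "read_ci_logs": "fetch logs for a failed ci build",
    "query_metrics": "query dashboards for service metrics",
    "send_invoice": "create a customer invoice",
}


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def select_tools(prompt: str, catalog: Dict[str, str], limit: int = 3) -> List[str]:
    """Rank tools by word overlap with the prompt; keep up to `limit` that overlap at all."""
    words = tokenize(prompt)
    scored = []
    for name, description in catalog.items():
        tool_words = tokenize(description + " " + name.replace("_", " "))
        score = len(words & tool_words)
        if score > 0:
            scored.append((score, name))
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:limit]]


selected = select_tools(
    "CI build failed, check the logs and related tickets", TOOL_CATALOG
)
```

Because the selection is ordinary code, it runs before the agent starts and costs no tokens; the agent only ever sees the short list.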

Step 2: Agent Implementation

The agent runs in an isolated dev box - an AWS EC2 instance spun up for each Minion run. Key details:

Instances come preloaded with Stripe codebases, lint caching, and everything needed for fast execution.
They follow a "cattle not pets" philosophy - instances are disposable and scaled based on demand.
This approach solves the permissioning and scaling issues that come with running agents in Git worktrees or containers on developer laptops.
A single engineer can have many Minions running in parallel.
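
The disposable, run-in-parallel idea can be shown at toy scale. This is only a loose analogy under stated assumptions: temporary directories and threads stand in for EC2 instances, and run_minion and the task names are hypothetical.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Dict


def run_minion(task: str) -> Dict[str, str]:
    """Run one task in a throwaway workspace that is deleted afterwards."""
    with tempfile.TemporaryDirectory(prefix="minion-") as workspace:
        patch = Path(workspace) / "change.patch"
        patch.write_text(f"patch for {task}\n")  # stand-in for agent output
        result = {"task": task, "workspace": workspace, "output": patch.read_text()}
    # Leaving the `with` block removes the directory: nothing persists between runs.
    return result


tasks = ["fix-typo", "bump-dependency", "migrate-test"]
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(run_minion, tasks))
```

Each run gets its own workspace and leaves nothing behind, which is the property that lets many runs proceed side by side without interfering.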

Step 3: Deterministic Linting and Type Checking

After the agent writes code, a deterministic step runs linting and type checking (using Sorbet for Ruby). Any failures loop back to the agent for fixes.
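
A deterministic check step of this kind might look like the sketch below. It assumes each check is a command whose non-zero exit code means failure; run_checks and feedback_for_agent are hypothetical names, and the two commands are stand-ins so the example runs anywhere.

```python
import subprocess
import sys
from typing import Dict, List


def run_checks(checks: Dict[str, List[str]]) -> Dict[str, str]:
    """Run every check; return {check name: captured output} for the ones that failed."""
    failures: Dict[str, str] = {}
    for name, command in checks.items():
        proc = subprocess.run(command, capture_output=True, text=True)
        if proc.returncode != 0:
            failures[name] = (proc.stdout + proc.stderr).strip()
    return failures


def feedback_for_agent(failures: Dict[str, str]) -> str:
    """Format failures as the message that loops back to the agent."""
    return "\n".join(f"[{name}] {output}" for name, output in failures.items())


# Stand-in commands. In a Ruby repository these might be something like
# `bundle exec rubocop` and `bundle exec srb tc` (an assumption; the source
# only says that Sorbet is used for type checking).
checks = {
    "lint": [sys.executable, "-c", "print('no offenses')"],
    "typecheck": [
        sys.executable,
        "-c",
        "import sys; print('type error in foo.rb'); sys.exit(1)",
    ],
}
failures = run_checks(checks)
message = feedback_for_agent(failures)
```

An empty failures dictionary means the run can move on; anything else becomes the next prompt for the agent.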

Step 4: Deterministic Test Execution

Stripe has over 3 million tests in their CI suite. The system selects a relevant subset of tests to run against the change. If tests fail, the agent gets the failures as feedback and can retry, but at most twice. After that, the run escalates to a human, which prevents infinite loops.
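
The retry cap can be written as a bounded loop. This sketch assumes "two retries" means one initial attempt plus up to two more; run_with_retries, the status strings, and the stub agents and test runner are hypothetical.

```python
from typing import Callable, List, Tuple

MAX_RETRIES = 2  # from the text: at most two retries before a human takes over


def run_with_retries(
    implement: Callable[[str], str],
    run_tests: Callable[[str], List[str]],
    task: str,
    max_retries: int = MAX_RETRIES,
) -> Tuple[str, int]:
    """Return (status, number of agent attempts)."""
    feedback = ""
    for attempt in range(1, max_retries + 2):  # 1 initial attempt + retries
        code = implement(task + feedback)
        failed = run_tests(code)
        if not failed:
            return "ready_for_review", attempt
        feedback = "\nFailing tests: " + ", ".join(failed)
    return "escalated_to_human", max_retries + 1


def stubborn_agent(prompt: str) -> str:
    """Stub agent that never fixes the bug."""
    return "buggy"


def learning_agent(prompt: str) -> str:
    """Stub agent that fixes the bug once it sees test feedback."""
    return "fixed" if "Failing tests" in prompt else "buggy"


def fake_tests(code: str) -> List[str]:
    """Stub test runner: returns the names of failing tests."""
    return [] if code == "fixed" else ["test_refund"]


stuck = run_with_retries(stubborn_agent, fake_tests, "fix refund bug")
fixed = run_with_retries(learning_agent, fake_tests, "fix refund bug")
```

The loop bound is enforced by the system, so an agent that keeps failing cannot keep spending tokens indefinitely.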

Step 5: Human Review

Every Minion run ends with human review. Stripe never "vibe codes" - there is always a human in the loop. The structured workflow makes reviews faster and the code more reliable by the time it reaches an engineer.

Why Determinism Matters

Stripe explains their rationale clearly: "Writing code to deterministically accomplish small decisions we can anticipate - like we always want to lint changes at the end of a run - saves tokens, so it saves on cost at scale. And it gives the agent a little less opportunity to get things wrong."

Key benefits of deterministic nodes:

Token savings: Deterministic steps cost nothing in LLM tokens.
Reliability: LLMs can skip steps or deviate from instructions; deterministic steps guarantee execution.
Compounding reliability: Confining the agent to bounded steps and checking its output with separate deterministic steps keeps one step's mistakes from carrying forward, which improves reliability across the whole system.

The PIV Loop: Applying This Yourself

The video presents the PIV (Plan, Implement, Validate) loop as a generalized version of Stripe's approach:

Planning Phase

Start with initial context (feature request, GitHub issue).
Have the agent plan the implementation.
Iterate with the agent on the plan, especially validation criteria.
Output a structured plan document.

Implementation Phase

Start a fresh context window with only the plan - this keeps the implementation agent focused.
The context window split is a deterministic decision you enforce.
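
The context split can be enforced with very little code. In this sketch, Conversation and start_implementation are hypothetical names, and the messages are invented examples; the point is only that the hand-off function, not the agent, decides what the implementation context contains.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Conversation:
    """A minimal stand-in for an agent's context window."""
    messages: List[str] = field(default_factory=list)

    def add(self, message: str) -> None:
        self.messages.append(message)


def start_implementation(plan_document: str) -> Conversation:
    """Deterministic hand-off: the new context holds the plan and nothing else."""
    fresh = Conversation()
    fresh.add(plan_document)
    return fresh


# Planning phase: the back-and-forth accumulates in one context.
planning = Conversation()
planning.add("Feature request: add CSV export")
planning.add("Draft plan v1")
planning.add("Feedback: add validation criteria")
plan_document = "PLAN: add CSV export. VALIDATE: lint, type check, unit tests."

# Implementation phase: none of the planning chatter carries over.
implementation = start_implementation(plan_document)
```

The planning history stays in its own object, so the implementation agent starts from the plan document alone.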

Validation Phase

Run linting, type checking, and unit tests deterministically.
Loop the agent back to fix any failures.
End with human review and merge.

Getting Started

Study Shopify's Roast: Since Stripe Minions is not open source, Roast serves as a practical reference implementation with quick starts showing deterministic + agentic node combinations.
Build incrementally: You do not need 500 MCP tools or 3 million tests. The pattern works at any scale - even enforcing a fresh context window between planning and implementation adds determinism.
Archon: The presenter mentions Archon as an open-source project evolving into a workflow builder for AI coding workflows, described as "n8n for AI coding."

Key Takeaway

The industry is moving not toward more power to the agents but toward more power to the system controlling the agents. Building structured workflows that combine agentic flexibility with deterministic guarantees is how organizations achieve reliable AI-assisted coding at scale.