Study Guide

Harness Engineering: The Next Evolution of AI Coding

Cole Medin's central claim in this video is that AI coding has moved through three distinct eras, and we are now entering the third. Prompt engineering (2022 to 2024) was about squeezing the best single output from a model. Context engineering followed, focused on curating the perfect context for a single agent so it could handle a larger scope of work. The current era is harness engineering: orchestrating many coding agent sessions together with deterministic glue, validation, and context control. Archon, his newly rewritten open source project, is positioned as the first open source harness builder for AI coding.

What a Harness Actually Is

A harness is the layer that sits above your coding agents (Claude Code, Codex, and eventually others) and orchestrates them. It is the tooling, prompting, and chaining that elevates the capabilities of a single large language model by stringing multiple sessions together and enforcing deterministic behavior at key points. Medin frames this against "AI shepherding," where you manually run a skill, remember to kick off code review, remember to run tests, and generally hold the process in your head. A harness encodes the process once and executes it forever.

The business case is backed by concrete numbers he cites. A raw model generating code produces a PR acceptance rate of roughly 6.7%. Wrap that model in a solid custom harness with validation and context curation and acceptance can climb to nearly 70%. Stripe's internal "Minions" system ships 1,300 AI-only generated pull requests per week by applying exactly these patterns. And with the leaked Claude Code source, we can see that roughly 40% of Anthropic's own coding codebase is harness logic. The signal is unambiguous: the frontier of productivity is not the model, it is the harness around it.

Archon's Mental Model: Nodes and Workflows

Every Archon workflow is a sequence (or graph) of nodes. A node is one of two things:

A prompt node, which sends a prompt into a coding agent session.
A deterministic node, which runs a fixed command — context curation, a validation step, a bash invocation, a test runner.

This hybrid is the point. Some steps (classifying an issue, planning, implementing) benefit from model reasoning. Other steps (pulling the right files into context, running the full test suite, opening a PR) should never be left to agent discretion because the agent will occasionally forget. Deterministic nodes enforce the parts of the software development lifecycle you never want to skip.

Workflows are defined as plain YAML files in the .archon directory. Each workflow has a description (used by the coding agent to decide whether to invoke it, much like a Claude Code skill), a default provider and model, and an ordered list of nodes. Per node, you can override the model — Haiku for cheap classification, Sonnet for planning and implementation — which gives you real control over cost and rate limit exposure.

A Canonical Workflow: Plan to PR

Medin walks through the shape of a typical Archon workflow:

Plan in one session, optionally in a loop with human feedback.
Implement in a fresh context window to remove planning bias.
Run tests deterministically every time.
Retry with the agent if tests fail.
Human approval gate before shipping.
Open a pull request as the final deterministic step.

The critical design choice is that planning and implementation happen in different coding sessions. Starting with a fresh context window prevents the implementer from inheriting the planner's biases and assumptions. This is a pattern you could apply even without Archon, but the harness makes it default.

Setup in Under Five Minutes

Installation is deliberately minimal. You clone the Archon repo, open Claude Code inside it, and type set up Archon. Claude automatically loads the bundled Archon skill, which guides the rest of the process: checking prerequisites (including Bun), asking which target repository to register first (not the Archon repo itself — a separate project you want to run workflows against), and asking which platforms you want to connect (CLI by default, plus optional GitHub, Slack, Telegram).

Credentials are entered in a separate terminal spawned by the setup wizard to avoid piping API keys through the coding agent. You pick a database (SQLite by default, Postgres optional), a coding assistant (Claude now, Codex nearly done, Agent SDK and OpenCode on the roadmap), and an auth method. Using your Anthropic subscription is permitted because Archon runs locally via the Claude Agent SDK. The setup ends by installing the Archon skill into your target project so the coding agent there knows how to invoke the Archon CLI.

Running Workflows

Once installed, usage is almost conversational. In Claude Code inside your target repo you say use Archon to fix issue #1 in GitHub. The skill loads, the agent picks the right workflow (in this case archon-fix-github-issue), and invokes it. The workflow runs as a background process; you can monitor it through Claude Code logs, the Archon web UI on localhost:5178, or both.

The web UI visualizes the exact DAG defined in YAML. You see each node light up — web research, classification, investigation, implementation, validation, PR creation — along with every tool call the underlying Claude Code session made. The UI also hosts a chat agent with Archon's registered projects and workflows injected as context, so you can say fix GitHub issue #3 and it routes correctly without you naming the repo.

Parallelism Is a First-Class Feature

One of the most striking demos: Medin says use Archon to fix GitHub issues 5, 7, 8, 9, 10, and 11 and six workflow runs spin up simultaneously as background processes, each progressing through classification, investigation, and implementation in parallel. The end result is eight new pull requests ready for review. This is where harnesses differentiate from single-agent loops — you are no longer bottlenecked on one context window or one developer's attention.

The Bundled Workflows

Archon ships with a meaningful library out of the box:

Fix GitHub Issue — investigation, fix, validation, PR.
Adversarial Dev Harness — pairs implementation with aggressive review.
Comprehensive PR Review.
Idea to PR — end-to-end feature delivery.
Interactive PRD — human-in-the-loop spec creation that pauses at designated nodes for your input.
Ralph Loop — the well-known continuous-execution harness, now as YAML.
Workflow Builder — an Archon workflow that builds Archon workflows.

Building Your Own Harness

The workflow-builder workflow is the most "meta" piece, and it matters because custom harnesses are the whole point. Open Claude Code in the Archon repo and say use the workflow builder workflow to help me make an Archon workflow. It asks what you are trying to build, does research, and emits a YAML file you can immediately run against any registered project. In the demo Medin pastes the Beads repo (persistent structured memory for coding agents) and asks Archon to absorb its ideas into a workflow — exploration, decomposition into tasks, implementation in a loop with progress tracking, final validation. The result is a Beads-flavored harness in minutes.

Key Takeaways

Harness engineering is the current frontier. Models are commoditizing; the scaffolding around them is where multiplicative gains come from.
Hybrid is the secret. Deterministic nodes enforce what models forget; prompt nodes handle what rules cannot express.
Separate planning from implementation. Fresh context windows remove bias.
Per-node model selection is a real cost lever. Haiku for classification, Sonnet for reasoning, Opus for the hardest steps.
Parallelism is the payoff. Once your process is encoded, running six workflows at once becomes a single command.
Build your own. Stripe Minions is proprietary; Archon is open source. Encode your SDLC instead of borrowing someone else's.

For practitioners already fluent in Claude Code skills and commands, Archon is the natural next layer up. The payoff is not faster typing, it is a repeatable, reviewable, parallelizable process that you can trust with real work.

The Next Evolution of AI Coding Is Harnesses - Here's How to Build Them