Anthropic Just Dropped a Masterclass on Building Agent Harnesses (for Large Codebases)

Study Guide

Why the Harness Matters as Much as the Model

Most AI coding tutorials assume small projects. Real engineering happens in codebases with tens or hundreds of thousands of lines, decades-old legacy code, and architectures spread across dozens of repositories. Anthropic recently published guidance on using Claude Code in exactly that environment, and Cole Medin walks through both the article and a concrete demo repo that puts every strategy into practice. The central insight: model benchmarks matter, but the harness (the ecosystem of context and tools around the model) matters just as much. Cole calls this the "AI layer" of a codebase, sitting alongside code and tests as a first-class component.

How Claude Code Navigates a Codebase

Claude Code uses agentic search rather than traditional RAG or semantic indexing. There is no vector index to keep in sync. Instead, Claude navigates like an engineer would: grep, folder traversal, and command-line tools. The upside is zero index maintenance. The trade-off is that Claude needs enough starting context to know where to look, which is exactly what the AI layer provides.
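To make the contrast with an index concrete, here is a minimal sketch of grep-style search over a directory tree. It is not Claude Code's implementation; the function name grep_tree and its parameters are hypothetical. It only shows that this kind of search reads files on demand and keeps no index in sync.

```python
import os
import re
from pathlib import Path


def grep_tree(root, pattern, suffixes=(".py",), max_hits=20):
    """Return (path, line_number, line) tuples for lines matching `pattern`."""
    regex = re.compile(pattern)
    suffixes = tuple(suffixes)
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune directories an engineer would not search by hand either.
        dirnames[:] = sorted(d for d in dirnames if d not in {".git", "node_modules"})
        for name in sorted(filenames):
            if not name.endswith(suffixes):
                continue
            path = Path(dirpath) / name
            try:
                text = path.read_text(encoding="utf-8", errors="ignore")
            except OSError:
                continue  # unreadable file: skip it rather than abort the search
            for lineno, line in enumerate(text.splitlines(), start=1):
                if regex.search(line):
                    hits.append((str(path), lineno, line.strip()))
                    if len(hits) >= max_hits:
                        return hits
    return hits
```

Every call walks the tree from scratch, which is why good starting context (a narrower root or a more precise pattern) matters so much.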

The AI Layer: Seven Components

Every codebase now has three parts: code, tests, and the AI layer. The AI layer is made up of seven components, each mapping to a strategy:

Global rules (CLAUDE.md files)
Hooks
Skills
Plugins
Language Server Protocol (LSP)
MCP servers
Sub agents

1. Lean and Layered Global Rules

The foundation of the AI layer is your CLAUDE.md. It is loaded for the entire session, so context curation here matters more than anywhere else.

Keep it lean

Thousand-line rule files actively hurt performance. Studies show overloading the LLM with context degrades coding agent quality. Stick to: what the codebase is about, a brief tech stack/architecture note, general conventions, and gotchas like test commands or dev server startup.

Layer with subdirectory CLAUDE.mds

At launch, Claude walks up the directory tree from the working directory and loads every CLAUDE.md it finds; CLAUDE.md files in subdirectories below the working directory are pulled in when Claude works with files there. Put root-level rules at the repo root, and put service- or module-specific conventions in a CLAUDE.md inside that subdirectory. This is progressive disclosure: load only the conventions relevant to where you are working.

Initialize Claude in a subdirectory

If you already know the work is scoped to one part of the codebase (say, the API service), cd there and launch Claude Code from that directory. Claude will treat it as the current working directory and still load the root CLAUDE.md by walking up. You are doing the navigation for it.
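The upward lookup can be sketched in a few lines. This is an illustration of the idea rather than Claude Code's actual loader; collect_rule_files and its stop_at parameter are hypothetical names.

```python
from pathlib import Path


def collect_rule_files(start, stop_at=None, filename="CLAUDE.md"):
    """Return rule files found from `start` upward, broadest first.

    `stop_at` (hypothetical) bounds the walk, for example at the repo root.
    """
    current = Path(start).resolve()
    boundary = Path(stop_at).resolve() if stop_at is not None else None
    found = []
    for directory in [current, *current.parents]:
        candidate = directory / filename
        if candidate.is_file():
            found.append(candidate)
        if directory == boundary:
            break
    # Reverse so root-level rules come before service-specific ones.
    return list(reversed(found))
```

Starting from an API service directory, this returns the root file followed by the service's own file. A sibling service's CLAUDE.md never appears, because it is not on the path up to the root.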

Build a codebase map when structure doesn't speak for itself

For repos where the directory layout isn't self-explanatory, include a section in your global rules outlining each subdirectory and its purpose so Claude can help you find the right starting point.
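A small script can keep such a map honest by listing the top-level directories and flagging any that lack a description. The sketch below assumes you maintain the descriptions by hand in a dictionary; render_codebase_map and the output layout are hypothetical.

```python
from pathlib import Path


def render_codebase_map(root, purposes):
    """Return a plain-text map of the top-level directories under `root`."""
    lines = ["Codebase map"]
    for entry in sorted(Path(root).iterdir()):
        # Skip files and hidden directories such as .git.
        if not entry.is_dir() or entry.name.startswith("."):
            continue
        purpose = purposes.get(entry.name, "TODO: describe this directory")
        lines.append(f"{entry.name}/ : {purpose}")
    return "\n".join(lines)
```

The TODO marker makes a new, undocumented directory visible the next time the map is regenerated.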

2. Self-Improving Hooks

Most teams treat hooks defensively, using PreToolUse to stop Claude from editing protected paths or deleting files. The more valuable use is continuous improvement.
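For reference, the defensive pattern looks roughly like the sketch below. It assumes the hook receives a JSON event on stdin with tool_name, tool_input.file_path, and cwd fields, and that exit code 2 blocks the tool call and returns stderr to Claude; check the hooks documentation for your version. The protected prefixes are hypothetical.

```python
import json
import os
import sys

PROTECTED_PREFIXES = ("migrations/", "infra/prod/")  # hypothetical protected paths
EDITING_TOOLS = {"Edit", "Write", "MultiEdit"}


def decide(event, protected=PROTECTED_PREFIXES):
    """Return (exit_code, message) for a PreToolUse event dict."""
    if event.get("tool_name") not in EDITING_TOOLS:
        return 0, ""
    cwd = event.get("cwd", ".")
    file_path = event.get("tool_input", {}).get("file_path", "")
    # Normalize to a repo-relative, forward-slash path before comparing.
    rel = os.path.relpath(os.path.join(cwd, file_path), cwd).replace(os.sep, "/")
    for prefix in protected:
        if rel.startswith(prefix):
            return 2, f"Blocked: {rel} is under protected path {prefix}"
    return 0, ""


def main():
    """Entry point when installed as a hook: read the event, report, exit."""
    code, message = decide(json.load(sys.stdin))
    if message:
        print(message, file=sys.stderr)
    sys.exit(code)
```

To install it as a hook script, call main() at the bottom of the file. Keeping the decision in a pure function makes the guard easy to unit test without a running session.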

Stop hook: propose CLAUDE.md updates

A stop hook runs at the end of each Claude turn. It can launch a separate headless Claude session that reviews what changed, compares it to the existing CLAUDE.md files, and proposes updates while the context is fresh. The output goes to a markdown review file you can act on later. This solves a real problem: CLAUDE.md files go stale as the codebase evolves.
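A rough sketch of that stop hook follows. It is not the code from Cole's repo. It assumes the claude CLI is on PATH with its -p (print) mode available, that the Stop event may carry a stop_hook_active flag, and that an environment variable is enough to stop the headless session from re-triggering the hook. The variable name, review file name, and prompt wording are all hypothetical.

```python
import os
import subprocess
from pathlib import Path

GUARD_ENV = "AI_LAYER_REVIEW_RUNNING"  # hypothetical re-entrancy guard


def should_review(event, environ):
    """Skip the review when the hook is already active or we launched this session."""
    if event.get("stop_hook_active"):
        return False
    return environ.get(GUARD_ENV) != "1"


def build_review_prompt(changed_files, rule_files):
    """Compose the instruction given to the headless review session."""
    lines = [
        "Review the files changed in this session against the rule files below.",
        "Propose additions or corrections as a markdown list. Do not edit any file.",
        "",
        "Changed files:",
        *[f"- {path}" for path in changed_files],
        "",
        "Rule files:",
        *[f"- {path}" for path in rule_files],
    ]
    return "\n".join(lines)


def run_review(event, review_path="ai-layer-review.md"):
    """Run the headless review and append its proposals to the review file."""
    if not should_review(event, os.environ):
        return False
    diff = subprocess.run(["git", "diff", "--name-only", "HEAD"],
                          capture_output=True, text=True, check=False)
    changed = [line for line in diff.stdout.splitlines() if line]
    if not changed:
        return False
    rules = sorted(str(p) for p in Path(".").rglob("CLAUDE.md"))
    result = subprocess.run(["claude", "-p", build_review_prompt(changed, rules)],
                            capture_output=True, text=True, check=False,
                            env={**os.environ, GUARD_ENV: "1"})
    with open(review_path, "a", encoding="utf-8") as handle:
        handle.write(result.stdout + "\n")
    return True
```

Writing proposals to a file instead of editing CLAUDE.md directly keeps a human in the loop, which matches the review-later workflow described above.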

Start hook: load team-specific context

A start hook (the SessionStart event) can dynamically pull in context at session launch: Git status, recent commits, Confluence docs for the team or service you're working on. Different developers get the right setup without manual configuration.
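A minimal sketch of the Git half of that idea is below. It assumes whatever the hook prints to stdout is added to the session context, which is worth confirming in the hooks documentation. Fetching Confluence pages is left out because it depends on your credentials and API setup. The function names are hypothetical.

```python
import subprocess


def run_git(args):
    """Run a git subcommand and return its trimmed stdout."""
    result = subprocess.run(["git", *args], capture_output=True, text=True, check=False)
    return result.stdout.strip()


def build_session_context(run=run_git, commit_count=5):
    """Assemble a short context block; `run` is injectable for testing."""
    sections = {
        "Working tree status": run(["status", "--short"]),
        f"Last {commit_count} commits": run(["log", "--oneline", f"-n{commit_count}"]),
    }
    lines = []
    for title, body in sections.items():
        lines.append(f"## {title}")
        lines.append(body or "(nothing to report)")
    return "\n".join(lines)
```

A hook script would end with print(build_session_context()). Team-specific sources can be added as extra entries in the sections dictionary.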

3. Scoped Skills

Skills are reusable workflows (think "process prompts") that Claude loads when it decides the description matches the task. In large codebases you'll have dozens or hundreds of task types, so progressive disclosure is essential.

Use the path parameter

Skills accept a lesser-known path parameter that scopes activation to specific paths. A skill for "add an API route" only activates when Claude touches files in the API service directory. This enforces conventions exactly where they apply.
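Conceptually, path scoping is a glob match between the file being touched and the patterns a skill declares. The sketch below models that with Python's fnmatch; it is not how Claude Code implements skills, and the Skill class, field names, and example paths are hypothetical.

```python
from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass(frozen=True)
class Skill:
    name: str
    description: str
    path_globs: tuple = ()  # empty means the skill is eligible everywhere


def active_skills(skills, touched_path):
    """Return names of skills eligible for a file path (forward slashes)."""
    return [
        skill.name
        for skill in skills
        if not skill.path_globs
        or any(fnmatch(touched_path, glob) for glob in skill.path_globs)
    ]


SKILLS = [
    Skill("add-api-route", "Step-by-step workflow for new routes", ("services/api/*",)),
    Skill("write-changelog", "How to record a user-facing change"),
]
```

With these definitions, a path under services/api/ makes both skills eligible, while a path in another service leaves only the unscoped one. Note that fnmatch's * also matches path separators, so services/api/* covers nested directories.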

Rules vs. workflows

Cole's distinction: CLAUDE.md holds rules (every route is registered here). Skills hold workflows (here's how to add a new route step by step). Both should be scoped to the slice of the codebase where they matter.

4. Language Server Protocol via MCP

A language server, speaking the Language Server Protocol (LSP), gives Claude the same navigation a developer has in their IDE: jump to definition, find references, type hints. For very large codebases, raw grep is slow and token-inefficient. A symbol-level search is far more directed.

Cole's demo wires a language server into Claude Code through a local MCP server that exposes tools like where_is and find_references. Once you cross into six-figure line counts, you need something like this. Big companies often build custom language servers for their internal stacks.
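To show why symbol-level search is more directed than grep, here is a toy version of the two tools using Python's ast module as a stand-in for a real language server. It handles a single Python source string, matches by name only, and has no MCP wiring. The function names come from the demo, but the bodies are illustrative assumptions.

```python
import ast

SAMPLE = (
    "def send_email(to):\n"
    "    return to\n"
    "\n"
    "def notify(user):\n"
    "    # send_email is only mentioned here\n"
    "    return send_email(user)\n"
)


def where_is(source, symbol, filename="<memory>"):
    """Return (filename, line) for each function or class named `symbol`."""
    definitions = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
    hits = [
        (filename, node.lineno)
        for node in ast.walk(ast.parse(source))
        if isinstance(node, definitions) and node.name == symbol
    ]
    return sorted(hits)


def find_references(source, symbol, filename="<memory>"):
    """Return (filename, line) for each use of `symbol` as a name or attribute."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name) and node.id == symbol:
            hits.append((filename, node.lineno))
        elif isinstance(node, ast.Attribute) and node.attr == symbol:
            hits.append((filename, node.lineno))
    return sorted(hits)
```

On SAMPLE, where_is reports line 1 and find_references reports line 6 only. A plain text search for send_email would also return the comment on line 5, and that kind of noise costs tokens at scale.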

5. Sub Agents for Exploration

Anthropic's advice is simple: use sub agents to split exploration from editing. Exploration tasks (web research, codebase discovery) can balloon to hundreds of thousands of tokens. If your primary session does that work directly, the context window is bloated before any editing begins.

Dispatch exploration to a sub agent with its own context window. It returns a summary back to the primary session, which then has clean context for the actual edits. Claude Code now ships with a built-in explorer sub agent, so you don't have to define your own to start.
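The context savings can be illustrated with a toy model in which a function call stands in for the sub agent and word counts stand in for tokens. Everything here is a simplification for illustration; none of it reflects Claude Code's internals.

```python
def approx_tokens(text):
    """Crude token estimate: whitespace-separated words."""
    return len(text.split())


def explore(question, documents):
    """Stand-in for a sub agent: reads everything, returns only a summary."""
    transcript = []
    relevant = []
    for name, body in documents.items():
        transcript.append(body)  # stays inside the sub agent's own context
        if question.lower() in body.lower():
            relevant.append(name)
    found = ", ".join(sorted(relevant)) or "no files"
    summary = f"'{question}' appears in: {found}"
    return summary, sum(approx_tokens(text) for text in transcript)


def primary_session(question, documents):
    """Return (tokens held by the primary session, tokens the explorer read)."""
    context = [f"Task: find where {question} is handled, then edit it."]
    summary, explored_tokens = explore(question, documents)
    context.append(summary)  # only the summary crosses the boundary
    return sum(approx_tokens(text) for text in context), explored_tokens
```

The primary session ends up holding the task line and a one-line summary, while the explorer's total grows with the size of what it read. That gap is the context the primary session keeps free for editing.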

6. The Plugin Cole Built

Cole packaged the key pieces from his demo repo as a Claude Code plugin you can drop into any codebase:

The self-improving stop hook
An explorer sub agent
The codebase search MCP server (LSP)
An example path-scoped skill

Install via /plugin marketplace add <path-to-tooling-folder> then /plugin install helpline-ai-layer@helpline-tooling.

7. Assign Ownership for AI Layer Adoption

Anthropic's closing advice, which Cole strongly endorses from his consulting work: have a small team champion the initial AI layer build for the organization. Run a quiet investment period, produce something foundational (rules, skills, LSP, MCP), then roll it out. The alternative (every developer evolving their own AI layer in isolation) leads to inconsistent results and disappointment when new users first try the tool.

Key Takeaways

Harness matters as much as the model — model benchmarks are not the whole story; context and tooling around the model often determine success
Treat the AI layer as a first-class part of every codebase, alongside code and tests
Keep CLAUDE.md lean and layer it via subdirectory files; bloated rule files hurt performance
Use hooks for continuous improvement, not just guardrails
Scope skills to paths so conventions activate only where they apply
Add LSP via MCP once you cross into hundreds of thousands of lines of code
Dispatch exploration to sub agents to protect the primary context window
Standardize the AI layer organizationally instead of letting everyone reinvent it

Glossary

Harness / AI Layer — the set of context and tools wrapped around a coding agent: rules, skills, hooks, MCP servers, sub agents, LSP integrations, plugins
Agentic Search — navigating a codebase via command-line tools (grep, ls) rather than a pre-built semantic index
CLAUDE.md — Claude Code's global rules file; can exist at multiple levels of the directory tree
Progressive Disclosure — loading specialized context only when relevant, instead of dumping everything up front
Hook — a script that runs at lifecycle events (PreToolUse, Stop, SessionStart); can be defensive or self-improving
Skill — a reusable prompt/workflow Claude Code can invoke; can be scoped to specific paths
LSP (Language Server Protocol) — the protocol that powers IDE navigation features like go-to-definition and find-references
MCP (Model Context Protocol) — the protocol for exposing tools and external capabilities to Claude
Sub Agent — a delegated Claude session with its own context window, used for exploration tasks that would otherwise pollute the primary context