Cole Medin, who has logged over 2,000 hours using Claude Code since before its general availability in May 2025, presents his battle-tested framework for getting reliable results from AI coding assistants. The core insight: roughly 80% of the time a coding agent produces bad output, it is not a model problem. It is a context management problem. The WISC framework (Write, Isolate, Select, Compress) is designed to keep your context window lean while still giving the agent everything it needs.
Large language models suffer from the "needle in a haystack" problem. As the context window fills up, the model struggles to retrieve the specific piece of information it needs. Research from the Chroma technical report confirms this: just because tokens fit in the window does not mean you should fill it. "Distractor" content (similar but not identical patterns in a codebase) leads the model to pull wrong information with high confidence. This happens constantly with coding agents on larger codebases where files follow similar patterns.
The central question of the entire framework: How do we keep the context window as lean as possible while still giving the coding agent all of the context it needs?
Standardize your commit messages so your agent can read the git log and understand what has been worked on recently. Cole uses a dedicated /commit command that enforces a two-part format: what was built, and how the AI layer (rules/commands) was improved. This avoids over-engineered memory frameworks by leveraging a tool everyone already uses.
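Claude Code supports custom slash commands as markdown prompt files under `.claude/commands/`. The source does not show Cole's actual command, but a /commit command enforcing his two-part format might be sketched like this (wording and frontmatter are illustrative):

```markdown
---
description: Commit staged work with a standardized two-part message
allowed-tools: Bash(git add:*), Bash(git commit:*), Bash(git log:*)
---

Review the staged changes and write a commit message with two parts:

1. **What was built** – a concise summary of the feature or fix.
2. **How the AI layer was improved** – any changes to rules, commands,
   or agent configuration made during this session.

Keep each part to one or two lines so `git log --oneline` stays scannable.
```

Because the format is enforced in a command file rather than remembered by the agent, every future session can reconstruct recent history just by reading the log.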
Always start a new context window for implementation. The workflow is: (1) plan with the agent in one session, producing a structured markdown spec, (2) start a brand-new session and pass only that spec as the initial context. The spec must contain everything the agent needs. Mixing research and implementation in the same session muddies the context window and degrades output quality.
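The source does not show Cole's spec format, but a structured markdown spec handed to a fresh implementation session might look like this (section names and file paths are hypothetical):

```markdown
# Spec: Add rate limiting to the public API

## Goal
One paragraph stating the desired outcome in plain language.

## Relevant files
- src/api/middleware.ts – where the limiter plugs in
- src/config/limits.ts – existing threshold constants

## Implementation steps
1. Ordered, concrete steps the agent should follow.

## Acceptance criteria
- Requests over the limit return 429 with a Retry-After header.
```

The test of a good spec is that the implementation session needs nothing else: no re-research, no questions back to you, no scrolling through the planning conversation.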
When sessions run long (e.g., extensive end-to-end testing that consumes 200K+ tokens), use a /handoff command to create a structured summary document. A fresh agent session picks up that document and continues the work without carrying hundreds of thousands of tokens of tool calls and prior conversation.
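A /handoff command can be defined the same way as any other custom slash command; the sketch below is hypothetical but captures the idea of distilling conclusions rather than transcripts:

```markdown
---
description: Summarize this session into a handoff document for a fresh agent
---

Write a handoff document to `handoff.md` containing:

- **Current state**: what has been completed and verified so far.
- **Open work**: what remains, in priority order.
- **Key decisions**: choices made during this session and why.
- **Gotchas**: anything a fresh agent would waste tokens rediscovering.

Do not include raw tool output or conversation history – only conclusions.
```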
Sub-agents can perform tens or hundreds of thousands of tokens of research (across your codebase or the web) and return only a concise summary to the main context. Anthropic's own research shows a 90%+ improvement in outcomes when using sub-agents for research loading versus having the main agent do everything. In Cole's example, two parallel sub-agents did extensive research but the main context only consumed 44,000 tokens (4% of the window).
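In Claude Code, sub-agents are defined as markdown files under `.claude/agents/`; each runs in its own context window, and only its final reply reaches the main session. A hypothetical researcher sub-agent that enforces the "summary only" contract might look like this:

```markdown
---
name: codebase-researcher
description: Researches a question across the codebase and returns a concise summary
tools: Read, Grep, Glob
---

You are a research agent. Explore the codebase to answer the question you
are given. Read as many files as you need, but your final reply must be a
summary under 500 words: relevant file paths, the patterns they follow,
and a direct answer. Never paste whole files back to the caller.
```

The word limit in the system prompt is what keeps tens of thousands of tokens of reading from ever touching the main context.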
Before committing context to your main session, send a sub-agent to explore documentation or code areas and report back whether that content is relevant. This prevents loading unnecessary context "just in case." For example, a scout sub-agent can scan a docs folder and recommend which documents actually matter for the current task.
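A scout of this kind is just another sub-agent definition; the name and wording below are hypothetical:

```markdown
---
name: docs-scout
description: Scans a docs folder and reports which documents matter for the current task
tools: Read, Glob, Grep
---

Given a task description and a folder of documentation, skim each file and
reply with a one-line verdict per document: "load", "skip", or "maybe",
plus a brief justification. Recommend loading as few documents as possible.
```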
Context should be layered rather than loaded all at once, with each layer pulled into the window only when the task actually requires it.
Compression should be rare if Write, Isolate, and Select are done well. When it is necessary, there are two options: Claude Code's built-in /compact command, which summarizes the conversation in place, and the /handoff pattern described earlier, which carries a structured summary into a fresh session.
The best compression strategy is not needing compression at all.