This video demonstrates how to run Claude Code entirely on your local machine using Ollama and open-source models. The result is a setup that is completely free, fully private (no data leaves your computer), and still leverages the Claude Code agent harness. The presenter walks through the tradeoffs, the installation process, how to choose the right local model, and the practical use cases where a local setup makes the most sense.
Claude Code is fundamentally an "agentic harness" that sits on top of a language model. Normally it connects to Anthropic's servers and uses Sonnet 4.6 or Opus 4.6. But because it is a harness, you can swap the underlying model with a local open-source alternative served through Ollama. You keep all of Claude Code's infrastructure (tool use, file editing, agentic loops) while your conversations, code, and data never leave your machine.
The video uses SWE-bench Verified as a benchmark reference point. Sonnet and Opus 4.6 score around 80% on this coding benchmark; the best open-source local models trail that figure, and the video walks through their scores directly.
The presenter frames the gap this way: the local models you can actually run are roughly equivalent to Claude 3.7 Sonnet from about a year ago (February 2025). That is not a bad model. It just means you are trading cutting-edge performance for privacy and zero cost.
Local inference is significantly slower than cloud-based inference. The video includes a side-by-side demo showing a simple "hi" prompt, and the difference is dramatic. Your hardware cannot match the inference infrastructure at Anthropic. Even on a high-end GPU (the presenter uses an NVIDIA 5090), the speed gap is notable.
Cloud-based Claude Code on Opus 4.6 routinely executes 30 to 50 tool calls in a single agentic loop. Local models are less reliable at sustaining that level of multi-step tool use. The simpler the task, the better local models perform.
Visit ollama.com and follow the installation instructions. Ollama is the runtime that downloads and serves open-source models locally.
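A minimal sketch of the setup, assuming a Linux machine (macOS and Windows users download installers from ollama.com instead); the script URL is the one Ollama publishes:

```shell
# Install Ollama via the official install script, then sanity-check it.
curl -fsSL https://ollama.com/install.sh | sh
ollama --version             # confirm the CLI is on PATH
curl http://localhost:11434  # the local server replies "Ollama is running"
```

The last command is a quick way to confirm the background server is up before pointing Claude Code at it.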
The presenter offers three approaches for picking the right model. Whichever you follow, the goal is the same: get the best model your hardware can handle at the time you set up, since the open-source landscape changes rapidly.
Use ollama pull [model-name] (not ollama run, which starts an interactive session). For example: ollama pull glm4.7-flash.
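Sketched as commands (glm4.7-flash is the model named in the video; substitute whatever model you chose):

```shell
# Download the model weights only; "ollama run" would also
# drop you into an interactive chat session.
ollama pull glm4.7-flash
ollama list   # verify the model now appears locally
```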
Create an alias (e.g., claude-local) that sets environment variables pointing Claude Code at Ollama instead of Anthropic's API. The key setting is the base URL, http://localhost:11434, which is where Ollama serves its API. This lets you keep your normal claude command for cloud-based usage and use claude-local for the local setup; both can run simultaneously.
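One way to sketch such an alias, assuming Claude Code's standard ANTHROPIC_BASE_URL, ANTHROPIC_AUTH_TOKEN, and ANTHROPIC_MODEL environment variables and the model pulled earlier; the exact variables the presenter uses may differ:

```shell
# In ~/.bashrc or ~/.zshrc: point Claude Code at the local Ollama server.
# The auth token only needs a non-empty placeholder for local use, and
# glm4.7-flash is an example model name from the video.
alias claude-local='ANTHROPIC_BASE_URL=http://localhost:11434 \
  ANTHROPIC_AUTH_TOKEN=ollama \
  ANTHROPIC_MODEL=glm4.7-flash \
  claude'
```

Because the variables are set only for the aliased invocation, the plain claude command is untouched and still talks to Anthropic's servers.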
If you hit your usage cap on the $20, $100, or $200 Claude plan, the local setup serves as a free fallback while you wait for limits to reset.
Writing, basic content generation, research summaries, and anything that does not require deep multi-step tool calls or massive context windows. The presenter emphasizes that most people's projects are simpler than they think.
When working with sensitive client data or in situations where you cannot send code or data to external servers, a fully local setup ensures nothing leaves your machine. This is distinct from enterprise data agreements, which require negotiation with providers.
Ollama also offers a cloud API option where you can run more powerful models (the ones that require 48+ GB of RAM) on Ollama's servers for a fee. This gives you access to models like GLM 4.7 at near-Opus performance, but your data is no longer fully private since it passes through Ollama's infrastructure. It sits between fully local and fully cloud-based Anthropic usage.