Claude Code: 100% Free. 100% Private. 100% Local.

Study Guide

Overview

This video demonstrates how to run Claude Code entirely on your local machine using Ollama and open-source models. The result is a setup that is completely free, fully private (no data leaves your computer), and still leverages the Claude Code agent harness. The presenter walks through the tradeoffs, the installation process, how to choose the right local model, and the practical use cases where a local setup makes the most sense.

The Core Concept

Claude Code is fundamentally an "agentic harness" that sits on top of a language model. Normally it connects to Anthropic's servers and uses Sonnet 4.6 or Opus 4.6. But because it is a harness, you can swap the underlying model with a local open-source alternative served through Ollama. You keep all of Claude Code's infrastructure (tool use, file editing, agentic loops) while your conversations, code, and data never leave your machine.

Understanding the Tradeoffs

Model Quality Gap

The video uses SWE-bench Verified as a benchmark reference point. Sonnet and Opus 4.6 score around 80% on this coding benchmark. The best open-source local models perform as follows:

  • GLM 4.7 scores 73.8% (about 91% of Opus), but requires 48 GB of RAM, putting it out of reach for most consumer hardware.
  • Qwen 3 Coder is similarly powerful but also demands high-end hardware.
  • GLM 4.7 Flash scores 59.2%, which is runnable on a modern MacBook Pro.
  • Qwen 2.5 and other smaller-parameter models fall in a similar range.

The presenter frames the gap this way: the local models you can actually run are roughly equivalent to Claude 3.7 Sonnet from about a year ago (February 2025). That is not a bad model. It just means you are trading cutting-edge performance for privacy and zero cost.

Speed

Local inference is significantly slower than cloud-based inference. The video includes a side-by-side demo showing a simple "hi" prompt, and the difference is dramatic. Your hardware cannot match Anthropic's inference infrastructure. Even on a high-end GPU (the presenter uses an NVIDIA RTX 5090), the speed gap is notable.

Tool Call Limitations

Cloud-based Claude Code on Opus 4.6 routinely executes 30 to 50 tool calls in a single agentic loop. Local models are less reliable at sustaining that level of multi-step tool use. The simpler the task, the better local models perform.

Installation Walkthrough

Step 1: Install Ollama

Visit ollama.com and follow the installation instructions. Ollama is the runtime that downloads and serves open-source models locally.
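For the terminal route, Ollama publishes a one-line install script for Linux; on macOS you can download the app from ollama.com or use the Homebrew formula. A sketch (commands as documented on ollama.com at the time of writing; check the site for your platform):

```shell
# Linux: Ollama's official install script
curl -fsSL https://ollama.com/install.sh | sh

# macOS alternative to the app download: Homebrew
brew install ollama

# Verify the install succeeded
ollama --version
```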

Step 2: Choose Your Model

Three recommended approaches for picking the right model:

  1. Ask Claude Code itself. Spin up your normal Claude Code instance and ask it to recommend an open-source model based on your hardware.
  2. Use the LLM Fit tool. An open-source GitHub repository that analyzes your system specs and recommends compatible models.
  3. Ask any AI chatbot. Describe your hardware and goals, and it will walk you through the selection.

Because the open-source landscape changes rapidly, these approaches help you pick the best available model for your hardware at the time you set up, rather than relying on a recommendation that may already be stale.
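Whichever route you choose, the recommendation is only as good as the hardware details you supply. A quick way to gather them (standard commands; the GPU line applies only to NVIDIA systems with the driver installed):

```shell
# OS and CPU architecture
uname -ms

# Total RAM: `free` on Linux, `sysctl` on macOS
free -h 2>/dev/null || sysctl -n hw.memsize

# GPU model and VRAM, if an NVIDIA card is present
nvidia-smi --query-gpu=name,memory.total --format=csv,noheader 2>/dev/null || true
```

Paste the output into your chatbot of choice along with what you want to do, and ask which Ollama-hosted model fits.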

Step 3: Pull the Model

Use ollama pull [model-name] (not ollama run, which starts an interactive session). For example: ollama pull glm4.7-flash.
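As a concrete sketch (the model tag follows the example above; check ollama.com/library for the exact current tag, since names drift between releases):

```shell
# Download the model weights without starting a chat session
ollama pull glm4.7-flash

# Confirm the model now appears in your local library
ollama list
```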

Step 4: Set Up a Shell Alias

Create an alias (e.g., claude-local) that configures environment variables to point Claude Code at Ollama instead of Anthropic's API. The key variables are:

  • ANTHROPIC_AUTH_TOKEN is set to "ollama" (no real API key needed).
  • ANTHROPIC_BASE_URL is set to http://localhost:11434, which is where Ollama serves its API.

This lets you keep your normal claude command for cloud-based usage and use claude-local for the local setup. Both can run simultaneously.
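A minimal version of the alias, assuming bash or zsh (add it to ~/.bashrc or ~/.zshrc). The two variables are the ones named above; ANTHROPIC_MODEL is an optional addition that pre-selects the model you pulled in Step 3:

```shell
# Point Claude Code at the local Ollama server instead of Anthropic's API.
# ANTHROPIC_AUTH_TOKEN just needs a placeholder value; Ollama ignores it.
alias claude-local='ANTHROPIC_AUTH_TOKEN=ollama ANTHROPIC_BASE_URL=http://localhost:11434 ANTHROPIC_MODEL=glm4.7-flash claude'
```

Open a new terminal (or source your shell config) and run claude-local; the plain claude command is untouched, so both setups coexist.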

When to Use Local Claude Code

1. Token or Usage Limit Overflow

If you hit your usage cap on the $20, $100, or $200 Claude plan, the local setup serves as a free fallback while you wait for limits to reset.

2. Simple or Straightforward Tasks

Writing, basic content generation, research summaries, and anything that does not require deep multi-step tool calls or massive context windows. The presenter emphasizes that most people's projects are simpler than they think.

3. Data Privacy Requirements

When working with sensitive client data or in situations where you cannot send code or data to external servers, a fully local setup ensures nothing leaves your machine. This is distinct from enterprise data agreements, which require negotiation with providers.

The Middle Ground: Ollama Cloud

Ollama also offers a cloud API option where you can run more powerful models (the ones that require 48+ GB of RAM) on Ollama's servers for a fee. This gives you access to models like GLM 4.7 at near-Opus performance, but your data is no longer fully private since it passes through Ollama's infrastructure. It sits between fully local and fully cloud-based Anthropic usage.

Key Takeaways

  • Local Claude Code is not a replacement for cloud Claude Code. It is a complement for specific scenarios.
  • The open-source model ecosystem is roughly one year behind frontier models in terms of what you can run on consumer hardware.
  • Privacy and cost are the primary reasons to go local, not performance.
  • Open-source models are improving rapidly. The gap will continue to narrow.
  • The setup is straightforward: install Ollama, pull a model, create a shell alias, and you are running.