Overview
This video from Chase AI demonstrates how to use RAG-Anything, an open-source tool from the same team that built LightRAG, to handle non-text documents (images, charts, scanned PDFs, LaTeX equations) in a RAG system. While most RAG systems can only process plain text, RAG-Anything adds multimodal document parsing by combining local models (MinerU, PaddleOCR) with LLM-powered extraction, then merging everything into a unified knowledge graph and vector database. Claude Code is used to automate the installation, ingestion, and querying workflow.
Key Concepts
The Problem with Text-Only RAG
- Most RAG systems, including LightRAG on its own, can only handle text documents
- Real-world documents contain images, charts, bar graphs, scanned text, and LaTeX equations
- Scanned PDFs are technically images, not text, so standard RAG pipelines cannot process them
- RAG-Anything solves this by adding a multimodal document processing layer on top of LightRAG
RAG-Anything Architecture
RAG-Anything operates as a wrapper around LightRAG, processing non-text documents through a multi-step pipeline:
Step 1: Document Parsing with MinerU
- MinerU is an open-source document parser that runs completely locally for free
- It identifies component parts of a document: headers, text blocks, charts, images, LaTeX equations
- MinerU does not understand the content; it only classifies and segments the document regions
- It includes multiple specialized sub-models for different document element types
Step 2: Specialized Model Processing
- Text blocks are sent to PaddleOCR (runs locally) to extract readable text from scanned content
- Charts and LaTeX equations are also converted to text by their respective specialized models
- Elements that cannot be converted to text (bar graphs, complex images) are screenshotted
- This creates two output buckets: a text bucket and an image bucket
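The two-bucket routing can be sketched in a few lines. Note the element types and dict shape below are simplified assumptions for illustration, not MinerU's actual output schema:

```python
# Illustrative sketch of the two-bucket routing described above.
# Element types and the dict shape are simplified assumptions,
# not MinerU's actual output format.

TEXT_LIKE = {"text", "ocr_text", "equation"}   # convertible to text by local models
IMAGE_LIKE = {"image", "chart", "figure"}      # needs LLM-powered vision

def route_elements(elements):
    """Split parsed document elements into a text bucket and an image bucket."""
    text_bucket, image_bucket = [], []
    for el in elements:
        if el["type"] in TEXT_LIKE:
            text_bucket.append(el)    # local OCR / LaTeX-to-text path
        elif el["type"] in IMAGE_LIKE:
            image_bucket.append(el)   # screenshot path, sent to the LLM
    return text_bucket, image_bucket

parsed = [
    {"type": "text", "content": "Q1 revenue grew 12%..."},
    {"type": "chart", "content": "<screenshot bytes>"},
    {"type": "equation", "content": r"E = mc^2"},
]
texts, images = route_elements(parsed)
```

Only the image bucket incurs vision-model costs; everything in the text bucket was already converted locally.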
Step 3: LLM Processing (GPT/OpenAI)
- Both buckets are then processed with OpenAI models to produce two things:
  - Embeddings (from the embedding model) for vector database storage
  - Entities and relationships (extracted by the LLM, default GPT-5 Nano) for knowledge graph construction
- Images are sent as screenshots so the LLM can perform OCR-style extraction on them
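As a rough sketch of this step, the request for entity/relationship extraction might look like the following. The prompt wording and payload shape are hypothetical, modeled on the OpenAI chat format with image content parts; RAG-Anything's actual prompts differ:

```python
import json

# Hypothetical sketch of an entity/relationship extraction request.
# Prompt text and payload shape are assumptions for illustration,
# not RAG-Anything's actual prompts.

EXTRACTION_PROMPT = (
    "Extract entities and relationships from the content below. "
    'Return JSON: {"entities": [...], "relationships": [...]}.'
)

def build_request(content, model="gpt-5-nano", is_image=False):
    if is_image:
        # Image-bucket elements go in as screenshots (vision content parts)
        user_content = [
            {"type": "text", "text": EXTRACTION_PROMPT},
            {"type": "image_url", "image_url": {"url": content}},
        ]
    else:
        # Text-bucket elements are appended directly to the prompt
        user_content = EXTRACTION_PROMPT + "\n\n" + content
    return {"model": model, "messages": [{"role": "user", "content": user_content}]}

req = build_request("NovaTech revenue rose in Q1.")
print(json.dumps(req)[:80])
```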
Step 4: Merging
- From one document, four outputs are created: a vector database and a knowledge graph from the text path, and a second pair from the image path
- These four outputs are merged into one vector database and one knowledge graph
- The RAG-Anything outputs are then merged with the existing LightRAG data
- Final result: a single unified vector database and knowledge graph that includes both text and non-text content
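The merge step amounts to taking the union of the two paths' outputs, deduplicating entities and relations that appear in both. The data shapes here (dicts keyed by chunk id / entity name, relations as triples) are simplified assumptions, not LightRAG's storage format:

```python
# Sketch of the merge step: combine the text-path and image-path outputs
# into one vector store and one knowledge graph. Data shapes are
# simplified assumptions, not LightRAG's actual storage format.

def merge_outputs(text_out, image_out):
    # Vector store: union of chunk-id -> embedding mappings
    vectors = {**text_out["vectors"], **image_out["vectors"]}
    # Knowledge graph: union entities by name, deduplicate relation triples
    entities = {**text_out["entities"], **image_out["entities"]}
    relations = list({tuple(r) for r in text_out["relations"] + image_out["relations"]})
    return {"vectors": vectors, "entities": entities, "relations": relations}

text_path = {
    "vectors": {"chunk-1": [0.1, 0.2]},
    "entities": {"NovaTech": {"type": "company"}},
    "relations": [("NovaTech", "reports", "revenue")],
}
image_path = {
    "vectors": {"chart-1": [0.3, 0.4]},
    "entities": {"revenue": {"type": "metric"}},
    "relations": [("NovaTech", "reports", "revenue")],  # duplicate, deduped on merge
}
merged = merge_outputs(text_path, image_path)
```

The same union is then applied once more against the existing LightRAG data, yielding the single unified store.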
Why Two Separate Paths?
Processing text and images separately (rather than screenshotting the entire document) is a deliberate design choice that saves significant time and money. Having an LLM process thousands of screenshots would be expensive and slow. By using local models (MinerU, PaddleOCR) to handle text extraction first, only truly visual content needs LLM-powered OCR.
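A back-of-the-envelope comparison makes the savings concrete. All token counts and prices below are made-up placeholder numbers, not actual OpenAI pricing:

```python
# Illustration of why the split matters. Every number here is a
# hypothetical placeholder, not real pricing or real token counts.

PRICE_PER_1K_TOKENS = 0.01   # hypothetical $/1k tokens
IMAGE_TOKENS = 1000          # hypothetical token cost of one screenshot
TEXT_TOKENS = 100            # a locally extracted text block is far cheaper

def llm_cost(n_text_blocks, n_image_blocks, screenshot_everything=False):
    if screenshot_everything:
        total = (n_text_blocks + n_image_blocks) * IMAGE_TOKENS
    else:
        total = n_text_blocks * TEXT_TOKENS + n_image_blocks * IMAGE_TOKENS
    return total / 1000 * PRICE_PER_1K_TOKENS

naive = llm_cost(900, 100, screenshot_everything=True)  # everything as screenshots
split = llm_cost(900, 100)                              # local text extraction first
```

With these placeholder numbers, a document that is 90% text costs roughly 5x more under the screenshot-everything approach; the real ratio depends on your documents and model pricing.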
Setup and Installation
Prerequisites
- An existing LightRAG Docker instance already running (from the previous video in this series)
- Understanding of RAG concepts, knowledge graphs, and vector databases
Installation via Claude Code
- A one-shot prompt is provided that tells Claude Code to install everything automatically
- Must be run from inside your LightRAG directory
- The prompt handles three things:
  - Updating the storage path for your existing Docker LightRAG instance
  - Updating the default model from GPT-4o mini to GPT-5 Nano (customizable)
  - Fixing the embedding double-wrap bug in the example scripts from the GitHub repo
- Embeddings use text-embedding-3-large (keeping everything on OpenAI for simplicity)
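After the one-shot prompt runs, the relevant settings end up looking roughly like the fragment below. The variable names follow LightRAG's .env conventions, but verify them against your own install; the storage path is a placeholder:

```shell
# Sketch of the relevant .env settings after setup (names per LightRAG's
# .env conventions; the storage path is a placeholder for your install).
LLM_MODEL=gpt-5-nano
EMBEDDING_MODEL=text-embedding-3-large
EMBEDDING_DIM=3072
WORKING_DIR=/app/data/rag_storage   # shared by the Docker instance and RAG-Anything
```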
Performance Note: CPU vs. GPU
- By default, MinerU runs on your CPU, which can be slow for large documents
- To run on GPU, you need a different version of PyTorch
- You can simply ask Claude Code to set up GPU processing and it will handle the configuration
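If you prefer to do the PyTorch swap manually, it is the standard CUDA-wheel install. The CUDA version below is an example; pick the index URL matching your driver from pytorch.org:

```shell
# Swap in a CUDA-enabled PyTorch build (example shown for CUDA 12.1;
# choose the index URL that matches your GPU driver from pytorch.org).
pip uninstall -y torch torchvision
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
# Verify the GPU is visible to PyTorch:
python -c "import torch; print(torch.cuda.is_available())"
```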
Workflow: Ingesting Non-Text Documents
Two Upload Paths
- Text documents: Continue using the LightRAG UI or the LightRAG skill (same as before)
- Non-text documents: Must use the RAG-Anything Python script (no UI available)
- A Claude Code skill wraps the Python script, so you just say: "Use the RAG-Anything skill to upload these documents"
- The skill also handles the required Docker container restart after ingestion
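Under the hood, the skill performs the equivalent of the two steps below. The script and container names here are placeholders, not the actual names from the repo:

```shell
# Workflow sketch — script and container names are placeholders;
# the Claude Code skill wraps these equivalent steps for you.
python raganything_ingest.py ./docs/novatech_revenue.pdf
docker restart lightrag   # restart so LightRAG picks up the merged storage
```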
Querying
- Querying works identically to standard LightRAG, using plain language through Claude Code
- Claude Code can use the LightRAG API directly or through skills
- LightRAG retrieval parameters (on the right side of the retrieval section in the UI) can be tuned by Claude Code
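For reference, hitting the LightRAG server directly is a single HTTP POST. The sketch below builds (but does not send) such a request; the /query endpoint, mode values, and default port follow LightRAG's API server, though your host/port and parameter choices may differ:

```python
import json
import urllib.request

# Sketch of querying the LightRAG server over HTTP. Endpoint and payload
# fields follow LightRAG's API server; host/port and parameter values
# are assumptions for a default local install.

def build_query(question, mode="hybrid", top_k=60):
    payload = {"query": question, "mode": mode, "top_k": top_k}
    return urllib.request.Request(
        "http://localhost:9621/query",   # LightRAG server's default port (assumed)
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_query("Monthly revenue trend for NovaTech Inc., Jan-Sep 2025")
# urllib.request.urlopen(req) would send it; omitted so the sketch stays offline.
```

This is the same call Claude Code makes on your behalf when you query through a skill.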
Demo: Extracting Data from Charts
- A fake NovaTech SaaS revenue PDF with bar charts was ingested through RAG-Anything
- When queried for "monthly revenue trend for NovaTech Inc. from January through September 2025," Claude Code returned exact monthly figures that matched the chart data
- January: $4.6M, February: $4.9M, March: $5.4M, and so on
- This demonstrates that chart/image data is now fully queryable through the unified knowledge base
Key Takeaways
- RAG-Anything extends LightRAG to handle any document type, not just plain text
- The architecture uses local models first (MinerU, PaddleOCR) to minimize LLM API costs
- Everything merges into a single unified knowledge graph and vector database
- Claude Code automates both installation (one-shot prompt) and daily use (skills for ingestion and querying)
- Understanding the architecture helps you become a more capable AI developer, not just someone who copies prompts
- The system is relatively cheap compared to alternatives because of the local parsing layer
- The main tradeoff is a heavier install (MinerU and its dependencies) and two separate upload paths
Tools and Resources Mentioned
- RAG-Anything: Open-source multimodal document processing layer for RAG systems
- LightRAG: The underlying RAG system with knowledge graph and vector database
- MinerU: Open-source local document parser with specialized sub-models
- PaddleOCR: Local OCR model for extracting text from scanned documents
- Claude Code: Used to automate setup, ingestion, and querying
- GPT-5 Nano: Default LLM for entity/relationship extraction (customizable)
- Docker: Used to run the LightRAG server