Overview
This video from Chase AI demonstrates how to use RAG-Anything, an open-source tool from the same team that built LightRAG, to handle non-text documents (images, charts, scanned PDFs, LaTeX equations) in a RAG system. While most RAG systems can only process plain text, RAG-Anything adds multimodal document parsing by combining local models (MinerU, PaddleOCR) with LLM-powered extraction, then merging everything into a unified knowledge graph and vector database. Claude Code is used to automate the installation, ingestion, and querying workflow.
Key Concepts
The Problem with Text-Only RAG
- Most RAG systems, including LightRAG on its own, can only handle text documents
- Real-world documents contain images, charts, bar graphs, scanned text, and LaTeX equations
- Scanned PDFs are technically images, not text, so standard RAG pipelines cannot process them
- RAG-Anything solves this by adding a multimodal document processing layer on top of LightRAG
RAG-Anything Architecture
RAG-Anything operates as a wrapper around LightRAG, processing non-text documents through a multi-step pipeline:
Step 1: Document Parsing with MinerU
- MinerU is an open-source document parser that runs completely locally for free
- It identifies component parts of a document: headers, text blocks, charts, images, LaTeX equations
- MinerU does not understand the content; it only classifies and segments the document regions
- It includes multiple specialized sub-models for different document element types
Step 2: Specialized Model Processing
- Text blocks are sent to PaddleOCR (runs locally) to extract readable text from scanned content
- Charts and LaTeX equations are also converted to text by their respective specialized models
- Elements that cannot be converted to text (bar graphs, complex images) are screenshotted
- This creates two output buckets: a text bucket and an image bucket
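The two-bucket routing can be sketched in a few lines. Note the element types and dict shape below are simplified assumptions for illustration, not MinerU's actual output schema:

```python
# Illustrative sketch of the two-bucket routing described above.
# Element types and the dict shape are simplified assumptions,
# not MinerU's actual output format.

TEXT_LIKE = {"text", "ocr_text", "equation"}   # convertible to text by local models
IMAGE_LIKE = {"image", "chart", "figure"}      # needs LLM-powered vision

def route_elements(elements):
    """Split parsed document elements into a text bucket and an image bucket."""
    text_bucket, image_bucket = [], []
    for el in elements:
        if el["type"] in TEXT_LIKE:
            text_bucket.append(el)    # local OCR / LaTeX-to-text path
        elif el["type"] in IMAGE_LIKE:
            image_bucket.append(el)   # screenshot path, sent to the LLM
    return text_bucket, image_bucket

parsed = [
    {"type": "text", "content": "Q1 revenue grew 12%..."},
    {"type": "chart", "content": "<screenshot bytes>"},
    {"type": "equation", "content": r"E = mc^2"},
]
texts, images = route_elements(parsed)
```

Only the image bucket incurs vision-model costs; everything in the text bucket was already converted locally.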
Step 3: LLM Processing (GPT/OpenAI)
- Both buckets are then processed with OpenAI models to produce two things:
  - Embeddings (from the embedding model) for vector database storage
  - Entities and relationships (extracted by the LLM, default GPT-5 Nano) for knowledge graph construction
- Images are sent as screenshots so the LLM can perform OCR-style extraction on them
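As a rough sketch of this step, the request for entity/relationship extraction might look like the following. The prompt wording and payload shape are hypothetical, modeled on the OpenAI chat format with image content parts; RAG-Anything's actual prompts differ:

```python
import json

# Hypothetical sketch of an entity/relationship extraction request.
# Prompt text and payload shape are assumptions for illustration,
# not RAG-Anything's actual prompts.

EXTRACTION_PROMPT = (
    "Extract entities and relationships from the content below. "
    'Return JSON: {"entities": [...], "relationships": [...]}.'
)

def build_request(content, model="gpt-5-nano", is_image=False):
    if is_image:
        # Image-bucket elements go in as screenshots (vision content parts)
        user_content = [
            {"type": "text", "text": EXTRACTION_PROMPT},
            {"type": "image_url", "image_url": {"url": content}},
        ]
    else:
        # Text-bucket elements are appended directly to the prompt
        user_content = EXTRACTION_PROMPT + "\n\n" + content
    return {"model": model, "messages": [{"role": "user", "content": user_content}]}

req = build_request("NovaTech revenue rose in Q1.")
print(json.dumps(req)[:80])
```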
Step 4: Merging
- From one document, four outputs are created: a vector database and a knowledge graph from the text path, and a second pair from the image path
- These four outputs are merged into one vector database and one knowledge graph
- The RAG-Anything outputs are then merged with the existing LightRAG data
- Final result: a single unified vector database and knowledge graph that includes both text and non-text content
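The merge step amounts to taking the union of the two paths' outputs, deduplicating entities and relations that appear in both. The data shapes here (dicts keyed by chunk id / entity name, relations as triples) are simplified assumptions, not LightRAG's storage format:

```python
# Sketch of the merge step: combine the text-path and image-path outputs
# into one vector store and one knowledge graph. Data shapes are
# simplified assumptions, not LightRAG's actual storage format.

def merge_outputs(text_out, image_out):
    # Vector store: union of chunk-id -> embedding mappings
    vectors = {**text_out["vectors"], **image_out["vectors"]}
    # Knowledge graph: union entities by name, deduplicate relation triples
    entities = {**text_out["entities"], **image_out["entities"]}
    relations = list({tuple(r) for r in text_out["relations"] + image_out["relations"]})
    return {"vectors": vectors, "entities": entities, "relations": relations}

text_path = {
    "vectors": {"chunk-1": [0.1, 0.2]},
    "entities": {"NovaTech": {"type": "company"}},
    "relations": [("NovaTech", "reports", "revenue")],
}
image_path = {
    "vectors": {"chart-1": [0.3, 0.4]},
    "entities": {"revenue": {"type": "metric"}},
    "relations": [("NovaTech", "reports", "revenue")],  # duplicate, deduped on merge
}
merged = merge_outputs(text_path, image_path)
```

The same union is then applied once more against the existing LightRAG data, yielding the single unified store.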
Why Two Separate Paths?
Processing text and images separately (rather than screenshotting the entire document) is a deliberate design choice that saves significant time and money. Having an LLM process thousands of screenshots would be expensive and slow. By using local models (MinerU, PaddleOCR) to handle text extraction first, only truly visual content needs LLM-powered OCR.
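A back-of-the-envelope comparison makes the savings concrete. All token counts and prices below are made-up placeholder numbers, not actual OpenAI pricing:

```python
# Illustration of why the split matters. Every number here is a
# hypothetical placeholder, not real pricing or real token counts.

PRICE_PER_1K_TOKENS = 0.01   # hypothetical $/1k tokens
IMAGE_TOKENS = 1000          # hypothetical token cost of one screenshot
TEXT_TOKENS = 100            # a locally extracted text block is far cheaper

def llm_cost(n_text_blocks, n_image_blocks, screenshot_everything=False):
    if screenshot_everything:
        total = (n_text_blocks + n_image_blocks) * IMAGE_TOKENS
    else:
        total = n_text_blocks * TEXT_TOKENS + n_image_blocks * IMAGE_TOKENS
    return total / 1000 * PRICE_PER_1K_TOKENS

naive = llm_cost(900, 100, screenshot_everything=True)  # everything as screenshots
split = llm_cost(900, 100)                              # local text extraction first
```

With these placeholder numbers, a document that is 90% text costs roughly 5x more under the screenshot-everything approach; the real ratio depends on your documents and model pricing.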
Setup and Installation
Prerequisites
- An existing LightRAG Docker instance already running (from the previous video in this series)
- Understanding of RAG concepts, knowledge graphs, and vector databases
Installation via Claude Code
- A one-shot prompt is provided that tells Claude Code to install everything automatically
- Must be run from inside your LightRAG directory
- The prompt handles three things:
  - Updating the storage path for your existing Docker LightRAG instance
  - Updating the default model from GPT-4o mini to GPT-5 Nano (customizable)
  - Fixing the embedding double-wrap bug in the example scripts from the GitHub repo
- Embeddings use text-embedding-3-large (keeping everything on OpenAI for simplicity)
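After the one-shot prompt runs, the relevant settings end up looking roughly like the fragment below. The variable names follow LightRAG's .env conventions, but verify them against your own install; the storage path is a placeholder:

```shell
# Sketch of the relevant .env settings after setup (names per LightRAG's
# .env conventions; the storage path is a placeholder for your install).
LLM_MODEL=gpt-5-nano
EMBEDDING_MODEL=text-embedding-3-large
EMBEDDING_DIM=3072
WORKING_DIR=/app/data/rag_storage   # shared by the Docker instance and RAG-Anything
```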
Performance Note: CPU vs. GPU
- By default, MinerU runs on your CPU, which can be slow for large documents
- To run on GPU, you need a different version of PyTorch
- You can simply ask Claude Code to set up GPU processing and it will handle the configuration
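If you prefer to do the PyTorch swap manually, it is the standard CUDA-wheel install. The CUDA version below is an example; pick the index URL matching your driver from pytorch.org:

```shell
# Swap in a CUDA-enabled PyTorch build (example shown for CUDA 12.1;
# choose the index URL that matches your GPU driver from pytorch.org).
pip uninstall -y torch torchvision
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
# Verify the GPU is visible to PyTorch:
python -c "import torch; print(torch.cuda.is_available())"
```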
Workflow: Ingesting Non-Text Documents
Two Upload Paths
- Text documents: Continue using the LightRAG UI or the LightRAG skill (same as before)
- Non-text documents: Must use the RAG-Anything Python script (no UI available)
- A Claude Code skill wraps the Python script, so you just say: "Use the RAG-Anything skill to upload these documents"
- The skill also handles the required Docker container restart after ingestion
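Under the hood, the skill performs the equivalent of the two steps below. The script and container names here are placeholders, not the actual names from the repo:

```shell
# Workflow sketch — script and container names are placeholders;
# the Claude Code skill wraps these equivalent steps for you.
python raganything_ingest.py ./docs/novatech_revenue.pdf
docker restart lightrag   # restart so LightRAG picks up the merged storage
```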
Querying
- Querying works identically to standard LightRAG, using plain language through Claude Code
- Claude Code can use the LightRAG API directly or through skills
- LightRAG retrieval parameters (on the right side of the retrieval section in the UI) can be tuned by Claude Code
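For reference, hitting the LightRAG server directly is a single HTTP POST. The sketch below builds (but does not send) such a request; the /query endpoint, mode values, and default port follow LightRAG's API server, though your host/port and parameter choices may differ:

```python
import json
import urllib.request

# Sketch of querying the LightRAG server over HTTP. Endpoint and payload
# fields follow LightRAG's API server; host/port and parameter values
# are assumptions for a default local install.

def build_query(question, mode="hybrid", top_k=60):
    payload = {"query": question, "mode": mode, "top_k": top_k}
    return urllib.request.Request(
        "http://localhost:9621/query",   # LightRAG server's default port (assumed)
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_query("Monthly revenue trend for NovaTech Inc., Jan-Sep 2025")
# urllib.request.urlopen(req) would send it; omitted so the sketch stays offline.
```

This is the same call Claude Code makes on your behalf when you query through a skill.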
Demo: Extracting Data from Charts
- A fake NovaTech SaaS revenue PDF with bar charts was ingested through RAG-Anything
- When queried for "monthly revenue trend for NovaTech Inc. from January through September 2025," Claude Code returned exact monthly figures that matched the chart data
- January: $4.6M, February: $4.9M, March: $5.4M, and so on
- This demonstrates that chart/image data is now fully queryable through the unified knowledge base
Key Takeaways
- RAG-Anything extends LightRAG to handle any document type, not just plain text
- The architecture uses local models first (MinerU, PaddleOCR) to minimize LLM API costs
- Everything merges into a single unified knowledge graph and vector database
- Claude Code automates both installation (one-shot prompt) and daily use (skills for ingestion and querying)
- Understanding the architecture helps you become a more capable AI developer, not just someone who copies prompts
- The system is relatively cheap compared to alternatives because of the local parsing layer
- The main tradeoff is a heavier install (MinerU and its dependencies) and two separate upload paths
Tools and Resources Mentioned
- RAG-Anything: Open-source multimodal document processing layer for RAG systems
- LightRAG: The underlying RAG system with knowledge graph and vector database
- MinerU: Open-source local document parser with specialized sub-models
- PaddleOCR: Local OCR model for extracting text from scanned documents
- Claude Code: Used to automate setup, ingestion, and querying
- GPT-5 Nano: Default LLM for entity/relationship extraction (customizable)
- Docker: Used to run the LightRAG server