Vectorless RAG

If you’ve built anything with LLMs in the past couple of years, you’ve probably wired up a Retrieval-Augmented Generation (RAG) pipeline. The playbook is burned into our brains: take a PDF, smash it into 512-token chunks, compute embeddings, shove them into a vector DB, and run a cosine similarity search when a user asks a question.

It works... until it doesn’t.

I’ve been banging my head against the wall with traditional RAG lately, especially on dense technical documentation. Blindly slicing a document into "chunks" obliterates the author's narrative flow. Worse, semantic similarity is a terrible proxy for factual relevance. Just because a chunk sounds like the query doesn't mean it holds the answer.
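To make the fragmentation problem concrete, here's a toy sketch of fixed-size chunking. It counts whitespace-separated words instead of real tokens and uses a window of 8 instead of 512; both are simplifications of mine, and the sample text is made up.

```python
def chunk_fixed(words: list[str], size: int) -> list[list[str]]:
    """Split a word list into consecutive windows of `size` words."""
    return [words[i:i + size] for i in range(0, len(words), size)]


# Made-up two-sentence instruction; the second step depends on the first.
doc = (
    "To rotate the API key, first revoke the old key in the console. "
    "Then generate a new key and update the service config."
)

chunks = [" ".join(c) for c in chunk_fixed(doc.split(), 8)]
for i, chunk in enumerate(chunks):
    print(i, repr(chunk))
```

The first chunk ends at "first revoke the" and the object of that verb lands in the next chunk, so neither piece carries the full instruction on its own.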

Lately, I’ve been experimenting with a totally different approach: Vectorless RAG (sometimes called Reasoning-based RAG). It throws out the vector database entirely. Instead of static math, it uses an LLM to perform agentic, context-aware retrieval.

Here’s a breakdown of how it works, what the trade-offs are, and how this exact same pattern is quietly taking over codebase search tools like Claude Code.

How Vectorless RAG Works

Vectorless RAG treats retrieval as an iterative reasoning task. It’s basically teaching an LLM how to read a book: look at the table of contents, find the right chapter, read it, and see if you have the answer.

Phase 1: The "In-Context" Tree Index

Instead of artificial chunking, we parse the document into a JSON-based hierarchy that follows the document's own structure, essentially a highly detailed table of contents. This tree lives right in the LLM's context window.

Nodes: Chapters or sections become nodes.
Metadata: Every node gets a node_id, a title, a brief summary, and pointers to the raw data (like page or line numbers).
Hierarchy: Nodes contain sub-nodes, mapping out the whole document recursively.

Because we chunk by meaning (sections/chapters) rather than arbitrary token counts, we avoid most of the context fragmentation that fixed-size chunking causes.
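Here's a minimal sketch of what such a tree could look like in Python. The field names follow the metadata listed above; the `Node` class, the `outline` helper, and the sample "Payments API Guide" document are hypothetical names I made up for illustration, not part of any library.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    title: str
    summary: str
    start_line: int  # pointer into the raw document
    end_line: int
    children: list["Node"] = field(default_factory=list)


def outline(node: Node) -> dict:
    """Return structure and summaries only, without any section text."""
    return {
        "node_id": node.node_id,
        "title": node.title,
        "summary": node.summary,
        "lines": [node.start_line, node.end_line],
        "children": [outline(child) for child in node.children],
    }


# Hypothetical document, invented for this example.
tree = Node("0", "Payments API Guide", "Reference for the payments API.", 1, 400, [
    Node("1", "Authentication", "API keys and token refresh.", 1, 120),
    Node("2", "Error Handling", "Status codes and retry rules.", 121, 400, [
        Node("2.1", "Retry Policy", "Backoff schedule for 5xx responses.", 200, 260),
    ]),
])

toc_json = json.dumps(outline(tree), indent=2)
print(toc_json)
```

The JSON string is what would go into the prompt. The full section text stays out of it and is only fetched by `node_id` later.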

Phase 2: The Agentic Retrieval Loop

When a query comes in, the agent doesn't embed it. It reads the tree and executes a loop:

1. Read the ToC: Fetch the tree (just the structure and summaries, not the full text).
2. Reasoning: Evaluate the user's intent. Which node logically contains the answer?
3. Extract: Fetch the exact, unfragmented text for that specific node_id.
4. Evaluate: Ask: "Is this enough to answer the question?" If yes, generate the response. If no, go back to step 1 and pick a different node based on what was just learned.
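The loop above can be sketched in a few lines of Python. This is a toy under stated assumptions: `SECTIONS` is a made-up in-memory index, and `pick_by_overlap` and `mentions_count` are crude stand-ins for the two LLM calls (node selection and the sufficiency check) so the example runs without an API key. A real agent would also feed the text it just read back into the next selection step.

```python
import re
from typing import Callable, Optional

# Hypothetical index: node_id -> (title, summary, full section text).
SECTIONS: dict[str, tuple[str, str, str]] = {
    "1": ("Authentication", "API keys and token refresh.",
          "Send the key in the X-Api-Key header."),
    "2": ("Error Handling", "Status codes and retry rules.",
          "See Retry Policy for 5xx responses."),
    "2.1": ("Retry Policy", "Backoff schedule for 5xx responses.",
            "Retry 5xx responses up to 3 times with exponential backoff."),
}


def read_toc() -> dict[str, str]:
    """Read the ToC: titles and summaries only, never the full text."""
    return {nid: f"{title}: {summary}"
            for nid, (title, summary, _text) in SECTIONS.items()}


def words(s: str) -> set[str]:
    return set(re.findall(r"\w+", s.lower()))


def pick_by_overlap(query: str, toc: dict[str, str], seen: list[str]) -> Optional[str]:
    """Stand-in for the LLM 'Reasoning' call: best word overlap, unvisited nodes only."""
    scores = {nid: len(words(query) & words(entry))
              for nid, entry in toc.items() if nid not in seen}
    return max(scores, key=scores.get) if scores else None


def mentions_count(query: str, text: str) -> bool:
    """Stand-in for the LLM 'Evaluate' call: a hard-coded check for this demo."""
    return "times" in text


def retrieve(query: str, pick_node: Callable, is_sufficient: Callable, max_steps: int = 3):
    toc = read_toc()
    visited: list[str] = []
    for _ in range(max_steps):
        node_id = pick_node(query, toc, visited)  # Reasoning
        if node_id is None:
            break
        visited.append(node_id)
        text = SECTIONS[node_id][2]               # Extract
        if is_sufficient(query, text):            # Evaluate
            return text, visited
    return None, visited


answer, path = retrieve("What status codes trigger a retry? How many attempts?",
                        pick_by_overlap, mentions_count)
print(path)    # ['2', '2.1']
print(answer)
```

The first pick ("Error Handling") looks right from its summary but only points elsewhere, so the loop moves on to "Retry Policy". The `max_steps` cap is my addition; without some bound, a confused agent can wander the tree indefinitely.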

The Claude Code Parallel: Vectorless Codebase Search

The shift away from vector DBs isn't just for PDFs. I've noticed the exact same architectural shift happening in developer tools. Look at how Anthropic's Claude Code navigates massive local repositories. It doesn't rely on embedded code snippets; it operates as an agent (Understand → Plan → Act → Verify).

Here is how Claude Code mirrors the Vectorless RAG pattern:

| RAG Concept | Claude Code Implementation |
| --- | --- |
| Semantic Initialization | Parses `package.json`/`Cargo.toml` to build a dependency graph; recursively hunts for `CLAUDE.md` files to bootstrap architectural rules without loading the whole repo. |
| High-Speed Discovery | Ditches semantic search for fast bash utilities: uses `bfs` for structural mapping and `ugrep` for near-zero latency string matching. |
| Code Intelligence | Doesn't just match text; uses LSP-backed intelligence (AST parsing) to "jump to definition" and trace actual execution flows deterministically. |
| Context Management | Aggressively prunes noise. If a search returns hundreds of hits, it auto-compacts the logs down to core function signatures to save context tokens. |
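To get a feel for the "string matching plus pruning" idea, here's a pure-Python sketch. To be clear, this is not how Claude Code is implemented; `grep_repo` is a hypothetical helper of mine that mimics a grep-style search and caps the number of hits returned, so the caller sees a few matches and a total instead of hundreds of lines.

```python
import os
import tempfile


def grep_repo(root: str, needle: str, max_hits: int = 5) -> tuple[list[str], int]:
    """Return up to `max_hits` 'path:line: text' matches plus the total match count."""
    hits: list[str] = []
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip hidden directories such as .git; sort for a stable order.
        dirnames[:] = sorted(d for d in dirnames if not d.startswith("."))
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as fh:
                    for lineno, line in enumerate(fh, start=1):
                        if needle in line:
                            total += 1
                            if len(hits) < max_hits:
                                rel = os.path.relpath(path, root)
                                hits.append(f"{rel}:{lineno}: {line.strip()}")
            except (UnicodeDecodeError, OSError):
                continue  # binary or unreadable file
    return hits, total


# Tiny throwaway repo so the example is self-contained.
with tempfile.TemporaryDirectory() as repo:
    with open(os.path.join(repo, "a.py"), "w", encoding="utf-8") as f:
        f.write("def load_config():\n    pass\n")
    with open(os.path.join(repo, "b.py"), "w", encoding="utf-8") as f:
        f.write("from a import load_config\nload_config()\n")
    hits, total = grep_repo(repo, "load_config", max_hits=2)

print(f"{len(hits)} of {total} hits shown")
for hit in hits:
    print(hit)
```

An agent can take the first hit, read the surrounding function, and decide whether to search again with a narrower string, which is the same read-evaluate-retry loop as in the document case.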

The Trade-Offs: Is it worth it?

Vectorless RAG solves the semantic mismatch problem, but it introduces new constraints. Here is the pragmatic breakdown.

The Good

True Relevance: Queries are about intent. An agent can deduce that "how to handle errors" maps to a specific chapter, even if the semantic overlap is low.
Zero Fragmentation: You get whole, coherent sections of text, which should cut down on the hallucinations that come from answering off half a paragraph.
Handles Cross-References: Traditional RAG chokes on "see Appendix G" because the text lacks similarity to the target data. An agent just looks up Appendix G in its ToC.
Infrastructure: You can rip out your vector database entirely.
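The cross-reference case is easy to sketch. The snippet below assumes references follow a "see Appendix/Section/Chapter X" pattern and that the tree exposes a title-to-node_id lookup; `TOC_BY_TITLE`, the node ids, and the regex are all illustrative choices of mine, and in practice the LLM would spot the reference itself instead of relying on a regex.

```python
import re
from typing import Optional

# Hypothetical title -> node_id lookup derived from the tree index.
TOC_BY_TITLE = {
    "Section 4.2": "sec-4.2",
    "Appendix G": "app-g",
}

# Matches "see Appendix G", "See Section 4.2", "see Chapter 9".
CROSS_REF = re.compile(
    r"\b[Ss]ee\s+((?:Appendix|Section|Chapter)\s+[A-Za-z0-9]+(?:\.[0-9]+)*)"
)


def resolve_cross_refs(text: str) -> list[tuple[str, Optional[str]]]:
    """Map each 'see X' reference in `text` to a node_id, or None if X is not in the ToC."""
    return [(ref, TOC_BY_TITLE.get(ref)) for ref in CROSS_REF.findall(text)]


section_text = (
    "Timeouts use the defaults; see Appendix G. "
    "For retries, see Section 4.2 and see Chapter 9."
)
refs = resolve_cross_refs(section_text)
print(refs)
```

Resolved references become the next nodes to fetch. Unresolved ones (here "Chapter 9") are worth surfacing to the model so it knows the pointer leads outside the indexed document.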

The Bad

Latency is high: A vector lookup takes milliseconds. An LLM reading a JSON tree and executing a multi-step reasoning loop takes seconds. You have to design your UI around this delay.
It gets expensive: Pumping a massive ToC into the prompt for every query, plus the tokens for the reasoning loop, burns through API credits much faster than a static vector search.
Scale Limits: You can't put the ToC of a million documents into a prompt. For massive corpora, you still need a traditional search pass to pre-filter down to a handful of relevant documents before the agent takes over.
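For the scale problem, the pre-filter can be very cheap. Here's a sketch that ranks documents by plain term overlap between the query and a one-line summary per document, then hands only the top few to the agent. The corpus and the `prefilter` function are invented for illustration; in a real system this pass could just as well be BM25 or, yes, a vector search.

```python
import re
from collections import Counter


def tokens(s: str) -> list[str]:
    return re.findall(r"\w+", s.lower())


def prefilter(query: str, summaries: dict[str, str], k: int = 2) -> list[str]:
    """Return up to `k` document ids ranked by query-term hits in each summary."""
    query_terms = set(tokens(query))
    scores = {
        doc_id: sum(n for term, n in Counter(tokens(text)).items() if term in query_terms)
        for doc_id, text in summaries.items()
    }
    ranked = sorted(scores, key=lambda d: (-scores[d], d))  # ties broken by id
    return [d for d in ranked[:k] if scores[d] > 0]


# Hypothetical corpus: one summary line per document.
corpus = {
    "auth-guide": "OAuth tokens, API keys, and session expiry.",
    "billing-spec": "Invoices, refunds, and billing retry schedule.",
    "ops-runbook": "Deploy steps, rollback, and on-call retry escalation.",
}

shortlist = prefilter("billing retry schedule for failed invoices", corpus, k=2)
print(shortlist)  # ['billing-spec', 'ops-runbook']
```

Only the trees for the shortlisted documents go into the prompt, which keeps the context size bounded no matter how large the corpus grows.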

Final Thoughts

Vectorless RAG is a fascinating shift. By treating documents like structured narratives instead of bags of embeddings, we unlock a level of precision that traditional RAG struggles to match.

I wouldn't use it to filter all of Wikipedia. But for deep, accurate Q&A over complex specs or large codebases, agentic retrieval is quickly becoming the default. If you're building local coding agents or high-stakes document tools, it's time to start experimenting with reasoning loops over static vector math.
