Why Grep Won't Save Your RAG Pipeline

I’ve been reading through a recent paper titled "Is Grep All You Need? How Agent Harnesses Reshape Agentic Search". It’s a provocative piece with a premise I normally love. The authors claim that simple lexical tools like grep and regular expressions consistently outperform complex vector databases for AI agents.

I am a massive fan of boring, foundational technology. But while this paper makes one excellent point, its overarching conclusion is fundamentally flawed.

Stacking the deck for grep

The paper's central claim about the superiority of grep is built on a heavily stacked deck. The authors evaluated their claims using a tiny 116-question subset of a benchmark focused on long-horizon conversational memory.

While grep is great at finding specific dates in a chat log, this dataset is wildly unrepresentative of the challenges we face building real RAG systems. Navigating dense technical documentation or parsing legal contracts requires semantic understanding, which is exactly where grep fails.

Worse, the authors pre-processed their data into a highly structured format optimized for regex. They essentially solved the hardest part of the problem beforehand. It is entirely unsurprising that a tool designed for exact pattern matching wins when the data has been explicitly formatted into exact patterns.

The semantic and scalability walls

If you try to use this approach in a real-world application right now, you will immediately hit two walls. The first is semantic flexibility. If a user asks your agent about "termination clauses," but the document uses the phrase "cancellation conditions," a simple grep returns nothing.
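
To make that first wall concrete, here is a minimal sketch that uses Python's re module as a stand-in for grep. The contract lines and the grep_lines helper are invented for illustration and do not come from the paper.

```python
import re

# Hypothetical contract excerpt; the wording is invented for this example.
CONTRACT = [
    "Section 4. Payment is due within 30 days of invoice.",
    "Section 7. Cancellation conditions: either party may cancel with 60 days notice.",
]


def grep_lines(pattern, lines):
    """Return the lines matching a regex, case-insensitively (roughly `grep -i -E`)."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [line for line in lines if regex.search(line)]


# The user's phrasing shares no terms with the document, so nothing matches.
print(grep_lines(r"termination clauses?", CONTRACT))

# The relevant line is only found if you already know the document's wording.
print(grep_lines(r"cancellation conditions", CONTRACT))
```

An agent can paper over this by guessing synonyms and retrying, but every retry costs another tool call and more tokens.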

The second wall is scalability. grep requires a full linear scan of your text. While that works for small local datasets, it falls apart against enterprise corpora. Running broad regex scans over hundreds of thousands of documents is computationally expensive and slow. Your agents will quickly exhaust their token budgets and spike latency trying to triage massive, unstructured shell outputs.

Breaking down the middle ground

The most frustrating aspect of this paper is the false dichotomy it presents. It assumes you must choose between computationally expensive dense vector search and unscalable, linear shell commands.

We shouldn't regress to linear shell commands when there is an established, robust middle ground. If we actually want to build practical tools, here are three better approaches you can use right now.

Pure BM25

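BM25 is a ranking function that is typically served from an inverted index, meaning a search across millions of documents stays fast because it only touches the postings for the query terms instead of scanning every file. Better yet, it actually ranks the results, scoring each document by term frequency, inverse document frequency, and document length.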

This gives you the speed and precision of keyword matching without forcing your agent to do the heavy lifting of sorting through endless shell output.
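
As a rough illustration, here is a small pure-Python sketch of BM25 scoring over an inverted index. The BM25Index class, the tokenizer, and the k1 and b defaults are my assumptions for this example. A production system would normally use an existing search engine or library rather than hand-rolled code like this.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on non-alphanumerics (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25Index:
    """Toy inverted index with BM25 scoring; k1 and b are common default choices."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.doc_len = []
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}
        for doc_id, text in enumerate(docs):
            tokens = tokenize(text)
            self.doc_len.append(len(tokens))
            for term, tf in Counter(tokens).items():
                self.postings[term][doc_id] = tf
        self.n_docs = len(docs)
        self.avg_len = sum(self.doc_len) / max(self.n_docs, 1)

    def search(self, query, top_k=3):
        """Return up to top_k (doc_id, score) pairs, best first."""
        scores = defaultdict(float)
        for term in set(tokenize(query)):
            posting = self.postings.get(term)
            if not posting:
                continue  # term never indexed; no documents to visit
            df = len(posting)
            idf = math.log(1 + (self.n_docs - df + 0.5) / (df + 0.5))
            # Only documents containing the term are touched, not the whole corpus.
            for doc_id, tf in posting.items():
                norm = 1 - self.b + self.b * self.doc_len[doc_id] / self.avg_len
                scores[doc_id] += idf * tf * (self.k1 + 1) / (tf + self.k1 * norm)
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


docs = [
    "error code 502 bad gateway from the upstream proxy",
    "how to configure the proxy timeout",
    "502 502 error code appears when upstream is down",
]
index = BM25Index(docs)
results = index.search("error code 502")
print(results)
```

The agent gets back a short ranked list of document ids instead of a wall of matching lines, and the document that never mentions the query terms is not visited at all.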

Agent-Weighted Hybrid Search

You don't actually have to choose between lexical and semantic search. You can easily run both BM25 and a standard vector search in parallel.

The fun part is using a lightweight AI agent to dynamically choose the weights of each based on the query. If a user asks for a specific "error code 502", the agent cranks up the BM25 weight; if they ask a conceptual question, it leans heavily on the vector search.
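
Here is a minimal sketch of the weighting step. The choose_weights function is a crude regex heuristic standing in for the lightweight agent, and the specific weights, document names, and scores are made up for illustration. In practice the weights would come from a model call.

```python
import re


def choose_weights(query):
    """Stand-in for the agent: return (lexical_weight, semantic_weight).

    Assumption: digits, quoted strings, or ALL_CAPS tokens hint at an
    exact-match query. A real system would ask a small model instead.
    """
    looks_exact = bool(re.search(r'\d|"[^"]+"|\b[A-Z_]{3,}\b', query))
    return (0.8, 0.2) if looks_exact else (0.3, 0.7)


def normalize(scores):
    """Min-max scale scores to [0, 1] so the two retrievers are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_rank(query, bm25_scores, vector_scores):
    """Fuse both result sets with query-dependent weights, best first."""
    w_lex, w_sem = choose_weights(query)
    lex, sem = normalize(bm25_scores), normalize(vector_scores)
    fused = {
        doc: w_lex * lex.get(doc, 0.0) + w_sem * sem.get(doc, 0.0)
        for doc in set(lex) | set(sem)
    }
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical scores, as if returned by a BM25 index and a vector store.
bm25_scores = {"runbook": 9.0, "overview": 1.0}
vector_scores = {"runbook": 0.2, "overview": 0.9}

print(hybrid_rank("error code 502", bm25_scores, vector_scores))
print(hybrid_rank("why do gateways fail under load", bm25_scores, vector_scores))
```

Normalizing before fusing matters because raw BM25 scores and cosine similarities live on different scales. Without it, one retriever would dominate regardless of the weights.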

Vectorless RAG

I won't rehash my full deep-dive on Vectorless RAG here, but the TL;DR is to throw out the vector database entirely and replace it with an LLM-driven reasoning loop.

Instead of arbitrary chunking, you parse your document into a semantic JSON tree. The agent reads this "Table of Contents," reasons about which section holds the answer, and fetches the exact, unfragmented text.
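
The loop can be sketched roughly like this. The tree layout, field names, and the choose_section callback are assumptions for this example. In a real system choose_section would be an LLM call that reads the outline and returns a section id.

```python
# Hypothetical document tree; in practice this would be parsed from the source
# document and stored as JSON.
DOC_TREE = {
    "id": "0",
    "title": "Master Services Agreement",
    "children": [
        {"id": "1", "title": "Payment Terms",
         "text": "Invoices are due within 30 days.", "children": []},
        {"id": "2", "title": "Cancellation Conditions",
         "text": "Either party may cancel with 60 days written notice.",
         "children": []},
    ],
}


def outline(node, depth=0):
    """Render the tree as an indented table of contents, one line per section."""
    lines = [f"{'  ' * depth}[{node['id']}] {node['title']}"]
    for child in node.get("children", []):
        lines.extend(outline(child, depth + 1))
    return lines


def fetch(node, node_id):
    """Return the full text of the section with this id, or None if absent."""
    if node["id"] == node_id:
        return node.get("text", "")
    for child in node.get("children", []):
        found = fetch(child, node_id)
        if found is not None:
            return found
    return None


def answer_context(tree, question, choose_section):
    """Show the outline to the chooser, then fetch the section it picks."""
    toc = "\n".join(outline(tree))
    node_id = choose_section(toc, question)  # an LLM call in a real system
    return fetch(tree, node_id)


def fake_chooser(toc, question):
    """Placeholder for the model: always picks section 2 for this demo."""
    return "2"


print("\n".join(outline(DOC_TREE)))
print(answer_context(DOC_TREE, "What are the termination clauses?", fake_chooser))
```

Because the model reasons over section titles rather than matching strings, a question about "termination clauses" can still land on the "Cancellation Conditions" section.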
