
All You Need is a Good Chunking


If you’ve spent any time building Retrieval-Augmented Generation (RAG) prototypes, you inevitably hit the exact same wall. You wire up a great embedding model, point it at an excellent local LLM, and the answers are still completely useless.

The culprit is almost always the chunking strategy. The core tension of RAG is chunk size: small chunks give you precise search hits, while large chunks give the LLM the surrounding context it needs to actually understand the text.

If you just naively slice a document up by character count, you end up chopping crucial sentences in half. You feed the LLM broken context, and it confidently hallucinates a response.
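To see why naive slicing fails, here is a minimal sketch (the text and chunk size are made up for illustration):

```python
# Illustrative only: fixed-width slicing ignores sentence boundaries entirely.
text = "The refund policy changed in March. Customers now have 60 days to return items."

def naive_chunks(s: str, size: int) -> list[str]:
    """Slice a string into fixed-size pieces with no regard for boundaries."""
    return [s[i:i + size] for i in range(0, len(s), size)]

for chunk in naive_chunks(text, 40):
    print(repr(chunk))
```

The first chunk ends mid-word ("…March. Cust"), so a retriever that matches it hands the LLM a sentence fragment with no idea what "Cust" was about to say.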

I’ve been looking at three distinct ways to fix this—Sentence, Recursive, and Hierarchical chunking—and the tooling ecosystem around them is finally getting genuinely interesting.

Sentence chunking: respecting grammatical boundaries

A sentence chunker tries to break text strictly at natural boundaries like periods or exclamation points. It splits the document into individual sentences, and then greedily batches them together until it hits your token limit.
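The split-then-greedily-batch idea can be sketched in a few lines of stdlib Python. This is a toy version, not any library's actual implementation — real chunkers use a proper tokenizer, where here whitespace word count stands in for a token count:

```python
import re

def sentence_chunks(text: str, max_tokens: int) -> list[str]:
    """Split on sentence-ending punctuation, then greedily pack whole
    sentences into a chunk until the token budget would be exceeded."""
    # Crude boundary rule: . ! or ? followed by whitespace ends a sentence.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, count = [], [], 0
    for sent in sentences:
        n = len(sent.split())  # stand-in for a real tokenizer
        if current and count + n > max_tokens:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sent)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Because sentences are only ever appended whole, no chunk ever starts or ends mid-sentence — the whole point of the strategy.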

Historically, the standard way to do this in LangChain was by dragging in massive Natural Language Processing libraries like spaCy or NLTK. I always hated this approach. Pulling in hundreds of megabytes of heavy NLP dependencies just to find a period feels absurdly wasteful.

This is why I’ve been really enjoying Chonkie. It’s a newer, ultra-lightweight library that handles sentence chunking well while only requiring an 11MB base install. It runs dramatically faster than the older NLP approaches and respects your token limits beautifully.

Recursive chunking: the pragmatic default

Recursive chunking is the absolute gold standard for general-purpose text. Instead of a single rule, it uses a prioritized waterfall of separators.

It first tries to split by double newlines (\n\n) to get natural paragraphs. If a paragraph exceeds your token limit, it falls back to single newlines (\n), then spaces, and finally individual characters.
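The waterfall is simple enough to sketch from scratch. This toy version measures chunks in characters rather than tokens and doesn't merge small pieces back together the way production splitters do, but the separator-fallback logic is the same:

```python
def recursive_chunks(text: str, max_len: int,
                     separators=("\n\n", "\n", " ", "")) -> list[str]:
    """Waterfall splitter: split on the highest-priority separator and
    only fall back to finer ones for pieces still exceeding max_len."""
    if len(text) <= max_len:
        return [text] if text else []
    sep, *rest = separators
    if sep == "":
        # Last resort: hard character slicing.
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]
    chunks = []
    for piece in text.split(sep):
        if len(piece) <= max_len:
            if piece:
                chunks.append(piece)
        else:
            chunks.extend(recursive_chunks(piece, max_len, tuple(rest)))
    return chunks
```

Paragraphs that already fit are kept intact; only oversized ones get pushed down to the next, finer separator.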

LangChain’s RecursiveCharacterTextSplitter does exactly this using standard string operations, which means it executes incredibly fast on virtually any hardware. Chonkie also ships with a highly optimized recursive chunker that pairs neatly with your specific LLM's tokenizer.

Because it prioritizes newlines, it perfectly preserves Markdown headers, bulleted lists, and code blocks. Unless you have a specific reason not to, this should be your default strategy.

Sentence chunking, by contrast, fails spectacularly on structured text. Feed a grammatical chunker Markdown lists or code blocks that lack periods and it completely mangles the formatting. I only use that approach for massive walls of unstructured prose, like podcast transcripts or raw audio logs.

Hierarchical chunking: throwing vision models at PDFs

What happens when your source material is a messy corporate PDF full of tables and multi-column layouts? Recursive chunkers completely choke on these because the line breaks are essentially arbitrary.

Hierarchical chunking treats the document as a visual structure rather than a flat string. It identifies where a table or a section lives, and injects a "breadcrumb trail" of context into every single chunk. A tiny bullet point gets permanently tagged with its parent path, like [Annual Report → Financials → Q3].
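The breadcrumb-injection step is easy to illustrate. The section structure below is hypothetical — it stands in for whatever a layout-analysis pass emits, and none of these names are Docling's actual API:

```python
# Hypothetical output of a layout-analysis step: (heading path, body) pairs.
sections = [
    (["Annual Report", "Financials", "Q3"], "Revenue grew 12% year over year."),
    (["Annual Report", "Risk Factors"], "Supply-chain exposure remains high."),
]

def breadcrumb_chunks(sections: list[tuple[list[str], str]]) -> list[str]:
    """Prefix every chunk with its parent heading path so both the
    retriever and the LLM see where the text came from."""
    return [f"[{' → '.join(path)}] {body}" for path, body in sections]

for chunk in breadcrumb_chunks(sections):
    print(chunk)
```

Even if the retriever surfaces only the tiny Q3 bullet, the LLM still knows it is reading the Financials section of an annual report.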

IBM’s Docling is currently the most compelling tool I’ve seen for this. It uses vision-language AI models (like Granite-Docling) to perform layout analysis on raw PDFs before doing any text splitting. Its HybridChunker ensures a table is never split across chunks and that context is perfectly preserved.

The trade-off here is hardware. Running vision models over a 100-page PDF will absolutely punish your unified memory and spike your ingestion costs. It is computationally heavy, but if you need to reliably query financial reports or complex contracts, it is practically mandatory.

The verdict

Choosing the right chunking method dictates the ceiling of your entire RAG pipeline. Here is my current mental model for picking one:

| Strategy | Best Tools | Ideal Use Case | Compute Cost |
| --- | --- | --- | --- |
| Recursive | Chonkie | Markdown, articles, general documentation | Extremely low |
| Sentence | Chonkie | Audio transcripts, chat logs, unstructured prose | Low |
| Hierarchical | Docling | Complex PDFs, legal contracts, financial tables | High (requires vision AI) |

Start with a recursive chunker around 500 tokens. It's fast, cheap, and handles 90% of what most people are trying to build.

Only upgrade to heavy, structure-aware tools like Docling when you actually observe your pipeline failing to read tables or losing the plot on complex documents.
