

Improve RAG with a simple technique

Are you burning tokens on irrelevant context? Or losing critical details by sending isolated chunks? There's a better way.

Chunk Windowing

[Image: obligatory AI-generated illustration of chunk windowing]

The Problem with Chunks

Once you've retrieved the top n chunks most similar to a query, you could put each chunk into context by itself. This works okay, but we can do better. At the other extreme, you could put the entire document each chunk came from into context. That beats supplying the chunk in isolation, and it works well when the documents in your corpus are all relatively small and consistent in size. But if the corpus contains very large documents, you'll end up sending a lot of irrelevant context to the LLM, and that extra context adds latency and cost. If you're looking for a middle ground, try chunk windowing.
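To make the idea concrete, here's a minimal sketch of chunk windowing: send each matched chunk together with a few of its neighboring chunks from the same document. It assumes chunks are stored in document order with their position recorded at index time; the Chunk class, windowed_context, and window_size below are illustrative names, not from any particular library.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    doc_id: str
    index: int   # position of this chunk within its document
    text: str

def windowed_context(
    match: Chunk,
    doc_chunks: dict[str, list[Chunk]],
    window_size: int = 2,
) -> str:
    """Return the matched chunk plus up to `window_size` neighbors on each side."""
    chunks = doc_chunks[match.doc_id]
    start = max(0, match.index - window_size)
    end = min(len(chunks), match.index + window_size + 1)
    return "\n".join(c.text for c in chunks[start:end])
```

With window_size=2, each match contributes at most five chunks, so the context grows by a small constant factor per match instead of by whole documents.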