Improve RAG with a simple technique
Are you burning tokens on irrelevant context? Or losing critical details by sending isolated chunks? There's a better way.
The Problem with Chunks
Once you've matched the top n similar chunks to a query, the simplest option is to put each chunk into context by itself. This works, but we can do better. You could instead put the entire document the chunk came from into context. That beats supplying the chunk in isolation, and it works well when the documents in your corpus are consistently small. But what if the corpus contains very large documents? You could end up sending a lot of irrelevant context to the LLM, and paying for it in latency and cost. If you're looking for a middle ground, try chunk windowing: send the matched chunk along with a few of its neighboring chunks from the same document.
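To make the idea concrete, here's a minimal sketch of chunk windowing in Python. The function name, the window_size parameter, and the toy data are assumptions for illustration; the core move is just slicing the document's ordered chunk list around the retrieved chunk.

def window_for_chunk(doc_chunks: list[str], match_index: int, window_size: int = 2) -> str:
    """Return the matched chunk plus up to `window_size` neighboring
    chunks on each side, joined in document order.

    Assumes `doc_chunks` holds the document's chunks in their original
    order and `match_index` is the position of the retrieved chunk.
    """
    start = max(0, match_index - window_size)
    end = min(len(doc_chunks), match_index + window_size + 1)
    return "\n".join(doc_chunks[start:end])


# Hypothetical usage: after retrieval returns a (doc_id, chunk_index) hit,
# expand it to a window before building the prompt.
chunks = ["chunk 0", "chunk 1", "chunk 2", "chunk 3", "chunk 4"]
context = window_for_chunk(chunks, match_index=2, window_size=1)
# -> "chunk 1\nchunk 2\nchunk 3"

With window_size tuned to your chunk size and token budget, this recovers the surrounding detail an isolated chunk loses, without ever sending more than a bounded slice of even the largest document.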