AI Interview: Chunk Wisely To Avoid RAG Hell

DataStax's Ed Anuff on the finer points of AI app development

John Leonard

"Almost any developer worth their salt could build a RAG application with an LLM, once they understand the basics of it," said chief product officer at DataStax, Ed Anuff.

"And then chunking hits and your results start to get really wonky. Now you're in RAG hell, and you have to go off and Google."

Most of us, we suspect, would have gone off and Googled well before that point. RAG? Chunking? RAG hell? We asked Anuff to explain.

Retrieval augmented generation (RAG) has quickly risen to become a standard feature of many natural language genAI applications. It uses a vector database as a halfway house between the user's prompt and the large language model (LLM). The prompt goes first to the vector database, which might contain only internal information from the user's organization. Relevant matches from the database are then added as context to the prompt before it is sent on to the LLM to be processed. For example, the augmented prompt might say answers should be restricted to internal information only, or it could supply relevant qualifying facts.
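In code, that flow looks something like the sketch below. This is a minimal illustration rather than any particular library's API: embed(), VectorStore and llm_complete() are hypothetical stand-ins for whatever embedding model, vector database and LLM client an application actually uses.

```python
# Hedged sketch of the RAG flow described above. embed(), VectorStore
# and llm_complete() are hypothetical placeholders, not a real API.

def answer_with_rag(question: str, store: "VectorStore", k: int = 3) -> str:
    query_vector = embed(question)  # embed the prompt into vector space

    # Retrieve the k stored chunks whose vectors best match the prompt.
    chunks = store.similarity_search(query_vector, top_k=k)

    # Augment the prompt with the retrieved context before calling the LLM.
    context = "\n\n".join(chunk.text for chunk in chunks)
    prompt = (
        "Answer using only the context below. If the answer is not "
        "in the context, say 'I don't know'.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return llm_complete(prompt)
```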

Thus enabled, LLMs produce far fewer hallucinations and answer with greater accuracy and contextual relevance. They can also say "I don't know". For example, imagine you have a genAI-powered ecommerce site that only sells TVs. You want the LLM to return information about the televisions you stock, not about other TVs or different electrical goods, and definitely not about people with the initials TV, so you restrict the answers using RAG.

Chunking strategy

Chunking is a process that happens at the ingestion stage, when documents are fed into the database. Models vary, but none has an infinite context window. Data must be broken into chunks before it is fed in, and the way you do that has a huge effect on what sort of answers will be returned.

Ideally, a chunk should be a discrete piece of information with minimal overlap between chunks. This is because the vector database takes a probabilistic approach when matching the information it holds against the user's input: the more closely a chunk's vector matches the prompt's, the better.

"The chunk should reduce down to the most accurate vector possible," said Anuff.

Even when using systems with huge context windows capable of swallowing dozens of documents in one go, it is usually still more efficient, quicker and more accurate to chunk the data. LLM response time and price both increase linearly with context length, and reasoning across large contexts is difficult.

The simplest approach to breaking up text is fixed-size chunking, splitting a document every few bytes or by character count, but as this takes no account of the semantic content it rarely works well. At the other end of the spectrum is automated agent-based chunking, where machine learning takes care of the process based on context and meaning. But really it's horses for courses, said Anuff.
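A minimal illustration of why fixed-size chunking falls short: the splitter cuts wherever the character count dictates, including mid-word.

```python
def fixed_size_chunks(text: str, size: int = 40, overlap: int = 0) -> list[str]:
    """Split text every `size` characters, ignoring semantic boundaries."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

doc = "The warranty covers panel defects for two years. Returns require proof of purchase."
for chunk in fixed_size_chunks(doc, size=40):
    print(repr(chunk))
# 'The warranty covers panel defects for tw'   <- cut mid-word
# 'o years. Returns require proof of purcha'
# 'se.'
```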

"If I'm working with a legal firm I might have a bunch of lawyers working with my programmers to define how to extract the structure of a legal contract and chunk it. They can identify all the different variants, and you can build a set of rules and get very good results from that. That's called a domain-specific chunking strategy, and there are a bunch of specialized software companies that do this."

On the other hand, for the majority of cases where documents are more varied and less structured than legal contracts, you want to automate it with agent-based chunking, "where the agent looks at the document and says 'Okay, in this particular case, we'll want to break it up this way, and then this way'."
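In sketch form, the agent-based approach amounts to asking a model to find the boundaries itself. The llm() call below is a hypothetical stand-in for a real LLM client.

```python
import json

def agent_chunk(document: str) -> list[str]:
    """Ask the model itself where the natural chunk boundaries fall."""
    instruction = (
        "Split the following document into self-contained chunks, each "
        "covering a single topic. Reply with a JSON list of strings.\n\n"
        + document
    )
    reply = llm(instruction)  # llm() is a hypothetical placeholder
    return json.loads(reply)
```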

Frameworks and toolkits are emerging to do this, including unstructured.io, LlamaIndex and LangChain, all of which chunk and index data in different ways, using techniques such as semantic chunking and recursive splitting.
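As one concrete example, LangChain's recursive splitter tries progressively finer separators (paragraph breaks, then line breaks, then spaces) before resorting to a hard cut, so chunks tend to end at natural boundaries. A short sketch, assuming the langchain-text-splitters package is installed:

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

doc = (
    "Setting up your TV.\n\nUnpack the stand and attach it to the panel.\n\n"
    "Warranty.\n\nPanel defects are covered for two years from purchase."
)

# Splits on "\n\n" first, then "\n", then spaces, only falling back to
# a hard character cut when nothing else fits the size budget.
splitter = RecursiveCharacterTextSplitter(chunk_size=80, chunk_overlap=0)
for chunk in splitter.split_text(doc):
    print(repr(chunk))
```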

RAG hell

Chunking strategy is an evolving field, and as such it has become a bit of a bottleneck for AI development, Anuff said. The law of garbage in, garbage out still applies, and a bad chunking strategy will lead to poor results in a way that's very hard to diagnose. A new inferno has been born, joining dependency hell, callback hell and scope hell to create fresh torment for developers: RAG hell.

"We're talking to projects and they're, like, we're in RAG hell right now. What they've done is a naive implementation. They've done fixed size chunking or something out of the box, loaded up their data, and they're getting very bad results."

Avoiding descent into RAG hell means thinking carefully about the chunking strategy at the start, not after the fact.

DataStax, which offers its own vector database Astra DB, is partnering with a number of players and projects, including some of those mentioned above, to build a stack that will allow users to chuck in a document or, more likely, thousands of documents, and have them optimally chunked with a minimum of fuss.

"We're doing this because we're in the business of selling databases," Anuff said. "The sooner they get chunking done the sooner they can build applications and the more they're going to use our database."
