Tag: rag

  • Retrieval Augmented Generation (RAG)

    TL;DR

    Large Language Models (LLMs) are masters of language and will assert lies with a smooth tongue of conviction. These hallucinations are most prominent when you prompt an LLM on subjects not included in its training data.

    Retrieval Augmented Generation (RAG) is a cost-effective pattern for improving an LLM’s expertise on specific knowledge bases. It’s like having a studious bookworm who can rapidly read an information corpus, talk about its content, and cite the specific sources referenced.

    The RAG pattern allows you to provide grounding data to your LLM; this “educates” the model and improves its context, response quality, and source-citing abilities.

    1. TL;DR
    2. How Can RAG Help You?
    3. Explaining RAG In Simple Terms
    4. Standard RAG in 60 Seconds
    5. Still Room For Progress

    How Can RAG Help You?

    Armand Ruiz’s LinkedIn post does a great job summarizing the benefits of using the RAG pattern.

    1. Access to up-to-date information: The knowledge of LLMs is limited to what they were exposed to during pre-training. With RAG, you can ground the LLM to the latest data feeds, making it perfect for real-time use cases.
    2. Incorporating proprietary data: LLMs weren’t exposed to your proprietary enterprise data (data about your users or your specific domain) during their training and have no knowledge of your company data. With RAG, you can expose the LLM to the company data that matters.
    3. Minimizing hallucinations: LLMs are not accurate knowledge sources and often respond with made-up answers. With RAG, you can minimize hallucinations by grounding the model to your data.
    4. Rapid comparison of LLMs: RAG applications allow you to rapidly compare different LLMs for your target use case and on your data, without the need to first train them on that data (avoiding the upfront cost and complexity of pre-training or fine-tuning).
    5. Control over the knowledge the LLM is exposed to: RAG applications let you add or remove data without changing the model. Company policies change, customers’ data changes, and unlearning a piece of data from a pre-trained model is expensive. With RAG, it’s much easier to remove data points from the knowledge your LLM is exposed to.
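    Benefit 5 is easy to see in code. Below is a minimal sketch (the class and method names are illustrative, not from any specific library): the knowledge base is just data, so adding or removing a grounding document never touches the model’s weights.

```python
# Minimal sketch of "control over the knowledge the LLM is exposed to":
# the knowledge base is plain data, so "unlearning" a document is a delete,
# not a retraining run. All names here are hypothetical.

class KnowledgeBase:
    def __init__(self):
        self.docs = {}  # doc_id -> text

    def add(self, doc_id, text):
        self.docs[doc_id] = text

    def remove(self, doc_id):
        # Removing outdated or sensitive data is instant and cheap.
        self.docs.pop(doc_id, None)

    def build_context(self, query):
        # Naive retrieval: keep documents sharing any word with the query.
        terms = set(query.lower().split())
        return [t for t in self.docs.values()
                if terms & set(t.lower().split())]

kb = KnowledgeBase()
kb.add("policy-v1", "Refunds are allowed within 30 days.")
kb.add("policy-v2", "Refunds are allowed within 14 days.")
kb.remove("policy-v1")  # the superseded policy is gone immediately
print(kb.build_context("refunds allowed"))
```

    Contrast this with fine-tuning, where making the model forget “30 days” would require retraining on corrected data.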

    Explaining RAG In Simple Terms

    This six-minute video explains RAG using simple English and story-telling.

    Standard RAG in 60 Seconds

    RAG is a pattern that can be implemented using many tactics. This video explains the flow of a “standard” RAG implementation. Check out related posts in this blog to explore more advanced patterns to improve LLM responses.
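    The standard flow can also be sketched in a few lines of code: embed the documents and the query, retrieve the closest matches, and inject them into the prompt as grounding context. The bag-of-words “embedding” and the prompt template below are simplified stand-ins; a real system would use a neural embedding model, a vector database, and an actual LLM call.

```python
# Sketch of the "standard" RAG flow: embed -> retrieve top-k -> build a
# grounded prompt. The word-count embedding and prompt wording are toy
# stand-ins for a neural embedding model and a real LLM prompt.
import math
import re
from collections import Counter

def embed(text):
    # Toy embedding: a word-count vector (real systems use dense vectors).
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, corpus, k=2):
    # Rank documents by similarity to the query; keep the top k.
    q = embed(query)
    scored = sorted(corpus, key=lambda d: cosine(q, embed(d)), reverse=True)
    return scored[:k]

def build_prompt(query, corpus):
    # Grounding context is injected ahead of the question, and the model is
    # instructed to answer only from it and cite the listed sources.
    context = "\n".join(f"- {d}" for d in retrieve(query, corpus))
    return (f"Answer using only the context below and cite it.\n"
            f"Context:\n{context}\n\nQuestion: {query}")

corpus = [
    "The return policy allows refunds within 14 days.",
    "Shipping is free for orders over 50 dollars.",
    "Support is available by email around the clock.",
]
print(build_prompt("How many days do I have to get a refund?", corpus))
```

    The generated prompt would then be sent to the LLM, which answers from the injected context rather than from its pre-training alone.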

    Still Room For Progress

    The studious librarian isn’t perfect and neither is RAG. Set realistic expectations by understanding these shortcomings. More advanced RAG patterns aim to improve in these areas, and a large amount of research is focused on them – stay tuned!

    • No Reasoning Capability: RAG systems rely on retrieving static information but lack the reasoning capability to analyze, synthesize, or infer new insights beyond what is retrieved. For example, if you feed a collection of Facebook posts to an LLM and ask, “how is person X related to person Y?”, the LLM cannot figure that out unless a post directly states the relationship.
    • Overriding General Knowledge: Retrieved data can sometimes override the general knowledge embedded in the model, leading to incorrect or overly specific responses when the retrieved context is flawed or overly narrow. If you imported all of the Star Trek episodes into a dataset and asked, “what is the fastest speed a spaceship can travel?”, you would likely get an answer in warp speed – not yet possible in the real world.
    • Semantic Search Shortcomings: Semantic search algorithms may not capture the nuance of keywords in queries, leading to mismatches between terms in the vector database and user queries and reducing the effectiveness of retrieval.
    • Scaling Issues with KNN Algorithms: As datasets grow in size or diversity, k-nearest neighbor (KNN) algorithms struggle with scalability, resulting in slower retrieval times and inefficiencies in handling large knowledge bases.
    • Chunk Sizing Leads to Information Gaps: Splitting documents into chunks for retrieval can create gaps, causing important context to be lost or fragmented and reducing the accuracy and relevance of generated responses.
    • Garbage In, Garbage Out: If the knowledge base contains outdated or biased information, the LLM will generate similarly outdated or biased responses, compromising the reliability of the system.
    • Dependency on Pre-Indexed Data: RAG models depend heavily on pre-indexed data; the system can only retrieve information that has been stored in the vector database, limiting real-time updates and external data sources.
    • Complexity in Fine-Tuning: Adjusting the retrieval mechanism or integrating new types of data often requires additional fine-tuning of both the retrieval system and the LLM, which increases complexity and maintenance effort.
    • Latency Issues: The retrieval step can introduce latency, especially when querying large datasets or using less efficient retrieval methods, slowing down response times in real-time applications.
    • Cost of Maintaining an Up-to-Date Knowledge Base: Keeping the knowledge base current requires constant updates and re-indexing, which can be resource-intensive and costly, especially for large-scale or fast-changing domains.
    • Contextual Inconsistency: Retrieved documents may not align contextually with the user query, leading to incoherent or off-target responses. This is particularly problematic when the system retrieves irrelevant data.
    • Limited Handling of Multimodal Information: RAG systems typically focus on text-based data; incorporating multimodal inputs (e.g., images, audio) while maintaining effective retrieval remains a challenge.
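    The chunk-sizing gap in particular is easy to demonstrate. In the sketch below (chunk and overlap sizes are illustrative), fixed-size splitting with no overlap severs the name “Zefram Cochrane” across a chunk boundary, so no single chunk contains the full name; adding an overlap window repeats boundary text so each chunk stays more self-contained.

```python
# Sketch of the chunk-sizing trade-off: non-overlapping chunks can cut
# context at boundaries, while an overlap window repeats boundary words.
# Chunk/overlap sizes here are illustrative, not recommendations.

def chunk(words, size, overlap=0):
    # Slide a window of `size` words, advancing by `size - overlap` each step.
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, len(words), step)]

words = "Warp drive was invented by Zefram Cochrane in the year 2063".split()

# Without overlap the name "Zefram Cochrane" is split across two chunks.
print(chunk(words, 6))
# With a 3-word overlap, at least one chunk keeps the full name together.
print(chunk(words, 6, overlap=3))
```

    A retrieval query for “Zefram Cochrane” would match no single non-overlapping chunk exactly, which is precisely the information gap described above; overlap trades extra storage and indexing cost for fewer severed contexts.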