(optional) Install other VS Code extensions to improve Azure efficiency
Sign into your Azure account with “GitHub Copilot for Azure”
Get familiar with GH Copilot and the “for Azure” extension
A More Detailed Description
Obviously, you need an Azure account with access to a subscription. If you are reading this blog, you probably have one, but you can always Try Azure for free
Make sure Azure Copilot is enabled on your tenant. It is enabled for all users by default, but your tenant administrator may have disabled it. If that is the case, you need to contact them and request that they enable it for your identity.
This automatically installs the required dependencies (like GitHub Copilot Chat)
(optional) If you frequently use VS Code and Azure, consider installing related extensions. Personally, I have found the list below helpful – but it really depends on the kind of activities you perform in Azure (don’t be tricked! check the extension publisher):
Azure Tools (which installs a collection of Azure related extensions)
Azure Machine Learning
Azure IoT Hub
Azure Kubernetes Service
Azure API Management
Azure Automation
Azure Policy
Azure Pipelines
Azure Terraform
Bicep
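If you want to script the optional installs above, the VS Code CLI (`code --install-extension`) can batch them. A minimal sketch follows; the Marketplace IDs listed are my assumptions – always verify the extension and its publisher in the Marketplace before installing.

```python
# Sketch: print `code --install-extension` commands for a batch install.
# The Marketplace IDs below are assumptions -- verify each publisher first.
EXTENSIONS = [
    "ms-vscode.vscode-node-azure-pack",  # Azure Tools collection
    "ms-azuretools.vscode-bicep",        # Bicep
]

def build_install_commands(extension_ids):
    """Return one VS Code CLI install command per extension ID."""
    return [["code", "--install-extension", ext] for ext in extension_ids]

for cmd in build_install_commands(EXTENSIONS):
    print(" ".join(cmd))
```

The script only prints the commands so you can review them before running; swap the `print` for `subprocess.run(cmd, check=True)` if you want it to install directly.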
The first time you use GHCP4AZ, it will ask you to sign into Azure. If you belong to multiple tenants, you need to select a single tenant. You can always change this tenant by running @azure /changeTenant in the Copilot Chat window or by performing the click sequence in Set your default tenant.
If you haven’t used GitHub Copilot before, take a look at Microsoft’s free Introduction to GitHub Copilot mini-course (it is worth the few minutes) and review other posts specifically about using GitHub Copilot for Azure.
Large Language Models (LLMs) are masters of language and will assert lies with a smooth tongue of conviction. These hallucinations are most prominent when you prompt an LLM on subjects not included in its training data.
Retrieval Augmented Generation (RAG) is a cost-effective pattern for improving a Large Language Model’s (LLM) expertise on specific knowledge bases. It’s like having a studious bookworm that can rapidly read the information corpus; it can talk about the content and reference the specific sources it cites.
The RAG pattern allows you to provide grounding data to your LLM; this “educates” the LLM and improves its context, response quality, and source-citing abilities.
Access to up-to-date information: The knowledge of LLMs is limited to what they were exposed to during pre-training. With RAG, you can ground the LLM to the latest data feeds, making it perfect for real-time use cases.
Incorporating proprietary data: LLMs weren’t exposed to your proprietary enterprise data (data about your users or your specific domain) during their training and have no knowledge of your company data. With RAG, you can expose the LLM to the company data that matters.
Minimizing hallucinations: LLMs are not accurate knowledge sources and often respond with made-up answers. With RAG, you can minimize hallucinations by grounding the model to your data.
Rapid comparison of LLMs: RAG applications allow you to rapidly compare different LLMs for your target use case and on your data, without the need to first train them on that data (avoiding the upfront cost and complexity of pre-training or fine-tuning).
Control over the knowledge the LLM is exposed to: RAG applications let you add or remove data without changing the model. Company policies change, customers’ data changes, and unlearning a piece of data from a pre-trained model is expensive. With RAG, it’s much easier to remove data points from the knowledge your LLM is exposed to.
Explaining RAG In Simple Terms
This six-minute video explains RAG using simple English and storytelling.
Standard RAG in 60 Seconds
RAG is a pattern that can be implemented using many tactics. This video explains the flow of a “standard” RAG implementation. Check out related posts in this blog to explore more advanced patterns to improve LLM responses.
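The standard flow can also be sketched in a few lines: embed the corpus, embed the query, retrieve the most similar chunks, and augment the prompt sent to the LLM. The toy bag-of-words “embedding” below is a stand-in for a real embedding model and vector database – both are assumptions for illustration only.

```python
from collections import Counter
import math

def embed(text):
    """Toy 'embedding': a bag-of-words Counter (stand-in for a real model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, corpus, k=2):
    """Rank corpus chunks by similarity to the query (the 'R' in RAG)."""
    q = embed(query)
    return sorted(corpus, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

def augment(query, chunks):
    """Build the grounded prompt handed to the LLM (the 'A' and 'G')."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

corpus = [
    "GitHub Copilot for Azure runs inside VS Code.",
    "Warp drive is fictional technology from Star Trek.",
    "RAG grounds an LLM in retrieved documents.",
]
prompt = augment("What grounds an LLM?", retrieve("What grounds an LLM?", corpus))
print(prompt)
```

In a production implementation, `embed` would call an embedding model, the corpus would live in a vector database, and `prompt` would be sent to the LLM – but the shape of the flow is the same.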
Still Room For Progress
The studious bookworm isn’t perfect and neither is RAG. Set realistic expectations by understanding these shortcomings. More advanced RAG patterns aim to improve on these areas, and there is a large amount of research focused on this – stay tuned!
No Reasoning Capability: RAG systems rely on retrieving static information but lack the reasoning capabilities to analyze, synthesize, or infer new insights beyond what is retrieved. For example, if you feed a bunch of Facebook posts to an LLM and ask, “How is person X related to person Y?”, the LLM cannot figure that out unless a post directly states that relationship.
Override General Knowledge: Retrieved data can sometimes override the general knowledge embedded in the model, leading to incorrect or overly specific responses when the retrieved context is flawed or overly narrow. If you imported all of the Star Trek episodes into a dataset and asked, “What is the fastest speed a spaceship can travel?”, you’re likely to get an answer in warp speed – not yet possible in the real world.
Semantic Search Shortcomings: Semantic search algorithms may not capture the nuance of keywords in queries, leading to mismatches between terms in the vector database and user queries, reducing the effectiveness of retrieval.
Scaling Issues with KNN Algorithms: As datasets grow in size or diversity, k-nearest neighbor (KNN) algorithms struggle with scalability, resulting in slower retrieval times and inefficiencies in handling large knowledge bases.
Chunk Sizing Leads to Information Gaps: The process of splitting documents into chunks for retrieval can create gaps, causing important context to be lost or fragmented, reducing the accuracy and relevance of generated responses.
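A common mitigation for the chunking gap is overlapping windows, so that context straddling a chunk boundary survives intact in at least one chunk. A minimal sketch – the chunk size and overlap values are arbitrary assumptions, and real pipelines usually split on sentence or token boundaries rather than characters:

```python
def chunk(text, size=40, overlap=10):
    """Split text into fixed-size character chunks that overlap by `overlap`
    characters, so content straddling a boundary appears in one whole chunk."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap  # advance less than `size` to create the overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

doc = "RAG splits documents into chunks before indexing. Overlap keeps boundary context intact."
pieces = chunk(doc, size=40, overlap=10)
```

Larger overlaps reduce information loss at the cost of index size and duplicated retrievals, so the value is a tuning trade-off.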
Garbage In, Garbage Out: If the knowledge base contains outdated or biased information, the LLM will generate similarly outdated or biased responses, which can compromise the reliability of the model.
Dependency on Pre-Indexed Data: RAG models depend heavily on pre-indexed data, which means that the system can only retrieve information from what has been stored in the vector database, limiting real-time updates or external data sources.
Complexity in Fine-Tuning: Adjusting the retrieval mechanisms or integrating new types of data often requires additional fine-tuning of both the retrieval system and the LLM, which increases complexity and maintenance efforts.
Latency Issues: The retrieval step can introduce latency, especially when querying large datasets or using less efficient retrieval methods, which can slow down response times in real-time applications.
Cost of Maintaining an Up-to-Date Knowledge Base: Keeping the knowledge base current requires constant updates and re-indexing, which can be resource-intensive and costly, especially for large-scale or fast-changing domains.
Contextual Inconsistency: Sometimes, the retrieved documents might not align contextually with the user query, leading to incoherent or off-target responses. This is particularly problematic when the system retrieves irrelevant data.
Limited Handling of Multimodal Information: RAG systems are typically focused on text-based data, and incorporating multimodal inputs (e.g., images, audio) remains a challenge for maintaining the effectiveness of retrieval.