
Visualize your RAG Data — Evaluate your Retrieval-Augmented Generation System with Ragas | by Markus Stoll | Mar, 2024


How to use UMAP dimensionality reduction for embeddings to show multiple evaluation questions and their relationships to source documents with Ragas, OpenAI, Langchain and ChromaDB

Retrieval-Augmented Generation (RAG) adds a retrieval step to the workflow of an LLM, enabling it to query relevant data from additional sources like private documents when responding to questions and queries [1]. This workflow does not require costly training or fine-tuning of LLMs on the additional documents. The documents are split into snippets, which are then indexed, often using a compact ML-generated vector representation (embedding). Snippets with similar content will be in proximity to each other in this embedding space.

The RAG application projects the user-provided questions into the embedding space to retrieve relevant document snippets based on their distance to the question. The LLM can use the retrieved information to answer the query and to substantiate its conclusion by presenting the snippets as references.

Animation of the iterations of a UMAP [3] dimensionality reduction for Wikipedia Formula One articles in the embedding space with manually labeled clusters — created by the author.

The evaluation of a RAG application is challenging [2]. Different approaches exist: on one hand, there are methods where the ground-truth answer must be provided by the developer; on the other hand, the answer (and the question) can also be generated by another LLM. One of the largest open-source frameworks for this LLM-supported approach is Ragas [4] (Retrieval-Augmented Generation Assessment), which provides

  • Methods for generating test data based on the documents, and
  • Evaluations based on different metrics for assessing the retrieval and generation steps, both individually and end-to-end (a minimal evaluation sketch follows directly below).
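To make the second point concrete, here is a minimal sketch of what an evaluation call with Ragas looks like. The dataset content is made up, the metric selection is just one possible choice, and the exact column names (for example ground_truth vs. ground_truths) depend on the Ragas version; the ragas and datasets packages need to be installed.

from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, context_precision, context_recall, faithfulness

# Toy example only: one question with its generated answer, the retrieved
# contexts, and a ground-truth reference.
eval_dataset = Dataset.from_dict({
    "question": ["Who won the first Formula One World Championship?"],
    "answer": ["Giuseppe Farina won the inaugural championship in 1950."],
    "contexts": [["Nino Farina won the first World Championship of Drivers in 1950."]],
    "ground_truth": ["Giuseppe Farina won the 1950 World Championship."],
})

result = evaluate(
    eval_dataset,
    metrics=[faithfulness, answer_relevancy, context_precision, context_recall],
)
print(result)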

In this article, you will learn how to generate evaluation questions and answers for a RAG system, pose them to the RAG pipeline, and prepare the questions, answers, and source documents for visualization in the embedding space.

The code is available on GitHub.

Start a notebook and install the required Python packages:

!pip install langchain langchain-openai chromadb renumics-spotlight
%env OPENAI_API_KEY=<your-api-key>

This tutorial uses the following Python packages:

  • Langchain: A framework to integrate language models and RAG components, making the setup process smoother.
  • Renumics-Spotlight: A visualization tool to interactively explore unstructured ML datasets.
  • Ragas: A framework that helps you evaluate your RAG pipelines.

Disclaimer: The author of this article is also one of the developers of Spotlight.

If you use your own RAG application, you can skip to the next part to learn how to evaluate, extract, and visualize.

Or you can use the RAG application from the last article with our prepared dataset of all Formula One articles of Wikipedia. There you can also place your own documents in a ‘docs/’ subfolder.

This dataset is based on articles from Wikipedia and is licensed under the Creative Commons Attribution-ShareAlike License. The original articles and a list of authors can be found on the respective Wikipedia pages.

Now you can use Langchain’s DirectoryLoader to load all files from the docs subdirectory and split the documents into snippets using the RecursiveCharacterTextSplitter. With OpenAIEmbeddings you can create embeddings and store them in ChromaDB as a vector store. For the chain itself you can use Langchain’s ChatOpenAI and a ChatPromptTemplate.
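
A rough sketch of this setup is shown below; the chunk size, model names, collection name, and prompt are illustrative assumptions rather than the exact configuration of the linked code.

from langchain_community.document_loaders import DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_core.prompts import ChatPromptTemplate

# Load all files from the docs/ subfolder and prepare the snippet splitter.
docs = DirectoryLoader("docs/").load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)

# Embeddings and a persistent ChromaDB collection as vector store.
embeddings_model = OpenAIEmbeddings()
docs_vectorstore = Chroma(
    collection_name="docs_store",
    embedding_function=embeddings_model,
    persist_directory="docs-db",
)

# Chat model and prompt for the generation step of the chain.
llm = ChatOpenAI(model="gpt-4", temperature=0.0)
prompt = ChatPromptTemplate.from_template(
    "Answer the question based only on the following context:\n"
    "{context}\n\nQuestion: {question}"
)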

The linked code for this article contains all the necessary steps; a detailed description of these steps can be found in the last article.

One important point is that you should use a hash function to create IDs for the snippets in ChromaDB. This allows you to find the embeddings in the database when you only have the document with its content and metadata, and it makes it possible to skip documents that already exist in the database.

import hashlib
import json
from langchain_core.documents import Document

def stable_hash_meta(doc: Document) -> str:
    """
    Stable hash of a document based on its metadata.
    """
    return hashlib.sha1(json.dumps(doc.metadata, sort_keys=True).encode()).hexdigest()

...
# Split the documents into snippets and derive a stable ID for each snippet.
splits = text_splitter.split_documents(docs)
splits_ids = [
    {"doc": split, "id": stable_hash_meta(split)} for split in splits
]

# Only add snippets whose IDs are not yet in the vector store.
existing_ids = docs_vectorstore.get()["ids"]
new_splits_ids = [split for split in splits_ids if split["id"] not in existing_ids]

docs_vectorstore.add_documents(
    documents=[split["doc"] for split in new_splits_ids],
    ids=[split["id"] for split in new_splits_ids],
)
docs_vectorstore.persist()

For a common topic like Formula One, one can also use ChatGPT directly to generate general questions. In this article, four methods of question generation are used:

  • GPT4: 30 questions were generated using ChatGPT 4 with the following prompt: “Write 30 question about Formula one” (a sketch of doing this via the API follows after this list)
    – Random example: “Which Formula 1 team is known for its prancing horse logo?”
  • GPT3.5: Another 199 questions were generated with ChatGPT 3.5 with the following prompt: “Write 100 question about Formula one”, repeating “Thanks, write another 100 please”
    – Example: “Which driver won the inaugural Formula One World Championship in 1950?”
  • Ragas_GPT4: 113 questions were generated using Ragas. Ragas utilizes the documents again and its own embedding model to construct a vector database, which is then used to generate questions with GPT4.
    – Example: “Can you tell me more about the performance of the Jordan 198 Formula One car in the 1998 World Championship?”
  • Ragas_GPT3.5: 226 additional questions were generated with Ragas, this time using GPT3.5.
    – Example: “What incident occurred at the 2014 Belgian Grand Prix that led to Hamilton’s retirement from the race?”
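The GPT4 and GPT3.5 questions were collected directly in the ChatGPT interface using the prompts quoted above; as a sketch, the same prompt could also be sent programmatically (the model name and the simple line-based parsing below are assumptions):

from langchain_openai import ChatOpenAI

# Sketch: send the quoted prompt via the API instead of the ChatGPT UI and
# split the reply into one question per line.
question_llm = ChatOpenAI(model="gpt-4", temperature=0.7)
reply = question_llm.invoke("Write 30 question about Formula one")
gpt4_questions = [line.strip() for line in reply.content.split("\n") if line.strip()]

The Ragas questions were generated with the TestsetGenerator:
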
from ragas.testset import TestsetGenerator

generator = TestsetGenerator.from_default(
    openai_generator_llm="gpt-3.5-turbo-16k",
    openai_filter_llm="gpt-3.5-turbo-16k"
)

testset_ragas_gpt35 = generator.generate(docs, 100)

The questions and answers were not reviewed or modified in any way. All questions are combined in a single dataframe with the columns id, question, ground_truth, question_by and answer.
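
A sketch of this combination step, assuming the four question sets are already available as dataframes with question, ground_truth and question_by columns (the dataframe names are assumptions):

import pandas as pd

# Combine all generated question sets into one dataframe and add an id column
# plus an empty answer column to be filled by the RAG chain below.
df_questions_answers = pd.concat(
    [df_gpt4, df_gpt35, df_ragas_gpt4, df_ragas_gpt35], ignore_index=True
)
df_questions_answers["id"] = [f"question_{i}" for i in range(len(df_questions_answers))]
df_questions_answers["answer"] = None
df_questions_answers = df_questions_answers[
    ["id", "question", "ground_truth", "question_by", "answer"]
]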

Next, the questions will be posed to the RAG system. For over 500 questions, this can take some time and incur costs. If you ask the questions row-by-row, you can pause and continue the process or recover from a crash without losing the results so far:

for i, row in df_questions_answers.iterrows():
    # Only answer questions that have not been answered yet, so the loop can
    # be interrupted and resumed without repeating work.
    if row["answer"] is None or pd.isnull(row["answer"]):
        response = rag_chain.invoke(row["question"])

        df_questions_answers.loc[df_questions_answers.index[i], "answer"] = response[
            "answer"
        ]
        df_questions_answers.loc[df_questions_answers.index[i], "source_documents"] = [
            stable_hash_meta(source_document)
            for source_document in response["source_documents"]
        ]
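
To make the pause-and-resume behaviour work in practice, the dataframe can simply be written to disk at the end of each iteration; the file format and name are assumptions, not necessarily what the linked code does:

# Saving after each iteration keeps all answers collected so far; on restart,
# rows that already have an answer are skipped by the check above.
# (Parquet via pandas requires pyarrow; any other format works as well.)
df_questions_answers.to_parquet("df_questions_answers.parquet")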

Not only is the answer stored, but also the source IDs of the retrieved document snippets and their text content as context for the answer.
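
Since the snippet IDs are stored, their text can be looked up again in the vector store. The following sketch (the helper function and column name are assumptions, not the article's verbatim code) derives a contexts column with the raw snippet texts from the stored IDs:

# Resolve the stored snippet IDs back to their text via ChromaDB and keep them
# as a "contexts" column for later evaluation and inspection.
def contexts_for_ids(ids: list[str]) -> list[str]:
    result = docs_vectorstore.get(ids=ids, include=["documents"])
    # Depending on the Chroma version, the returned order may need to be
    # re-aligned with the requested ids.
    return result["documents"]

df_questions_answers["contexts"] = df_questions_answers["source_documents"].apply(
    contexts_for_ids
)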


