Author: Ruth Kiarie
Email: wangukiarie@gmail.com
In this project, I'm going to build a RAG pipeline with FastAPI. This will help me see how building a RAG pipeline behind an API differs from building it without one. I'm interested in this because it will help me understand how RAG, FastAPI, ChromaDB and Ollama come together to form a RAG-assisted AI.
The key tools I used include the ChromaDB vector database, a FastAPI-powered /ask endpoint and the uvicorn server. Key concepts I learnt include building a manual RAG pipeline, building a RAG API with FastAPI, and supporting multiple users in a RAG system, as well as the different purposes each of these systems serves.
This project took me approximately 2 hours. It grounded my understanding of RAG pipelines, APIs, the ChromaDB vector database, Swagger UI, and how multi-user support is added to a system.
In this step, I'm going to perform RAG manually and set up a Python environment. RAG stands for Retrieval-Augmented Generation.
I performed RAG manually by adding a personal knowledge base and then asking a question about that data. The three parts are retrieving relevant data from the knowledge base, augmenting the prompt to the AI with that data, and generating an answer.
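When RAG is done by hand, the augmentation step amounts to placing the retrieved text above the question before sending both to the model. The sketch below shows only that step. The template wording and the `build_prompt` name are illustrative assumptions, not the prompt used in this project, and "My name is Ruth." merely stands in for a knowledge-base chunk.

```python
def build_prompt(context_chunks, question):
    """Augment step: place retrieved chunks above the question in one prompt string."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Example: one knowledge-base chunk and one question.
prompt = build_prompt(["My name is Ruth."], "What is my name?")
print(prompt)
```

Telling the model to answer only from the context is one common way to keep it from falling back on its general knowledge.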
The key difference I noticed is that nomic-embed-text converts text into numerical representations (embeddings) used for search, while qwen2.5:0.5b generates the responses to the questions asked.
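Both models can be reached through Ollama's local REST API, which makes the division of labour visible: one endpoint returns vectors, the other returns text. This is a minimal sketch assuming Ollama is running on its default address (http://localhost:11434) with both models pulled. The helper names are hypothetical, and only `send` actually contacts the server.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434"


def _json_post(path, payload):
    """Build (but do not send) a JSON POST request to the Ollama server."""
    return urllib.request.Request(
        OLLAMA_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed_request(text):
    # Embedding model: turns text into a vector for similarity search.
    return _json_post("/api/embed", {"model": "nomic-embed-text", "input": text})


def generate_request(prompt):
    # Generative model: turns a prompt into an answer; stream=False returns one JSON object.
    return _json_post(
        "/api/generate",
        {"model": "qwen2.5:0.5b", "prompt": prompt, "stream": False},
    )


def send(request):
    """Send a request to the running Ollama server and decode its JSON reply."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

With the server running, `send(embed_request("hello"))["embeddings"][0]` should give the embedding as a list of floats, and `send(generate_request("Say hi"))["response"]` the generated text.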
In this step, I'm going to build a Python script that loads my text, splits it into chunks, and stores the chunks as embeddings. Embeddings are numerical vector representations of text; here they are stored in a vector database.
I included information about myself in the knowledge base, so the answers the LLM generates are grounded in the data added to the RAG pipeline.
When I ask a question, ChromaDB converts it into a vector and finds the stored vectors closest to it in that high-dimensional space.
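Chunking can be as simple as cutting the text into fixed-size word windows. The sketch below is one possible approach, assuming word-based chunks with a small overlap so that text near a chunk boundary appears in both neighbouring chunks. `chunk_text` and its default sizes are illustrative, not the project's actual settings.

```python
def chunk_text(text, chunk_size=50, overlap=10):
    """Split text into chunks of at most chunk_size words; neighbouring chunks share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


# Example: 120 placeholder words -> windows starting at words 0, 40 and 80.
sample = " ".join(f"w{i}" for i in range(120))
chunks = chunk_text(sample)
print(len(chunks))
```

Each chunk would then be added to a ChromaDB collection, for example with `collection.add(documents=chunks, ids=[f"chunk-{i}" for i in range(len(chunks))])`, which embeds the documents with the collection's configured embedding function.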
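Conceptually, "closest" means most similar under some distance or similarity measure. The toy search below uses cosine similarity over hand-made 3-dimensional vectors and a brute-force scan. Real embeddings have far more dimensions, and ChromaDB uses an approximate nearest-neighbour index with a configurable distance metric, so treat this as an illustration of the idea rather than of ChromaDB's internals. The vectors and ids are made up.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means they point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query_vec, stored, k=2):
    """Return the ids of the k stored vectors most similar to query_vec (brute force)."""
    ranked = sorted(
        stored,
        key=lambda item_id: cosine_similarity(query_vec, stored[item_id]),
        reverse=True,
    )
    return ranked[:k]


# Made-up 3-dimensional "embeddings" for three chunks.
stored = {
    "name": [0.9, 0.1, 0.0],
    "hobby": [0.1, 0.9, 0.2],
    "job": [0.2, 0.3, 0.9],
}
print(nearest([1.0, 0.2, 0.1], stored, k=1))  # ['name']
```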
In this step, I'm going to build a RAG API with FastAPI and test it using Swagger UI.
When a question comes in, my endpoint performs the three steps of RAG. It first retrieves the most relevant chunks from ChromaDB, then augments the prompt by combining those chunks with the question, and finally has the model generate an answer grounded in the retrieved chunks.
I tested my API by asking 'What is my name?'. The AI answered 'Ruth'. The context it used was the exact set of chunks that ChromaDB retrieved from my knowledge base.
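The three steps can be sketched as one framework-free function. This assumes the embedding and generation calls are passed in as plain functions; in this project they would be backed by nomic-embed-text and qwen2.5:0.5b, and retrieval would be done by ChromaDB rather than the in-memory ranking used here. `answer_question` and the returned `answer`/`context` keys are hypothetical. In a FastAPI app, a route handler registered with `@app.get("/ask")` could call a function like this and return its dictionary, which FastAPI serializes to JSON.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def answer_question(question, chunks, embed, generate, n_results=2):
    """Run retrieve -> augment -> generate. `chunks` is a list of (text, vector) pairs."""
    # 1. Retrieve: rank stored chunks by similarity to the embedded question.
    query_vec = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine_similarity(query_vec, c[1]), reverse=True)
    context = [text for text, _ in ranked[:n_results]]
    # 2. Augment: combine the retrieved chunks with the question.
    prompt = "Context:\n" + "\n".join(context) + f"\n\nQuestion: {question}\nAnswer:"
    # 3. Generate: let the model answer from the augmented prompt.
    return {"answer": generate(prompt), "context": context}


# Stand-ins for the real models, for illustration only.
def fake_embed(text):
    t = text.lower()
    # Keyword counts plus a constant so no vector is all zeros.
    return [t.count("name"), t.count("hobby"), 0.1]


def fake_generate(prompt):
    return "Ruth" if "My name is Ruth." in prompt else "I don't know"


knowledge_base = [(t, fake_embed(t)) for t in ["My name is Ruth.", "An example note about a hobby."]]
result = answer_question("What is my name?", knowledge_base, fake_embed, fake_generate, n_results=1)
print(result)
```

Returning the retrieved context alongside the answer makes it easy to check, as in the test below, which chunks the answer was based on.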
In this project extension, I'm adding multi-user support because, in the real world, RAG systems almost always serve multiple users or data sources. Multi-tenancy means serving many users from one system while keeping each user's data separate and secure, and keeping the system cost-effective as it scales to handle more users.
In this project extension, I added a POST endpoint that dynamically ingests profiles for different users. Metadata filtering then restricts query results to documents belonging to the requested user.
In this project extension, I tested multi-user queries by passing user=Jordan as a query parameter to the GET /ask endpoint. The filter works: the results come only from Jordan's data.
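Metadata filtering works by storing a small metadata record (such as the owning user) alongside each document and restricting the search to matching records before ranking. In ChromaDB this corresponds to passing `metadatas=[{"user": ...}]` to `collection.add` and `where={"user": ...}` to `collection.query`. The in-memory `ToyStore` below is a hypothetical stand-in that shows the same idea, with simple word overlap standing in for vector similarity.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStore:
    """Hypothetical in-memory stand-in for a vector collection that keeps metadata per document."""
    docs: list = field(default_factory=list)

    def ingest(self, user, text):
        # Store the owning user as metadata next to the document.
        self.docs.append({"text": text, "metadata": {"user": user}})

    def query(self, question, user, n_results=1):
        # Filter on metadata first, so other users' documents are never candidates.
        candidates = [d for d in self.docs if d["metadata"]["user"] == user]
        # Rank the remaining documents by shared words (a stand-in for vector similarity).
        q_words = set(question.lower().split())
        candidates.sort(key=lambda d: len(q_words & set(d["text"].lower().split())), reverse=True)
        return [d["text"] for d in candidates[:n_results]]


store = ToyStore()
store.ingest("Ruth", "My name is Ruth.")
store.ingest("Jordan", "My name is Jordan.")
print(store.query("What is my name?", user="Jordan"))  # ['My name is Jordan.']
```

Filtering before ranking matters: if the filter were applied after taking the top results, another user's document could crowd out the requested user's data.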
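On the client side, the user is just another query-string parameter. A small sketch, assuming the server runs on uvicorn's default address (http://127.0.0.1:8000) and that the question parameter is called `question` (an assumption; the project's actual parameter name is not stated here):

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumption: uvicorn's default host and port.
BASE_URL = "http://127.0.0.1:8000"


def ask_url(question, user):
    """Build a GET /ask URL; `question` is an assumed parameter name, `user` selects whose data is searched."""
    return f"{BASE_URL}/ask?{urlencode({'question': question, 'user': user})}"


url = ask_url("What is my name?", "Jordan")
print(url)  # http://127.0.0.1:8000/ask?question=What+is+my+name%3F&user=Jordan
```

The same request can be made interactively from Swagger UI, which FastAPI serves at /docs by default.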
I did this project today to learn how to connect to ChromaDB and store text vectors in it, and how to query documents through a FastAPI endpoint tested with Swagger UI, so as to generate accurate answers from the stored chunks. Another skill I want to learn is how Docker is used to containerize an application like this one.