Build a RAG application using MongoDB and Maxim AI

RAG application flow

What is RAG?

Retrieval-augmented generation (RAG) is a process designed to enhance the output of a large language model (LLM) by incorporating information from an external, authoritative knowledge base. This approach ensures that the responses generated by the LLM are not solely dependent on the model's training data.

The two main components of a RAG application:

  • Retrieval - fetching relevant information from a knowledge source, such as a document database.
  • Generation - combining the retrieved information with the user's query to produce a coherent response.

In this blog, we’ll create a RAG application using MongoDB as a vector database and log the retrievals and generations using Maxim’s SDK. 
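
At a high level, the flow we’ll build looks like the following minimal sketch (the function names here are placeholders for the retrieval and generation components implemented later in this post):

def answer(query: str) -> str:
    context = retrieve_docs(query)   # Retrieval: fetch relevant documents for the query
    return generate(query, context)  # Generation: produce the final response from the query + context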

Prerequisite setup

For this implementation, we require the following tools: 

  1. MongoDB - for storing and retrieving our context
  2. Maxim AI - for logging our RAG workflow
  3. OpenAI - for embedding queries and generating responses
  4. Python - the programming language for our application

Setting up MongoDB Atlas

We will use MongoDB Atlas to store our data and create a vector search for retrieval.

  1. Start by creating an Atlas account by signing up here
  2. Follow this quick guide to create your cluster and fetch the connection string. 
  3. Install the latest version of the PyMongo library.
pip install pymongo
  4. Create a MongoDB client.
from pymongo.mongo_client import MongoClient

mongo_uri = "<mongo-connection-string>"  # your Atlas connection string
client = MongoClient(mongo_uri)
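
Optionally, you can verify the connection before moving on (a quick sanity check using PyMongo’s ping command):

# Confirm the client can reach the cluster; this raises an exception on failure
try:
    client.admin.command("ping")
    print("Connected to MongoDB Atlas")
except Exception as e:
    print(f"Connection failed: {e}")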
  5. For this blog, we’ll use MongoDB’s sample database, “sample_mflix”. Follow these steps to load sample data into your cluster.
Loading a sample database on MongoDB

The sample_mflix.embedded_movies collection already includes vector embeddings for movie plots, generated with OpenAI's text-embedding-ada-002 model. Each document has a “plot_embedding” field containing an array of 1536 numbers, i.e., the embedding vector.

db = client["sample_mflix"]
collection = db["embedded_movies"]
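
Before creating the index, it is worth spot-checking that the embeddings are present and have the expected dimensionality (an optional check; the field names come from the sample dataset):

# Fetch one document and confirm its plot_embedding field has 1536 dimensions
sample = collection.find_one({"plot_embedding": {"$exists": True}})
print(sample["title"])
print(len(sample["plot_embedding"]))  # should print 1536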
  6. Create a vector index for our collection and name it “idx_plot_embedding”. To run a vector search, we first need to create a vector index, which enables faster query execution.
from pymongo.operations import SearchIndexModel

search_index_model = SearchIndexModel(
  definition={
    "fields": [
      {
        "type": "vector",  # indicates a vector index
        "numDimensions": 1536,  # dimension of the embedding field
        "path": "plot_embedding",  # field where the embeddings are stored
        "similarity": "cosine"  # similarity metric for the vector index
      }
    ]
  },
  name="idx_plot_embedding",
  type="vectorSearch",
)

collection.create_search_index(model=search_index_model)

In our RAG, we’ll query this vector index from our application to retrieve the required context.
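
Note that index creation is asynchronous, so the index may take a short while to become queryable. If you want to wait for it in code, you can poll its status with PyMongo’s list_search_indexes (a small sketch; it requires a recent PyMongo version, and the "queryable" flag is reported by Atlas):

import time

# Poll until Atlas reports the index as ready to serve queries
while True:
    indexes = list(collection.list_search_indexes("idx_plot_embedding"))
    if indexes and indexes[0].get("queryable"):
        print("Index is ready")
        break
    time.sleep(5)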

Setting up Maxim SDK

Maxim logging is a distributed tracing system that enables real-time log monitoring in production environments. 

  1. Get started for free by signing up here
  2. Generate a Maxim API key and make sure to copy it before closing the dialog.
Generating API key on Maxim AI
  3. Install Maxim’s Python SDK.
pip install maxim-py
  4. Create a log repository on the Maxim platform and fetch its ID.
Creating a log repository on Maxim AI
  5. Initialize the Maxim logger. Read more about Maxim’s logger hierarchy.
from maxim import Config, Maxim
from maxim.logger import LoggerConfig

maxim_api_key = "<maxim-api-key>"
log_repo_id = "<log-repo-id>"

maxim = Maxim(Config(api_key=maxim_api_key))
logger = maxim.logger(LoggerConfig(id=log_repo_id))
  6. Maxim decorators: We’ll use decorators from Maxim’s SDK for logging, saving the effort of adding endless log statements. Decorators enable us to log our workflow with a single-line addition in code.
from maxim.decorators import trace, retrieval, current_retrieval
from maxim.decorators.langchain import langchain_llm_call, langchain_callback

Setting up OpenAI key

In this blog, we’ll configure OpenAI via LangChain.

  1. Follow this quick guide to generate an OpenAI key. 
  2. Initialize an OpenAI model using langchain’s functions. 
from langchain.chat_models.openai import ChatOpenAI

openai_key = "<openai-api-key>"
llm = ChatOpenAI(api_key=openai_key, model="gpt-4o-mini")
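
You can optionally confirm the key works with a one-off call before wiring the model into the application:

# Quick sanity check that the model responds
print(llm.invoke("Say hello in one word").content)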

RAG application

Import dependencies

First, we need to import the necessary libraries. 

import json
import logging
import os
from flask import Flask, jsonify, request
from langchain.chat_models.openai import ChatOpenAI
from langchain.tools import tool
from langchain_openai import OpenAIEmbeddings
from pymongo.mongo_client import MongoClient
from maxim import Config, Maxim
from maxim.decorators import current_retrieval, current_trace, retrieval, trace
from maxim.decorators.langchain import langchain_callback, langchain_llm_call
from maxim.logger import LoggerConfig

Retrieval

We are using OpenAI’s embedding model, “text-embedding-ada-002”, since it was used to create the embeddings stored in the “plot_embedding” field. We’ll use this model to convert the string query into a vector, which is then passed to the Atlas index for retrieval. We’ll also use Maxim’s “@retrieval” decorator to log retrieval calls.

embeddings = OpenAIEmbeddings(
    model="text-embedding-ada-002",
    openai_api_key=openai_key
)

@retrieval(name="mongo-retrieval")  # Maxim's decorator for logging retrievals
def retrieve_docs(query: str):
    db = client["sample_mflix"]
    collection = db["embedded_movies"]
    query_vector = embeddings.embed_query(query)  # embed the string query
    response = collection.aggregate([
        {
            '$vectorSearch': {
                "index": "idx_plot_embedding",  # name of the Atlas index
                "path": "plot_embedding",  # field storing the vector embeddings
                "queryVector": query_vector,  # vector form of our query
                "numCandidates": 50,  # number of candidates considered during the search
                "limit": 10  # number of documents we wish to pull
            }
        }
    ])
    docs = [
        {
            "Title": item.get('title', 'N/A'),
            "Plot": item.get('plot', 'N/A'),
            "Year": item.get('year', 'N/A')
        }
        for item in response
    ]
    return docs
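
If you also want to inspect how closely each result matched the query, Atlas Vector Search exposes a similarity score through the vectorSearchScore metadata field. As an optional variant, you could append a $project stage to the aggregation inside retrieve_docs (a sketch; adjust the projected fields to your needs):

# Optional second pipeline stage, appended after $vectorSearch inside retrieve_docs
{
    "$project": {
        "_id": 0,
        "title": 1,
        "plot": 1,
        "year": 1,
        "score": {"$meta": "vectorSearchScore"}  # similarity score of each match
    }
}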

Generation

Here, we are using OpenAI’s “gpt-4o-mini” model for generation and Maxim’s “@langchain_llm_call” decorator to log LLM generation. 

llm = ChatOpenAI(api_key=openai_key, model="gpt-4o-mini")

@langchain_llm_call(name="llm-call")  # Maxim's decorator to log generation
def execute(query: str):
    context = retrieve_docs(query)
    messages = [
        (
            "system",
            f"You are a smart movie recommendation expert. A question will be asked to you along with relevant information. "
            f"Your task is to recommend just the title of the movie using this context: {json.dumps(context)}. "
            f"Respond in proper markdown format.",
        ),
        ("human", query),
    ]
    result = llm.invoke(messages, config={"callbacks": [langchain_callback()]})  # langchain_callback() logs the model response, token usage, etc.

    return result.content

Handler

We’ll use this function as the entry point to our application, with Maxim’s “@trace” decorator triggering a new trace every time the user makes a query. A trace is the complete processing of a request through a distributed system, including all the actions between the request and the response.

app = Flask(__name__)  # Flask app exposing our RAG endpoint

@app.post("/chat")
@trace(logger=logger, name="movie-chat-v1")  # Maxim's decorator to initiate a trace
def handler():
    print(current_trace().id)  # id of the trace created for this request
    query = request.json["query"]
    result = execute(query)
    current_trace().set_output(str(result))  # set the output for the trace
    return jsonify({"result": result})
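
To try the endpoint locally, start the Flask app and send a test request (a minimal sketch; the port and payload below are just examples):

if __name__ == "__main__":
    app.run(port=8000)

# Example request once the server is running:
# curl -X POST http://localhost:8000/chat \
#   -H "Content-Type: application/json" \
#   -d '{"query": "A heist movie that takes place inside dreams"}'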

Maxim AI platform

Workflow

We will call our RAG application using Maxim’s “Workflow” feature. 

  1. Add your API endpoint in the address bar.
  2. Write the user query in the “Input” dialog and hit “Run”. 
Maxim AI's Workflow feature

Logs

Once the request completes, logs are automatically generated on the Maxim platform. We can check the results of our retrieval and generation here, and also filter by specific span types, e.g., retrieval or generation.

Checking logs on Maxim AI platform

Next steps

  1. Learn how to set up real-time evaluation of your logs with Maxim AI. Maxim's continuous log evaluation system allows users to monitor application performance in real time.
  2. Learn how to evaluate the context and responses of your RAG application with Maxim AI. Maxim allows you to bring in your custom dataset and use its evaluators to assess your application's performance.

Learn more about Maxim AI: https://www.getmaxim.ai