Set up auto evaluation on logs

Evaluate captured logs automatically from the UI based on filters and sampling

Why evaluate logs?

We know that evaluation is a necessary step while building with an LLM, but since LLMs are non-deterministic, all possible scenarios can never be covered up front; evaluating the LLM on the live system therefore becomes crucial as well.

Evaluation on logs helps cover cases and scenarios that test runs might miss, ensuring that the LLM performs well under various conditions. It also surfaces potential issues early, so you can make the necessary adjustments and improve the LLM's overall performance in time.

Diagram of the evaluation iteration loop

Before you start

You need to have logging set up to capture interactions between your LLM and users before you can evaluate them. To do so, integrate the Maxim SDK into your application.
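
To give a concrete picture of what a captured interaction looks like, here is a minimal logging sketch in Python. It is illustrative only: the class and method names (Maxim, Config, LoggerConfig, TraceConfig, set_input, set_output) follow the general shape of the Maxim Python SDK and may differ from the current SDK version, so check the SDK documentation for the exact API.

```python
from uuid import uuid4

from maxim import Maxim, Config
from maxim.logger import LoggerConfig, TraceConfig

# Connect the SDK to your Maxim workspace and log repository.
# (Key and repository ID placeholders; names are illustrative.)
maxim = Maxim(Config(api_key="YOUR_MAXIM_API_KEY"))
logger = maxim.logger(LoggerConfig(id="YOUR_LOG_REPOSITORY_ID"))

# Capture one user interaction as a trace; the auto evaluation you configure
# in the UI runs on traces like this one.
trace = logger.trace(TraceConfig(id=str(uuid4()), name="chat-turn"))
trace.set_input("What is the refund policy?")
trace.set_output("Refunds are available within 30 days of purchase.")
trace.end()
```

Once traces like this start flowing into your log repository, the steps below apply to them.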

Setting up auto evaluation

Navigate to the repository where you want to evaluate your logs.

Click on Configure evaluation in the top right corner of the page and choose the Setup evaluation configuration option. This will open up the evaluation configuration sheet.

Screenshot of the Configure evaluation dropdown showing the Setup evaluation configuration and Create annotation queue options

The sheet's Auto Evaluation section has 3 parts:

  • Select evaluators: Choose the evaluators you want to use for your evaluation.
  • Filters: Set up filters so that only logs meeting certain criteria are evaluated.
  • Sampling: Choose a sampling rate to control how many logs are evaluated; evaluating every log could lead to very high costs (see the conceptual sketch after this list).

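To build intuition for how filters and sampling work together, here is a small conceptual sketch. This is not Maxim's implementation, and the tag names are hypothetical; it only shows the idea that a log is auto-evaluated when it matches your filters and is also selected by the sampling rate.

```python
import random

SAMPLING_RATE = 0.10  # evaluate roughly 10% of the logs that match the filters


def matches_filters(log: dict) -> bool:
    # Hypothetical filter: only consider production traffic from a support bot.
    tags = log.get("tags", {})
    return tags.get("env") == "production" and tags.get("app") == "support-bot"


def should_evaluate(log: dict) -> bool:
    # A log is picked up for auto evaluation only if it passes the filters
    # AND falls within the sampling rate.
    return matches_filters(log) and random.random() < SAMPLING_RATE
```
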
Screenshot of the evaluation configuration sheet

The Human Evaluation section below is explained in the Set up human evaluation on logs section.

Finally, click the Save configuration button.

The configuration is now done, and your logs should start being evaluated automatically based on the filters and sampling rate you set up! 🎉

Making sense of evaluations on logs

In the logs table view, you can find a trace's evaluations in its row towards the left end, displaying the evaluation scores. You can also sort the logs by evaluation score by clicking any evaluator's column header.

Screenshot of the logs table with traces having evaluation

Click a trace to view its detailed evaluation results. In the sheet that opens, go to the Evaluation tab to see the evaluation in detail.

Screenshot of the details sheet with the evaluation tab highlighted

The Evaluation tab displays many details about the trace's evaluation. Let us see how you can navigate through them and get more insight into how your LLM is performing.

Evaluation summary

Screenshot of the evaluation summary

The evaluation summary displays the following information (top to bottom, left to right):

  • How many evaluators passed out of the total evaluators run across the trace
  • The total cost of all the evaluators' evaluations
  • The total number of tokens used across all the evaluators' evaluations
  • The total time taken to process the evaluation

Trace evaluation card

In each card, you will find a tab switcher in the top right corner, which is used to navigate through the evaluation's details. Here is what you can find in the different tabs:

Overview tab

Screenshot of the overview tab in trace evaluation card

All the evaluators run at the trace level are listed in a table here, along with their scores and whether each one passed or failed.

Individual evaluator's tab

Screenshot of the individual evaluator's tab in trace evaluation card

This tab contains the following sections:

  • Result: Shows whether the evaluator passed or failed.
  • Score: Shows the score of the evaluator.
  • Reason (shown where applicable): Displays the reasoning behind the score of the evaluator, if given.
  • Cost (shown where applicable): Shows the cost of the individual evaluator's evaluation.
  • Tokens used (shown where applicable): Shows the number of tokens used by the individual evaluator's evaluation.
  • Model latency (shown where applicable): Shows the time taken by the model to return a result for the evaluator.
  • Time taken: Shows the time taken by the evaluator to evaluate.
  • Variables used to evaluate: Shows the values that were substituted for the variables while the evaluator was processed.
  • Logs: These are logs that were generated during the evaluation process. They might be useful for debugging errors or issues that occurred during the evaluation.

Tree view on the left panel

Screenshot of the tree view on the left panel

This view is essential when you are evaluating each log at the node level, that is, on each component of the trace (such as a generation or retrieval). It helps you see which component's evaluation you are looking at in the right panel, and where that component sits within the trace. We discuss Node Level Evaluation further down.
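
To make the tree view concrete, the sketch below extends the earlier illustrative logging example by adding a retrieval and a generation as separate components (nodes) of one trace; these are the entries the tree view lists, and each can carry its own node-level evaluations. The config classes and method names here (RetrievalConfig, GenerationConfig, trace.retrieval, trace.generation) are assumptions based on the SDK's general shape, so verify them against the SDK documentation.

```python
from uuid import uuid4

from maxim.logger import TraceConfig, RetrievalConfig, GenerationConfig

# `logger` is the logger created in the earlier sketch.
trace = logger.trace(TraceConfig(id=str(uuid4()), name="rag-chat-turn"))

# Retrieval node: the context lookup step of the trace.
retrieval = trace.retrieval(RetrievalConfig(id=str(uuid4()), name="docs-search"))
retrieval.input("refund policy")
retrieval.output(["Refunds are available within 30 days of purchase."])

# Generation node: the LLM call that produced the final answer.
generation = trace.generation(GenerationConfig(
    id=str(uuid4()),
    provider="openai",
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What is the refund policy?"}],
))

trace.end()
```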
