Agentic evaluation
Learn how to use the Maxim SDK to attach and process evaluators for node-level evaluation on traces, spans, and components such as retrievals or generations.
Retrievals and generations are currently the only components, apart from traces and spans, that can be evaluated.
The Maxim SDK allows attaching evaluators to hierarchical entities such as traces, spans, and their components (e.g., retrievals and generations). Once the variables an evaluator requires are provided via `withVariables`, evaluation begins automatically. Results are visible on the Evaluations tab inside the trace details sheet.
Setting up evaluators using the SDK
This section explains how to attach evaluators, pass variables to them, and interpret the evaluation results within Maxim.
Attaching evaluators to entities
Evaluators can be attached to a trace, any span inside the trace, or any component within a span. The `withEvaluators` method is used to attach evaluators to an entity.
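For example, here is a minimal sketch in TypeScript. The logger setup, entity IDs, and evaluator names ("clarity", "toxicity", "faithfulness") are illustrative assumptions rather than values prescribed by this guide, and it assumes the entity exposes these methods under an `evaluate` accessor, as in the TypeScript SDK.

```typescript
// Assumes `logger` was created earlier via the Maxim SDK, and that
// evaluators with these names exist in your Maxim workspace.
const trace = logger.trace({ id: "trace-1", name: "support-query" });

// Attach evaluators at the trace level.
trace.evaluate.withEvaluators("clarity", "toxicity");

// Evaluators can likewise be attached to a span, or to a component
// inside a span such as a retrieval or generation.
const span = trace.span({ id: "span-1", name: "rag-pipeline" });
span.evaluate.withEvaluators("faithfulness");
```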
Providing variables to evaluators
Once evaluators are attached to an entity, pass variables to them via the `withVariables` method, which accepts key-value pairs of variable names and values. An evaluator starts processing only after every variable it requires has been provided.
You can pass the variables directly after attaching the evaluators by chaining the `withVariables` method.
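Continuing the sketch above, both forms look roughly like this; the second argument of the standalone `withVariables` call, naming the evaluators that should receive the variables, is an assumption about the SDK's signature:

```typescript
const userQuery = "What is the refund policy?";
const retrievedContext = "Refunds are processed within 14 days.";
const modelOutput = "Refunds take up to 14 days to process.";

// Standalone form: send variables to specific evaluators already
// attached to the entity. Evaluation starts once every variable the
// evaluator requires has been provided.
span.evaluate.withVariables(
  { input: userQuery, context: retrievedContext, output: modelOutput },
  ["faithfulness"] // evaluators that should receive these variables
);

// Chained form: attach evaluators and pass their variables in one chain.
trace.evaluate
  .withEvaluators("clarity")
  .withVariables({ input: userQuery, output: modelOutput });
```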
Viewing evaluation results on the Evaluations tab
To monitor and analyze evaluations, navigate to the Evaluations tab in the trace details sheet. This tab presents a hierarchical view of the entities that have evaluations.
- Entity tree (on the left): Displays entities (traces, spans, components) in a structured hierarchy, enabling intuitive navigation.
- Overview tab (per entity): A concise summary of all evaluator results for the entity, along with rewritten outputs (if available). Includes:
- Evaluator name
- Pass/Fail result
- Score
Navigating individual evaluator details
Clicking on an evaluator within the overview panel opens a detailed view with the following insights:
- Variables used: Displays the list of variables and their values used to evaluate the entity.
- Logs: Shows evaluation logs.
- Time taken: How long the evaluator took to run (including model latency, if applicable).
- Reasoning (if available): The evaluator's reasoning behind its result.
- Cost and Tokens used (if available): Breakdown of resource usage for the evaluation.
- Human annotation scores breakdown (if applicable): Shows each user's annotation score and the comment they provided. Only shown for human evaluators.
Code example for agentic evaluation
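The following end-to-end sketch in TypeScript ties the pieces together. The package name (`@maximai/maxim-js`), repository ID, evaluator names, and variable keys are placeholders to adapt to your own Maxim workspace, and the setup and cleanup calls are assumptions about the SDK's surface rather than anything specified in this guide.

```typescript
import { Maxim } from "@maximai/maxim-js";

// Initialize the SDK and a logger for your log repository.
const maxim = new Maxim({ apiKey: process.env.MAXIM_API_KEY! });
const logger = await maxim.logger({ id: "<log-repository-id>" });
if (!logger) throw new Error("Failed to create Maxim logger");

// Trace-level evaluation.
const trace = logger.trace({ id: "trace-1", name: "support-query" });
trace.evaluate.withEvaluators("toxicity");

// Span-level evaluation with chained variables; evaluation begins once
// all variables required by the evaluator have been provided.
const span = trace.span({ id: "span-1", name: "rag-pipeline" });
span.evaluate
  .withEvaluators("faithfulness")
  .withVariables({
    input: "What is the refund policy?",
    context: "Refunds are processed within 14 days.",
    output: "Refunds take up to 14 days to process.",
  });

span.end();
trace.end();
await maxim.cleanup();
```

Once the evaluators finish processing, the results for both the trace and the span appear on the Evaluations tab in the trace details sheet.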
Best practices
- Use evaluators selectively to monitor key performance metrics.
- Set up sampling and filtering to control evaluation cost while keeping coverage representative.
- Provide all required variables reliably so that evaluations are actually triggered.
- Regularly check the Evaluations tab for actionable insights directly from production logs.
Conclusion
Agentic evaluation enables robust monitoring and debugging across nested entities such as traces, spans, generations, and retrievals. By leveraging the Maxim SDK's methods and analyzing results on the Evaluations tab, you can ensure the performance and reliability of your AI-powered workflows.