Learn about the key concepts of Maxim's AI Observability.
Maxim's observability platform builds upon established distributed tracing principles while extending them for GenAI-specific monitoring. Our architecture leverages proven concepts and enhances them with specialized components for AI applications.
Our observability features provide a comprehensive view of your AI applications through monitoring, troubleshooting, and alerting capabilities.
The central component of Maxim's observability platform is the Log Repository: it is where logs are ingested, and it makes them easy to search and analyze.
How should you structure log repositories?
Split your logs across multiple repositories in your workspace based on your needs. For example, you might have one repository for production logs and another for development logs; or you could have a single repository for all logs, differentiated by tags indicating the environment.
We recommend a separate log repository for each application, and also separate repositories for services that are managed by distinct teams. Keeping logs from different applications or services in the same repository makes analysis and troubleshooting harder.
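As a concrete example, here is a minimal sketch of pointing loggers at separate repositories using the `@maximai/maxim-js` SDK's `Maxim` client and `maxim.logger()` entry point; the repository IDs are placeholders for the ones shown in your workspace.

```typescript
import { Maxim } from "@maximai/maxim-js";

// One Maxim client; each logger writes to a single Log Repository.
const maxim = new Maxim({ apiKey: process.env.MAXIM_API_KEY! });

// One repository per environment (placeholder IDs from your workspace):
const prodLogger = await maxim.logger({ id: "prod-log-repo-id" });
const devLogger = await maxim.logger({ id: "dev-log-repo-id" });
```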
The Overview tab provides a comprehensive snapshot of activity within your Log Repository for your specified timeframe. The metrics that are displayed in the overview include:
- Total traces
- Total usage
- Average user feedback
- Traces over time (graph)
- Total usage over time (graph)
- Average user feedback over time (graph)
- Latency over time (graph)
- Error rate (graph)
![Screenshot of overview metrics]
There is also an overview for Evaluation results, which includes:
- Number of traces evaluated
- Evaluation summary
- Mean score over time (graph)
- Pass rate over time (graph)
![Screenshot of evaluation results overview metrics]
We talk more about evaluation logs in the How to evaluate logs section of the documentation.
The Logs tab is where you'll find all ingested logs in a tabular format. It is a detailed view where each log is displayed separately with its own metrics.
![Screenshot of the logs tab]
For each log entry we display the following in the table:
| Field | Description |
| --- | --- |
| Timestamp | The start time of the log |
| Input | The user's prompt or query |
| Output | The final response |
| Model | The AI models seen within the trace (e.g., gpt-4o) |
| Tokens | Token count for the whole trace |
| Cost | The cost for the whole trace in USD |
| Latency | Response time in milliseconds |
| User feedback | User feedback for a trace (if available) |
| Tags | Tags associated with the log entry |
Apart from the above fields, there is an Evaluation score field per evaluator; these display any evaluations run on the trace and the score each one obtained.
Click any trace to open an expanded view in a sheet. It displays detailed information on each component of the trace, along with the evaluations run on each component.
Once you start logging, you can set up alerts to monitor specific metrics. Alerts are triggered when a metric exceeds a threshold you define, and they can be configured to notify you via Slack, PagerDuty, or Opsgenie.
![Screenshot of the alerts tab]
You can learn more about configuring alerts in the How to set up alerts section of the documentation.
Now that you know how to navigate through a Log Repository, let us discuss what each log contains and what each component of the log and their properties represent.
A Session is a top-level entity that captures all the multi-turn interactions of your system. For example, if you are building a chatbot, a session in the Maxim logger is an entire conversation between a user and your chatbot.
Sessions persist across multiple interactions and remain active until explicitly closed: you can keep adding traces to a session over time until you explicitly close it.
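To make this concrete, here is a hedged sketch of a chatbot session accumulating traces across turns. It assumes session and trace creation calls of the shape `logger.session({...})` and `session.trace({...})`, with the `logger` from the earlier sketch; the IDs are placeholders.

```typescript
// Assumes `logger` was created via maxim.logger({ id: ... }) as sketched earlier.
const session = logger.session({
  id: "conversation-42", // e.g., your chat/conversation ID
  name: "support-chat",
});

// Each user turn becomes its own trace inside the same session.
const turn1 = session.trace({ id: "req-001", name: "chatQuery" });
// ... log spans, LLM calls, etc. on turn1 ...
turn1.end();

const turn2 = session.trace({ id: "req-002", name: "chatQuery" });
turn2.end();

// The session stays open for future traces until you explicitly close it.
session.end();
```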
In distributed tracing, a trace is the complete processing of a request through a distributed system, including all the actions between the request and the response.
| Property | Description |
| --- | --- |
| id | Unique identifier for the trace; this can usually be your request ID. |
| name (optional) | Name of the trace; you can keep it the same as your API call, e.g., chatQuery. |
| tags (optional) | Key-value pairs you can use for filtering on the dashboard. There is no limit on the number of tags or the size of the strings, although keeping them fewer and smaller improves search performance for your repository. |
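For illustration, a sketch of creating a trace with these properties, assuming a `logger.trace({...})` creator plus `input()`/`output()` setters; the ID, tags, and messages are placeholders.

```typescript
const trace = logger.trace({
  id: "req-12345",   // reuse your request ID as the trace ID
  name: "chatQuery", // same name as the API call handling the request
  tags: { env: "production", plan: "free" }, // filterable on the dashboard
});
trace.input("How do I reset my password?");
trace.output("You can reset it from Settings > Security.");
trace.end();
```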
Spans are the fundamental building blocks of distributed tracing. A single trace consists of a series of tagged time intervals known as spans; each span represents a logical unit of work in completing a user request or transaction.
Sub-spans
A span can have other spans as children. You can create as many sub-spans as you want to logically group flows within a span.
| Property | Description |
| --- | --- |
| id | Unique identifier for the span; it can be a fresh uuid() for each new span. This has to be unique across all elements in a trace; if IDs are duplicated, data gets overwritten. |
| name | Name of the span; you can keep it the same as your API call, e.g., chatQuery. |
| tags | These are span-specific tags. |
| spanId | Parent span ID; this is set automatically when you call span.addSpan(). |
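A sketch of nesting spans, using the `span.addSpan()` call mentioned above (the exact argument shape is an assumption, and the span names are illustrative), continuing the `trace` from the earlier sketch:

```typescript
import { v4 as uuid } from "uuid";

const span = trace.span({
  id: uuid(), // must be unique across all elements in the trace
  name: "rag-pipeline",
  tags: { stage: "generation" },
});

// Sub-span: the parent spanId is filled in automatically by addSpan().
const child = span.addSpan({ id: uuid(), name: "rerank-chunks" });
child.end();
span.end();
```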
Events mark significant points within a span or a trace, recording instantaneous occurrences that provide additional context for understanding system behavior.
| Property | Description |
| --- | --- |
| id | Unique identifier for the event; this has to be unique across all elements in a trace. |
| name | Name of the event; you can keep it the same as your API call, e.g., chatQuery. |
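For example, marking an instantaneous occurrence on the span from the previous sketch might look like this (assuming an `.event()` method mirroring the properties above; the ID and event name are illustrative):

```typescript
// Events record a point in time, so there is no duration to end().
span.event({ id: "evt-001", name: "cache-miss" });
```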
A Retrieval represents a query operation to fetch relevant context or information from a knowledge base or vector database within a trace or span. It is commonly used in RAG (Retrieval Augmented Generation) workflows, where context needs to be fetched before making LLM calls.
| Property | Description |
| --- | --- |
| id | Unique identifier for the retrieval; this has to be unique in a trace. |
| name | Name of the retrieval; it can be specific to your workflow, e.g., an intent detection or final summarization call. |
| tags | These are retrieval-specific tags. |
| input | Input used to fetch relevant chunks from your knowledge base. |
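A sketch of logging a RAG lookup on the trace from the earlier sketches, assuming a `trace.retrieval({...})` creator plus `input()`/`output()` setters; the index name, query, and chunk are made up.

```typescript
const retrieval = trace.retrieval({
  id: "ret-001", // must be unique within the trace
  name: "kb-lookup",
  tags: { index: "docs-v2" },
});
retrieval.input("password reset policy");                 // query sent to the vector DB
retrieval.output(["chunk-17: To reset a password, ..."]); // chunks that came back
```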
A Tool Call represents an external system or service call made based on an LLM response. Each tool call can be logged separately to track its input, output, and latency.
| Property | Description |
| --- | --- |
| id | Unique identifier for the tool call; this has to be unique in a trace (it can be found in the tool call response of the LLM). |
| name | Name of the tool call. |
| description | Description of the tool call. |
| args | Arguments passed to the tool call; these are tool-specific arguments. |
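Putting the properties together, a hedged sketch of logging one tool call (assuming a `trace.toolCall({...})` creator and a `result()` completion call; the tool, arguments, and ID are illustrative):

```typescript
const toolCall = trace.toolCall({
  id: "call_abc123", // the ID from the LLM's tool-call response
  name: "get_weather",
  description: "Fetch current weather for a city",
  args: JSON.stringify({ city: "Paris" }),
});
toolCall.result(JSON.stringify({ tempC: 21 })); // log the tool's output when it returns
```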