Maxim AI January 2025 Updates ✨

Feature spotlight
🔍 Evaluate and monitor AI agents using Maxim
Debugging, monitoring, and understanding what's going wrong in an AI agent's workflow is a constant challenge for AI teams: agentic workflows involve multiple steps, such as LLM calls, tool calls, and data retrievals.
To ensure the quality of AI agents, evaluation needs to happen at each step, such as LLM generation, planner actions, and tool calls. With Maxim’s observability and evaluation suite, you can attach custom evaluators to each level of your logging hierarchy (trace, span, or component within the span).
Key features:
- Granular monitoring: Log and evaluate each step of your agentic workflow in real time. Continuously monitor quality at each step and define alerts for proactive issue resolution.
- SDK support: Integrate tracing and evaluation directly from your code using Maxim’s language SDKs.
- Metrics: Track key metrics such as cost, latency, and evaluator scores for each node.
This gives you actionable insights into every part of your AI agent's operations, helping you debug faster and maintain production-grade performance. Learn more about agentic evaluations.
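To make the idea concrete, here is a minimal sketch of node-level logging and evaluation with the Python SDK. The class and method names below are illustrative assumptions rather than the exact SDK surface, so consult the SDK reference for the real API.

```python
# Hypothetical sketch of node-level logging and evaluation with the Maxim
# Python SDK. All names below are assumptions, not the exact SDK surface.
from maxim import Maxim  # assumed entry point

maxim = Maxim(api_key="<MAXIM_API_KEY>")              # assumed constructor
logger = maxim.logger(repository_id="<log-repo-id>")  # assumed repo handle

# Log one agent run as a trace, with each workflow step as a nested span.
trace = logger.trace(name="support-agent-run")
span = trace.span(name="planner-step")
span.event(name="tool-call", metadata={"tool": "search", "query": "refund policy"})

# Attach evaluators at any level of the hierarchy (trace, span, or component).
span.attach_evaluators(["step-relevance"])    # assumed helper
trace.attach_evaluators(["task-completion"])  # assumed helper

span.end()
trace.end()
```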
🐋 DeepSeek-R1 is now available on Maxim!
Leverage the capabilities of this OpenAI o1 competitor on Maxim to design custom AI evaluators for your workflows and experiment with your prompts.
Enable DeepSeek-R1 on Maxim via the Together AI provider:
- Go to Settings > Models > Together AI and select DeepSeek R1.
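For reference, the snippet below sketches the underlying call through Together AI's own Python client; the client usage and model id are assumptions based on Together's public SDK, and within Maxim the provider makes this call for you once the model is enabled.

```python
# Sketch of calling DeepSeek-R1 directly via Together AI's Python client.
# The client interface and model id are assumptions based on Together's SDK.
from together import Together

client = Together()  # reads TOGETHER_API_KEY from the environment

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",  # model id as listed by Together AI
    messages=[{"role": "user", "content": "Rate this answer for factuality: ..."}],
)
print(response.choices[0].message.content)
```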
💾 Integrate evals into deployment workflows
CI/CD pipelines are central to modern software development, automating workflows and enabling teams to meet high-quality standards with speed. As modern AI teams ship features at an increasingly fast pace, integrating quality checks directly into deployment pipelines becomes critical.
Maxim’s evaluation suite integrates seamlessly into your development workflows through the CLI, SDK, or GitHub Actions support. Adding the Maxim CLI to your CI/CD pipelines lets you evaluate features thoroughly before deployment, ensuring that only high-quality updates reach users. Key benefits of the Maxim CLI:
- Custom evaluators: Leverage Maxim’s robust evaluators and define tailored evaluation criteria to suit your unique requirements.
- Pass/fail criteria: Use evaluation scores to automate deployment decisions.
- Deployment mapping: Easily trace test runs back to deployments for faster debugging.
Ensure the speed and reliability of your AI deployments with Maxim CLI.
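As a sketch of the pass/fail pattern, the script below gates a deployment step on evaluator scores. `run_maxim_test_suite` is a hypothetical placeholder for whatever your Maxim CLI or SDK invocation actually returns; the gating logic is the point.

```python
# Illustrative CI quality gate: block deployment when evaluator scores fall
# below a threshold. `run_maxim_test_suite` is a hypothetical placeholder.
import sys

MIN_SCORES = {"faithfulness": 0.85, "answer-relevance": 0.80}

def run_maxim_test_suite() -> dict:
    """Placeholder: trigger a Maxim test run (via CLI or SDK) and return scores."""
    return {"faithfulness": 0.91, "answer-relevance": 0.78}

scores = run_maxim_test_suite()
failures = {k: v for k, v in scores.items() if v < MIN_SCORES.get(k, 0.0)}

if failures:
    print(f"Quality gate failed: {failures}", file=sys.stderr)
    sys.exit(1)  # non-zero exit stops the pipeline before deployment
print("Quality gate passed; proceeding to deploy.")
```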
Feature round-up
🕳️ OTel support for distributed tracing and observability
AI teams manage vast amounts of production data and logs, making cross-platform monitoring and troubleshooting challenging without a standardized telemetry framework. Fragmented data, vendor lock-in, and complex integrations further complicate the process of routing and managing telemetry data across platforms.
OpenTelemetry (OTel) is an open-source observability framework that provides standardized protocols and tools for collecting and routing telemetry data in a unified format. Maxim is fully OpenTelemetry compliant, so you can seamlessly forward application logs to New Relic or any other OTel-compatible observability platform of your choice. Key benefits:
- Unified observability: Collect and route telemetry data, including production logs, in a standardized format—no more juggling multiple protocols.
- Enhanced flexibility: Avoid vendor lock-in and choose the observability tools that best fit your enterprise needs.
- Enterprise-grade monitoring: Set up connectors in a single step and leverage Maxim's observability and evaluation stack to maintain production quality and seamless AI operations.
Stay ahead of operational challenges with Maxim's robust and flexible monitoring framework. Learn how to set up data connectors on Maxim.
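Because everything speaks OTLP, the standard OpenTelemetry SDK setup applies end to end. Below is a minimal Python sketch; the endpoint and auth header are placeholders for your platform's actual OTLP ingest details.

```python
# Standard OpenTelemetry tracing setup in Python. The endpoint and headers
# are placeholders — substitute your platform's OTLP ingest URL and token.
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

provider = TracerProvider(resource=Resource.create({"service.name": "my-ai-app"}))
exporter = OTLPSpanExporter(
    endpoint="https://<your-otlp-endpoint>/v1/traces",  # placeholder
    headers={"authorization": "Bearer <token>"},        # placeholder
)
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("my-ai-app")
with tracer.start_as_current_span("llm-generation") as span:
    span.set_attribute("gen_ai.request.model", "deepseek-r1")
```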
💻 Trigger test runs on your local workflows using Python SDK
The Maxim SDK empowers developers to test their AI workflows directly from their local environment, eliminating the need for repetitive data uploads and back-and-forth interactions with the platform.
Our SDK support gives you the following benefits:
- Flexible data sources: Use local CSV files or other data sources as test datasets.
- Local testing: Trigger test runs directly on your local machine without uploading data to Maxim.
- Seamless monitoring: Track the status of test runs in the Maxim dashboard, just like regular runs.
This makes quality assurance faster, more flexible, and developer-friendly. Start building with Maxim SDK.
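A minimal sketch of the local-testing flow follows; the commented test-run calls are hypothetical placeholders for the SDK's actual API, while the CSV loading and local execution loop are the substance.

```python
# Run test cases from a local CSV without uploading the dataset to Maxim.
# The commented `maxim` calls are hypothetical placeholders for the SDK API.
import csv

def load_dataset(path: str) -> list[dict]:
    """Read a local CSV into test-case dicts; no upload required."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def my_workflow(prompt: str) -> str:
    """Stand-in for your local agent or prompt chain."""
    return f"echo: {prompt}"

dataset = load_dataset("test_cases.csv")  # columns: input, expected_output

# Hypothetical: create a test run, execute locally, and push only the results
# so they show up on the Maxim dashboard like any other run.
# run = maxim.create_test_run(name="local-csv-run", evaluators=["bias"])
for case in dataset:
    output = my_workflow(case["input"])
    # run.add_result(input=case["input"], output=output,
    #                expected=case["expected_output"])
```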
Upcoming release
🔁 Agent simulation
Simulate and test your multi-turn workflows on a wide variety of scenarios and across different user personas before taking your application to production. Maxim’s agent simulation feature will streamline and simplify the process of ensuring the reliability of your AI agent.
Knowledge nuggets
📝 Build a RAG Application using MongoDB and Maxim
Retrieval-augmented generation (RAG) is a process designed to enhance the output of a large language model (LLM) by incorporating information from an external, authoritative knowledge base. There are two key components of a RAG application: Retrieval (fetching context from knowledge sources) and Generation (using the retrieved context to generate a grounded response).
Maxim enables you to evaluate the quality of the retrieved context and the generated response in your RAG application, as well as continuously monitor performance in production. In this blog post, we've outlined the step-by-step process of building and monitoring a RAG application, using MongoDB as a vector database and Maxim AI for tracing the output of the retrieval and generation components.
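A compressed sketch of the two components, assuming MongoDB Atlas Vector Search (the `$vectorSearch` aggregation stage) and OpenAI for embeddings and generation; the index, collection, and model names are placeholders to adapt to your setup.

```python
# Minimal RAG sketch: MongoDB Atlas Vector Search for retrieval, an LLM for
# generation. Index, collection, and model names are placeholders.
from pymongo import MongoClient
from openai import OpenAI

mongo = MongoClient("mongodb+srv://<cluster-uri>")  # placeholder URI
docs = mongo["ragdb"]["documents"]                  # assumed db/collection
llm = OpenAI()

def retrieve(question: str, k: int = 3) -> list[str]:
    """Retrieval: embed the query and fetch the k nearest chunks."""
    qvec = llm.embeddings.create(model="text-embedding-3-small",
                                 input=question).data[0].embedding
    hits = docs.aggregate([{"$vectorSearch": {
        "index": "vector_index", "path": "embedding",
        "queryVector": qvec, "numCandidates": 100, "limit": k}}])
    return [h["text"] for h in hits]

def generate(question: str) -> str:
    """Generation: ground the answer in the retrieved context."""
    context = "\n\n".join(retrieve(question))
    resp = llm.chat.completions.create(model="gpt-4o-mini", messages=[
        {"role": "system", "content": f"Answer using this context:\n{context}"},
        {"role": "user", "content": question}])
    return resp.choices[0].message.content
```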
