Learn how to programmatically trigger test runs using Maxim's SDK with custom datasets, flexible output functions, and evaluations for your AI applications.
While Maxim's web interface provides a powerful way to run tests, the SDK offers even more flexibility and control. With the SDK, you can run tests against your own datasets, generate outputs with custom functions or with workflows and prompt versions from the platform, attach evaluators, and tune concurrency, all programmatically.
The SDK uses a builder pattern to configure and run tests. Follow this example to trigger a test run:
```typescript
import { Maxim } from "@maximai/maxim-js";

const maxim = new Maxim({ apiKey: "" });

const result = await maxim
  .createTestRun("My First SDK Test", "your-workspace-id")
  .withDataStructure(/* your data structure here */)
  .withData(/* your data here */)
  .yieldsOutput(/* your output function here */)
  .withWorkflowId(/* or you can pass a workflow ID from the Maxim platform */)
  .withPromptVersionId(/* or you can pass a prompt version ID from the Maxim platform */)
  .withEvaluators(/* your evaluators here */)
  .run();
```
Copy your workspace ID from the workspace switcher in the left topbar.
For smaller datasets or programmatically generated data:
```typescript
import { Maxim } from "@maximai/maxim-js";

const maxim = new Maxim({ apiKey: "YOUR_API_KEY" });

const manualData = [
  {
    question: "What is the capital of France?",
    answer: "Paris",
    context: "France is a country in Western Europe...",
  },
  {
    question: "Who wrote Romeo and Juliet?",
    answer: "William Shakespeare",
    context: "William Shakespeare was an English playwright...",
  },
];

const result = maxim
  .createTestRun("Manual Data Test", workspaceId)
  .withDataStructure({
    question: "INPUT",
    answer: "EXPECTED_OUTPUT",
    context: "CONTEXT_TO_EVALUATE",
  })
  .withData(manualData)
  // ... rest of the configuration
```
```typescript
import { Maxim } from "@maximai/maxim-js";

const maxim = new Maxim({ apiKey: "YOUR_API_KEY" });

const result = maxim
  .createTestRun("Custom Output Test", workspaceId)
  .withDataStructure({
    question: "INPUT",
    answer: "EXPECTED_OUTPUT",
    context: "CONTEXT_TO_EVALUATE",
  })
  .withData(myData)
  // contextToEvaluate is optional; it can be either a variable used in
  // the workflow or a column name present in the dataset
  .withWorkflowId(workflowIdFromDashboard, contextToEvaluate)
```
Find the workflow ID in the Workflows tab: open the workflow's menu and click Copy ID.
```typescript
import { Maxim } from "@maximai/maxim-js";

const maxim = new Maxim({ apiKey: "YOUR_API_KEY" });

const result = maxim
  .createTestRun("Custom Output Test", workspaceId)
  .withDataStructure({
    question: "INPUT",
    answer: "EXPECTED_OUTPUT",
    context: "CONTEXT_TO_EVALUATE",
  })
  .withData(myData)
  // contextToEvaluate is optional; it can be either a variable used in
  // the prompt or a column name present in the dataset
  .withPromptVersionId(promptVersionIdFromPlatform, contextToEvaluate)
```
To get a prompt version ID, go to the Prompts tab, select the version you want to run tests on, and click Copy version ID from the menu.
The output function is where you define how to generate responses for your test cases:
```typescript
import { Maxim } from "@maximai/maxim-js";

const maxim = new Maxim({ apiKey: "YOUR_API_KEY" });

const result = maxim
  .createTestRun("Custom Output Test", workspaceId)
  .withDataStructure({
    question: "INPUT",
    answer: "EXPECTED_OUTPUT",
    context: "CONTEXT_TO_EVALUATE",
  })
  .withData(myData)
  .yieldsOutput(async (data) => {
    // Call your model or API
    const response = await yourModel.call(data.question, data.context);

    return {
      // Required: the actual output
      data: response.text,

      // Optional: context used for evaluation. Returning a value here
      // will use this context for evaluation instead of the
      // CONTEXT_TO_EVALUATE column (if provided)
      retrievedContextToEvaluate: response.relevantContext,

      // Optional: performance metrics
      meta: {
        usage: {
          promptTokens: response.usage.prompt_tokens,
          completionTokens: response.usage.completion_tokens,
          totalTokens: response.usage.total_tokens,
          latency: response.latency,
        },
        cost: {
          input: response.cost.input,
          output: response.cost.output,
          total: response.cost.input + response.cost.output,
        },
      },
    };
  })
```
If your output function throws an error, the entry is marked as failed and its index is reported in the `failed_entry_indices` array after the run completes.
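For example, here's a minimal sketch of inspecting failed entries after a run, assuming the run result exposes the `failed_entry_indices` array described above (`yourModel` and the response shape are placeholders):

```typescript
const result = await maxim
  .createTestRun("Error Handling Test", workspaceId)
  // ... withDataStructure and withData as configured above
  .yieldsOutput(async (data) => {
    const response = await yourModel.call(data.question, data.context);
    if (!response.text) {
      // Throwing marks only this entry as failed; the run continues
      throw new Error("Model returned an empty response");
    }
    return { data: response.text };
  })
  .run();

// Indices of entries whose output function threw
console.log("Failed entries:", result.failed_entry_indices);
```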
Manage how many entries are processed in parallel:
```typescript
import { Maxim } from "@maximai/maxim-js";

const maxim = new Maxim({ apiKey: "YOUR_API_KEY" });

const result = await maxim
  .createTestRun("Long Test", workspaceId)
  // ... previous configuration
  .withConcurrency(5) // process 5 entries at a time
  .run();
```