
Trigger Test Runs using SDK

Learn how to programmatically trigger test runs using Maxim's SDK with custom datasets, flexible output functions, and evaluations for your AI applications.

While Maxim's web interface provides a powerful way to run tests, the SDK offers even more flexibility and control. With the SDK, you can:

  • Use custom datasets directly from your code
  • Control how outputs are generated
  • Integrate testing into your CI/CD pipeline
  • Get real-time feedback on test progress
  • Handle errors programmatically

Example of triggering test runs using the SDK

The SDK uses a builder pattern to configure and run tests. Follow this example to trigger test runs:

import { Maxim } from "@maximai/maxim-js";
 
const maxim = new Maxim({ apiKey: "" });
 
const result = await maxim
.createTestRun("My First SDK Test", "your-workspace-id")
.withDataStructure(/* your data structure here */)
.withData(/* your data here */)
.yieldsOutput(/* your output function here */)
.withWorkflowId(/* or you can pass workflow ID from Maxim platform */)
.withPromptVersionId(/* or you can pass prompt version ID from Maxim platform */)
.withEvaluators(/* your evaluators here */)
.run();

Copy your workspace ID from the workspace switcher in the left topbar.

Screenshot of the copy workspace ID option

Understanding the data structure

The data structure keeps your test run type safe and lets Maxim validate your data columns. It maps each column in your data to a specific type that Maxim understands.

Basic structure

Define your data structure using an object that maps column names to specific types.

const dataStructure = {
    myQuestionColumn: "INPUT",
    expectedAnswerColumn: "EXPECTED_OUTPUT",
    contextColumn: "CONTEXT_TO_EVALUATE",
    additionalDataColumn: "VARIABLE"
}

Available types

  • INPUT - Main input text (only one allowed)
  • EXPECTED_OUTPUT - Expected response (only one allowed)
  • CONTEXT_TO_EVALUATE - Context for evaluation (only one allowed)
  • VARIABLE - Additional data columns (multiple allowed)
  • NULLABLE_VARIABLE - Optional data columns (multiple allowed)

Example

import { Maxim } from "@maximai/maxim-js";
 
const maxim = new Maxim({ apiKey: "YOUR_API_KEY" });
 
const result = maxim
    .createTestRun("Question Answering Test", workspaceId)
    .withDataStructure({
        question: "INPUT",
        answer: "EXPECTED_OUTPUT",
        context: "CONTEXT_TO_EVALUATE",
        metadata: "NULLABLE_VARIABLE"
    })
    // ... rest of the configuration

Working with data sources

Maxim's SDK supports multiple ways to provide test data:

1. CSV files

import { CSVFile, Maxim } from '@maximai/maxim-js';
 
const myCSVFile = new CSVFile('./test.csv', {
    question: 0, // column index in CSV
    answer: 1,
    context: 2
});
 
const maxim = new Maxim({ apiKey: "YOUR_API_KEY" });
 
const result = maxim
    .createTestRun("CSV Test Run", workspaceId)
    .withDataStructure({
        question: "INPUT",
        answer: "EXPECTED_OUTPUT",
        context: "CONTEXT_TO_EVALUATE"
    })
    .withData(myCSVFile)
    // ... rest of the configuration

The CSVFile class automatically validates your CSV headers against the data structure and provides type-safe access to your data.
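For reference, a test.csv matching the column mapping above could look like the following (illustrative rows; the header names line up with the data structure keys and the column indices passed to CSVFile):

question,answer,context
"What is the capital of France?","Paris","France is a country in Western Europe..."
"Who wrote Romeo and Juliet?","William Shakespeare","William Shakespeare was an English playwright..."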

2. Manual data array

For smaller datasets or programmatically generated data:

import { Maxim } from "@maximai/maxim-js";
 
const maxim = new Maxim({ apiKey: "YOUR_API_KEY" });
 
const manualData = [
    {
        question: "What is the capital of France?",
        answer: "Paris",
        context: "France is a country in Western Europe..."
    },
    {
        question: "Who wrote Romeo and Juliet?",
        answer: "William Shakespeare",
        context: "William Shakespeare was an English playwright..."
    }
];
 
const result = maxim
    .createTestRun("Manual Data Test", workspaceId)
    .withDataStructure({
        question: "INPUT",
        answer: "EXPECTED_OUTPUT",
        context: "CONTEXT_TO_EVALUATE"
    })
    .withData(manualData)
    // ... rest of the configuration

3. Platform datasets

Use existing datasets from your Maxim workspace:

import { Maxim } from "@maximai/maxim-js";
 
const maxim = new Maxim({ apiKey: "YOUR_API_KEY" });
 
const result = maxim
    .createTestRun("Platform Dataset Test", workspaceId)
    .withDataStructure({
        question: "INPUT",
        answer: "EXPECTED_OUTPUT",
        context: "CONTEXT_TO_EVALUATE"
    })
    .withData("your-dataset-id")
    // ... rest of the configuration

Trigger a test on a workflow stored on the Maxim platform

import { Maxim } from "@maximai/maxim-js";
 
const maxim = new Maxim({ apiKey: "YOUR_API_KEY" });
 
const result = maxim
    .createTestRun("Custom Output Test", workspaceId)
    .withDataStructure({
        question: "INPUT",
        answer: "EXPECTED_OUTPUT",
        context: "CONTEXT_TO_EVALUATE"
    })
    .withData(myData)
    .withWorkflowId(workflowIdFromDashboard, contextToEvaluate) // context to evaluate is optional; it can either be a variable used in the workflow or a column name present in the dataset

Find the workflow ID in the Workflows tab: open the workflow's menu and click Copy ID.

Screenshot of the copy workflow ID option

Trigger a test on a prompt version stored on the Maxim platform

import { Maxim } from "@maximai/maxim-js";
 
const maxim = new Maxim({ apiKey: "YOUR_API_KEY" });
 
const result = maxim
    .createTestRun("Custom Output Test", workspaceId)
    .withDataStructure({
        question: "INPUT",
        answer: "EXPECTED_OUTPUT",
        context: "CONTEXT_TO_EVALUATE"
    })
    .withData(myData)
    .withPromptVersionId(promptVersionIdFromPlatform, contextToEvaluate) // context to evaluate is optional; it can either be a variable used in the prompt or a column name present in the dataset

To get the prompt version ID, go to the Prompts tab, select the version you want to test, and click Copy version ID from the menu.

Screenshot of the copy prompt version ID option

Custom output function

The output function is where you define how to generate responses for your test cases:

import { Maxim } from "@maximai/maxim-js";
 
const maxim = new Maxim({ apiKey: "YOUR_API_KEY" });
 
const result = maxim
    .createTestRun("Custom Output Test", workspaceId)
    .withDataStructure({
        question: "INPUT",
        answer: "EXPECTED_OUTPUT",
        context: "CONTEXT_TO_EVALUATE"
    })
    .withData(myData)
    .yieldsOutput(async (data) => {
        // Call your model or API
        const response = await yourModel.call(
            data.question,
            data.context
        );
 
        return {
            // Required: The actual output
            data: response.text,
 
            // Optional: Context used for evaluation
            // Returning a value here will utilize this context for
            // evaluation instead of the CONTEXT_TO_EVALUATE column (if provided)
            retrievedContextToEvaluate: response.relevantContext,
 
            // Optional: Performance metrics
            meta: {
                usage: {
                    promptTokens: response.usage.prompt_tokens,
                    completionTokens: response.usage.completion_tokens,
                    totalTokens: response.usage.total_tokens,
                    latency: response.latency
                },
                cost: {
                    input: response.cost.input,
                    output: response.cost.output,
                    total: response.cost.input + response.cost.output
                }
            }
        };
    })

If your output function throws an error, the entry is marked as failed and its index is included in the failedEntryIndices array after the run completes.
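For example, you can check for failed entries once the run finishes (a minimal sketch; the builder configuration is elided and assumed to be set up as in the earlier examples):

import { Maxim } from "@maximai/maxim-js";
 
const maxim = new Maxim({ apiKey: "YOUR_API_KEY" });
 
const result = await maxim
    .createTestRun("Error Handling Test", workspaceId)
    // ... data structure, data, output function, and evaluators as shown above
    .run();
 
// Entries whose output function threw are reported here after the run completes
if (result.failedEntryIndices.length > 0) {
    console.warn(
        `${result.failedEntryIndices.length} entries failed:`,
        result.failedEntryIndices
    );
}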

Adding evaluators

Choose which evaluators to use for your test run:

import { Maxim } from "@maximai/maxim-js";
 
const maxim = new Maxim({ apiKey: "YOUR_API_KEY" });
 
const result = maxim
    .createTestRun("Evaluated Test", workspaceId)
    // ... previous configuration
    .withEvaluators(
        "Faithfulness", // names of evaluators installed in your workspace
        "Semantic Similarity",
        "Answer Relevance"
    )

Human evaluation

Evaluators that require human input need a human evaluation configuration, which you can set up as follows:

import { Maxim } from "@maximai/maxim-js";
 
const maxim = new Maxim({ apiKey: "YOUR_API_KEY" });
 
const result = maxim
    .createTestRun("Human Evaluated Test", workspaceId)
    // ... previous configuration
    .withEvaluators("Human Evaluator")
    .withHumanEvaluationConfig({
        emails: ["[email protected]"],
        instructions: "Please evaluate the response according to the evaluation criteria"
    })

Custom evaluators

You can create custom evaluators to implement specific evaluation logic for your test runs:

import {
    Maxim,
    createDataStructure,
    createCustomEvaluator,
    createCustomCombinedEvaluatorsFor,
} from "@maximai/maxim-js";
 
const maxim = new Maxim({
    apiKey: process.env.MAXIM_API_KEY
});
 
const dataStructure = createDataStructure({
    Input: 'INPUT',
    'Expected Output': 'EXPECTED_OUTPUT',
    stuff: 'CONTEXT_TO_EVALUATE',
});
 
// example of creating a custom evaluator
const myCustomEvaluator = createCustomEvaluator<typeof dataStructure>(
    'apostrophe-checker',
    (result) => {
        if (result.output.includes("'")) {
            return {
                score: true,
                reasoning: 'The output contains an apostrophe',
            };
        } else {
            return {
                score: false,
                reasoning: 'The output does not contain an apostrophe',
            };
        }
    },
    {
        onEachEntry: {
            scoreShouldBe: '=',
            value: true,
        },
        forTestrunOverall: {
            overallShouldBe: '>=',
            value: 80,
            for: 'percentageOfPassedResults',
        },
    },
);
 
// example of creating a combined custom evaluator
const myCombinedCustomEvaluator = createCustomCombinedEvaluatorsFor(
    'apostrophe-checker-2',
    'containsSpecialCharacters',
).build<typeof dataStructure>(
    (result) => {
        return {
            'apostrophe-checker-2': {
                score: result.output.includes("'") ? true : false,
                reasoning: result.output.includes("'")
                    ? 'The output contains an apostrophe'
                    : 'The output does not contain an apostrophe',
            },
            containsSpecialCharacters: {
                score: result.output
                    .split('')
                    .filter((char) => /[!@#$%^&*(),.?"':{}|<>]/.test(char))
                    .length,
            },
        };
    },
    {
        'apostrophe-checker-2': {
            onEachEntry: {
                scoreShouldBe: '=',
                value: true,
            },
            forTestrunOverall: {
                overallShouldBe: '>=',
                value: 80,
                for: 'percentageOfPassedResults',
            },
        },
        containsSpecialCharacters: {
            onEachEntry: {
                scoreShouldBe: '>',
                value: 3,
            },
            forTestrunOverall: {
                overallShouldBe: '>=',
                value: 80,
                for: 'percentageOfPassedResults',
            },
        },
    },
);

Using custom evaluators

Once created, custom evaluators can be used alongside built-in evaluators:

import { Maxim } from "@maximai/maxim-js";
 
const maxim = new Maxim({ apiKey: "YOUR_API_KEY" });
 
const result = await maxim
    .createTestRun(`sdk test run ${Date.now()}`, workspaceId)
    .withEvaluators(
        // platform evaluators
        'Faithfulness',
        'Semantic Similarity',
        // custom evaluators
        myCustomEvaluator,
        myCombinedCustomEvaluator,
    )
    .run();

Advanced configuration

Concurrency control

Manage how many entries are processed in parallel:

import { Maxim } from "@maximai/maxim-js";
 
const maxim = new Maxim({ apiKey: "YOUR_API_KEY" });
 
const result = await maxim
    .createTestRun("Long Test", workspaceId)
    // ... previous configuration
    .withConcurrency(5); // Process 5 entries at a time

Timeout configuration

Set a custom timeout for long-running tests:

import { Maxim } from "@maximai/maxim-js";
 
const maxim = new Maxim({ apiKey: "YOUR_API_KEY" });
 
const result = await maxim
    .createTestRun("Long Test", workspaceId)
    // ... previous configuration
    .run(120) // Wait up to 120 minutes

Complete example

Here's a complete example combining all the features:

import { CSVFile, Maxim } from '@maximai/maxim-js';
 
const maxim = new Maxim({ apiKey: "YOUR_API_KEY" });
 
// Initialize your data source
const testData = new CSVFile('./qa_dataset.csv', {
    question: 0,
    expected_answer: 1,
    context: 2,
    metadata: 3
});
 
try {
    const result = await maxim
        .createTestRun(`QA Evaluation ${new Date().toISOString()}`, 'your-workspace-id')
        .withDataStructure({
            question: "INPUT",
            expected_answer: "EXPECTED_OUTPUT",
            context: "CONTEXT_TO_EVALUATE",
            metadata: "NULLABLE_VARIABLE"
        })
        .withData(testData)
        .yieldsOutput(async (data) => {
            const startTime = Date.now();
 
            // Your model call here
            const response = await yourModel.generateAnswer(
                data.question,
                data.context
            );
 
            const latency = Date.now() - startTime;
 
            return {
                data: response.answer,
                // Returning a value here will utilize this context for
                // evaluation instead of the CONTEXT_TO_EVALUATE column
                // (in this case, the `context` column)
                retrievedContextToEvaluate: response.retrievedContext,
                meta: {
                    usage: {
                        promptTokens: response.tokens.prompt,
                        completionTokens: response.tokens.completion,
                        totalTokens: response.tokens.total,
                        latency
                    },
                    cost: {
                        input: response.cost.prompt,
                        output: response.cost.completion,
                        total: response.cost.total
                    }
                }
            };
        })
        .withEvaluators(
            "Faithfulness",
            "Answer Relevance",
            "Human Evaluator"
        )
        .withHumanEvaluationConfig({
            emails: ["[email protected]"],
            instructions: `Please evaluate the responses for accuracy and completeness. Consider both factual correctness and answer format.`
        })
        .withConcurrency(10)
        .run(30); // 30 minutes timeout
 
    console.log("Test Run Link:", result.testRunResult.link);
    console.log("Failed Entries:", result.failedEntryIndices);
    console.log("Evaluation Results:", result.testRunResult.result[0]);
    /*
    the result.testRunResult.result[0] object looks like this (values are mock data):
    {
        cost: {
            input: 1.905419538506091,
            completion: 2.010163610111029,
            total: 3.915583148617119
        },
        latency: {
            min: 6,
            max: 484.5761906393187,
            p50: 438,
            p90: 484,
            p95: 484,
            p99: 484,
            mean: 346.2,
            standardDeviation: 179.4284,
            total: 5
        },
        name: 'sdk test run 1734931207308',
        usage: { completion: 206, input: 150, total: 356 },
        individualEvaluatorMeanScore: {
            Faithfulness: { score: 0, outOf: 1 },
            'Answer Relevance': { score: 0.2, outOf: 1 },
        }
    }
    */
} catch (error) {
    console.error("Test Run Failed:", error);
} finally {
    await maxim.cleanup();
}
