You can find the workspace ID in your URL, following the /workspace path.
Learn how to run tests programmatically using Maxim's SDK with custom datasets, flexible output functions, and evaluations for your AI applications.
The SDK uses a builder pattern to configure and run tests. Here's a basic example:
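As a rough illustration of the builder pattern itself, the sketch below chains configuration calls and finishes with a terminal `run()` step. `TestRunBuilder`, `withData`, `withOutput`, and `run` are hypothetical stand-ins written for this sketch; they are not Maxim's SDK classes or method names.

```typescript
// Hypothetical stand-in types; these are NOT Maxim's SDK classes.
type DemoRow = Record<string, string>;
type DemoOutputFn = (row: DemoRow) => string;

class TestRunBuilder {
  private readonly runName: string;
  private rows: DemoRow[] = [];
  private outputFn: DemoOutputFn = () => "";

  constructor(runName: string) {
    this.runName = runName;
  }

  // Each configuration method returns `this` so calls can be chained.
  withData(rows: DemoRow[]): this {
    this.rows = rows;
    return this;
  }

  withOutput(fn: DemoOutputFn): this {
    this.outputFn = fn;
    return this;
  }

  // Terminal step: apply the configured output function to every row.
  run(): { runName: string; outputs: string[] } {
    const fn = this.outputFn;
    return { runName: this.runName, outputs: this.rows.map((row) => fn(row)) };
  }
}

const demoRun = new TestRunBuilder("demo run")
  .withData([{ Input: "hello" }, { Input: "world" }])
  .withOutput((row) => (row["Input"] ?? "").toUpperCase())
  .run();
```

Nothing executes until the terminal call, so the configuration steps can be supplied in any order.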
Understanding Data Structure
The data structure is a crucial concept that helps maintain type safety and validates your data columns. It maps your data columns to specific types that Maxim understands.
Basic Structure
The data structure is an object where keys are your column names and values are the column types.
Available Types
- INPUT: Main input text (only one allowed)
- EXPECTED_OUTPUT: Expected response (only one allowed)
- CONTEXT_TO_EVALUATE: Context for evaluation (only one allowed)
- VARIABLE: Additional data columns (multiple allowed)
- NULLABLE_VARIABLE: Optional data columns (multiple allowed)
Example
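A minimal sketch of such an object, using the five type names listed under Available Types as plain string literals. Whether the SDK expects string literals or exported constants is an assumption here. The column names and the `checkDataStructure` helper are hypothetical. The helper only illustrates the "only one allowed" rule and is not the SDK's own validation.

```typescript
// Type names come from the Available Types list; everything else is illustrative.
type ColumnType =
  | "INPUT"
  | "EXPECTED_OUTPUT"
  | "CONTEXT_TO_EVALUATE"
  | "VARIABLE"
  | "NULLABLE_VARIABLE";

type DataStructure = Record<string, ColumnType>;

// Types that may be assigned to at most one column.
const SINGLE_USE_TYPES: ColumnType[] = ["INPUT", "EXPECTED_OUTPUT", "CONTEXT_TO_EVALUATE"];

// Returns a list of problems; an empty list means the structure obeys the rule.
function checkDataStructure(structure: DataStructure): string[] {
  const problems: string[] = [];
  for (const columnType of SINGLE_USE_TYPES) {
    const columns = Object.keys(structure).filter((c) => structure[c] === columnType);
    if (columns.length > 1) {
      problems.push(`${columnType} is assigned to ${columns.length} columns: ${columns.join(", ")}`);
    }
  }
  return problems;
}

// Keys are column names, values are column types.
const exampleStructure: DataStructure = {
  question: "INPUT",
  answer: "EXPECTED_OUTPUT",
  retrievedDocs: "CONTEXT_TO_EVALUATE",
  userTier: "VARIABLE",
  notes: "NULLABLE_VARIABLE",
};

const exampleProblems = checkDataStructure(exampleStructure);
```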
Working with Data Sources
Maxim's SDK supports multiple ways to provide test data:
1. CSV Files
The CSVFile class provides a robust way to work with CSV files:
The CSVFile class automatically validates your CSV headers against the data structure and provides type-safe access to your data.
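To make the idea of header validation concrete, here is a small self-contained sketch that checks a CSV header row against a list of required column names. `missingCsvColumns` is a hypothetical helper, not part of CSVFile, and the real class may validate differently (for example, by handling quoted headers).

```typescript
// Hypothetical helper; the real CSVFile class may validate differently.
function missingCsvColumns(csvText: string, requiredColumns: string[]): string[] {
  const headerLine = csvText.split(/\r?\n/)[0] ?? "";
  // Naive split: assumes header names contain no quoted commas.
  const headers = new Set(headerLine.split(",").map((h) => h.trim()));
  return requiredColumns.filter((column) => !headers.has(column));
}

const sampleCsv = "question,answer,notes\nWhat is 2+2?,4,\n";

// "context" is required here but absent from the header row.
const missingColumns = missingCsvColumns(sampleCsv, ["question", "answer", "context"]);
```

Checking headers before any rows are processed surfaces a mismatched file immediately rather than partway through a run.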
2. Manual Data Array
For smaller datasets or programmatically generated data:
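One way to keep a hand-built array in sync with a data structure is to derive the row type from it. In the sketch below, `RowFor` is a hypothetical helper type, the column names are illustrative, and it assumes every cell is a string (or null for NULLABLE_VARIABLE columns); the SDK's own row types may differ.

```typescript
// Illustrative data structure; `as const` keeps the literal type names.
const qaStructure = {
  question: "INPUT",
  answer: "EXPECTED_OUTPUT",
  notes: "NULLABLE_VARIABLE",
} as const;

// Hypothetical helper type: NULLABLE_VARIABLE columns accept null, others require a string.
type RowFor<S extends Record<string, string>> = {
  [K in keyof S]: S[K] extends "NULLABLE_VARIABLE" ? string | null : string;
};

type QaRow = RowFor<typeof qaStructure>;

// Hand-written entries.
const handWrittenRows: QaRow[] = [
  { question: "What is the capital of France?", answer: "Paris", notes: null },
];

// Programmatically generated entries.
const generatedRows: QaRow[] = [1, 2, 3].map((n) => ({
  question: `What is ${n} + ${n}?`,
  answer: String(n + n),
  notes: n === 1 ? "smallest case" : null,
}));

const manualRows: QaRow[] = [...handWrittenRows, ...generatedRows];
```

With this typing, omitting `answer` or misspelling a column name is a compile-time error.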
3. Platform Dataset
Use existing datasets from your Maxim workspace:
Custom Output Function
The output function is where you define how to generate responses for your test cases:
If your output function throws an error, the entry will be marked as failed and you'll receive the index in the failedEntryIndices array after the run completes.
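The failure behavior described above can be sketched without the SDK. The `runEntries` function below is a hypothetical runner written for illustration: it calls an output function per entry, treats a thrown error as a failed entry, and collects the indices under the same `failedEntryIndices` name. The shape of the SDK's actual result object is not assumed beyond that name.

```typescript
type Entry = Record<string, string>;

// Runs `outputFn` on every entry; a thrown error marks that entry as failed.
async function runEntries(
  entries: Entry[],
  outputFn: (entry: Entry) => Promise<string> | string,
): Promise<{ outputs: (string | null)[]; failedEntryIndices: number[] }> {
  const outputs: (string | null)[] = [];
  const failedEntryIndices: number[] = [];
  for (const [index, entry] of entries.entries()) {
    try {
      outputs.push(await outputFn(entry));
    } catch {
      // Keep positions aligned with the input by recording a null output.
      outputs.push(null);
      failedEntryIndices.push(index);
    }
  }
  return { outputs, failedEntryIndices };
}

// Example output function that rejects empty inputs.
const echoOutput = (entry: Entry): string => {
  const input = entry["Input"] ?? "";
  if (input.length === 0) throw new Error("empty input");
  return `echo: ${input}`;
};
```

One failing entry does not abort the run; the remaining entries are still processed.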
Adding Evaluators
Choose which evaluators to use for your test run:
Human Evaluation
Evaluators that require human input need a human evaluation configuration, which you set up as follows:
Advanced Configuration
Concurrency Control
Manage how many entries are processed in parallel:
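To show what a concurrency limit means in practice, here is a generic sketch that processes a list with at most `limit` calls in flight at once. `mapWithConcurrency` is a hypothetical helper, not an SDK function, and the SDK's internal scheduling may work differently.

```typescript
// Processes `items` with at most `limit` calls to `worker` in flight at once.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array<R>(items.length);
  let next = 0;

  // Each lane claims the next unprocessed index until none remain.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index] as T, index);
    }
  }

  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}
```

A lower limit reduces pressure on rate-limited model APIs at the cost of a longer run; results stay in input order regardless of completion order.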
Timeout Configuration
Set a custom timeout for long-running tests:
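As a general illustration of how a timeout can bound a long-running operation, the sketch below races a promise against a timer. `withTimeout` is a hypothetical helper and the millisecond unit is an assumption of this sketch; the SDK's timeout option may use a different unit and mechanism.

```typescript
// Rejects if `task` has not settled within `ms` milliseconds.
function withTimeout<T>(task: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  // Whichever settles first wins; the timer is cleared afterwards in either case.
  return Promise.race([task, timeout]).finally(() => clearTimeout(timer));
}
```

The underlying task is not cancelled by this approach; only the caller stops waiting for it.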
Complete Example
Here's a complete example combining all the features: