Create Datasets Using Templates
Create Datasets quickly with predefined structures using our templates:
Prompt or Workflow Testing
Choose this template for single-turn interactions based on individual inputs to test prompts or workflows. Example: an Input column with prompts like “Summarize this article about climate change” paired with an Expected Output column containing ideal responses.
Agent Simulation
Select this template for multi-turn simulations to test agent behaviors across conversation sequences. Example: a Scenario column with “Customer inquiring about return policy” and an Expected Steps column outlining the agent’s expected actions.
Dataset Testing
Use this template when evaluating against existing output data to compare expected and actual results. Example: an Input column with “What’s the weather in New York?” and an Expected Output column with “65°F and sunny” for direct evaluation.
Create Datasets Using CSV
You can also import or create datasets in Maxim using CSV files.
Create or Import a Dataset Using CSV
- Go to the Datasets section in the Library.
- Click Create New or Upload CSV.
- Upload your CSV file.
- Map each column in your CSV to a corresponding column in the Dataset.
- Add the entries to the dataset and save.
CSV-based dataset creation is useful when you already have structured data prepared in spreadsheets or logs. Ensure columns are mapped correctly to avoid mismatches.
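For reference, a minimal CSV matching the Prompt or Workflow Testing template above might look like the following. The column names are illustrative; use whatever columns your dataset defines:

```csv
Input,Expected_Output
"Summarize this article about climate change","A concise summary covering causes and impacts."
"What's the weather in New York?","65°F and sunny"
```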
Update Existing Datasets Using CSV
You can add new entries to an existing dataset by uploading a CSV with compatible columns.
Update an Existing Dataset with CSV
- Prepare your CSV file with columns matching your dataset structure (e.g., Input, Expected_Output).
- In the Maxim UI, go to Library → Datasets and select your dataset.
- Use the Upload CSV or Add entries option.
- Map CSV columns to dataset columns.
- Confirm import to append the new rows to your dataset.
When updating a dataset, the CSV must follow the same column structure defined in the dataset to ensure consistency.
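Before uploading, you can sanity-check locally that a CSV's header matches the dataset's columns. This is a small illustrative sketch, not part of Maxim's API, and the column names are examples:

```python
import csv
import io

def csv_matches_dataset(csv_text, dataset_columns):
    """Check that the CSV header has exactly the columns the dataset expects."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    return set(header) == set(dataset_columns)

# A CSV whose header matches the dataset structure can be appended safely.
sample = "Input,Expected_Output\nWhat's the weather in New York?,65°F and sunny\n"
print(csv_matches_dataset(sample, ["Input", "Expected_Output"]))  # True
print(csv_matches_dataset(sample, ["Input", "Output"]))           # False
```

Running a check like this before import catches column mismatches early, before rows are appended to the wrong fields.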
Add Images to Your Dataset
You can enhance your datasets by including images alongside other data types. This is particularly useful for:
- Visual content evaluation
- Image-based prompts and responses
- Multi-modal testing scenarios
You can add images to your Dataset by creating a column of type Images. We support both URLs and local file paths.

When working with images in datasets:
- Supported formats include common image types (PNG, JPG, JPEG, GIF)
- For URLs, ensure they are publicly accessible
- For local files, maintain consistent file paths across your team
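As a convenience, you can pre-check entries against the supported formats before adding them. This helper is an illustrative local check, not part of Maxim:

```python
from pathlib import Path
from urllib.parse import urlparse

# Formats listed above; extend this set if your workspace supports more.
SUPPORTED_FORMATS = {".png", ".jpg", ".jpeg", ".gif"}

def is_supported_image(path_or_url: str) -> bool:
    """Check a local path or URL against the supported image extensions."""
    path = urlparse(path_or_url).path or path_or_url
    return Path(path).suffix.lower() in SUPPORTED_FORMATS

print(is_supported_image("https://example.com/images/cat.PNG"))  # True
print(is_supported_image("./screenshots/diagram.bmp"))           # False
```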
Column types
Scenario
The Scenario column type allows you to define specific situations or contexts for your test cases. Use this column to describe the background, user intent, or environment in which an interaction takes place. Scenarios help guide agents or models to respond appropriately based on the described situation. Examples:
- “A customer wants to buy an iPhone.”
- “A user is trying to cancel their subscription.”
- “A student asks for help with a math problem.”
Expected Steps
The Expected Steps column type allows you to specify the sequence of actions or decisions that an agent should take in response to a given scenario. This helps you clearly outline the ideal process or workflow, making it easier for evaluators to verify whether the agent is behaving as intended. Use this column to break down the expected agent behavior into individual, logical steps. This is especially useful for multi-turn interactions or complex tasks where the agent’s reasoning and actions need to be evaluated step by step.
Expected Tool Calls
The Expected Tool Calls column type allows you to specify which tools (such as APIs, functions, or plugins) you expect an agent to use in response to a scenario. This is especially useful when running prompt runs, where you want to evaluate whether the agent is choosing and invoking the correct tools as part of its reasoning process. Use this column to list the names of the tools or actions that should be called, optionally including parameters or expected arguments. This helps ensure that the agent’s tool usage aligns with your expectations for the task. Examples:
- “search_web”
- “get_weather(location='San Francisco')”
- “send_email(recipient, subject, body)”
inAnyOrder
anyOne
The anyOne combinator is used when any one of several possible tool calls is acceptable to fulfill the requirement. This is useful in scenarios where there are multiple valid ways for an agent to achieve the same outcome, and you want to allow for flexibility in the agent’s approach.
For example, in the following JSON, either get_pull_request_reviews or get_pull_request_comments (with the specified arguments) will be considered a valid response. The agent only needs to make one of these tool calls to satisfy the expectation.
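As a rough sketch of the idea only: the field names and structure below are assumptions for illustration, not Maxim's exact schema, and the arguments are made up. An anyOne block would group the two acceptable alternatives:

```json
{
  "anyOne": [
    {
      "name": "get_pull_request_reviews",
      "arguments": { "owner": "octo-org", "repo": "octo-repo", "pullNumber": 42 }
    },
    {
      "name": "get_pull_request_comments",
      "arguments": { "owner": "octo-org", "repo": "octo-repo", "pullNumber": 42 }
    }
  ]
}
```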
Conversation History
Conversation history allows you to include a chat history while running Prompt tests. The sequence of messages sent to the LLM is as follows:
- Messages in the prompt version
- History
- Input column in the dataset
Format
- Conversation history is always a JSON array
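A minimal sketch of such an array, assuming chat-style messages with role and content fields (the exact field names are illustrative):

```json
[
  { "role": "user", "content": "I ordered shoes last week and want to return them." },
  { "role": "assistant", "content": "Sure, can you share your order number?" }
]
```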