How to Evaluate Workflows via API

Automate workflow evaluation via CI/CD

Trigger test runs in CI/CD pipelines to evaluate workflows automatically.

This guide builds upon Evaluate Prompts -> Automate prompt evaluation via CI/CD. Please refer to it first if you haven't already.

Pre-requisites

Apart from the pre-requisites mentioned in Evaluate Prompts -> Automate prompt evaluation via CI/CD, you also need a workflow to test upon.

In full, you need:

  1. An API key from Maxim
  2. A dataset to test against
  3. Evaluators to evaluate the workflow against the dataset
  4. A workflow to test upon

Test runs via CLI

Apart from what was introduced earlier, you can use the -w flag in place of -p to specify the workflow to test upon.

Installation

Use the following command template to install the CLI tool (if you are using Windows, please refer to the Windows example as well):

wget https://downloads.getmaxim.ai/cli/<VERSION>/<OS>/<ARCH>/maxim
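
For example, assuming the OS and architecture path segments for a 64-bit Linux machine are linux and amd64 (check the downloads page for the exact values for your platform), installation might look like the sketch below; the chmod and mv steps are simply the usual way of putting a downloaded binary on your PATH.

Example installation on 64-bit Linux
# Hypothetical platform values; substitute <VERSION> with the CLI version you want
wget https://downloads.getmaxim.ai/cli/<VERSION>/linux/amd64/maxim

# Make the binary executable and (optionally) move it onto your PATH
chmod +x maxim
sudo mv maxim /usr/local/bin/maxim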

For more, please refer to Evaluate Prompts -> Automate prompt evaluation via CI/CD -> Test runs via CLI.

Triggering a test run

Use this template to trigger a test run:

Shell command to trigger a test run
# If you haven't added the binary to your PATH,
# replace `maxim` with the path to the binary you just downloaded
maxim test -w <workflow_id> -d <dataset_id> -e <comma_separated_evaluator_names>

Here are the arguments and flags you can pass to the CLI to configure your test run (a complete example invocation follows the table):

| Argument / Flag | Description |
| --- | --- |
| -w | Workflow ID or IDs; if you pass multiple IDs (comma separated), a comparison run is created. |
| -d | Dataset ID |
| -e | Comma separated evaluator names, e.g. bias,clarity |
| --json | (optional) Output the result in JSON format |
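
Putting it together, a complete invocation might look like the sketch below. The IDs are placeholders for illustration only; passing two comma-separated workflow IDs to -w creates a comparison run, and --json prints the result as JSON so it can be parsed by downstream tooling.

Example test run invocation
# Hypothetical IDs shown for illustration only
maxim test -w wf_123,wf_456 -d ds_789 -e bias,clarity --json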

Test runs via GitHub Action

Apart from what was introduced earlier, you can use the workflow_id parameter (under with) in place of prompt_version_id to specify the workflow to test upon.

Quick Start

To add the GitHub Action to your workflow, start by adding a step that uses maximhq/actions/test-runs@v1 as follows:

Please ensure that you have the following set up:

  • In GitHub Action secrets:
    • MAXIM_API_KEY
  • In GitHub Action variables:
    • WORKSPACE_ID
    • DATASET_ID
    • WORKFLOW_ID
.github/workflows/test-runs.yml
name: Run Test Runs with Maxim
 
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
 
env:
  TEST_RUN_NAME: "Test Run via GitHub Action"
  CONTEXT_TO_EVALUATE: "context"
  EVALUATORS: "bias, clarity, faithfulness"
 
jobs:
  test_run:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout Repository
        uses: actions/checkout@v2
      - name: Running Test Run
        id: test_run
        uses: maximhq/actions/test-runs@v1
        with:
          api_key: ${{ secrets.MAXIM_API_KEY }}
          workspace_id: ${{ vars.WORKSPACE_ID }}
          test_run_name: ${{ env.TEST_RUN_NAME }}
          dataset_id: ${{ vars.DATASET_ID }}
          workflow_id: ${{ vars.WORKFLOW_ID }}
          context_to_evaluate: ${{ env.CONTEXT_TO_EVALUATE }}
          evaluators: ${{ env.EVALUATORS }}
      - name: Display Test Run Results
        if: success()
        run: |
          printf '%s\n' '${{ steps.test_run.outputs.test_run_result }}'
          printf '%s\n' '${{ steps.test_run.outputs.test_run_failed_indices }}'
          echo 'Test Run Report URL: ${{ steps.test_run.outputs.test_run_report_url }}'

This will trigger a test run on the platform and wait for it to complete before proceeding. The progress of the test run appears under the Running Test Run step in the GitHub Action's logs, as shown below:

[Screenshot: GitHub Action logs for the Running Test Run step]

Inputs

The following are the inputs that can be used to configure the GitHub Action:

| Name | Description | Required |
| --- | --- | --- |
| api_key | Maxim API key | Yes |
| workspace_id | Workspace ID to run the test run in | Yes |
| test_run_name | Name of the test run | Yes |
| dataset_id | Dataset ID for the test run | Yes |
| workflow_id | Workflow ID to run for the test run (do not use with prompt_version_id) | Yes (No if prompt_version_id is provided) |
| prompt_version_id | Prompt version ID to run for the test run (discussed in Evaluate Prompts -> Automate prompt evaluation via CI/CD; do not use with workflow_id) | Yes (No if workflow_id is provided) |
| context_to_evaluate | Variable name to evaluate; can be any variable used in the workflow / prompt or a column name | No |
| evaluators | Comma separated list of evaluator names | No |
| human_evaluation_emails | Comma separated list of emails to send human evaluations to | No (required if evaluators includes a human evaluator) |
| human_evaluation_instructions | Overall instructions for human evaluators | No |
| concurrency | Maximum number of concurrent test run entries | No (defaults to 10) |
| timeout_in_minutes | Fail if the overall test run takes longer than this many minutes | No (defaults to 15 minutes) |

Outputs

If the GitHub Action succeeds, it provides the following outputs (a sketch of consuming them in a follow-up step appears after the table):

| Name | Description |
| --- | --- |
| test_run_result | Result of the test run |
| test_run_report_url | URL of the test run report |
| test_run_failed_indices | Indices of failed test run entries |
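
As a sketch, a follow-up step in the same job could surface the report URL and fail the build when any entries failed. The shell below is meant for a run: block and assumes REPORT_URL and FAILED are mapped from the step outputs via env (for example FAILED: ${{ steps.test_run.outputs.test_run_failed_indices }}); it also assumes an empty value or empty list means no failures, so check the action's actual output format before relying on it.

Example: failing the job on failed test run entries
# REPORT_URL and FAILED are assumed to be set via the step's env: mapping
echo "Test Run Report URL: $REPORT_URL"
if [ -n "$FAILED" ] && [ "$FAILED" != "[]" ]; then
  echo "Some test run entries failed: $FAILED"
  exit 1
fi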

Evaluating Prompts

Please refer to Evaluate Prompts -> Automate prompt evaluation via CI/CD
