Learn how to integrate test runs with Maxim's GitHub Action.

Please ensure that you have the following setup:
- In GitHub Action secrets:
  - `MAXIM_API_KEY`
- In GitHub Action variables:
  - `WORKSPACE_ID`
  - `DATASET_ID`
  - `WORKFLOW_ID`
GitHub Actions enable you to automate your CI/CD pipeline. They provide a powerful way to run tests, build, and deploy your application. Our GitHub Action integrates seamlessly with your existing deployment workflows, allowing you to ensure that your LLM is functioning as you expect.
To add the GitHub Action to your workflow, start by adding a step that uses `maximhq/actions/test-runs@v1`, as follows:
```yaml
name: Run Test Runs with Maxim

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

env:
  TEST_RUN_NAME: "Test Run via GitHub Action"
  CONTEXT_TO_EVALUATE: "context"
  EVALUATORS: "bias, clarity, faithfulness"

jobs:
  test_run:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout Repository
        uses: actions/checkout@v2
      - name: Running Test Run
        id: test_run
        uses: maximhq/actions/test-runs@v1
        with:
          api_key: ${{ secrets.MAXIM_API_KEY }}
          workspace_id: ${{ vars.WORKSPACE_ID }}
          test_run_name: ${{ env.TEST_RUN_NAME }}
          dataset_id: ${{ vars.DATASET_ID }}
          workflow_id: ${{ vars.WORKFLOW_ID }}
          context_to_evaluate: ${{ env.CONTEXT_TO_EVALUATE }}
          evaluators: ${{ env.EVALUATORS }}
      - name: Display Test Run Results
        if: success()
        run: |
          printf '%s\n' '${{ steps.test_run.outputs.test_run_result }}'
          printf '%s\n' '${{ steps.test_run.outputs.test_run_failed_indices }}'
          echo 'Test Run Report URL: ${{ steps.test_run.outputs.test_run_report_url }}'
```
This will trigger a test run on the platform and wait for it to complete before proceeding. The progress of the test run is shown in the Running Test Run section of the GitHub Action's logs.
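Because the action waits for the test run to finish, you can use it as a quality gate for a downstream deployment job. Below is a minimal sketch using a hypothetical `deploy` job and deployment script of your own; only the `needs:` wiring is standard GitHub Actions behaviour, everything else is a placeholder:

```yaml
jobs:
  # ... test_run job as defined above ...
  deploy:
    # Run only after the Maxim test run job has succeeded
    needs: test_run
    runs-on: ubuntu-latest
    steps:
      - name: Checkout Repository
        uses: actions/checkout@v2
      - name: Deploy application
        # Placeholder for your own deployment command
        run: ./scripts/deploy.sh
```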
The following inputs can be used to configure the GitHub Action:
| Name | Description | Required |
|---|---|---|
| `api_key` | Maxim API key | Yes |
| `workspace_id` | Workspace ID to run the test run in | Yes |
| `test_run_name` | Name of the test run | Yes |
| `dataset_id` | Dataset ID for the test run | Yes |
| `workflow_id` | Workflow ID to run for the test run (do not use with `prompt_version_id`) | Yes (No if `prompt_version_id` is provided) |
| `prompt_version_id` | Prompt version ID to run for the test run (do not use with `workflow_id`) | Yes (No if `workflow_id` is provided) |
| `context_to_evaluate` | Variable name to evaluate; can be any variable used in the workflow / prompt or a column name | No |
| `evaluators` | Comma-separated list of evaluator names | No |
| `human_evaluation_emails` | Comma-separated list of emails to send human evaluations to | No (required if `evaluators` includes a human evaluator) |
| `human_evaluation_instructions` | Overall instructions for human evaluators | No |
| `concurrency` | Maximum number of concurrent test run entries running | No (defaults to 10) |
| `timeout_in_minutes` | Fail if the test run overall takes longer than this many minutes | No (defaults to 15 minutes) |
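For example, to evaluate a specific prompt version instead of a workflow, pass `prompt_version_id` and omit `workflow_id`. Below is a sketch of such a step; the `PROMPT_VERSION_ID` variable, the evaluator names, and the email address are illustrative assumptions, so replace them with values from your own workspace:

```yaml
- name: Running Test Run
  id: test_run
  uses: maximhq/actions/test-runs@v1
  with:
    api_key: ${{ secrets.MAXIM_API_KEY }}
    workspace_id: ${{ vars.WORKSPACE_ID }}
    test_run_name: ${{ env.TEST_RUN_NAME }}
    dataset_id: ${{ vars.DATASET_ID }}
    # Evaluate a prompt version instead of a workflow (mutually exclusive with workflow_id)
    prompt_version_id: ${{ vars.PROMPT_VERSION_ID }}
    evaluators: "bias, clarity, human-review"
    # Required because a human evaluator is included in the evaluators list
    human_evaluation_emails: "reviewer@example.com"
    human_evaluation_instructions: "Check responses for tone and factual accuracy"
    concurrency: 5
    timeout_in_minutes: 30
```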
If the action does not fail, it provides the following outputs:
| Name | Description |
|---|---|
| `test_run_result` | Result of the test run |
| `test_run_report_url` | URL of the test run report |
| `test_run_failed_indices` | Indices of failed test run entries |
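These outputs can be consumed by later steps in the same job. Below is a sketch of a step that fails the build when any entries failed; it assumes `test_run_failed_indices` is empty when nothing failed, which you should verify against your own runs:

```yaml
- name: Fail if any test run entries failed
  # Assumes test_run_failed_indices is empty when no entries failed;
  # check the exact output format against your own runs
  if: steps.test_run.outputs.test_run_failed_indices != ''
  run: |
    echo "Failed entry indices: ${{ steps.test_run.outputs.test_run_failed_indices }}"
    echo "Report: ${{ steps.test_run.outputs.test_run_report_url }}"
    exit 1
```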