Quickstart

Evaluating your prompt for tool call accuracy

Ensuring your prompt generates accurate tool calls is crucial for building reliable and efficient AI workflows. Maxim's Tool Schema feature streamlines this process by assessing the performance of your prompts against defined schemas. This guide explains how to use the schema input to evaluate tool call accuracy effectively.

Understanding tool call accuracy

Tool call accuracy measures how well the tool calls generated by your prompt match the expected tool calls defined in your dataset and conform to the defined tool schema. High accuracy ensures seamless integration and reliable outcomes, minimizing errors and manual corrections.
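
To make the comparison concrete, the snippet below is a minimal sketch of what an exact-match check between an expected tool call and a generated one could look like. The expected and generated values and the exact-match logic are illustrative assumptions; Maxim's Tool Call Accuracy evaluator performs this comparison for you during a test run.

import json

# Illustrative only: a naive exact-match comparison between the expected
# tool call (as stored in a dataset) and the tool call a model generated.
expected = {
    "name": "get_delivery_date",
    "arguments": {"order_id": "1243"},
}

# Models typically return tool-call arguments as a JSON string.
generated_name = "get_delivery_date"
generated_arguments = json.loads('{"order_id": "1243"}')

def tool_call_matches(expected, name, arguments):
    """Return True when both the tool name and its arguments match."""
    return expected["name"] == name and expected["arguments"] == arguments

print(tool_call_matches(expected, generated_name, generated_arguments))  # True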

Steps to evaluate tool call accuracy

  1. Create a Prompt tool with schema

    • Navigate to the Prompt tools section and create a new prompt tool.
    • Select the Schema prompt tool type.
    • Define your schema. For example, a schema to fetch a delivery date might look like:
    Delivery date schema
    {
        "type": "function",
        "function": {
            "name": "get_delivery_date",
            "parameters": {
                "type": "object",
                "required": ["order_id"],
                "properties": {
                    "order_id": {
                        "type": "string",
                        "description": "The customer's order ID."
                    }
                },
                "additionalProperties": false
            },
            "description": "Get the delivery date for a customer's order. Call this whenever you need to know the delivery date, for example when a customer asks 'Where is my package'"
        }
    }
    • Save the prompt tool using the Save button at the top-right corner of the editor.

    Refer to this resource for more information on prompts and how to set them up.
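
    Outside of the Maxim UI, you can sanity-check the schema by passing it to a tool-calling model yourself. The sketch below uses the OpenAI Python client as an assumed example; the client setup, model name, and user message are illustrative and not part of the Maxim workflow.

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    # The same get_delivery_date schema defined above, passed as a tool.
    tools = [{
        "type": "function",
        "function": {
            "name": "get_delivery_date",
            "description": "Get the delivery date for a customer's order.",
            "parameters": {
                "type": "object",
                "required": ["order_id"],
                "properties": {
                    "order_id": {
                        "type": "string",
                        "description": "The customer's order ID.",
                    }
                },
                "additionalProperties": False,
            },
        },
    }]

    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any tool-calling-capable model works
        messages=[{"role": "user", "content": "Where is my package? My order_id is 1243."}],
        tools=tools,
    )

    # The model should respond with a tool call to get_delivery_date.
    print(response.choices[0].message.tool_calls)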

  2. Test with a saved or new prompt

    • Use a saved prompt or create a new one.
    • Add multiple prompt tools in the prompt configuration to test whether the correct tool is picked based on the input (see the sketch below).

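    To illustrate why tool selection matters, the sketch below adds a second, hypothetical cancel_order tool alongside get_delivery_date; with both configured, the evaluation should confirm the model picks the right one for each input. The cancel_order schema and the example inputs are assumptions, not part of the guide above.

    # Hypothetical second tool configured alongside get_delivery_date.
    cancel_order_tool = {
        "type": "function",
        "function": {
            "name": "cancel_order",
            "description": "Cancel a customer's order.",
            "parameters": {
                "type": "object",
                "required": ["order_id"],
                "properties": {
                    "order_id": {"type": "string", "description": "The customer's order ID."}
                },
                "additionalProperties": False,
            },
        },
    }

    # Example inputs mapped to the tool the model is expected to select.
    expected_tool_for_input = {
        "Give me the delivery date for the order with order_id = 1243": "get_delivery_date",
        "Please cancel my order with order_id = 1243": "cancel_order",
    }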

  3. Prepare the Dataset

    • Create a dataset with at least two columns:

      • Input: Contains the questions or queries for the prompt, e.g., "Give me the delivery date for the order with order_id = 1243".
      • Tool Call: Specifies the expected tool call output:
      Dataset
      [
          {
              "function": {
                  "arguments": {
                      "order_id": "order_12345"
                  },
                  "name": "get_delivery_date"
              },
              "type": "function"
          }
      ]

    The final dataset would resemble this:

    Final dataset

    To learn more about datasets, refer to this resource.
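
    If you prefer to prepare the dataset programmatically, the sketch below writes the two columns described above to a CSV file for upload. The CSV layout, the file name, and the assumption that your workspace accepts a CSV with these column names are illustrative; adjust them to match your import format.

    import csv
    import json

    # One row per test case: the Input column holds the user query and the
    # Tool Call column holds the expected tool call as a JSON string.
    rows = [
        {
            "Input": "Give me the delivery date for the order with order_id = 1243",
            "Tool Call": json.dumps([
                {
                    "type": "function",
                    "function": {
                        "name": "get_delivery_date",
                        "arguments": {"order_id": "1243"},
                    },
                }
            ]),
        },
    ]

    with open("tool_call_dataset.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["Input", "Tool Call"])
        writer.writeheader()
        writer.writerows(rows)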

  4. Configure the Test run

    • In the test configuration, add the dataset.
    • Select the Tool Call Accuracy Evaluator under Statistical Evaluators.
    • If the evaluator is not available in your workspace, add it from the Evaluator Store.

    Test run configuration

  5. Run the test and share results

    • Trigger the test and navigate to the Runs section to view performance metrics.
    • Review the detailed reports to analyze tool call accuracy.
    • Share the report with your team to collaborate and refine the workflow further.

    Share results
