Function Call Coverage
Checks whether an agent made all required function calls during execution, producing an all_required_calls_made score of 1.0 or 0.0. Use this when evaluating tool-using agents where specific function calls must appear in the trace. For validation of function call arguments rather than presence, use a Python Scorer instead.
This scorer is an experimental feature and the API is subject to change.
Output
- all_required_calls_made: 1.0 if all required function calls were made (in the correct order when using in_order mode), 0.0 otherwise.
- required_calls_coverage: Fraction of required function calls that were made at least once.
- num_required_calls_made: Number of required function calls that were made at least once.
- num_required_calls_not_made: Number of required function calls that were never made.
- num_unrequired_calls: Number of function calls made to functions not in the required list. This includes both calls to functions that are not required at all, and excessive calls to functions that are required.
- num_required_calls_total: Total number of required function calls.
Modes
Two modes are available via the mode field:
- any_order: All required calls must appear in the trace, in any order.
- in_order: All required calls must appear in the trace as a subsequence in the specified order.
Given function_calls: ["search", "calculator"]:
| Trace | Mode | all_required_calls_made | num_required_calls_made | num_unrequired_calls |
|---|---|---|---|---|
| ["search", "calculator"] | any_order | 1.0 | 2 | 0 |
| ["calculator", "search"] | any_order | 1.0 | 2 | 0 |
| ["calculator", "search"] | in_order | 0.0 | 2 | 0 |
| ["search", "lookup"] | any_order | 0.0 | 1 | 1 |
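The output fields and the table rows above can be reproduced with a short Python sketch. This is an illustrative re-implementation based only on the descriptions on this page, not the scorer's actual code. The function name function_call_coverage_sketch is hypothetical, the sketch assumes the required list contains unique names, and the handling of an empty required list is a guess.

```python
def function_call_coverage_sketch(required, trace, mode="any_order"):
    """Illustrative only: mirrors the documented outputs, not the real scorer."""
    # A required call counts as "made" if it appears anywhere in the trace.
    made = [name for name in required if name in trace]
    not_made = [name for name in required if name not in trace]

    # Assumption: each required name accounts for one call in the trace;
    # every other call (unknown function or repeat call) is unrequired.
    num_unrequired = len(trace) - len(made)

    if mode == "in_order":
        # Subsequence test: consume the trace left to right, so other calls
        # may sit in between but the relative order must hold.
        remaining = iter(trace)
        passed = all(name in remaining for name in required)
    else:
        passed = not not_made

    return {
        "all_required_calls_made": 1.0 if passed else 0.0,
        # Assumption: an empty required list is treated as fully covered.
        "required_calls_coverage": len(made) / len(required) if required else 1.0,
        "num_required_calls_made": len(made),
        "num_required_calls_not_made": len(not_made),
        "num_unrequired_calls": num_unrequired,
        "num_required_calls_total": len(required),
    }


result = function_call_coverage_sketch(
    ["search", "calculator"], ["calculator", "search"], mode="in_order"
)
print(result["all_required_calls_made"], result["num_required_calls_made"])
```

Note that in in_order mode the presence counts are unchanged; only all_required_calls_made is affected by ordering, which matches the third row of the table.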
Examples
Example: Any Order
Required calls are read from the dataset; the scorer passes if all appear in the trace in any order.
# tasks/task.yaml
...
scorers:
  - type: "function_call_coverage"
    function_calls: "{{ sample.function_calls }}"
    mode: "any_order"
Example: In Order
Required calls are defined statically; the scorer passes only if they appear as a subsequence of the trace in the specified order.
# tasks/task.yaml
...
scorers:
  - type: "function_call_coverage"
    function_calls: '["search", "calculator"]'
    mode: "in_order"
Configuration
Properties
type Literal "function_call_coverage" required
The type of the scorer.
function_calls string, TemplateValue required
Jinja template that produces the list of required function call names.
The required function calls can be:
- A hard-coded list (e.g. ["search", "calculator"])
- A reference to a sample field (e.g. "{{ sample.function_calls }}")
- Derived from sample data (e.g. "{{ sample.tools | map(attribute='name') | list }}")
sample represents the current row of the dataset (with a field for every dataset column).
The template should produce a JSON list of function call name strings.
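To check that a rendered template value has the expected shape before running an evaluation, something like the following sketch may help. It uses only the Python standard library. The helper name parse_required_calls is hypothetical, and how the scorer itself parses the rendered value is not documented here, so treat the validation rules as assumptions drawn from the sentence above.

```python
import json


def parse_required_calls(rendered):
    """Hypothetical helper: parse rendered template output into function names."""
    # json.loads raises json.JSONDecodeError (a ValueError subclass) on bad input,
    # for example a Python-style list that uses single quotes.
    value = json.loads(rendered)
    if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
        raise ValueError("expected a JSON list of function call name strings")
    return value


print(parse_required_calls('["search", "calculator"]'))
```

A value such as ['search', 'calculator'] (single quotes) is not valid JSON and would be rejected by this check.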
mode string
any_order: checks only that every required function call was made at least once, in any order.
in_order: additionally checks that the required function calls appear as a subsequence of the trace — i.e. in the specified order, with other function calls allowed in between.
Default: any_order
key string
Unique identifier assigned to the entity in AI GO!.
Default: None
purpose ScorerPurpose
The purpose of this scorer.
- score: The scorer is used to score the solver output or the dataset sample.
- qa: The scorer is used to do QA over the solver output or the dataset sample.
Default: score
display_name string
The display name of the scorer.
Default: None
metrics array[PythonMetricTemplate, BinaryClassificationMetricTemplate, MulticlassClassificationMetricTemplate, MeanMetricTemplate, MaxMetricTemplate, MinMetricTemplate, StdDevMetricTemplate, FrequencyMetricTemplate, RecallMetricTemplate, PrecisionMetricTemplate, F1ScoreMetricTemplate]
The metrics associated with this scorer, which will produce per-task metrics.
Default: None
