Mean
Computes the arithmetic mean of the specified score field across all samples. Use this as the default aggregation for any scorer that produces a numeric field: for example, averaging is_correct gives an accuracy rate, and averaging a judge's score gives an overall quality score. To report the spread around the mean, pair it with Standard Deviation.
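As a rough illustration of the calculation (not the AI GO! implementation), the sketch below averages one field over a list of per-sample score dictionaries. The mean_of_field helper and the sample data are hypothetical and exist only for this example. Booleans are counted as 1 and 0, which is why averaging is_correct yields an accuracy rate.

```python
from statistics import fmean


def mean_of_field(sample_scores, field):
    """Average one scorer field across samples (hypothetical helper)."""
    # float() maps True/False to 1.0/0.0, so boolean fields become rates.
    values = [float(scores[field]) for scores in sample_scores]
    if not values:
        raise ValueError("no samples to average")
    return fmean(values)


# Illustrative per-sample scorer outputs, not real evaluation data.
sample_scores = [
    {"is_correct": True, "score": 80},
    {"is_correct": False, "score": 40},
    {"is_correct": True, "score": 90},
    {"is_correct": True, "score": 70},
]

accuracy = mean_of_field(sample_scores, "is_correct")  # 3 of 4 samples correct
quality = mean_of_field(sample_scores, "score")        # (80 + 40 + 90 + 70) / 4
```

Here accuracy is a rate between 0 and 1, while quality stays on the scorer's own 0 to 100 scale: the mean inherits the range of the field it aggregates.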
Output
A single metric named {field}_mean by default, or the value of name if provided.
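The naming rule can be sketched as follows. resolve_metric_name is a hypothetical helper used only to illustrate the default; it is not part of AI GO!.

```python
def resolve_metric_name(field, name=None):
    """Return the explicit name if one was provided, else '{field}_mean'."""
    return name if name is not None else f"{field}_mean"


default_name = resolve_metric_name("score")               # no name given
custom_name = resolve_metric_name("score", "Mean Score")  # explicit name wins
```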
Examples
Example: RAG QA. Averages the judge's score field across all samples to produce a single quality metric for the run.
...
definition:
  ...
  scorers:
    - type: "model_as_a_judge_scorer"
      model_key: "<< config.judge_model >>"
      system_prompt: >
        You are an evaluator assessing the << config.evaluation_dimension >> of a
        model response to a question.
        On a scale from 0 to 100, assign 100 if the response is fully
        << config.evaluation_dimension >> and 0 if it is not at all.
        Return only the numeric score.
      user_prompt: >
        <context>{{ sample.context }}</context>
        <response>{{ model_output.choices[0].message.content }}</response>
      score_min: 0
      score_max: 100
      metrics:
        - type: "mean"
          field: "score"
          name: "Mean Score"
        - type: "min"
          field: "score"
          name: "Min Score"
        - type: "max"
          field: "score"
          name: "Max Score"
Example: Completeness. Averages the boolean is_complete field to produce a completeness rate across all dataset samples.
...
definition:
  ...
  scorers:
    - type: "python"
      compute_scores_snippet: !include "completeness_scorer.py"
      metrics:
        - type: "mean"
          field: "is_complete"
          name: "Field Completeness"
Configuration
Properties
type Literal "mean" required
The type of the metric.
field string, TemplateValue required
The field over which to compute the mean.
name string, TemplateValue
The name given to the metric value. If not specified, it defaults to {field}_mean.
Default: None
key string
Unique identifier assigned to the entity in AI GO!.
Default: None
