Standard Deviation
Computes the standard deviation of the scores in the specified field across all samples. Use this to measure how consistent a scorer's results are: a high standard deviation indicates that the model performs unevenly across samples. Pair with Mean to understand both the central tendency and the spread of scores.
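As a rough illustration of what this metric summarizes, the sketch below computes a mean and a standard deviation over a hypothetical list of per-sample scores using Python's standard library. This page does not state whether the metric uses the population or the sample formula, so both are shown. The score values are made up for illustration.

```python
import statistics

# Hypothetical per-sample scores from a scorer (illustrative values only).
scores = [7.0, 8.0, 6.0, 9.0, 5.0]

# Central tendency of the scores.
mean_score = statistics.mean(scores)

# Population standard deviation (divides by n).
population_std = statistics.pstdev(scores)

# Sample standard deviation (divides by n - 1).
sample_std = statistics.stdev(scores)

print(f"mean={mean_score:.3f}")
print(f"population std dev={population_std:.3f}")
print(f"sample std dev={sample_std:.3f}")
```

The two formulas diverge most on small sample counts, so it may be worth checking which one applies before comparing values across runs of different sizes.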
Output
A single metric named {field}_std_dev by default, or the value of name if provided.
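The naming rule above can be sketched as a small function. `resolve_metric_name` is a hypothetical helper written for illustration, not part of the product's API. It only mirrors the documented fallback from `name` to `{field}_std_dev`.

```python
from typing import Optional


def resolve_metric_name(field: str, name: Optional[str] = None) -> str:
    """Hypothetical helper mirroring the documented naming rule."""
    # Use the explicit name when one is given; otherwise fall back
    # to the default pattern "{field}_std_dev".
    return name if name is not None else f"{field}_std_dev"


print(resolve_metric_name("score"))
print(resolve_metric_name("score", "Score Std Dev"))
```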
Examples
Example: Score Consistency. Measures how consistently a model-as-a-judge scorer rates responses by computing the standard deviation of the score field alongside the mean.
scorers:
  - type: model_as_a_judge_scorer
    model_key: "<< config.judge_model >>"
    system_prompt: "Rate the response quality from 0 to 10."
    user_prompt: "Response: {{ model_output.choices[0].message.content }}"
    score_min: 0
    score_max: 10
    metrics:
      - type: mean
        field: score
        name: Mean Score
      - type: std_dev
        field: score
        name: Score Std Dev

Configuration
Properties
type Literal "std_dev" required
The type of the metric.
field string, TemplateValue required
The field over which to compute the standard deviation.
name string, TemplateValue
The name given to the metric value. If not specified, the name defaults to {field}_std_dev.
Default: None
key string
Unique identifier assigned to the entity in AI GO!.
Default: None
