Standard Deviation

Computes the standard deviation of the given score field across all samples. Use it to measure how consistent the scorer's results are: a high standard deviation indicates that the model performs unevenly across samples. Pair it with Mean to see both the central tendency and the spread of the scores.
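As a rough illustration of what this metric reports, the sketch below computes the mean and standard deviation of one score field over a set of per-sample results. It assumes each sample's scorer output is a dict keyed by field name, and it uses the population standard deviation (`statistics.pstdev`, dividing by n). The tool may instead use the sample form (`statistics.stdev`, dividing by n - 1), so treat the exact values as illustrative. The helper name `summarize_field` is hypothetical.

```python
import statistics


def summarize_field(samples: list[dict[str, float]], field: str) -> tuple[float, float]:
    """Return (mean, standard deviation) of `field` across all samples.

    Hypothetical helper: assumes every sample dict contains `field` and uses
    the population standard deviation (divide by n), which is an assumption.
    """
    values = [sample[field] for sample in samples]
    return statistics.fmean(values), statistics.pstdev(values)


# Two sets of judge scores with the same mean but very different spread.
consistent = [{"score": 7}, {"score": 7}, {"score": 8}, {"score": 6}]
uneven = [{"score": 10}, {"score": 4}, {"score": 10}, {"score": 4}]

print(summarize_field(consistent, "score"))  # mean 7.0, std dev about 0.71
print(summarize_field(uneven, "score"))      # mean 7.0, std dev 3.0
```

Both sets share a mean of 7.0, so the mean alone cannot tell them apart; the standard deviation is what exposes the uneven scoring.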

Output

A single metric named {field}_std_dev by default, or the value of name if provided.
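The naming rule can be sketched as a small function. This is a hypothetical helper that mirrors the documented behavior (use `name` when it is set, otherwise fall back to `{field}_std_dev`); it is not part of AI GO!'s API, and how the tool treats an empty `name` is an assumption left out here.

```python
from typing import Optional


def std_dev_metric_name(field: str, name: Optional[str] = None) -> str:
    """Hypothetical helper: resolve the output name of a std_dev metric."""
    # An explicit `name` takes precedence over the default pattern.
    if name is not None:
        return name
    # Documented default: the field name followed by "_std_dev".
    return f"{field}_std_dev"


print(std_dev_metric_name("score"))                   # score_std_dev
print(std_dev_metric_name("score", "Score Std Dev"))  # Score Std Dev
```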

Examples

Example: Score Consistency. Measures how consistently a model-as-a-judge scorer rates responses by computing the standard deviation of the score field alongside the mean.

scorers:
  - type: model_as_a_judge_scorer
    model_key: "<< config.judge_model >>"
    system_prompt: "Rate the response quality from 0 to 10."
    user_prompt: "Response: {{ model_output.choices[0].message.content }}"
    score_min: 0
    score_max: 10
    metrics:
      - type: mean
        field: score
        name: Mean Score
      - type: std_dev
        field: score
        name: Score Std Dev

Configuration

Properties

type Literal "std_dev" required

The type of the metric.

field string, TemplateValue required

The field over which to compute the standard deviation.

name string, TemplateValue

The name given to the metric value. If not specified, it defaults to {field}_std_dev.

Default: None

key string

Unique identifier assigned to the entity in AI GO!.

Default: None