Binary Classification

Computes a set of binary classification metrics by comparing predicted and ground-truth labels across all samples. Use this when your scorer outputs both a prediction and a ground-truth label per sample and you want a comprehensive view of classification quality, including accuracy, precision, recall, F1, and a confusion matrix. For multi-class problems, use Multiclass Classification instead.

Output

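Multiple metrics derived from the confusion matrix, where TP, FP, TN, and FN are the counts of true positives, false positives, true negatives, and false negatives:

accuracy: Fraction of samples correctly classified, (TP + TN) / (TP + FP + TN + FN).
precision: TP / (TP + FP) for the positive class.
recall: TP / (TP + FN) for the positive class.
f1_score: Harmonic mean of precision and recall, 2 * precision * recall / (precision + recall).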
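
As a rough illustration of how these metrics follow from confusion-matrix counts, the Python sketch below computes them by hand. The function name binary_metrics is hypothetical and not part of the product, and returning 0.0 when a denominator is zero is an assumption; the actual implementation may handle that case differently.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Derive accuracy, precision, recall, and F1 from confusion-matrix counts."""

    def safe_div(numerator: float, denominator: float) -> float:
        # Assumption: fall back to 0.0 when the denominator is zero.
        return numerator / denominator if denominator else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        # Harmonic mean of precision and recall.
        "f1_score": safe_div(2 * precision * recall, precision + recall),
    }


# 20 samples: 6 true positives, 2 false positives, 10 true negatives, 2 false negatives.
metrics = binary_metrics(tp=6, fp=2, tn=10, fn=2)
print(metrics)
```

With these counts, 16 of 20 samples are classified correctly (accuracy 0.8), and precision, recall, and F1 are each 6 / 8 = 0.75.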

Label Matching

Ground-truth and predicted values are compared as strings against positive_answer and negative_answer. Samples whose values do not match either answer are excluded from the report.
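
The matching rule can be sketched in Python as follows. This is a minimal illustration, not the product's implementation: the function name confusion_counts is hypothetical, the comparison is assumed to be an exact, case-sensitive string match, and a sample is assumed to be excluded when either its ground-truth or its predicted value fails to match one of the two answers.

```python
def confusion_counts(samples, field_gt, field_pred, positive_answer, negative_answer):
    """Tally confusion-matrix counts, skipping samples with unrecognized labels."""
    positive = str(positive_answer)
    negative = str(negative_answer)
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0, "excluded": 0}

    for sample in samples:
        # Values are converted to strings before comparison.
        gt = str(sample.get(field_gt))
        pred = str(sample.get(field_pred))

        # Assumption: both values must match one of the two configured answers.
        if gt not in (positive, negative) or pred not in (positive, negative):
            counts["excluded"] += 1
        elif gt == positive and pred == positive:
            counts["tp"] += 1
        elif gt == negative and pred == positive:
            counts["fp"] += 1
        elif gt == positive and pred == negative:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


samples = [
    {"ground_truth_label": "yes", "predicted_label": "yes"},
    {"ground_truth_label": "no", "predicted_label": "yes"},
    {"ground_truth_label": "yes", "predicted_label": "no"},
    {"ground_truth_label": "no", "predicted_label": "no"},
    {"ground_truth_label": "yes", "predicted_label": "maybe"},  # matches neither answer
]
counts = confusion_counts(samples, "ground_truth_label", "predicted_label", "yes", "no")
print(counts)
```

In this sketch the last sample is counted as excluded because "maybe" matches neither "yes" nor "no", so only four samples contribute to the metrics.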

Examples

Example: Yes/No Classification. A model-as-a-judge classifier labels each response as "yes" or "no". The binary classification metric compares the judge's predictions against the ground-truth labels stored in the dataset.

scorers:
  - type: model_as_a_judge_classifier
    model_key: "<< config.judge_model >>"
    system_prompt: "Does the response correctly answer the question? Reply yes or no."
    user_prompt: "Response: {{ model_output.choices[0].message.content }}"
    correct_labels:
      - "yes"
    incorrect_labels:
      - "no"
    metrics:
      - type: binary-classification
        field_gt: ground_truth_label
        field_pred: predicted_label
        positive_answer: "yes"
        negative_answer: "no"

Configuration

Properties

type Literal "binary-classification" required

The type of the metric.

field_gt string, TemplateValue required

The field in the scores containing the ground-truth answer.

field_pred string, TemplateValue required

The field in the scores containing the predicted answer.

positive_answer string, TemplateValue required

The answer treated as the positive class. The answer field is converted to a string before comparison.

negative_answer string, TemplateValue required

The answer treated as the negative class. The answer field is converted to a string before comparison.

key string

Unique identifier assigned to the entity in AI GO!.

Default: None