Binary Classification
Computes a set of binary classification metrics by comparing predicted and ground-truth labels across all samples. Use this when your scorer outputs both a prediction and a ground-truth label per sample and you want a comprehensive view of classification quality, including accuracy, precision, recall, F1, and a confusion matrix. For multi-class problems, use Multiclass Classification instead.
Output
Multiple metrics derived from the confusion matrix:
accuracy: Fraction of samples correctly classified.
precision: TP / (TP + FP) for the positive class.
recall: TP / (TP + FN) for the positive class.
f1_score: Harmonic mean of precision and recall.
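The formulas above can be sketched in a few lines of Python. This is an illustration only, not the metric's actual implementation: the function name `binary_metrics` is hypothetical, and returning 0.0 when a denominator is zero is an assumption, since the handling of empty denominators is not specified here.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Derive the four reported metrics from confusion-matrix counts.

    Hypothetical helper for illustration; zero denominators yield 0.0
    (an assumption, not documented behavior).
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1_score = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1_score": f1_score,
    }


# 6 true positives, 2 false positives, 10 true negatives, 2 false negatives
metrics = binary_metrics(tp=6, fp=2, tn=10, fn=2)
print(metrics)
```

With these counts, 16 of 20 samples are correct, so accuracy is 0.8, while precision, recall, and F1 are each 0.75.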
Label Matching
Ground-truth and predicted values are compared as strings against
positive_answer and negative_answer. Samples whose values do not match
either answer are excluded from the report.
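A minimal sketch of this matching rule, under stated assumptions: the helper name `count_outcomes` is hypothetical, the comparison is taken to be an exact, case-sensitive string match, and a sample is excluded if either its ground-truth or its predicted value matches neither answer. The actual implementation may differ, for example in whitespace or case handling.

```python
def count_outcomes(pairs, positive_answer, negative_answer):
    """Tally confusion-matrix cells from (ground_truth, prediction) pairs.

    Hypothetical helper for illustration. Values are converted to strings
    and compared exactly (assumption: no trimming or case folding).
    """
    positive, negative = str(positive_answer), str(negative_answer)
    valid = {positive, negative}
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0, "excluded": 0}
    for gt, pred in pairs:
        gt_s, pred_s = str(gt), str(pred)
        # Assumption: a sample is dropped if either value is unrecognized.
        if gt_s not in valid or pred_s not in valid:
            counts["excluded"] += 1
            continue
        if pred_s == positive:
            counts["tp" if gt_s == positive else "fp"] += 1
        else:
            counts["fn" if gt_s == positive else "tn"] += 1
    return counts


pairs = [
    ("yes", "yes"),    # true positive
    ("yes", "no"),     # false negative
    ("no", "no"),      # true negative
    ("no", "yes"),     # false positive
    ("yes", "maybe"),  # excluded: prediction matches neither answer
    ("Yes", "yes"),    # excluded under the case-sensitive assumption
]
counts = count_outcomes(pairs, positive_answer="yes", negative_answer="no")
print(counts)
```

Because excluded samples never reach the confusion matrix, a high exclusion count can make the reported metrics look better or worse than the scorer's real behavior, so it is worth checking that the scorer's outputs match the configured answers exactly.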
Examples
Example: Yes/No Classification
A model-as-a-judge classifier labels each response as "yes" or "no". The binary classification metric compares the judge's predictions against the ground-truth labels stored in the dataset.
scorers:
  - type: model_as_a_judge_classifier
    model_key: "<< config.judge_model >>"
    system_prompt: "Does the response correctly answer the question? Reply yes or no."
    user_prompt: "Response: {{ model_output.choices[0].message.content }}"
    correct_labels:
      - "yes"
    incorrect_labels:
      - "no"
metrics:
  - type: binary-classification
    field_gt: ground_truth_label
    field_pred: predicted_label
    positive_answer: "yes"
    negative_answer: "no"
Configuration
Properties
type Literal "binary-classification" required
The type of the metric.
field_gt string, TemplateValue required
The field in the scores containing the ground-truth answer.
field_pred string, TemplateValue required
The field in the scores containing the predicted answer.
positive_answer string, TemplateValue required
The answer treated as a positive. The answer field is converted to a string before comparison.
negative_answer string, TemplateValue required
The answer treated as a negative. The answer field is converted to a string before comparison.
key string
Unique identifier assigned to the entity in AI GO!.
Default: None
