Multiclass Classification

Computes a set of multiclass classification metrics by comparing the predicted and ground-truth scores of every sample. Use this when your scorer outputs both a prediction and a ground-truth label per sample and the label space has more than two classes. Class labels are inferred automatically from the data, so you do not need to enumerate them. For two-class problems, use Binary Classification instead.

Output

An overall metric plus per-class metrics:

accuracy: Fraction of samples correctly classified across all classes.
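{class}_precision, {class}_recall: Per-class metrics, reported once for each inferred class. Precision is the fraction of samples predicted as the class that truly belong to it; recall is the fraction of samples truly in the class that were predicted as it.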
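
To make the output concrete, here is a minimal sketch of how these values could be computed from two parallel lists of labels. It is illustrative only and is not the product's implementation: the function name multiclass_metrics is hypothetical, and returning 0.0 when a class has no predictions or no ground-truth samples is an assumption, since the behaviour for empty denominators is not documented here.

```python
from collections import Counter


def multiclass_metrics(gt_labels, pred_labels):
    """Return accuracy plus per-class precision and recall (hypothetical helper)."""
    if len(gt_labels) != len(pred_labels):
        raise ValueError("gt_labels and pred_labels must have the same length")
    if not gt_labels:
        raise ValueError("at least one sample is required")

    # Classes are whatever labels appear on either side.
    classes = sorted(set(gt_labels) | set(pred_labels), key=str)

    gt_count = Counter(gt_labels)      # samples truly in each class
    pred_count = Counter(pred_labels)  # samples predicted as each class
    correct = Counter(g for g, p in zip(gt_labels, pred_labels) if g == p)

    metrics = {"accuracy": sum(correct.values()) / len(gt_labels)}
    for c in classes:
        # Assumption: report 0.0 when the denominator is zero.
        metrics[f"{c}_precision"] = correct[c] / pred_count[c] if pred_count[c] else 0.0
        metrics[f"{c}_recall"] = correct[c] / gt_count[c] if gt_count[c] else 0.0
    return metrics


example = multiclass_metrics(
    ["positive", "negative", "neutral", "positive"],
    ["positive", "negative", "positive", "neutral"],
)
```

In this example two of four samples match, so accuracy is 0.5; "negative" is always predicted correctly, while "neutral" is never predicted correctly and therefore scores 0.0 on both per-class metrics.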

Class Inference

Class labels are derived from the values found in field_gt and field_pred at runtime. No configuration is needed to specify the label set in advance.
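
The inference step can be pictured as taking the union of all values seen in the two fields. The sketch below assumes each sample's scores are available as a dictionary keyed by field name; that representation and the helper name infer_classes are assumptions made for illustration, not the product's internals.

```python
def infer_classes(scores, field_gt, field_pred):
    """Collect every distinct label seen in either field (hypothetical helper)."""
    labels = set()
    for sample in scores:
        labels.add(sample[field_gt])
        labels.add(sample[field_pred])
    # Sort by string form so the order is stable even for mixed label types.
    return sorted(labels, key=str)


scores = [
    {"gt_label": "positive", "pred_label": "positive"},
    {"gt_label": "negative", "pred_label": "neutral"},
    {"gt_label": "neutral", "pred_label": "neutral"},
]
classes = infer_classes(scores, "gt_label", "pred_label")
```

One consequence of inferring classes this way is that any unexpected value, such as a misspelled or differently cased label, is treated as a class of its own, so it can be worth normalizing labels in the scorer before they reach the metric.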

Examples

Example: Sentiment Classification. A labeler assigns one of three sentiment labels to each response. The multiclass classification metric compares the assigned labels against a ground-truth sentiment column in the dataset.

scorers:
  - type: python
    compute_scores_snippet: !include compute_scores.py
    # Produces 'gt_label' and 'pred_label' scores 
    metrics:
      - type: multiclass-classification
        field_gt: gt_label
        field_pred: pred_label

Configuration

Properties

type Literal "multiclass-classification" required

The type of the metric.

field_gt string, TemplateValue required

The field in the scores containing the ground-truth answer.

field_pred string, TemplateValue required

The field in the scores containing the predicted answer.

key string

Unique identifier assigned to the entity in AI GO!.

Default: None