F1 Score

Computes the F1 score, the harmonic mean of precision and recall, as 2·TP / (2·TP + FP + FN) from per-sample true positive (TP), false positive (FP), and false negative (FN) counts. Use this with scorers that produce counts, for example a scorer that counts how many items in the model's output were correct and how many were not. Pair with Precision and Recall to see the individual components.
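The formula can be checked by hand with a short Python sketch. This is only an illustration of the arithmetic, not the metric's implementation: the function names are hypothetical, and returning 0.0 when all counts are zero is an assumption, since the behavior for that case is not stated here.

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """Compute F1 as 2*TP / (2*TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    # Assumption: fall back to 0.0 when there are no counts at all.
    return 2 * tp / denom if denom else 0.0


def harmonic_mean_f1(tp: int, fp: int, fn: int) -> float:
    """Same value, computed as the harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# 3 correct items, 1 spurious item, 2 missed items:
# precision = 3/4, recall = 3/5, F1 = 6/9
print(f1_from_counts(3, 1, 2))
print(harmonic_mean_f1(3, 1, 2))
```

Both functions agree up to floating-point rounding, which is why the count-based form can be used without computing precision and recall first.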

Output

A single metric named f1_score by default, or the value of name if provided. The value lies in the range [0, 1]: 1.0 means perfect precision and recall (no false positives or false negatives), and 0.0 means there were no true positives.

Examples

Example: Function Call Overlap. A Python scorer checks which required function calls the model made. F1 balances precision and recall into a single overlap score.

scorers:
  - type: python
    compute_scores_snippet: !include "function_overlap_scorer.py"
    # scorer returns: num_true_positives, num_false_positives, num_false_negatives
    metrics:
      - type: precision
        num_true_positives_field: num_true_positives
        num_false_positives_field: num_false_positives
        name: Call Precision
      - type: recall
        num_true_positives_field: num_true_positives
        num_false_negatives_field: num_false_negatives
        name: Call Recall
      - type: f1_score
        num_true_positives_field: num_true_positives
        num_false_positives_field: num_false_positives
        num_false_negatives_field: num_false_negatives
        name: Call F1
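The contents of function_overlap_scorer.py are not shown here. As a rough sketch of how such a scorer could derive the three counts, the hypothetical function below compares the set of required call names with the set of calls the model made. The function name, its arguments, and the set-based matching (duplicates collapsed, call arguments ignored) are all assumptions, and the interface that compute_scores_snippet actually expects may differ.

```python
from typing import Dict, Iterable


def count_call_overlap(
    required_calls: Iterable[str],
    predicted_calls: Iterable[str],
) -> Dict[str, int]:
    """Hypothetical helper: count TP/FP/FN by comparing call names as sets."""
    required = set(required_calls)
    predicted = set(predicted_calls)
    return {
        # Required calls the model actually made.
        "num_true_positives": len(required & predicted),
        # Calls the model made that were not required.
        "num_false_positives": len(predicted - required),
        # Required calls the model did not make.
        "num_false_negatives": len(required - predicted),
    }


counts = count_call_overlap(
    required_calls=["get_weather", "send_email", "create_event"],
    predicted_calls=["get_weather", "send_email", "search_web"],
)
print(counts)
```

The dictionary keys match the field names referenced by num_true_positives_field, num_false_positives_field, and num_false_negatives_field in the configuration above. For this sample the counts are 2, 1, and 1, so F1 is 2·2 / (2·2 + 1 + 1) = 4/6.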

Configuration

Properties

type Literal "f1_score"

Type

Default: f1_score

num_true_positives_field string required

The field that contains the number of true positives.

num_false_positives_field string required

The field that contains the number of false positives.

num_false_negatives_field string required

The field that contains the number of false negatives.

name string

The name given to the metric value. If not specified, the metric is named f1_score.

Default: None

key string

Unique identifier assigned to the entity in AI GO!.

Default: None