String Equality

Scores each sample by checking whether the value-to-check exactly matches the ground-truth answer, producing an is_correct score of 1.0 (match) or 0.0 (no match). Use this for short, well-defined answers such as yes/no, a single word, or a fixed label. For nuanced or free-form outputs, use the Model Scorer instead.

Output

  • is_correct: 1.0 if the compared value (by default, the model output) matches the ground-truth exactly, 0.0 otherwise.

Comparison Behavior

Comparison is case-sensitive and whitespace-sensitive. By default the full content of the last chat completion message is compared against the ground-truth. Use value to override what is compared.
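
As a rough illustration of what exact matching implies, the fragment below annotates a scorer that uses a hard-coded ground-truth. The candidate values in the comments are made up for illustration; the scores follow from the case- and whitespace-sensitivity described above, and the surrounding task definition is omitted.

```yaml
scorers:
  - type: "string_equals"
    ground_truth: "yes"
    # Illustrative compared values and the score each would receive:
    #   "yes"   -> is_correct: 1.0
    #   "Yes"   -> is_correct: 0.0 (case differs)
    #   "yes "  -> is_correct: 0.0 (trailing space)
    #   "yes."  -> is_correct: 0.0 (extra punctuation)
```

Since no normalization step is described for this scorer, it is probably safest to prompt the model for the exact label format stored in the dataset.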

Examples

Example: Yes/No QA. Compares the model's answer against a ground-truth stored in the answer column.

...
definition:
  ...
  scorers:
    - type: "string_equals"
      ground_truth: "{{ sample.answer }}"
      metrics:
        - type: "mean"
          field: "is_correct"
          name: "Accuracy"

Configuration

Properties


type Literal "string_equals" required

The type of the scorer.


ground_truth string, TemplateValue required

The ground truth against which the value (by default, the solver output) is compared.

The ground-truth can be:

  1. A hard-coded string (ex: "YES")
  2. A reference to the sample data (ex: "{{ sample.country }}")
  3. A mix of (1) and (2) (ex: "The country is {{ sample.country }}")

sample represents the current row of the dataset (with a field for every dataset column).
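
The three forms of ground_truth listed above could be written as follows. This is a sketch only: it assumes the dataset has a country column, as in the examples above, and shows three separate scorers side by side purely for comparison.

```yaml
scorers:
  # 1. Hard-coded string
  - type: "string_equals"
    ground_truth: "YES"
  # 2. Reference to a dataset column (assumes a "country" column exists)
  - type: "string_equals"
    ground_truth: "{{ sample.country }}"
  # 3. Mix of literal text and a column reference
  - type: "string_equals"
    ground_truth: "The country is {{ sample.country }}"
```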


value string, TemplateValue

The value which will be compared against the ground-truth.

The value can be:

  1. A hard-coded string (ex: "YES")
  2. A reference to the sample data (ex: "{{ sample.country }}")
  3. (For model tasks) A reference to the solver output (ex: "{{ solver_output.output }}")
  4. A mix of the others (ex: "The country is {{ sample.country }}")

sample represents the current row of the dataset (with a field for every dataset column).

If value is None:

  • If the task has a solver and the solver output is a chat completion response, then the value is set to the output message content.
  • Otherwise, an error is produced.

Default: None
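
A minimal sketch of setting value explicitly, under two assumptions: the answer column is taken from the Yes/No example earlier on this page, and prediction is a hypothetical column name used only for illustration. The second scorer shows one way to avoid the error case when the task has no solver, by comparing two dataset columns directly.

```yaml
scorers:
  # Explicitly compare the solver output (the template from item 3 in the list above)
  - type: "string_equals"
    value: "{{ solver_output.output }}"
    ground_truth: "{{ sample.answer }}"
  # Compare two dataset columns; "prediction" is a hypothetical column name
  - type: "string_equals"
    value: "{{ sample.prediction }}"
    ground_truth: "{{ sample.answer }}"
```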

purpose ScorerPurpose

The purpose of this scorer.

  • score: The scorer is used to score the solver output or the dataset sample.
  • qa: The scorer is used to do QA over the solver output or the dataset sample.

Default: score

key string

Unique identifier assigned to the entity in AI GO!.

Default: None

display_name string

The display name of the scorer.

Default: None

metrics array[PythonMetricTemplate, BinaryClassificationMetricTemplate, MulticlassClassificationMetricTemplate, MeanMetricTemplate, MaxMetricTemplate, MinMetricTemplate, StdDevMetricTemplate, FrequencyMetricTemplate, RecallMetricTemplate, PrecisionMetricTemplate, F1ScoreMetricTemplate]

The metrics computed from this scorer's output. Each entry produces a per-task metric.

Default: None
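
Putting the optional properties together, a scorer entry might look like the sketch below. The key and display_name values are hypothetical, and writing purpose as a plain string is an assumption based on the option names listed above. The mean metric is the one used in the Yes/No example; because is_correct is either 1.0 or 0.0, its mean is the fraction of samples that matched.

```yaml
scorers:
  - type: "string_equals"
    key: "yes_no_exact_match"            # hypothetical identifier
    display_name: "Yes/No Exact Match"   # hypothetical label
    purpose: "score"                     # the default; "qa" is the other documented option
    ground_truth: "{{ sample.answer }}"
    metrics:
      - type: "mean"
        field: "is_correct"
        name: "Accuracy"
```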