String Equality (Multiple Choice)

Scores each sample by checking whether the first character of the model output matches the correct multiple-choice key, producing an is_correct score of 1.0 (match) or 0.0 (no match). Use this for standard MCQA tasks where the model responds with a single letter such as A, B, C, or D. For free-form answers, use String Equality or the Model Scorer instead.

Output

  • is_correct: 1.0 if the predicted choice matches the ground-truth key, 0.0 otherwise.

Validity Checking

If the model output does not unambiguously match any of the provided choices, the sample is marked incorrect and the completion_validity metadata field is set to INVALID. Only single-character choices are supported.
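The scoring and validity rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the scorer's actual implementation: the function name `score_mcqa` is hypothetical, the comparison is assumed to be exact (case-sensitive, no whitespace stripping), and the `"VALID"` label is an assumption, since only `INVALID` is named in this page.

```python
from typing import Sequence, Tuple


def score_mcqa(output: str, ground_truth: str, choices: Sequence[str]) -> Tuple[float, str]:
    """Return (is_correct, completion_validity) for one sample.

    Hypothetical helper: assumes an exact, case-sensitive comparison
    on the first character of the raw model output.
    """
    first = output[:1]  # empty string when the output is empty
    # An output whose first character is not one of the choices is invalid.
    if not first or first not in list(choices):
        return 0.0, "INVALID"
    # "VALID" is an assumed label; the docs only name INVALID.
    return (1.0 if first == ground_truth else 0.0), "VALID"


print(score_mcqa("B", "B", ["A", "B", "C"]))
print(score_mcqa("The answer is B", "B", ["A", "B", "C"]))
```

Under these assumptions, an output such as "The answer is B" is scored as incorrect and invalid because its first character, "T", is not one of the choices.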

Examples

Example: Multiple Choice QA. Shows a task where the dataset provides the correct answer letter and the list of valid single-character choices.

...
definition:
  ...
  scorers:
    - type: "string_equals_mcqa"
      ground_truth_choice: "{{ sample.<< config.answer_column >> }}"
      choices: "{{ sample.choices }}"
      metrics:
        - type: "mean"
          field: "is_correct"
          name: "MCQA Accuracy"

| prompt                                             | answer | choices         |
| :------------------------------------------------- | :----- | :-------------- |
| What is the correct answer? A. ..., B. ..., C. ... | B      | ['A', 'B', 'C'] |
| What is the correct answer? A. ..., B. ..., C. ... | C      | ['A', 'B', 'C'] |
💡

The choices field expects single-character choices. Multi-character choices (e.g., 'yes' or 'no') are not supported.
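To make the "MCQA Accuracy" metric in this example concrete, the sketch below averages is_correct over the two dataset rows shown in the table. The model outputs are invented for illustration, and the helper `is_correct` is a hypothetical stand-in for the scorer, assuming an exact match on the first character.

```python
from statistics import mean

# The two rows from the example dataset (prompt text omitted).
rows = [
    {"answer": "B", "choices": ["A", "B", "C"]},
    {"answer": "C", "choices": ["A", "B", "C"]},
]

# Hypothetical model outputs, one per row.
outputs = ["B", "A"]


def is_correct(output: str, answer: str) -> float:
    """Assumed scoring rule: compare the first character to the answer key."""
    return 1.0 if output[:1] == answer else 0.0


scores = [is_correct(out, row["answer"]) for out, row in zip(outputs, rows)]
mcqa_accuracy = mean(scores)  # what a "mean" metric over is_correct computes
print(mcqa_accuracy)  # 0.5
```

With one correct and one incorrect sample, the mean of is_correct is 0.5, which is the value the metric would report for this task.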

Configuration

Properties


type Literal "string_equals_mcqa" required

The type of the scorer.


ground_truth_choice string, TemplateValue

Jinja template that produces the ground truth choice.

The ground-truth choice can be:

  1. A hard-coded string (e.g. "A")
  2. A reference to a sample field (e.g. "{{ sample.correct_answer }}")
  3. A value derived from sample data (e.g. "{{ sample.choices[sample.correct_index] }}")
  4. A mix of the above (e.g. "{{ sample.answer_key | upper }}")

sample represents the current row of the dataset (with a field for every dataset column).

The template should produce a single-character choice (e.g., "B").

Default: None

choices string, TemplateValue

Jinja template that produces the list of choices.

The choices can be:

  1. A hard-coded list (e.g. ["A", "B", "C", "D"])
  2. A reference to a sample field (e.g. "{{ sample.answer_choices }}")
  3. A list derived from sample data (e.g. "{{ sample.options | map(attribute='key') | list }}")

sample represents the current row of the dataset (with a field for every dataset column).

The template should produce a list of single character choices (e.g., ["A", "B", "C", "D"]). The result can be a JSON string or a Python list.
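As a rough illustration of accepting either a JSON string or a Python list, the sketch below normalizes a rendered value into a list of single-character strings. The function name `parse_choices` is hypothetical, and the fallback to `ast.literal_eval` for Python-style strings such as `['A', 'B', 'C']` is an assumption; this page does not say how the scorer parses the value.

```python
import ast
import json
from typing import List, Union


def parse_choices(rendered: Union[str, list]) -> List[str]:
    """Normalize rendered template output into a list of single-character choices."""
    if isinstance(rendered, str):
        try:
            parsed = json.loads(rendered)  # e.g. '["A", "B", "C"]'
        except json.JSONDecodeError:
            # Assumed fallback for Python-style lists with single quotes.
            parsed = ast.literal_eval(rendered)
    else:
        parsed = rendered

    if not isinstance(parsed, list):
        raise ValueError(f"expected a list of choices, got {type(parsed).__name__}")

    # Only single-character string choices are supported.
    bad = [c for c in parsed if not (isinstance(c, str) and len(c) == 1)]
    if bad:
        raise ValueError(f"unsupported choices: {bad!r}")
    return parsed


print(parse_choices('["A", "B", "C", "D"]'))
print(parse_choices(["A", "B", "C"]))
```

Rejecting multi-character entries up front mirrors the documented restriction and surfaces a misconfigured choices column before any sample is scored.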

Default: None

purpose ScorerPurpose

The purpose of this scorer.

  • score: The scorer is used to score the solver output or the dataset sample.
  • qa: The scorer is used to do QA over the solver output or the dataset sample.

Default: score

key string

Unique identifier assigned to the entity in AI GO!.

Default: None

display_name string

The display name of the scorer.

Default: None

metrics array[PythonMetricTemplate, BinaryClassificationMetricTemplate, MulticlassClassificationMetricTemplate, MeanMetricTemplate, MaxMetricTemplate, MinMetricTemplate, StdDevMetricTemplate, FrequencyMetricTemplate, RecallMetricTemplate, PrecisionMetricTemplate, F1ScoreMetricTemplate]

The metrics associated with this scorer, which will produce per-task metrics.

Default: None

ground_truth_choice_field string, TemplateValue

Column in the dataset that contains the ground-truth choice. The column is expected to contain single-character choices (e.g. 'B'). To evaluate a sample, the first character of the model's output message content is compared against this value using exact string equality.

This field is deprecated and will be removed in future versions. Use 'ground_truth_choice' with a Jinja template instead (e.g., '{{ sample.field_name }}').

Default: None

choices_field string, TemplateValue

Column in the dataset that contains the choices e.g. a column containing ['A', 'B', 'C', 'D'].

This field is deprecated and will be removed in future versions. Use 'choices' with a Jinja template instead (e.g., '{{ sample.field_name }}').

Default: None
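For configs that still use the deprecated ground_truth_choice_field and choices_field properties, the migration described above amounts to wrapping the column name in a `{{ sample.<column> }}` template. The sketch below shows one way to do that on a scorer config loaded as a Python dict. The function `migrate_scorer_config` is hypothetical and not part of the product; it assumes column names are valid identifiers that work with dot notation.

```python
def migrate_scorer_config(scorer: dict) -> dict:
    """Return a copy of a scorer config with deprecated *_field keys replaced."""
    migrated = dict(scorer)
    renames = (
        ("ground_truth_choice_field", "ground_truth_choice"),
        ("choices_field", "choices"),
    )
    for old_key, new_key in renames:
        column = migrated.pop(old_key, None)
        # Keep an explicitly set new-style value if one already exists.
        if column is not None and new_key not in migrated:
            migrated[new_key] = "{{ sample." + column + " }}"
    return migrated


old = {
    "type": "string_equals_mcqa",
    "ground_truth_choice_field": "answer",
    "choices_field": "choices",
}
print(migrate_scorer_config(old))
```

The input dict is left unchanged, so the result can be compared against the original before the config is written back.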