Text Similarity (BLEU Score)
Scores each sample by computing the n-gram overlap between the value to check and the ground truth, producing a bleu_score in the [0, 1] range. Use it for translation or text generation tasks where the expected output is well defined. Avoid it for single-word comparisons: BLEU measures the overlap of multi-word n-grams, so very short strings give unreliable scores. For semantic similarity or paraphrased outputs, use the Model Scorer instead.
Output
bleu_score: n-gram overlap with the ground truth, in the [0, 1] range. 0.0 means no overlap; 1.0 means a perfect match.
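To make the score concrete, here is a minimal sentence-level BLEU sketch in Python. It assumes whitespace tokenization, n-grams of order 1 to 4 with equal weights, clipped n-gram precision, the standard brevity penalty, and no smoothing. The scorer's actual tokenizer, maximum n-gram order, and smoothing are not documented here, so its numbers may differ from this sketch.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Sentence-level BLEU in [0, 1].

    Assumptions (not the scorer's documented behavior): whitespace
    tokenization, uniform weights over 1..max_n grams, no smoothing.
    """
    cand = candidate.split()
    ref = reference.split()
    if not cand or not ref:
        return 0.0

    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clipped matches: a candidate n-gram counts at most as many times
        # as it appears in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or matches == 0:
            # Candidate has no n-grams of this order, or none match:
            # without smoothing the geometric mean is zero.
            return 0.0
        log_precision_sum += math.log(matches / total) / max_n

    # Brevity penalty: penalize candidates that are not longer than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(log_precision_sum)
```

With this sketch, an identical sentence scores 1.0, and bleu("the cat sat on a mat", "the cat sat on the mat") scores about 0.537. bleu("YES", "YES") returns 0.0 even though the strings match, because a single token has no bigrams or longer n-grams; this is one concrete reason single-word comparisons are a poor fit. Smoothed variants behave differently.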
Configuration
Properties
type Literal "bleu" required
The type of the scorer.
ground_truth string, TemplateValue required
The ground truth against which the value is compared.
The ground truth can be:
- A hard-coded string (ex: "YES")
- A reference to the sample data (ex: "{{ sample.country }}")
- A mix of the two (ex: "The country is {{ sample.country }}")
sample represents the current row of the dataset, with a field for every dataset column.
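The placeholders look like Jinja-style template expressions. As a minimal sketch, assuming plain {{ name.field }} substitution (the actual template engine is not documented here), a template can be rendered against the current sample row like this. render_template is a hypothetical helper; the same pattern also covers references such as {{ solver_output.output }}.

```python
import re
from typing import Any, Mapping

# Matches "{{ name.field }}" with optional whitespace inside the braces.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\.(\w+)\s*\}\}")


def render_template(template: str, context: Mapping[str, Mapping[str, Any]]) -> str:
    """Replace each {{ name.field }} with str(context[name][field]).

    Hypothetical helper: supports only one level of field access and
    raises KeyError for an unknown name or field.
    """
    def substitute(match: re.Match) -> str:
        name, field = match.group(1), match.group(2)
        return str(context[name][field])

    return _PLACEHOLDER.sub(substitute, template)


# A sample row: one field per dataset column (illustrative data).
row = {"country": "France"}
ground_truth = render_template("The country is {{ sample.country }}", {"sample": row})
```

Here ground_truth becomes "The country is France", while a hard-coded string such as "YES" passes through unchanged.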
value string, TemplateValue
The value which will be compared against the ground-truth.
The value can be:
- A hard-coded string (ex: "YES")
- A reference to the sample data (ex: "{{ sample.country }}")
- (For model tasks) A reference to the solver output (ex: "{{ solver_output.output }}")
- A mix of the above (ex: "The country is {{ sample.country }}")
sample represents the current row of the dataset, with a field for every dataset column.
If value is None:
- If the task has a solver and the solver output is a chat completion response, then the value is set to the output message content.
- Otherwise, an error is produced.
Default: None
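The fallback rule for an omitted value can be sketched as follows. ChatCompletionResponse and resolve_value are hypothetical names invented for illustration: the real solver output type and its fields are not described here, and template rendering of a non-None value is left out.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ChatCompletionResponse:
    """Hypothetical stand-in for a solver's chat completion output."""
    message_content: str


def resolve_value(value: Optional[str], solver_output: Any = None) -> str:
    """Apply the documented fallback when value is None (hypothetical helper)."""
    if value is not None:
        # Template rendering against sample / solver_output is omitted here.
        return value
    if isinstance(solver_output, ChatCompletionResponse):
        # The task has a solver that returned a chat completion: use its message content.
        return solver_output.message_content
    # No solver, or a solver whose output is not a chat completion: error.
    raise ValueError("value is None and no chat completion output is available")
```

For example, resolve_value(None, ChatCompletionResponse("Paris")) returns "Paris", while resolve_value(None) raises ValueError.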
key string
Unique identifier assigned to the entity in AI GO!.
Default: None
purpose ScorerPurpose
The purpose of this scorer.
- score: The scorer is used to score the solver output or the dataset sample.
- qa: The scorer is used to perform QA over the solver output or the dataset sample.
Default: score
display_name string
The display name of the scorer.
Default: None
metrics array[PythonMetricTemplate, BinaryClassificationMetricTemplate, MulticlassClassificationMetricTemplate, MeanMetricTemplate, MaxMetricTemplate, MinMetricTemplate, StdDevMetricTemplate, FrequencyMetricTemplate, RecallMetricTemplate, PrecisionMetricTemplate, F1ScoreMetricTemplate]
The metrics associated with this scorer, which will produce per-task metrics.
Default: None
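For illustration, here is a BLEU scorer configuration expressed as a Python dict, together with a hypothetical check_bleu_config helper that mirrors the property list above. Only the field names and allowed values come from this page; the dict form and the validator are assumptions, not the AI GO! SDK. metrics is left out because the metric template shapes are not described here.

```python
from typing import Any, Dict, List

# Example config using the documented properties (values are illustrative).
bleu_scorer: Dict[str, Any] = {
    "type": "bleu",
    "ground_truth": "The country is {{ sample.country }}",
    "value": "{{ solver_output.output }}",
    "purpose": "score",
    "display_name": "Country answer BLEU",
}

REQUIRED = ("type", "ground_truth")
# Optional properties and their documented defaults.
OPTIONAL_DEFAULTS = {
    "value": None,
    "key": None,
    "purpose": "score",
    "display_name": None,
    "metrics": None,
}


def check_bleu_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems in a BLEU scorer config (empty if none). Hypothetical."""
    problems = []
    for field in REQUIRED:
        if field not in config:
            problems.append(f"missing required field: {field}")
    if config.get("type") != "bleu":
        problems.append('type must be "bleu"')
    if config.get("purpose", OPTIONAL_DEFAULTS["purpose"]) not in ("score", "qa"):
        problems.append('purpose must be "score" or "qa"')
    unknown = set(config) - set(REQUIRED) - set(OPTIONAL_DEFAULTS)
    problems.extend(f"unknown field: {f}" for f in sorted(unknown))
    return problems
```

check_bleu_config(bleu_scorer) returns an empty list; dropping ground_truth would report it as a missing required field.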
