Python
Computes fully custom metrics using a Python function that receives the scores for all samples and returns a dict mapping metric names to numeric values. Use this when the built-in aggregation metrics cannot express the logic you need, for example accuracy over only valid samples, or composite metrics derived from multiple score fields. For simpler aggregations, prefer declarative metrics such as Mean or Frequency.
Output
One metric per key returned by compute_metrics. Values must be int or float.
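Because each returned key becomes its own metric and every value must be an int or float, it can help to check a function's return value locally before using it. The helper below is a hypothetical sketch and not part of the product. Rejecting bool is this sketch's own conservative choice, since the text above does not say how bool values are handled.

```python
from __future__ import annotations


def check_metrics_output(metrics: dict) -> list[str]:
    """Return a list of problems found in a compute_metrics return value.

    Hypothetical helper for local checks; an empty list means no problems.
    """
    problems = []
    for name, value in metrics.items():
        if not isinstance(name, str):
            problems.append(f"metric name {name!r} is not a str")
        # bool is a subclass of int in Python; this sketch rejects it on purpose.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{name!r}: {type(value).__name__} is not int or float")
    return problems


# A well-formed result, and one with a string and a list as values.
ok = check_metrics_output({"accuracy": 0.8, "num_valid": 12})
bad = check_metrics_output({"accuracy": "0.8", "per_class": [0.5, 0.9]})
```

Here `ok` is an empty list, while `bad` holds one message per offending key. A list-valued result such as `per_class` would need to be flattened into separate keys, one per metric.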
Function Signature
Both def and async def are supported. The function receives a
list[SampleScore] and must return dict[str, int | float]. A SampleScore
has a values field (a dict of score names to score values) and a metadata field
(a dict of arbitrary metadata computed by the scorer).
def compute_metrics(scores: "list[SampleScore]") -> dict[str, int | float]:
    ...

Each SampleScore has a values dict containing the score fields produced by the scorer for that sample.
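As a rough illustration of the signature, the sketch below uses the async def form and reads both values and metadata. The SampleScore class here is a local stand-in defined only so the snippet runs on its own. The score field `is_correct` and the metadata key `language` are hypothetical names chosen for this example.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SampleScore:
    """Local stand-in for the real SampleScore type (assumption for this sketch)."""

    values: dict[str, Any]
    metadata: dict[str, Any] = field(default_factory=dict)


async def compute_metrics(scores: list[SampleScore]) -> dict[str, int | float]:
    # Hypothetical score field "is_correct" and metadata key "language".
    english = [s for s in scores if s.metadata.get("language") == "en"]
    n_correct = sum(1 for s in scores if s.values["is_correct"])
    n_correct_en = sum(1 for s in english if s.values["is_correct"])
    return {
        "num_samples": len(scores),
        "accuracy": n_correct / len(scores) if scores else 0.0,
        "accuracy_en": n_correct_en / len(english) if english else 0.0,
    }


# Run the coroutine locally on a few hand-built scores.
example_scores = [
    SampleScore(values={"is_correct": True}, metadata={"language": "en"}),
    SampleScore(values={"is_correct": False}, metadata={"language": "en"}),
    SampleScore(values={"is_correct": True}, metadata={"language": "de"}),
    SampleScore(values={"is_correct": True}, metadata={"language": "de"}),
]
result = asyncio.run(compute_metrics(example_scores))
```

Each key in `result` would surface as a separate metric. The guards on empty lists are a design choice of this sketch: they return 0.0 rather than raising ZeroDivisionError when a subset has no samples.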
Examples
Example: Geography QA. Computes accuracy only over samples where the is_valid score is True, plus a separate validity rate metric.
...
definition:
  ...
  scorers:
    - type: "python"
      compute_scores_snippet: !include "geography_scorer.py"
  metrics:
    - type: "python"
      compute_metrics_snippet: !include "geography_metric.py"

from __future__ import annotations
from latticeflow.assessment.dtypes import SampleScore

def compute_metrics(scores: list[SampleScore]) -> dict[str, int | float]:
    # Keep only the samples that the scorer marked as valid.
    valid_scores = [s for s in scores if s.values["is_valid"]]
    return {
        # Accuracy is measured over valid samples only; max(..., 1) avoids division by zero.
        "accuracy": sum(s.values["is_correct"] for s in valid_scores) / max(len(valid_scores), 1),
        "validity": len(valid_scores) / max(len(scores), 1),
    }

Configuration
Properties
type Literal "python" required
The type of the metric.
compute_metrics_snippet string, TemplateValue required
The Python snippet that defines a compute_metrics(scores: list[SampleScore]) -> dict[str, int | float] function that computes the metric values. Both def and async def are supported.
key string
Unique identifier assigned to the entity in AI GO!.
Default: None
