Python

Computes fully custom metrics using a Python function that receives the scores for all samples and returns a dict mapping metric names to numeric values. Use this when the built-in aggregation metrics cannot express the logic you need, for example accuracy over only valid samples, or composite metrics derived from multiple score fields. For simpler aggregations, prefer declarative metrics such as Mean or Frequency.

Output

One metric per key in the dict returned by compute_metrics. Each value must be an int or a float.
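As a small illustration of how returned keys map to metrics, the sketch below returns three keys, two with int values and one with a float value. The SampleScore dataclass is a hypothetical stand-in defined only to make the snippet self-contained, and the is_correct field name is an assumption about what the scorer emits.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class SampleScore:
    """Hypothetical stand-in for the real SampleScore, for illustration only."""

    values: dict[str, Any]
    metadata: dict[str, Any] = field(default_factory=dict)


def compute_metrics(scores: list[SampleScore]) -> dict[str, int | float]:
    # Assumes the scorer emits a boolean "is_correct" field (hypothetical name).
    num_correct = sum(1 for s in scores if s.values["is_correct"])
    return {
        "num_samples": len(scores),  # int
        "num_correct": num_correct,  # int
        # float; falls back to 0.0 when there are no samples
        "accuracy": num_correct / len(scores) if scores else 0.0,
    }
```

Under these assumptions, the function would produce three metrics named num_samples, num_correct, and accuracy.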

Function Signature

Both def and async def are supported. The function receives a list[SampleScore] and must return dict[str, int | float]. A SampleScore has a values field (a dict of score names to score values) and a metadata field (a dict of arbitrary metadata computed by the scorer).

def compute_metrics(scores: "list[SampleScore]") -> dict[str, int | float]:
    ...

Individual score fields are read by name from the values dict, for example s.values["is_valid"].
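Since async def is also supported, the following is a minimal sketch of an asynchronous variant that reads from both values and metadata. The SampleScore dataclass is a hypothetical stand-in, the score and truncated names are assumptions, and asyncio.sleep(0) merely stands in for whatever awaited work a real function might do.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SampleScore:
    """Hypothetical stand-in for the real SampleScore, for illustration only."""

    values: dict[str, Any]
    metadata: dict[str, Any] = field(default_factory=dict)


async def compute_metrics(scores: list[SampleScore]) -> dict[str, int | float]:
    # Placeholder for real awaited work (for example, an async lookup).
    await asyncio.sleep(0)
    # "score" is an assumed numeric field in values.
    total = sum(s.values["score"] for s in scores)
    # "truncated" is an assumed boolean flag stored in metadata by the scorer.
    num_truncated = sum(1 for s in scores if s.metadata.get("truncated", False))
    return {
        "mean_score": total / len(scores) if scores else 0.0,
        "num_truncated": num_truncated,
    }
```

To try the function locally, it can be driven with asyncio.run(compute_metrics(scores)); how the platform itself schedules the coroutine is not specified in this section.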

Examples

Example: Geography QA. Computes accuracy only over samples where the is_valid score is True, plus a separate validity rate metric.

...
definition:
  ...
  scorers:
    - type: "python"
      compute_scores_snippet: !include "geography_scorer.py"
      metrics:
        - type: "python"
          compute_metrics_snippet: !include "geography_metric.py"

from __future__ import annotations

from latticeflow.assessment.dtypes import SampleScore


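def compute_metrics(scores: list[SampleScore]) -> dict[str, int | float]:
    # Keep only the samples that the scorer marked as valid.
    valid_scores = [s for s in scores if s.values["is_valid"]]
    num_correct = sum(s.values["is_correct"] for s in valid_scores)
    return {
        # Accuracy over valid samples only; 0.0 when no sample is valid.
        "accuracy": num_correct / len(valid_scores) if valid_scores else 0.0,
        # Fraction of all samples that are valid; 0.0 when there are no samples.
        "validity": len(valid_scores) / len(scores) if scores else 0.0,
    }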
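Example: composite metric. The overview mentions composite metrics derived from multiple score fields; the sketch below illustrates one such case by deriving precision, recall, and F1 from two boolean fields. The SampleScore dataclass is a hypothetical stand-in, and the predicted_positive and actual_positive field names are assumptions, not fields defined by the platform.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class SampleScore:
    """Hypothetical stand-in for the real SampleScore, for illustration only."""

    values: dict[str, Any]
    metadata: dict[str, Any] = field(default_factory=dict)


def compute_metrics(scores: list[SampleScore]) -> dict[str, int | float]:
    # Both field names below are assumed for this illustration.
    pairs = [
        (bool(s.values["predicted_positive"]), bool(s.values["actual_positive"]))
        for s in scores
    ]
    tp = sum(1 for pred, actual in pairs if pred and actual)
    fp = sum(1 for pred, actual in pairs if pred and not actual)
    fn = sum(1 for pred, actual in pairs if not pred and actual)

    # Guard every division so that empty or one-sided inputs yield 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

Returning 0.0 for undefined ratios is one possible convention; a different fallback may suit a given report better.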

Configuration

Properties

type Literal "python" required

The type of the metric.

compute_metrics_snippet string, TemplateValue required

The Python snippet that defines a compute_metrics(scores: list[SampleScore]) -> dict[str, int | float] function, which computes the metric values. Both def and async def are supported.

key string

Unique identifier assigned to the entity in AI GO!.

Default: None