Evaluations

An evaluation is a set of configured tasks (i.e. task specifications) that are run on a model or a dataset.

Usage

  1. Define the evaluation in a YAML file named evaluation.yaml (see example). This example assumes that a task with key "my-task" already exists and that it has a single parameter, "config_key", which must be configured.
evaluation:
  key: "my-evaluation"
  display_name: "My Evaluation"
  task_specifications:
    - key: "my-configured-task"
      task_key: "my-task"
      task_config:
        config_key: "config_value"
        ...
      model_key: "my-model-key"
      display_name: "Evaluation of my task"
    ...
  1. Run the evaluation.
$ lf run -f evaluation.yaml
On AI app 'app'.
[Evaluation(ID="1")] Created successfully
[Evaluation(ID="1")] Started successfully.
  1. Check the status of the evaluation.
$ lf evaluation overview 1
Evaluation ID: 1
Status: pending
Created at: 02/12/2025 16:02:40

             Tasks results for evaluation with ID '1' (1 rows)
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Task Result ID ┃ Task Name ┃ Task Execution Status ┃ Task Execution Progress ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ 1              │ My Task   │ pending               │ 0.0% (0/10)             │
└────────────────┴───────────┴───────────────────────┴─────────────────────────┘
  1. Once the evaluation has finished, you can download its evidence.
$ lf evaluation download --results-dir ./results 1
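
When scripting around these steps, it can help to read the status programmatically instead of by eye. The following is a minimal sketch, not part of the lf tooling: it assumes the overview output keeps the layout shown in step 3 (a "Status: ..." line and progress cells such as "0.0% (0/10)"). The names parse_overview and SAMPLE are hypothetical and exist only for illustration.

```python
import re

# Assumption: the status appears on its own line as "Status: <value>",
# exactly as in the sample output of `lf evaluation overview`.
_STATUS_RE = re.compile(r"^Status:\s*(\S+)\s*$", re.MULTILINE)

# Assumption: each task row contains a progress cell like "0.0% (0/10)".
_PROGRESS_RE = re.compile(r"\((\d+)/(\d+)\)")


def parse_overview(text: str) -> dict:
    """Extract the evaluation status and per-task (done, total) counts."""
    status_match = _STATUS_RE.search(text)
    if status_match is None:
        raise ValueError("no 'Status:' line found in overview output")
    progress = [
        (int(done), int(total))
        for done, total in _PROGRESS_RE.findall(text)
    ]
    return {"status": status_match.group(1), "progress": progress}


# Hypothetical sample, copied from the overview output shown in step 3.
SAMPLE = """\
Evaluation ID: 1
Status: pending
Created at: 02/12/2025 16:02:40
│ 1              │ My Task   │ pending               │ 0.0% (0/10)             │
"""

if __name__ == "__main__":
    print(parse_overview(SAMPLE))
```

In practice the text could come from running the overview command through Python's subprocess.run with capture_output=True and text=True. Because the parser depends on a human-readable layout that may change between CLI versions, treat it as a convenience rather than a stable interface.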

Tip: You can control caching via the cache_policy field of the evaluation config (see Cache Policy).