Evaluations
An evaluation is a set of configured tasks (i.e. task specifications) that are run on a model or a dataset.
Usage
- Define the evaluation in a YAML file evaluation.yaml (see example). For the sake of this example, we assume that we have previously created a task with key "my-task" that has a single parameter "config_key" that needs to be configured.

    evaluation:
      key: "my-evaluation"
      display_name: "My Evaluation"
      task_specifications:
        - key: "my-configured-task"
          task_key: "my-task"
          task_config:
            config_key: "config_value"
            ...
          model_key: "my-model-key"
          display_name: "Evaluation of my task"
        ...

- Run the evaluation.
    $ lf run -f run.yaml
    On AI app 'app'.
    [Evaluation(ID="1")] Created successfully
    [Evaluation(ID="1")] Started successfully.

- Check the status of the evaluation.
    $ lf evaluation overview 1
    Evaluation ID: 1
    Status: pending
    Created at: 02/12/2025 16:02:40
    Tasks results for evaluation with ID '1' (1 rows)
    ┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
    ┃ Task Result ID ┃ Task Name ┃ Task Execution Status ┃ Task Execution Progress ┃
    ┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
    │ 1              │ My Task   │ pending               │ 0.0% (0/10)             │
    └────────────────┴───────────┴───────────────────────┴─────────────────────────┘

- Once the evaluation has finished, you can download the evidence of the evaluation.
    $ lf evaluation download --results-dir ./results 1

Tip: you can control caching via the cache_policy field of the evaluation config (see Cache Policy).
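When scripting around the steps above, it can help to read the evaluation status and per-task progress out of the text printed by `lf evaluation overview`. The following is a minimal sketch only: the names `TaskProgress` and `parse_overview` are hypothetical helpers (not part of the `lf` CLI), and the parsing assumes the output looks like the sample shown in the status step. If the CLI's output layout differs or changes, the pattern would need adjusting.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TaskProgress:
    """One data row of the task results table (hypothetical helper type)."""
    task_result_id: str
    task_name: str
    status: str
    done: int
    total: int


# Assumed row layout, taken from the sample output in this page:
# │ 1 │ My Task │ pending │ 0.0% (0/10) │
ROW_RE = re.compile(
    r"^│\s*(?P<id>[^│]+?)\s*│\s*(?P<name>[^│]+?)\s*│\s*(?P<status>[^│]+?)\s*"
    r"│\s*[\d.]+%\s*\((?P<done>\d+)/(?P<total>\d+)\)\s*│\s*$"
)


def parse_overview(text: str) -> Tuple[Optional[str], List[TaskProgress]]:
    """Extract the overall status and the per-task progress rows."""
    status: Optional[str] = None
    tasks: List[TaskProgress] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("Status:"):
            status = line.split(":", 1)[1].strip()
            continue
        match = ROW_RE.match(line)
        if match:
            tasks.append(
                TaskProgress(
                    task_result_id=match["id"],
                    task_name=match["name"],
                    status=match["status"],
                    done=int(match["done"]),
                    total=int(match["total"]),
                )
            )
    return status, tasks


# Sample text copied from the overview output shown above.
SAMPLE = """\
Evaluation ID: 1
Status: pending
Created at: 02/12/2025 16:02:40
┃ Task Result ID ┃ Task Name ┃ Task Execution Status ┃ Task Execution Progress ┃
│ 1              │ My Task   │ pending               │ 0.0% (0/10)             │
"""

status, tasks = parse_overview(SAMPLE)
print(status, [(t.task_name, t.done, t.total) for t in tasks])
```

In a script, the captured standard output of `lf evaluation overview 1` would take the place of `SAMPLE`. Parsing a human-readable table is brittle by nature, so treat this as a convenience for quick checks rather than a stable interface.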
