Task Result Logs

A Task Result Log captures the full output of a task execution, including metrics, per-sample evidence, solver traces, and any errors that occurred during the run.

To download a task result log using the CLI, use the lf export eval command.

Task Result Logs Overview

Properties


format_version Literal "v1" required

Format Version


app_version string required

The version of AI GO that computed this task result log.


status string required

Status


evidence TaskResultEvidence required


specification TaskResultSpecification required


execution TaskExecution required


errors array[TaskResultError] required

Errors

Definitions

TaskResultEvidence

Properties


metrics array[MetricData] required

The metrics. Should only be None if an error occurred.


samples array[SampleEvidence]

The sample evidence (as produced by tasks).

Default: None

tabular_evidence object

A mapping between keys and Dataframes. During validation, non-DataFrame data is accepted as long as it can be validated into a pandas DataFrame.

Default: None

tabular_evidence_metadata object

A mapping betweens keys and metadata for a Dataframe.

Default: None

free_form_evidence object

Non-table-structured evidence can be recorded in this field

Default: None

errors array[TaskResultError]

A list of task-level errors.

Default: []

failures TaskResultFailures

Default: None

SampleEvidence

Properties


sample_id string, integer required

Sample Id


sample SampleData required


solver SolverData required


scores array[ScoresData] required

Scores


action_records null

Action Records

Default: None

errors array[TaskResultError] required

Errors

SolverData

Properties


output null required

Output

SampleData

The raw sample data for a single evaluated sample.

Properties


data object required

The sample's field values.

ScoresData

The score values produced by a single scorer for a single sample.

Properties


scorer_key string required

The key identifying the scorer.


scorer_purpose enum ScorerPurpose

Default: score

Possible ScorerPurpose values

Allowed Values:

  • score
  • qa

scorer_name string

Optional display name of the scorer.

Default: None

values object required

The score values produced by the scorer.


metadata object

Optional metadata associated with the scorer output.

Default: None

direct_ios array[null]

Raw model endpoint I/O for each prediction call (if scorer uses a model).

Default: []

MetricData

An object that contains the metric scores.

Properties


values object required

Values


metric_key string required

The key of the metric.


metric_type string

The type of the metric. Present for newly computed results and may be missing for legacy results.

Default: None

scorer_name string

The display name of the scorer to which the metric belongs. Present only for benchmark tasks.

Default: None

scorer_key string

The key of the scorer to which the metric belongs. Present only for benchmark tasks.

Default: None

scorer_purpose enum ScorerPurpose

The purpose of the scorer. Present only for benchmark tasks.

Default: score

Possible ScorerPurpose values

Allowed Values:

  • score
  • qa

reason string

A freeform explanation for the metric value. Present only for system tasks.

Default: None

TaskResultError

Properties


error_type string required

The type of the error.


message string required

The specific error message that occurred during evaluation.


hint string

The suggestion to try out to fix the issue.

Default: None

stage enum TaskResultErrorStage

Default: None

Possible TaskResultErrorStage values

Allowed Values:

  • configuration
  • dataset
  • solver
  • score
  • metric
  • action

TaskResultFailures

Properties


num_errors integer required

Num Errors


num_total integer required

Num Total

TaskResultSpecification

The task specification stored inside a TaskResultLog, capturing what was evaluated and how.

Properties


display_name string required

The display name of the evaluation.


task StoredTask required


config object required

Task configuration used for this evaluation.


evaluated_entity StoredDataset, StoredModel

The dataset or model that was evaluated. Present only for benchmark tasks.

Default: None

run_config null required

TaskExecution

Timing and resource usage information for a task execution.

Properties


runtime number required

The runtime of the task in seconds.


started_at integer required

A Unix timestamp in seconds.


ended_at integer required

A Unix timestamp in seconds.


model_usage null

Default: None

Trace

Represents a conversation trace between a user and an agent.

A trace stores a sequence of items in the Open Responses format, including user messages, assistant messages, function calls, and function call outputs. It provides helper methods to extract individual turns, find function calls, and inspect the conversation.

The preamble property exposes everything before the first user message (system messages, assistant greetings, initial function calls, etc.). Everything from the first user message onward is accessible via turns.

For multi-agent traces, an optional events field provides a richer execution record with span markers encoding agent hierarchy. Use Trace.from_events() to construct event-based traces; items is derived automatically.

Properties


FORMAT string

Format

Default: open_responses

items array[null] required

Items


metadata TraceMetadata

Default: None

events null

Events

Default: None

span_id string

Span Id

Default: None

span_name string

Span Name

Default: None

span_type string

Span Type

Default: None

TraceMetadata

Trace-level metadata capturing identity, provenance, and summary information.

All fields are optional. Only the trace data itself (items/events) is required. Metadata enriches the trace for filtering, grouping, and analysis.

Properties


trace_id string

Trace Id

Default: None

source_type string

Source Type

Default: None

source_uri string

Source Uri

Default: None

agent string

Agent

Default: None

model string

Model

Default: None

tags array[string]

Tags

Default: None

created_at string

Created At

Default: None

total_time number

Total Time

Default: None

total_tokens integer

Total Tokens

Default: None

message_count integer

Message Count

Default: None

error string

Error

Default: None

extra object

Extra

Default: None

TableMetadata

Properties


columns_metadata object

Mapping column names to their corresponding metadata.