Task Result Logs
A Task Result Log captures the full output of a task execution, including metrics, per-sample evidence, solver traces, and any errors that occurred during the run.
To download a task result log using the CLI, use the lf export eval command.
Task Result Logs Overview
Properties
format_version Literal "v1" required
Format Version
app_version string required
The version of AI GO that computed this task result log.
status string required
Status
evidence TaskResultEvidence required
specification TaskResultSpecification required
execution TaskExecution required
errors array[TaskResultError] required
Errors
Definitions
TaskResultEvidence
TaskResultEvidenceProperties
metrics array[MetricData] required
The metrics. Should only be None if an error occurred.
samples array[SampleEvidence]
The sample evidence (as produced by tasks).
Default:None
tabular_evidence object
A mapping between keys and Dataframes. During validation, non-DataFrame data is accepted as long as it can be validated into a pandas DataFrame.
Default:None
tabular_evidence_metadata object
A mapping betweens keys and metadata for a Dataframe.
Default:None
free_form_evidence object
Non-table-structured evidence can be recorded in this field
Default:None
errors array[TaskResultError]
A list of task-level errors.
Default:[]
failures TaskResultFailures
None
SampleEvidence
SampleEvidenceProperties
sample_id string, integer required
Sample Id
sample SampleData required
solver SolverData required
scores array[ScoresData] required
Scores
action_records null
Action Records
Default:None
errors array[TaskResultError] required
Errors
SolverData
SolverDataProperties
output null required
Output
SampleData
SampleDataThe raw sample data for a single evaluated sample.
Properties
data object required
The sample's field values.
ScoresData
ScoresDataThe score values produced by a single scorer for a single sample.
Properties
scorer_key string required
The key identifying the scorer.
scorer_purpose enum ScorerPurpose
score
Possible ScorerPurpose values
Allowed Values:
scoreqa
scorer_name string
Optional display name of the scorer.
Default:None
values object required
The score values produced by the scorer.
metadata object
Optional metadata associated with the scorer output.
Default:None
direct_ios array[null]
Raw model endpoint I/O for each prediction call (if scorer uses a model).
Default:[]
MetricData
MetricDataAn object that contains the metric scores.
Properties
values object required
Values
metric_key string required
The key of the metric.
metric_type string
The type of the metric. Present for newly computed results and may be missing for legacy results.
Default:None
scorer_name string
The display name of the scorer to which the metric belongs. Present only for benchmark tasks.
Default:None
scorer_key string
The key of the scorer to which the metric belongs. Present only for benchmark tasks.
Default:None
scorer_purpose enum ScorerPurpose
The purpose of the scorer. Present only for benchmark tasks.
Default:score
Possible ScorerPurpose values
Allowed Values:
scoreqa
reason string
A freeform explanation for the metric value. Present only for system tasks.
Default:None
TaskResultError
TaskResultErrorProperties
error_type string required
The type of the error.
message string required
The specific error message that occurred during evaluation.
hint string
The suggestion to try out to fix the issue.
Default:None
stage enum TaskResultErrorStage
None
Possible TaskResultErrorStage values
Allowed Values:
configurationdatasetsolverscoremetricaction
TaskResultFailures
TaskResultFailuresProperties
num_errors integer required
Num Errors
num_total integer required
Num Total
TaskResultSpecification
TaskResultSpecificationThe task specification stored inside a TaskResultLog, capturing what was evaluated and how.
Properties
display_name string required
The display name of the evaluation.
task StoredTask required
config object required
Task configuration used for this evaluation.
evaluated_entity StoredDataset, StoredModel
The dataset or model that was evaluated. Present only for benchmark tasks.
Default:None
run_config null required
TaskExecution
TaskExecutionTiming and resource usage information for a task execution.
Properties
runtime number required
The runtime of the task in seconds.
started_at integer required
A Unix timestamp in seconds.
ended_at integer required
A Unix timestamp in seconds.
model_usage null
None
Trace
TraceRepresents a conversation trace between a user and an agent.
A trace stores a sequence of items in the Open Responses format, including user messages, assistant messages, function calls, and function call outputs. It provides helper methods to extract individual turns, find function calls, and inspect the conversation.
The preamble property exposes everything before the first user
message (system messages, assistant greetings, initial function calls,
etc.). Everything from the first user message onward is accessible via
turns.
For multi-agent traces, an optional events field provides a richer
execution record with span markers encoding agent hierarchy. Use
Trace.from_events() to construct event-based traces; items is
derived automatically.
Properties
FORMAT string
Format
Default:open_responses
items array[null] required
Items
metadata TraceMetadata
None
events null
Events
Default:None
span_id string
Span Id
Default:None
span_name string
Span Name
Default:None
span_type string
Span Type
Default:None
TraceMetadata
TraceMetadataTrace-level metadata capturing identity, provenance, and summary information.
All fields are optional. Only the trace data itself (items/events) is required. Metadata enriches the trace for filtering, grouping, and analysis.
Properties
trace_id string
Trace Id
Default:None
source_type string
Source Type
Default:None
source_uri string
Source Uri
Default:None
agent string
Agent
Default:None
model string
Model
Default:None
tags array[string]
Tags
Default:None
created_at string
Created At
Default:None
total_time number
Total Time
Default:None
total_tokens integer
Total Tokens
Default:None
message_count integer
Message Count
Default:None
error string
Error
Default:None
extra object
Extra
Default:None
TableMetadata
TableMetadataProperties
columns_metadata object
Mapping column names to their corresponding metadata.
