Core Types
This page documents the public data types exported from latticeflow.core.dtypes. These are the shared types used across the AI GO! SDK — messages, traces, model inputs and outputs, solver outputs, scoring structures, and the enums and type aliases that tie them together. Every type below can be imported directly from the package:
from latticeflow.core.dtypes import Trace, SampleScore, SolverTrace, ModelResponseEach section lists a type's fields and, where applicable, its public properties and methods. Enums list their allowed values, and the Type Aliases section maps each alias to its members.
Models
ActionRecord
ActionRecordA record of an action taken for a sample.
Properties
action ActionRuleAction required
rule_key string required
The key of the action rule that created the action record.
Pattern: ^[a-zA-Z0-9_\-]+$
Max Length: 250
AssistantMessage
AssistantMessageA message with role assistant.
Properties
type Literal "message"
The type of the message. Always set to message.
Default: message
id string required
The unique ID of the message.
status MessageStatus required
role Literal "assistant"
Role
Default: assistant
content array[InputTextContent, OutputTextContent, TextContent, SummaryTextContent, ReasoningTextContent, RefusalContent, InputImageContent, InputFileContent, InputVideoContent] required
The content of the message
BaseTraceEvent
BaseTraceEventBase class for all trace events, providing common metadata fields.
Properties
id string
Id
span_id string
Span Id
Default: None
timestamp string
Timestamp
Default: None
metadata object
Metadata
Default: None
ChatCompletionInput
ChatCompletionInputProperties
messages array[ChatCompletionInputMessage, ChatCompletionOutputMessage] required
Messages
response_format ChatCompletionResponseFormatJSONSchema, ChatCompletionResponseFormatText
Response Format
Default: None
ChatCompletionInputMessage
ChatCompletionInputMessageProperties
role string required
Role
content string, array[FileContentItem] required
Content
ChatCompletionJSONSchema
ChatCompletionJSONSchemaProperties
name string required
The name of the response format. Must be a-z, A-Z, 0-9, or contain underscores and dashes, with a maximum length of 64.
description string
A description of what the response format is for, used by the model to determine how to respond in the format.
Default: None
schema object
The schema for the response format, described as a JSON Schema object. Learn how to build JSON schemas here.
Default: None
strict boolean required
Whether to enable strict schema adherence when generating the output. If set to true, the model will always follow the exact schema defined in the schema field. Only a subset of JSON Schema is supported when strict is true. To learn more, read the Structured Outputs guide.
ChatCompletionJudgeInput
ChatCompletionJudgeInputProperties
sample object required
Sample
solver_output SingleSolverOutput, GroupedSolverOutput, SolverTrace, GroupedSolverTrace required
Solver Output
messages array[ChatCompletionInputMessage, Any, array[string], ChatCompletionOutputMessage, array[array[number]]] required
Messages
model_output OpenResponsesModelOutput, RAGCompletionOutput, ChatCompletionModelOutput, EmbeddingsModelOutput, Any required
Model Output
input_prompt string required
Input Prompt
ChatCompletionModelOutput
ChatCompletionModelOutputProperties
choices array[ChatCompletionModelOutputChoice] required
Choices
usage ModelUsage
Default: None
ChatCompletionModelOutputChoice
ChatCompletionModelOutputChoiceProperties
message ChatCompletionOutputMessage required
ChatCompletionOutputMessage
ChatCompletionOutputMessageProperties
role string required
Role
content string required
Content
refusal string
Refusal
Default: None
ChatCompletionResponseFormatJSONSchema
ChatCompletionResponseFormatJSONSchemaProperties
type Literal "json_schema" required
The type of response format being defined. Always json_schema.
json_schema ChatCompletionJSONSchema required
Structured Outputs configuration options, including a JSON Schema.
ChatCompletionResponseFormatText
ChatCompletionResponseFormatTextProperties
type Literal "text" required
The type of response format being defined. Always text.
CompactionEvent
CompactionEventRecords a compaction boundary where the conversation context was shortened.
After this event the next ModelCallEvent.input_context will reflect the
compacted context rather than the full history.
Properties
id string
Id
span_id string
Span Id
Default: None
timestamp string
Timestamp
Default: None
metadata object
Metadata
Default: None
type Literal "compaction"
Type
Default: compaction
strategy string required
Strategy
tokens_before integer
Tokens Before
Default: None
tokens_after integer
Tokens After
Default: None
CustomEvent
CustomEventA fallback event type for arbitrary structured data.
Serves as an escape hatch for event types not yet modelled (e.g.
SandboxEvent, ApprovalEvent from inspect-ai), custom solver
instrumentation, and forward compatibility with new external event types.
Properties
id string
Id
span_id string
Span Id
Default: None
timestamp string
Timestamp
Default: None
metadata object
Metadata
Default: None
type Literal "custom"
Type
Default: custom
name string required
Name
data object required
Data
CustomTaskInputMessage
CustomTaskInputMessageA custom task input item carrying an opaque user-defined payload.
Properties
type Literal "custom_task_input_message"
Type
Default: custom_task_input_message
content Any required
The opaque user-defined payload.
CustomTaskOutputMessage
CustomTaskOutputMessageA custom task output item carrying an opaque user-defined payload.
Properties
type Literal "custom_task_output_message"
Type
Default: custom_task_output_message
content Any required
The opaque user-defined payload.
DatasetProgressState
DatasetProgressStateProperties
num_total_samples integer required
Num Total Samples
num_samples_generated integer required
Num Samples Generated
DirectModelIO
DirectModelIORaw model endpoint I/O before/after adapter conversion.
Properties
direct_input ModelEndpointInput required
The raw request sent to the model endpoint.
direct_output ModelEndpointOutput required
The raw response received from the model endpoint.
EmbeddingsModelOutput
EmbeddingsModelOutputProperties
embeddings array[array[number]] required
Embeddings
usage ModelUsage
Default: None
ErrorEvent
ErrorEventRecords an error that occurred during execution.
Properties
id string
Id
span_id string
Span Id
Default: None
timestamp string
Timestamp
Default: None
metadata object
Metadata
Default: None
type Literal "error"
Type
Default: error
message string required
Message
traceback string
Traceback
Default: None
FieldMetadata
FieldMetadataMetadata for a single field in a tabular evidence table.
Properties
display_name string
Display name for the field.
Default: None
description string
Description of the field semantics.
Default: None
primary boolean
Whether the field is directly relevant to the understanding of the main correctness result.
Default: True
FileContentItem
FileContentItemA file content item within an input message.
Properties
type Literal "file" required
Type
file FileRef required
FileRef
FileRefA reference to a file by its identifier.
Properties
file_id string required
The identifier of the file.
FunctionCall
FunctionCallA function tool call that was generated by the model.
Properties
type Literal "function_call"
The type of the item. Always function_call.
Default: function_call
id string required
The unique ID of the function call item.
call_id string required
The unique ID of the function tool call that was generated.
name string required
The name of the function that was called.
arguments string required
The arguments JSON string that was generated.
status FunctionCallStatus required
FunctionCallEvent
FunctionCallEventRecords a tool call lifecycle as a single event.
Absorbs what was previously two items (FunctionCall + FunctionCallOutput)
into one event with execution metadata.
Properties
id string
Id
span_id string
Span Id
Default: None
timestamp string
Timestamp
Default: None
metadata object
Metadata
Default: None
type Literal "function_call_event"
Type
Default: function_call_event
call_id string required
Call Id
function string required
Function
arguments string required
Arguments
result string, array[InputTextContent, InputImageContent, InputFileContent] required
Result
status FunctionCallStatus required
working_time number
Working Time
Default: None
error string
Error
Default: None
agent string
Agent
Default: None
agent_span_id string
Agent Span Id
Default: None
model_call_id string
Model Call Id
Default: None
FunctionCallOutput
FunctionCallOutputA function tool call output that was returned by the tool.
Properties
type Literal "function_call_output"
The type of the function tool call output. Always function_call_output.
Default: function_call_output
id string required
The unique ID of the function tool call output. Populated when this item is returned via API.
call_id string required
The unique ID of the function tool call generated by the model.
output string, array[InputTextContent, InputImageContent, InputFileContent] required
Output
status FunctionCallOutputStatusEnum required
GroupedSolverOutput
GroupedSolverOutputProperties
solver_outputs array[SingleSolverOutput], object required
Solver Outputs
GroupedSolverTrace
GroupedSolverTraceGrouped solver output using Open Responses types.
Produced when GroupedSingleTurnSolver runs with message_format="open_responses".
Each element of solver_outputs corresponds to one sub-call made during solving.
Properties
solver_outputs array[SolverTrace], object required
Solver Outputs
InputFileContent
InputFileContentA file input to the model.
Properties
type Literal "input_file"
The type of the input item. Always input_file.
Default: input_file
filename string
The name of the file to be sent to the model.
Default: None
file_url string
The URL of the file to be sent to the model.
Default: None
InputImageContent
InputImageContentAn image input to the model. Learn about image inputs.
Properties
type Literal "input_image"
The type of the input item. Always input_image.
Default: input_image
image_url string
Image Url
Default: None
detail ImageDetail required
InputTextContent
InputTextContentA text input to the model.
Properties
type Literal "input_text"
The type of the input item. Always input_text.
Default: input_text
text string required
The text input to the model.
JudgeInput
JudgeInputProperties
sample object required
Sample
LFBaseModel
LFBaseModelA BaseModel which excludes unset fields by default when serialising.
No properties defined.
LogProb
LogProbThe log probability of a token.
Properties
token string required
Token
logprob number required
Logprob
bytes array[integer] required
Bytes
top_logprobs array[TopLogProb] required
Top Logprobs
Message
MessageA message to or from the model.
Properties
type Literal "message"
The type of the message. Always set to message.
Default: message
id string required
The unique ID of the message.
status MessageStatus required
role MessageRole required
content array[InputTextContent, OutputTextContent, TextContent, SummaryTextContent, ReasoningTextContent, RefusalContent, InputImageContent, InputFileContent, InputVideoContent] required
The content of the message
MessageEvent
MessageEventRecords that a conversation item was added to the trace.
This is the incremental conversation record — each message (user, assistant,
system) gets its own event.
Properties
id string
Id
span_id string
Span Id
Default: None
timestamp string
Timestamp
Default: None
metadata object
Metadata
Default: None
type Literal "message_event"
Type
Default: message_event
item Message, CustomTaskInputMessage, CustomTaskOutputMessage required
Item
model_call_id string
Model Call Id
Default: None
MetricData
MetricDataAn object that contains the metric scores.
Properties
values object required
Values
metric_key string required
The key of the metric.
metric_type string
The type of the metric. Present for newly computed results and may be missing for legacy results.
Default: None
scorer_name string
The display name of the scorer to which the metric belongs. Present only for benchmark tasks.
Default: None
scorer_key string
The key of the scorer to which the metric belongs. Present only for benchmark tasks.
Default: None
scorer_purpose ScorerPurpose
The purpose of the scorer. Present only for benchmark tasks.
Default: score
reason string
A freeform explanation for the metric value. Present only for system tasks.
Default: None
ModelCallEvent
ModelCallEventRecords a model API call with the full input context, output, and metadata.
input_context captures the actual context window at each model call.
Properties
id string
Id
span_id string
Span Id
Default: None
timestamp string
Timestamp
Default: None
metadata object
Metadata
Default: None
type Literal "model_call_event"
Type
Default: model_call_event
model string required
Model
input_context array[Message, FunctionCall, FunctionCallOutput, CustomTaskInputMessage, CustomTaskOutputMessage] required
Input Context
output_items array[Message, FunctionCall, FunctionCallOutput, CustomTaskInputMessage, CustomTaskOutputMessage] required
Output Items
usage ModelUsage
Default: None
tools array[string]
Tools
Default: None
total_time number
Total Time
Default: None
error string
Error
Default: None
ModelEndpointInput
ModelEndpointInputThe raw request body sent to the model endpoint after adapter conversion.
Properties
body string required
The raw request body as a JSON string.
ModelEndpointOutput
ModelEndpointOutputThe raw response received from the model endpoint before adapter conversion.
Properties
body string required
The raw response body as a JSON string.
status_code integer required
The HTTP status code of the response.
headers object
The HTTP response headers.
Default: None
ModelResponse
ModelResponseA useful container for a model response that contains the raw model output
as well as the trace items derived from that model output.
Properties
raw_output OpenResponsesModelOutput, RAGCompletionOutput, ChatCompletionModelOutput, EmbeddingsModelOutput, Any required
Raw Output
items array[Message, FunctionCall, FunctionCallOutput, CustomTaskInputMessage, CustomTaskOutputMessage] required
Items
Computed Properties
text str
Text content of the last assistant message.
Raises:
ValueError: If no assistant message is present in items.
ModelUsage
ModelUsageToken usage statistics for a model inference call.
Properties
num_completion_tokens integer
The number of completion tokens used.
Default: None
num_prompt_tokens integer
The number of prompt tokens used.
Default: None
OpenResponsesModelOutput
OpenResponsesModelOutputModel output in Open Responses format.
Used when the model returns a response object containing
a sequence of OpenResponse output items (assistant messages, function calls,
function call outputs, etc.) from a single model.predict() call.
Properties
items array[Message, FunctionCall, FunctionCallOutput, CustomTaskInputMessage, CustomTaskOutputMessage] required
Items
usage ModelUsage
Default: None
OutputTextContent
OutputTextContentA text output from the model.
Properties
type Literal "output_text"
The type of the output text. Always output_text.
Default: output_text
text string required
The text output from the model.
annotations array[UrlCitationBody, TextCitationBody] required
The annotations of the text output.
logprobs array[LogProb]
Logprobs
Default: None
PolicyRuleMetricsInput
PolicyRuleMetricsInputProperties
evaluation_key string required
Evaluation Key
evaluation_id string required
Evaluation Id
task_result_id string required
Task Result Id
task_specification_key string required
Task Specification Key
task_specification_display_name string required
Task Specification Display Name
scorer_key string required
Scorer Key
metric_key string required
Metric Key
values object required
Values
RAGCompletionModelOutputChoice
RAGCompletionModelOutputChoiceProperties
message ChatCompletionOutputMessage required
references array[RAGReference] required
References
RAGCompletionOutput
RAGCompletionOutputProperties
choices array[RAGCompletionModelOutputChoice] required
Choices
usage ModelUsage
Default: None
RAGReference
RAGReferenceA reference retrieved from the knowledge base during RAG inference.
Properties
content string required
The text content of the reference.
ReasoningTextContent
ReasoningTextContentReasoning text from the model.
Properties
type Literal "reasoning_text"
The type of the reasoning text. Always reasoning_text.
Default: reasoning_text
text string required
The reasoning text from the model.
RefusalContent
RefusalContentA refusal from the model.
Properties
type Literal "refusal"
The type of the refusal. Always refusal.
Default: refusal
refusal string required
The refusal explanation from the model.
RunEvidence
RunEvidenceProperties
index integer required
Index
metrics array[MetricData] required
The metrics. If an error occurred, the metrics will be None.
samples array[SampleEvidence]
The sample evidence (as produced by the given run of a task).
Default: None
errors array[TaskResultError]
A list of task-level errors.
Default: []
failures TaskResultFailures
Default: None
SampleData
SampleDataThe raw sample data for a single evaluated sample.
Properties
data object required
The sample's field values.
SampleEvidence
SampleEvidenceProperties
sample_id string, integer required
Sample Id
sample SampleData required
Sample data. Only present for legacy evidence or when computing repeatability, otherwise present in the trials.
solver SolverData required
Solver data. Only present for legacy evidence, otherwise present in the trials.
scores array[ScoresData] required
Scores data. Only present for legacy evidence, or when score aggregation occurred (such as when using multiple trials with score aggregation or when assessing repeatability).
action_records array[ActionRecord]
Action Records
Default: None
errors array[TaskResultError] required
Errors
trials array[SampleTrialEvidence]
Trials
Methods
build classmethod
build(model_input: LFModelInput | None, model_output: LFModelOutput | None, score_values: LFBaseModel | ScoreValues, score_metadata: LFBaseModel | dict[str, Any] | None = None, sample_id: int | str | None = None, sample_data: dict[str, Any] | None = None, scorer_key: str | None = None, solver_model_direct_input: ModelEndpointInput | None = None, solver_model_direct_output: ModelEndpointOutput | None = None, scorer_model_direct_input: ModelEndpointInput | None = None, scorer_model_direct_output: ModelEndpointOutput | None = None, message_format: TraceFormat = 'open_responses') -> SampleEvidencebuild_with_1_trial classmethod
build_with_1_trial(*, sample_id: str | int, sample: SampleData | None, solver: SolverData | None, scores: list[ScoresData] | None, errors: list[TaskResultError]) -> SampleEvidenceSampleScore
SampleScoreProperties
values object required
Values
metadata object
Metadata
Default: None
direct_ios array[DirectModelIO]
Direct Ios
Default: []
SampleTrialEvidence
SampleTrialEvidenceProperties
index integer required
Index
sample_id string, integer required
Sample Id
sample SampleData required
solver SolverData required
scores array[ScoresData] required
Scores
errors array[TaskResultError] required
Errors
ScoresData
ScoresDataThe score values produced by a single scorer for a single sample.
Properties
scorer_key string required
The key identifying the scorer.
scorer_purpose ScorerPurpose
Default: score
scorer_name string
Optional display name of the scorer.
Default: None
values object required
The score values produced by the scorer.
metadata object
Optional metadata associated with the scorer output.
Default: None
direct_ios array[DirectModelIO]
Raw model endpoint I/O for each prediction call (if scorer uses a model).
Default: []
Secret
SecretAn object representing a secret.
Properties
name string required
The name of the secret.
Pattern: ^[A-Za-z0-9_-]+$
SingleSolverJudgeInput
SingleSolverJudgeInputProperties
sample object required
Sample
solver_output SingleSolverOutput, GroupedSolverOutput, SolverTrace, GroupedSolverTrace required
Solver Output
messages array[ChatCompletionInputMessage, Any, array[string], ChatCompletionOutputMessage, array[array[number]]] required
Messages
model_output OpenResponsesModelOutput, RAGCompletionOutput, ChatCompletionModelOutput, EmbeddingsModelOutput, Any required
Model Output
SingleSolverOutput
SingleSolverOutputProperties
messages array[ChatCompletionInputMessage, Any, array[string], ChatCompletionOutputMessage, array[array[number]]] required
Messages
output OpenResponsesModelOutput, RAGCompletionOutput, ChatCompletionModelOutput, EmbeddingsModelOutput, Any required
Output
direct_ios array[DirectModelIO]
Direct Ios
Default: []
Methods
from_input_and_output classmethod
from_input_and_output(model_input: LFModelInput, model_output: LFModelOutput, direct_model_input: ModelEndpointInput | None = None, direct_model_output: ModelEndpointOutput | None = None) -> SingleSolverOutputSolverData
SolverDataProperties
output SingleSolverOutput, GroupedSolverOutput, SolverTrace, GroupedSolverTrace required
Output
SolverJudgeInput
SolverJudgeInputProperties
sample object required
Sample
solver_output SingleSolverOutput, GroupedSolverOutput, SolverTrace, GroupedSolverTrace required
Solver Output
SolverTrace
SolverTraceSolver output using Open Responses types with structured trace.
Produced when the solver's message_format is "open_responses".
Wraps a full :class:Trace and preserves the raw model outputs for
each model.predict() call made during solving.
Properties
trace Trace required
raw_outputs array[OpenResponsesModelOutput, RAGCompletionOutput, ChatCompletionModelOutput, EmbeddingsModelOutput, Any] required
Raw Outputs
direct_ios array[DirectModelIO]
Direct Ios
Default: []
Computed Properties
items list[TraceItem]
Shorthand for self.trace.items.
messages list[LFMessage]
Legacy SingleSolverOutput-style view of the trace as LFMessages.
output LFModelOutput | None
Legacy SingleSolverOutput-style view of the last raw output.
Methods
add_model_response method
add_model_response(response: ModelResponse) -> NoneRecord a model response: extend the trace items and append the raw output.
append method
append(item: TraceItem) -> NoneAppend a single item to the trace.
append_custom_task_input_message method
append_custom_task_input_message(content: Any) -> NoneAppend a custom task input item with an opaque payload.
append_system_message method
append_system_message(content: str) -> NoneAppend a system message with plain text content.
append_user_message method
append_user_message(content: str) -> NoneAppend a user message with plain text content.
extend method
extend(items: list[TraceItem]) -> NoneExtend the trace with a list of items.
SolverTraceJudgeInput
SolverTraceJudgeInputProperties
sample object required
Sample
trace Trace required
model_outputs array[OpenResponsesModelOutput, RAGCompletionOutput, ChatCompletionModelOutput, EmbeddingsModelOutput, Any] required
Model Outputs
solver_output SingleSolverOutput, GroupedSolverOutput, SolverTrace, GroupedSolverTrace
Solver Output
Default: None
messages array[ChatCompletionInputMessage, Any, array[string], ChatCompletionOutputMessage, array[array[number]]]
Messages
Default: None
model_output OpenResponsesModelOutput, RAGCompletionOutput, ChatCompletionModelOutput, EmbeddingsModelOutput, Any
Model Output
Default: None
input_prompt string
Input Prompt
Default: None
SpanBeginEvent
SpanBeginEventMarks the beginning of a named execution span.
Spans define hierarchical boundaries for agents, tools, and other execution
phases.
Properties
id string
Id
span_id string required
Span Id
timestamp string
Timestamp
Default: None
metadata object
Metadata
Default: None
type Literal "span_begin"
Type
Default: span_begin
parent_span_id string
Parent Span Id
Default: None
name string required
Name
span_type string
Span Type
Default: None
SpanEndEvent
SpanEndEventMarks the end of a named execution span.
Properties
id string
Id
span_id string required
Span Id
timestamp string
Timestamp
Default: None
metadata object
Metadata
Default: None
type Literal "span_end"
Type
Default: span_end
SummaryTextContent
SummaryTextContentA summary text from the model.
Properties
type Literal "summary_text"
The type of the object. Always summary_text.
Default: summary_text
text string required
A summary of the reasoning output from the model so far.
SystemMessage
SystemMessageA message with role system.
Properties
type Literal "message"
The type of the message. Always set to message.
Default: message
id string required
The unique ID of the message.
status MessageStatus required
role Literal "system"
Role
Default: system
content array[InputTextContent, OutputTextContent, TextContent, SummaryTextContent, ReasoningTextContent, RefusalContent, InputImageContent, InputFileContent, InputVideoContent] required
The content of the message
SystemTaskMetricEntry
SystemTaskMetricEntryA single metric entry produced by a system task's compute_evidence function.
Properties
value number, integer required
The numeric value of the metric.
reason string
A freeform explanation for the metric value. Mapped to the reason field in MetricData.
Default: None
SystemTaskOutput
SystemTaskOutputThe expected return value of the compute_evidence function in a system task snippet.
Properties
metrics object required
A mapping from metric key to the metric entry.
TLSContext
TLSContextDefines the TLS context.
Properties
validation_context CertificateValidationContext
Settings for validating server certificates.
Default: None
TaskExecution
TaskExecutionTiming and resource usage information for a task execution.
Properties
runtime number required
The runtime of the task in seconds.
started_at integer required
A Unix timestamp in seconds.
ended_at integer required
A Unix timestamp in seconds.
model_usage ModelUsageStats
Default: None
TaskProgressState
TaskProgressStateProperties
num_total_samples integer required
Num Total Samples
num_processed_samples integer required
Num Processed Samples
num_samples_with_errors integer required
Num Samples With Errors
TaskResultError
TaskResultErrorProperties
error_type string required
The type of the error.
message string required
The specific error message that occurred during evaluation.
hint string
The suggestion to try out to fix the issue.
Default: None
stage TaskResultErrorStage
Default: None
TaskResultEvidence
TaskResultEvidenceProperties
metrics array[MetricData] required
The metrics. If an error occurred, the metrics will be None.
samples array[SampleEvidence]
The sample evidence (as produced by tasks).
Default: None
runs array[RunEvidence]
Per-run evidence for repeatability task results. None when repeatability is not assessed.
Default: None
errors array[TaskResultError]
A list of task-level errors.
Default: []
failures TaskResultFailures
Default: None
Methods
adapt_metrics_if_needed classmethod
adapt_metrics_if_needed(value: Any) -> Anybuild_flat_metrics_dict method
build_flat_metrics_dict() -> MetricValuesTaskResultFailures
TaskResultFailuresProperties
num_errors integer required
Num Errors
num_total integer required
Num Total
TaskResultLog
TaskResultLogProperties
format_version Literal "v1" required
Format Version
app_version string required
The version of AI GO that computed this task result log.
status string required
Status
evidence TaskResultEvidence required
specification TaskResultSpecification required
execution TaskExecution required
errors array[TaskResultError] required
Errors
TaskResultSpecification
TaskResultSpecificationThe task specification stored inside a TaskResultLog, capturing what was evaluated and how.
Properties
display_name string required
The display name of the evaluation.
task StoredTask required
config object required
Task configuration used for this evaluation.
evaluated_entity StoredDataset, StoredModel
The dataset or model that was evaluated. Present only for benchmark tasks.
Default: None
run_config EvaluationConfig required
repeatability_config RepeatabilityConfig
Default: None
TextCitationBody
TextCitationBodyA citation referencing a plain-text source (e.g. a retrieved knowledge-base chunk).
Properties
type Literal "text_citation"
The type of the text citation. Always text_citation.
Default: text_citation
content string required
The text content of the cited source.
TopLogProb
TopLogProbThe top log probability of a token.
Properties
token string required
Token
logprob number required
Logprob
bytes array[integer] required
Bytes
Trace
TraceRepresents a conversation trace between a user and an agent.
A trace stores a sequence of items in the Open Responses format,
including user messages, assistant messages, function calls, and
function call outputs. It provides helper methods to extract
individual turns, find function calls, and inspect the conversation.
The preamble property exposes everything before the first user
message (system messages, assistant greetings, initial function calls,
etc.). Everything from the first user message onward is accessible via
turns.
For multi-agent traces, an optional events field provides a richer
execution record with span markers encoding agent hierarchy. Use
Trace.from_events() to construct event-based traces; items is
derived automatically.
Properties
FORMAT string
Format
Default: open_responses
items array[Message, FunctionCall, FunctionCallOutput, CustomTaskInputMessage, CustomTaskOutputMessage] required
Items
metadata TraceMetadata
Default: None
events array[MessageEvent, FunctionCallEvent, ModelCallEvent, SpanBeginEvent, SpanEndEvent, CompactionEvent, ErrorEvent, CustomEvent]
Events
Default: None
span_id string
Span Id
Default: None
span_name string
Span Name
Default: None
span_type string
Span Type
Default: None
Computed Properties
assistant_messages list[AssistantMessage]
Return all assistant messages across the trace (excluding preamble).
conversation_items list[TraceItem]
Return items from the first user message onward (excludes preamble).
function_calls list[FunctionCall]
Return all function calls across the trace (excluding preamble).
function_outputs list[FunctionCallOutput]
Return all function call outputs across the trace (excluding preamble).
preamble list[TraceItem]
Return all items before the first user message.
system_messages list[SystemMessage]
Return all system messages in the preamble in order.
turns list[Turn]
Extract individual conversation turns.
user_messages list[UserMessage]
Return all user messages across the entire trace.
Methods
from_events classmethod
from_events(events: list[TraceEvent], *, span_id: str | None = None, **kwargs: Any) -> TraceConstruct a Trace from an event stream.
span_id identifies which span the Trace represents. Defaults to
None for a root Trace; pass a span id when constructing a Trace for
a specific span.
The item-derivation strategy is picked from the event stream itself:
- If any
MessageEventis present, items come from direct-span
MessageEventandFunctionCallEventvalues (native LF traces). - Otherwise, items are reconstructed from direct-span
ModelCallEvent.
This path handles event streams imported from inspect-ai-style event streams
that don't produceMessageEvents.
from_items classmethod
from_items(items: list[TraceItem], **kwargs: Any) -> TraceConstruct a Trace from conversation items only (no events).
Use this for simple or legacy single-agent traces where no execution
metadata is needed.
get_first_system_prompt method
get_first_system_prompt() -> str | NoneReturn the text of the first system message, if any.
get_function_call_arguments method
get_function_call_arguments(call: FunctionCall) -> dictParse and return the JSON arguments of a function call.
get_function_call_pairs method
get_function_call_pairs() -> list[tuple[FunctionCall, FunctionCallOutput | None]]Return all (function_call, function_output) pairs matched by call_id.
get_function_calls_by_name method
get_function_calls_by_name(name: str) -> list[FunctionCall]Return all function calls with the given function name.
get_function_output_for_call method
get_function_output_for_call(call_id: str) -> FunctionCallOutput | NoneReturn the function output matching a given call_id, if any.
get_function_output_text method
get_function_output_text(output: FunctionCallOutput) -> strExtract the text content from a function call output.
get_last_assistant_text method
get_last_assistant_text() -> str | NoneReturn the text content of the last assistant message, if any.
get_last_user_text method
get_last_user_text() -> str | NoneReturn the text of the last user message, if any.
spans method
spans() -> list[Trace]Return immediate child spans as Trace objects, or an empty list if there
are no child spins.
Each child span is a Trace with its own items, events, and span metadata.
Spans are returned in chronological order (matching the event stream order).
Call .spans() on a child recursively to get sub-sub-agent spans.
TraceMetadata
TraceMetadataTrace-level metadata capturing identity, provenance, and summary information.
All fields are optional. Only the trace data itself (items/events) is required.
Metadata enriches the trace for filtering, grouping, and analysis.
Properties
trace_id string
Trace Id
Default: None
source_type string
Source Type
Default: None
source_uri string
Source Uri
Default: None
agent string
Agent
Default: None
model string
Model
Default: None
tags array[string]
Tags
Default: None
created_at string
Created At
Default: None
total_time number
Total Time
Default: None
total_tokens integer
Total Tokens
Default: None
message_count integer
Message Count
Default: None
error string
Error
Default: None
extra object
Extra
Default: None
Turn
TurnA single conversational turn initiated by a user message.
A turn starts with a user message and includes all subsequent items
until the next user message (or end of trace). This typically includes:
- The user message itself
- Zero or more assistant actions (function calls, function outputs,
assistant messages) that form the response to the user message.
Properties
user_message UserMessage required
assistant_items array[Message, FunctionCall, FunctionCallOutput, CustomTaskInputMessage, CustomTaskOutputMessage] required
Assistant Items
Computed Properties
assistant_messages list[AssistantMessage]
Return all assistant messages in this turn in order.
function_call_pairs list[tuple[FunctionCall, FunctionCallOutput | None]]
Return pairs of (function_call, function_output) matched by call_id.
function_calls list[FunctionCall]
Return all function calls in this turn in order.
function_outputs list[FunctionCallOutput]
Return all function call outputs in this turn in order.
UrlCitationBody
UrlCitationBodyA citation for a web resource used to generate a model response.
Properties
type Literal "url_citation"
The type of the URL citation. Always url_citation.
Default: url_citation
url string required
The URL of the web resource.
start_index integer required
The index of the first character of the URL citation in the message.
end_index integer required
The index of the last character of the URL citation in the message.
title string required
The title of the web resource.
UserMessage
UserMessageA message with role user.
Properties
type Literal "message"
The type of the message. Always set to message.
Default: message
id string required
The unique ID of the message.
status MessageStatus required
role Literal "user"
Role
Default: user
content array[InputTextContent, OutputTextContent, TextContent, SummaryTextContent, ReasoningTextContent, RefusalContent, InputImageContent, InputFileContent, InputVideoContent] required
The content of the message
Enums
FunctionCallOutputStatusEnum
FunctionCallOutputStatusEnumSimilar to FunctionCallStatus. All three options are allowed here for compatibility, but because in practice these items will be provided by developers, only completed should be used.
Allowed Values:
in_progresscompletedincomplete
FunctionCallStatus
FunctionCallStatusAllowed Values:
in_progresscompletedincomplete
MessageRole
MessageRoleAllowed Values:
userassistantsystemdeveloper
MessageStatus
MessageStatusAllowed Values:
in_progresscompletedincomplete
TaskResultDataStatus
TaskResultDataStatusThe execution status of a task result.
Allowed Values:
pendingcancelledsuccessfailed
ChatCompletionResponseFormat = ChatCompletionResponseFormatJSONSchema | ChatCompletionResponseFormatText
ConversationItem = Message | CustomTaskInputMessage | CustomTaskOutputMessage
DType = Type | Tuple
InputMessageContent = str | List
JSONType = NoneType | int | str | bool | float | List | Mapping
LFInputMessage = ChatCompletionInputMessage | Any | list
LFMessage = ChatCompletionInputMessage | Any | list | ChatCompletionOutputMessage | list
LFModelInput = ChatCompletionInput | list | Any
LFModelOutput = OpenResponsesModelOutput | RAGCompletionOutput | ChatCompletionModelOutput | EmbeddingsModelOutput | Any
LFOutputMessage = ChatCompletionOutputMessage | Any | list
ResultDType = pd.DataFrame | NoneType | int | str | bool | float | List | Mapping | BaseModel
RuleDefinition = ExistsRuleDefinition | ThresholdRuleDefinition
RuleScope = PolicyRuleSimpleScope | PolicyRuleFinegrainedScope
TraceEvent = MessageEvent | FunctionCallEvent | ModelCallEvent | SpanBeginEvent | SpanEndEvent | CompactionEvent | ErrorEvent | CustomEvent
TraceItem = Message | FunctionCall | FunctionCallOutput | CustomTaskInputMessage | CustomTaskOutputMessage
Supporting Types
ActionRule
ActionRuleProperties
key string required
Key: 1-250 chars, allowed: a-z A-Z 0-9 _ -
Pattern: ^[a-zA-Z0-9_\-]+$
Max Length: 250
action ActionRuleAction required
The action to be applied to samples that match the filter.
filter FilterComparison, FilterMembership, FilterUnary required
The filter that determines which samples the action applies to.
ActionRuleAction
ActionRuleActionAllowed Values:
exclude_from_metrics
BenchmarkTaskDefinitionTemplate
BenchmarkTaskDefinitionTemplateProperties
type Literal "benchmark_task" required
The type of task definition.
evaluated_entity_type EvaluatedEntityType required
dataset TaskDatasetTemplate
The dataset used by this task
Default: None
solver TaskSolverTemplate
The solver used by this task
Default: None
scorers array[TaskScorerTemplate] required
The scorers used by this task
trials TrialsDefinitionTemplate
Default: None
actions array[ActionRule]
The actions used by this task
Default: None
BooleanParameterSpec
BooleanParameterSpecProperties
type Literal "boolean" required
The type of the parameter.
key string required
The key of the parameter.
display_name string required
The display name of the parameter.
description string
The description of the parameter.
Default: None
default_value boolean
The default value to use.
Default: None
nullable boolean
Whether this parameter is nullable.
Default: False
CachePolicy
CachePolicyThe caching policy to use for the task results in the evaluation. Supported values:
- reuse - Use a cached task result if one is available (the default). Partial task
results are also reused automatically - if a task is the same as another, completed
task for all of its configuration except the scorers configuration, then only the
scores, metrics and errors and failures related to them will be recomputed. This saves
queries to the model during the solver part of the evaluation. - update - Do not use cached task results, but cache the results of the execution.
- no-cache - Do not use cached task results and do not cache the results of the execution.
Allowed Values:
reuseupdateno-cache
CategoricalParameterSpec
CategoricalParameterSpecProperties
type Literal "categorical" required
The type of parameter.
key string required
The key of the parameter.
display_name string required
The display name of the parameter.
description string
The description of the parameter.
Default: None
allowed_values array[string] required
Allowed Values
multiple boolean
Whether the parameter can have multiple values.
Default: False
default_value string
The default value to use.
Default: None
nullable boolean
Whether this parameter is nullable.
Default: False
values_mapping object
A mapping over the categorical values.
Default: None
CertificateValidationContext
CertificateValidationContextDefines how server certificates should be validated.
Properties
trusted_ca string, Secret
base64 representation of PEM-encoded certificate(s).
Provide a raw base64 string or reference a secret.
For example: cat cert.pem \| base64 -w 0
Default: None
trust_chain_verification TrustChainVerification
Settings for verifying the trust chain of the server certificate.
Default: None
ConfigurationDatasetGenerationError
ConfigurationDatasetGenerationErrorProperties
stage Literal "configuration" required
Stage
error_type string required
The type of the error.
message string required
The specific error message that occurred during generation.
CustomInferenceModelConfig
CustomInferenceModelConfigClient configuration for a model, that is provided manually by the user.
Properties
adapter_id string required
The ID of the model adapter to be used with this model.
connection_type Literal "custom_inference" required
The type of connection config.
run_inference_snippet string required
The code snippet to make a call to the model.
environment object required
Environment variables required to run the model client snippet. Values may reference secrets.
timeout number required
Timeout in seconds for the total runtime of the Python snippet.
DataSourceDatasetGenerationError
DataSourceDatasetGenerationErrorProperties
stage Literal "data_source" required
Stage
error_type string required
The type of the error.
message string required
The specific error message that occurred during generation.
iteration integer required
The iteration number of the data source generation that caused the error.
DatasetColumnParameterSpec
DatasetColumnParameterSpecProperties
type Literal "dataset_column" required
The type of the parameter.
key string required
The key of the parameter.
display_name string required
The display name of the parameter.
description string
The description of the parameter.
Default: None
default_value string
The default value to use.
Default: None
nullable boolean
Whether this parameter is nullable.
Default: False
DatasetGenerationDebugOptions
DatasetGenerationDebugOptionsProperties
enabled boolean required
When true, the response will include a full pipeline trace for each source sample, which contains the source sample itself and the input and output at each synthesizer stage.
include_io boolean required
When true, the model input and output are included in the trace for each synthesizer call that produced I/O. Has no effect when enabled is false.
DatasetGenerationMetadata
DatasetGenerationMetadataDataset generation metadata.
Properties
dataset_generator_id string
Dataset Generator Id
Default: None
execution_status ExecutionStatus required
dataset_generation_id string required
The dataset generation ID.
dataset_generation_request DatasetGenerationRequest required
The dataset generation request.
progress ExecutionProgress
Default: None
result_status ResultStatus
Default: None
errors array[ConfigurationDatasetGenerationError, SynthesizerDatasetGenerationError, DataSourceDatasetGenerationError]
List of errors that occurred during dataset generation.
Default: None
DatasetGenerationRequest
DatasetGenerationRequestProperties
dataset_generator_config object required
The configuration used by the dataset generator.
num_samples integer required
The number of samples to generate. At least 1 sample must be requested.
debug DatasetGenerationDebugOptions
Default: None
DatasetMetadata
DatasetMetadataDataset metadata.
Properties
num_rows integer required
Num Rows
columns array[string] required
Columns
download_url string required
URL to download the dataset in JSONL format.
data_version string required
Data Version
DatasetParameterSpec
DatasetParameterSpecProperties
type Literal "dataset" required
The type of the parameter.
key string required
The key of the parameter.
display_name string required
The display name of the parameter.
description string
The description of the parameter.
Default: None
default_value string
The default value to use.
Default: None
nullable boolean
Whether this parameter is nullable.
Default: False
DictParameterSpec
DictParameterSpecProperties
type Literal "dict" required
The type of the parameter.
value_dtype ScalarDtype required
The data type of the values in the dict.
key string required
The key of the parameter.
display_name string required
The display name of the parameter.
description string
The description of the parameter.
Default: None
default_value object
The default value to use.
Default: None
nullable boolean
Whether this parameter is nullable.
Default: False
EvaluatedEntityType
EvaluatedEntityTypeAllowed Values:
datasetmodel
EvaluationConfig
EvaluationConfigParameters required when starting an evaluation.
Properties
num_samples integer
The number of samples to evaluate. If not specified, all samples will be evaluated.
Default: None
subsampling Subsampling
Default: None
cache_policy CachePolicy
The caching policy to use for the task results in the evaluation. Supported values:
- reuse - Use a cached task result if one is available (the default). Partial task
results are also reused automatically - if a task is the same as another, completed
task for all of its configuration except the scorers configuration, then only the
scores, metrics and errors and failures related to them will be recomputed. This saves
queries to the model during the solver part of the evaluation. - update - Do not use cached task results, but cache the results of the execution.
- no-cache - Do not use cached task results and do not cache the results of the execution.
Default: reuse
trials_config TrialsConfig
Default: None
ExecutionProgress
ExecutionProgressProperties
progress number required
A progress indicator for the task result.
num_total_samples integer
The total number of samples to be processed for this task result.
Default: None
num_processed_samples integer
The number of samples already processed for this task result.
Default: None
num_samples_with_errors integer
The number of samples for which an error occurred for this task result.
Default: None
ExecutionStatus
ExecutionStatusAllowed Values:
not_startedpendingcancelledfinished
FilterComparison
FilterComparisonProperties
op FilterComparisonOp required
expression string required
An expression encoding what to compare against the value.
Depending on the context, it can refer to different variables:
- When filtering a dataset: it can refer to the
sampleand use dot or bracket
notation to access the columns.
If filtering a dataset with column names that are illegal under jinja
substitution rules (e.g. containing spaces), use bracket notation to access
the column. - When used within a task action: it can refer to the
sample, thesolver_outputor thescores(which is a mapping between scorer
keys and their corresponding score values dict).
value string, number, integer, boolean required
The value against which the expression is compared.
FilterComparisonOp
FilterComparisonOpThe comparison operator to apply.
Allowed Values:
equalsnot_equalsgreater_thanless_thangreater_or_equalless_or_equal
FilterMembership
FilterMembershipProperties
op FilterMembershipOp required
expression string required
An expression encoding what to check membership against the values.
Depending on the context, it can refer to different variables:
- When filtering a dataset: it can refer to column values by name (ex:
{{ category }}). - When used within a task action: it can refer to the
sample, thesolver_outputor thescores(which is a mapping between scorer
keys and their corresponding score values dict).
values array[string, number, boolean] required
The set of values to test membership against.
FilterMembershipOp
FilterMembershipOpThe membership operator to apply.
Allowed Values:
innot_in
FilterUnary
FilterUnaryProperties
op FilterUnaryOp required
expression string required
An expression encoding what to apply the unary operator to.
Depending on the context, it can refer to different variables:
- When filtering a dataset: it can refer to column values by name (ex:
{{ category }}). - When used within a task action: it can refer to the
sample, thesolver_outputor thescores(which is a mapping between scorer
keys and their corresponding score values dict).
FilterUnaryOp
FilterUnaryOpThe unary operator to apply.
Allowed Values:
existsnot_existsis_trueis_false
FloatParameterSpec
FloatParameterSpecProperties
type Literal "float" required
The type of the parameter.
key string required
The key of the parameter.
display_name string required
The display name of the parameter.
description string
The description of the parameter.
Default: None
min number
The minimum value of the parameter.
Default: None
max number
The maximum value of the parameter.
Default: None
default_value number
The default value to use.
Default: None
nullable boolean
Whether this parameter is nullable.
Default: False
ImageDetail
ImageDetailAllowed Values:
lowhighauto
InputVideoContent
InputVideoContentA content block representing a video input to the model.
Properties
type Literal "input_video"
The type of the input content. Always input_video.
Default: input_video
video_url string required
A base64 or remote url that resolves to a video file.
IntParameterSpec
IntParameterSpecProperties
type Literal "int" required
The type of the parameter.
key string required
The key of the parameter.
display_name string required
The display name of the parameter.
description string
The description of the parameter.
Default: None
min integer
The minimum value of the parameter.
Default: None
max integer
The maximum value of the parameter.
Default: None
default_value integer
The default value to use.
Default: None
nullable boolean
Whether this parameter is nullable.
Default: False
IntegrationModelProviderId
IntegrationModelProviderIdThe internal identifiers for all model providers known by the system.
Allowed Values:
anthropicfireworksgeminilatticeflownovitaopenaisambanovatogether
ListParameterSpec
ListParameterSpecProperties
type Literal "list" required
The type of the parameter.
dtype ScalarDtype required
The data type of the elements in the list.
key string required
The key of the parameter.
display_name string required
The display name of the parameter.
description string
The description of the parameter.
Default: None
default_value array[Any]
The default value to use.
Default: None
nullable boolean
Whether this parameter is nullable.
Default: False
MLTask
MLTaskThe type of machine learning task to be performed.
Allowed Values:
chat_completionembeddingscustom
MaxAggregator
MaxAggregatorAggregates numeric scores by taking the maximum value.
Properties
function Literal "max" required
Function
score_name string
The name to give to the aggregated score.
Default: None
MeanAggregator
MeanAggregatorAggregates numeric scores by computing the mean.
Properties
function Literal "mean" required
Function
score_name string
The name to give to the aggregated score.
Default: None
MinAggregator
MinAggregatorAggregates numeric scores by taking the minimum value.
Properties
function Literal "min" required
Function
score_name string
The name to give to the aggregated score.
Default: None
ModelCustomConnectionConfig
ModelCustomConnectionConfigConnection configuration for a model, that is provided manually by the user.
Properties
connection_type Literal "custom_connection" required
The type of connection config.
adapter_id string required
The ID of the model adapter to be used with this model.
url string required
The model endpoint URL.
api_key string, Secret
The key to be passed as the authorization header (Authorization: Bearer API_KEY).
Provide a raw string (deprecated) or reference a secret.
Default: None
model_key string
This field is used in case the model is not specified in the URL but in the body instead. For the "openai" adapter, this will be passed as the "model" parameter. For custom adapters, this value is available as model_info.model_key.
Default: None
tls_context TLSContext
TLS configuration for secure connections to the model endpoint.
Default: None
custom_headers object
Additional headers to include in requests to the model endpoint. Values may reference secrets.
Default: None
ModelParameterSpec
ModelParameterSpecProperties
type Literal "model" required
The type of the parameter.
key string required
The key of the parameter.
display_name string required
The display name of the parameter.
description string
The description of the parameter.
Default: None
default_value string
The default value to use.
Default: None
nullable boolean
Whether this parameter is nullable.
Default: False
ModelProviderConnectionConfig
ModelProviderConnectionConfigConnection configuration for a model, that is retrieved from a well-known provider integrated with the system.
Properties
connection_type Literal "provider_connection" required
The type of connection config.
provider_id ModelProviderId required
The id of the model provider.
model_key string required
A key used to identify the model in the external provider.
ModelProviderId
ModelProviderIdModelUsageStats
ModelUsageStatsAn object that contains the model usage summary for the task result.
Properties
num_samples integer required
Num Samples
num_completion_tokens integer
Num Completion Tokens
Default: None
num_prompt_tokens integer
Num Prompt Tokens
Default: None
PassAtKAggregator
PassAtKAggregatorAggregates binary (True/False) scores using the pass@k estimator. Estimates the probability that at least one of k independent attempts will succeed, computed as 1 - (1 - p)^k where p is the empirical pass rate across trials.
Properties
function Literal "pass@k" required
Function
k integer required
The number of independent attempts in the scenario being modelled.
score_name string
The name to give to the aggregated score.
Default: None
PassPowerKAggregator
PassPowerKAggregatorAggregates binary (True/False) scores using the pass^k estimator. Estimates the probability that an agent would succeed on all k independent attempts, computed as p^k where p is the empirical pass rate across trials.
Properties
function Literal "pass^k" required
Function
k integer required
The number of independent attempts in the scenario being modelled.
score_name string
The name to give to the aggregated score.
Default: None
RepeatabilityConfig
RepeatabilityConfigConfiguration for a repeatability task result.
Properties
num_runs integer required
Number of times to run the task.
ResultStatus
ResultStatusAllowed Values:
succeededfailed
ScalarDtype
ScalarDtypeThe scalar data type.
Allowed Values:
stringintegerfloatboolean
ScoreAggregator
ScoreAggregatorAggregation configuration for one or more score keys.
Properties
score_name string required
The name of the score that will be aggregated.
aggregator MeanAggregator, MinAggregator, MaxAggregator, PassAtKAggregator, PassPowerKAggregator required
Aggregator
ScorerPurpose
ScorerPurposeAllowed Values:
scoreqa
StoredDataset
StoredDatasetProperties
display_name string required
The display name of the dataset.
description string
An optional description of the dataset.
Default: None
long_description string
Long description of the dataset in Markdown format.
Default: None
key string required
Key: 1-250 chars, allowed: a-z A-Z 0-9 _ -
Pattern: ^[a-zA-Z0-9_\-]+$
Max Length: 250
id string required
Id
dataset_metadata DatasetMetadata
Dataset metadata.
Default: None
dataset_generation_metadata DatasetGenerationMetadata
Dataset generation metadata.
Default: None
created_at integer
Unix timestamp (in seconds).
Default: None
updated_at integer
Unix timestamp (in seconds).
Default: None
tags array[StoredTag]
Tags associated with the dataset.
Default: []
StoredModel
StoredModelProperties
id string required
Id
display_name string required
The name of the Model.
key string required
Unique identifier assigned to the entity in AI GO!.
Pattern: ^((together|gemini|openai|fireworks|sambanova|anthropic|novita|latticeflow)\$)?[a-zA-Z0-9_-]+$
Max Length: 250
description string
Description
Default: None
rate_limit integer
The maximum allowed number of requests per minute.
Default: None
max_concurrent_requests integer
The maximum number of concurrent inference requests.
Default: None
task MLTask required
config ModelCustomConnectionConfig, CustomInferenceModelConfig, ModelProviderConnectionConfig required
The configuration for connecting to the model.
adapter_id string required
The ID of the model adapter to be used with this model.
created_at integer
Unix timestamp (in seconds).
Default: None
updated_at integer
Unix timestamp (in seconds).
Default: None
StoredTag
StoredTagProperties
id string required
Id
value string required
The text value of the tag.
color string required
The color (#RRGGBB or #RGB) associated with the tag, used for UI representation.
Pattern: ^#([0-9a-fA-F]{6}|[0-9a-fA-F]{3})$
StoredTask
StoredTaskProperties
id string required
Id
key string required
Key: 1-250 chars, allowed: a-z A-Z 0-9 _ -
Pattern: ^[a-zA-Z0-9_\-]+$
Max Length: 250
display_name string required
The display name of the task.
description string required
The description of the task.
long_description string
Long description of the task in Markdown format.
Default: None
tasks array[MLTask]
The ML tasks for which the task is applicable.
Default: []
config_spec array[FloatParameterSpec, IntParameterSpec, BooleanParameterSpec, StringParameterSpec, ModelParameterSpec, DatasetParameterSpec, DatasetColumnParameterSpec, ListParameterSpec, DictParameterSpec, CategoricalParameterSpec] required
Config Spec
definition BenchmarkTaskDefinitionTemplate, SystemTaskDefinitionTemplate required
Definition
provider TaskProvider required
The provider of the task.
tags array[StoredTag] required
Tags associated with the task.
created_at integer
Unix timestamp (in seconds).
Default: None
updated_at integer
Unix timestamp (in seconds).
Default: None
StringKind
StringKindSpecifies the kind of string parameter.
Allowed Values:
freeformpythonjinja
StringParameterExample
StringParameterExampleProperties
value string required
The example value for the string parameter.
display_name string required
The display name of the example.
StringParameterSpec
StringParameterSpecProperties
type Literal "string" required
The type of the parameter.
key string required
The key of the parameter.
display_name string required
The display name of the parameter.
description string
The description of the parameter.
Default: None
default_value string
The default value of the parameter.
Default: None
nullable boolean
Whether this parameter is nullable.
Default: False
string_kind StringKind
Default: freeform
examples array[StringParameterExample]
Examples for the string parameter.
Default: None
Subsampling
SubsamplingThe subsampling strategy to use when selecting samples for evaluation. Supported values:
- head - Select the first N samples.
- random - Select N random samples. The random seed is fixed for reproducibility.
If not specified, defaults to 'head'.
Allowed Values:
headrandom
SynthesizerDatasetGenerationError
SynthesizerDatasetGenerationErrorProperties
stage Literal "synthesizer" required
Stage
error_type string required
The type of the error.
message string required
The specific error message that occurred during generation.
source_sample object required
The source sample for which an error occurred.
synthesizer_index integer required
The index of the synthesizer that caused the error.
SystemTaskDefinitionTemplate
SystemTaskDefinitionTemplateProperties
type Literal "system_task" required
The type of task definition.
compute_evidence_snippet string required
Python source code defining a def compute_evidence() function (sync or async) that returns metrics and optional metadata.
TaskDatasetTemplate
TaskDatasetTemplateThe dataset that will be used to evaluate the model.
Properties
id string required
Id
TaskMetricTemplate
TaskMetricTemplateProperties
key string
The key of the metric.
Default: None
type string required
The type of metric.
TaskProvider
TaskProviderAllowed Values:
latticeflowuser
TaskResultErrorStage
TaskResultErrorStageAllowed Values:
configurationdatasetsolverscoremetricaction
TaskScorerTemplate
TaskScorerTemplateProperties
key string
The key of the scorer.
Default: None
type string required
The type of the scorer.
display_name string
The display name of the scorer.
Default: None
purpose ScorerPurpose
The purpose of the scorer.
Default: score
metrics array[TaskMetricTemplate]
The metrics associated with this scorer, which will produce per-task metrics.
Default: None
TaskSolverTemplate
TaskSolverTemplateProperties
type string required
The type of the solver.
TextContent
TextContentA text content.
Properties
type Literal "text"
Type
Default: text
text string required
Text
TrialsConfig
TrialsConfigConfiguration for trials in a task result/specification. Only relevant for benchmark tasks.
Properties
num_trials integer required
Number of trials to run per sample.
TrialsDefinitionTemplate
TrialsDefinitionTemplateConfiguration for trials in a benchmark task definition template.
Properties
num_trials integer, string required
Number of trials to run per sample.
score_aggregators array[ScoreAggregator]
Score aggregators that compute an aggregated score given the score values for the different trials. Scores with no matching aggregator default to mean for numeric and boolean values (for other dtypes, no default aggregation is computed).
Default: None
TrustChainVerification
TrustChainVerificationHow to trust the CA trust chain.
verify_trust_chain(default) will verify the server certificate against the configured CA trust.accept_untrustedwill not perform server certificate verification. NOTE: This is a
security hazard and should be avoided.
Allowed Values:
verify_trust_chainaccept_untrusted
