Datasets
A dataset is a collection of data used to train, validate, and/or test a model. In AI GO!, datasets are used as a source of samples for AI model evaluation.
To interact with a dataset using the CLI, use the lf dataset command.
Dataset Overview
Properties
key string required
Unique identifier assigned to the entity in AI GO!.
Pattern: ^[a-zA-Z0-9_\-]+$
Max Length: 250
display_name string required
The dataset's name displayed to the user.
description string
Short description of the dataset.
Default: None
long_description string
Long description of the dataset. Supports Markdown formatting.
Default: None
source LocalDatasetSource, URLDatasetSource, HuggingFaceDatasetSource, LangSmithDatasetSource, PhoenixDatasetSource, ClaudeCodeDatasetSource, InspectAIDatasetSource
Dataset source configuration. Required if dataset generator is not used.
Default: None
generator_specification SDKDatasetGeneratorSpecification
Config for the dataset generator that will be used to generate the dataset. Required if the data file is not provided.
Default: None
file_path string
File containing the dataset's data. Supported formats are CSV and JSONL. Required if dataset generator is not used.
This field is deprecated and will be removed in future versions. Use 'source' with a dataset source configuration instead (e.g., 'source: {type: local, file_path: ...}').
Default: None
tags array[string]
Tags associated with the dataset.
Default: []

display_name: "Airline User Authentication"
key: "airline-user-authentication"
description: "Dataset of valid and invalid passenger email and booking ID pairs."
source:
  type: "local"
  file_path: "./test_cases.csv"

Email Address,Booking Reference,Complete (True / False),Frequent Flyer Number,Seat Number,Departure Date
[email protected],BKGIKAS4,False,LX872246,16B,2025-07-02
[email protected],BKGHBZK0,True,LX329258,3A,2025-07-29
[email protected],BKGMTOQ6,True,LX781453,36B,2025-07-04
[email protected],BKGSUOO0,False,LX543143,1B,2025-08-02
[email protected],BKGHCBK0,True,LX498382,22A,2025-07-04
[email protected],BKGOHCO4,True,LX865179,39C,2025-07-15
[email protected],BKGFBQC8,False,LX969693,6E,2025-08-15
[email protected],BKGIELC8,False,LX910620,3F,2025-07-18
[email protected],BKGRADC6,True,LX575435,7D,2025-08-20
[email protected],BKGPHGO0,True,LX835911,14F,2025-08-04

display_name: "Judge Comparison"
key: "jailbreakbench-judge-comparison"
description: >
  Jailbreakbench is an open-source robustness benchmark for jailbreaking large language
  models (LLMs).
source:
  type: "url"
  url: "https://huggingface.co/datasets/JailbreakBench/JBB-Behaviors/raw/main/data/judge-comparison.csv"

key: "harmbench_illegal"
display_name: "HarmBench: Illegal Activities"
description: >-
  HarmBench is a standardized evaluation framework for automated red teaming methods
  targeting large language models (LLMs).
source:
  type: "huggingface"
  path: "allenai/tulu-3-harmbench-eval"
  split: "test"
  filters:
    - op: "equals"
      expression: "{{SemanticCategory}}"
      value: "illegal"

display_name: "Legal Questions"
key: "legal-questions"
description: "Generated using 'My Question Generator'"
generator_specification:
  dataset_generator_key: "question-generator"
  num_samples: 5
  dataset_generator_config:
    topic: "Legal"

display_name: "My Question Generator"
key: "question-generator"
description: "Generates questions about arbitrary topics."
config_spec:
  - key: "topic"
    type: "string"
    display_name: "Topic"
definition:
  type: "declarative_dataset_generator"
  data_source:
    type: "empty"
  synthesizers:
    - type: "llm"
      model_key: "openai$gpt-4-1-nano"
      system_prompt_template: "You are a helpful dataset generator."
      user_prompt_template: >
        Produce 5 deep knowledge questions about << config.topic >>, including the
        correct answer.
      sample_properties:
        question:
          type: "string"
        answer:
          type: "string"

Definitions
LocalDatasetSource
Dataset from a local file path.
Properties
type Literal "local" required
Local file dataset source.
file_path string required
Path to a local CSV or JSONL file.
...
source:
  type: "local"
  file_path: "./test_cases.csv"

URLDatasetSource
Dataset from a remote URL.
Properties
type Literal "url" required
Remote URL dataset source.
url string required
URL pointing to a CSV or JSONL file. Must end with .csv or .jsonl extension.
filters array[FilterComparison, FilterMembership, FilterUnary]
Optional list of filters to apply after loading (AND-combined).
...
source:
  type: "url"
  url: "https://huggingface.co/datasets/JailbreakBench/JBB-Behaviors/raw/main/data/judge-comparison.csv"
The URL must point directly to a .csv or .jsonl file. Filters are AND-combined and applied after the file is downloaded.
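The extension requirement can be checked before a configuration is submitted. The following is a minimal sketch, not the product's own validation: the helper name `has_supported_extension` is hypothetical, and it assumes that only the path component of the URL matters and that the comparison is case-insensitive.

```python
from urllib.parse import urlparse

# The two formats named in the text above.
ALLOWED_EXTENSIONS = (".csv", ".jsonl")


def has_supported_extension(url: str) -> bool:
    """Return True if the URL path ends with .csv or .jsonl."""
    # Assumption: query strings and fragments are ignored, and the
    # extension is compared case-insensitively.
    path = urlparse(url).path.lower()
    return path.endswith(ALLOWED_EXTENSIONS)


print(has_supported_extension(
    "https://huggingface.co/datasets/JailbreakBench/JBB-Behaviors/raw/main/data/judge-comparison.csv"
))
```

If the actual check is stricter (for example, rejecting URLs with a query string), the helper would need to compare against the full URL instead of the parsed path.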
HuggingFaceDatasetSource
Dataset from the HuggingFace dataset hub.
Properties
type Literal "huggingface" required
HuggingFace dataset source.
path string required
HuggingFace dataset path (e.g., 'squad', 'glue').
split string required
Dataset split to load (e.g., 'train', 'test', 'validation').
load_dataset_kwargs object
Additional keyword arguments to pass to datasets.load_dataset().
filters array[FilterComparison, FilterMembership, FilterUnary]
Optional list of filters to apply after loading (AND-combined).
...
source:
  type: "huggingface"
  path: "allenai/tulu-3-harmbench-eval"
  split: "test"
  filters:
    - op: "equals"
      expression: "{{SemanticCategory}}"
      value: "illegal"
The huggingface source requires the datasets library (uv pip install datasets). Use load_dataset_kwargs to pass any additional arguments accepted by datasets.load_dataset().
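To make the role of load_dataset_kwargs concrete, here is a rough sketch of how such a source could be turned into a datasets.load_dataset() call. It is an illustration under assumptions, not the tool's implementation: `load_from_source` is a hypothetical helper, and the loader is passed in as an argument so the sketch runs without the datasets library or network access.

```python
from typing import Any, Callable


def load_from_source(source: dict[str, Any], loader: Callable[..., Any]) -> Any:
    """Call `loader` with path, split, and any extra keyword arguments."""
    # Hypothetical helper: in practice `loader` would be datasets.load_dataset.
    extra = source.get("load_dataset_kwargs") or {}
    return loader(source["path"], split=source["split"], **extra)


source = {
    "type": "huggingface",
    "path": "allenai/tulu-3-harmbench-eval",
    "split": "test",
    # `revision` is one of the keyword arguments datasets.load_dataset accepts.
    "load_dataset_kwargs": {"revision": "main"},
}

# With the real library installed:
#   from datasets import load_dataset
#   dataset = load_from_source(source, load_dataset)
```

Filters are not shown here; per the property description they are applied after loading.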
ClaudeCodeDatasetSource
Dataset source from Claude Code local session files.
Properties
type Literal "claude_code" required
Claude Code trace source.
path string
Path to Claude Code session file or project directory. Defaults to ~/.claude/projects/.
Default: None
session_id string
Specific session ID to import.
Default: None
from_time string
Start time filter for sessions.
Default: None
to_time string
End time filter for sessions.
Default: None
limit integer
Maximum number of sessions to import.
Default: None
LangSmithDatasetSource
Dataset source from LangSmith observability traces.
Properties
type Literal "langsmith" required
LangSmith trace source.
project string
LangSmith project name to import traces from.
Default: None
dataset string
LangSmith dataset name to import examples from.
Default: None
api_key string
LangSmith API key. Falls back to LANGSMITH_API_KEY env var.
Default: None
api_url string
LangSmith API URL. Falls back to LANGSMITH_ENDPOINT env var.
Default: None
from_time string
Start time filter for traces.
Default: None
to_time string
End time filter for traces.
Default: None
tags array[string]
Filter traces by tags.
Default: None
filter string
LangSmith filter string for traces.
Default: None
limit integer
Maximum number of traces to import.
Default: None
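The fallback behavior described for api_key and api_url can be pictured as follows. This is only a sketch of the stated rule (an explicit value first, then the environment variable); `resolve_setting` is a hypothetical helper, not part of the product.

```python
import os
from typing import Mapping, Optional


def resolve_setting(
    explicit: Optional[str],
    env_var: str,
    environ: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Prefer the value from the source config, else the environment variable."""
    if explicit is not None:
        return explicit
    return environ.get(env_var)


# Example with a source that leaves both settings unset.
source = {"type": "langsmith", "project": "my-project"}
api_key = resolve_setting(source.get("api_key"), "LANGSMITH_API_KEY")
api_url = resolve_setting(source.get("api_url"), "LANGSMITH_ENDPOINT")
```

Relying on the environment variables keeps credentials out of the dataset configuration file.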
PhoenixDatasetSource
Dataset source from Arize Phoenix observability traces.
Properties
type Literal "phoenix" required
Phoenix trace source.
project string required
Phoenix project name.
from_time string
Start time filter for traces.
Default: None
to_time string
End time filter for traces.
Default: None
trace_id string, array[string]
Specific trace ID(s) to import.
Default: None
session_id string, array[string]
Filter traces by session ID(s).
Default: None
tags array[string]
Filter traces by tags.
Default: None
metadata object
Filter traces by metadata key-value pairs on the root span (all must match).
Default: None
limit integer
Maximum number of traces to import.
Default: None
api_key string
Phoenix API key. Falls back to PHOENIX_API_KEY env var.
Default: None
base_url string
Phoenix base URL. Falls back to PHOENIX_COLLECTOR_ENDPOINT env var.
Default: None
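The "all must match" rule for the metadata filter can be illustrated with a small predicate. This is a sketch under assumptions: `metadata_matches` is a hypothetical name, and values are compared with plain equality, which the description above does not specify.

```python
from typing import Any, Mapping

_MISSING = object()


def metadata_matches(
    root_span_metadata: Mapping[str, Any],
    wanted: Mapping[str, Any],
) -> bool:
    """Return True only if every wanted key is present with an equal value."""
    # Assumption: values are compared with ==. Extra keys on the span are ignored.
    return all(
        root_span_metadata.get(key, _MISSING) == value
        for key, value in wanted.items()
    )


span_metadata = {"env": "prod", "team": "search", "version": 3}
print(metadata_matches(span_metadata, {"env": "prod", "team": "search"}))
print(metadata_matches(span_metadata, {"env": "prod", "team": "ads"}))
```

Under this reading, a trace is kept only when the first call's situation holds: every requested pair is found on the root span.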
InspectAIDatasetSource
Dataset source from Inspect AI eval log files.
Properties
type Literal "inspect_ai" required
Inspect AI eval log source.
path string required
Path to a directory containing .eval log files, a single .eval file, or a transcript database directory.
limit integer
Maximum number of transcripts to import.
Default: None
SDKDatasetGeneratorSpecification
Properties
dataset_generator_config object required
The configuration used by the dataset generator.
num_samples integer required
The number of samples to generate. At least 1 sample must be requested.
debug DatasetGenerationDebugOptions
None
dataset_generator_key string required
Key of the dataset generator to use.
...
generator_specification:
  dataset_generator_key: "question-generator"
  num_samples: 5
  dataset_generator_config:
    topic: "Legal"
The dataset_generator_config allows instantiating the same dataset generator template in different ways. Always refer to the dataset generator definition for the available configuration options.
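The required fields and the minimum of one sample can be checked locally before a specification is used. The sketch below is illustrative only: `check_generator_specification` is a hypothetical helper, and it does not validate dataset_generator_config against the generator's own config_spec.

```python
from typing import Any

REQUIRED_FIELDS = ("dataset_generator_key", "num_samples", "dataset_generator_config")


def check_generator_specification(spec: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a generator_specification mapping."""
    problems = [
        f"missing required field: {field}"
        for field in REQUIRED_FIELDS
        if field not in spec
    ]
    if "num_samples" in spec:
        num_samples = spec["num_samples"]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(num_samples, bool) or not isinstance(num_samples, int):
            problems.append("num_samples must be an integer")
        elif num_samples < 1:
            problems.append("num_samples must be at least 1")
    return problems


spec = {
    "dataset_generator_key": "question-generator",
    "num_samples": 5,
    "dataset_generator_config": {"topic": "Legal"},
}
print(check_generator_specification(spec))
```

An empty list means none of the checks in this sketch failed; it does not guarantee that the generator accepts the configuration.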
FilterComparison
Properties
op enum FilterComparisonOp required
The comparison operator to apply.
Allowed Values: equals, not_equals, greater_than, less_than, greater_or_equal, less_or_equal
expression string required
An expression encoding what to compare against the value.
Depending on the context, it can refer to different variables:
- When filtering a dataset: it can refer to the sample and use dot or bracket notation to access the columns. If filtering a dataset with column names that are illegal under Jinja substitution rules (e.g. containing spaces), use bracket notation to access the column.
- When used within a task action: it can refer to the sample, the solver_output or the scores (which is a mapping between scorer keys and their corresponding score values dict).
value string, number, integer, boolean required
The value against which the expression is compared.
...
source:
  ...
  filters:
    - op: "equals"
      expression: "{{SemanticCategory}}"
      value: "illegal"

FilterMembership
Properties
op enum FilterMembershipOp required
The membership operator to apply.
Allowed Values: in, not_in
expression string required
An expression encoding what to check for membership in the values.
Depending on the context, it can refer to different variables:
- When filtering a dataset: it can refer to column values by name (e.g. {{ category }}).
- When used within a task action: it can refer to the sample, the solver_output or the scores (which is a mapping between scorer keys and their corresponding score values dict).
values array[string, number, boolean] required
The set of values to test membership against.
...
source:
  ...
  filters:
    - op: "not_in"
      expression: "{{ SemanticCategory }}"
      values:
        - "copyright"
        - "chemical_biological"

FilterUnary
Properties
op enum FilterUnaryOp required
The unary operator to apply.
Allowed Values: exists, not_exists, is_true, is_false
expression string required
An expression encoding what to apply the unary operator to.
Depending on the context, it can refer to different variables:
- When filtering a dataset: it can refer to column values by name (e.g. {{ category }}).
- When used within a task action: it can refer to the sample, the solver_output or the scores (which is a mapping between scorer keys and their corresponding score values dict).
...
source:
  ...
  filters:
    - op: "is_true"
      expression: "{{ answer }}"
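To show how the three filter types could combine, the following sketch evaluates a list of filters against in-memory samples. It is a simplified stand-in, not the product's filter engine. The assumptions are: expressions are limited to a bare column name inside `{{ ... }}` (no dot or bracket notation, no Jinja rendering); `exists` means the column is present; `is_true` and `is_false` test for the boolean values True and False; and any other operator fails on a missing column. All function names are hypothetical.

```python
import operator
import re
from typing import Any, Callable

_MISSING = object()

# Operator names are taken from the Allowed Values of FilterComparison.
COMPARISONS: dict[str, Callable[[Any, Any], bool]] = {
    "equals": operator.eq,
    "not_equals": operator.ne,
    "greater_than": operator.gt,
    "less_than": operator.lt,
    "greater_or_equal": operator.ge,
    "less_or_equal": operator.le,
}


def resolve(expression: str, sample: dict[str, Any]) -> Any:
    """Look up the column named inside `{{ ... }}` (bare names only)."""
    match = re.fullmatch(r"\{\{\s*(\w+)\s*\}\}", expression.strip())
    if match is None:
        raise ValueError(f"unsupported expression: {expression!r}")
    return sample.get(match.group(1), _MISSING)


def passes(flt: dict[str, Any], sample: dict[str, Any]) -> bool:
    """Evaluate one comparison, membership, or unary filter on a sample."""
    op = flt["op"]
    value = resolve(flt["expression"], sample)
    if op == "exists":
        return value is not _MISSING
    if op == "not_exists":
        return value is _MISSING
    if value is _MISSING:
        return False  # assumption: other operators fail on a missing column
    if op == "is_true":
        return value is True
    if op == "is_false":
        return value is False
    if op == "in":
        return value in flt["values"]
    if op == "not_in":
        return value not in flt["values"]
    return COMPARISONS[op](value, flt["value"])


def apply_filters(samples: list[dict[str, Any]], filters: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep the samples that satisfy every filter (AND-combined)."""
    return [s for s in samples if all(passes(f, s) for f in filters)]


samples = [
    {"SemanticCategory": "illegal", "answer": True},
    {"SemanticCategory": "copyright", "answer": True},
    {"SemanticCategory": "illegal", "answer": False},
]
filters = [
    {"op": "equals", "expression": "{{SemanticCategory}}", "value": "illegal"},
    {"op": "is_true", "expression": "{{ answer }}"},
]
kept = apply_filters(samples, filters)
```

With these inputs, only the first sample satisfies both filters, which is what AND-combination means in practice.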