Datasets

A dataset is a collection of data used to train, validate, and/or test a model. In AI GO!, datasets are used as a source of samples for AI model evaluation.

To interact with a dataset using the CLI, use the lf dataset command.

Dataset Overview

Properties


key string required

Unique identifier assigned to the entity in AI GO!.

Pattern: ^[a-zA-Z0-9_\-]+$
Max Length: 250

display_name string required

The dataset's name displayed to the user.


description string

Short description of the dataset.

Default: None

long_description string

Long description of the dataset. Supports Markdown formatting.

Default: None

source LocalDatasetSource, URLDatasetSource, HuggingFaceDatasetSource, LangSmithDatasetSource, PhoenixDatasetSource, ClaudeCodeDatasetSource, InspectAIDatasetSource

Dataset source configuration. Required if dataset generator is not used.

Default: None

generator_specification SDKDatasetGeneratorSpecification

Configuration for the dataset generator used to generate the dataset. Required if no dataset source (or deprecated file_path) is provided.

Default: None

file_path string

File containing the dataset's data. Supported formats are CSV and JSONL. Required if dataset generator is not used.

This field is deprecated and will be removed in future versions. Use 'source' with a dataset source configuration instead (e.g., 'source: {type: local, file_path: ...}').

Default: None
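
The deprecation note above can be read as a small before/after migration. The sketch below reuses the ./test_cases.csv path from the examples on this page and assumes that every other dataset field stays unchanged when migrating; the two YAML documents are separated by --- only for side-by-side comparison.

```yaml
# Before (deprecated): data file referenced through the top-level file_path field.
display_name: "Airline User Authentication"
key: "airline-user-authentication"
file_path: "./test_cases.csv"
---
# After: the same file referenced through a local dataset source.
display_name: "Airline User Authentication"
key: "airline-user-authentication"
source:
  type: "local"
  file_path: "./test_cases.csv"
```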

tags array[string]

Tags associated with the dataset.

Default: []

Example: dataset loaded from a local file

display_name: "Airline User Authentication"
key: "airline-user-authentication"
description: "Dataset of valid and invalid passenger email and booking ID pairs."
source:
  type: "local"
  file_path: "./test_cases.csv"

Sample data for the local file example (CSV):

Email Address,Booking Reference,Complete (True / False),Frequent Flyer Number,Seat Number,Departure Date
[email protected],BKGIKAS4,False,LX872246,16B,2025-07-02
[email protected],BKGHBZK0,True,LX329258,3A,2025-07-29
[email protected],BKGMTOQ6,True,LX781453,36B,2025-07-04
[email protected],BKGSUOO0,False,LX543143,1B,2025-08-02
[email protected],BKGHCBK0,True,LX498382,22A,2025-07-04
[email protected],BKGOHCO4,True,LX865179,39C,2025-07-15
[email protected],BKGFBQC8,False,LX969693,6E,2025-08-15
[email protected],BKGIELC8,False,LX910620,3F,2025-07-18
[email protected],BKGRADC6,True,LX575435,7D,2025-08-20
[email protected],BKGPHGO0,True,LX835911,14F,2025-08-04

Example: dataset loaded from a URL

display_name: "Judge Comparison"
key: "jailbreakbench-judge-comparison"
description: >
  Jailbreakbench is an open-source robustness benchmark for jailbreaking large language
  models (LLMs).
source:
  type: "url"
  url: "https://huggingface.co/datasets/JailbreakBench/JBB-Behaviors/raw/main/data/judge-comparison.csv"

Example: dataset loaded from HuggingFace with a filter

key: "harmbench_illegal"
display_name: "HarmBench: Illegal Activities"
description: >-
  HarmBench is a standardized evaluation framework for automated red teaming methods
  targeting large language models (LLMs).
source:
  type: "huggingface"
  path: "allenai/tulu-3-harmbench-eval"
  split: "test"
  filters:
    - op: "equals"
      expression: "{{SemanticCategory}}"
      value: "illegal"

Example: dataset produced by a dataset generator

display_name: "Legal Questions"
key: "legal-questions"
description: "Generated using 'My Question Generator'"
generator_specification:
  dataset_generator_key: "question-generator"
  num_samples: 5
  dataset_generator_config:
    topic: "Legal"

Example: the dataset generator referenced by the generated dataset

display_name: "My Question Generator"
key: "question-generator"
description: "Generates questions about arbitrary topics."
config_spec:
  - key: "topic"
    type: "string"
    display_name: "Topic"
definition:
  type: "declarative_dataset_generator"
  data_source:
    type: "empty"
  synthesizers:
    - type: "llm"
      model_key: "openai$gpt-4-1-nano"
      system_prompt_template: "You are a helpful dataset generator."
      user_prompt_template: >
        Produce 5 deep knowledge questions about << config.topic >>, including the
        correct answer.
      sample_properties:
        question:
          type: "string"
        answer:
          type: "string"

Definitions

LocalDatasetSource

Dataset from a local file path.

Properties


type Literal "local" required

Local file dataset source.


file_path string required

Path to a local CSV or JSONL file.

...
source:
  type: "local"
  file_path: "./test_cases.csv"

URLDatasetSource

Dataset from a remote URL.

Properties


type Literal "url" required

Remote URL dataset source.


url string required

URL pointing to a CSV or JSONL file. Must end with .csv or .jsonl extension.


filters array[FilterComparison, FilterMembership, FilterUnary]

Optional list of filters to apply after loading (AND-combined).

...
source:
  type: "url"
  url: "https://huggingface.co/datasets/JailbreakBench/JBB-Behaviors/raw/main/data/judge-comparison.csv"
Tip: The URL must point directly to a .csv or .jsonl file. Filters are AND-combined and applied after the file is downloaded.
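
To illustrate how filters combine on a URL source, here is a minimal sketch. The column names label and prompt are hypothetical and used only for illustration; check the header of the actual file before writing filters against it.

```yaml
source:
  type: "url"
  url: "https://huggingface.co/datasets/JailbreakBench/JBB-Behaviors/raw/main/data/judge-comparison.csv"
  filters:
    # Applied after download; a row is kept only if BOTH filters match (AND-combined).
    # "label" and "prompt" are hypothetical column names.
    - op: "is_true"
      expression: "{{ label }}"
    - op: "exists"
      expression: "{{ prompt }}"
```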

HuggingFaceDatasetSource

Dataset from the HuggingFace dataset hub.

Properties


type Literal "huggingface" required

HuggingFace dataset source.


path string required

HuggingFace dataset path (e.g., 'squad', 'glue').


split string required

Dataset split to load (e.g., 'train', 'test', 'validation').


load_dataset_kwargs object

Additional keyword arguments to pass to datasets.load_dataset().


filters array[FilterComparison, FilterMembership, FilterUnary]

Optional list of filters to apply after loading (AND-combined).

...
source:
  type: "huggingface"
  path: "allenai/tulu-3-harmbench-eval"
  split: "test"
  filters:
    - op: "equals"
      expression: "{{SemanticCategory}}"
      value: "illegal"
Tip: The huggingface source requires the datasets library (uv pip install datasets). Use load_dataset_kwargs to pass any additional arguments accepted by datasets.load_dataset().
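
As a sketch of load_dataset_kwargs, the fragment below selects a dataset configuration through the name argument of datasets.load_dataset(). It assumes the keyword arguments are forwarded unchanged; the choice of the glue dataset with its mrpc configuration is only an illustration.

```yaml
source:
  type: "huggingface"
  path: "glue"
  split: "validation"
  load_dataset_kwargs:
    # Assumed to be forwarded as-is to datasets.load_dataset();
    # "name" selects the dataset configuration (here: MRPC).
    name: "mrpc"
```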

ClaudeCodeDatasetSource

Dataset source from Claude Code local session files.

Properties


type Literal "claude_code" required

Claude Code trace source.


path string

Path to a Claude Code session file or project directory. Defaults to ~/.claude/projects/.

Default: None

session_id string

Specific session ID to import.

Default: None

from_time string

Start time filter for sessions.

Default: None

to_time string

End time filter for sessions.

Default: None

limit integer

Maximum number of sessions to import.

Default: None
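
A minimal sketch of a Claude Code source using only the properties listed above. The timestamp format is an assumption (ISO 8601), since this reference does not state which format from_time and to_time expect.

```yaml
source:
  type: "claude_code"
  # "path" is omitted, so the default ~/.claude/projects/ location is used.
  # Assumption: timestamps are given in ISO 8601 format.
  from_time: "2025-07-01T00:00:00Z"
  # Import at most 10 sessions.
  limit: 10
```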

LangSmithDatasetSource

Dataset source from LangSmith observability traces.

Properties


type Literal "langsmith" required

LangSmith trace source.


project string

LangSmith project name to import traces from.

Default: None

dataset string

LangSmith dataset name to import examples from.

Default: None

api_key string

LangSmith API key. Falls back to LANGSMITH_API_KEY env var.

Default: None

api_url string

LangSmith API URL. Falls back to LANGSMITH_ENDPOINT env var.

Default: None

from_time string

Start time filter for traces.

Default: None

to_time string

End time filter for traces.

Default: None

tags array[string]

Filter traces by tags.

Default: None

filter string

LangSmith filter string for traces.

Default: None

limit integer

Maximum number of traces to import.

Default: None
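
A minimal sketch of a LangSmith source built from the properties above. The project name and tag are hypothetical placeholders.

```yaml
source:
  type: "langsmith"
  # Hypothetical project name; replace with your own.
  project: "my-agent-prod"
  # Hypothetical tag used to narrow down the traces.
  tags:
    - "regression"
  # Import at most 50 traces.
  limit: 50
  # api_key and api_url are omitted, so LANGSMITH_API_KEY and
  # LANGSMITH_ENDPOINT are read from the environment.
```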

PhoenixDatasetSource

Dataset source from Arize Phoenix observability traces.

Properties


type Literal "phoenix" required

Phoenix trace source.


project string required

Phoenix project name.


from_time string

Start time filter for traces.

Default: None

to_time string

End time filter for traces.

Default: None

trace_id string, array[string]

Specific trace ID(s) to import.

Default: None

session_id string, array[string]

Filter traces by session ID(s).

Default: None

tags array[string]

Filter traces by tags.

Default: None

metadata object

Filter traces by metadata key-value pairs on the root span (all must match).

Default: None

limit integer

Maximum number of traces to import.

Default: None

api_key string

Phoenix API key. Falls back to PHOENIX_API_KEY env var.

Default: None

base_url string

Phoenix base URL. Falls back to PHOENIX_COLLECTOR_ENDPOINT env var.

Default: None
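
A minimal sketch of a Phoenix source built from the properties above. The project name and the metadata keys and values are hypothetical placeholders.

```yaml
source:
  type: "phoenix"
  # Hypothetical project name; replace with your own.
  project: "my-agent-prod"
  # Hypothetical metadata; every pair must match on the root span.
  metadata:
    environment: "staging"
    release: "v2"
  # Import at most 100 traces.
  limit: 100
  # api_key and base_url are omitted, so PHOENIX_API_KEY and
  # PHOENIX_COLLECTOR_ENDPOINT are read from the environment.
```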

InspectAIDatasetSource

Dataset source from Inspect AI eval log files.

Properties


type Literal "inspect_ai" required

Inspect AI eval log source.


path string required

Path to a directory containing .eval log files, a single .eval file, or a transcript database directory.


limit integer

Maximum number of transcripts to import.

Default: None
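
A minimal sketch of an Inspect AI source built from the properties above. The ./logs directory is a hypothetical location for .eval files.

```yaml
source:
  type: "inspect_ai"
  # Hypothetical directory containing .eval log files.
  path: "./logs"
  # Import at most 20 transcripts.
  limit: 20
```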

SDKDatasetGeneratorSpecification

Properties


dataset_generator_config object required

The configuration used by the dataset generator.


num_samples integer required

The number of samples to generate. At least 1 sample must be requested.


debug DatasetGenerationDebugOptions

Default: None

dataset_generator_key string required

Key of the dataset generator to use.

...
generator_specification:
  dataset_generator_key: "question-generator"
  num_samples: 5
  dataset_generator_config:
    topic: "Legal"
Tip: The dataset_generator_config allows instantiating the same dataset generator template in different ways. Always refer to the dataset generator definition for the available configuration options.
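
To illustrate instantiating one generator in different ways, the sketch below defines two datasets that share the question-generator key and differ only in the topic option. The second dataset ("Medical Questions") is hypothetical, and the --- separator is used only to show both definitions together; each would presumably live in its own file.

```yaml
# Dataset 1: legal questions.
display_name: "Legal Questions"
key: "legal-questions"
generator_specification:
  dataset_generator_key: "question-generator"
  num_samples: 5
  dataset_generator_config:
    topic: "Legal"
---
# Dataset 2 (hypothetical): same generator, different topic.
display_name: "Medical Questions"
key: "medical-questions"
generator_specification:
  dataset_generator_key: "question-generator"
  num_samples: 5
  dataset_generator_config:
    topic: "Medical"
```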

FilterComparison

Properties


op enum FilterComparisonOp required

The comparison operator to apply.

Allowed Values:

  • equals
  • not_equals
  • greater_than
  • less_than
  • greater_or_equal
  • less_or_equal

expression string required

An expression encoding what to compare against the value.

Depending on the context, it can refer to different variables:

  • When filtering a dataset: it can refer to the sample and use dot or bracket notation to access columns. If a column name is not a valid Jinja identifier (e.g., it contains spaces), use bracket notation to access it.
  • When used within a task action: it can refer to the sample, the solver_output or the scores (which is a mapping between scorer keys and their corresponding score values dict).

value string, number, integer, boolean required

The value against which the expression is compared.

...
source:
  ...
  filters:
    - op: "equals"
      expression: "{{SemanticCategory}}"
      value: "illegal"

FilterMembership

Properties


op enum FilterMembershipOp required

The membership operator to apply.

Allowed Values:

  • in
  • not_in

expression string required

An expression encoding what to test for membership in the values.

Depending on the context, it can refer to different variables:

  • When filtering a dataset: it can refer to column values by name (e.g., {{ category }}).
  • When used within a task action: it can refer to the sample, the solver_output or the scores (which is a mapping between scorer keys and their corresponding score values dict).

values array[string, number, boolean] required

The set of values to test membership against.

...
source:
  ...
  filters:
    - op: "not_in"
      expression: "{{ SemanticCategory }}"
      values:
        - "copyright"
        - "chemical_biological"

FilterUnary

Properties


op enum FilterUnaryOp required

The unary operator to apply.

Allowed Values:

  • exists
  • not_exists
  • is_true
  • is_false

expression string required

An expression encoding what to apply the unary operator to.

Depending on the context, it can refer to different variables:

  • When filtering a dataset: it can refer to column values by name (e.g., {{ category }}).
  • When used within a task action: it can refer to the sample, the solver_output or the scores (which is a mapping between scorer keys and their corresponding score values dict).

...
source:
  ...
  filters:
    - op: "is_true"
      expression: "{{ answer }}"
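
The three filter types can be mixed in a single filters list, where they are AND-combined. The sketch below uses the HuggingFace source from the examples above; FunctionalCategory and Behavior are assumed column names and should be checked against the actual dataset.

```yaml
source:
  type: "huggingface"
  path: "allenai/tulu-3-harmbench-eval"
  split: "test"
  filters:
    # FilterComparison: drop rows in the "copyright" category.
    - op: "not_equals"
      expression: "{{ SemanticCategory }}"
      value: "copyright"
    # FilterMembership: keep only selected functional categories
    # (assumed column name and values).
    - op: "in"
      expression: "{{ FunctionalCategory }}"
      values:
        - "standard"
        - "contextual"
    # FilterUnary: require that the behavior text is present
    # (assumed column name).
    - op: "exists"
      expression: "{{ Behavior }}"
```

A sample is kept only if all three filters match.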