My First Evaluation

📘

In this tutorial, we will run a technical evaluations for harmful content for OpenAI's GPT-4.1 Mini model using the AILuminate benchmark.

Prerequisites

  1. We need a live AI GO! deployment.
  2. We need a Python environment with installed AI GO! CLI. To do so, follow the CLI installation page.
  3. We need a configured CLI. To do so, follow CLI configuration steps .

Step 1: Download Bundle

We download the bundle for your first evaluation from the AI GO! integrations GitHub repository.

git clone https://github.com/latticeflow-one/aigo-integrations.git -b latest
cd aigo-integrations/tutorials/my_first_evaluation/

This bundle includes all the necessary files to run an evaluation - definition of the AI app, datasets, models, tasks and evaluations. We are going to use app.yaml to create the new workspace and run.yaml to create all entities and run our first evaluation.

my_first_evaluation/
├── app.yaml
├── datasets
│   └── hate
│       ├── data.jsonl
│       ├── dataset.yaml
│       └── README.md
├── README.md
├── run.yaml
└── tasks
    ├── hate
    │   ├── README.md
    │   └── task.yaml
    └── scorers
        ├── harmfulness_maaj.yaml
        └── llama_guard_system_prompt.md

Step 2: Create AI App

We create the AI app that we will use as a workspace for running the evaluations.

  1. Create the AI app by running the CLI command.
lf app add -f app.yaml
  1. Switch to the new AI app context in the CLI.
lf switch my-first-eval
  1. Observe that the new AI app is created and is in the CLI's context now.
$ lf status
Working on AI app with key 'my-first-eval'.
AI GO! URL: https://aigo.latticeflow.cloud
AI GO! API Key: ********

You can see your saved configuration at '~/.latticeflow/config.json'.
To update it, run the `lf configure` command again.
  1. View the AI app in the GUI.

Step 3: Run Evaluation

We will run the evaluation as specified in the run.yaml file, which also includes the definition of the OpenAI's GPT-4.1 Mini model that we want to evaluated.

  1. Configure the API key for the OpenAI model using the CLI.
lf integration add --provider openai --api-key $OPENAI_API_KEY
  1. Run the creation of all entities and the first evaluation.
lf run -f run.yaml
  1. Check the status of your evaluation using the CLI or open it in the UI (use the link printed in the terminal).
$ lf run run.yaml
On AI app 'my-first-eval'.
[Model(key="openai$gpt-4-1-mini-2025-04-14")] Integrated successfully
[Dataset(key="ai_luminate_hate")] Created successfully
[Task(key="ai_luminate_hate")] Created successfully
[Evaluation(ID="18")] Created successfully
Newly linked tags:
+ General Purpose AI
+ Harmful Content
[Evaluation(ID="18")] Started successfully.
----------------------------------------------------------------------------------
Evaluation overview available at:

http://titan.latticeflow.cloud:9106/ai-apps/15/evaluations

Or in the CLI using:

lf evaluation overview 18

Step 4: Explore Results

We explore the evaluation results in the UI.

  1. Open evaluations page to see all evaluation runs.

  1. Explore the aggregate metric for the task result.

  1. Explore model responses and performance for each sample used by the evaluation task via the sidebar.