Run Evaluation From AI Atlas
In this tutorial, we will run a harmful content evaluation for an OpenAI model using
lf init --atlas to download the evaluation package from AI Atlas.
Before You Begin
- You need a live AI GO! deployment.
- You need a Python environment with the AI GO! CLI installed. Follow the CLI installation page.
- You need a configured CLI. Follow the CLI configuration steps.
- You need an OpenAI API key set as OPENAI_API_KEY in your environment.
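The two prerequisites that can be verified locally are the presence of the lf executable and the OPENAI_API_KEY variable. The following is a minimal sketch of such a pre-flight check; the function name check_prereqs is hypothetical, and the check does not verify that the deployment is live or that the CLI is configured.

```shell
# Pre-flight check for the locally verifiable prerequisites.
# Prints one line per missing item and returns non-zero if anything is missing.
# check_prereqs is a hypothetical helper, not part of the lf CLI.
check_prereqs() {
    missing=0
    # Every command in this tutorial goes through the lf executable.
    if ! command -v lf >/dev/null 2>&1; then
        echo "missing: lf CLI not found on PATH"
        missing=1
    fi
    # The model definition in Step 2 reads the API key from this variable.
    if [ -z "${OPENAI_API_KEY:-}" ]; then
        echo "missing: OPENAI_API_KEY is not set"
        missing=1
    fi
    return "$missing"
}
```

Calling check_prereqs before Step 1 surfaces a missing key or CLI early, rather than partway through the tutorial.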
Step 1: Create an AI App
Create an AI app to use as a workspace for the evaluation.
- Define the app in a YAML file named app.yaml.
display_name: "My App"
key: "my-app"
- Create the app and switch to it.
lf app add -f app.yaml
lf switch my-app
- Confirm the app is active.
$ lf status
Working on AI app with key 'my-app'.
Step 2: Add the Model Under Test
Define and add the model you want to evaluate. This example uses OpenAI GPT-4.1 Nano.
- Define the model in a YAML file named model.yaml.
display_name: "OpenAI GPT-4.1 Nano"
key: "openai-gpt-4-1-nano"
task: "chat_completion"
config:
  connection_type: "custom_connection"
  adapter:
    key: "latticeflow$openai_chat_completion"
  url: "https://api.openai.com/v1/chat/completions"
  api_key: $OPENAI_API_KEY
  model_key: "gpt-4.1-nano"
- Add the model.
lf model add -f model.yaml
Step 3: Initialize the Evaluation from AI Atlas
Download the harmful_content evaluation package from AI Atlas into your working directory.
lf init --atlas harmful_content
This creates a harmful_content/ directory containing the evaluation definition, datasets, tasks, a config.env file, and a RUN.md with evaluation-specific instructions.
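To confirm the download produced the files this tutorial relies on, a small check along these lines may help. It is a sketch under assumptions: check_package is a hypothetical helper, and the file list covers only config.env, RUN.md, and the run.yaml file passed to lf run later in this tutorial. The names of the other files in the package are not stated here, so they are not checked.

```shell
# Report which of the expected files exist in a downloaded package directory.
# Usage: check_package DIR   (returns non-zero if any expected file is absent)
# check_package is a hypothetical helper, not part of the lf CLI.
check_package() {
    dir="$1"
    rc=0
    # Only the files named in this tutorial are checked.
    for f in config.env RUN.md run.yaml; do
        if [ -f "$dir/$f" ]; then
            echo "found:   $dir/$f"
        else
            echo "missing: $dir/$f"
            rc=1
        fi
    done
    return "$rc"
}
```

For this tutorial the call would be check_package harmful_content.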
Step 4: Configure the Evaluation
Open harmful_content/config.env and set the required values.
# The key of the model under test.
MODEL_KEY="openai-gpt-4-1-nano"
# The key of the model to use as judge.
JUDGE_MODEL_KEY="openai-gpt-4-1-nano"
Step 5: Run the Evaluation
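Before launching the run, it can be worth confirming that config.env assigns a non-empty value to both MODEL_KEY and JUDGE_MODEL_KEY. The sketch below assumes the file uses plain KEY="value" or KEY=value lines, as in the snippet from Step 4; check_env_file is a hypothetical helper and does not verify that the keys refer to models that exist in the app.

```shell
# Check that an env file assigns a non-empty value to each required key.
# Usage: check_env_file FILE KEY [KEY...]   (returns non-zero if any key is unset)
# check_env_file is a hypothetical helper, not part of the lf CLI.
check_env_file() {
    file="$1"
    shift
    rc=0
    for key in "$@"; do
        # Accept KEY="value" or KEY=value; reject KEY="" and a bare KEY=.
        if grep -Eq "^${key}=\"?[^\"[:space:]]" "$file"; then
            echo "ok:      $key"
        else
            echo "not set: $key"
            rc=1
        fi
    done
    return "$rc"
}
```

For this tutorial the call would be check_env_file harmful_content/config.env MODEL_KEY JUDGE_MODEL_KEY.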
Run the evaluation, passing config.env via --env.
lf --env harmful_content/config.env run -f harmful_content/run.yaml
You will see output similar to:
On AI app 'my-app'.
[Dataset(key="harmful_content")] Created successfully
[Task(key="harmful_content")] Created successfully
[Evaluation(ID="1")] Created successfully
[Evaluation(ID="1")] Started successfully.
----------------------------------------------------------------------------------
Evaluation overview available at:
http://<your-aigo-url>/ai-apps/.../evaluations
Or in the CLI using:
lf overview eval --id 1
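When scripting the run, the evaluation ID can be pulled out of this output instead of being copied by hand. The following sketch assumes the '[Evaluation(ID="N")]' line format shown in the sample output above and that the CLI writes these lines to standard output; extract_eval_id is a hypothetical helper, and the format may differ between CLI versions.

```shell
# Read run output on stdin and print the first evaluation ID found.
# Assumes lines of the form: [Evaluation(ID="1")] Created successfully
# extract_eval_id is a hypothetical helper, not part of the lf CLI.
extract_eval_id() {
    sed -n 's/^\[Evaluation(ID="\([^"]*\)")].*/\1/p' | head -n 1
}

# Example wiring (illustrative, left commented out so the block has no side effects):
#   eval_id="$(lf --env harmful_content/config.env run -f harmful_content/run.yaml | extract_eval_id)"
#   lf overview eval --id "$eval_id"
```

Taking only the first match avoids printing the ID twice, since the sample output mentions the evaluation on both the "Created" and "Started" lines.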
Step 6: Explore Results
- Check the evaluation status in the CLI.
lf overview eval --id 1
- Open the evaluations page in the UI to see all evaluation runs and aggregate metrics.
- Drill into individual model responses and scores via the task result sidebar.
