System Tasks
A system task evaluates a system-level property without interacting with a model or dataset. Instead of the usual solver-and-scorer pipeline, a system task runs a single Python snippet, the compute_evidence_snippet, that probes a system and returns metrics directly.
Use a system task when:
- You need to verify an infrastructure or operational property (e.g. HTTPS enforcement, DNS configuration, certificate validity, endpoint availability).
- The check is self-contained — it does not require a dataset of test cases or a model to generate responses.
- You want to parameterize the check and run it across multiple configurations in a single evaluation.
Task Definition
Set definition.type to "system_task" and provide a compute_evidence_snippet with the Python code that implements the check. Declare any parameters the snippet needs in config_spec.
```yaml
display_name: "Enforces HTTPS"
key: "enforces-https"
description: >
  Checks whether a given URL enforces HTTPS by redirecting HTTP requests to HTTPS.
tags: ["Security"]
config_spec:
  - type: "string"
    key: "url"
    display_name: "URL"
    description: "The URL to check for HTTPS enforcement."
definition:
  type: "system_task"
  compute_evidence_snippet: !include "./check_https.py"
```

Configuration parameters and secrets are available inside the snippet via the << config.KEY >> and << secrets.KEY >> placeholder syntax; see Config Specification and Use Secrets.
See the Tasks CLI reference for the full task specification.
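To try a snippet locally before registering the task, you have to fill in the placeholders yourself. The sketch below assumes the platform performs a plain textual substitution of << config.KEY >> and << secrets.KEY >> before the snippet runs; render_placeholders is a hypothetical helper, not an AI GO! API, and the whitespace tolerance inside the delimiters is an assumption.

```python
import re

# Matches "<< config.KEY >>" or "<< secrets.KEY >>"; allowing optional spaces
# inside the delimiters is an assumption about the accepted syntax.
_PLACEHOLDER = re.compile(r"<<\s*(config|secrets)\.([A-Za-z0-9_\-]+)\s*>>")


def render_placeholders(source, config, secrets):
    """Hypothetical local stand-in for the platform's placeholder substitution."""

    def substitute(match):
        namespace, key = match.group(1), match.group(2)
        values = config if namespace == "config" else secrets
        if key not in values:
            # Fail loudly instead of leaving an unrendered placeholder behind.
            raise KeyError(f"missing {namespace} value for {key!r}")
        return str(values[key])

    return _PLACEHOLDER.sub(substitute, source)


snippet = 'url = "<< config.url >>"\ntoken = "<< secrets.api_token >>"'
print(render_placeholders(snippet, {"url": "example.com"}, {"api_token": "s3cr3t"}))
```

Because the substitution in this sketch is purely textual, a value containing a double quote would break the surrounding string literal in the rendered snippet.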
Writing the Evidence Snippet
The snippet must define a compute_evidence function that returns a dictionary of metrics:
```python
def compute_evidence():
    ...
    return {"metrics": {"Metric Name": {"value": <number>, "reason": "<explanation>"}}}
```

Each metric has a numeric value (typically 0 or 1 for pass/fail checks, but any number is valid) and a reason string. We encourage always providing a reason; it makes results interpretable in the UI and in exported evidence. Multiple metrics can be returned from a single snippet.
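As an illustration of returning several metrics at once, the sketch below reports two related metrics from one probe and checks the result against the shape described above. The metric names, the hard-coded status and header values (standing in for a real probe), and the validate_evidence helper are all assumptions for illustration; AI GO! may validate results differently.

```python
import numbers


def compute_evidence():
    # Stand-in values; a real snippet would obtain these by probing the system.
    status = 301
    location = "https://example.com/"
    return {
        "metrics": {
            "Redirects HTTP": {
                "value": 1 if status in (301, 302, 307, 308) else 0,
                "reason": f"HTTP status {status}",
            },
            "Redirect Target Is HTTPS": {
                "value": 1 if location.startswith("https://") else 0,
                "reason": f"Location header is {location!r}",
            },
        }
    }


def validate_evidence(evidence):
    """Hypothetical local check that a result matches the documented shape."""
    metrics = evidence.get("metrics") if isinstance(evidence, dict) else None
    if not isinstance(metrics, dict) or not metrics:
        raise ValueError("expected a non-empty 'metrics' dictionary")
    for name, metric in metrics.items():
        if not isinstance(metric, dict):
            raise ValueError(f"{name}: metric must be a dictionary")
        value = metric.get("value")
        # Rejecting bools is a conservative assumption, not a documented rule.
        if isinstance(value, bool) or not isinstance(value, numbers.Number):
            raise ValueError(f"{name}: 'value' must be a number")
        # 'reason' is recommended rather than required, but should be a string.
        if "reason" in metric and not isinstance(metric["reason"], str):
            raise ValueError(f"{name}: 'reason' must be a string")
    return evidence


result = validate_evidence(compute_evidence())
for name, metric in result["metrics"].items():
    print(name, metric["value"])
```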
Example
The following snippet checks whether a URL enforces HTTPS by verifying that plain HTTP requests are redirected:
```python
import http.client
from urllib.parse import urlparse


def compute_evidence():
    url = "<< config.url >>"
    # Prepend a scheme for bare hostnames so urlparse fills in netloc.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.netloc or parsed.path
    path = (parsed.path if parsed.netloc else "/") or "/"

    try:
        # Probe over plain HTTP; http.client does not follow redirects.
        conn = http.client.HTTPConnection(host, timeout=10)
        try:
            conn.request("GET", path)
            resp = conn.getresponse()
            location = resp.getheader("Location", "")
        finally:
            conn.close()
    except ConnectionRefusedError:
        # Nothing accepts plain HTTP connections, so no unencrypted traffic is served.
        return {
            "metrics": {
                "Enforces HTTPS": {
                    "value": 1,
                    "reason": "HTTP connection refused; HTTPS is enforced at transport level",
                }
            }
        }
    except OSError as exc:
        # DNS failures, timeouts, and resets leave enforcement unverified: report a failure.
        return {
            "metrics": {
                "Enforces HTTPS": {
                    "value": 0,
                    "reason": f"HTTP probe failed: {exc}",
                }
            }
        }

    if resp.status in (301, 302, 307, 308) and location.lower().startswith("https://"):
        return {
            "metrics": {
                "Enforces HTTPS": {
                    "value": 1,
                    "reason": f"HTTP {resp.status} redirects to {location}",
                }
            }
        }
    return {
        "metrics": {
            "Enforces HTTPS": {
                "value": 0,
                "reason": f"HTTP returned {resp.status} with no HTTPS redirect",
            }
        }
    }
```

See the full runnable example for a complete system task with HSTS checks and an evaluation configuration.
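The full runnable example also covers HSTS. As a rough idea of how such a metric might look (this is not the linked implementation), the sketch below requests the URL over HTTPS, since browsers only honor Strict-Transport-Security when it arrives over a secure connection, and inspects the header's max-age directive. The metric name, the parse_hsts_max_age and hsts_metric helpers, and the one-year threshold of 31536000 seconds (the minimum accepted by the hstspreload.org preload list) are assumptions.

```python
import http.client
import re
from urllib.parse import urlparse

MIN_MAX_AGE = 31536000  # one year in seconds; an assumed pass threshold


def parse_hsts_max_age(header):
    """Return the max-age directive of an HSTS header as an int, or None if absent."""
    match = re.search(r'(?:^|;)\s*max-age\s*=\s*"?(\d+)"?', header or "", re.IGNORECASE)
    return int(match.group(1)) if match else None


def hsts_metric(header):
    """Turn a Strict-Transport-Security header value into a pass/fail metric."""
    max_age = parse_hsts_max_age(header)
    if max_age is None:
        return {"value": 0, "reason": "No Strict-Transport-Security max-age found"}
    if max_age < MIN_MAX_AGE:
        return {"value": 0, "reason": f"HSTS max-age={max_age} is below {MIN_MAX_AGE}"}
    return {"value": 1, "reason": f"HSTS max-age={max_age}"}


def compute_evidence():
    url = "<< config.url >>"
    parsed = urlparse(url if "://" in url else "https://" + url)
    # HTTPSConnection verifies certificates with the default SSL context.
    conn = http.client.HTTPSConnection(parsed.netloc, timeout=10)
    try:
        conn.request("GET", parsed.path or "/")
        header = conn.getresponse().getheader("Strict-Transport-Security")
    except OSError as exc:  # includes ssl.SSLError, DNS failures, and timeouts
        metric = {"value": 0, "reason": f"HTTPS request failed: {exc}"}
    else:
        metric = hsts_metric(header)
    finally:
        conn.close()
    return {"metrics": {"HSTS Enabled": metric}}
```

Keeping the header parsing in a pure function makes the pass/fail logic testable without network access.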
Python snippet environment: the snippet runs inside AI GO!'s fixed Python runtime (Python 3.11). Only the libraries listed in Python Snippets are available at execution time.
When to use a system task vs. a benchmark task:

| Scenario | Task Type |
|---|---|
| Evaluate a model's responses against a dataset | Benchmark task |
| Evaluate dataset quality without a model | Benchmark task (evaluated_entity_type: dataset) |
| Check an infrastructure or system property | System task |
| Run a self-contained probe that produces metrics directly | System task |

A system task has no solver, no dataset, and no scorers: the compute_evidence function is the entire execution pipeline.
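Since compute_evidence is the whole pipeline, a snippet can be smoke-tested outside AI GO! by executing its source and calling the function. The sketch below is a hypothetical local harness (run_snippet is not an AI GO! API). It assumes the placeholders have already been filled in, and it does not reproduce the platform's runtime or library restrictions. Because exec runs arbitrary code, use it only on snippets you trust.

```python
def run_snippet(source):
    """Hypothetical harness: execute snippet source and return its metrics."""
    namespace = {}
    # Run the snippet as a module so its imports and helper functions are defined.
    exec(compile(source, "<snippet>", "exec"), namespace)
    compute = namespace.get("compute_evidence")
    if not callable(compute):
        raise ValueError("snippet does not define compute_evidence()")
    return compute()["metrics"]


SNIPPET = '''
def compute_evidence():
    return {"metrics": {"Always Passes": {"value": 1, "reason": "trivial check"}}}
'''

for name, metric in run_snippet(SNIPPET).items():
    print(f"{name}: {metric['value']} ({metric['reason']})")
```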
