Policies let you define "quality gates" for your AI app and automatically check whether your evaluation results meet them. Consult the Policies guide for an overview and quickstart.

Policy Overview

A policy that groups multiple rules together.

Properties

key string required

A key for the policy within the AI app.

Pattern: ^[a-zA-Z0-9_\-]+$
Max Length: 250

display_name string required

The display name of the policy.

description string

The description of the policy.

Default: None

rules array[PolicyRule] required

The list of rules that belong to this policy.

```yaml
...
policies:
  - key: "rag_quality_gate_v1"
    display_name: "RAG Quality Gate v1"
    description: "Baseline quality thresholds for production RAG systems."
    rules:
      - key: "faithfulness_is_high"
        display_name: "Faithfulness is High"
        description: "Faithfulness must exceed 70%."
        definition:
          type: "threshold"
          metric: "faithfulness"
          operator: ">"
          threshold: 0.7
        scope: "all_latest"
      - key: "faithfulness_is_measured"
        display_name: "Faithfulness is Measured"
        description: "Faithfulness metric must be computed."
        definition:
          type: "exists"
          metric: "faithfulness"
        scope: "all_latest"
      - key: "faithfulness_in_rag_eval"
        display_name: "Faithfulness in RAG Evaluation"
        description: "Faithfulness must be measured in the specific RAG evaluation."
        definition:
          type: "exists"
          metric: "faithfulness"
        scope:
          evaluation_keys:
            - "rag-faithfulness-evaluation"
          task_specification_keys:
            - "rag-faithfulness-gpt-4-1-nano"
```

Definitions

`PolicyRule`

A rule that specifies a condition to be evaluated against metric values.

Properties

key string required

The unique key identifying the rule within its policy.

Pattern: ^[a-zA-Z0-9_\-]+$
Max Length: 250

display_name string required

The display name of the rule.

description string

The description of the rule.

Default: None

definition ExistsRuleDefinition, ThresholdRuleDefinition required

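The condition the rule checks. This is a discriminated union of ExistsRuleDefinition and ThresholdRuleDefinition, distinguished by the `type` field.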

scope PolicyRuleSimpleScope, PolicyRuleFinegrainedScope required

The scope of the policy rule, i.e. the set of metrics the rule will operate on.

methodology string

The methodology describing what the rule checks and why.

Default: None
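
The example rules in this reference omit the optional `methodology` field. The fragment below is a minimal sketch of a rule that sets it, assuming the same `policies:` layout as the example in the Policy Overview. The methodology text is illustrative and not taken from a real configuration.

```yaml
policies:
  - key: "rag_quality_gate_v1"
    display_name: "RAG Quality Gate v1"
    rules:
      - key: "faithfulness_is_high"
        display_name: "Faithfulness is High"
        # description and methodology are both optional (default: None)
        description: "Faithfulness must exceed 70%."
        # Illustrative text: explain what the rule checks and why
        methodology: "Answers must be grounded in the retrieved context; 0.7 is an assumed baseline, not a recommended value."
        definition:
          type: "threshold"
          metric: "faithfulness"
          operator: ">"
          threshold: 0.7
        scope: "all_latest"
```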

`ExistsRuleDefinition`

Rule that checks if a metric value with the given name exists.

Properties

type Literal "exists" required

The type of rule.

metric string required

The name of the metric value whose existence is verified: at least one task must produce a metric value with this name.
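
As a sketch, an exists rule needs only `type` and `metric` in its `definition`. The fragment below shows a single entry as it might appear under a policy's `rules:` list. The rule key and the metric name `answer_relevancy` are hypothetical.

```yaml
rules:
  - key: "answer_relevancy_is_measured"   # hypothetical rule key
    display_name: "Answer Relevancy is Measured"
    definition:
      type: "exists"                # marks this as an ExistsRuleDefinition
      metric: "answer_relevancy"    # hypothetical metric name
    scope: "all_latest"
```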

`ThresholdRuleDefinition`

Rule that checks if metric values with the given name meet a threshold.

Properties

type Literal "threshold" required

The type of rule.

metric string required

The name of the metric values to compare against the threshold. At least one task must produce a metric value with this name, and all metric values with this name must meet the threshold.

operator enum Operator required

The comparison operator to be used for a comparison against the threshold.

Allowed Values:

>
>=
=
<
<=

threshold number required

The threshold value to compare against.
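
A threshold rule can also express an upper bound. The following sketch assumes a hypothetical metric named `toxicity`, where lower values are better, and requires every value in scope to be at most 0.1. The metric name, keys, and threshold are illustrative only.

```yaml
rules:
  - key: "toxicity_is_low"          # hypothetical rule key
    display_name: "Toxicity is Low"
    description: "Toxicity must not exceed 10%."
    definition:
      type: "threshold"
      metric: "toxicity"            # hypothetical metric name
      operator: "<="                # one of: >, >=, =, <, <=
      threshold: 0.1
    scope: "all_latest"
```

Following the `metric` description above, this rule would be met only if at least one task produces a `toxicity` value and every such value satisfies `value <= 0.1`.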

`PolicyRuleSimpleScope`

The scope of the policy rule.

The available choices and their meanings are:

all_latest - Includes all metrics from the task results of the latest evaluation for each evaluation key.

Allowed Values:

all_latest

`PolicyRuleFinegrainedScope`

A precise scope definition that specifies the source of the metric values used by the policy rules, including the relevant evaluations, task specifications, scorers, and metrics.

Properties

evaluation_keys array[string]

The keys of the evaluations to take into account.

Default: None

task_specification_keys array[string]

The keys of the task specifications to take into account.

Default: None

scorer_keys array[string]

The keys of the scorers to take into account.

Default: None

metric_keys array[string]

The keys of the metrics to take into account.

Default: None
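
The example in the Policy Overview uses only `evaluation_keys` and `task_specification_keys`. The sketch below adds `scorer_keys` and `metric_keys`. The scorer and metric key values are hypothetical, and this reference does not state how the fields combine, so check the resulting scope against the Policies guide.

```yaml
rules:
  - key: "faithfulness_in_rag_eval"
    display_name: "Faithfulness in RAG Evaluation"
    definition:
      type: "exists"
      metric: "faithfulness"
    scope:
      # All four scope fields are optional (default: None)
      evaluation_keys:
        - "rag-faithfulness-evaluation"
      scorer_keys:
        - "faithfulness-scorer"     # hypothetical scorer key
      metric_keys:
        - "faithfulness"            # hypothetical metric key
```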
