Policies
Policies let you define "quality gates" for your AI app and automatically check whether your evaluation results meet them. Consult the Policies guide for an overview and quickstart.
Policy Overview
A policy that groups multiple rules together.
Properties
key string required
A key for the policy within the AI app.
Pattern: ^[a-zA-Z0-9_\-]+$
Max Length: 250
display_name string required
The display name of the policy.
description string
The description of the policy.
Default: None
rules array[PolicyRule] required
The list of rules that belong to this policy.
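The pattern and length limit on `key` can be checked locally before a policy is submitted. The following is a minimal sketch, not the platform's own validation: the helper name `is_valid_key` is hypothetical, and only the pattern and the 250-character limit come from the property description above.

```python
import re

# Pattern and length limit copied from the `key` property description.
KEY_PATTERN = re.compile(r"^[a-zA-Z0-9_\-]+$")
KEY_MAX_LENGTH = 250


def is_valid_key(key: str) -> bool:
    """Return True if `key` matches the documented pattern and max length.

    Hypothetical helper; the service may apply additional checks.
    """
    # fullmatch rejects a trailing newline, which `$` alone would tolerate.
    return len(key) <= KEY_MAX_LENGTH and KEY_PATTERN.fullmatch(key) is not None
```

For example, `is_valid_key("rag_quality_gate_v1")` is accepted, while a key containing a space or an empty key is rejected.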
...
policies:
  - key: "rag_quality_gate_v1"
    display_name: "RAG Quality Gate v1"
    description: "Baseline quality thresholds for production RAG systems."
    rules:
      - key: "faithfulness_is_high"
        display_name: "Faithfulness is High"
        description: "Faithfulness must exceed 70%."
        definition:
          type: "threshold"
          metric: "faithfulness"
          operator: ">"
          threshold: 0.7
        scope: "all_latest"
      - key: "faithfulness_is_measured"
        display_name: "Faithfulness is Measured"
        description: "Faithfulness metric must be computed."
        definition:
          type: "exists"
          metric: "faithfulness"
        scope: "all_latest"
      - key: "faithfulness_in_rag_eval"
        display_name: "Faithfulness in RAG Evaluation"
        description: "Faithfulness must be measured in the specific RAG evaluation."
        definition:
          type: "exists"
          metric: "faithfulness"
        scope:
          evaluation_keys:
            - "rag-faithfulness-evaluation"
          task_specification_keys:
            - "rag-faithfulness-gpt-4-1-nano"
Definitions
PolicyRule
A rule that specifies a condition to be evaluated against metric values.
Properties
key string required
The unique key identifying the rule within its policy.
Pattern: ^[a-zA-Z0-9_\-]+$
Max Length: 250
display_name string required
The display name of the rule.
description string
The description of the rule.
Default: None
definition ExistsRuleDefinition, ThresholdRuleDefinition required
A discriminated union of rule definitions.
scope PolicyRuleSimpleScope, PolicyRuleFinegrainedScope required
The scope of the policy rule, i.e. the set of metrics the rule will operate on.
methodology string
The methodology describing what the rule checks and why.
Default: None
ExistsRuleDefinition
Rule that checks if a metric value with the given name exists.
Properties
type Literal "exists" required
The type of rule.
metric string required
The name of the metric value whose existence should be verified. At least one task must produce a metric value with this name.
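The check described above can be sketched as follows. This is an illustration only: the function name and the record shape (a mapping with `name` and `value` entries) are assumptions, and the metric values are presumed to have already been narrowed to the rule's scope.

```python
from typing import Iterable, Mapping


def exists_rule_passes(metric: str, metric_values: Iterable[Mapping[str, object]]) -> bool:
    """Pass if at least one in-scope metric value carries the given name.

    The {"name": ..., "value": ...} record shape is hypothetical.
    """
    return any(value.get("name") == metric for value in metric_values)
```

With an empty set of metric values the rule fails, since nothing with the requested name was produced.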
ThresholdRuleDefinition
Rule that checks if metric values with the given name meet a threshold.
Properties
type Literal "threshold" required
The type of rule.
metric string required
The name of the metric values to compare against the threshold. At least one task must produce a metric value with this name, and all metric values with this name must meet the threshold.
operator enum Operator required
The comparison operator used to compare metric values against the threshold.
Allowed Values:
>, >=, =, <, <=
threshold number required
The threshold value to compare against.
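The threshold semantics (at least one matching metric value, and every matching value satisfying the comparison) can be sketched as below. The function name and the `name`/`value` record shape are hypothetical, and reading the allowed operators as `>`, `>=`, `=`, `<`, `<=` with `=` meaning equality is an assumption.

```python
import operator
from typing import Iterable, Mapping

# Maps operator strings to Python comparisons (metric value on the left).
# Assumption: "=" denotes equality.
OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "=": operator.eq,
    "<": operator.lt,
    "<=": operator.le,
}


def threshold_rule_passes(
    metric: str,
    op: str,
    threshold: float,
    metric_values: Iterable[Mapping[str, object]],
) -> bool:
    """Pass if the metric exists and all of its values meet the threshold.

    The {"name": ..., "value": ...} record shape is hypothetical.
    """
    compare = OPERATORS[op]
    matching = [v["value"] for v in metric_values if v["name"] == metric]
    # An empty list fails: at least one value with this name must exist.
    return bool(matching) and all(compare(value, threshold) for value in matching)
```

Note that a threshold rule is stricter than an exists rule on the same metric: it fails both when the metric is missing and when any single value misses the threshold.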
PolicyRuleSimpleScope
The scope of the policy rule.
The available choices and their meanings are:
- all_latest - All metrics from task results in the latest evaluations per evaluation key are included.
Allowed Values:
all_latest
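One way to picture `all_latest` is as a selection of the most recent evaluation for each evaluation key. The sketch below assumes each evaluation record carries an `evaluation_key` and a sortable `created_at` timestamp; both field names and the function name are hypothetical, and how the platform actually determines "latest" is not stated here.

```python
from typing import Dict, Iterable, Mapping


def latest_per_evaluation_key(
    evaluations: Iterable[Mapping[str, object]],
) -> Dict[str, Mapping[str, object]]:
    """Keep only the most recent evaluation for each evaluation key.

    Assumes hypothetical `evaluation_key` and `created_at` fields, where
    `created_at` values are mutually comparable (e.g. ISO 8601 strings).
    """
    latest: Dict[str, Mapping[str, object]] = {}
    for evaluation in evaluations:
        key = evaluation["evaluation_key"]
        current = latest.get(key)
        if current is None or evaluation["created_at"] > current["created_at"]:
            latest[key] = evaluation
    return latest
```

The metric values of the selected evaluations would then form the set the rule operates on.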
PolicyRuleFinegrainedScope
A precise scope definition that specifies the source of the metric values used by the policy rules, including the relevant evaluations, task specifications, scorers, and metrics.
Properties
evaluation_keys array[string]
The keys of the evaluations to take into account.
Default: None
task_specification_keys array[string]
The keys of the task specifications to take into account.
Default: None
scorer_keys array[string]
The keys of the scorers to take into account.
Default: None
metric_keys array[string]
The keys of the metrics to take into account.
Default: None
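The four fine-grained scope properties can be read as filters over metric values. The sketch below rests on two assumptions that this reference does not confirm: a property left at its default of None places no restriction, and the properties that are set must all match. The record field names and the function name are hypothetical.

```python
from typing import Mapping

# Scope property -> hypothetical field on a metric value record.
SCOPE_FIELDS = {
    "evaluation_keys": "evaluation_key",
    "task_specification_keys": "task_specification_key",
    "scorer_keys": "scorer_key",
    "metric_keys": "metric_key",
}


def in_scope(metric_value: Mapping[str, object], scope: Mapping[str, object]) -> bool:
    """Return True if the metric value passes every filter set in the scope."""
    for scope_field, record_field in SCOPE_FIELDS.items():
        allowed = scope.get(scope_field)
        # Assumption: an unset (None) filter places no restriction.
        if allowed is not None and metric_value.get(record_field) not in allowed:
            return False
    return True
```

Under these assumptions, a scope that sets only `evaluation_keys` and `task_specification_keys` admits metric values from any scorer and any metric within the listed evaluations and task specifications.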