Azure Foundry

Just want to use the integration? If you only need to plug an Azure AI Foundry agent into AI GO!, you can use the ready-made integration directly — see the full integration in our registry and GitHub repo: <REGISTRY_URL> / <GITHUB_URL>. This tutorial walks through how that integration is built so you can adapt it to your own stateful agent.

This tutorial demonstrates how to connect an Azure AI Foundry agent to AI GO! as a custom-inference model, so it can be evaluated like any other model — while preserving conversation state across turns.

The hard part of integrating a stateful agent is not calling the endpoint once; it is making multi-turn conversations work. Foundry keeps conversation state server-side in a conversation object, while AI GO! drives the dialogue one user turn at a time. The integration must keep both sides in sync so that turn 3 remembers what happened in turns 1 and 2.

This pattern applies to any stateful agent runtime — Azure AI Foundry, a custom agent server, or a hosted assistant API — where the conversation lives behind a session/conversation identifier and each call returns a multi-step trace.

What you will build

A complete custom-inference model integration that:

  1. Forwards each user turn to a Foundry agent through its OpenAI-compatible Responses API (via azure-ai-projects)
  2. Maintains conversation continuity across turns by round-tripping Foundry's conversation_id
  3. Returns the full per-turn agent trace — MCP tool calls, tool outputs, and the final reply — in AI GO!'s Open Responses format, ready for trace-aware scorers

By the end, you will have a model you can register, test, and point any multi-turn evaluation at.


Step 1: Understand the integration

The multi-turn challenge

AI GO! sends a chat-completion request for every turn. The body contains the whole conversation so far; the last entry is the current user turn:

{
  "messages": [
    { "role": "user", "content": "Hi, can you help me see my orders?" },
    { "role": "assistant", "content": "Sure! What's your email and order ID?",
      "conversation_id": "conv_01ABcDeFgHiJkLmNoPqRsTuV" },
    { "role": "user", "content": "customer@example.com, order ORD-1001" }
  ]
}

Foundry, however, does not want the whole history replayed — it already has it stored against the conversation. It only wants the new user message, plus the conversation_id that identifies the conversation.

Both sides stay in sync by carrying the conversation_id inside the message history itself. AI GO! messages allow extra fields, and any custom field set on an assistant message is echoed back unchanged on the next turn; this is the supported way to carry custom data between requests. On each response we attach the conversation_id to the assistant message, and on the next request we read it back and reuse the same conversation. The first turn has no prior assistant message, so it creates a fresh conversation.

Example request bodies (AI GO! → run_inference)

First turn — just the new user message:

{
  "messages": [
    { "role": "user", "content": "Search the catalog for shoes." }
  ]
}

Subsequent turns echo the prior assistant message with its conversation_id, so we keep using the same Foundry conversation:

{
  "messages": [
    { "role": "user", "content": "Hello!" },
    { "role": "assistant", "content": "Hello! How can I assist you today?",
      "conversation_id": "conv_01ABcDeFgHiJkLmNoPqRsTuV" },
    { "role": "user", "content": "What is my conversation id?" }
  ]
}
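
The round trip can be tried locally without Azure. The sketch below is only an illustration: FakeAgentBackend, handle_turn, and simulate are hypothetical names, messages are plain dicts rather than AI GO! types, and the loop in simulate stands in for the way AI GO! resends the history with assistant messages echoed unchanged.

```python
import json
import uuid


class FakeAgentBackend:
    """Hypothetical stand-in for the Foundry side: history is stored per conversation."""

    def __init__(self) -> None:
        self.conversations: dict[str, list[str]] = {}

    def create_conversation(self) -> str:
        conversation_id = f"conv_{uuid.uuid4().hex[:8]}"
        self.conversations[conversation_id] = []
        return conversation_id

    def respond(self, conversation_id: str, user_message: str) -> str:
        history = self.conversations[conversation_id]
        history.append(user_message)
        return f"Seen {len(history)} user message(s) in this conversation."


def handle_turn(body: str, backend: FakeAgentBackend) -> dict:
    """Read the echoed conversation_id, send only the new user message, tag the reply."""
    messages = json.loads(body)["messages"]
    conversation_id = ""
    for message in reversed(messages):
        if message["role"] == "assistant":
            conversation_id = message.get("conversation_id") or ""
            break
    if not conversation_id:
        # First turn: no assistant message yet, so open a new conversation.
        conversation_id = backend.create_conversation()
    reply = backend.respond(conversation_id, messages[-1]["content"])
    return {"role": "assistant", "content": reply, "conversation_id": conversation_id}


def simulate(user_turns: list[str]) -> tuple[FakeAgentBackend, list[dict]]:
    """Play the evaluator's role: resend the full history on every turn."""
    backend = FakeAgentBackend()
    messages: list[dict] = []
    for text in user_turns:
        messages.append({"role": "user", "content": text})
        messages.append(handle_turn(json.dumps({"messages": messages}), backend))
    return backend, messages


backend, transcript = simulate(["Hello!", "Search the catalog for shoes.", "Thanks!"])
for message in transcript:
    print(message["role"], "|", message["content"])
```

All three assistant replies carry the same conversation_id, and the backend ends up holding a single conversation with three user messages, even though it only ever received one new message per call.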

The three-stage pipeline

A single inference runs as a three-stage pipeline:

ChatCompletionInput  ->  ModelInput  ->  RawModelOutput  ->  OpenResponsesModelOutput
   (from AI GO!)         convert_        query_model        convert_model_output
                         user_input

run_inference ties the stages together: parse the request, convert it to what the Foundry call needs, run the agent, and convert the response output back into the Open Responses format AI GO! expects.

The output shape

The Foundry Responses API returns a list of output items: the catalog of available MCP tools, the tool calls the agent made, and the final assistant message. We translate the meaningful ones into Open Responses trace items:

Type                  Description
message               A text message (here, the assistant's reply)
function_call         An MCP tool the agent invoked (name, arguments)
function_call_output  The result of a tool call, linked by call_id

The informational mcp_list_tools item (the catalog of available tools) is skipped. Emitting the call/output items (rather than just the final text) is what lets downstream trace-aware scorers inspect how the agent reached its answer.


Step 2: Build the inference handler

The integration lives in a single Python file, run_inference.py, that defines a run_inference(body, environment) function. AI GO! calls it once per turn.

The entry point

run_inference is the function AI GO! calls. It wires the three stages together — parse and convert the request, call the Foundry agent, and convert the response back into the Open Responses format AI GO! expects:

import json
from typing import Any


def run_inference(body: str, environment: dict[str, Any]) -> str:
    model_input = convert_user_input(
        ChatCompletionInput.model_validate(json.loads(body))
    )
    raw = query_model(model_input, environment)
    model_output = convert_model_output(raw)
    return model_output.model_dump_json()

The rest of this step implements each stage in turn.

Model-side types

We model the two intermediate payloads explicitly so the data flow stays legible:

from pydantic import BaseModel, Field


class ModelInput(BaseModel):
    """Request payload the Foundry Responses call needs."""
    conversation_id: str = ""        # empty on the first turn, reused afterwards
    user_message: str


class RawModelOutput(BaseModel):
    """The Foundry response output items, plus the conversation id and usage."""
    conversation_id: str
    output: list
    usage: dict = Field(default_factory=dict)

Stage 1 — convert_user_input

This is where multi-turn continuity is established. We pull the latest user message and recover the conversation_id echoed by the previous assistant turn:

def convert_user_input(data: ChatCompletionInput) -> ModelInput:
    messages = data.messages
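    # Fail with a clear error rather than a bare StopIteration if no user turn exists.
    last_user = next((m for m in reversed(messages) if m.role == "user"), None)
    if last_user is None:
        raise ValueError("request contains no user message")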

    conversation_id = ""
    for msg in reversed(messages):
        if msg.role == "assistant":
            conversation_id = getattr(msg, "conversation_id", "") or ""
            break

    return ModelInput(
        conversation_id=conversation_id,
        user_message=_message_text(last_user.content),
    )

On the first turn there is no prior assistant message, so conversation_id stays empty — signalling that a new conversation must be created.

Stage 2 — query_model

Open a conversation on the first turn, then call the agent through the OpenAI-compatible Responses client. The agent_reference extra_body targets the deployed Foundry agent by name:

from azure.ai.projects import AIProjectClient


def query_model(model_input: ModelInput, environment: dict[str, Any]) -> RawModelOutput:
    credential = _build_credential(environment)
    project_client = AIProjectClient(
        endpoint=environment["AZURE_AI_PROJECT_ENDPOINT"],
        credential=credential,
    )

    with project_client, project_client.get_openai_client() as openai_client:
        conversation_id = model_input.conversation_id
        if not conversation_id:
            conversation = openai_client.conversations.create()
            conversation_id = conversation.id

        response = openai_client.responses.create(
            conversation=conversation_id,
            input=model_input.user_message,
            extra_body={
                "agent_reference": {
                    "name": environment["AZURE_FOUNDRY_AGENT_NAME"],
                    "type": "agent_reference",
                }
            },
        )

    output = [item.model_dump(mode="json") for item in response.output]
    usage = response.usage.model_dump(mode="json") if response.usage else {}
    return RawModelOutput(conversation_id=conversation_id, output=output, usage=usage)

We send only the new user message — Foundry appends it to the conversation it already holds.

Authentication (_build_credential)

Authentication uses a service principal in the LatticeFlow runtime and falls back to your local az login token during development. Create the service principal and grant it access to the Foundry project:

az ad sp create-for-rbac --name "aigo-foundry"
# then grant it the "Azure AI User" role on the Foundry project

The command prints appId, password, and tenant; these map to AZURE_CLIENT_ID, AZURE_CLIENT_SECRET, and AZURE_TENANT_ID, which are injected via model.yaml. The helper builds a credential from them when all three are present, and otherwise uses DefaultAzureCredential:

from azure.identity import ClientSecretCredential, DefaultAzureCredential


def _build_credential(environment: dict[str, Any]) -> Any:
    tenant_id = environment.get("AZURE_TENANT_ID")
    client_id = environment.get("AZURE_CLIENT_ID")
    client_secret = environment.get("AZURE_CLIENT_SECRET")
    if tenant_id and client_id and client_secret:
        return ClientSecretCredential(tenant_id, client_id, client_secret)
    return DefaultAzureCredential()
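
One consequence of the all-three check above: if only one or two of the values are set, the helper quietly falls back to DefaultAzureCredential. If you would rather surface that as a configuration error, a guard along these lines could run first. This is a sketch; credential_source and SERVICE_PRINCIPAL_KEYS are hypothetical names, not part of the integration.

```python
from typing import Any

SERVICE_PRINCIPAL_KEYS = ("AZURE_TENANT_ID", "AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET")


def credential_source(environment: dict[str, Any]) -> str:
    """Report which branch _build_credential would take, rejecting partial configs."""
    present = [key for key in SERVICE_PRINCIPAL_KEYS if environment.get(key)]
    if len(present) == len(SERVICE_PRINCIPAL_KEYS):
        return "service_principal"
    if present:
        missing = [key for key in SERVICE_PRINCIPAL_KEYS if key not in present]
        raise ValueError(
            "Incomplete service principal config, missing: " + ", ".join(missing)
        )
    # Nothing set: local development with an az login token.
    return "default_credential"


print(credential_source({}))
```

Calling it with an empty environment reports default_credential; calling it with only AZURE_TENANT_ID set raises a ValueError that names the two missing keys.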

Example raw model output (Foundry → query_model)

query_model returns the response output items plus the conversation_id we used, so convert_model_output can echo it back:

{
  "conversation_id": "conv_01ABcDeFgHiJkLmNoPqRsTuV",
  "output": [
    { "type": "mcp_list_tools", "id": "mcpl_01...", "server_label": "retail-tools",
      "tools": [] },
    { "type": "mcp_call", "id": "mcp_01...", "server_label": "retail-tools",
      "name": "search_products", "arguments": "{\"query\": \"shoes\"}",
      "output": "[]", "status": "completed", "error": null },
    { "type": "message", "id": "msg_01...", "role": "assistant",
      "status": "completed",
      "content": [{ "type": "output_text",
                    "text": "I couldn't find any shoes in the catalog...",
                    "annotations": [] }] }
  ],
  "usage": { "input_tokens": 201, "output_tokens": 22 }
}

Stage 3 — convert_model_output

The conversion is a one-liner — all of the work lives in the converter, which consumes the response output items directly:

def convert_model_output(raw: RawModelOutput) -> OpenResponsesModelOutput:
    return OpenResponsesConverter().build(raw.output, raw.conversation_id, raw.usage)

The Open Responses converter

build walks the output items: each mcp_call becomes a function_call + function_call_output pair, the assistant message text is joined into a single reply, and mcp_list_tools (and other informational items) are skipped. The assistant message is appended last and carries the conversation_id to round-trip:

import uuid


class OpenResponsesConverter:
    def build(
        self,
        output_items: list[dict[str, Any]],
        conversation_id: str = "",
        usage: dict[str, Any] | None = None,
        **kwargs: Any,
    ) -> OpenResponsesModelOutput:
        items: list[TraceItem] = []
        answer_chunks: list[str] = []

        for item in output_items:
            item_type = item.get("type")
            if item_type == "mcp_call":
                call_id = item["id"]
                items.append(self.build_function_call(item, call_id))
                items.append(self.build_function_call_output(item, call_id))
            elif item_type == "message" and item.get("role") == "assistant":
                chunk = self._join_output_text(item.get("content"))
                if chunk:
                    answer_chunks.append(chunk)

        items.append(
            self.build_assistant_message("".join(answer_chunks), conversation_id)
        )

        return OpenResponsesModelOutput(items=items, usage=self.build_usage(usage or {}))
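
build calls self.build_usage, which is not shown in this tutorial. Judging from the example payloads (input_tokens and output_tokens on the Foundry side, num_prompt_tokens and num_completion_tokens on the AI GO! side), it presumably renames the token counts. A standalone sketch under that assumption, returning a plain dict rather than the usage type the real method would return:

```python
from typing import Any


def build_usage(usage: dict[str, Any]) -> dict[str, int]:
    """Rename Responses-style token counts to the field names in the Open Responses example."""
    return {
        "num_prompt_tokens": int(usage.get("input_tokens") or 0),
        "num_completion_tokens": int(usage.get("output_tokens") or 0),
    }


print(build_usage({"input_tokens": 201, "output_tokens": 22}))
# prints {'num_prompt_tokens': 201, 'num_completion_tokens': 22}
```

Missing or null counts default to 0 here, so an empty usage dict (the case where the response carried no usage) still converts cleanly.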

The per-item builders construct each trace item. The conversation_id to round-trip is attached to the assistant message as an extra field:

    def build_function_call(self, item: dict[str, Any], call_id: str) -> FunctionCall:
        return FunctionCall(
            id=str(uuid.uuid4()),
            call_id=call_id,
            name=item.get("name") or "",
            arguments=item.get("arguments") or "{}",
            status=FunctionCallStatus.completed,
        )

    def build_function_call_output(
        self, item: dict[str, Any], call_id: str
    ) -> FunctionCallOutput:
        return FunctionCallOutput(
            id=str(uuid.uuid4()),
            call_id=call_id,
            output=self._mcp_call_output(item),
            status=FunctionCallOutputStatusEnum.completed,
        )

    def build_assistant_message(self, text: str, conversation_id: str) -> Message:
        return Message(
            id=str(uuid.uuid4()),
            status=MessageStatus.completed,
            role=MessageRole.assistant,
            content=[OutputTextContent(text=text, annotations=[])],
            # Carried as an extra field so the next turn can reuse the conversation.
            conversation_id=conversation_id,
        )

Flattening MCP content (_join_output_text, _mcp_call_output)

Foundry messages carry content as a list of typed blocks, and an mcp_call carries a stringified output or an error. These helpers flatten them into the single strings the Open Responses items expect:

    @staticmethod
    def _join_output_text(content_blocks: list[dict[str, Any]] | None) -> str:
        if not content_blocks:
            return ""
        parts: list[str] = []
        for block in content_blocks:
            if isinstance(block, dict) and block.get("type") == "output_text":
                parts.append(block.get("text") or "")
        return "".join(parts)

    @staticmethod
    def _mcp_call_output(item: dict[str, Any]) -> str:
        output = item.get("output")
        if isinstance(output, str) and output:
            return output
        error = item.get("error")
        if error:
            return json.dumps({"error": error})
        if output is None:
            return ""
        return json.dumps(output)

Example Open Responses output (convert_model_output)

The converted result returned to AI GO! — tool call, tool output, and the final assistant message carrying the conversation_id for the next turn:

{
  "items": [
    { "type": "function_call", "id": "5bf9...", "call_id": "mcp_01...",
      "name": "search_products", "arguments": "{\"query\": \"shoes\"}",
      "status": "completed" },
    { "type": "function_call_output", "id": "9da4...",
      "call_id": "mcp_01...", "output": "[]", "status": "completed" },
    { "type": "message", "id": "44c2...", "status": "completed",
      "role": "assistant",
      "content": [{ "type": "output_text",
                    "text": "I couldn't find any shoes in the catalog...",
                    "annotations": [] }],
      "conversation_id": "conv_01ABcDeFgHiJkLmNoPqRsTuV" }
  ],
  "usage": { "num_prompt_tokens": 201, "num_completion_tokens": 22 }
}
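
When adapting the converter, it can help to assert the two properties this output relies on: every function_call has a function_call_output with the same call_id, and the final item is the assistant message carrying a conversation_id. The checker below is a sketch over the JSON shape shown above; check_trace is a hypothetical helper, not part of AI GO!.

```python
from typing import Any


def check_trace(result: dict[str, Any]) -> list[str]:
    """Return the problems found in a converted result; an empty list means it looks consistent."""
    problems: list[str] = []
    items = result.get("items") or []
    call_ids = [i.get("call_id") for i in items if i.get("type") == "function_call"]
    output_ids = [i.get("call_id") for i in items if i.get("type") == "function_call_output"]

    for call_id in call_ids:
        if call_id not in output_ids:
            problems.append(f"function_call {call_id} has no function_call_output")
    for call_id in output_ids:
        if call_id not in call_ids:
            problems.append(f"function_call_output {call_id} has no function_call")

    # The assistant message is appended last and must carry the id for the next turn.
    if not items or items[-1].get("type") != "message":
        problems.append("last item is not a message")
    elif not items[-1].get("conversation_id"):
        problems.append("assistant message carries no conversation_id")
    return problems


example = {
    "items": [
        {"type": "function_call", "call_id": "mcp_01", "name": "search_products"},
        {"type": "function_call_output", "call_id": "mcp_01", "output": "[]"},
        {"type": "message", "role": "assistant", "conversation_id": "conv_01"},
    ]
}
print(check_trace(example))  # prints []
```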

Step 3: Wire up the model

Three small files connect the snippet to AI GO!.

model.yaml

The model uses connection_type: custom_inference with the identity chat-completion adapter (the snippet already returns Open Responses, so no adapter transform is needed). The endpoint and agent name are plain config; the service-principal credentials are stored as server-side secrets:

display_name: "Customer Support Agent (Azure AI Foundry)"
key: "customer-support-agent-azure-foundry"
description: >-
  A customer-support agent deployed on Azure AI Foundry, driven via its
  OpenAI-compatible Responses API.
rate_limit: 15
task: "chat_completion"
config:
  connection_type: "custom_inference"
  adapter:
    key: "latticeflow$identity_chat_completion"
  run_inference_snippet: !include "./run_inference.py"
  environment:
    AZURE_AI_PROJECT_ENDPOINT: $AZURE_AI_PROJECT_ENDPOINT
    AZURE_FOUNDRY_AGENT_NAME: $AZURE_FOUNDRY_AGENT_NAME
    # _build_credential builds a ClientSecretCredential from the three
    # service-principal values below. Create one with `az ad sp create-for-rbac`
    # and grant it `Azure AI User` on the Foundry project.
    AZURE_TENANT_ID: "<< secrets.AZURE_TENANT_ID >>"
    AZURE_CLIENT_ID: "<< secrets.AZURE_CLIENT_ID >>"
    AZURE_CLIENT_SECRET: "<< secrets.AZURE_CLIENT_SECRET >>"
  timeout: 120
secrets:
  AZURE_TENANT_ID: $AZURE_TENANT_ID
  AZURE_CLIENT_ID: $AZURE_CLIENT_ID
  AZURE_CLIENT_SECRET: $AZURE_CLIENT_SECRET

  • !include "./run_inference.py" inlines the snippet at registration time.
  • << secrets.* >> references server-side secrets; the secrets block uploads them from your .env.
  • rate_limit: 15 keeps concurrency moderate — lower it if you see throttling or timeouts.

app.yaml

The model needs an app to live in:

display_name: "Azure AI Foundry App"
key: "azure-foundry-app"
tags: ["Agents", "Azure"]
description: >
  Custom-inference integration for an Azure AI Foundry agent, exposing its
  OpenAI-compatible Responses API as an AI GO! model with Open Responses traces.

.env

AZURE_AI_PROJECT_ENDPOINT=https://<your-resource>.services.ai.azure.com/api/projects/<project>
AZURE_FOUNDRY_AGENT_NAME=<your-agent-name>
AZURE_TENANT_ID=<sp-tenant-id>
AZURE_CLIENT_ID=<sp-client-id>
AZURE_CLIENT_SECRET=<sp-client-secret>

Step 4: Register and test

# Create and switch to the app
lf add app -f app.yaml
lf switch azure-foundry-app

# Register the model (uploads the secrets and inlines run_inference.py)
lf add model -f model.yaml

# Verify the agent is reachable and returns well-formed Open Responses
lf test model customer-support-agent-azure-foundry

lf test model sends a single "Hello!" turn and shows each pipeline stage. A successful run ends with the parsed Open Responses output:

3. Running inference.
   Status code: 200
4. Transforming model output.
   {"items":[{"id":"...","status":"completed","role":"assistant",
              "content":[{"text":"Hello! How can I help today?","annotations":[]}],
              "conversation_id":"conv_baf9205d..."}],
    "usage":{"num_completion_tokens":12,"num_prompt_tokens":424}}
   ...
   output="items=[Message(type='message', ..., role=<MessageRole.assistant>,
            content=[OutputTextContent(text='Hello! How can I help today?', ...)],
            conversation_id='conv_baf9205d...')] usage=ModelUsage(...)"
Successfully tested configuration of model with key 'customer-support-agent-azure-foundry'.

Two things confirm the integration is correct:

  • The assistant reply parses as a Message with role=assistant.
  • A conversation_id is present on the message — the value the next turn reads back to continue the same Foundry conversation.

Your model is now registered and can be pointed at any multi-turn evaluation.


Adapting this pattern

The integration generalizes to any stateful agent runtime.

Different agent platforms

Swap the SDK calls in query_model for your platform's API and adjust the converter to read its output shape. Keep the contract identical: send only the new user turn plus a conversation id, and return the current turn's trace items.

Single-turn agents

If your agent is stateless, drop the conversation_id machinery entirely: convert_user_input returns just the user message, query_model makes one call, and build_assistant_message omits the conversation_id extra field.

Carrying state other than a conversation id

This echo mechanism works for any opaque state. Whatever you attach to the assistant message (a session token, a cursor, a serialized memory blob) comes back on the next request — read it in convert_user_input and forward it to your endpoint.
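
As an illustration of carrying a richer blob, the sketch below packs a small state dict into a single string field on the assistant message and recovers it on the next request. It works on plain dicts; the field name agent_state and both helper names are hypothetical, and base64-encoded JSON is just one possible encoding.

```python
import base64
import json
from typing import Any

STATE_FIELD = "agent_state"  # hypothetical extra-field name


def attach_state(message: dict[str, Any], state: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the assistant message with the state packed into one string field."""
    packed = base64.urlsafe_b64encode(json.dumps(state).encode("utf-8")).decode("ascii")
    return {**message, STATE_FIELD: packed}


def read_state(messages: list[dict[str, Any]]) -> dict[str, Any]:
    """Recover the state from the most recent assistant message, or {} on the first turn."""
    for message in reversed(messages):
        if message.get("role") == "assistant":
            packed = message.get(STATE_FIELD)
            if not packed:
                return {}
            return json.loads(base64.urlsafe_b64decode(packed.encode("ascii")))
    return {}


reply = attach_state(
    {"role": "assistant", "content": "Page 1 of results."},
    {"session": "abc123", "cursor": 20},
)
history = [{"role": "user", "content": "List orders"}, reply,
           {"role": "user", "content": "Next page"}]
print(read_state(history))  # prints {'session': 'abc123', 'cursor': 20}
```

Keeping the state in one opaque string avoids depending on how nested extra fields are serialized; the state is visible to anything that can read the transcript, so it should not contain secrets.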

Plugging into an evaluation

Because the snippet emits full traces (tool calls, outputs, and replies), the model drops straight into trace-aware evaluations — multi-turn solvers, function-call-coverage scorers, or model-as-a-judge scorers over open_responses traces. Point a task_specification at this model key and run lf run -f run.yaml.