Claude Managed Agents
Just want to use the integration? If you only need to plug a Claude Managed Agents deployment into AI GO!, you can use the ready-made integration directly — see the full integration in our registry and GitHub repo:
<REGISTRY_URL>/<GITHUB_URL>. This tutorial walks through how that integration is built so you can adapt it to your own stateful agent.
This tutorial demonstrates how to connect a Claude Managed Agents deployment to AI GO! as a custom-inference model, so it can be evaluated like any other model — while preserving conversation state across turns.
The hard part of integrating a stateful agent is not calling the endpoint once; it is making multi-turn conversations work. Claude Managed Agents keeps conversation state server-side in a session, and AI GO! drives the conversation one user turn at a time. The integration must keep both sides in sync so that turn 3 remembers what happened in turns 1 and 2.
This pattern applies to any stateful agent runtime — Claude Managed Agents, LangGraph, a custom agent server — where the conversation lives behind a session identifier and each call returns a multi-step trace.
What you will build
A complete custom-inference model integration that:
- Forwards each user turn to a Claude Managed Agents deployment via the Anthropic SDK (client.beta.sessions)
- Maintains conversation continuity across turns by round-tripping the Managed Agents session_id
- Returns the full per-turn agent trace (MCP tool calls, tool outputs, and the final reply) in AI GO!'s Open Responses format, ready for trace-aware scorers
By the end, you will have a model you can register, test, and point any multi-turn evaluation at.
Step 1: Understand the integration
The multi-turn challenge
AI GO! sends a chat-completion request for every turn. The body contains the whole conversation so far; the last entry is the current user turn:
{
  "messages": [
    { "role": "user", "content": "Hi, can you help me see my orders?" },
    { "role": "assistant", "content": "Sure! What's your email and order ID?",
      "session_id": "sesn_01ABcDeFgHiJkLmNoPqRsTuV" },
    { "role": "user", "content": "[email protected], order ORD-1001" }
  ]
}
Claude Managed Agents, however, does not want the whole history replayed, because it already stores it in the session. It only wants the new user message, plus the session_id that identifies the conversation.
The mechanism for keeping both sides in sync is passing the session_id through the conversation itself. AI GO! messages allow extra fields, and any custom field we set on an assistant message is echoed back unchanged on the next turn — this is the supported way to carry custom data between requests. So on each response we attach the session_id to the assistant message; on the next request we read it back and reuse the same session. The first turn (no prior assistant message) creates a fresh session.
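The read-back half of this mechanism can be sketched on plain dictionaries, without the typed request models. This is a minimal illustration that assumes messages arrive shaped like the request body above; `find_session_id` is a hypothetical helper name used only here, not part of the integration.

```python
from typing import Any


def find_session_id(messages: list[dict[str, Any]]) -> str:
    """Return the session_id echoed on the most recent assistant message.

    An empty string means there is no prior assistant turn, so a new
    session has to be created.
    """
    for message in reversed(messages):
        if message.get("role") == "assistant":
            return message.get("session_id") or ""
    return ""


first_turn = [{"role": "user", "content": "Search the catalog for shoes."}]
later_turn = [
    {"role": "user", "content": "Hello!"},
    {
        "role": "assistant",
        "content": "Hello! How can I assist you today?",
        "session_id": "sesn_01ABcDeFgHiJkLmNoPqRsTuV",
    },
    {"role": "user", "content": "What is my session id?"},
]

print(repr(find_session_id(first_turn)))  # empty: create a session
print(find_session_id(later_turn))        # reuse the echoed session
```

Only the latest assistant message is consulted, so a session id attached to an older turn never overrides a newer one.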
Example request bodies (AI GO! → run_inference)
First turn — just the new user message:
{
  "messages": [
    { "role": "user", "content": "Search the catalog for shoes." }
  ]
}
Subsequent turns echo the prior assistant message with its session_id, so we keep using the same Managed Agents session:
{
  "messages": [
    { "role": "user", "content": "Hello!" },
    { "role": "assistant", "content": "Hello! How can I assist you today?",
      "session_id": "sesn_01ABcDeFgHiJkLmNoPqRsTuV" },
    { "role": "user", "content": "What is my session id?" }
  ]
}
The three-stage pipeline
A single inference runs as a three-stage pipeline:
ChatCompletionInput (from AI GO!)
  -> convert_user_input   -> ModelInput
  -> query_model          -> RawModelOutput
  -> convert_model_output -> OpenResponsesModelOutput
run_inference ties the stages together: parse the request, convert it to what the session API accepts, drive the session, and convert the resulting event stream back into the Open Responses format AI GO! expects.
The output shape
Claude Managed Agents drives the turn as a stream of session events — tool calls, tool results, partial messages, and span/usage markers. We translate the events produced by the current turn into Open Responses trace items:
| Type | Description |
|---|---|
| message | A text message (here, the assistant's reply) |
| function_call | An MCP tool the agent invoked (name, arguments) |
| function_call_output | The result of a tool call, linked by call_id |
Emitting these items (rather than just the final text) is what lets downstream trace-aware scorers inspect how the agent reached its answer.
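To make the call_id link concrete, here is a small sketch of how a trace-aware consumer might pair each tool call with its result. It works on plain dictionaries shaped like the items in the table above; `pair_tool_calls` is a hypothetical helper for illustration, not an AI GO! API.

```python
from __future__ import annotations

from typing import Any


def pair_tool_calls(items: list[dict[str, Any]]) -> list[tuple[str, str, str | None]]:
    """Match each function_call with its function_call_output via call_id.

    Returns (tool name, JSON arguments, output) tuples in call order; the
    output is None when no matching result was emitted.
    """
    outputs = {
        item["call_id"]: item["output"]
        for item in items
        if item.get("type") == "function_call_output"
    }
    return [
        (item["name"], item["arguments"], outputs.get(item["call_id"]))
        for item in items
        if item.get("type") == "function_call"
    ]


trace = [
    {"type": "function_call", "call_id": "evt_01", "name": "search_products",
     "arguments": "{\"query\": \"shoes\"}"},
    {"type": "function_call_output", "call_id": "evt_01", "output": "[]"},
    {"type": "message", "role": "assistant",
     "content": [{"type": "output_text", "text": "No shoes found."}]},
]

print(pair_tool_calls(trace))
```

A scorer that only received the final message could not tell whether search_products was ever invoked; with the paired items it can check both the arguments and the result.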
Step 2: Build the inference handler
The integration lives in a single Python file, run_inference.py, that defines a run_inference(body, environment) function. AI GO! calls it once per turn.
The entry point
run_inference is the function AI GO! calls. It wires the three stages together — parse and convert the request, drive the session, and convert the resulting events back into the Open Responses format AI GO! expects:
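import json
from typing import Any


def run_inference(body: str, environment: dict[str, Any]) -> str:
    # Stage 1: parse the request and keep only what the session API needs.
    model_input = convert_user_input(
        ChatCompletionInput.model_validate(json.loads(body))
    )
    # Stage 2: drive the Managed Agents session for this turn.
    raw = query_model(model_input, environment)
    # Stage 3: translate the session events into Open Responses items.
    model_output = convert_model_output(raw)
    return model_output.model_dump_json()
The rest of this step implements each stage in turn.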
Model-side types
We model the two intermediate payloads explicitly so the data flow stays legible:
from pydantic import BaseModel


class ModelInput(BaseModel):
    """Request payload the Managed Agents session accepts."""

    session_id: str = ""  # empty on the first turn, reused afterwards
    user_message: str


class RawModelOutput(BaseModel):
    """The collected session events, plus the session id we used."""

    session_id: str
    events: list
Stage 1: convert_user_input
This is where multi-turn continuity is established. We pull the latest user message and recover the session_id echoed by the previous assistant turn:
def convert_user_input(data: ChatCompletionInput) -> ModelInput:
    messages = data.messages
    # The current turn is always the most recent user message.
    last_user = next(m for m in reversed(messages) if m.role == "user")
    # Recover the session id echoed on the latest assistant message, if any.
    session_id = ""
    for msg in reversed(messages):
        if msg.role == "assistant":
            session_id = getattr(msg, "session_id", "") or ""
            break
    return ModelInput(
        session_id=session_id, user_message=_message_text(last_user.content)
    )
On the first turn there is no prior assistant message, so session_id stays empty, which signals that a new session must be created.
Stage 2: query_model
Create a session on the first turn, send the new user message as a user.message event, then stream the session events until the turn goes idle:
import anthropic


def query_model(model_input: ModelInput, environment: dict[str, Any]) -> RawModelOutput:
    client = anthropic.Anthropic(api_key=environment["ANTHROPIC_API_KEY"])
    session_id = model_input.session_id
    if not session_id:
        # First turn: no echoed session id, so start a fresh session.
        session_id = _create_session(client, environment)
    events: list[dict[str, Any]] = []
    # The stream is opened first; the user message is sent while it is open.
    with client.beta.sessions.events.stream(session_id) as stream:
        client.beta.sessions.events.send(
            session_id,
            events=[
                {
                    "type": "user.message",
                    "content": [{"type": "text", "text": model_input.user_message}],
                }
            ],
        )
        for event in stream:
            events.append(_serialise_event(event))
            if event.type == "session.status_idle":
                stop_reason = getattr(event, "stop_reason", None)
                stop_type = getattr(stop_reason, "type", None)
                # _TERMINAL_STOP_REASONS is a module-level constant (not shown here).
                if stop_type in _TERMINAL_STOP_REASONS:
                    break
    # Coordinator deployments: pull tool events from any referenced subthreads.
    for thread_id in _collect_subthread_ids(events):
        events.extend(_fetch_subthread_tool_events(client, session_id, thread_id))
    return RawModelOutput(session_id=session_id, events=events)
We send only the new user message; the agent appends it to the session it already holds. The Anthropic SDK auto-attaches the anthropic-beta: managed-agents-... header for any call through client.beta.sessions.*.
Coordinator topologies. A multi-agent (coordinator) deployment runs its tool calls on a spawned specialist's subthread, not on the primary stream. After the turn goes idle, we collect any referenced subthread ids and pull their tool-call events so the trace reflects what actually ran. For single-agent deployments this is a no-op.
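The stop condition inside the streaming loop of query_model is easy to get subtly wrong, so it may help to see it in isolation. The sketch below replays a fake event list instead of a live stream. The contents of _TERMINAL_STOP_REASONS are an assumption: "end_turn" is the only stop reason shown in this tutorial, and "example_non_terminal" is a made-up value used purely to show that a non-terminal idle event does not end the turn.

```python
from typing import Any, Iterable

# Assumption: only "end_turn" appears in this tutorial's example payload;
# the real constant in run_inference.py may list additional values.
_TERMINAL_STOP_REASONS = {"end_turn"}


def collect_until_idle(stream: Iterable[dict[str, Any]]) -> list[dict[str, Any]]:
    """Collect events until an idle event carries a terminal stop reason."""
    events: list[dict[str, Any]] = []
    for event in stream:
        events.append(event)
        if event.get("type") == "session.status_idle":
            stop_type = (event.get("stop_reason") or {}).get("type")
            if stop_type in _TERMINAL_STOP_REASONS:
                break
    return events


fake_stream = [
    {"type": "agent.message", "content": [{"type": "text", "text": "Working..."}]},
    # Hypothetical non-terminal idle: the loop keeps reading past it.
    {"type": "session.status_idle", "stop_reason": {"type": "example_non_terminal"}},
    {"type": "agent.message", "content": [{"type": "text", "text": "Done."}]},
    {"type": "session.status_idle", "stop_reason": {"type": "end_turn"}},
    {"type": "agent.message", "content": [{"type": "text", "text": "not read"}]},
]

collected = collect_until_idle(fake_stream)
print(len(collected), collected[-1]["stop_reason"]["type"])
```

The terminating idle event is kept in the collected list, which matches the example raw output where session.status_idle is the last entry.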
Session helpers (_create_session, subthread enrichment)
def _create_session(client: anthropic.Anthropic, environment: dict[str, Any]) -> str:
    session = client.beta.sessions.create(
        agent=environment["CLAUDE_AGENT_ID"],
        environment_id=environment["CLAUDE_ENVIRONMENT_ID"],
        vault_ids=[environment["CLAUDE_VAULT_ID"]],
    )
    return session.id


def _serialise_event(event: Any) -> dict[str, Any]:
    """Convert an SDK event model to a plain dict for the raw payload."""
    return event.model_dump(mode="json")


def _collect_subthread_ids(events: list[dict[str, Any]]) -> list[str]:
    """Return all subthread ids referenced from the primary thread, in order."""
    seen: list[str] = []
    for event in events:
        thread_id = (
            event.get("session_thread_id")
            or event.get("to_session_thread_id")
            or event.get("from_session_thread_id")
        )
        if thread_id and thread_id not in seen:
            seen.append(thread_id)
    return seen


def _fetch_subthread_tool_events(
    client: anthropic.Anthropic, session_id: str, thread_id: str
) -> list[dict[str, Any]]:
    """List a subthread's tool-call events (use + result) for trace enrichment."""
    return [
        _serialise_event(event)
        for event in client.beta.sessions.threads.events.list(
            thread_id, session_id=session_id
        )
        if getattr(event, "type", None) in _SUBTHREAD_TOOL_EVENT_TYPES
    ]
Example raw model output (Managed Agents → query_model)
query_model returns the collected event stream plus the session_id we used, so convert_model_output can echo it back:
{
  "session_id": "sesn_01ABcDeFgHiJkLmNoPqRsTuV",
  "events": [
    { "type": "agent.mcp_tool_use", "id": "evt_01...",
      "name": "search_products", "input": { "query": "shoes" },
      "mcp_server_name": "retail-tools" },
    { "type": "agent.mcp_tool_result", "id": "evt_02...",
      "mcp_tool_use_id": "evt_01...",
      "content": [{ "type": "text", "text": "[]" }],
      "is_error": false },
    { "type": "agent.message", "id": "evt_03...",
      "content": [{ "type": "text",
                    "text": "I couldn't find any shoes in the catalog..." }] },
    { "type": "span.model_request_end", "id": "evt_04...",
      "model_usage": { "input_tokens": 201, "output_tokens": 22 } },
    { "type": "session.status_idle", "id": "evt_05...",
      "stop_reason": { "type": "end_turn" } }
  ]
}
Stage 3: convert_model_output
The conversion is a one-liner; all of the work lives in the converter, which consumes the event stream directly:
def convert_model_output(raw: RawModelOutput) -> OpenResponsesModelOutput:
    return OpenResponsesConverter().build(raw.events, raw.session_id)
The Open Responses converter
build walks the events in stream order: each MCP tool-use event becomes a function_call, each tool-result event becomes a function_call_output, the agent.message text chunks are joined, and per-request usage is summed. The assistant message is appended last and carries the session_id to round-trip:
class OpenResponsesConverter:
    def build(
        self, events: list[dict[str, Any]], session_id: str = "", **kwargs: Any
    ) -> OpenResponsesModelOutput:
        items: list[TraceItem] = []
        num_prompt_tokens = 0
        num_completion_tokens = 0
        answer_chunks: list[str] = []
        for event in events:
            event_type = event.get("type")
            if event_type == "agent.mcp_tool_use":
                items.append(self.build_function_call(event))
            elif event_type == "agent.mcp_tool_result":
                items.append(self.build_function_call_output(event))
            elif event_type == "agent.message":
                chunk = self._join_text_blocks(event.get("content"))
                if chunk:
                    answer_chunks.append(chunk)
            elif event_type == "span.model_request_end":
                # Usage is reported per model request; sum it over the turn.
                usage = event.get("model_usage") or {}
                num_prompt_tokens += usage.get("input_tokens", 0) or 0
                num_completion_tokens += usage.get("output_tokens", 0) or 0
        # The assistant message goes last and carries the session id.
        items.append(self.build_assistant_message("".join(answer_chunks), session_id))
        return OpenResponsesModelOutput(
            items=items,
            usage=self.build_usage(num_prompt_tokens, num_completion_tokens),
        )
The per-item builders construct each trace item. The session_id to round-trip is attached to the assistant message as an extra field:
    def build_function_call(self, event: dict[str, Any]) -> FunctionCall:
        return FunctionCall(
            id=str(uuid.uuid4()),
            call_id=event["id"],
            name=event["name"],
            arguments=json.dumps(event.get("input") or {}),
            status=FunctionCallStatus.completed,
        )

    def build_function_call_output(self, event: dict[str, Any]) -> FunctionCallOutput:
        return FunctionCallOutput(
            id=str(uuid.uuid4()),
            call_id=event["mcp_tool_use_id"],
            output=self._tool_result_output(event),
            status=FunctionCallOutputStatusEnum.completed,
        )

    def build_assistant_message(self, text: str, session_id: str) -> Message:
        return Message(
            id=str(uuid.uuid4()),
            status=MessageStatus.completed,
            role=MessageRole.assistant,
            content=[OutputTextContent(text=text, annotations=[])],
            # Carried as an extra field so the next turn can reuse the session.
            session_id=session_id,
        )
Flattening MCP tool content (_join_text_blocks, _tool_result_output)
MCP messages carry content as a list of typed blocks. These helpers flatten the text blocks into the single strings the Open Responses items expect:
    @staticmethod
    def _join_text_blocks(blocks: list[dict[str, Any]] | None) -> str:
        if not blocks:
            return ""
        parts: list[str] = []
        for block in blocks:
            if isinstance(block, dict) and block.get("type") == "text":
                parts.append(block.get("text") or "")
        return "".join(parts)

    @classmethod
    def _tool_result_output(cls, event: dict[str, Any]) -> str:
        text = cls._join_text_blocks(event.get("content"))
        if text:
            return text
        # No text blocks: fall back to a JSON dump of the raw content.
        if event.get("is_error"):
            return json.dumps({"error": True, "content": event.get("content")})
        return json.dumps(event.get("content") or [])
Example Open Responses output (convert_model_output)
The converted result returned to AI GO! — tool call, tool output, and the final assistant message carrying the session_id for the next turn:
{
  "items": [
    { "type": "function_call", "id": "5bf9...", "call_id": "evt_01...",
      "name": "search_products", "arguments": "{\"query\": \"shoes\"}",
      "status": "completed" },
    { "type": "function_call_output", "id": "9da4...",
      "call_id": "evt_01...", "output": "[]", "status": "completed" },
    { "type": "message", "id": "44c2...", "status": "completed",
      "role": "assistant",
      "content": [{ "type": "output_text",
                    "text": "I couldn't find any shoes in the catalog...",
                    "annotations": [] }],
      "session_id": "sesn_01ABcDeFgHiJkLmNoPqRsTuV" }
  ],
  "usage": { "num_prompt_tokens": 201, "num_completion_tokens": 22 }
}
Step 3: Wire up the model
Three small files connect the snippet to AI GO!.
model.yaml
The model uses connection_type: custom_inference with the identity chat-completion adapter (the snippet already returns Open Responses, so no adapter transform is needed). The agent/environment/vault ids and the API key are injected via environment:
display_name: "Customer Support Agent (Claude Managed Agents)"
key: "customer-support-agent-claude"
description: >-
  A customer-support agent deployed on Claude Managed Agents, driven via the
  beta sessions API.
rate_limit: 5
task: "chat_completion"
config:
  connection_type: "custom_inference"
  adapter:
    key: "latticeflow$identity_chat_completion"
  run_inference_snippet: !include "./run_inference.py"
  environment:
    CLAUDE_AGENT_ID: $CLAUDE_AGENT_ID
    CLAUDE_ENVIRONMENT_ID: $CLAUDE_ENVIRONMENT_ID
    CLAUDE_VAULT_ID: $CLAUDE_VAULT_ID
    ANTHROPIC_API_KEY: "<< secrets.ANTHROPIC_API_KEY >>"
  timeout: 180
secrets:
  ANTHROPIC_API_KEY: $ANTHROPIC_API_KEY
- !include "./run_inference.py" inlines the snippet at registration time.
- << secrets.ANTHROPIC_API_KEY >> references a server-side secret; the secrets block uploads it from $ANTHROPIC_API_KEY in your .env.
- rate_limit: 5 keeps concurrency low, since stateful agent sessions rarely tolerate high request volume. Lower it further if you see timeouts.
app.yaml
The model needs an app to live in:
display_name: "Claude Managed Agents App"
key: "claude-managed-agents-app"
tags: ["Agents", "Claude"]
description: >
  Custom-inference integration for a Claude Managed Agents deployment, exposing
  its session API as an AI GO! model with Open Responses traces.
.env
ANTHROPIC_API_KEY=sk-ant-...
CLAUDE_AGENT_ID=agent_...
CLAUDE_ENVIRONMENT_ID=env_...
CLAUDE_VAULT_ID=vlt_...
Step 4: Register and test
# Create and switch to the app
lf add app -f app.yaml
lf switch claude-managed-agents-app
# Register the model (uploads the secret and inlines run_inference.py)
lf add model -f model.yaml
# Verify the session API is reachable and returns well-formed Open Responses
lf test model customer-support-agent-claude
lf test model sends a single "Hello!" turn and shows each pipeline stage. A successful run ends with the parsed Open Responses output:
3. Running inference.
Status code: 200
4. Transforming model output.
{"items":[{"id":"...","status":"completed","role":"assistant",
"content":[{"text":"Hi there! How can I help you today? ...","annotations":[]}],
"session_id":"sesn_015T6..."}],
"usage":{"num_completion_tokens":49,"num_prompt_tokens":6}}
...
output="items=[Message(type='message', ..., role=<MessageRole.assistant>,
content=[OutputTextContent(text='Hi there! How can I help you today? ...', ...)],
session_id='sesn_015T6...')] usage=ModelUsage(...)"
Successfully tested configuration of model with key 'customer-support-agent-claude'.
Two things confirm the integration is correct:
- The assistant reply parses as a Message with role=assistant.
- A session_id is present on the message: the value the next turn reads back to continue the same Managed Agents session.
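The same two checks can be automated against the JSON string that run_inference returns. The sketch below is a hypothetical smoke check, not part of the lf CLI; the sample payload mirrors the shape of the test output above, with made-up id values.

```python
import json


def check_round_trip_fields(output_json: str) -> bool:
    """True if the last item is an assistant message with a non-empty session_id."""
    items = json.loads(output_json).get("items") or []
    if not items:
        return False
    last = items[-1]
    return last.get("role") == "assistant" and bool(last.get("session_id"))


# Sample payload with placeholder ids, shaped like the lf test model output.
sample = json.dumps({
    "items": [{
        "id": "example-item-id",
        "status": "completed",
        "role": "assistant",
        "content": [{"text": "Hi there! How can I help you today?", "annotations": []}],
        "session_id": "sesn_example",
    }],
    "usage": {"num_completion_tokens": 49, "num_prompt_tokens": 6},
})

print(check_round_trip_fields(sample))
```

The check looks at the last item because the converter appends the assistant message after any tool-call items.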
Your model is now registered and can be pointed at any multi-turn evaluation.
Adapting this pattern
The integration generalizes to any stateful agent runtime.
Different agent platforms
Swap the SDK calls in query_model for your platform's API and adjust the converter to read its event/message shape. Keep the contract identical: send only the new user turn + a session id, return the current turn's trace items.
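One way to keep that contract explicit while swapping platforms is to hide the runtime behind a small interface. The sketch below is an assumption about how this could be organised, not how the registry integration is written: SessionBackend, EchoBackend, and run_turn are hypothetical names, and EchoBackend is a toy in-memory runtime standing in for a real agent server.

```python
from typing import Any, Protocol


class SessionBackend(Protocol):
    """What any stateful runtime must offer to fit the pattern."""

    def create_session(self) -> str: ...

    def send_turn(self, session_id: str, user_message: str) -> list[dict[str, Any]]: ...


class EchoBackend:
    """Toy runtime: keeps per-session history in memory and echoes the turn."""

    def __init__(self) -> None:
        self.history: dict[str, list[str]] = {}

    def create_session(self) -> str:
        session_id = f"sess_{len(self.history) + 1}"
        self.history[session_id] = []
        return session_id

    def send_turn(self, session_id: str, user_message: str) -> list[dict[str, Any]]:
        self.history[session_id].append(user_message)
        turn = len(self.history[session_id])
        return [{"type": "message", "text": f"turn {turn}: {user_message}"}]


def run_turn(
    backend: SessionBackend, session_id: str, user_message: str
) -> tuple[str, list[dict[str, Any]]]:
    """Send only the new user turn; create a session when none was echoed."""
    session_id = session_id or backend.create_session()
    return session_id, backend.send_turn(session_id, user_message)


backend = EchoBackend()
sid, items1 = run_turn(backend, "", "Hello!")
sid2, items2 = run_turn(backend, sid, "Again")
print(sid, items2[0]["text"])
```

The second call reuses the id returned by the first, so the backend sees it as turn 2 of the same conversation even though only one message was sent each time.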
Single-turn agents
If your agent is stateless, drop the session_id machinery entirely: convert_user_input returns just the user message, query_model makes one call, and build_assistant_message omits the session_id extra field.
Carrying state other than a session id
This echo mechanism works for any opaque state. Whatever you attach to the assistant message (a session token, a cursor, a serialized memory blob) comes back on the next request — read it in convert_user_input and forward it to your endpoint.
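As an illustration of carrying something other than a session id, the sketch below serializes a small state dictionary onto the assistant message and reads it back from the next request. The field name agent_state and both helper names are hypothetical, and plain dictionaries stand in for the typed message models.

```python
import json
from typing import Any


def attach_state(assistant_message: dict[str, Any], state: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the assistant message carrying state as an extra field."""
    return {**assistant_message, "agent_state": json.dumps(state)}


def read_state(messages: list[dict[str, Any]]) -> dict[str, Any]:
    """Recover the state echoed on the most recent assistant message."""
    for message in reversed(messages):
        if message.get("role") == "assistant":
            raw = message.get("agent_state")
            return json.loads(raw) if raw else {}
    return {}


reply = attach_state(
    {"role": "assistant", "content": "Here is page 1 of the results."},
    {"cursor": "page-2", "retries": 0},
)
next_request = [
    {"role": "user", "content": "Search the catalog for shoes."},
    reply,
    {"role": "user", "content": "Next page, please."},
]

print(read_state(next_request))
```

Serializing the state to a string keeps the extra field a single opaque value, which is the form the tutorial's session_id already takes.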
Coordinator / multi-agent topologies
When an agent delegates to specialists, its tool calls happen on spawned subthreads. Collecting those subthread tool events (as query_model does here) keeps the emitted trace faithful to what actually ran, so trace-aware scorers see the real tool usage rather than just the coordinator's final reply.
Plugging into an evaluation
Because the snippet emits full traces (tool calls, outputs, and replies), the model drops straight into trace-aware evaluations — multi-turn solvers, function-call-coverage scorers, or model-as-a-judge scorers over open_responses traces. Point a task_specification at this model key and run lf run -f run.yaml.
