Integrate Stateful Model Endpoint

This guide explains how to integrate a stateful model endpoint into AI GO! by creating a custom model adapter capable of managing the state across multiple interactions.

Stateful Endpoints

Stateful endpoints retain information from previous interactions. The sequence of interactions, together with its state, can be referenced by an identifier.

For example, OpenAI endpoints /chat/completions and /responses show the difference between a stateless and a stateful endpoint:

  • stateless /chat/completions: The entire conversation history has to be submitted at each turn of the conversation. The result is the new assistant message.
  • stateful /responses: Only the previous response ID and the new user messages have to be submitted at each turn of the conversation. The result is the new assistant message and a new response ID, which can be used to continue the conversation from this point.

Note: Models that are compatible with the stateful OpenAI Responses API can directly use the built-in latticeflow$openai_responses adapter, in which case there is no need to follow this guide.
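
To make the difference between the two endpoint styles concrete, the following Python sketch builds the request payload for the same third turn of a conversation in both styles. The function names, the model name "example-model", and the response ID "resp_123" are hypothetical and chosen only for illustration; the field names follow the description above.

```python
from typing import Optional


def build_stateless_request(model: str, history: list[dict]) -> dict:
    """Stateless style: every turn resends the whole conversation."""
    return {"model": model, "messages": list(history)}


def build_stateful_request(
    model: str,
    new_messages: list[dict],
    previous_response_id: Optional[str] = None,
) -> dict:
    """Stateful style: send only the new messages plus a reference to the state."""
    request = {"model": model, "input": list(new_messages)}
    if previous_response_id is not None:
        # Omitted on the first turn, when no previous response exists.
        request["previous_response_id"] = previous_response_id
    return request


history = [
    {"role": "user", "content": "My name is Ada."},
    {"role": "assistant", "content": "Nice to meet you, Ada."},
    {"role": "user", "content": "What is my name?"},
]

# Hypothetical model name and response ID, for illustration only.
stateless = build_stateless_request("example-model", history)
stateful = build_stateful_request("example-model", history[-1:], "resp_123")
```

The stateless payload grows with every turn, while the stateful payload stays the size of the new messages. This is why the adapter has to track the response ID between turns.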

Breakdown of a Simple Stateful Model Adapter

We explain how a stateful model adapter can be defined and highlight the parts that handle the state. For clarity, we make the following simplifying assumptions:

  • The conversation starts with a single user message (and no system message).
  • In each turn of the conversation, the user and the assistant produce exactly one message.
  • In the API response, the last output will contain the assistant response (and not some reasoning, tool call, etc.).

See the section Breakdown of OpenAI Responses Adapter below for a real-world example.

Input Adapter

The input adapter always receives all previous messages from the solver, including all past system, user, and assistant messages. To construct the next request, we have to find the previous response ID (if any) and identify which messages are new in this turn of the conversation.

Due to our simplifying assumptions, the structure of the conversation will always be either:

  • new conversation: [{"role": "user", ...}]
  • existing conversation: [{"role": "user", ...}, ..., {"role": "assistant", ...}, {"role": "user", ...}]

The request body is then built in two steps:

  1. Send the previous response ID, taken from the last assistant message. This step is skipped at the start of a new conversation (indicated by there being only a single message), because no previous response ID exists yet.
    {
      ...
      {% if input.messages|length > 1 %}
      "previous_response_id": {{ input.messages[-2].response_id | tojson }},
      {% endif %}
      ...
    }
  2. Send the new messages. In our case this is just the latest message.
    {
      ...
      "input": [
        {
          "role": "{{ input.messages[-1].role }}",
          "content": {{ input.messages[-1].content | tojson }}
        }
      ]
    }
    
Full Implementation
{
  "model": "{{ model_info.model_key }}",
  {% if input.messages|length > 1 %}
  "previous_response_id": {{ input.messages[-2].response_id | tojson }},
  {% endif %}
  "input": [
    {
      "role": "{{ input.messages[-1].role }}",
      "content": {{ input.messages[-1].content | tojson }}
    }
  ]
}
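
For readers who find the template logic easier to follow as ordinary code, here is a minimal Python sketch of what the template above computes under the same simplifying assumptions. The function name build_simple_request and the sample values are hypothetical; the sketch is not part of the adapter itself.

```python
import json


def build_simple_request(model_key: str, messages: list[dict]) -> str:
    """Mirror the simple input adapter template as plain Python."""
    request = {"model": model_key}
    if len(messages) > 1:
        # Existing conversation: messages[-2] is the last assistant message,
        # which carries the response ID stored by the output adapter.
        request["previous_response_id"] = messages[-2]["response_id"]
    latest = messages[-1]
    # Only the latest message is new in this turn.
    request["input"] = [{"role": latest["role"], "content": latest["content"]}]
    return json.dumps(request)


new_conversation = [{"role": "user", "content": "Hello"}]
existing_conversation = [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi!", "response_id": "resp_1"},
    {"role": "user", "content": "How are you?"},
]

first_request = json.loads(build_simple_request("example-model", new_conversation))
second_request = json.loads(build_simple_request("example-model", existing_conversation))
```

Note that indexing with -2 is only valid because of the assumption that user and assistant messages strictly alternate.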

Output Adapter

Upon receiving the endpoint response, we extract the assistant response and include the response ID as part of the message. It is essential that this ID is included as part of the message, as this makes it available to the input adapter when handling the next request.

{
  "choices": [
    {
      "message": {
        "response_id": {{ response.id | tojson }},
        ...
      }
    }
  ]
}
Full Implementation
{% set response = body | fromjson %}
{% set message = response.output | selectattr("type", "equalto", "message") | list | last %}
{
  "choices": [
    {
      "message": {
        "response_id": {{ response.id | tojson }},
        "role": "assistant",
        "content": {{ (message.content | last).text | tojson }}
      }
    }
  ]
}
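
The same extraction can be sketched in plain Python. The function name adapt_simple_response and the sample response body are hypothetical; the body contains only the fields that the template above reads (id, output, type, content, text) and is not a complete endpoint response.

```python
import json


def adapt_simple_response(body: str) -> dict:
    """Mirror the simple output adapter template as plain Python."""
    response = json.loads(body)
    # Keep only outputs of type "message" and take the last one.
    messages = [item for item in response["output"] if item.get("type") == "message"]
    message = messages[-1]
    return {
        "choices": [
            {
                "message": {
                    # Stored on the message so the input adapter can read it next turn.
                    "response_id": response["id"],
                    "role": "assistant",
                    "content": message["content"][-1]["text"],
                }
            }
        ]
    }


# Hypothetical response body, reduced to the fields the adapter reads.
sample_body = json.dumps(
    {
        "id": "resp_1",
        "output": [
            {"type": "reasoning", "summary": []},
            {"type": "message", "content": [{"type": "output_text", "text": "Hello!"}]},
        ],
    }
)
adapted = adapt_simple_response(sample_body)
```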

Testing the Adapter

  1. Test the model as usual, as described here.
  2. Check that the state is passed correctly between turns by verifying that the model can recall information from previous messages. The conversation memory evaluation can be used for this.
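
Before testing against a real endpoint, the state handling can also be sanity-checked offline. The sketch below is one possible way to do this, assuming a hypothetical in-memory FakeStatefulEndpoint that stands in for the real API and plain Python versions of the simple input and output adapter logic. None of these names are part of AI GO!.

```python
import json


class FakeStatefulEndpoint:
    """Hypothetical in-memory stand-in for a stateful endpoint (not a real API)."""

    def __init__(self):
        self._histories = {}  # response ID -> user contents seen so far

    def handle(self, request: dict) -> str:
        previous_id = request.get("previous_response_id")
        seen = list(self._histories.get(previous_id, []))
        seen += [m["content"] for m in request["input"] if m["role"] == "user"]
        response_id = f"resp_{len(self._histories) + 1}"
        self._histories[response_id] = seen
        text = "User messages so far: " + " | ".join(seen)
        return json.dumps(
            {
                "id": response_id,
                "output": [
                    {"type": "message", "content": [{"type": "output_text", "text": text}]}
                ],
            }
        )


def build_request(messages: list[dict]) -> dict:
    """Simple input adapter logic: previous response ID plus the latest message."""
    request = {"model": "example-model"}
    if len(messages) > 1:
        request["previous_response_id"] = messages[-2]["response_id"]
    latest = messages[-1]
    request["input"] = [{"role": latest["role"], "content": latest["content"]}]
    return request


def adapt_response(body: str) -> dict:
    """Simple output adapter logic: assistant message carrying the response ID."""
    response = json.loads(body)
    message = [o for o in response["output"] if o["type"] == "message"][-1]
    return {
        "response_id": response["id"],
        "role": "assistant",
        "content": message["content"][-1]["text"],
    }


def run_conversation(endpoint: FakeStatefulEndpoint, user_turns: list[str]) -> list[dict]:
    """Play the role of the solver: append each user turn and the adapted reply."""
    messages = []
    for content in user_turns:
        messages.append({"role": "user", "content": content})
        messages.append(adapt_response(endpoint.handle(build_request(messages))))
    return messages


conversation = run_conversation(
    FakeStatefulEndpoint(), ["My name is Ada.", "What is my name?"]
)
```

If the response ID were not propagated, the second reply of the fake endpoint would contain only the second user message, so checking that the first message appears in the final reply verifies the round trip.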

Breakdown of OpenAI Responses Adapter

We use the OpenAI Responses endpoint to demonstrate how a real-world model adapter can be implemented. A similar approach can be used for other stateful endpoints.

Input Adapter

The input adapter always receives all previous messages from the solver, including all past system, user, and assistant messages. To construct the next request, we have to find the previous response ID (if any) and identify which messages are new in this turn of the conversation.

  1. Find the last assistant response, if any. Its index is used in the next two steps. If this is the beginning of a new conversation, there will be no such response.
    {% set ns = namespace(last_response_index=none) %}
    {% for message in input.messages %}
      {% if message.response_id is defined %}
        {% set ns.last_response_index = loop.index0 %}
      {% endif %}
    {% endfor %}
  2. Pass the previous response ID as part of the request body. We extract this from the last assistant response (identified in Step 1). If this is the beginning of a new conversation, we do not pass this field.
    {% if ns.last_response_index is not none %}
    "previous_response_id": {{ input.messages[ns.last_response_index].response_id | tojson }},
    {% endif %}
  3. Pass the new messages as part of the request body; specifically all messages that follow the last assistant response (identified in Step 1). If this is the beginning of a new conversation, we send all messages.
    "input": [
      {% for message in input.messages[ns.last_response_index + 1 if ns.last_response_index is not none else 0:] %}
      ...
      {% endfor %}
    ]
Full Implementation
{# Step 1 #}
{% set ns = namespace(last_response_index=none) %}
{% for message in input.messages %}
  {% if message.response_id is defined %}
    {% set ns.last_response_index = loop.index0 %}
  {% endif %}
{% endfor %}

{
  "model": "{{ model_info.model_key }}",
  {# Step 2 #}
  {% if ns.last_response_index is not none %}
  "previous_response_id": {{ input.messages[ns.last_response_index].response_id | tojson }},
  {% endif %}
  "input": [
    {# Step 3 #}
    {% for message in input.messages[ns.last_response_index + 1 if ns.last_response_index is not none else 0:] %}
    {
      "role": "{{ message.role }}",
      "content": {{ message.content | tojson }}
    }{% if not loop.last %},{% endif %}
    {% endfor %}
  ]
  {% if input.response_format is defined and input.response_format is not none %}
  ,"text": {"format": {{ input.response_format | tojson }}}
  {% endif %}
  {% for key, value in kwargs.items() %}
  ,"{{ key }}": {{ value | tojson }}
  {% endfor %}
}
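
The three steps can also be read as ordinary code. The following Python sketch mirrors the template above; the function names and sample messages are hypothetical, and the response_format and kwargs handling follows the template rather than any documented Python API.

```python
from typing import Optional


def find_last_response_index(messages: list[dict]) -> Optional[int]:
    """Step 1: index of the last message that carries a response ID, if any."""
    last_index = None
    for index, message in enumerate(messages):
        if "response_id" in message:
            last_index = index
    return last_index


def build_request(model_key: str, messages: list[dict], response_format=None, **kwargs) -> dict:
    last_index = find_last_response_index(messages)
    request = {"model": model_key}
    # Step 2: reference the stored state, unless this is a new conversation.
    if last_index is not None:
        request["previous_response_id"] = messages[last_index]["response_id"]
    # Step 3: send only the messages that follow the last assistant response.
    start = last_index + 1 if last_index is not None else 0
    request["input"] = [
        {"role": m["role"], "content": m["content"]} for m in messages[start:]
    ]
    if response_format is not None:
        request["text"] = {"format": response_format}
    request.update(kwargs)  # extra parameters are passed through unchanged
    return request


new_conversation = [
    {"role": "system", "content": "Be brief."},
    {"role": "user", "content": "Hello"},
]
existing_conversation = new_conversation + [
    {"role": "assistant", "content": "Hi!", "response_id": "resp_1"},
    {"role": "user", "content": "One more thing."},
    {"role": "user", "content": "What did I say first?"},
]

first_request = build_request("example-model", new_conversation)
next_request = build_request("example-model", existing_conversation, temperature=0)
```

Unlike the simple adapter, this version does not rely on strict alternation: a system message at the start and several consecutive user messages are both handled.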

Output Adapter

Upon receiving the endpoint response, we extract the assistant response and include the response ID as part of the message. It is essential that this ID is included as part of the message, as this makes it available to the input adapter when handling the next request.

{
  "choices": [
    {
      "message": {
        {# Include the response ID #}
        {% if response.id is defined %}
        "response_id": {{ response.id | tojson }},
        {% endif %}
        ...
      }
    }
  ]
}
Full Implementation
{% if status_code != 200 %}
{"error": "A non-200 status code ({{ status_code }}) was returned by the model."}
{% else %}
{% set response = body | fromjson %}

{% set output_text_contents = [] %}
{% set refusal_contents = [] %}
{% for message in response.output | selectattr("type", "equalto", "message") %}
{% set _ = output_text_contents.extend(message.content | selectattr("type", "equalto", "output_text")) %}
{% set _ = refusal_contents.extend(message.content | selectattr("type", "equalto", "refusal")) %}
{% endfor %}

{
"choices": [
  {
    "message": {
      {% if response.id is defined %}
      "response_id": {{ response.id | tojson }},
      {% endif %}
      {# The Responses API guarantees that the role is assistant for all outputs. #}
      "role": "assistant",
      "content": {{ (output_text_contents | map(attribute="text") | join(" ")) | tojson }}
      {% if refusal_contents | length > 0 %}
      ,"refusal": {{ (refusal_contents | map(attribute="refusal") | join(" ")) | tojson }}
      {% endif %}
    }
  }
]
{% if response.usage is defined and response.usage is not none %}
,"usage": {
  "num_prompt_tokens": {{ response.usage.input_tokens }},
  "num_completion_tokens": {{ response.usage.output_tokens }}
}
{% endif %}
}
{% endif %}
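
As a cross-check of the template above, the same logic can be sketched in plain Python. The function name adapt_response and the sample body are hypothetical; the sample contains only the fields that the template reads and is not a complete Responses API payload.

```python
import json


def adapt_response(status_code: int, body: str) -> dict:
    """Mirror the output adapter template as plain Python."""
    if status_code != 200:
        return {
            "error": f"A non-200 status code ({status_code}) was returned by the model."
        }
    response = json.loads(body)

    # Collect text and refusal parts from every output of type "message".
    output_texts, refusals = [], []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                output_texts.append(part["text"])
            elif part.get("type") == "refusal":
                refusals.append(part["refusal"])

    message = {}
    if "id" in response:
        # Stored on the message so the input adapter can find it next turn.
        message["response_id"] = response["id"]
    message["role"] = "assistant"
    message["content"] = " ".join(output_texts)
    if refusals:
        message["refusal"] = " ".join(refusals)

    result = {"choices": [{"message": message}]}
    usage = response.get("usage")
    if usage is not None:
        result["usage"] = {
            "num_prompt_tokens": usage["input_tokens"],
            "num_completion_tokens": usage["output_tokens"],
        }
    return result


# Hypothetical response body, reduced to the fields the adapter reads.
sample_body = json.dumps(
    {
        "id": "resp_7",
        "output": [
            {"type": "reasoning", "summary": []},
            {
                "type": "message",
                "content": [
                    {"type": "output_text", "text": "Part one."},
                    {"type": "output_text", "text": "Part two."},
                ],
            },
        ],
        "usage": {"input_tokens": 12, "output_tokens": 5},
    }
)
adapted = adapt_response(200, sample_body)
failed = adapt_response(500, "")
```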