Short version: expose OpenHands as a very capable OpenAI-compatible “model”. The client sends a normal Chat Completions request; the agent-server resolves a profile, runs or resumes an OpenHands conversation, and returns the final assistant text in OpenAI shape. Keep tool calls internal unless a later client really needs them.
The OpenAI protocol is the integration surface many products already understand. Chat UIs, voice systems, IDE extensions, evaluation harnesses, and other agents can all point at a base URL and API key. If the OpenHands agent-server exposes that surface, these clients get a software agent rather than a bare model without bespoke adapters.
The gateway should stay a thin ingress layer over the existing agent-server. It should not become a second runtime, a second settings system, or a copy of the native conversation APIs.
flowchart LR
C[OpenAI-compatible client] -->|Bearer key| R[/v1 gateway router/]
R --> M[model id to LLM profile]
R --> S[session or conversation resolver]
M --> A[Agent settings + LLM]
S --> V[OpenHands conversation]
A --> V
V --> T[Agent loop: tools, browser, terminal, files]
T --> F[Final assistant response]
F --> O[OpenAI response shape]
| Surface | End-goal behavior | Why |
|---|---|---|
GET /v1/models | List saved profiles as openhands_{profile}. | OpenAI clients need a model picker, and profiles are already the user-facing LLM configuration unit. |
POST /v1/chat/completions | Default to one-shot agent runs; optionally resume a server conversation through X-OpenHands-ServerConversation-ID. | Maximizes compatibility while preserving real OpenHands history when a caller opts in. |
POST /v1/responses | Map previous_response_id to an OpenHands conversation when enough clients need it. | Responses is the more natural stateful OpenAI API, but it is less urgent than Chat Completions compatibility. |
| Streaming | Stream final assistant text first; internal tool-call streaming is optional and very low priority. | Voice and chat clients mainly need low perceived latency, not full agent internals. |
| Native APIs | Keep using existing conversation, event, profile, settings, and auth services underneath. | One source of truth and less long-term maintenance. |
No OpenHands conversation header. The server creates a short-lived conversation from the latest user message, runs the agent, returns the final text, and cleans up.
Best for existing chat widgets, eval jobs, and simple “ask the agent” integrations.
The caller sends X-OpenHands-ServerConversation-ID. The server appends the latest user message to that real conversation and returns the next final response.
Best for voice calls, Slack threads, IDE sessions, and any client that can keep a stable thread id.
sequenceDiagram
participant Client as OpenAI client
participant Gateway as "/v1/chat/completions"
participant Conv as ConversationService
participant Agent as OpenHands agent
Client->>Gateway: messages + model openhands_work
Gateway->>Gateway: resolve work profile
alt no conversation header
Gateway->>Conv: start ephemeral conversation
else X-OpenHands-ServerConversation-ID
Gateway->>Conv: load existing conversation
Gateway->>Conv: send latest user message
end
Conv->>Agent: run agent loop
Agent-->>Conv: final response + metrics
Conv-->>Gateway: state and final text
Gateway-->>Client: chat.completion + conversation id header
| OpenAI field | OpenHands mapping |
|---|---|
model | openhands_{profile_name} resolves to an LLMProfileStore entry. |
messages[*].role = system | Append as an agent context suffix, preserving the base OpenHands identity and safety prompt. |
Latest user message | Becomes the OpenHands user message to run. |
| Prior assistant/user history | Ignored for one-shot requests; represented by real events when the conversation id header is used. |
choices[0].message.content | The agent final response text. |
usage | Conversation metrics where available; otherwise honest zeroes rather than invented numbers. |
Compatibility boundary: this is not a promise to emulate every OpenAI edge case. The useful contract is: common OpenAI SDK clients can list a model, send text or image content, receive a normal assistant response, and optionally continue a server-owned conversation.
response_id and previous_response_id to OpenHands conversations so stateful OpenAI clients can use their native flow.Hermes proves the pattern, but its gateway adapter grew into a broad platform server with sessions, streaming, cron APIs, platform delivery, and many adapter-specific concerns. OpenHands already has a dedicated agent-server with profile, settings, conversation, event, and workspace services. The OpenAI layer should reuse those seams and stay small.
Design bias: optimize for the most common path first: “I have an OpenAI-compatible client; give me a base URL, API key, and model id that calls an OpenHands agent.” Everything else should be incremental.