← Home

OpenHands OpenAI-Compatible Gateway architecture note

End-goal design for OpenHands/software-agent-sdk issue #3540 and the first gateway PR: make the full OpenHands agent runtime callable from clients that already speak the OpenAI API. Drafted from the issue discussion, linked PR, and the Hermes gateway precedent, June 2026.

Short version: expose OpenHands as a very capable OpenAI-compatible “model”. The client sends a normal Chat Completions request; the agent-server resolves a profile, runs or resumes an OpenHands conversation, and returns the final assistant text in OpenAI shape. Keep tool calls internal unless a later client really needs them.

Why this matters

The OpenAI protocol is the integration surface many products already understand. Chat UIs, voice systems, IDE extensions, evaluation harnesses, and other agents can all point at a base URL and API key. If the OpenHands agent-server exposes that surface, these clients get a software agent rather than a bare model without bespoke adapters.

Open WebUI LibreChat AnythingLLM ElevenLabs IDE extensions eval harnesses agent-to-agent

End-state shape

The gateway should stay a thin ingress layer over the existing agent-server. It should not become a second runtime, a second settings system, or a copy of the native conversation APIs.

flowchart LR
  C[OpenAI-compatible client] -->|Bearer key| R[/v1 gateway router/]
  R --> M[model id to LLM profile]
  R --> S[session or conversation resolver]
  M --> A[Agent settings + LLM]
  S --> V[OpenHands conversation]
  A --> V
  V --> T[Agent loop: tools, browser, terminal, files]
  T --> F[Final assistant response]
  F --> O[OpenAI response shape]

Surface	End-goal behavior	Why
`GET /v1/models`	List saved profiles as `openhands_{profile}`.	OpenAI clients need a model picker, and profiles are already the user-facing LLM configuration unit.
`POST /v1/chat/completions`	Default to one-shot agent runs; optionally resume a server conversation through `X-OpenHands-ServerConversation-ID`.	Maximizes compatibility while preserving real OpenHands history when a caller opts in.
`POST /v1/responses`	Map `previous_response_id` to an OpenHands conversation when enough clients need it.	Responses is the more natural stateful OpenAI API, but it is less urgent than Chat Completions compatibility.
Streaming	Stream final assistant text first; internal tool-call streaming is optional and very low priority.	Voice and chat clients mainly need low perceived latency, not full agent internals.
Native APIs	Keep using existing conversation, event, profile, settings, and auth services underneath.	One source of truth and less long-term maintenance.

Request modes

One-shot compatibility

No OpenHands conversation header. The server creates a short-lived conversation from the latest user message, runs the agent, returns the final text, and cleans up.

Best for existing chat widgets, eval jobs, and simple “ask the agent” integrations.

Server-owned memory

The caller sends X-OpenHands-ServerConversation-ID. The server appends the latest user message to that real conversation and returns the next final response.

Best for voice calls, Slack threads, IDE sessions, and any client that can keep a stable thread id.

sequenceDiagram
  participant Client as OpenAI client
  participant Gateway as "/v1/chat/completions"
  participant Conv as ConversationService
  participant Agent as OpenHands agent

  Client->>Gateway: messages + model openhands_work
  Gateway->>Gateway: resolve work profile
  alt no conversation header
    Gateway->>Conv: start ephemeral conversation
  else X-OpenHands-ServerConversation-ID
    Gateway->>Conv: load existing conversation
    Gateway->>Conv: send latest user message
  end
  Conv->>Agent: run agent loop
  Agent-->>Conv: final response + metrics
  Conv-->>Gateway: state and final text
  Gateway-->>Client: chat.completion + conversation id header

Data mapping

OpenAI field	OpenHands mapping
`model`	`openhands_{profile_name}` resolves to an `LLMProfileStore` entry.
`messages[*].role = system`	Append as an agent context suffix, preserving the base OpenHands identity and safety prompt.
Latest `user` message	Becomes the OpenHands user message to run.
Prior assistant/user history	Ignored for one-shot requests; represented by real events when the conversation id header is used.
`choices[0].message.content`	The agent final response text.
`usage`	Conversation metrics where available; otherwise honest zeroes rather than invented numbers.

Compatibility boundary: this is not a promise to emulate every OpenAI edge case. The useful contract is: common OpenAI SDK clients can list a model, send text or image content, receive a normal assistant response, and optionally continue a server-owned conversation.

Implementation ladder

Land the thin Chat Completions router. Non-streaming, profile-backed models, Bearer session auth, text/image input, final text output, and conversation id reuse.
Make it easy to try. Add docs and examples for OpenAI SDK clients, Open WebUI-style base URL setup, voice webhook usage, and curl smoke tests. Starts with docs PR 554.
Improve operational fit. Add predictable timeouts, clearer errors, cleanup policies, request tracing, and metrics that help users debug a gateway run.
Stream final assistant text. Start with final-message deltas or coarse progress that ordinary clients can display. Do not expose tool internals until a concrete client needs them.
Add Responses API. Map response_id and previous_response_id to OpenHands conversations so stateful OpenAI clients can use their native flow.
Only then consider tool-call streaming. It is useful for specialist UIs, but it adds protocol ambiguity and is not required for the mainstream integrations this unlocks.

Comparison with Hermes

Hermes proves the pattern, but its gateway adapter grew into a broad platform server with sessions, streaming, cron APIs, platform delivery, and many adapter-specific concerns. OpenHands already has a dedicated agent-server with profile, settings, conversation, event, and workspace services. The OpenAI layer should reuse those seams and stay small.

Design bias: optimize for the most common path first: “I have an OpenAI-compatible client; give me a base URL, API key, and model id that calls an OpenHands agent.” Everything else should be incremental.

Open questions

What should the public docs recommend as the canonical conversation id strategy for Slack threads, voice calls, browser chats, and IDE sessions?
Should conversation reuse require an existing conversation id, or should the gateway create deterministic ids from a caller-provided external thread key?
Which clients should be certified first: OpenAI Python SDK, curl, Open WebUI, LibreChat, ElevenLabs, and one IDE extension are likely enough.
How much native event metadata, if any, should be exposed outside the normal assistant message?