← Home

Hermes Agent Gateway architecture study

How Hermes Agent (NousResearch, 165K ☆) structures its multi-platform gateway, and what we can learn for OpenHands.
Studied from source, June 2026.

Repo Structure

Hermes is a monorepo. The core components, by size:

Component	Directory	Lines	What it does
CLI	`hermes_cli/`	118K	Setup, config, commands, proxy server, auth, profiles
Tools	`tools/`	67K	Browser, code execution, approval, TTS/STT, MCP, security
Agent	`agent/`	61K	Conversation loop, context compression, LLM adapters, credential pool
Gateway platforms	`gateway/platforms/`	50K	20 core platform adapters (see below)
Gateway core	`gateway/`	30K	Runner, sessions, delivery, hooks, config, stream dispatch
Plugin platforms	`plugins/platforms/`	17K	8 plugin-based adapters (Discord, Teams, IRC, etc.)
Skills	`skills/`	12K	GitHub, DevOps, creative, media, email
Cron	`cron/`	4K	Scheduled jobs and automation

The single largest file in the entire codebase is gateway/run.py at 19,911 lines — the GatewayRunner class that orchestrates everything.

Gateway Architecture

The gateway is a long-running process that connects to multiple messaging platforms through a unified adapter pattern. Each platform implements a BasePlatformAdapter with connect(), disconnect(), send(), and message handling.

flowchart TB
    subgraph Gateway ["GatewayRunner (gateway/run.py)"]
        direction TB
        MH["_handle_message()"]
        SC["Slash command\ndispatch"]
        AG["AIAgent\ncreation"]
        SS["SessionStore\n(SQLite)"]
        DC["Delivery +\nStream dispatch"]
    end

    subgraph Core ["Core platforms (gateway/platforms/)"]
        TG["Telegram\n6,081 lines"]
        SL["Slack\n3,519 lines"]
        WA["WhatsApp\n1,387 lines"]
        WH["Webhook\n934 lines"]
        AS["API Server\n4,257 lines"]
        EM["Email\n773 lines"]
        SIG["Signal\n1,543 lines"]
        MAT["Matrix\n2,983 lines"]
        MORE["+ 12 more"]
    end

    subgraph Plugins ["Plugin platforms (plugins/platforms/)"]
        DIS["Discord"]
        IRC["IRC"]
        TMS["Teams"]
        GC["Google Chat"]
        LN["LINE"]
        MM["Mattermost"]
        NTF["Ntfy"]
        SIM["SimplEx"]
    end

    TG & SL & WA & WH & AS & EM & SIG & MAT & MORE --> MH
    DIS & IRC & TMS & GC & LN & MM --> MH
    MH --> SC & AG
    AG --> SS
    AG --> DC

Two tiers of platforms

Hermes has a split architecture for platform adapters:

Core platforms (gateway/platforms/) — built-in, loaded directly by the gateway runner via _create_adapter(). Includes Telegram, Slack, WhatsApp, the API server, webhooks, and 15 others.
Plugin platforms (plugins/platforms/) — register via PlatformRegistry and are discovered at runtime. Includes Discord, Teams, IRC, Google Chat, LINE, Mattermost, Ntfy, SimplEx.

Both types inherit from BasePlatformAdapter (4,813 lines) which provides the shared interface: connection lifecycle, message guards, typing indicators, delivery helpers, interrupt handling, and text debouncing.

Message flow

sequenceDiagram
    participant P as Platform Adapter
    participant G as GatewayRunner
    participant S as SessionStore
    participant A as AIAgent

    P->>P: receive raw event
    P->>P: normalize to MessageEvent
    P->>P: active session guard
    alt agent running for this session
        P->>P: queue in _pending_messages
        P->>P: set interrupt event
    else
        P->>G: _handle_message(event)
        G->>G: resolve session key
        G->>G: check authorization
        alt slash command
            G->>G: dispatch command handler
        else user message
            G->>S: load/create session
            G->>A: create AIAgent + run_conversation
            A-->>G: final response
            G->>P: deliver response
        end
    end

Session keys

Every conversation is identified by a deterministic session key:

agent:main:{platform}:{chat_type}:{chat_id}

For example: agent:main:telegram:private:123456789. Thread-aware platforms include thread IDs in the chat_id. Keys are always constructed via build_session_key(), never manually.

Key Platforms Compared

Slack

	Hermes	SmolPaws
Library	`slack-bolt` (Python, async)	`@slack/bolt` (TypeScript)
Transport	Socket Mode	Socket Mode
Lines	3,519	686
Thread tracking	`_mentioned_threads` set — once mentioned, responds to all thread replies	Same pattern via `MentionedThreadTracker`
Auth	Allowlists, DM pairing, global allow-all	Allowlist + guest rate limiter
Streaming	Progressive message editing (update sent message in-place)	Not yet

The patterns are strikingly similar. Hermes has ~5x the code due to assistant threads, slash command handling, streaming via message editing, file uploads, and reaction management.

GitHub

	Hermes	SmolPaws
Architecture	Generic webhook adapter (934 lines) with configurable routes. No dedicated GitHub adapter.	Dedicated Cloudflare Worker (`apps/github/`, 1,718 lines)
Trigger	Webhook POST with HMAC validation per route	Webhook POST via Cloudflare Worker, @mention or own-thread detection
Delivery	Configurable: `github_comment`, or forward to another platform	Direct via `gh` CLI or API
GitHub skills	Rich skill set: PR workflow, code review, issues, repo management	Basic: comment on PRs/issues

Hermes treats GitHub as a webhook source, not a first-class platform. The webhook adapter is generic — it handles GitHub, GitLab, JIRA, Stripe, etc. through the same configurable route system. SmolPaws has a dedicated GitHub ingress with its own @mention logic and self-loop guards.

Discord

	Hermes	SmolPaws
Architecture	Plugin platform (`plugins/platforms/discord/`), `discord.py` library	Ingress app (`apps/discord/`, 642 lines), `discord.js`
Voice	Voice channel support with mixer	Not yet
Bot commands	Slash command sync with state tracking	Basic message handling

Discord is notable as a plugin platform in Hermes, not a core one. It uses the plugin registry to self-register, demonstrating the extensibility model.

The OpenAI-Compatible API Server

The key insight: Hermes exposes the agent as an OpenAI-compatible endpoint. Any client that speaks /v1/chat/completions can talk to the full agent runtime — tools, memory, skills, and all. From the caller's perspective, it looks like a very capable "model."

Built by Teknium (Nous Research founder) in March 2026. Three PR attempts: #828, #956, landed in #1756. The motivation was reach — the PR body listed star counts of OpenAI-compatible frontends that would instantly work:

Frontend	Stars
Open WebUI	126K
NextChat	87K
LobeChat	73K
AnythingLLM	56K
ChatBox	39K
LibreChat	34K

Since landing on March 17, the file grew from its initial implementation to 4,257 lines in under 3 months — streaming, Responses API, session management, CORS, security hardening, cron jobs API.

Endpoints

Method	Path	Purpose
POST	`/v1/chat/completions`	Stateless Chat Completions (opt-in session via header)
POST	`/v1/responses`	Stateful Responses API with `previous_response_id` chaining
GET	`/v1/models`	Lists the agent as an available model
POST	`/v1/runs`	Async execution with SSE event streaming
*	`/api/sessions/*`	Full session CRUD + chat + fork
GET	`/health`	Health check

How chat completions works

sequenceDiagram
    participant C as OpenAI Client
    participant AS as API Server Adapter
    participant A as AIAgent

    C->>AS: POST /v1/chat/completions
    AS->>AS: parse messages, extract system + user
    AS->>AS: derive session_id from fingerprint
    AS->>A: _run_agent(user_message, history, system_prompt)
    Note over A: full agent loop: tools, memory, skills
    A-->>AS: result + usage
    AS->>C: OpenAI-format response

Comparison with OpenHands PR #3545

	Hermes	OpenHands PR #3545
Lines	4,257	~530
Streaming	Full SSE with tool progress events	Not yet (returns 400)
Session reuse	`X-Hermes-Session-Id` header	`X-OpenHands-ServerConversation-ID` header
Responses API	Yes, with `previous_response_id`	Not yet (planned)
Auth	`API_SERVER_KEY` bearer token	`X-Session-API-Key` or `Authorization: Bearer`
Models	Single `hermes-agent` model	Profile-backed `openhands_{profile}` models
Token usage	Real counts via agent metrics	Real counts via `state.stats` (PR #3546)
Ephemeral cleanup	No (sessions persist)	Yes (deletes conversation after response)

OpenHands PR #3545 is a lean v1 — non-streaming, stateless by default, with optional conversation reuse. Hermes has 3 months of iteration and 8x more code. The OpenHands PR's profile-backed model system is a nice touch: each LLM profile becomes a distinct "model" on the /v1/models endpoint.

Issue and scoping: #3540. PR: #3545.

Other Notable Platforms

Platform	Type	Lines	Notes
Telegram	Core	6,081	Largest adapter. Bot API, inline queries, forum topics, DM topics, file handling, network retry layer.
Feishu / Lark	Core	5,163	Enterprise messaging. Comment threads, meeting invites, separate comment rules engine.
Yuanbao	Core	4,941	Tencent's AI assistant platform. Protobuf protocol, sticker/media support.
Matrix	Core	2,983	Decentralized protocol. E2E encryption support, room management.
Weixin / WeChat	Core	2,247	China's dominant messenger. Official account API.
WeCom	Core	1,635	WeChat for enterprise. Callback crypto, webhook integration.
Signal	Core	1,543	Privacy-focused. Rate limiting layer for Signal's strict API limits.
BlueBubbles	Core	1,038	iMessage bridge. Makes Hermes accessible via iMessage on macOS.
Home Assistant	Core	449	Smart home integration. Voice assistant pipeline.
SMS	Core	379	Via Twilio or similar. Bare-bones text messaging.
Email	Core	773	IMAP/SMTP. Polls inbox, sends replies.

Terminology

Concept	Hermes	OpenClaw	SmolPaws
Messaging service	Platform	Channel	Ingress app
Central router	Gateway	Gateway	Agent server (direct)
Adapter base	`BasePlatformAdapter`	`api.registerChannel()`	No shared base
Implementation	`gateway/platforms/` + `plugins/platforms/`	`extensions/`	`apps/`
OpenAI endpoint	Yes (platform adapter)	No	Proposed (#3540)

Takeaways

The gateway pattern works. Both Hermes and OpenClaw converged on the same architecture: a central gateway process with pluggable platform adapters. The terminology differs but the bones are identical.
The OpenAI-compatible endpoint is the most interesting platform. It dissolves the boundary between "asking a model" and "asking an agent." Expose one endpoint, unlock hundreds of frontends plus voice platforms like ElevenLabs.
SmolPaws is architecturally simpler. No shared gateway, no adapter base class. Each ingress talks to the agent-server directly. This keeps things small but means each ingress reimplements connection management, auth patterns, and delivery logic independently.
Hermes is big. 350K+ lines of Python across the main components. The GatewayRunner alone is 20K lines. This is a mature, feature-rich system — but also a reminder that complexity accumulates fast once you support 28 platforms.

SmolPaws Slack → · SmolPaws Discord → · ← Home