← Home

Hermes Agent Gateway architecture study

How Hermes Agent (NousResearch, 165K ☆) structures its multi-platform gateway, and what we can learn for OpenHands.
Studied from source, June 2026.

Repo Structure

Hermes is a monorepo. The core components, by size:

ComponentDirectoryLinesWhat it does
CLIhermes_cli/118KSetup, config, commands, proxy server, auth, profiles
Toolstools/67KBrowser, code execution, approval, TTS/STT, MCP, security
Agentagent/61KConversation loop, context compression, LLM adapters, credential pool
Gateway platformsgateway/platforms/50K20 core platform adapters (see below)
Gateway coregateway/30KRunner, sessions, delivery, hooks, config, stream dispatch
Plugin platformsplugins/platforms/17K8 plugin-based adapters (Discord, Teams, IRC, etc.)
Skillsskills/12KGitHub, DevOps, creative, media, email
Croncron/4KScheduled jobs and automation

The single largest file in the entire codebase is gateway/run.py at 19,911 lines — the GatewayRunner class that orchestrates everything.

Gateway Architecture

The gateway is a long-running process that connects to multiple messaging platforms through a unified adapter pattern. Each platform implements a BasePlatformAdapter with connect(), disconnect(), send(), and message handling.

flowchart TB
    subgraph Gateway ["GatewayRunner (gateway/run.py)"]
        direction TB
        MH["_handle_message()"]
        SC["Slash command\ndispatch"]
        AG["AIAgent\ncreation"]
        SS["SessionStore\n(SQLite)"]
        DC["Delivery +\nStream dispatch"]
    end

    subgraph Core ["Core platforms (gateway/platforms/)"]
        TG["Telegram\n6,081 lines"]
        SL["Slack\n3,519 lines"]
        WA["WhatsApp\n1,387 lines"]
        WH["Webhook\n934 lines"]
        AS["API Server\n4,257 lines"]
        EM["Email\n773 lines"]
        SIG["Signal\n1,543 lines"]
        MAT["Matrix\n2,983 lines"]
        MORE["+ 12 more"]
    end

    subgraph Plugins ["Plugin platforms (plugins/platforms/)"]
        DIS["Discord"]
        IRC["IRC"]
        TMS["Teams"]
        GC["Google Chat"]
        LN["LINE"]
        MM["Mattermost"]
        NTF["Ntfy"]
        SIM["SimplEx"]
    end

    TG & SL & WA & WH & AS & EM & SIG & MAT & MORE --> MH
    DIS & IRC & TMS & GC & LN & MM --> MH
    MH --> SC & AG
    AG --> SS
    AG --> DC
    

Two tiers of platforms

Hermes has a split architecture for platform adapters:

Both types inherit from BasePlatformAdapter (4,813 lines) which provides the shared interface: connection lifecycle, message guards, typing indicators, delivery helpers, interrupt handling, and text debouncing.

Message flow

sequenceDiagram
    participant P as Platform Adapter
    participant G as GatewayRunner
    participant S as SessionStore
    participant A as AIAgent

    P->>P: receive raw event
    P->>P: normalize to MessageEvent
    P->>P: active session guard
    alt agent running for this session
        P->>P: queue in _pending_messages
        P->>P: set interrupt event
    else
        P->>G: _handle_message(event)
        G->>G: resolve session key
        G->>G: check authorization
        alt slash command
            G->>G: dispatch command handler
        else user message
            G->>S: load/create session
            G->>A: create AIAgent + run_conversation
            A-->>G: final response
            G->>P: deliver response
        end
    end
    

Session keys

Every conversation is identified by a deterministic session key:

agent:main:{platform}:{chat_type}:{chat_id}

For example: agent:main:telegram:private:123456789. Thread-aware platforms include thread IDs in the chat_id. Keys are always constructed via build_session_key(), never manually.

Key Platforms Compared

Slack

HermesSmolPaws
Libraryslack-bolt (Python, async)@slack/bolt (TypeScript)
TransportSocket ModeSocket Mode
Lines3,519686
Thread tracking_mentioned_threads set — once mentioned, responds to all thread repliesSame pattern via MentionedThreadTracker
AuthAllowlists, DM pairing, global allow-allAllowlist + guest rate limiter
StreamingProgressive message editing (update sent message in-place)Not yet

The patterns are strikingly similar. Hermes has ~5x the code due to assistant threads, slash command handling, streaming via message editing, file uploads, and reaction management.

GitHub

HermesSmolPaws
ArchitectureGeneric webhook adapter (934 lines) with configurable routes. No dedicated GitHub adapter.Dedicated Cloudflare Worker (apps/github/, 1,718 lines)
TriggerWebhook POST with HMAC validation per routeWebhook POST via Cloudflare Worker, @mention or own-thread detection
DeliveryConfigurable: github_comment, or forward to another platformDirect via gh CLI or API
GitHub skillsRich skill set: PR workflow, code review, issues, repo managementBasic: comment on PRs/issues

Hermes treats GitHub as a webhook source, not a first-class platform. The webhook adapter is generic — it handles GitHub, GitLab, JIRA, Stripe, etc. through the same configurable route system. SmolPaws has a dedicated GitHub ingress with its own @mention logic and self-loop guards.

Discord

HermesSmolPaws
ArchitecturePlugin platform (plugins/platforms/discord/), discord.py libraryIngress app (apps/discord/, 642 lines), discord.js
VoiceVoice channel support with mixerNot yet
Bot commandsSlash command sync with state trackingBasic message handling

Discord is notable as a plugin platform in Hermes, not a core one. It uses the plugin registry to self-register, demonstrating the extensibility model.

The OpenAI-Compatible API Server

The key insight: Hermes exposes the agent as an OpenAI-compatible endpoint. Any client that speaks /v1/chat/completions can talk to the full agent runtime — tools, memory, skills, and all. From the caller's perspective, it looks like a very capable "model."

Built by Teknium (Nous Research founder) in March 2026. Three PR attempts: #828, #956, landed in #1756. The motivation was reach — the PR body listed star counts of OpenAI-compatible frontends that would instantly work:

FrontendStars
Open WebUI126K
NextChat87K
LobeChat73K
AnythingLLM56K
ChatBox39K
LibreChat34K

Since landing on March 17, the file grew from its initial implementation to 4,257 lines in under 3 months — streaming, Responses API, session management, CORS, security hardening, cron jobs API.

Endpoints

MethodPathPurpose
POST/v1/chat/completionsStateless Chat Completions (opt-in session via header)
POST/v1/responsesStateful Responses API with previous_response_id chaining
GET/v1/modelsLists the agent as an available model
POST/v1/runsAsync execution with SSE event streaming
*/api/sessions/*Full session CRUD + chat + fork
GET/healthHealth check

How chat completions works

sequenceDiagram
    participant C as OpenAI Client
    participant AS as API Server Adapter
    participant A as AIAgent

    C->>AS: POST /v1/chat/completions
    AS->>AS: parse messages, extract system + user
    AS->>AS: derive session_id from fingerprint
    AS->>A: _run_agent(user_message, history, system_prompt)
    Note over A: full agent loop: tools, memory, skills
    A-->>AS: result + usage
    AS->>C: OpenAI-format response
    

Comparison with OpenHands PR #3545

HermesOpenHands PR #3545
Lines4,257~530
StreamingFull SSE with tool progress eventsNot yet (returns 400)
Session reuseX-Hermes-Session-Id headerX-OpenHands-ServerConversation-ID header
Responses APIYes, with previous_response_idNot yet (planned)
AuthAPI_SERVER_KEY bearer tokenX-Session-API-Key or Authorization: Bearer
ModelsSingle hermes-agent modelProfile-backed openhands_{profile} models
Token usageReal counts via agent metricsReal counts via state.stats (PR #3546)
Ephemeral cleanupNo (sessions persist)Yes (deletes conversation after response)

OpenHands PR #3545 is a lean v1 — non-streaming, stateless by default, with optional conversation reuse. Hermes has 3 months of iteration and 8x more code. The OpenHands PR's profile-backed model system is a nice touch: each LLM profile becomes a distinct "model" on the /v1/models endpoint.

Issue and scoping: #3540. PR: #3545.

Other Notable Platforms

PlatformTypeLinesNotes
TelegramCore6,081Largest adapter. Bot API, inline queries, forum topics, DM topics, file handling, network retry layer.
Feishu / LarkCore5,163Enterprise messaging. Comment threads, meeting invites, separate comment rules engine.
YuanbaoCore4,941Tencent's AI assistant platform. Protobuf protocol, sticker/media support.
MatrixCore2,983Decentralized protocol. E2E encryption support, room management.
Weixin / WeChatCore2,247China's dominant messenger. Official account API.
WeComCore1,635WeChat for enterprise. Callback crypto, webhook integration.
SignalCore1,543Privacy-focused. Rate limiting layer for Signal's strict API limits.
BlueBubblesCore1,038iMessage bridge. Makes Hermes accessible via iMessage on macOS.
Home AssistantCore449Smart home integration. Voice assistant pipeline.
SMSCore379Via Twilio or similar. Bare-bones text messaging.
EmailCore773IMAP/SMTP. Polls inbox, sends replies.

Terminology

ConceptHermesOpenClawSmolPaws
Messaging servicePlatformChannelIngress app
Central routerGatewayGatewayAgent server (direct)
Adapter baseBasePlatformAdapterapi.registerChannel()No shared base
Implementationgateway/platforms/ + plugins/platforms/extensions/apps/
OpenAI endpointYes (platform adapter)NoProposed (#3540)

Takeaways

SmolPaws Slack →  ·  SmolPaws Discord →  ·  ← Home