WebSocket upgrade not tunneled — clients reconnect-loop through proxy #28

Open
opened 2026-04-26 15:09:31 +00:00 by sameh-farouk · 0 comments
Member

Problem

proxy.rs::proxy_handler ends with a single line forwarding every request via crate::domain::forward_to_upstream, which is a plain HTTP request/response forwarder — no Upgrade: websocket detection, no hyper::upgrade::on(&mut req), no bidirectional splice after 101 Switching Protocols.

As a result, when a browser opens a WebSocket through the proxy:

  1. Browser sends Connection: upgrade + Upgrade: websocket
  2. Proxy forwards it as a plain HTTP request to hero_router
  3. hero_router replies 101 Switching Protocols (its own proxy_ws_tunnel handles the UDS side correctly)
  4. Proxy returns the 101 response to the browser, but does not upgrade its own connection
  5. Browser treats the WS as open and sends a frame; the proxy is still in HTTP/1.1 mode → connection drops
  6. Browser's ws.onclose fires, schedules reconnect
  7. Loop

In collab specifically, this shows up as a flood of ws/user/{id} requests in the network tab, each immediately followed by a close. It also causes a flood of message.list / mention.list / read.mark calls, because the catch-up cycle fires on every reconnect.
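For reference, the bidirectional splice that proxy_handler lacks is small once both sides are plain byte streams. The sketch below is a blocking, std-only analogue of tokio::io::copy_bidirectional. It runs one thread per direction and propagates each side's EOF as a write half-close. `splice` is a hypothetical name. The actual fix runs on the two hyper upgrades inside a tokio task, not on raw TcpStreams.

```rust
use std::io;
use std::net::{Shutdown, TcpStream};
use std::thread;

/// Hypothetical std-only splice: relay bytes both ways until each side hits EOF.
/// Returns (client -> upstream, upstream -> client) byte counts, the same
/// shape tokio::io::copy_bidirectional returns.
fn splice(client: TcpStream, upstream: TcpStream) -> io::Result<(u64, u64)> {
    // Independent handles to the same sockets so each direction gets a thread.
    let mut client_rd = client.try_clone()?;
    let mut upstream_wr = upstream.try_clone()?;

    let forward = thread::spawn(move || -> io::Result<u64> {
        let n = io::copy(&mut client_rd, &mut upstream_wr)?;
        // Client finished sending: half-close toward the upstream.
        let _ = upstream_wr.shutdown(Shutdown::Write);
        Ok(n)
    });

    let (mut upstream_rd, mut client_wr) = (upstream, client);
    let back = io::copy(&mut upstream_rd, &mut client_wr)?;
    // Upstream finished sending: half-close toward the client.
    let _ = client_wr.shutdown(Shutdown::Write);

    let fwd = forward.join().expect("forward thread panicked")?;
    Ok((fwd, back))
}
```

Propagating the half-close, rather than tearing both sockets down as soon as one direction ends, matches copy_bidirectional. That function calls shutdown on the other stream once one side reaches EOF and keeps copying the opposite direction.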

Severity

Anything WebSocket-shaped behind hero_proxy is broken:

  • collab chat (/hero_collab/ui/ws/user/{id})
  • collab canvas (/hero_collab/ui/ws/canvas/{id})
  • hero_proc PTY (/hero_proc/ui/api/services/{name}/pty)
  • hero_router terminal

Verification

```bash
# Same WS upgrade request, sent directly to hero_router (:9988) and through hero_proxy (:9997):
curl -si --max-time 3 \
  -H "Connection: Upgrade" \
  -H "Upgrade: websocket" \
  -H "Sec-WebSocket-Version: 13" \
  -H "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==" \
  http://127.0.0.1:9988/hero_collab/ui/ws/user/1
# → 101 + presence event payload streams back

curl -si --max-time 3 \
  -H "Connection: Upgrade" \
  -H "Upgrade: websocket" \
  -H "Sec-WebSocket-Version: 13" \
  -H "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==" \
  http://127.0.0.1:9997/hero_collab/ui/ws/user/1
# → 101 returned, but connection drops before any frame flows
```
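The same check can be scripted, for example as a regression test. The sketch below uses only std, and `probe_ws` is a hypothetical helper. It sends the upgrade request from the curl commands and collects bytes until the peer closes or a time window expires, mirroring --max-time 3. It then returns the status code plus the number of bytes that arrived after the response head.

```rust
use std::io::{self, Read, Write};
use std::net::TcpStream;
use std::time::{Duration, Instant};

/// Hypothetical probe mirroring the curl check: returns
/// (status code, bytes received after the response head) within `window`.
fn probe_ws(addr: &str, path: &str, window: Duration) -> io::Result<(u16, usize)> {
    let mut stream = TcpStream::connect(addr)?;
    let request = format!(
        "GET {path} HTTP/1.1\r\n\
         Host: {addr}\r\n\
         Connection: Upgrade\r\n\
         Upgrade: websocket\r\n\
         Sec-WebSocket-Version: 13\r\n\
         Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==\r\n\
         \r\n"
    );
    stream.write_all(request.as_bytes())?;

    // Collect everything until the peer closes or the window elapses.
    let deadline = Instant::now() + window;
    let mut buf = Vec::new();
    let mut chunk = [0u8; 4096];
    loop {
        let remaining = deadline.saturating_duration_since(Instant::now());
        if remaining.is_zero() {
            break; // still open after the window: a working tunnel stays up
        }
        stream.set_read_timeout(Some(remaining))?;
        match stream.read(&mut chunk) {
            Ok(0) => break, // peer closed the connection
            Ok(n) => buf.extend_from_slice(&chunk[..n]),
            Err(e) => match e.kind() {
                // Read timeout (WouldBlock on Unix, TimedOut on Windows).
                io::ErrorKind::WouldBlock | io::ErrorKind::TimedOut => break,
                // A dropped tunnel may surface as a reset instead of a clean close.
                io::ErrorKind::ConnectionReset | io::ErrorKind::ConnectionAborted => break,
                _ => return Err(e),
            },
        }
    }

    // Split the response head from whatever followed it.
    let head_end = buf
        .windows(4)
        .position(|w| w == b"\r\n\r\n")
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "no complete response head"))?;
    let status = String::from_utf8_lossy(&buf[..head_end])
        .split_whitespace()
        .nth(1)
        .and_then(|s| s.parse::<u16>().ok())
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "bad status line"))?;
    Ok((status, buf.len() - head_end - 4))
}
```

Based on the curl results above, probing 127.0.0.1:9988 would be expected to report 101 with a non-zero byte count. Probing 127.0.0.1:9997 would report 101 with 0 bytes until the fix lands.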

Workaround

Hit hero_router directly on :9988 (its proxy_ws_tunnel handles UDS WS correctly). Not viable for any deployment that exposes only the proxy.

Fix shape

  1. Detect Connection: upgrade + Upgrade: websocket (is_ws_upgrade).
  2. If matched, branch to a TCP-targeted variant of hero_router's proxy_ws_tunnel:
    • Grab hyper::upgrade::on(&mut req) for the client side before consuming the request.
    • TCP-connect to the upstream (router_url).
    • Perform the HTTP/1.1 client handshake and drive the connection with with_upgrades() so it can be upgraded.
    • Send the upgrade request preserving all incoming headers (Host overridden to the upstream address); the body is Empty<Bytes>.
    • Validate that the response is 101 Switching Protocols; otherwise bail with the upstream's status.
    • Forward the 101 and all upstream response headers to the client.
    • Spawn a task that awaits both upgrades, then splices them with tokio::io::copy_bidirectional.
  3. Otherwise (non-WS), continue with the existing forward_to_upstream path.
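The header logic in steps 1 and 2 can be sketched on its own. This version is std-only and uses a plain slice of (name, value) pairs rather than hyper's HeaderMap. `is_ws_upgrade` is the name from the outline, and `upstream_headers` is a hypothetical helper. The hyper-specific parts (hyper::upgrade::on, with_upgrades(), copy_bidirectional) are not shown. One detail matters here: Connection is a comma-separated token list. An exact string match on "upgrade" therefore misses browsers that send `Connection: keep-alive, Upgrade`, which Firefox does.

```rust
/// Step 1 (`is_ws_upgrade`): `Connection` must contain the `upgrade` token and
/// `Upgrade` must name `websocket`. Names and tokens compare case-insensitively.
fn is_ws_upgrade(headers: &[(String, String)]) -> bool {
    // True if any header called `name` lists `token` in its comma-separated value.
    let has_token = |name: &str, token: &str| {
        headers
            .iter()
            .filter(|(k, _)| k.eq_ignore_ascii_case(name))
            .flat_map(|(_, v)| v.split(','))
            .any(|t| t.trim().eq_ignore_ascii_case(token))
    };
    has_token("connection", "upgrade") && has_token("upgrade", "websocket")
}

/// Step 2 (hypothetical helper): keep every incoming header, but point `Host`
/// at the upstream authority (e.g. "127.0.0.1:9988").
fn upstream_headers(incoming: &[(String, String)], upstream_authority: &str) -> Vec<(String, String)> {
    let mut out: Vec<(String, String)> = incoming
        .iter()
        .filter(|(k, _)| !k.eq_ignore_ascii_case("host"))
        .cloned()
        .collect();
    out.push(("Host".to_string(), upstream_authority.to_string()));
    out
}
```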

Out of scope (separate follow-up)

  • dispatch_domain_route (the host-matched path with bearer/OAuth/signature/IP auth) still uses regular HTTP forwarding. No domain route fronts a WebSocket service today, so this can wait until one does; that path will need a UDS-targeted variant of the tunnel.
  • Cosmetic: responses carry duplicate vary / CORS headers because both the router and the proxy add them. Functionally harmless; roughly a 5 LOC fix.
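One possible shape for that small fix assumes the proxy adds its own vary / CORS headers after the upstream response arrives. Under that assumption, the proxy would set a header only when the upstream has not already set it. The std-only sketch below uses a hypothetical `set_if_absent` helper over (name, value) pairs. With the http crate's HeaderMap, the same insert-if-missing rule is `headers.entry(name).or_insert(value)`.

```rust
/// Hypothetical helper: add `name: value` only if no header with that name
/// (case-insensitive) is present, so proxy defaults never duplicate upstream ones.
fn set_if_absent(headers: &mut Vec<(String, String)>, name: &str, value: &str) {
    if !headers.iter().any(|(k, _)| k.eq_ignore_ascii_case(name)) {
        headers.push((name.to_string(), value.to_string()));
    }
}
```

This covers the identical-value duplicates described here. If the proxy's Vary value could differ from the router's, merging the two token lists would be the safer variant.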

PR

Fix in feat/ws-upgrade-tunnel. The pattern mirrors hero_router's proxy_ws_tunnel but targets TCP instead of UDS. About 210 LOC added (helpers + branch), no removals.

Reference
lhumina_code/hero_proxy#28