fix(embedder): proxy openrpc returns 502 JSON + EmbedderdClient async #25

Merged
salmaelsoly merged 2 commits from development_proxy_openrpc_error into development 2026-04-28 08:55:01 +00:00
Member

Summary

Two related fixes in the embedder backend, both aimed at returning structured JSON errors instead of panicking or silently masking failures.

1. embedder_proxy::openrpc_handler no longer silently masks upstream failures (closes #21)

`openrpc_handler` previously returned `200 OK` + `"{}"` when the upstream OpenRPC GET failed. It now matches the existing error pattern from the same file's `POST /rpc` handler: HTTP 502 + `Content-Type: application/json` + `{"error": {"message": "Failed to fetch upstream OpenRPC spec: <e>", "code": "upstream_unavailable"}}`. The dead `let _ = forward_to_upstream(_, "")` call (and its 2-line "preserves prior semantics" comment) is dropped; its result was always discarded.
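As a minimal sketch of the error body's shape (the helper name is hypothetical; the real handler builds an HTTP response, not a bare string):

```rust
// Hypothetical helper illustrating the 502 error envelope described above.
// In the real proxy this becomes the body of an HTTP 502 response with
// Content-Type: application/json.
fn upstream_error_body(e: &str) -> String {
    // Mirrors the POST /rpc handler's error pattern from the same file.
    format!(
        concat!(
            "{{\"error\": {{",
            "\"message\": \"Failed to fetch upstream OpenRPC spec: {}\", ",
            "\"code\": \"upstream_unavailable\"",
            "}}}}"
        ),
        e
    )
}

fn main() {
    let body = upstream_error_body("connection refused");
    assert!(body.contains("\"code\": \"upstream_unavailable\""));
    println!("{}", body);
}
```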

2. EmbedderdClient converted from blocking to async to stop tokio worker panics

`hero_embedder_server` worker threads were panicking with `Cannot drop a runtime in a context where blocking is not allowed` whenever `embed_cached` / `rerank_via` (called from per-request axum handlers) reached the daemon delegation code. The cause: `EmbedderdClient` wrapped `reqwest::blocking::Client`, which spins up an inner tokio runtime per call and panics on drop when a multi-threaded runtime is already in scope. tokio respawned workers automatically, so the server stayed "up", but every panicking request dropped its hyper connection mid-flight, causing `hero_router` to return 502 with `client error (SendRequest)` to the dashboard.
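A condensed sketch of the failure mode (illustrative only, assuming tokio and reqwest; this is not the actual server code):

```rust
// Sketch only: reqwest::blocking::Client creates its own tokio runtime
// internally. Dropping that inner runtime on a worker thread of an outer
// multi-threaded runtime triggers the panic quoted above:
//   "Cannot drop a runtime in a context where blocking is not allowed"
async fn handler() {
    let client = reqwest::blocking::Client::new(); // inner runtime created here
    let _ = client.get("http://127.0.0.1:8092/health").send();
    // `client` (and its inner runtime) is dropped here, still inside the
    // async context of a tokio worker thread -> panic; the in-flight hyper
    // connection for this request is dropped with it.
}
```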

PR #24 already fixed the startup path of this bug by wrapping `discover_embedderd` in `tokio::task::spawn_blocking`, but the per-request paths (`embed_cached`, `rerank_via`, and all 8 of their callers across `api/{embed, index, search, corpus}.rs`) were still calling the blocking client directly from async context. This PR converts the entire client and its call cascade to async:

  • `EmbedderdClient::{is_reachable, embed, rerank}` are now `async fn` and use `reqwest::Client` (not `reqwest::blocking::Client`).
  • `AppState::{rerank_via, embed_cached}` are now `async fn`; their bodies `.await` the client calls.
  • All 8 callers in `api/embed.rs`, `api/index.rs`, `api/search.rs` (×3), `api/corpus.rs` (×3) gain `.await`. Every caller was already in a `pub async fn handle_*` or `tokio::spawn(async move { ... })` context, so the cascade is clean.
  • `discover_embedderd` becomes `async fn`; the now-redundant `tokio::task::spawn_blocking(discover_embedderd)` wrapper from PR #24 is dropped, and `std::thread::sleep` becomes `tokio::time::sleep`.
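The converted client looks roughly like this (a sketch under the assumptions above; field names, method bodies, and endpoint paths are illustrative, not the real ones):

```rust
// Sketch of the async shape; `http`, `base`, and the URL paths are
// hypothetical names for illustration.
struct EmbedderdClient {
    http: reqwest::Client, // shared async client; no inner runtime per call
    base: String,
}

impl EmbedderdClient {
    async fn is_reachable(&self) -> bool {
        self.http.get(format!("{}/health", self.base)).send().await.is_ok()
    }

    async fn embed(&self, texts: &[String]) -> reqwest::Result<Vec<Vec<f32>>> {
        self.http
            .post(format!("{}/embed", self.base))
            .json(&texts)
            .send()
            .await?
            .json()
            .await
    }
}

// Callers gain `.await`, e.g. inside an axum handler:
//   let vecs = state.embedderd.embed(&texts).await?;
// and the startup retry loop sleeps without blocking the runtime:
//   tokio::time::sleep(std::time::Duration::from_millis(250)).await;
```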

Verification

$ service_embedder start --reset                # cold from a fully stopped state
=== hero_embedder registered & started (mode: all) ===
  state      : running ✓
  daemon     : http://127.0.0.1:8092
  rpc.sock   : $HERO_SOCKET_DIR/hero_embedder/rpc.sock
  ui.sock    : $HERO_SOCKET_DIR/hero_embedder/ui.sock

$ hero_proc job list hero_embedder
ID   ACTION                  PHASE     PID         SERVICE
227  hero_embedder_ui        running   1270724     hero_embedder
226  hero_embedder_server    running   1270725     hero_embedder
225  hero_embedderd          running   1270788     hero_embedder

$ for i in $(seq 1 50); do curl -sS -o /dev/null -w '%{http_code} ' \
    -X POST -H 'content-type: application/json' \
    -d "{\"jsonrpc\":\"2.0\",\"id\":$i,\"method\":\"info\",\"params\":{}}" \
    http://127.0.0.1:9151/hero_embedder/ui/rpc; done | tr ' ' '\n' | sort | uniq -c
     50 200

$ hero_proc service logs hero_embedder | grep -E 'tokio-rt-worker.*panicked|Cannot drop a runtime'
(empty)

50/50 sequential info calls return 200, zero worker panics in the live binary.

Manual proxy verification (issue #21 fix):

$ curl -sS --unix-socket /tmp/test_embedder_proxy.sock \
    -w 'status=%{http_code}\nctype=%{content_type}\n' \
    http://localhost/openrpc.json
status=502
ctype=application/json

{
  "error": {
    "code": "upstream_unavailable",
    "message": "Failed to fetch upstream OpenRPC spec: Failed to connect to upstream: /tmp/dead_embedderd.sock"
  }
}

Test plan

  • `cargo check --workspace --bins` clean (8 crates)
  • `cargo test -p hero_embedder_proxy` 9/9 passing
  • Cold restart: all three jobs reach `running` on first try (PR #24's retry-with-backoff still works)
  • 50 sequential `info` calls: 50/50 200 OK
  • No `tokio-rt-worker` panics or `Cannot drop a runtime` in server logs after the patch
  • Dashboard panels populate, browser console has zero errors
  • Manual proxy verification: dead-upstream curl → 502 + structured JSON

Notes

  • The doc comment in `embedderd_client.rs` previously claimed "callers spawn_blocking where it matters"; that turned out to be false. The async conversion removes that whole class of bug at the source rather than playing whack-a-mole with `spawn_blocking` per call site.
  • `error.code` for the proxy's openrpc 502 is the string `"upstream_unavailable"` rather than a JSON-RPC numeric code: `/openrpc.json` is a plain HTTP GET, not a JSON-RPC method, so the `-32603` namespace doesn't apply there. The `jsonrpc_handler` and `raw_rpc_handler` in the same file continue to use `-32603` for legitimate JSON-RPC envelopes (untouched).
  • Closes #21. Resolves the worker-panic class of the `Backend unavailable (HTTP 502): … client error (SendRequest)` reports surfaced by the merged PR #22's UI fail-soft.
fix(embedder_proxy): return 502 JSON when upstream openrpc.json fetch fails
All checks were successful
Test / test (pull_request) Successful in 3m25s
aee3243cc9
fix(embedder_lib): convert EmbedderdClient to async to stop tokio worker panics
All checks were successful
Test / test (pull_request) Successful in 3m24s
27293440d2
salmaelsoly changed title from fix(embedder_proxy): return 502 JSON when upstream openrpc.json fetch fails to fix(embedder): proxy openrpc returns 502 JSON + EmbedderdClient async 2026-04-28 07:39:06 +00:00
salmaelsoly merged commit 117597437d into development 2026-04-28 08:55:01 +00:00
salmaelsoly deleted branch development_proxy_openrpc_error 2026-04-28 08:55:07 +00:00
Reference
lhumina_code/hero_embedder!25