_admin and _ui health checks probe localhost:80 but daemons bind UDS only, causing restart-loop #21
The action specs for `hero_assistance_admin` and `hero_assistance_ui` in `crates/hero_assistance/src/main.rs` (lines 295-307 for `_ui`, lines 338-350 for `_admin`) configure hero_proc health checks with `http_url: "http://localhost/health"`, but neither daemon binds TCP by default. Both bind UDS only (`admin.sock` and `app.sock` respectively). Every health probe attempt fails because nothing on the host serves `http://localhost:80/health`, and after the retry budget elapses (start_period 5s plus 3 retries against a 5s timeout) hero_proc kills the daemon and restarts it. Observed cadence is roughly 30 to 35 seconds per restart cycle, confirmed today against the current `development` HEAD (`49ea76a7`). The `_server` action uses `openrpc_socket: Some(server_sock)` instead and stays alive correctly.

This blocks #18 acceptance independently of lhumina_code/hero_router#109: even once hero_router routing is fixed, the operator Admin and customer UI panes are only reachable during the brief alive windows between restarts.

Likely fix paths: switch both health checks to a hero_proc UDS-aware probe against `admin.sock` and `app.sock`, mirroring the pattern `_server` already uses, or have `_admin` and `_ui` bind a localhost loopback TCP port by default so the existing probe has something to hit.

Closed via squash-merge
`ee2be7d3` on `development` (PR #22). Both `_ui` (lines 295-307) and `_admin` (lines 338-350) HealthCheck blocks in `crates/hero_assistance/src/main.rs` now use `openrpc_socket: Some(<their UDS path>)`, mirroring `_server`'s working pattern. `HealthDef::OpenRpcSocket` is a connect-only probe per `hero_proc_server/src/types/config_ext.rs:30`, so the daemons do not need to expose `/rpc` or `/openrpc.json` for the probe itself; this matters because `hero_assistance_ui` only exposes `/rpc` (not `/openrpc.json`).

New unit test `phase24c_build_service_definition_health_checks_use_uds_connect_probe` pins the contract across all three actions.

Live verify on the rebuilt + reinstalled binaries: `hero_assistance --start` brought up all three daemons; after 5.5 minutes under hero_proc supervision the job list still showed `running` phase for `hero_assistance_server` (PID 3679580), `hero_assistance_ui` (PID 3679536), and `hero_assistance_admin` (PID 3679504): same PIDs, no restart cycle. `ps -o etime` confirmed ~6 minutes of uptime per process. `curl --unix-socket` against `rpc.sock`, `app.sock`, and `admin.sock` all returned HTTP 200 with the expected `{"service":"hero_assistance","status":"ok","version":"0.5.0"}` health JSON.

Pre-merge gate: `cargo fmt --check` + `cargo clippy --release --workspace --all-targets -- -D warnings` + `cargo build --workspace --release` all clean. Workspace tests 255 pass / 2 fail / 14 ignored (+1 from the new pin test vs the 254/1/14 baseline; the 2 fails are documented pre-existing flakes `phase24b_ui_add_access_fails_when_hero_proc_unreachable` + the transient `phase10_multi_project_merged_stream_tags_by_project_id`).

Unblocks row 2 of #18 acceptance.
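
For readers unfamiliar with the term, the connect-only probe mentioned above can be sketched roughly as follows. This is an illustration only, not hero_proc's actual implementation: the function names `probe_uds_connect` and `is_healthy` are hypothetical, and the sketch assumes that "connect-only" means the probe succeeds as soon as a connection to the socket path is accepted, without sending a request or reading a response.

```rust
use std::io;
use std::os::unix::net::UnixStream;
use std::path::Path;

/// Hypothetical helper (not taken from hero_proc): succeeds if something is
/// listening on the Unix domain socket at `path`.
fn probe_uds_connect(path: &Path) -> io::Result<()> {
    // Connecting is the whole check. No bytes are written or read, so the
    // daemon does not need to serve any particular HTTP or RPC route.
    let stream = UnixStream::connect(path)?;
    // Close the connection immediately; the probe only cares that it opened.
    drop(stream);
    Ok(())
}

/// Hypothetical convenience wrapper that collapses the result to a boolean.
fn is_healthy(path: &Path) -> bool {
    probe_uds_connect(path).is_ok()
}
```

Under these assumptions, `is_healthy(Path::new("admin.sock"))` would return `true` while the daemon holds the socket open and `false` when the path is missing or nothing is listening on it, which is why such a probe would work for a daemon that exposes only `/rpc`.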