fix(service_osis): point health-check at hero_osis_base/rpc.sock #177

Merged
sameh-farouk merged 1 commit from fix/service-osis-health-check-socket-path into development 2026-04-30 06:46:10 +00:00

Summary

The hero_osis action's health-check + kill_other.socket referenced hero_osis/rpc.sock, which the unified backend never creates — the binary binds per-domain sockets only (hero_osis_base/rpc.sock, hero_osis_business/rpc.sock, hero_osis_calendar/rpc.sock, ...). hero_proc kept SIGTERMing the backend after the start grace window, causing a crash loop while hero_osis_ui (whose path was already correct) survived — making the service appear partially-running with a stale socket file.

Repro

  • service_osis start → proc service status hero_osis shows restarts > 0, hero_osis_ui alive but no hero_osis (the backend) in pgrep.
  • WASM/UI calls to /hero_osis_base/rpc return 404 Socket 'rpc.sock' not found.
  • Running ~/hero/bin/hero_osis standalone (no hero_proc) stays up cleanly and binds all 16 domain sockets — confirms the binary is fine; the action spec is the bug.

Fix

Switch both kill_other.socket and health_checks.openrpc_socket to hero_osis_base/rpc.sock. The base domain is registered first by the unified server, and registration is atomic (all 16 domains together), so a healthy base socket is a sufficient liveness signal for the whole binary.
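To illustrate why the old path could never pass, here is a minimal sketch of a socket liveness probe. It is not hero_proc's actual health-check code; the function name `socket_is_live` and the timeout default are assumptions made for illustration. It only shows that a connect attempt fails both when the socket file was never created (the old `hero_osis/rpc.sock` case) and when a stale file is left behind with no listener.

```python
import socket


def socket_is_live(path: str, timeout: float = 1.0) -> bool:
    """Return True if a process accepts connections on the Unix socket at `path`.

    Hypothetical helper for illustration; not part of hero_proc.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
        return True
    except OSError:
        # Covers a path that was never created (ENOENT), a stale socket
        # file with no listener behind it (ECONNREFUSED), and timeouts.
        return False
    finally:
        sock.close()
```

Under this sketch, a probe against a path the binary never binds returns `False` on every attempt, no matter how healthy the process is, which matches the crash loop described above.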

Test plan

  • service_osis start --reset on a previously-loop-crashing dev box → restarts: 0 after >30s
  • All sampled per-domain sockets (base, business, calendar, files, projects) reply HTTP/1.1 200 OK
  • curl -X POST http://[router]:9988/hero_osis_base/rpc -d '{...domain.list...}' returns valid JSON in <2 ms (was 404 before)
  • No regression in hero_osis_ui (already-correct path left untouched)
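
The per-domain socket check in the test plan could be scripted roughly as follows. This is a sketch under assumptions: the helper name `missing_domain_sockets` and the `socket_root` argument are hypothetical (the PR does not state where the socket directories live), and only the `hero_osis_<domain>/rpc.sock` naming is taken from the description.

```python
import os
import stat


def missing_domain_sockets(socket_root, domains):
    """Return the domains whose hero_osis_<domain>/rpc.sock is absent or not a socket.

    `socket_root` is an assumed parent directory, supplied by the caller.
    """
    missing = []
    for domain in domains:
        path = os.path.join(socket_root, f"hero_osis_{domain}", "rpc.sock")
        try:
            mode = os.stat(path).st_mode
        except OSError:
            # The path does not exist or cannot be read.
            missing.append(domain)
            continue
        if not stat.S_ISSOCK(mode):
            # Something exists at the path, but it is not a Unix socket.
            missing.append(domain)
    return missing
```

Called with the sampled domains (`base`, `business`, `calendar`, `files`, `projects`), an empty list means every socket file is present. Presence alone does not prove liveness, since a stale socket file can outlive its process, so a request against each socket is still needed to confirm the `200 OK` replies.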
fix(service_osis): point health-check at hero_osis_base/rpc.sock
All checks were successful
Build and Publish Skills / build-and-publish (pull_request) Successful in 3s
2f7a2cc275
The unified hero_osis backend binds per-domain sockets (hero_osis_base,
hero_osis_business, etc.) and never creates a singular hero_osis/rpc.sock.
The action spec's health-check + kill_other targeted the non-existent
path, so hero_proc kept SIGTERMing the backend after the start grace
window — verified by running the binary standalone (stays up cleanly
and binds all 16 domain sockets) versus under hero_proc supervision
(restarts: 4, hero_osis_ui survives but hero_osis backend crash-loops).

Switch both fields to hero_osis_base/rpc.sock — root domain, registered
first by the unified server, always present when the process is healthy.
sameh-farouk merged commit 6b65f40f93 into development 2026-04-30 06:46:10 +00:00
Reference
lhumina_code/hero_skills!177