Verify hero_proc + hero_router + hero_codescalers start together cleanly #12

Closed
opened 2026-04-26 12:20:43 +00:00 by mahmoud · 2 comments

Parent: #8

Context

These three services must be running together as the baseline stack. Currently there may be socket mismatches or registration issues.

What to verify

  1. hero_proc starts and its socket is up: $HERO_SOCKET_DIR/hero_proc/rpc.sock
  2. hero_router starts and maps TCP → correct Unix sockets
  3. hero_codescalers_server registers under the correct socket: $HERO_SOCKET_DIR/hero_codescalers_server<N>/rpc.sock
  4. hero_codescalers_ui is reachable via hero_router
  5. hero_router dashboard shows hero_codescalers as a clickable service
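Items 1 and 3 can be spot-checked by looking for the socket files under `$HERO_SOCKET_DIR`. The helper below is a hypothetical sketch, not part of hero_proc or hero_router. It assumes the `<service>/rpc.sock` layout listed above and uses `suffix` to stand in for the `<N>` instance suffix (empty by default). It only confirms that each path exists and is a Unix socket, not that a process is accepting connections on it.

```python
import stat
from pathlib import Path


def expected_paths(suffix: str = "") -> list[str]:
    """Socket paths (relative to $HERO_SOCKET_DIR) named in this issue.

    `suffix` is a hypothetical stand-in for the <N> instance suffix.
    """
    return [
        "hero_proc/rpc.sock",
        f"hero_codescalers_server{suffix}/rpc.sock",
    ]


def socket_status(socket_dir: str, rel_paths: list[str]) -> dict[str, bool]:
    """Map each relative path to True if it exists and is a Unix socket."""
    status = {}
    for rel in rel_paths:
        try:
            mode = (Path(socket_dir) / rel).stat().st_mode
            status[rel] = stat.S_ISSOCK(mode)
        except OSError:
            # Missing file or missing parent directory.
            status[rel] = False
    return status
```

For example, `socket_status(os.environ["HERO_SOCKET_DIR"], expected_paths())` returns one `True`/`False` per expected socket. A regular file at the expected path reports `False`, which catches stale leftovers.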

Relevant paths

  • hero_codescalers/crates/hero_codescalers_server/ — RPC daemon
  • hero_codescalers/crates/hero_codescalers_ui/ — web dashboard
  • Hero Router service (separate repo hero_router)

Acceptance Criteria

  • All three services start without errors
  • hero_router shows hero_codescalers as connected
  • Clicking through hero_router reaches hero_codescalers UI
  • hero_proc dashboard shows hero_codescalers as a registered service
mahmoud (author) commented:

Verification on kristof4 — all acceptance criteria pass

Stack: hero_proc → hero_router → hero_codescalers (instance 0), running as user mahmoud.

Pre-condition fix

The pre-existing build on kristof4 was at commit 84dc730 (still on iroh-KVS, which required HERO_CODESCALERS_KVS_NAMESPACE_SECRET). Pulled to c0c5957 (sled-backed local store, no secret needed) via service_codescalers install --update --reset, then restarted. After that, the server boots cleanly:

INFO hero_codescalers_server: Registered self node: kristof4
INFO hero_codescalers_server: Running as actor mahmoud (uid=1003, is_root=false)
INFO hero_codescalers_server: hero_codescalers listening on unix:.../rpc.sock
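If this check is scripted, the three INFO lines above make a usable readiness signal. The following is a minimal sketch that assumes the log text has already been captured as a string. How it is collected (for example from hero_proc) is left open. It also assumes the message wording stays as shown. The regexes are illustrative and do not reflect a documented, stable log format.

```python
import re

# Patterns copied from the startup log shown above; wording may change between versions.
READY_PATTERNS = {
    "self_node": re.compile(r"Registered self node: (\S+)"),
    "actor": re.compile(r"Running as actor (\S+)"),
    "listening": re.compile(r"listening on unix:(\S+)"),
}


def parse_startup(log_text: str) -> dict[str, str]:
    """Return the captured fields if every readiness line is present, else {}."""
    found = {}
    for key, pattern in READY_PATTERNS.items():
        match = pattern.search(log_text)
        if match is None:
            return {}
        found[key] = match.group(1)
    return found
```

Returning an empty dict when any line is missing keeps the caller's check to a single truthiness test.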

Acceptance criteria

  • hero_proc starts and its socket is up.

    /home/mahmoud/hero/var/sockets/hero_proc/rpc.sock  → ✓ present
    
  • hero_router starts and maps TCP → correct Unix sockets.

    PID 634706: hero_router --port 0 --address 4a0:6976:8fa7:efc:2::1 --ui-port 9988
    
  • hero_codescalers_server registers under the correct socket.

    /home/mahmoud/hero/var/sockets/hero_codescalers_server/rpc.sock  → ✓
    /home/mahmoud/hero/var/sockets/hero_codescalers_server/ui.sock   → ✓
    
  • hero_codescalers_ui is reachable via hero_router.

    http://[4a0:6976:8fa7:efc:2::1]:9988/hero_codescalers_server/ui/  → HTTP 200
    
  • hero_router dashboard lists codescalers as a clickable service.

    href="http://[4a0:6976:8fa7:efc:2::1]:9988/hero_codescalers_server/ui/"
    
  • hero_proc dashboard shows the registered service.

    ● hero_codescalers running 1034151 12.5M  —  hero_codescalers_server, hero_codescalers_ui  —  system
    
  • Direct health check via the binary.

    $ hero_codescalers --server .../rpc.sock health
    { "service": "hero_codescalers", "status": "ok", "version": "0.1.0" }
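The HTTP reachability check and the health call can be wrapped for repeat runs. This sketch assumes that `hero_codescalers --server <sock> health` prints only the JSON object shown above on stdout and exits 0 on success. `health_ok`, `run_health` and `http_status` are hypothetical names. The socket argument must be the full `rpc.sock` path, not the elided one above.

```python
import json
import subprocess
import urllib.error
import urllib.request


def health_ok(raw: str, service: str = "hero_codescalers") -> bool:
    """True if `raw` is a JSON object reporting status "ok" for `service`."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return (
        isinstance(data, dict)
        and data.get("service") == service
        and data.get("status") == "ok"
    )


def run_health(rpc_sock: str) -> bool:
    """Run the CLI health check shown above against a full rpc.sock path."""
    proc = subprocess.run(
        ["hero_codescalers", "--server", rpc_sock, "health"],
        capture_output=True,
        text=True,
        timeout=10,
    )
    return proc.returncode == 0 and health_ok(proc.stdout)


def http_status(url: str, timeout: float = 5.0) -> int:
    """GET `url` (following redirects) and return the final status code.

    HTTP error codes are returned; connection failures raise URLError.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
```

For example, `http_status("http://[4a0:6976:8fa7:efc:2::1]:9988/hero_codescalers_server/ui/")` should return 200 on the box verified above.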
    

Notes

  • The pre-existing crash loop on this box was caused by a stale binary, not a missing secret. The README says SECRET_CODESCALERS is required, but since the sled refactor (c0c5957) it is no longer used. The README should be updated in a follow-up.
  • The router exposes codescalers at /hero_codescalers_server/ui/ (matches the socket dir name hero_codescalers_server), not /hero_codescalers/ui/.
  • Registered self node: kristof4 confirms the per-server self-registration that the meeting brief described — each kristof box will show its own state, and admins reach each box independently via that server's hero_router.
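The router keys its paths on the socket directory name, so a script that builds router links should start from that name rather than the service name. Below is a small hypothetical helper that assumes the `/<socket-dir>/ui/` shape observed above. The one non-obvious detail is that an IPv6 literal, such as the host-TUN address, must be wrapped in brackets inside a URL (RFC 3986).

```python
import ipaddress


def router_ui_url(host: str, port: int, socket_dir_name: str) -> str:
    """Build the hero_router UI URL for the service whose socket dir is `socket_dir_name`."""
    try:
        if ipaddress.ip_address(host).version == 6:
            host = f"[{host}]"  # IPv6 literals need brackets in URLs
    except ValueError:
        pass  # not an IP literal (e.g. a hostname): use as-is
    return f"http://{host}:{port}/{socket_dir_name}/ui/"
```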

Next

Moving to #9 — verify nu Action → Job pipeline by triggering a real codescalers feature and confirming the job lands in hero_proc with correct tags + nu interpreter.

mahmoud (author) commented:

Closed — verified

The three-process stack boots cleanly on kristof4 in this order:

  1. service_proc start --root --reset — supervisor on /root/hero/var/sockets/hero_proc/rpc.sock.
  2. service_router start --root --reset — TCP entry point bound to [<host-TUN>]:9988, scans for sibling sockets and exposes them under /<service>/{ui,rpc}.
  3. service_codescalers start --root — registers hero_codescalers_server and hero_codescalers_ui as hero_proc-managed services; UI socket gets picked up by the router on its next scan and surfaces at /hero_codescalers_server/ui/.
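The order follows the dependencies described in these steps: codescalers registers with hero_proc, and the router picks up the UI socket on a later scan. Below is a sketch of a start script under several assumptions. The `service_*` commands are invoked exactly as listed and return once the service is launched. The socket directory is `/root/hero/var/sockets`, as stated for this setup. The codescalers UI socket lands at `hero_codescalers_server/ui.sock`, as it did in the earlier per-user run. The router step is not socket-gated here because its readiness is a TCP bind on a host-specific address. `wait_for_path`, `STEPS` and `start_stack` are hypothetical names.

```python
import os
import subprocess
import time

SOCKET_DIR = "/root/hero/var/sockets"  # HERO_SOCKET_DIR for the root setup


def wait_for_path(path: str, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll until `path` exists or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# (command, socket to wait for before the next step); None means no socket gate.
STEPS = [
    (["service_proc", "start", "--root", "--reset"], f"{SOCKET_DIR}/hero_proc/rpc.sock"),
    (["service_router", "start", "--root", "--reset"], None),  # TCP bind, host-specific
    (["service_codescalers", "start", "--root"], f"{SOCKET_DIR}/hero_codescalers_server/ui.sock"),
]


def start_stack() -> None:
    """Start the three services in order, failing fast if a socket never appears."""
    for cmd, sock in STEPS:
        subprocess.run(cmd, check=True)
        if sock is not None and not wait_for_path(sock):
            raise RuntimeError(f"{cmd[0]}: {sock} did not appear")
```

Gating on the socket file keeps the script independent of how long each service takes to boot, at the cost of not proving the socket is accepting connections.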

All three run as root, share HERO_SOCKET_DIR=/root/hero/var/sockets, and come back cleanly after a supervisor restart. Also verified that hero_codescalers runs as a per-server admin service: PR #140 enforces --root and refuses a per-user start.

Stack is now the canonical setup for everything in #8.

Reference
lhumina_code/hero_codescalers#12