service_db.nu — hero_db server + UI lifecycle module #87

Closed
opened 2026-04-19 19:44:16 +00:00 by mahmoud · 3 comments
Owner

Child of #75.

Objective

Add tools/modules/services/service_db.nu implementing the standard install | start | stop | status lifecycle for the hero_db service (server + UI).

Scope

  • Repo: ssh://git@forge.ourworld.tf/lhumina_code/hero_db.git
  • Binaries (per buildenv.sh): hero_db, hero_db_server, hero_db_ui
  • Runtime actions: hero_db_server, hero_db_ui
  • TOML: lhumina_code/hero_zero/services/hero_db.toml
  • Sockets:
    • $HERO_SOCKET_DIR/hero_db/rpc.sock — HTTP/1.1 OpenRPC management (server)
    • $HERO_SOCKET_DIR/hero_db/resp.sock — RESP2 Unix socket (server)
    • $HERO_SOCKET_DIR/hero_db/ui.sock — UI admin
  • TCP: server also binds 0.0.0.0:6378 for RESP2 (unprivileged port — no root required). HERO_DB_PORT env var overrides.
  • Env: RUST_LOG=info for both — TOML declares no other env. HERO_DB_DATA_DIR defaults to ~/.hero_db (server-managed, no preflight needed).
  • Dependencies: none in TOML.
  • Workspace layout: virtual workspace (confirmed at Cargo.toml:1-10, [workspace]-only).
  • Subcommand behaviour (from TOML): server invoked as bare hero_db_server; UI invoked as hero_db_ui serve. Matches the service_books.nu pattern for the UI.
  • --root flag supported but optional; user-level default.

Acceptance criteria

  • use services/mod.nu * makes service_db available.
  • service_db install [--root] [--update] clones lhumina_code/hero_db, builds all 3 binaries in release mode, installs to ~/hero/bin/ (or /root/hero/bin/ with --root).
  • service_db start [--reset] [--root] [--update] registers both runtime actions + the service, starts, prints all three socket paths plus the TCP port in the summary. Idempotent without --reset.
  • service_db status [--root] reports state.
  • service_db stop [--root] cleanly unregisters.
  • Smoke-tested on Hetzner: install → start --reset → status shows running with 0 restarts → stop.

Template & references

  • Template: service_whiteboard.nu (PR #83) / service_collab.nu (PR #85) — virtual-workspace baseline.
  • service_books.nu (PR #81) — reference for the serve subcommand pattern on the UI (used verbatim here).
  • Spec phase should also verify whether the server needs extra kill_other.port: [6378] to catch stale TCP binds on re-register (it probably does, given the TCP listener).

Expected deviations from the baseline template

  • Server kill_other must cover THREE artifacts: rpc.sock, resp.sock, and TCP port 6378.
  • UI script uses serve subcommand (per TOML). Server script is bare binary.
  • Everything else should match service_whiteboard.nu verbatim.
mahmoud self-assigned this 2026-04-19 19:44:24 +00:00
mahmoud added this to the ACTIVE project 2026-04-19 19:44:26 +00:00
mahmoud added this to the now milestone 2026-04-19 19:44:28 +00:00
Author
Owner

Implementation Spec for Issue #87

Objective

Add a service_db Nushell lifecycle module that registers, starts, stops, and queries the status of the hero_db service under hero_proc. The module supervises two binaries — hero_db_server (OpenRPC + RESP2 store) and hero_db_ui (HTTP dashboard over a Unix socket) — and ships the hero_db CLI alongside them without registering it as an action. The shape follows the service_whiteboard.nu / service_collab.nu baseline (pure two-binary virtual workspace, no preflight) with two narrow deviations: the UI serve subcommand borrowed from service_books.nu, and an expanded kill_other on the server action so hero_proc can reclaim the RPC socket, the RESP2 Unix socket, and the RESP2 TCP port together.

Requirements

  • Service: hero_db, context core, class system, critical false.
  • Forge location: lhumina_code/hero_db. Virtual workspace with members crates/hero_db{,_server,_sdk,_ui,_app,_examples}; no root [package]. Plain cargo build --release produces every bin target — use svc_cargo_install unmodified (no --workspace hand-roll like service_books).
  • Binaries (SVX_BINARIES): hero_db, hero_db_server, hero_db_ui (matches buildenv.sh).
  • Actions (SVX_ACTIONS): hero_db_server, hero_db_ui. The hero_db CLI is installed but not registered (whiteboard/collab convention).
  • Server bind points (from hero_db_server/src/main.rs):
    • $HERO_SOCKET_DIR/hero_db/rpc.sock — OpenRPC (hero_proc health-checks this)
    • $HERO_SOCKET_DIR/hero_db/resp.sock — RESP2 Unix
    • TCP 0.0.0.0:6378 — RESP2 TCP (overridable via HERO_DB_PORT; not overridden here)
  • UI socket: $HERO_SOCKET_DIR/hero_db/ui.sock.
  • Env: RUST_LOG: "info" only on both actions. HERO_DB_PORT, HERO_DB_DATA_DIR, HERO_DB_ENCRYPTION_KEY, REDIS_ADMIN_SECRET all left at server defaults (data dir ~/.hero_db, no encryption, port 6378).
  • Dependencies: none. hero_db.toml has no depends_on; both binaries pre-create the socket dir and unlink stale sockets before bind. No preflight helper warranted.
  • Commands: install, start [--reset --update], stop, status. All accept --root (-r). Identical UX to service_whiteboard / service_collab.
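
For orientation, the intended command surface in day-to-day use (the same form the smoke test below exercises; flag names are the ones listed above):

use tools/modules/services/service_db.nu
service_db install --root --update   # clone lhumina_code/hero_db, cargo build --release, copy 3 binaries
service_db start --reset --root      # (re)register both actions + the service, then start
service_db status --root             # hero_proc status record for hero_db
service_db stop --root               # unregister the service and both actions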

Files to Modify / Create

| File | Change |
| --- | --- |
| tools/modules/services/service_db.nu | New. Copy-rename of service_whiteboard.nu with the two deviations below. |
| tools/modules/services/mod.nu | Modify. Append export use service_db.nu. |

Implementation Plan

Each step maps 1:1 onto a block of service_whiteboard.nu. Deviations are called out explicitly.

Step 1: Header comment block (whiteboard lines 1–39)

Substitute every hero_whiteboard → hero_db. Rewrite the functional description to:

  • hero_db_server — encrypted Redis-backed store exposing OpenRPC (HTTP/1.1 over Unix) plus RESP2 on both a Unix socket and TCP 6378.
  • hero_db_ui — HTTP dashboard over Unix socket.

Explicitly list all three server bind points so an operator reading the header understands why kill_other lists extras:

Binds:
  $HERO_SOCKET_DIR/hero_db/rpc.sock   — OpenRPC management (HTTP/1.1 over Unix)
  $HERO_SOCKET_DIR/hero_db/resp.sock  — RESP2 over Unix
  TCP 0.0.0.0:6378                    — RESP2 over TCP (override with HERO_DB_PORT)
  $HERO_SOCKET_DIR/hero_db/ui.sock    — UI dashboard

Keep the "No external dependencies / both binaries remove stale sockets before bind" paragraph verbatim. Keep the CLI note: "hero_db is the CLI; it is shipped alongside the runtime binaries but is NOT registered as a hero_proc action."

Step 2: Imports

Identical to whiteboard: use ../clients/proc.nu * + use ./lib.nu *. Do NOT import forge.nu — unlike service_books, hero_db uses the standard svc_cargo_install path.
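
Concretely, the module header reduces to:

use ../clients/proc.nu *   # hero_proc client (proc service ..., proc logs ...)
use ./lib.nu *             # shared svc_* helpers (svc_cargo_install, svc_require_proc, svc_proc_healthy, ...)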

Step 3: Constants

const SVX_SERVICE_NAME = "hero_db"
const SVX_FORGE_LOC    = "lhumina_code/hero_db"
const SVX_BINARIES     = ["hero_db" "hero_db_server" "hero_db_ui"]
const SVX_ACTIONS      = ["hero_db_server" "hero_db_ui"]

Step 4: svx_server_action — DEVIATION #1

Copy whiteboard's server action. Keep script: $bin (TOML uses bare binary), env: {RUST_LOG: "info"}, retry policy, stop signal/timeout, and health check unchanged.

Only change: kill_other must cover all three artifacts the binary binds. Replace the single-socket list with:

kill_other: {
    action: ""
    process_filters: []
    port: [6378]
    socket: [
        $"($sock_base)/hero_db/rpc.sock"
        $"($sock_base)/hero_db/resp.sock"
    ]
}

Rationale: on restart, hero_proc must reclaim any of the three listeners that may be stuck (stale process, TIME_WAIT port, orphaned socket inode) before the fresh hero_db_server can bind. Port 6378 is a literal because HERO_DB_PORT is not overridden here — if a future operator sets it, the action spec must be updated alongside. health_checks stays pinned to rpc.sock, which is the only endpoint the hero_proc OpenRPC probe can speak.

Step 5: svx_ui_action — DEVIATION #2

Copy whiteboard's UI action. Keep retry policy, env, timeouts, kill_other (single socket: ui.sock), and health_checks (pinned to ui.sock) unchanged.

Only change: invocation is script: $"($bin) serve". This mirrors hero_db.toml's exec = "__HERO_BIN__/hero_db_ui serve" verbatim. hero_db_ui has no clap and ignores the argument at runtime, but we stay faithful to the published TOML contract — same rationale as service_books.nu.
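
Putting Step 5 together, the UI action record looks roughly as follows. This is a sketch assuming the whiteboard baseline's field names; the retry policy, stop signal/timeout, and ui.sock health check are copied verbatim from service_whiteboard.nu and only summarised here.

{
    name: "hero_db_ui"                      # field name assumed; the registered action name
    script: $"($bin) serve"                 # mirrors exec = "__HERO_BIN__/hero_db_ui serve"
    env: {RUST_LOG: "info"}
    kill_other: {
        action: ""
        process_filters: []
        port: []
        socket: [ $"($sock_base)/hero_db/ui.sock" ]
    }
    # retry policy, stop signal/timeout, health_checks (pinned to ui.sock):
    # verbatim from service_whiteboard.nu, not reproduced here.
}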

Step 6: svx_service_config

Identical structure. Description: "Hero DB — encrypted Redis-backed store with graph/vector/stream/ontology APIs and dashboard".
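
A sketch of the record, assuming whiteboard's field names; the values are the ones fixed in the Requirements section above.

{
    name: "hero_db"
    context: "core"
    class: "system"
    critical: false
    description: "Hero DB — encrypted Redis-backed store with graph/vector/stream/ontology APIs and dashboard"
    actions: $SVX_ACTIONS   # wiring assumed; hero_db_server + hero_db_ui
}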

Step 7: svx_drop_registration

Identical to whiteboard — stop service, delete service, delete each action, all wrapped in try { ... } catch { }.
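
As a sketch (the proc subcommand names below are placeholders, not the confirmed client API; the real calls are whatever ../clients/proc.nu exposes, copied from service_whiteboard.nu):

def svx_drop_registration [root: bool] {
    # Each step is wrapped so a partially-registered service still tears down cleanly.
    try { proc service stop $SVX_SERVICE_NAME --root=$root } catch { }
    try { proc service delete $SVX_SERVICE_NAME --root=$root } catch { }    # placeholder name
    for action in $SVX_ACTIONS {
        try { proc action delete $action --root=$root } catch { }           # placeholder name
    }
}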

Step 8: install

Copy whiteboard verbatim. hero_db is a pure virtual workspace — cargo build --release builds every bin target in one pass and svc_cargo_install's release-dir preflight catches any misnamed binary before copy. Do NOT add the --workspace pre-step from service_books.
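
For clarity, what the install phase boils down to on a pure virtual workspace. Illustrative only: $repo_dir and $bin_dir are placeholders, and svc_cargo_install performs these steps (plus the release-dir preflight) itself.

cd $repo_dir
cargo build --release                       # one pass builds every bin target of the workspace
for bin in $SVX_BINARIES {
    cp $"target/release/($bin)" $bin_dir    # hero_db, hero_db_server, hero_db_ui
}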

Step 9: start

Copy whiteboard verbatim, substituting names. No embedder preflight (books-only).

Update the final summary block to reflect the extras so operators can probe them directly:

  service  : hero_db
  actions  : hero_db_server, hero_db_ui
  state    : running | NOT running
  rpc  sock: $sock_base/hero_db/rpc.sock
  resp sock: $sock_base/hero_db/resp.sock
  resp tcp : 127.0.0.1:6378
  ui   sock: $sock_base/hero_db/ui.sock
  ui   url : http+unix://$ui_sock/
             served by hero_db_ui; reach the UI via hero_router
  commands :
    proc service status hero_db(flag)
    proc logs tail hero_db_server(flag)
    proc logs tail hero_db_ui(flag)

The resp sock and resp tcp lines are informational — they're additional bind points on the server process, not separate hero_proc actions.

Step 10: stop

Identical to whiteboard. svc_proc_healthy guard, then svx_drop_registration. No service-specific cleanup.
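
A sketch, assuming the bare-subcommand definition style and the svc_proc_healthy signature from the whiteboard copy:

export def stop [--root (-r)] {
    # With hero_proc down there is nothing to unregister; warn and return.
    if not (svc_proc_healthy $root) {
        print "hero_proc is not running; nothing to stop"
        return
    }
    svx_drop_registration $root
}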

Step 11: status

Identical to whiteboard. svc_require_proc "service_db" $root then proc service status $SVX_SERVICE_NAME --root=$root.
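
Which, in the whiteboard shape, is essentially:

export def status [--root (-r)] {
    svc_require_proc "service_db" $root               # actionable error when hero_proc is down
    proc service status $SVX_SERVICE_NAME --root=$root
}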

Step 12: mod.nu

Append export use service_db.nu. Following the existing merge-order convention in the file (no alphabetical enforcement), append as a new line after service_collab.nu.
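
The resulting tail of mod.nu (the existing entry is shown only for placement; the rest of the list stays as-is):

# tools/modules/services/mod.nu (tail)
export use service_collab.nu
export use service_db.nu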

Smoke Test Plan (Hetzner, --root)

After service_proc start --root:

nu -c 'use tools/modules/services/service_db.nu; service_db install --root'
nu -c 'use tools/modules/services/service_db.nu; service_db start --root'
nu -c 'use tools/modules/services/service_db.nu; service_db status --root'

SOCK_BASE=/root/hero/var/sockets

# OpenRPC over UDS (what hero_proc health-checks)
curl --unix-socket "$SOCK_BASE/hero_db/rpc.sock" \
     -H 'Content-Type: application/json' \
     -d '{"jsonrpc":"2.0","id":1,"method":"rpc.discover"}' \
     http://localhost/

# UI over UDS
curl --unix-socket "$SOCK_BASE/hero_db/ui.sock" http://localhost/ -i

# RESP2 Unix
redis-cli -s "$SOCK_BASE/hero_db/resp.sock" ping   # expect PONG

# RESP2 TCP
redis-cli -h 127.0.0.1 -p 6378 ping               # expect PONG

# Reset path
nu -c 'use tools/modules/services/service_db.nu; service_db start --reset --root'
# Re-run the four probes — all four must still succeed, proving kill_other
# reclaimed rpc.sock + resp.sock + port 6378 across restart.

# Stop + verify unregistered
nu -c 'use tools/modules/services/service_db.nu; service_db stop --root'
proc service status hero_db --root  # expect "not found"

Expected: all four probes (rpc.sock, ui.sock, resp.sock, TCP 6378) succeed on initial start AND after --reset. Stop leaves no hero_db service / action entries in hero_proc.

Acceptance Criteria

  • service_db install [--root] clones lhumina_code/hero_db, runs cargo build --release, copies hero_db, hero_db_server, hero_db_ui into the correct bin dir.
  • service_db start [--root] brings up the service with both actions registered, health checks passing on rpc.sock and ui.sock, idempotent without --reset.
  • service_db start --reset [--root] drops prior registration and restarts cleanly even when stale sockets / TCP listeners are present.
  • service_db stop [--root] removes the service + both actions from hero_proc; re-run is a safe no-op.
  • service_db status [--root] returns the hero_proc status record when up; clean actionable error via svc_require_proc when hero_proc is down.
  • --root routes through root's hero_proc socket, uses /root/hero/bin + /root/hero/var/sockets, and validates passwordless sudo up front.
  • All four probes succeed on the Hetzner smoke run on first start and after --reset.
  • mod.nu exports service_db.

Notes

  • Workspace shape: pure virtual workspace (no root [package]), so plain cargo build --release at the root Cargo.toml builds every bin target. No --workspace accommodation needed. install is a verbatim copy of whiteboard's, not books'.
  • UI serve subcommand: hero_db_ui has no clap — the argument is ignored at runtime. We keep it in script anyway because the published hero_zero/services/hero_db.toml includes it and this module's job is to mirror that contract. If hero_db_ui later grows a real CLI, the TOML (and this module) already match.
  • TCP 6378 and resp.sock as extras: hero_db_server binds three artifacts in one process. They are not separate hero_proc actions — one action (hero_db_server), one health probe (rpc.sock). The RESP2 listeners ride on the same process lifecycle. The expanded kill_other list is the hero_proc-side reclaim mechanism that keeps restarts clean.
  • No preflight: unlike service_books, hero_db has no soft dependency and no depends_on. Both binaries create_dir_all the per-service socket dir and unlink stale sockets before bind. A preflight helper here would be dead code.
  • Env unset by design: HERO_DB_PORT left unset so the default (6378) matches the literal in kill_other.port. Any future override must update both places together — the kill_other.port list is the canonical enforcement point on restart.
  • No shared helper extraction: the serve-subcommand + multi-socket kill_other pattern is specific to hero_db here. Defer any refactor to lib.nu until a second service confirms which pattern is the outlier.
Author
Owner

Implementation summary

Changes

  • Added tools/modules/services/service_db.nu — ~330 lines, copy-rename of service_whiteboard.nu with the two spec-approved deviations applied (expanded server kill_other covering rpc.sock + resp.sock + TCP 6378, and UI script: "<bin> serve" mirroring the TOML).
  • Updated tools/modules/services/mod.nu — added export use service_db.nu.

End-to-end smoke test on Hetzner

| # | Assertion | Result |
| --- | --- | --- |
| 1a | service_db status --root with hero_proc down → actionable error pointing to service_proc start --root | PASS |
| 1b | service_db stop --root with hero_proc down → benign warning, no error | PASS |
| 1c | service_db start --root with hero_proc down → actionable error | PASS |
| 2a | service_proc start --root healthy | PASS |
| 2b | service_db install --root produced 3 binaries in /root/hero/bin/ | PASS |
| 2c | service_db start --reset --root registers + starts | PASS |
| 2d | rpc.sock present (OpenRPC) | PASS |
| 2e | resp.sock present (RESP2 Unix) | PASS |
| 2f | ui.sock present (UI) | PASS |
| 2g | TCP 6378 listening, owned by hero_db_server | PASS |
| 2h | curl --unix-socket rpc.sock rpc.discover returns OpenRPC doc | PASS |
| 2i | curl --unix-socket ui.sock / → HTTP 200 (198k body) | PASS |
| 2j | redis-cli -s resp.sock ping → PONG | PASS |
| 2k | redis-cli -h 127.0.0.1 -p 6378 ping → PONG | PASS |
| 2l | status → {name: hero_db, state: running, restarts: 0, pid: 3597906} | PASS |
| 2m | Idempotent start (no --reset) prints "already running" | PASS |
| 2n | 15 s observation — current_run_id stable at 13, restarts: 0, state running | PASS |
| 2o | start --reset --root while running — all 5 probes (rpc/resp/ui/resp-ping/tcp-ping) pass again after restart, proving kill_other reclaimed all three server bind points | PASS |
| 2p | service_db stop --root stops + unregisters | PASS |
| 2q | Post-stop status returns expected service 'hero_db' not found | PASS |
| 2r | Post-stop: no hero_db_server/hero_db_ui processes, TCP 6378 released, socket files removed (/root/hero/var/sockets/hero_db/ empty) | PASS |

Deviations from baseline template — confirmed behaving as intended

  • Server kill_other on rpc.sock + resp.sock + port 6378: verified end-to-end by 2o — running a --reset restart against a live service reclaimed every bind point cleanly. No stale-listener / EADDRINUSE on the new process.
  • UI script: "<bin> serve": the serve argument is ignored by hero_db_ui's main (no clap), but the module stays faithful to hero_zero/services/hero_db.toml's exec line. Confirmed the UI starts and serves correctly.

Observation: better than whiteboard on shutdown

Both hero_db_server and hero_db_ui clean up their Unix sockets on SIGTERM — after stop, the per-service socket directory is empty. No stale inodes left behind (unlike hero_whiteboard, which required kill_other.socket cleanup on next start).

Acceptance criteria

  • Module loadable via use services/mod.nu * or use services/service_db.nu *.
  • install builds 3 binaries and places them in ~/hero/bin/ (or /root/hero/bin/ with --root).
  • start registers both actions + the service, starts, surfaces all four endpoints (rpc.sock, resp.sock, resp tcp, ui.sock) plus UI URL in the summary.
  • start --reset tears down and reclaims all three server bind points cleanly.
  • status reports the hero_proc record.
  • stop cleanly unregisters.
  • --root works end-to-end with passwordless sudo.
  • Smoke-tested end-to-end on the Hetzner box — every assertion green.
Author
Owner

PR opened: #88 (https://forge.ourworld.tf/lhumina_code/hero_skills/pulls/88)