feat(service_manager): Hero Service Manager + all 33 services (closes #90) #91

Closed
mik-tf wants to merge 2 commits from development_mik_service_manager into development
Owner

Closes #90.

Plan posted upstream: #90 (comment)

What this delivers

A third Unix socket — service.sock alongside rpc.sock and ui.sock — serving 14 service.* JSON-RPC methods over its own OpenRPC domain. Wraps hero_proc_sdk plus a build/install pipeline behind agent-introspectable methods.

$HERO_SOCKET_DIR/hero_router/
├── rpc.sock        ← existing: router.* JSON-RPC
├── ui.sock         ← existing: admin dashboard
└── service.sock    ← NEW: service.* JSON-RPC

RPC surface (service.*)

list / inspect / status / health / build / install / start / stop / restart / delete / upgrade / verify / logs / troubleshoot. Spec at static/service_manager.openrpc.json, served on service.sock GET /openrpc.json.

ServiceDefinition — pure data with typed extension points

BindStrategy / Extra / InstallPolicy / ArgSource / EnvSource / Resolver. Adding a new service is one file under service_manager/services/<name>.rs plus one line in services::all() — no new code paths.

Coverage — every Hero service (33 ports)

Every service_<name>.nu module under hero_skills/nutools/modules/services/ is represented in the registry:

hero_agent, hero_aibroker, hero_biz, hero_books, hero_browser,
hero_claude, hero_code, code_indexer, hero_codescalers, hero_collab,
hero_compute, hero_db, hero_editor, hero_embedder, hero_foundry,
hero_indexer, hero_livekit, hero_logic, hero_mail, hero_matrixchat,
hero_office, hero_os, hero_osis, hero_planner, hero_proxy,
hero_router, hero_runner_rhai, hero_shrimp, hero_slides, hero_voice,
hero_wallet, hero_whiteboard, mycelium

Documented exceptions (in services/mod.rs):

  • hero_proc — circular: the manager is a hero_proc client, can't supervise its own supervisor.
  • hero_onlyoffice — Docker container; engine doesn't issue docker run actions yet.
  • hero_do — installer-only nu module; no daemon to supervise.
  • service_core.nu — empty meta-module.

Source vs download — both first-class

InstallPolicy::Either { asset_suffix } is the default. ELF-magic verification, freshness check (catches "rebuilt but supervisor running old binary"), atomic rename + chmod. Source path shells out to cargo build --release in the service's checkout under $CODEROOT.

Env resolution

  • Static("...") — literal
  • FromCallerEnv("KEY", "default") — from manager's process env (mirrors operator-shell env-passthrough that nu modules use today; wired into hero_agent / hero_claude / hero_office / hero_proxy / hero_shrimp)
  • FromHeroProcSecret("KEY") — stub; reserved for runtime-side resolution per hero_proc_meta
  • Resolved(Resolver::*) — typed computed values (sockets, ports, paths, mycelium addr)

CLI mirrors the agent

hero_router service <op> connects to service.sock over UDS — same dispatcher path as the agent, no in-process shortcut.

Documentation (#90 deliverables)

  • crates/hero_router/docs/service_manager/README.md — developer guide
  • crates/hero_router/docs/service_manager/migration.md — 4-phase plan from nu → manager
  • crates/hero_router/docs/service_manager/removal.md — strict bottom-up deletion order

Workspace gate

  • cargo fmt --check
  • cargo clippy -p hero_router --all-targets -- -D warnings
  • cargo build -p hero_router --release
  • cargo test -p hero_router125 tests (9 dispatcher e2e + 11 unit + 105 pre-existing)

Test plan

  • Compile + clippy + tests green
  • CLI surface (hero_router service --help) lists all 13 ops
  • list_includes_full_service_set test fails if any documented service is missing
  • Manual smoke (next session): hero_proc + hero_router → walk a few services through install / start / health / troubleshoot on heroci
  • Heroci validation in a follow-up session per feedback_no_direct_push_except_hero_demo.md

Per-service caveats (search Limitation: in services/*.rs)

Some nu modules paper over wrinkles the engine doesn't yet model:

  • LiveKit credential bootstrap (hero_collab)
  • ONNX Runtime auto-install (hero_voice / hero_embedder / hero_editor — declared via Extra::OnnxRuntime, but install is operator-side today)
  • Cascade mother variant (hero_aibroker --split)
  • Multi-instance suffix (hero_codescalers / mycelium — instance 0 only)
  • Mycelium IPv6 auto-detect (hero_router — BindStrategy::Mycelium declared but engine treats as Loopback)

The migration plan (Phase 2 — feature parity) tracks closing each gap as a follow-up issue. Until then, the nu modules remain the production path for the affected services and the manager handles the simple-shape majority.

Out-of-scope follow-ups

Tracked in docs/service_manager/migration.md:

  1. LiveKit auto-bootstrap
  2. ONNX Runtime install (extends Extra::OnnxRuntime)
  3. BindStrategy::Mycelium engine support
  4. Multi-instance suffix support
  5. Cascade --split mother/child
  6. Docker-action support for hero_onlyoffice
  7. Admin UI dashboard pane on ui.sock

Signed-off-by: mik-tf

Closes https://forge.ourworld.tf/lhumina_code/hero_router/issues/90. Plan posted upstream: https://forge.ourworld.tf/lhumina_code/hero_router/issues/90#issuecomment-30677 ## What this delivers A third Unix socket — `service.sock` alongside `rpc.sock` and `ui.sock` — serving 14 `service.*` JSON-RPC methods over its own OpenRPC domain. Wraps `hero_proc_sdk` plus a build/install pipeline behind agent-introspectable methods. ``` $HERO_SOCKET_DIR/hero_router/ ├── rpc.sock ← existing: router.* JSON-RPC ├── ui.sock ← existing: admin dashboard └── service.sock ← NEW: service.* JSON-RPC ``` ### RPC surface (`service.*`) `list` / `inspect` / `status` / `health` / `build` / `install` / `start` / `stop` / `restart` / `delete` / `upgrade` / `verify` / `logs` / `troubleshoot`. Spec at `static/service_manager.openrpc.json`, served on `service.sock GET /openrpc.json`. ### ServiceDefinition — pure data with typed extension points `BindStrategy` / `Extra` / `InstallPolicy` / `ArgSource` / `EnvSource` / `Resolver`. Adding a new service is **one file** under `service_manager/services/<name>.rs` plus one line in `services::all()` — no new code paths. ### Coverage — every Hero service (33 ports) Every `service_<name>.nu` module under `hero_skills/nutools/modules/services/` is represented in the registry: ``` hero_agent, hero_aibroker, hero_biz, hero_books, hero_browser, hero_claude, hero_code, code_indexer, hero_codescalers, hero_collab, hero_compute, hero_db, hero_editor, hero_embedder, hero_foundry, hero_indexer, hero_livekit, hero_logic, hero_mail, hero_matrixchat, hero_office, hero_os, hero_osis, hero_planner, hero_proxy, hero_router, hero_runner_rhai, hero_shrimp, hero_slides, hero_voice, hero_wallet, hero_whiteboard, mycelium ``` Documented exceptions (in `services/mod.rs`): - **`hero_proc`** — circular: the manager is a hero_proc client, can't supervise its own supervisor. - **`hero_onlyoffice`** — Docker container; engine doesn't issue `docker run` actions yet. - **`hero_do`** — installer-only nu module; no daemon to supervise. - **`service_core.nu`** — empty meta-module. ### Source vs download — both first-class `InstallPolicy::Either { asset_suffix }` is the default. ELF-magic verification, freshness check (catches "rebuilt but supervisor running old binary"), atomic rename + chmod. Source path shells out to `cargo build --release` in the service's checkout under `$CODEROOT`. ### Env resolution - `Static("...")` — literal - `FromCallerEnv("KEY", "default")` — from manager's process env (mirrors operator-shell env-passthrough that nu modules use today; wired into hero_agent / hero_claude / hero_office / hero_proxy / hero_shrimp) - `FromHeroProcSecret("KEY")` — stub; reserved for runtime-side resolution per `hero_proc_meta` - `Resolved(Resolver::*)` — typed computed values (sockets, ports, paths, mycelium addr) ### CLI mirrors the agent `hero_router service <op>` connects to `service.sock` over UDS — same dispatcher path as the agent, no in-process shortcut. ### Documentation (#90 deliverables) - `crates/hero_router/docs/service_manager/README.md` — developer guide - `crates/hero_router/docs/service_manager/migration.md` — 4-phase plan from nu → manager - `crates/hero_router/docs/service_manager/removal.md` — strict bottom-up deletion order ## Workspace gate - `cargo fmt --check` ✅ - `cargo clippy -p hero_router --all-targets -- -D warnings` ✅ - `cargo build -p hero_router --release` ✅ - `cargo test -p hero_router` — **125 tests** (9 dispatcher e2e + 11 unit + 105 pre-existing) ✅ ## Test plan - [x] Compile + clippy + tests green - [x] CLI surface (`hero_router service --help`) lists all 13 ops - [x] `list_includes_full_service_set` test fails if any documented service is missing - [ ] Manual smoke (next session): hero_proc + hero_router → walk a few services through `install / start / health / troubleshoot` on heroci - [ ] Heroci validation in a follow-up session per `feedback_no_direct_push_except_hero_demo.md` ## Per-service caveats (search `Limitation:` in `services/*.rs`) Some nu modules paper over wrinkles the engine doesn't yet model: - LiveKit credential bootstrap (hero_collab) - ONNX Runtime auto-install (hero_voice / hero_embedder / hero_editor — declared via `Extra::OnnxRuntime`, but install is operator-side today) - Cascade mother variant (hero_aibroker `--split`) - Multi-instance suffix (hero_codescalers / mycelium — instance 0 only) - Mycelium IPv6 auto-detect (hero_router — `BindStrategy::Mycelium` declared but engine treats as Loopback) The migration plan (Phase 2 — feature parity) tracks closing each gap as a follow-up issue. Until then, the nu modules remain the production path for the affected services and the manager handles the simple-shape majority. ## Out-of-scope follow-ups Tracked in `docs/service_manager/migration.md`: 1. LiveKit auto-bootstrap 2. ONNX Runtime install (extends `Extra::OnnxRuntime`) 3. `BindStrategy::Mycelium` engine support 4. Multi-instance suffix support 5. Cascade `--split` mother/child 6. Docker-action support for hero_onlyoffice 7. Admin UI dashboard pane on `ui.sock` Signed-off-by: mik-tf
feat(service_manager): Hero Service Manager subsystem on service.sock
All checks were successful
Build & Test / check (pull_request) Successful in 2m50s
4ed03d9e33
Implements #90.

Adds a third Unix socket — service.sock alongside rpc.sock and ui.sock —
serving service.* JSON-RPC under its own OpenRPC domain. Wraps
hero_proc_sdk plus a build/install pipeline behind 14 agent-introspectable
methods (list / inspect / status / health / build / install / start /
stop / restart / delete / upgrade / verify / logs / troubleshoot).

ServiceDefinition is pure data with typed extension points
(BindStrategy / Extra / InstallPolicy / ArgSource / EnvSource /
Resolver) — adding a service is one file under services/ plus one line
in services::all(). Two services ported as the validation set:
hero_db (musl assets, RESP TCP 6378) and hero_books (gnu assets,
HERO_BOOKS_DATA + HERO_EMBEDDER_URL env).

Both source build (cargo) and forge release download paths are
first-class, including ELF-magic verification, freshness check, and
mtime-protected atomic placement. CLI mirrors the agent: hero_router
service <op> connects to service.sock — same code path as RPC.

Existing nu modules under hero_skills keep working in parallel; this
PR adds capability, doesn't remove anything. Mycelium binding and ONNX
overlay extras compile in but are stubbed for follow-up PRs.

Workspace gate green: cargo fmt --check + clippy -D warnings + release
build + 124 tests (8 new dispatcher e2e tests + 11 unit tests across
definition/build/install/ops_log/registry).

Signed-off-by: mik-tf
feat(service_manager): port all 33 services + docs (closes #90)
All checks were successful
Build & Test / check (pull_request) Successful in 2m7s
19e8bac552
Completes the META at #90.

Adds ServiceDefinition values for every Hero service that has a
service_<name>.nu module under hero_skills:

  hero_agent, hero_aibroker, hero_biz, hero_books, hero_browser,
  hero_claude, hero_code, code_indexer, hero_codescalers, hero_collab,
  hero_compute, hero_db, hero_editor, hero_embedder, hero_foundry,
  hero_indexer, hero_livekit, hero_logic, hero_mail, hero_matrixchat,
  hero_office, hero_os, hero_osis, hero_planner, hero_proxy,
  hero_router, hero_runner_rhai, hero_shrimp, hero_slides, hero_voice,
  hero_wallet, hero_whiteboard, mycelium    (33 services)

Documented exceptions (services/mod.rs):
  - hero_proc       — circular (manager is a hero_proc client)
  - hero_onlyoffice — Docker shape; engine doesn't issue docker run yet
  - hero_do         — installer-only nu module; not a daemon
  - service_core.nu — empty meta-module

Adds EnvSource::FromCallerEnv(key, default) for forwarding API keys and
HERO_* knobs from the manager's process env into the action spec —
mirrors how the legacy nu modules forwarded operator-shell env into
hero_proc actions. Wired into hero_agent / hero_claude / hero_office /
hero_proxy / hero_shrimp.

Adds three documentation files under docs/service_manager/:
  - README.md   — developer guide (RPC surface, ServiceDefinition skeleton,
                  build paths, env resolution, error codes, source layout)
  - migration.md — 4-phase plan from nu → manager (coexistence → parity →
                  cutover → removal)
  - removal.md   — strict bottom-up deletion order for nu / Makefile /
                  buildenv.sh once parity is verified

Adds a registry-coverage test (list_includes_full_service_set) that fails
if any documented service is missing from the registry.

Workspace gate green: cargo fmt --check + clippy -D warnings + release
build + 125 tests (9 dispatcher e2e + 11 unit + 105 pre-existing).

Signed-off-by: mik-tf
mik-tf changed title from feat(service_manager): Hero Service Manager subsystem on service.sock (#90) to feat(service_manager): Hero Service Manager + all 33 services (closes #90) 2026-05-07 17:40:30 +00:00
Author
Owner

Superseded by v2

Per Kristof's redirection (full discussion on issue #90): the ServiceDefinition schema + interpreter pattern was over-engineered. Each service should be Rust code that uses hero_proc_sdk + shared helpers directly — like the existing nu modules, just in Rust.

This PR is being closed. The work continues on development_mik_service_manager_v2, keeping ~60% of this PR (third socket, dispatcher, helpers, docs, OpenRPC spec, CLI subcommand) and rewriting the per-service files as HeroService trait impls.

Will open a fresh PR shortly.

## Superseded by v2 Per Kristof's redirection (full discussion on [issue #90](https://forge.ourworld.tf/lhumina_code/hero_router/issues/90#issuecomment-30721)): the `ServiceDefinition` schema + interpreter pattern was over-engineered. Each service should be Rust code that uses `hero_proc_sdk` + shared helpers directly — like the existing nu modules, just in Rust. This PR is being closed. The work continues on `development_mik_service_manager_v2`, keeping ~60% of this PR (third socket, dispatcher, helpers, docs, OpenRPC spec, CLI subcommand) and rewriting the per-service files as `HeroService` trait impls. Will open a fresh PR shortly.
mik-tf closed this pull request 2026-05-07 17:59:42 +00:00
All checks were successful
Build & Test / check (pull_request) Successful in 2m7s

Pull request closed

Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
lhumina_code/hero_router!91
No description provided.