lab service --status/--stop can't see CLI-registered services (SERVICE_MAP vs service.toml/hero_proc registry) #308
Reference
lhumina_code/hero_skills#308
Summary
`lab service <name> --status` (and `--stop`) cannot see services that were registered with hero_proc by a service's own CLI (e.g. `hero_collab --start`), because lab and the per-service CLIs register services into hero_proc with two different, incompatible shapes. lab reports `state=inactive / pid=0` for services that are in fact running.

This is a symptom of a deeper issue: lab's hardcoded `SERVICE_MAP` is a parallel source of truth that diverges from each binary's `service.toml` and from hero_proc's live registry.

Evidence
After `hero_collab --start --auth-mode dev --seed-dev-users`, hero_proc's `service.list` contains:

hero_proc `job.list` (ground truth) shows the collab jobs running:

But `lab service hero_collab --status` reports:

Root cause: two registration models, not just two namespaces
| Tool | Registration model | Resulting hero_proc entries |
|---|---|---|
| Per-service CLI (e.g. `hero_collab`) | `ServiceBuilder::new("hero_collab")` + named member actions | `hero_collab` → actions `hero_collab_server`, `hero_collab_web`; running jobs named `hero_collab.<action>` |
| lab | `SERVICE_MAP` (`crates/lab/src/service/service_manager.rs:508`) → one hero_proc service per binary | `hero_planner_server`, `hero_planner_web`; no parent group |

lab's `do_status(binary)` (`service_manager.rs:73`) calls `service_status(name=<binary>)` per binary. For lab's own flat registrations that matches. For a CLI's grouped registration, the live job is `hero_collab.hero_collab_web` (an action under a service) while lab queries a flat `hero_collab_web` service, which is an empty/stub entry → `inactive / pid 0`.

Why this matters
- `lab service <name> --status|--stop` silently misreports CLI-started services as down. An operator can't manage (or even see) them via lab.
- hero_proc ends up with duplicate entries for a single service (`hero_collab` and `hero_collab_server` and `hero_collab_web`).

Proposed direction (align on a single source of truth)
The supervisor (hero_proc) is the source of truth; the canonical service identity should come from each binary's `service.toml` (which already declares `[service] name = "…"` + member binaries/sockets; `lab infocheck` already reads these). Both lab and the per-service CLIs should defer to that, rather than either tool owning the namespace.

- Adopt the grouped model (service + member actions): it is what `service.toml` describes and what the per-service CLIs already do. lab's flat-per-binary model is the outlier.
- Retire `SERVICE_MAP` in favour of dynamic discovery from `service.toml`. This is already the documented long-term intent (lab.md: "SERVICE_MAP is meant to be replaced by dynamic discovery from each binary's embedded service.toml").
- `lab … --status` should query hero_proc's registry by canonical service name and list whatever actions hero_proc reports, so it sees CLI-registered services for free.
- `lab … --start` should register grouped (service + member actions) like the CLIs, so the two tools converge on one shape instead of producing duplicate entries.

A smaller, immediate mitigation (if the full migration is out of scope short-term): have `lab … --status` fall back to matching `service.<binary>`-named jobs in hero_proc's `job.list` when the flat `service_status(<binary>)` lookup returns not-found/inactive, so at minimum lab stops misreporting running CLI-started services as down.

Repro
Found while bringing up hero_collab/hero_planner via both `lab` and the per-service CLI on 2026-06-03.
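
For discussion, the short-term mitigation under "Proposed direction" (fall back to `job.list` when the flat lookup comes back not-found/inactive) could look roughly like the sketch below. Everything in it is an assumption made for illustration: `Job`, `Status`, and `status_with_fallback` are hypothetical names, not lab's or hero_proc's actual types or API, and the only thing taken from this issue is the job-name shape `hero_collab.<action>` described under "Root cause".

```rust
/// Hypothetical, simplified view of one entry in hero_proc's `job.list`.
#[derive(Debug, Clone, PartialEq)]
pub struct Job {
    /// Grouped jobs are assumed to be named `<service>.<action>`,
    /// e.g. "hero_collab.hero_collab_web".
    pub name: String,
    pub pid: u32,
    pub running: bool,
}

/// Hypothetical status value, mirroring the `state` / `pid` fields lab prints.
#[derive(Debug, Clone, PartialEq)]
pub enum Status {
    Active { pid: u32 },
    Inactive,
}

/// Resolve the status of `binary`.
///
/// `flat` is the result of the existing flat per-binary lookup
/// (`None` stands for "not found"). If it does not report a live
/// process, fall back to any running job whose action part equals `binary`.
pub fn status_with_fallback(flat: Option<Status>, binary: &str, jobs: &[Job]) -> Status {
    // The flat lookup found a live process: trust it as-is.
    if let Some(Status::Active { pid }) = flat {
        return Status::Active { pid };
    }

    // Otherwise scan the job list for a running `<service>.<binary>` job.
    jobs.iter()
        .filter(|job| job.running)
        .find(|job| {
            job.name
                .split_once('.')
                .map(|(_service, action)| action == binary)
                .unwrap_or(false)
        })
        .map(|job| Status::Active { pid: job.pid })
        .unwrap_or(Status::Inactive)
}
```

With this shape, a flat lookup that reports inactive for `hero_collab_web` would still resolve to the pid of a running `hero_collab.hero_collab_web` job, while a binary with no matching job stays `Inactive`. It only patches status reporting; it does not remove the duplicate registrations, which is why the grouped model remains the longer-term fix.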