lab: model sibling-supervised daemons (supervised flag) so lab build --start stops starting children like lk-backend #315
Reference
lhumina_code/hero_skills#315
Problem
lab build --start/--restart starts every binary whose service.toml kind ∈ {server, admin, web} as a standalone hero_proc service. But some long-running daemons are supervised by a sibling binary, not by hero_proc. For example, hero_livekit's lk-backend and livekit-server are spawned and managed by hero_livekit_server via its start() RPC. When lab starts lk-backend standalone, it lacks the env/config the parent injects, and the CI build fails.

Root cause: a modeling gap
kind conflates two orthogonal properties: what a process is (its type) and who owns its lifecycle. lk-backend is genuinely a server (long-running daemon), but its lifecycle is owned by hero_livekit_server. There is no way to express that, so the kind-only filter starts it.

6aac3e8 (register hero_livekit in SERVICE_MAP) fixed one path: lab service hero_livekit consults the hardcoded SERVICE_MAP, which correctly lists only hero_livekit_server + hero_livekit_admin. But the lab build --start / fast_teardown path does not consult SERVICE_MAP. It re-derives the start set from kind (fast_teardown.rs:422, fast_teardown.rs:520, service_manager.rs:2179) and still starts lk-backend. The two paths disagree.

Proposed fix: a first-class supervised flag on [[binaries]]
1. hero_lib: in crates/core/src/base/service.rs, add a supervised field to Binary.
2. lab: collapse the 3 duplicated checks into one predicate plus the guard. Apply it at service_manager.rs:2179, fast_teardown.rs:422, fast_teardown.rs:520. Now both lab service and lab build --start agree.
3. hero_livekit service.toml (×4): keep the accurate kind, declare ownership.
4. (optional) Retire the SERVICE_MAP hero_livekit entry from 6aac3e8: the flag now covers every path, so the hardcoded curation becomes redundant (one source of truth).

Why this design
- Separates process type (kind) from lifecycle ownership (supervised). No semantic lie (cf. relabelling lk-backend to kind=cli).
- No hardcoded SERVICE_MAP entry is needed.
- Compatible in both directions: the ServiceToml/Binary structs have no #[serde(deny_unknown_fields)] and the field is #[serde(default)]. So old lab + new service.toml → ignores the field; new lab + old service.toml → defaults to false. Both directions are safe, in any merge order.

Alternatives considered
- Kind::Backend variant: breaks every exhaustive match on Kind ecosystem-wide. Rejected (an additive bool is non-breaking).
- fast_teardown reads SERVICE_MAP: keeps the centralized hardcoded list; only helps mapped repos.
- kind = "cli" workaround: lies about process type, mislabels in catalogs, repeated per-repo.

Pre-merge check
Cross-org grep for any struct-literal construction of Binary { … } (positional/all-fields): those need the new field. Deserialization sites (the majority) are unaffected.

Decision: going with the simpler "install-only" approach, not the supervised flag
After implementing and testing the supervised flag end-to-end, we are backing it out in favour of a lighter approach. Two reasons:
1. It is more machinery than the problem needs. The flag required a new field in the shared hero_lib service schema plus changes to multiple lab code paths, and there turned out to be four start-decision sites, not three. builder/orchestrator.rs (the path lab build --start actually uses) was missed in the first pass, so the flag silently did nothing on the exact command users run.
2. It is conceptually awkward. Declaring a binary to the supervisor (lab/hero_proc) and then flagging "…but do not supervise it" is contradictory. lab's manifest should list what it manages.
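Reason 1 can be made concrete with a small sketch. It is not the reverted implementation: Kind, Binary, starts_by_kind and should_start are hypothetical stand-ins, and the real hero_lib and lab types may look different. It only shows why a flag like this helps only if every start-decision site calls the same predicate. A site that still filters on kind alone keeps launching lk-backend.

```rust
// Hypothetical stand-ins for the hero_lib types; the real definitions may differ.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Server,
    Admin,
    Web,
    Cli,
}

#[derive(Debug, Clone)]
struct Binary {
    name: String,
    kind: Kind,
    /// True when a sibling binary owns this process's lifecycle (assumed semantics).
    supervised: bool,
}

/// Kind-only filter: what a start-decision site does if it was never updated.
fn starts_by_kind(b: &Binary) -> bool {
    matches!(b.kind, Kind::Server | Kind::Admin | Kind::Web)
}

/// Flag-aware predicate: every start-decision site would have to call this one.
fn should_start(b: &Binary) -> bool {
    starts_by_kind(b) && !b.supervised
}

fn main() {
    let binaries = vec![
        Binary { name: "hero_livekit_server".to_string(), kind: Kind::Server, supervised: false },
        Binary { name: "lk-backend".to_string(), kind: Kind::Server, supervised: true },
        Binary { name: "hero_do_hero_livekit".to_string(), kind: Kind::Cli, supervised: false },
    ];
    for b in &binaries {
        println!(
            "{}: kind-only site starts it = {}, flag-aware site starts it = {}",
            b.name,
            starts_by_kind(b),
            should_start(b)
        );
    }
}
```

In this sketch the two predicates disagree only for lk-backend, which is the situation described above: a single missed site is enough for the flag to have no visible effect.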
The simpler model
lab already has an "install but never start" category: that is what cli/tool binaries are (e.g. hero_do_hero_livekit: installed, never started). lk-backend is operationally exactly that from lab's perspective: lab installs it to ~/hero/bin, and hero_livekit_server spawns/supervises it via its start() RPC. So the fix is to put lk-backend in lab's install-only bucket rather than invent a new "long-running-but-do-not-start" concept.

Status
The supervised-flag changes were reverted from the integration branches of hero_lib, hero_skills (lab), and hero_livekit (via git revert, no force-push). The hero_rpc2 migration on hero_livekit integration is untouched.

Follow-up: the actual root issue
lk-backend is declared kind = "server", so every lab start-path tries to launch it standalone, where it dies (it only works as hero_livekit_server's child). The install-only approach addresses that directly. Also worth a look: all four hero_livekit service.tomls currently list all four binaries (server/admin/lk-backend/do). That duplication is part of what made this confusing, and is likely where the cleanest fix lives.
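The duplication could be audited with something like the sketch below. It is illustrative only: the manifest labels (manifest_a to manifest_d) are placeholders, mapping "do" to hero_do_hero_livekit is an assumption, and declaration_counts and duplicated are hypothetical helpers, not lab APIs. The manifests are hardcoded here rather than parsed from service.toml.

```rust
use std::collections::BTreeMap;

/// Count how many manifests declare each binary name.
fn declaration_counts(manifests: &[(&str, Vec<&str>)]) -> BTreeMap<String, usize> {
    let mut counts = BTreeMap::new();
    for (_manifest, binaries) in manifests {
        for name in binaries {
            *counts.entry(name.to_string()).or_insert(0) += 1;
        }
    }
    counts
}

/// Binary names declared in more than one manifest, in sorted order.
fn duplicated(manifests: &[(&str, Vec<&str>)]) -> Vec<String> {
    declaration_counts(manifests)
        .into_iter()
        .filter(|(_, n)| *n > 1)
        .map(|(name, _)| name)
        .collect()
}

fn main() {
    // Assumed binary names; "do" is taken to mean hero_do_hero_livekit.
    let all = vec![
        "hero_livekit_server",
        "hero_livekit_admin",
        "lk-backend",
        "hero_do_hero_livekit",
    ];
    // Placeholder manifest labels: each of the four service.tomls lists all four binaries.
    let manifests = vec![
        ("manifest_a", all.clone()),
        ("manifest_b", all.clone()),
        ("manifest_c", all.clone()),
        ("manifest_d", all.clone()),
    ];
    for (name, n) in declaration_counts(&manifests) {
        println!("{name} is declared in {n} manifests");
    }
    println!("duplicated: {:?}", duplicated(&manifests));
}
```

If each binary were declared in exactly one manifest, duplicated would return an empty list, which gives a quick way to check whether a cleanup has removed the overlap.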