lk-backend is declared kind = "server" — lab build --start wrongly starts it (apply hero_skills#315 install-only fix) #41

Closed
opened 2026-06-07 16:43:58 +00:00 by sameh-farouk · 1 comment
Member

crates/hero_livekit_backend/service.toml:41-42 declares lk-backend with kind = "server", even though its own desc says "Supervised by hero_livekit_server."

In lab (builder/orchestrator.rs:441), Kind::Server → is_service = true, so lab build --start calls start_binary_now for it (gates at :510 and :702). On a fresh box this starts lk-backend directly, with none of the env its parent injects (LIVEKIT_API_KEY/SECRET, SQLITE_PATH, PORT), and it races the child that LiveKitService.start() (hero_livekit_server/src/livekit/rpc.rs:614) is meant to supervise.

This is the downstream half of hero_skills#315, whose adopted resolution is to reclassify supervised binaries as install-only.

Fix (either):

  • A: set lk-backend to kind = "cli" in service.toml → is_cmdline_exception = true → still built + installed, never started. Self-contained, but mislabels a real TCP daemon ([[binaries.tcp]] :8080) as a CLI tool.
  • B (preferred): add lk-backend to CMDLINE_EXCEPTIONS in hero_skills lab/src/builder/known_services.rs (alongside hero_shrimp, hero_slides). Keeps the manifest honest (kind = "server" stays true); cost is hardcoding a downstream name into lab.

Acceptance: lab build --start on a clean checkout builds + installs lk-backend but does not start it; only hero_livekit_server (+ admin) come up.
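
The gating described above can be sketched as follows. This is a minimal model, not lab's actual code: `Kind`, `Action`, and `plan` are hypothetical names introduced for illustration, and the rule that a CLI kind or a listed name implies install-only is an assumption drawn from the two fix options. Only `is_service`, `is_cmdline_exception`, `CMDLINE_EXCEPTIONS`, and the two listed exception names come from the issue text.

```rust
/// Hypothetical stand-in for lab's binary kind; only the two variants discussed here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    Server,
    Cli,
}

/// Names treated as install-only regardless of `kind` (option B).
/// The two entries are the ones the issue mentions; the real list may differ.
pub const CMDLINE_EXCEPTIONS: &[&str] = &["hero_shrimp", "hero_slides"];

/// What `lab build --start` would do with a binary once it is built (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    InstallOnly,
    InstallAndStart,
}

/// Assumed decision rule: a binary is started only if it is a server
/// and is not covered by a command-line exception.
pub fn plan(name: &str, kind: Kind, exceptions: &[&str]) -> Action {
    let is_cmdline_exception = kind == Kind::Cli || exceptions.contains(&name);
    let is_service = kind == Kind::Server && !is_cmdline_exception;
    if is_service {
        Action::InstallAndStart
    } else {
        Action::InstallOnly
    }
}
```

Under this model, `plan("lk-backend", Kind::Server, CMDLINE_EXCEPTIONS)` yields `InstallAndStart` (the bug), while switching the kind to `Kind::Cli` (option A) or appending `"lk-backend"` to the exception list (option B) both yield `InstallOnly`.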

Author
Member

Design spec (source of truth) — declarative hybrid B1, after org-wide consistency check

This supersedes the original "pick band-aid A or B" framing. After checking the live ecosystem (hero_collab reference impl, hero_proc redesign issues, and the lab-hardcode direction), the design and the interim fix have both firmed up.

Decision summary

  • Interim fix (do now): set lk-backend to kind = "cli" in crates/hero_livekit_backend/service.toml. This makes it install-only (built + installed, never auto-started by lab build --start), stopping the double-start that currently fails on fresh installs (no parent-injected env).
  • Target architecture: declarative hybrid B1 (below). Lock the design now; implement after the hero_proc refactor stabilizes and ideally after hero_proc#135 lands.
  • Do NOT add a bespoke onlyoffice-style lab module for lk-backend, and do NOT use CMDLINE_EXCEPTIONS. See "Why kind = "cli" over the alternatives".

Why kind = "cli" over the alternatives

  • vs CMDLINE_EXCEPTIONS (the earlier "preferred B"): that adds lk-backend to a lab hardcode list. hero_skills#308 explicitly calls lab's hardcoded SERVICE_MAP "a parallel source of truth that diverges from each binary's service.toml", and hero_proc#135 is wiring service.toml [[dependencies]] as the authoritative mechanism. Adding a lab hardcode cuts directly against that direction. kind = "cli" keeps the decision in service.toml — the source of truth the ecosystem is converging on — and is trivially reversible when B1 lands.
  • vs a bespoke lab launcher module (onlyoffice-style): that is the less consistent path (bespoke lab code, cuts against hero_proc cleanup #138 and #308). It is also the wrong tool for lk-backend, which is a repo crate and should be a normal declarative service.toml service — not a launcher script. The only legitimate future use of a lab acquire module is the external livekit-server binary (see wrinkle 1), which is separate, later work and does not address this issue.
  • Cosmetic caveat: lk-backend is genuinely a TCP daemon, so kind = "cli" is a white lie. It is acceptable here because (a) lk-backend is TCP-only ([[binaries.tcp]], port 8081) with no rpc.sock, so it is not part of the socket-discovery mesh and the mislabel does not corrupt discovery; (b) it is explicitly temporary and reverts to kind = "server" under B1.

Target architecture — declarative hybrid B1

Mirrors the hero_collab reference (one crate per binary, each service.toml declares [[dependencies]] on the socket it needs):

  • lk-backend → its own hero_proc service, kind = "server", declaring [[dependencies]] on hero_livekit_server rpc.sock. Started by hero_proc in dependency order — no longer hand-spawned by hero_livekit_server. (Requires lk-backend to self-load its config from backend.env rather than rely on env injected by the parent at spawn — see wrinkle 2.)
  • hero_livekit_admin → declare the missing [[dependencies]] on hero_livekit_server rpc.sock (tracked separately; it is missing today, unlike hero_collab_admin).
  • hero_livekit_server → absorbs config-gen (mint secret, write livekit.yaml + backend.env) as a boot step (it already self-heals config in start()), and stops holding Child handles / pkill-ing. Becomes pure control plane.
  • livekit-server (external downloaded SFU) → acquired + registered the onlyoffice way (it is not a repo crate, so it has no service.toml). Depends on config + redis/hero_db.
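
To make "started by hero_proc in dependency order" concrete, here is a small sketch of one way a supervisor could derive a start order from declared dependencies. It is an illustration under assumptions, not hero_proc's implementation: `start_order` is a hypothetical function, services are identified by name only rather than by socket, and the dependency table is the one the B1 bullets above imply.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Returns a start order in which every service comes after all of its
/// dependencies. Fails if there is a cycle or a dependency that is not
/// itself a known service.
pub fn start_order<'a>(deps: &BTreeMap<&'a str, Vec<&'a str>>) -> Result<Vec<String>, String> {
    let mut started: BTreeSet<&'a str> = BTreeSet::new();
    let mut order = Vec::new();
    while started.len() < deps.len() {
        // Collect every not-yet-started service whose dependencies are all up.
        let ready: Vec<&'a str> = deps
            .iter()
            .filter(|(name, needs)| {
                !started.contains(*name) && needs.iter().all(|d| started.contains(d))
            })
            .map(|(name, _)| *name)
            .collect();
        if ready.is_empty() {
            // Nothing can make progress: a cycle or an unknown dependency.
            return Err("cycle or missing dependency".to_string());
        }
        for name in ready {
            started.insert(name);
            order.push(name.to_string());
        }
    }
    Ok(order)
}
```

With `hero_livekit_server` having no dependencies and both `lk-backend` and `hero_livekit_admin` depending on it, the server is always placed first. The sketch covers ordering only; gating on actual socket readiness is a separate concern.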

Three wrinkles the check surfaced (design constraints)

  1. livekit-server is an external binary — no crate, no service.toml. The declarative model can't register it; it needs a bespoke acquire path. Hence "hybrid".
  2. Config-gen is not expressible as a [[dependencies]]. The Dependency { repo, crate, bin, socket } model is socket-readiness only. We ride it via "server writes config on boot → dependents gate on the server's socket." This also means lk-backend must read backend.env itself when hero_proc starts it (today the parent injects those env vars at spawn).
  3. hero_proc#135 is not landed. Until it is, [[dependencies]] ordering is enforced by lab only (ensure_dependency_running), not the supervisor — so CLI/service.start starts are unordered. This is the same half-working state hero_collab lives in now.
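
Wrinkle 2 implies lk-backend needs a small config loader of its own. The sketch below shows one possible shape. It assumes backend.env holds plain `KEY=VALUE` lines with `#` comments, which the issue does not specify, and `parse_env` and `resolve` are hypothetical helpers rather than existing functions in the crate.

```rust
use std::collections::HashMap;

/// Parses `KEY=VALUE` lines, skipping blank lines and `#` comments.
/// The file format is an assumption; no quoting or escaping is handled.
pub fn parse_env(text: &str) -> HashMap<String, String> {
    text.lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .filter_map(|line| line.split_once('='))
        .map(|(key, value)| (key.trim().to_string(), value.trim().to_string()))
        .collect()
}

/// Resolves one setting: a variable already present in the process
/// environment wins (today's parent-injected path), otherwise the value
/// read from the file is used.
pub fn resolve(key: &str, file: &HashMap<String, String>) -> Option<String> {
    std::env::var(key).ok().or_else(|| file.get(key).cloned())
}
```

Letting the process environment take precedence keeps the current parent-spawned path working while allowing a hero_proc-started lk-backend to fall back to the file; whether that precedence is desirable is a design choice the issue leaves open.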

Sequencing

  1. Now: kind = "cli" interim (this issue). Also reconcile the port mismatch: service.toml says 8080, DEFAULT_BACKEND_PORT is 8081.
  2. Locked: this design.
  3. Later (post-refactor): implement B1. Coordinate with the hero_proc redesign owners — livekit is a useful first "declarative consumer with an external binary + a config-gen step", which exercises gaps the socket-ready gate (#135) does not cover.

Related

  • hero_proc#135 (populate depends_on from service.toml [[dependencies]] — the enforcement mechanism B1 rides)
  • hero_proc#95 (auto-start hero_db — livekit-server needs redis)
  • hero_proc#115/#116/#106/#114 (restart/backoff gaps — bound B1's resilience)
  • hero_skills#308 (lab hardcode vs service.toml source of truth — why not CMDLINE_EXCEPTIONS)
  • hero_livekit#42 (turnkey install — folds into B1's config-gen-on-boot)
Reference
lhumina_code/hero_livekit#41