Fresh install is not turnkey — media server (livekit-server) is never downloaded/configured/started by lab build --start #42

Open
opened 2026-06-07 16:43:58 +00:00 by sameh-farouk · 1 comment
Member

After lab build --start, hero_proc reports everything "running", but no meeting works until a human runs the Server tab: Install → Configure → Start flow (README:62-64).

  • LiveKitService.install (rpc.rs:447) downloads livekit-server from GitHub releases and mints the API secret — only when called.
  • LiveKitService.configure writes livekit.yaml + backend.env.
  • LiveKitService.start (rpc.rs:614) spawns livekit-server + lk-backend; errors "binaries not installed -- call install first" otherwise.

Nothing auto-invokes these: the admin /backend/start handler (server.rs:145) only spawns the hero_livekit_server daemon, and there's no hero_proc post-start hook.

Proposal: make first-boot turnkey — e.g. hero_livekit_server runs install+configure on first start if livekit-server is missing (network-gated, version-pinnable), or a hero_proc post-start hook performs the bootstrap. Must stay idempotent and reuse the existing secret on re-runs.

Acceptance: a fresh lab build --start (with network) yields a working meeting without manual UI steps.

Author
Member

Approved design spec — turnkey one-command livekit start

Committed to integration at docs/superpowers/specs/2026-06-08-livekit-turnkey-start-design.md (commit e116645). Reproduced here as the source of truth.


LiveKit Turnkey Start — Design Spec

  • Date: 2026-06-08
  • Status: design approved; pending implementation plan
  • Tracks: hero_livekit#42 (turnkey install)
  • Branch: integration

Goal

Bring up a fully working livekit instance (the downloaded livekit-server
media binary, generated config, and running child processes) from a single
lab service hero_livekit --start (or lab build --start), on real servers,
with no manual install → configure → start RPC dance.

Config source: hero_proc secrets

Per-server configuration lives in hero_proc's secret store — the canonical
Hero pattern ("services probe their secrets from hero_proc at boot"), already
used in this repo by hero_livekit_admin (whitelist.rs reads its IP
allow-list via secret_get). Three values:

| Secret | Meaning |
|---|---|
| LIVEKIT_NODE_IP | Publicly reachable IP livekit advertises for WebRTC media. |
| LIVEKIT_VERSION | Pinned livekit-server release to download (e.g. v1.7.2). |
| LIVEKIT_API_SECRET | Token-signing secret. If absent, the server mints one and writes it back to hero_proc so it persists and is shared across all three processes. |

Exact key names to be reconciled with hero_proc's KNOWN_SECRET_KEYS
catalogue during planning.
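
A minimal sketch of the boot-time probe, assuming the three key names above survive planning unchanged. The `secret_get` closure is a hypothetical stand-in for the hero_proc lookup (its real signature is not shown in this issue), and treating an empty or whitespace-only value as unset is an assumption of the sketch, not something the spec states.

```rust
/// The three values the spec reads from hero_proc's secret store.
#[derive(Debug, Clone, PartialEq)]
pub struct LivekitSecrets {
    pub node_ip: Option<String>,
    pub version: Option<String>,
    pub api_secret: Option<String>,
}

/// Reads the three secrets through `secret_get`, a hypothetical stand-in
/// for the real hero_proc lookup (any key -> optional value function).
pub fn read_secrets<F>(secret_get: F) -> LivekitSecrets
where
    F: Fn(&str) -> Option<String>,
{
    // Assumption: empty or whitespace-only values count as "unset".
    let get = |key: &str| {
        secret_get(key)
            .map(|v| v.trim().to_string())
            .filter(|v| !v.is_empty())
    };
    LivekitSecrets {
        node_ip: get("LIVEKIT_NODE_IP"),
        version: get("LIVEKIT_VERSION"),
        api_secret: get("LIVEKIT_API_SECRET"),
    }
}
```

Keeping the lookup behind a closure means the same function can be exercised against an in-memory map in tests and against hero_proc in the daemon.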

Mechanism: server-side idempotent ensure-on-start

On hero_livekit_server startup, after rpc.sock is bound and serving,
run one idempotent ensure_ready() routine as a background task (so a slow
download never delays RPC availability):

  1. Read the three values from hero_proc secrets.
  2. Provision gate: LIVEKIT_VERSION being set is the single opt-in
    signal. If it is unset, skip provisioning and serve the plain daemon
    (today's behavior); no extra flag. When provisioning, if LIVEKIT_NODE_IP
    is unset, default to 127.0.0.1 and log a loud warning (works for local
    dev; clearly flagged as wrong for a real server).
  3. If the livekit-server binary is missing, download the pinned
    LIVEKIT_VERSION (best-effort). Download-if-missing only this iteration:
    a changed LIVEKIT_VERSION on an already-installed box does not
    auto-re-download (that would need installed-version tracking) — a version
    bump is an explicit re-install for now; auto-upgrade is a follow-up.
  4. Write livekit.yaml + backend.env from the secrets — only if changed.
  5. Spawn livekit-server + lk-backend only if they are not already
    running (liveness-checked). ensure_ready() is non-disruptive: it must
    NOT kill or restart healthy children, unlike the explicit start() RPC,
    which pkills and respawns. Healthy stack on restart → no churn.

ensure_ready() re-runs on every server start, so it self-heals after a
restart.
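
The decision logic of the five steps can be sketched as a pure planner that maps observed state to the actions still needed. This is an illustration under stated assumptions, not the planned implementation: `Observed`, `Step`, and `plan_ensure_ready` are hypothetical names, and actually executing a step (download, file write, spawn, liveness probe) is left to the caller.

```rust
/// Actions ensure_ready() may still need to take (hypothetical names).
#[derive(Debug, PartialEq)]
pub enum Step {
    WarnDefaultNodeIp,     // LIVEKIT_NODE_IP unset, falling back to 127.0.0.1
    Download(String),      // pinned LIVEKIT_VERSION to fetch
    WriteConfig,           // livekit.yaml + backend.env differ from desired
    Spawn(&'static str),   // child process that is not currently alive
}

/// Snapshot of what the server observed at startup (hypothetical shape).
pub struct Observed {
    pub version: Option<String>,
    pub node_ip: Option<String>,
    pub binary_present: bool,
    pub config_up_to_date: bool,
    pub livekit_server_alive: bool,
    pub lk_backend_alive: bool,
}

pub fn plan_ensure_ready(o: &Observed) -> Vec<Step> {
    let mut steps = Vec::new();
    // Provision gate: no LIVEKIT_VERSION means no provisioning at all.
    let version = match &o.version {
        Some(v) => v.clone(),
        None => return steps,
    };
    if o.node_ip.is_none() {
        steps.push(Step::WarnDefaultNodeIp);
    }
    // Download-if-missing only: an installed binary is never re-fetched.
    if !o.binary_present {
        steps.push(Step::Download(version));
    }
    if !o.config_up_to_date {
        steps.push(Step::WriteConfig);
    }
    // Non-disruptive: only children that are down get spawned.
    if !o.livekit_server_alive {
        steps.push(Step::Spawn("livekit-server"));
    }
    if !o.lk_backend_alive {
        steps.push(Step::Spawn("lk-backend"));
    }
    steps
}
```

An empty plan for a healthy, already-provisioned box is what "no churn" amounts to here. Running the same planner on every server start is what makes the routine idempotent.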

Safety rules (non-negotiable for real servers)

  1. The daemon always comes up and serves RPC, even if provisioning fails.
    A failed download (offline, GitHub down, rate-limited) logs a clear warning
    and is retried on the next start — it never fails or bricks the daemon.
  2. Provisioning is gated on secrets presence (LIVEKIT_VERSION set, per the
    provision gate above), so there are no surprise downloads on a box that
    did not opt in.
  3. The API secret persists in hero_proc — identical across
    server / livekit-server / lk-backend and across restarts; no per-run random
    secret.
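
Rule 3 boils down to a get-or-mint-and-write-back step. The sketch below assumes a hypothetical `SecretStore` trait in place of the real hero_proc client. The `mint` closure is deliberately left to the caller, because the issue does not say how the secret is generated.

```rust
use std::collections::HashMap;

/// Hypothetical minimal view of a secret store; not hero_proc's real API.
pub trait SecretStore {
    fn get(&self, key: &str) -> Option<String>;
    fn set(&mut self, key: &str, value: &str);
}

/// In-memory store, only for trying the sketch out.
pub struct MemoryStore(pub HashMap<String, String>);

impl SecretStore for MemoryStore {
    fn get(&self, key: &str) -> Option<String> {
        self.0.get(key).cloned()
    }
    fn set(&mut self, key: &str, value: &str) {
        self.0.insert(key.to_string(), value.to_string());
    }
}

/// Returns the persisted API secret, minting and writing one back only
/// when none exists yet. `mint` runs at most once per call.
pub fn ensure_api_secret<S, M>(store: &mut S, mint: M) -> String
where
    S: SecretStore,
    M: FnOnce() -> String,
{
    const KEY: &str = "LIVEKIT_API_SECRET";
    if let Some(existing) = store.get(KEY).filter(|v| !v.is_empty()) {
        return existing; // reuse: never replace a secret that is already set
    }
    let fresh = mint();
    store.set(KEY, &fresh); // write back so restarts and children share it
    fresh
}
```

Because the existing value always wins, a second call (or a restart) returns the same secret, which is the property the three processes depend on.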

Scope / non-goals

  • In scope: single public-IP servers (IP sits directly on a local
    interface) and local dev (LIVEKIT_NODE_IP=127.0.0.1).
  • Out of scope this iteration: full NAT traversal (separate bind-IP vs
    advertised-IP / rtc.use_external_ip). If LIVEKIT_NODE_IP is not on a
    local interface, log a clear warning; advanced NAT config is a follow-up.
  • Compatible with B1 (deferred): if lk-backend / livekit-server later
    become hero_proc-supervised, only step 5 (spawn children) moves out to
    hero_proc; steps 1–4 stay in the server.
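
One possible shape for the on-interface check, offered as a sketch only. The Rust standard library does not enumerate interface addresses, so the caller supplies `local_addrs` (how they are gathered is an assumption left open). `NodeIpCheck` and `check_node_ip` are hypothetical names.

```rust
use std::net::IpAddr;

/// Outcome of comparing LIVEKIT_NODE_IP with the host's own addresses.
#[derive(Debug, PartialEq)]
pub enum NodeIpCheck {
    OnInterface,         // in scope: IP sits directly on a local interface
    Loopback,            // local dev, e.g. 127.0.0.1
    NotOnLocalInterface, // likely NAT: warn, advanced config is a follow-up
    Invalid,             // value does not parse as an IP address
}

/// `local_addrs` must be supplied by the caller; gathering them is
/// outside this sketch.
pub fn check_node_ip(node_ip: &str, local_addrs: &[IpAddr]) -> NodeIpCheck {
    let ip: IpAddr = match node_ip.trim().parse() {
        Ok(ip) => ip,
        Err(_) => return NodeIpCheck::Invalid,
    };
    if ip.is_loopback() {
        return NodeIpCheck::Loopback;
    }
    if local_addrs.contains(&ip) {
        NodeIpCheck::OnInterface
    } else {
        NodeIpCheck::NotOnLocalInterface
    }
}
```

Returning a value instead of logging directly keeps the warning wording, still an open question, out of the check itself.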

Acceptance criteria

  • With the 3 secrets set, a fresh lab service hero_livekit --start on a
    server with internet egress yields a working room + token, no manual UI.
  • With no secrets set, --start brings up the daemon only (unchanged).
  • Download failure → daemon still up and serving, warning logged, retried on
    next start.
  • Restart when already provisioned → no re-download, config unchanged, children
    (re)spawned only if down.
  • API secret is identical across restarts and across the three processes.
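
The "config unchanged" criterion suggests a write-only-if-changed helper for livekit.yaml and backend.env. A minimal sketch using only the standard library follows. `write_if_changed` is a hypothetical name, and the temp-file-then-rename step is an added precaution, not something the spec asks for.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Writes `desired` to `path` only when the current content differs.
/// Returns Ok(true) if the file was written, Ok(false) if left untouched.
pub fn write_if_changed(path: &Path, desired: &str) -> io::Result<bool> {
    let unchanged = match fs::read_to_string(path) {
        Ok(current) => current == desired,
        // A missing file simply means this is the first write.
        Err(e) if e.kind() == io::ErrorKind::NotFound => false,
        Err(e) => return Err(e),
    };
    if unchanged {
        return Ok(false);
    }
    // Write to a sibling temp file, then rename over the target, so a
    // crash mid-write does not leave a truncated config behind.
    let tmp = path.with_extension("tmp");
    fs::write(&tmp, desired)?;
    fs::rename(&tmp, path)?;
    Ok(true)
}
```

The boolean result gives a restart test something concrete to assert on: a second start against an already-provisioned box should report `false` for both files.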

Open questions for planning

  • Exact hero_proc secret key names / catalogue entries.
  • node_ip on-interface check + warning wording.
  • Retry/backoff cadence for a failed download across restarts.
Reference
lhumina_code/hero_livekit#42