Promote tester-VM install runner to a Rust crate (replace setup-binaries.sh in install path) #17

Open
opened 2026-05-28 01:27:50 +00:00 by mik-tf · 0 comments
Owner

The deployer's install_hero_stack RPC currently drives tester-VM install by SCPing cockpit-services.toml + app.env, then SSHing in and piping curl <bootstrap_url> | bash against hero_demo/deploy/single-vm/scripts/setup-binaries.sh. This works but has structural problems that became visible across s172c live walks.

What's wrong with the current shape

  1. Wrong repo for install logic. hero_demo is for demo scripts and runbooks, not for code that the deployer depends on for every tester onboarding. The deployer is in hero_os_tfgrid_deployer; the install runner it depends on should also live in hero_os_tfgrid_deployer.

  2. Wrong language. Bash has no type system for the manifest. cockpit-services.toml and app.env are projections of the deployer's typed Rust structs into untyped shell. Adding new install-time config means editing the Rust shape, the TOML rendering in the deployer, AND (potentially) the bash script that reads them. Easy to drift; impossible to unit-test.

  3. Wrong env model. Bash exports in app.env do not propagate to hero_proc-managed daemons (the canonical pattern). Every operator-owned runtime knob must be set separately via hero_proc secret set after install, which today means appending shell snippets to the SSH payload string in web.rs::handle_install_hero_stack. Each new secret = string-format edit. Not a structured surface.

Proposed fix

A new crate lhumina_code/hero_os_tfgrid_deployer/crates/hero_tester_install/ providing:

  • An InstallManifest struct (services list, secrets list as (context, key, value) triples, env list, install bootstrap URL, optional embedder model size). Serializes to TOML.
  • A binary hero_tester_install that takes --manifest /path/to/install.toml, connects to (or starts) hero_proc, calls hero_proc_sdk::secret_set for every secret triple, registers each enabled service via the hero_proc service.add RPC, downloads binaries via lab build --download, starts services in dependency order, polls health, returns structured status.
  • Tests: unit tests for the manifest parser, secret-setter (with mocked hero_proc), service-registrar (with mocked lab).

The deployer's ssh.rs then SCPs the binary + a generated manifest TOML, runs ssh ... hero_tester_install --manifest /root/install.toml. The Rust manifest generation in web.rs::handle_install_hero_stack replaces the current app.env + cockpit-services.toml string renderers.

hero_demo/deploy/single-vm/scripts/setup-binaries.sh stays for hero_demo's own manual-install use cases but is no longer invoked by the deployer.

Acceptance criteria

  • A clean live walk shows: deployer SCPs typed manifest + binary, SSH-runs the binary, all enabled services come up, cockpit URL returns 302 with no app.env-shaped artifact on disk.
  • cargo test -p hero_tester_install passes.
  • Browser SSO walks complete end-to-end (which they cannot today without the OAuth client secrets being propagated via the same manifest mechanism).

Scope estimate

1-2 days of focused work. Independent of the home#238 arc closure but enables clean closure of any future install-related work.

Filed from the s172c session on hero_os_tfgrid_deployer/development (commits ec27241..483c8b8). Refs home#238.

The deployer's `install_hero_stack` RPC currently drives tester-VM install by SCPing `cockpit-services.toml` + `app.env`, then SSHing in and piping `curl <bootstrap_url> | bash` against `hero_demo/deploy/single-vm/scripts/setup-binaries.sh`. This works but has structural problems that became visible across s172c live walks. ## What's wrong with the current shape 1. **Wrong repo for install logic.** `hero_demo` is for demo scripts and runbooks, not for code that the deployer depends on for every tester onboarding. The deployer is in `hero_os_tfgrid_deployer`; the install runner it depends on should also live in `hero_os_tfgrid_deployer`. 2. **Wrong language.** Bash has no type system for the manifest. `cockpit-services.toml` and `app.env` are projections of the deployer's typed Rust structs into untyped shell. Adding new install-time config means editing the Rust shape, the TOML rendering in the deployer, AND (potentially) the bash script that reads them. Easy to drift; impossible to unit-test. 3. **Wrong env model.** Bash exports in `app.env` do not propagate to `hero_proc`-managed daemons (the canonical pattern). Every operator-owned runtime knob must be set separately via `hero_proc secret set` after install, which today means appending shell snippets to the SSH payload string in `web.rs::handle_install_hero_stack`. Each new secret = string-format edit. Not a structured surface. ## Proposed fix A new crate `lhumina_code/hero_os_tfgrid_deployer/crates/hero_tester_install/` providing: - An `InstallManifest` struct (services list, secrets list as `(context, key, value)` triples, env list, install bootstrap URL, optional embedder model size). Serializes to TOML. - A binary `hero_tester_install` that takes `--manifest /path/to/install.toml`, connects to (or starts) `hero_proc`, calls `hero_proc_sdk::secret_set` for every secret triple, registers each enabled service via the `hero_proc` `service.add` RPC, downloads binaries via `lab build --download`, starts services in dependency order, polls health, returns structured status. - Tests: unit tests for the manifest parser, secret-setter (with mocked hero_proc), service-registrar (with mocked lab). The deployer's `ssh.rs` then SCPs the binary + a generated manifest TOML, runs `ssh ... hero_tester_install --manifest /root/install.toml`. The Rust manifest generation in `web.rs::handle_install_hero_stack` replaces the current `app.env` + `cockpit-services.toml` string renderers. `hero_demo/deploy/single-vm/scripts/setup-binaries.sh` stays for hero_demo's own manual-install use cases but is no longer invoked by the deployer. ## Acceptance criteria - A clean live walk shows: deployer SCPs typed manifest + binary, SSH-runs the binary, all enabled services come up, cockpit URL returns 302 with no `app.env`-shaped artifact on disk. - `cargo test -p hero_tester_install` passes. - Browser SSO walks complete end-to-end (which they cannot today without the OAuth client secrets being propagated via the same manifest mechanism). ## Scope estimate 1-2 days of focused work. Independent of the home#238 arc closure but enables clean closure of any future install-related work. Filed from the s172c session on `hero_os_tfgrid_deployer/development` (commits ec27241..483c8b8). Refs home#238.
Sign in to join this conversation.
No labels
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
lhumina_code/hero_os_tfgrid_deployer#17
No description provided.