lab onboarding flow on a fresh Ubuntu 24 box has multiple gaps and one hard breakage #281

Closed

opened 2026-05-21 11:28:04 +00:00 by nabil_salah · 1 comment

## Summary

On a brand-new Ubuntu 24 root account, following the documented install path (curl one-liner → `lab user init` → `lab install core`) runs into four distinct issues that together make the first-run experience unworkable. Reproduced over `ssh root@<vm>` on a fresh Ubuntu 24.04 LTS install.

## Environment

- OS: Ubuntu 24.04 LTS (noble), fresh image, no prior config
- Shell: bash, root user
- Install path: documented curl one-liner from `crates/lab/README.md`
- `lab` version installed: `lab 0.1.0`

## Issues

### 1. `install.sh` finishes, but `lab path` doesn't work yet

After `curl … install.sh | bash`, `lab` is installed at `~/hero/bin/lab` and is on PATH. The README implies that users should next run `lab path` (or `eval` its output) to load the Hero environment. But:

```
root@vmrx5xp:~# lab path
echo "ERROR: PATH_ROOT is not set — run 'lab user init' first" >&2
exit 1
```

Two problems in one:

- **a) `lab path` has a hard prerequisite (`lab user init`) that neither `install.sh` nor the README mentions.** A new user follows the README, runs `lab path`, gets this output, and has no idea what to do, because `lab user init` appears nowhere in the install instructions.
- **b) The error output is raw shell code printed to a TTY.** The eval-friendly form (`echo … >&2` / `exit 1`) is appropriate when the output is piped through `eval`, but interactively it is confusing: it looks as if the binary is printing source code at the user.

**Fix:**

- At the end of `install.sh`, print a clear "next: run `lab user init`" line, and ideally add the canonical PATH export to `~/.bashrc` automatically.
- Update the README to make `lab user init` an explicit step between install and use.
- `lab path` should detect a TTY and, in that case, print a normal error to stderr; it should emit the eval-friendly form only when its output is piped.
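
A minimal bash sketch of the first and third fixes. `lab` itself is a binary, so this only illustrates the logic; the helper names `append_once`, `finish_install`, and `path_root_missing` are hypothetical, and the exact PATH line (assumed here to add `~/hero/bin`) is an assumption, since the canonical export could just as well be an `eval "$(lab path)"` line. `[ -t 1 ]` is true only when stdout is a terminal, so `eval "$(lab path)"`, where stdout is a pipe, still receives the eval-friendly form.

```bash
#!/usr/bin/env bash
# Illustrative sketch only: lab is a binary, and these helper names are hypothetical.

# Append a line to an rc file unless an identical line is already there,
# so re-running the installer does not duplicate it.
append_once() {
  local line=$1 rc=$2
  touch "$rc"
  grep -qxF -- "$line" "$rc" || printf '%s\n' "$line" >> "$rc"
}

# Tail of install.sh: persist PATH (ASSUMED to be ~/hero/bin) and name the next step.
finish_install() {
  local rc=${1:-$HOME/.bashrc}
  append_once 'export PATH="$HOME/hero/bin:$PATH"' "$rc"
  echo "next: run 'lab user init'"
}

# Missing-PATH_ROOT branch of `lab path`: a plain stderr error when stdout is a
# terminal, the eval-friendly form when stdout is a pipe (eval "$(lab path)").
path_root_missing() {
  local msg="ERROR: PATH_ROOT is not set — run 'lab user init' first"
  if [ -t 1 ]; then
    printf '%s\n' "$msg" >&2
  else
    printf 'echo "%s" >&2\nexit 1\n' "$msg"
  fi
  return 1
}
```

Guarding the append with `grep -qxF` means running the installer twice leaves a single PATH line in `~/.bashrc`.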

### 2. `lab user init` silently skips the secrets import, even though the token was already collected

```
root@vmrx5xp:~# lab user init
…
Enter your FORGE_TOKEN (leave blank to skip): <TOKEN>
✓ FORGE_TOKEN is valid (HTTP 200)
…
→ cloning https://forge.ourworld.tf/<user>/secrets into /root/hero/code/secrets
→ pulling /root/hero/code/secrets
  hero_proc not running — skipping secrets import
```

`lab user init` asks for the FORGE_TOKEN, validates it, clones the secrets repo, and then refuses to import the secrets into `hero_proc` "because hero_proc isn't running." But:

- The user just supplied the token interactively.
- The clone succeeded.
- `lab user init` could either (a) start `hero_proc` at the end of init or (b) defer the import to the first `hero_proc` start.

Instead, it prints a warning that looks like an error and then prints the success banner, leaving the user with no obvious next step.

**Fix:** either import the secrets automatically as part of the next `hero_proc` start, or change the wording to "secrets will be imported on next hero_proc start (run `lab install core` next)" so it doesn't read as a failure.
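
One way to implement option (b) is a marker file: `lab user init` records that an import is pending, and the `hero_proc` start path performs the import and clears the marker only on success. This is a bash sketch under assumptions: the marker name `secrets_import_pending`, the state directory, and the import command passed in as the second argument are all hypothetical, not part of `lab` or `hero_proc`.

```bash
#!/usr/bin/env bash
# Sketch of deferring the secrets import; marker name and callback are hypothetical.

# End of `lab user init`, after the secrets repo has been cloned.
defer_secrets_import() {
  local state_dir=$1
  mkdir -p "$state_dir"
  : > "$state_dir/secrets_import_pending"   # record that an import is owed
  echo "secrets will be imported on next hero_proc start (run 'lab install core' next)"
}

# hero_proc start path; $2 names the command that performs the actual import.
import_pending_secrets() {
  local state_dir=$1 import_cmd=$2
  local marker="$state_dir/secrets_import_pending"
  [ -e "$marker" ] || return 0              # nothing was deferred
  if "$import_cmd"; then
    rm -f "$marker"                         # clear only after a successful import
  else
    echo "secrets import failed; will retry on next start" >&2
    return 1
  fi
}
```

Because the marker is removed only after a successful import, a failed import is retried on the next start instead of being dropped.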

### 3. `lab install core` is killed mid-install when systemd self-upgrades over SSH

`lab install core` runs `sudo apt-get install …` for ~166 packages in the foreground. The package list includes `systemd`, `systemd-sysv`, `udev`, `libsystemd-shared`, etc. Upgrading systemd on a live host triggers a systemd re-exec, which restarts sshd. The SSH session terminates, `SIGHUP` propagates to the shell's process group, and `apt` dies mid-install, leaving dpkg in a broken state.

Reproduce:

```
root@vmrx5xp:~# lab install core
…
=== Installing base tools ===
…
Get:166 …protobuf-compiler …
Fetched 267 MB in 4s
…
Preparing to unpack mount…
Unpacking python3-minimal …
Setting up python3-minimal …
W: Operation was interrupted before it could finish
W: APT had planned for dpkg to do more than it reported back (14 vs 665).
   Affected packages: <list of 651>
Installing sccache …                ← lab continued anyway
…
=== Starting hero_proc ===
…
  fail hero_proc_server: `screen` is required to launch hero_proc_server but is not installed.
```

The subsequent retry fails immediately with `Unmet dependencies. Try 'apt --fix-broken install' with no packages`; in other words, lab leaves the host in a state it cannot recover from on its own.

Two coupled bugs:

- **a) The apt step can be killed by its own systemd upgrade.** It must run detached from the SSH session.
- **b) lab doesn't check apt's exit code or the dpkg state before proceeding.** It installed sccache, ONNX, Claude Code, and Python on top of a half-broken apt state, then failed at the `hero_proc_server` start because `screen` (one of the 651 affected packages) was never installed.
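
A bash sketch of both fixes, assuming the apt step can be wrapped in shell helpers: `nohup` makes the command ignore `SIGHUP`, the log redirect keeps it from writing to a terminal that has gone away, and the exit status, `dpkg --audit`, and a `command -v screen` check gate everything that follows. The helper names and the log path `/var/log/lab_apt.log` are hypothetical; the sketch also assumes it already runs as root (as in this report) and that empty `dpkg --audit` output means no half-installed or half-configured packages. `systemd-run` would be a sturdier detach, since it starts the command as a transient unit outside the SSH session; this sketch mirrors the `nohup` workaround instead.

```bash
#!/usr/bin/env bash
# Sketch only: helper names and the log path are hypothetical; assumes root.

# Run "$@" with SIGHUP ignored and all output in a log file, so neither the
# hangup nor writes to a vanished terminal can kill it. If the caller dies
# while waiting, the command itself keeps running.
run_detached() {
  local log=$1; shift
  nohup "$@" > "$log" 2>&1 < /dev/null &
  wait "$!"   # propagate the command's own exit status
}

# ASSUMPTION: `dpkg --audit` prints nothing when no package is left half-done.
dpkg_is_clean() {
  [ -z "$(dpkg --audit 2>/dev/null)" ]
}

# Fail if any required tool is missing (e.g. screen, needed by hero_proc_server).
require_cmds() {
  local c missing=()
  for c in "$@"; do
    command -v "$c" > /dev/null 2>&1 || missing+=("$c")
  done
  if [ "${#missing[@]}" -gt 0 ]; then
    echo "missing after install: ${missing[*]}" >&2
    return 1
  fi
}

# Base-tools step: stop at the first sign of a failed or partial install
# instead of going on to sccache, ONNX, etc.
install_base_tools() {
  local log=/var/log/lab_apt.log
  run_detached "$log" env DEBIAN_FRONTEND=noninteractive apt-get install -y "$@" \
    || { echo "apt-get failed; see $log" >&2; return 1; }
  dpkg_is_clean \
    || { echo "dpkg state is inconsistent; run 'dpkg --configure -a'" >&2; return 1; }
  require_cmds screen
}
```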

## Workaround that made it succeed

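After repairing the dpkg state left by the interrupted run, running `lab install core` detached survives the sshd restart:

```bash
# Finish configuring the packages the killed apt run left unpacked
sudo dpkg --configure -a
# Resolve the remaining unmet dependencies
sudo apt-get -f install -y
# Re-run the install detached from the terminal, logging to a file
nohup lab install core > /root/lab_install.log 2>&1 &
```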

This should not be required from the documented onboarding flow.
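
The workaround relies on `nohup` setting `SIGHUP` to ignored, a disposition that child processes inherit across `exec` unless they install their own handler. The check below makes this concrete: a child shell sends itself `SIGHUP` (standing in for the hangup delivered when the SSH session drops) and reports whether it is still alive. `survives_sighup` is a hypothetical helper written only for this illustration.

```bash
#!/usr/bin/env bash
# Illustration of why the nohup workaround survives a dropped SSH session.
# survives_sighup is a hypothetical helper, not part of lab.

# Run a child shell, optionally behind a wrapper such as nohup, that sends
# itself SIGHUP and then prints "survived". Returns 0 only if it got that far.
survives_sighup() {
  local out
  out=$("$@" sh -c 'kill -HUP $$; echo survived' 2>/dev/null) || true
  [ "$out" = survived ]
}

# nohup sets SIGHUP to "ignored"; ignored signals stay ignored across exec,
# so the wrapped child shell is not terminated by the signal.
if survives_sighup nohup; then echo "with nohup: survived"; fi
if ! survives_sighup; then echo "without nohup: killed by SIGHUP"; fi
```

`survives_sighup nohup` succeeds, while a plain `survives_sighup` normally fails because the default action for `SIGHUP` terminates the child.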

## Priority

All four are blockers for the new-user experience. #3 is the most severe: it leaves the host in a broken state with no obvious recovery path.
