[spec] Hero OS bootstrap MUST work on any Ubuntu 24.04+ — root or unprivileged user #50

Open
opened 2026-04-30 17:52:26 +00:00 by mik-tf · 0 comments

Requirement

Hero OS bootstrap must work end-to-end on any Ubuntu 24.04+ environment:

  • TF Grid VM, Hetzner VM, AWS / GCP / Azure VM, on-prem, laptop — any provisioning origin
  • Running as root or as an unprivileged user (e.g. driver)
  • Without requiring external CLI tools beyond what install_core documents and provisions

A fresh deploy following docs/ops/DEPLOYMENT.md should reach a working Hero OS regardless of which of the above environments the operator chose. Today, several aspects of the install/start flow assume specifics that hold on TF Grid as root but fail elsewhere.

Why this matters

The CEO is currently deploying on Hetzner as root and hitting issues that operators on TF Grid never see. Quote: "its a split root/user, so doesn't fully work — first setting the root well — rather manual, it seems the tfgrid is root so its easier to deploy."

If reproducibility is the goal (it is one of the stated philosophies of this stack: "infrastructure as code, white-label as code"), then a bootstrap flow that only works on one host shape (TF Grid, root) is a real gap.

Audit scope (where the assumptions hide)

Things to systematically verify and either fix or document:

1. Path assumptions in hero_skills

  • ~/hero/bin/ vs /root/hero/bin/ — install paths assume invoking user's HOME. If install runs as root and runtime as driver, paths diverge.
  • ~/hero/code0/<repo> vs ~/code/hero_skills/... (multiple anchor directories, used inconsistently).
  • $HERO_HOME / $HERO_ROOTDIR env vars — partial coverage (only some modules respect them).
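
To illustrate what consistent path resolution could look like, here is a minimal sketch in Python (the real modules are Nushell, so this is illustration only). It assumes that $HERO_HOME, when set, should win over the invoking user's HOME, and that the default anchor is ~/hero. The helper names resolve_hero_home and hero_bin are hypothetical, and $HERO_ROOTDIR is left out because the issue does not say how it relates to $HERO_HOME.

```python
import os
from pathlib import Path


def resolve_hero_home(env=None, home=None):
    """Return the Hero base directory.

    Assumption: $HERO_HOME, when set and non-empty, wins; otherwise fall
    back to <HOME>/hero of the invoking user.
    """
    env = os.environ if env is None else env
    override = env.get("HERO_HOME", "").strip()
    if override:
        return Path(override)
    base = Path(home) if home is not None else Path.home()
    return base / "hero"


def hero_bin(env=None, home=None):
    """Derive the bin directory from the single anchor, never from a literal."""
    return resolve_hero_home(env, home) / "bin"
```

With one resolver, an install run as root and a runtime started as driver only agree when $HERO_HOME is exported for both, which makes the drift visible instead of silent.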

2. CLI tool assumptions

  • redis-cli (today's livekit bug — issue on hero_skills#TBD)
  • claude, the Claude Code CLI (today's livekit bug)
  • pandoc, python3-openpyxl, libreoffice (Office seed) — already installed by install_core, document explicitly
  • Any other tool a service_X start flow calls via ^cmd
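
A preflight check for the tools listed above could be as small as the following sketch. It is written in Python for illustration and is not taken from install_core; missing_tools and require_tools are hypothetical names, and the list of tools to check would come from whatever install_core ends up documenting.

```python
import shutil


def missing_tools(required, which=shutil.which):
    """Return the required tool names that are not found on PATH, in order."""
    return [name for name in required if which(name) is None]


def require_tools(required, which=shutil.which):
    """Raise with a single clear message instead of failing mid-start."""
    missing = missing_tools(required, which)
    if missing:
        raise RuntimeError("missing prerequisite tools: " + ", ".join(missing))
```

A start flow might call require_tools(["redis-cli", "claude"]) before doing any work, so the operator sees the full list of missing tools at once.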

3. Hardcoded ports and addresses

  • service_livekit.nu hardcodes port 6379 (should be hero_db's actual port)
  • TF-Grid-only 10.1.2.x references should all be opt-in via env, not assumed
  • nginx / hero_router / hero_proxy ports — verify configurable
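
One way to make such values opt-in is to read them from the environment with an explicit default. The sketch below is in Python for illustration; the variable name HERO_DB_PORT is an assumption made for this example and is not an existing setting of the stack.

```python
import os


def port_from_env(var, default, env=None):
    """Read a TCP port from an environment variable, falling back to default.

    Raises ValueError for a value that is not an integer in 1..65535, so a
    typo fails fast instead of silently pointing at the wrong port.
    """
    env = os.environ if env is None else env
    raw = env.get(var)
    if raw is None or raw.strip() == "":
        return default
    port = int(raw)  # ValueError for non-numeric input
    if not 1 <= port <= 65535:
        raise ValueError(f"{var}={raw!r} is outside 1..65535")
    return port
```

For example, port_from_env("HERO_DB_PORT", 6379) keeps today's behaviour when nothing is set and lets a host override it without editing the module.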

4. systemd vs nohup

  • TF Grid VMs have no systemd → install_docker_btrfs uses nohup for dockerd
  • Generic Ubuntu has systemd → should use systemctl enable docker
  • Currently a sidebar in DEPLOYMENT.md §10; should also be enforced in install_core (auto-detect)
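
The auto-detection asked for here can be sketched as follows, in Python for illustration. The check is the one documented for sd_booted(3): the directory /run/systemd/system exists when systemd is the running init. The function names and the returned labels are hypothetical; the caller would still run the actual systemctl or nohup commands.

```python
import os


def has_systemd(run_dir="/run/systemd/system"):
    """True when the host was booted with systemd as init.

    Same check as sd_booted(3): the directory /run/systemd/system exists.
    """
    return os.path.isdir(run_dir)


def docker_start_strategy(systemd_present):
    """Pick how dockerd gets started; returns a label, runs nothing."""
    return "systemctl" if systemd_present else "nohup"
```

Checking for the directory rather than for the systemctl binary matters in containers and minimal VMs, where the binary can be installed while systemd is not the init process.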

5. Filesystem assumptions

  • TF Grid: btrfs swap, chattr +C, /data partition
  • Generic: ext4, normal swapfile, no /data
  • Already documented as sidebars in §2 of the runbook
  • install_core should auto-detect and not require operator awareness
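
Detecting the filesystem instead of assuming it could look like the sketch below (Python, for illustration). It takes the text of /proc/mounts and returns the type of the mount that contains a given path. fs_type_for is a hypothetical helper, and the parsing is deliberately simple: it does not decode escaped spaces in mount points.

```python
import os


def fs_type_for(path, mounts_text):
    """Return the filesystem type of the mount that contains path.

    mounts_text is the content of /proc/mounts. The longest matching mount
    point wins, so a nested mount such as /data is resolved correctly.
    Returns None when nothing matches.
    """
    path = os.path.normpath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        inside = path == mount_point or (path + "/").startswith(prefix)
        # ">=" lets a later entry shadow an earlier one on the same mount point
        if inside and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type
```

On a real host the second argument would be open("/proc/mounts").read(); a result of "btrfs" would select the chattr +C swap path and anything else the plain swapfile path.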

6. User-execution model

  • install_core is run as driver (unprivileged) but uses sudo for apt + Docker
  • service_X install runs as the invoking user
  • service_X start registers actions tied to the invoking user's hero_proc
  • Mixing root-then-driver causes path drift
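
A guard against the root-then-driver mix might look like this sketch (Python, for illustration). It relies on one known behaviour: sudo exports SUDO_USER with the name of the invoking user. Everything else is an assumption made for the example, including the function names and the idea that the installing uid is recorded somewhere for a later comparison.

```python
import os


def execution_context(euid=None, env=None):
    """Describe who is running the bootstrap.

    SUDO_USER lets a root process tell "logged in as root" apart from
    "unprivileged user via sudo".
    """
    euid = os.geteuid() if euid is None else euid
    env = os.environ if env is None else env
    sudo_user = env.get("SUDO_USER") if euid == 0 else None
    return {"is_root": euid == 0, "sudo_user": sudo_user or None}


def assert_same_owner(install_uid, runtime_uid):
    """Fail fast when install and start are run by different users."""
    if install_uid != runtime_uid:
        raise RuntimeError(
            f"installed by uid {install_uid} but started by uid {runtime_uid}; "
            "paths derived from HOME will not match"
        )
```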

Acceptance criteria

This issue is the umbrella; sub-issues will track individual fixes. The umbrella closes when:

  • A fresh root-only Ubuntu 24.04 VM (Hetzner, AWS, anywhere) follows the runbook end-to-end and reaches working Hero OS without manual fixes
  • A fresh driver-user Ubuntu 24.04 VM (any host) reaches working Hero OS via the same runbook
  • install_core validates host shape (systemd present? btrfs?) and adapts, instead of requiring operator awareness
  • Runbook §0-§3 explicitly states the root-vs-user invariant and what install_core enforces
  • All ^<tool> external CLI invocations in hero_skills/tools/modules/services/*.nu are either:
    • documented prerequisites in install_core, or
    • replaced with portable equivalents
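
The last criterion lends itself to an automated audit. The sketch below (Python, for illustration) scans Nushell source text for ^name invocations and reports the ones not on a documented list. It is a rough text scan, not a Nushell parser: comment stripping is naive and a caret inside a string can produce a false positive. The helper names are hypothetical.

```python
import re

# A caret followed by a command name, not preceded by a word character,
# "$" or another caret. Rough on purpose; this does not parse Nushell.
EXTERNAL_CALL = re.compile(r"(?<![\w$^])\^([A-Za-z_][\w.+-]*)")


def external_tools(source):
    """Return the sorted names invoked as ^name in Nushell source text."""
    found = set()
    for line in source.splitlines():
        code = line.split("#", 1)[0]  # drop trailing comments (naive)
        found.update(EXTERNAL_CALL.findall(code))
    return sorted(found)


def undocumented(source, documented):
    """Return invoked tools that are missing from the documented list."""
    known = set(documented)
    return [tool for tool in external_tools(source) if tool not in known]
```

Run over each file matching hero_skills/tools/modules/services/*.nu, a non-empty result from undocumented would mark the criterion as not yet met.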

Sub-tickets to spawn

  • hero_skills: service_livekit start redis-cli + claude bugs (already filed today as separate hero_skills issue)
  • hero_demo runbook: explicit root vs driver-user invariant in §2; auto-detect systemd in §10; explicit list of CLI tools install_core provisions
  • hero_skills install_core: validate host shape, fail fast with clear remediation if assumption violated; auto-detect systemd vs nohup
  • hero_skills paths: audit all ~/hero/... references; honor $HERO_HOME / $HERO_ROOTDIR consistently across all service modules

References

  • Today's session on herodemo + Kristof's Hetzner deploy revealed the gap concretely
  • Related: hero_demo#46 (operational restore), hero_demo#47 (runbook update flow), hero_demo#48 / #49 (seed data reproducibility)

Signed-off-by: mik-tf

Reference
lhumina_code/hero_demo#50