[ops] Long-term: GitOps + immutable infra for Hero OS multi-VM / multi-tenant deploys #164
When this matters
Not today. This is the destination issue — open now so we don't forget the shape of the end state while we're deep in tactical fixes.
Trigger to start on this: the moment Hero is deployed to a second long-lived VM (production customer, multi-tenant SaaS, internal staging separate from demo). Until then, the Tier 1 DR + Tier 2 `make demo` path (#prev-two-issues) covers the need.

The gap it closes
Tier 2 (`make demo`) produces a deterministic fresh install from code. But:

- If an operator hand-edits a `hero_proc` action env, that drift is invisible to git. Over months, the VM drifts from the declared state.
- A second VM means running `make demo` twice with separate env files, managing their differences manually.
- Secrets live in `~/hero/cfg/env/env.sh`. Works for one operator. Doesn't scale to a team.

What "Tier 3" looks like for Hero
Everything declarative, reconciled from a git repo:
1. All runtime config moves into `hero_proc` action files committed to git

Right now `hero_proc action set <file.json>` is an imperative command. In Tier 3: a `hero-config/` repo holds `actions/<service>.yaml` files; a reconciler (Argo-CD-style or a small custom daemon) reads them, diffs against the running hero_proc state, applies the delta. Git is the source of truth.

Consequence: to change hero_agent's `HERO_AGENT_ROUTING_MODE`, you open a PR editing `actions/hero_agent_server.yaml`, merge → reconciler applies. Fully audited, reviewable, rollback-able.

2. Service binaries come from a registry, not from source builds
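A minimal sketch of the reconcile step from (1): compare the desired specs parsed from the git-held `actions/<service>.yaml` files with the state hero_proc reports, and emit the delta. All function names and spec shapes here are illustrative assumptions, not real hero_proc APIs.

```python
# Hypothetical sketch of the Tier-3 reconcile loop. Spec dicts stand in
# for parsed actions/<service>.yaml files and hero_proc's reported state.

def diff_specs(desired: dict, running: dict) -> list[tuple[str, str]]:
    """Return (service, change) pairs needed to converge running -> desired."""
    changes = []
    for service, spec in desired.items():
        if service not in running:
            changes.append((service, "create"))
        elif running[service] != spec:
            changes.append((service, "update"))
    for service in running:
        if service not in desired:
            changes.append((service, "remove"))
    return changes

# Desired state, as it would be parsed from the hero-config repo:
desired = {
    "hero_agent_server": {"env": {"HERO_AGENT_ROUTING_MODE": "local"},
                          "script": "~/hero/bin/hero_agent_server@v1.2.3"},
}
# Running state, as hero_proc would report it:
running = {
    "hero_agent_server": {"env": {"HERO_AGENT_ROUTING_MODE": "remote"},
                          "script": "~/hero/bin/hero_agent_server@v1.2.3"},
}

for service, change in diff_specs(desired, running):
    print(f"{change}: {service}")   # prints: update: hero_agent_server
```

The loop body that actually applies each change is the only hero_proc-specific part; the diff itself is generic.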
Tier 2 builds on the VM. Tier 3: CI builds per-commit artifacts pushed to a Forgejo package registry. The action spec says `script: ~/hero/bin/hero_agent_server@v1.2.3`; the reconciler ensures the right binary is present.

Saves ~30 min per deploy (no in-place cargo build). Enables rollback (`@v1.2.2`) without rebuilding.

3. Seed data is migration-driven, not replay-driven
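The migration-driven seeding can be sketched as a tiny forward-only runner. The `migrations/` layout, the applied-ledger, and all function names here are assumptions for illustration, not existing Hero tooling.

```python
# Forward-only migration runner sketch: apply, in date order, every
# migration file not yet recorded in the applied-ledger.
from pathlib import Path

def pending(migrations_dir: Path, applied: set[str]) -> list[Path]:
    """Migrations not yet applied, in lexicographic (i.e. date) order."""
    return sorted(p for p in migrations_dir.glob("*.rhai")
                  if p.name not in applied)

def run_all(migrations_dir: Path, applied: set[str], apply) -> list[str]:
    ran = []
    for m in pending(migrations_dir, applied):
        apply(m)             # execute the migration script
        applied.add(m.name)  # record it so reruns are no-ops
        ran.append(m.name)
    return ran
```

Running it twice is a no-op, which is the property that lets one runner serve both fresh installs (apply everything) and long-lived systems (apply only the delta).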
Tier 2's `hero_zero_seed` produces a clean initial state. Tier 3: treat seed data like database migrations — each commit adds forward-only migration files (`migrations/2026-04-24-001_add_geomind_nitrograph.rhai`). Running systems apply pending migrations; fresh systems apply all of them. No more "the old seed TOMLs don't match the current schema."

4. Secrets in a real secret store
Vault / Teleport / SOPS-encrypted-in-git / Forgejo encrypted env. Any of them. Point is: not plaintext shell files in operator homedirs.
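For the SOPS-encrypted-in-git option, the committed file keeps reviewable keys with encrypted values, roughly this shape (illustrative placeholders — the path, key name, and ciphertext are not real):

```yaml
# secrets/hero_agent.env.yaml (hypothetical path), encrypted with sops
# before commit. Keys stay readable for review; values are ciphertext.
HERO_AGENT_API_KEY: ENC[AES256_GCM,data:...,iv:...,tag:...,type:str]
sops:
  age:
    - recipient: age1...   # public keys (operators, CI) that may decrypt
  lastmodified: "2026-04-24T00:00:00Z"
```

Diffs then show *which* secret changed without exposing *what* it changed to, and rotation is a normal PR.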
5. Observability from day 1
Prom + Grafana or an equivalent. Every hero_proc service exposes `/metrics` (OpenMetrics). Alerts on: OSIS write failures, embedder query latency, agent.chat P99, aibroker error rate. Makes "is the AI broken?" answerable without SSH.

6. Runbooks live alongside code
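Each runbook entry could follow a fixed skeleton so on-call always knows where to look. The structure below is a suggestion, not an existing convention:

```markdown
# Runbook: <service> — <failure mode>

1. Symptom: what the alert or user report looks like.
2. Verify: one command that confirms this is the failure mode.
3. Fix: numbered, copy-pasteable steps.
4. Confirm recovery: how to check the service is healthy again.
5. Follow-up: what issue to file if this recurs.
```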
Every known failure mode from home#122-160 becomes a documented runbook entry in `docs_hero/ops/runbooks/`. On-call finds a failing service, looks up the runbook, follows numbered steps, fixes it. Not tribal knowledge.

What this does NOT need
Pragmatic order of landing (when the time comes)
Each step is an independent landing, each delivers value on its own.
Estimated effort
Related
`make demo` target (Tier 2)

Signed-off-by: mik-tf
Resolved by lhumina_code/hero_skills@7c823d1 (PR lhumina_code/hero_skills#126).

Part of Phase 2 tracker #185.
Reopening — closed in error earlier today. The hero_demo runbook §13 had this issue listed as the tracker for an unrelated deploy step (ONNX install for #162 / HERO_ROOTDIR override for #164), and I trusted the reference without checking the actual issue body. Apologies for the noise. The actual scope of this issue is unchanged from when it was filed.
The correct trackers for the work that just landed: ONNX install + HERO_ROOTDIR are covered directly by lhumina_code/hero_skills@7c823d1 and tracked under #185 (no separate sub-issues filed).

Moved to hero_demo#32 — see lhumina_code/hero_demo#32