Replace zinit with hero_proc and add remote management with auth #1

Closed
opened 2026-03-31 14:24:16 +00:00 by timur · 0 comments

Summary

Replace zinit process supervisor with hero_proc in the gubuntu-installer image. The installed nodes (5 datacenter machines) must:

  1. Boot with hero_proc as the process supervisor
  2. Start mycelium overlay network
  3. Ping a beacon with the node's address
  4. Expose hero_proc for authenticated remote management over mycelium
  5. Allow operators to remotely deploy and manage services on each node

Background

Current Architecture (zinit)

The installer currently bakes in zinit + zinit_server binaries and the following service chain:

zinit (systemd unit)
  └─ install-mycelium.toml  (oneshot: download mycelium binary)
      └─ mycelium.toml      (daemon: IPv6 overlay network)
          └─ call-home.toml  (oneshot: register with beacon)

Key files to replace:

  • config/zinit.service — systemd unit for zinit
  • config/zinit/*.toml — zinit service definitions
  • config/zinit/call-home.sh — beacon registration script
  • builder/lib/rootfs.sh — install_zinit() function (lines ~142-184)
  • test/zinit_services.bats — zinit TOML validation tests

New Architecture (hero_proc)

hero_proc is a Rust-based process supervisor with:

  • TOML-based service configuration (similar to zinit)
  • Dependency management (requires, after, wants, conflicts)
  • Health checks (TCP, HTTP, exec-based)
  • OpenRPC 2.0 API over Unix socket
  • Web admin dashboard (embedded, no CDN)
  • Secret management
  • Xinet socket activation (on-demand service startup)
  • PID 1 mode for containers/VMs

Repo: https://forge.ourworld.tf/lhumina_code/hero_proc


Detailed Requirements

Phase 1: Replace zinit with hero_proc in the image

1.1 Build System Changes (builder/lib/rootfs.sh)

  • Replace install_zinit() with install_hero_proc()
  • Download hero_proc and hero_proc_server binaries from the Forgejo package registry (or build from source)
  • Install binaries to /usr/local/bin/
  • Create systemd unit file for hero_proc_server (replaces config/zinit.service)
    • ExecStart: /usr/local/bin/hero_proc_server
    • Config dir: /etc/hero_proc/services/
    • Socket: /run/hero_proc.sock
    • After: network-online.target
    • WantedBy: multi-user.target
  • Copy hero_proc service TOML files to /etc/hero_proc/services/
  • Enable hero_proc systemd service
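
The steps above could be sketched roughly as follows. This is a minimal sketch, not the real builder function: the binary download step is omitted (the registry layout isn't specified here), and the unit file contents simply mirror the requirements listed above.

```shell
#!/bin/sh
# Sketch of install_hero_proc() for builder/lib/rootfs.sh.
# Binary download from the Forgejo package registry is omitted here;
# only the rootfs layout and systemd unit from the list above are shown.
set -eu

install_hero_proc() {
    rootfs="$1"   # path to the chroot/rootfs being built

    mkdir -p "$rootfs/usr/local/bin" \
             "$rootfs/etc/hero_proc/services" \
             "$rootfs/etc/systemd/system"

    # (real builder: fetch hero_proc + hero_proc_server into /usr/local/bin/)

    # systemd unit replacing config/zinit.service
    cat > "$rootfs/etc/systemd/system/hero_proc.service" <<'EOF'
[Unit]
Description=hero_proc process supervisor
After=network-online.target
Wants=network-online.target

[Service]
ExecStart=/usr/local/bin/hero_proc_server
Restart=always

[Install]
WantedBy=multi-user.target
EOF

    # Enable without running systemctl inside the chroot: create the
    # wants symlink that `systemctl enable` would create.
    mkdir -p "$rootfs/etc/systemd/system/multi-user.target.wants"
    ln -sf ../hero_proc.service \
        "$rootfs/etc/systemd/system/multi-user.target.wants/hero_proc.service"
}
```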

1.2 Service Definitions (new TOML format)

Convert existing zinit TOMLs to hero_proc format:

install-mycelium service:

[service]
name = "install-mycelium"
description = "Download mycelium binary if not present"
exec = "/bin/sh -c 'test -f /usr/local/bin/mycelium || curl -fsSL https://github.com/threefoldtech/mycelium/releases/latest/download/mycelium-x86_64-unknown-linux-musl.tar.gz | tar xz -C /usr/local/bin/'"
status = "start"
oneshot = true
class = "system"

mycelium service:

[service]
name = "mycelium"
description = "Mycelium IPv6 overlay network daemon"
exec = "/usr/local/bin/mycelium --peers <peer-list> --tun-url http://[::1]:8989"
status = "start"
class = "system"

[dependencies]
requires = ["install-mycelium"]

[[actions]]
name = "health"
script = "curl -sf http://127.0.0.1:8989/api/v1"
trigger = "check"
interval_ms = 10000
timeout_ms = 3000

call-home service:

[service]
name = "call-home"
description = "Register node with beacon server"
exec = "/usr/local/bin/call-home.sh"
status = "start"
oneshot = true
class = "system"

[service.env]
CALL_HOME_URL = "https://beacon.znzfreezone.net/api/v1/nodes/register"

[dependencies]
requires = ["mycelium"]

1.3 Update call-home.sh

  • Update the beacon registration payload to include hero_proc endpoint information
  • Include the node's mycelium IPv6 address and the hero_proc RPC port/socket info
  • Include the node's public key (for authentication, see Phase 2)

Proposed payload:

{
  "hostname": "node-01",
  "mycelium_pubkey": "<key>",
  "mycelium_subnet": "<subnet>",
  "mycelium_address": "<ipv6>",
  "hero_proc_port": 9999,
  "hero_proc_auth_pubkey": "<ed25519-public-key>"
}
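
A sketch of how the updated call-home.sh could assemble that payload. The MYCELIUM_* and HERO_PROC_* variables are stand-ins: in the real script they would be derived from the running mycelium instance and the node key at /etc/hero_proc/node_key.

```shell
#!/bin/sh
# Sketch of payload assembly for the updated call-home.sh.
# All *_PUBKEY / *_SUBNET / *_ADDRESS values are placeholders supplied
# via environment variables; real discovery is out of scope here.
set -eu

build_payload() {
    cat <<EOF
{
  "hostname": "$(hostname)",
  "mycelium_pubkey": "${MYCELIUM_PUBKEY}",
  "mycelium_subnet": "${MYCELIUM_SUBNET}",
  "mycelium_address": "${MYCELIUM_ADDRESS}",
  "hero_proc_port": ${HERO_PROC_PORT:-9999},
  "hero_proc_auth_pubkey": "${HERO_PROC_AUTH_PUBKEY}"
}
EOF
}

# The real script would then POST the payload to the beacon:
#   build_payload | curl -fsS -X POST -H 'Content-Type: application/json' \
#       --data-binary @- "$CALL_HOME_URL"
```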

Phase 2: Remote Access & Security

The Problem

hero_proc currently uses Unix socket permissions only for access control — no authentication, no TLS, no tokens. For remote management over the network, we need an authentication layer.

Security Requirements

  • 5 datacenter nodes must be remotely manageable
  • Access is over mycelium IPv6 overlay (already encrypted at transport level)
  • Only authorized operators should be able to issue RPC commands
  • Must work without interactive login (no SSH-then-run-CLI workflow)
  • Must be bootstrappable — the auth mechanism must work from first boot with no manual setup

Proposed Approach: Pre-Shared Ed25519 Key Authentication

Recommended approach for the initial deployment:

  1. At build time:

    • Generate or embed an Ed25519 keypair for the operator (or use an existing one)
    • Bake the operator's public key into the image at /etc/hero_proc/authorized_keys/
    • Generate a unique node keypair per node (or at first boot) stored at /etc/hero_proc/node_key
  2. At runtime (hero_proc_server):

    • Expose hero_proc RPC over TCP on the mycelium IPv6 address (not just Unix socket)
    • Require a signed challenge-response or signed request header for all RPC calls
    • Verify the signature against the authorized public keys
  3. Operator workflow:

    • Operator queries beacon to discover nodes and their mycelium addresses
    • Operator's client signs RPC requests with their Ed25519 private key
    • Node verifies signature → allows RPC execution

Alternative Approaches to Consider

  • Pre-shared Ed25519 keys (recommended) — Pros: simple, no external deps, works offline, rotate by rebuilding image. Cons: key rotation requires re-imaging or a remote update mechanism.
  • Pre-shared symmetric token — Pros: simplest to implement. Cons: single secret; if leaked, all nodes compromised; no per-operator audit.
  • mTLS (mutual TLS) — Pros: industry standard, per-client certs. Cons: requires CA infrastructure, cert distribution, expiry management.
  • SSH tunnel + Unix socket — Pros: reuses existing SSH auth. Cons: requires an SSH session per node; not great for automation.
  • WireGuard/Mycelium built-in auth — Pros: mycelium already handles encryption. Cons: mycelium auth is network-level, not application-level; anyone on the overlay can reach the port.

Decision Needed

Question: Which authentication approach should we implement? The recommendation is pre-shared Ed25519 keys baked into the image at build time, with the operator's public key in a known location. This gives us:

  • Zero-config on first boot
  • No external dependencies
  • Per-operator identity (can add multiple authorized keys)
  • Audit trail (signed requests identify the caller)
  • Rotation possible via image rebuild or remote key push (once first key is trusted)

Phase 3: hero_proc TCP Listener for Remote Access

3.1 Expose hero_proc over TCP

Currently hero_proc only listens on Unix sockets. For remote management we need it accessible over the network. Options:

  • Option A: Use hero_proc's xinet proxy — configure a xinet listener that bridges TCP on [mycelium_ipv6]:9999 to the Unix socket. This works today with no code changes to hero_proc.
  • Option B: Add TCP bind support to hero_proc_server — modify hero_proc to natively bind to a TCP address. More robust but requires upstream changes.
  • Option C: Reverse proxy (nginx/caddy) — too heavy for a minimal datacenter node.

Recommendation: Start with Option A (xinet proxy) as it requires zero changes to hero_proc. The xinet service config:

# /etc/hero_proc/services/hero-proc-remote.toml
[service]
name = "hero-proc-remote"
description = "Expose hero_proc RPC over mycelium network"
exec = "/usr/local/bin/hero_proc xinet set hero-proc-tcp --listen tcp:[::]:9999 --backend unix:/run/hero_proc.sock --connect-timeout 5"
status = "start"
oneshot = true
class = "system"

[dependencies]
requires = ["mycelium"]

3.2 Add hero_proc service for beacon registration

Create a new hero-proc-remote service in the config that:

  1. Waits for mycelium to be healthy
  2. Sets up the TCP→Unix socket bridge via xinet
  3. Registers with the beacon (replaces call-home.sh or extends it)
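
Once the bridge is up, an operator-side RPC call through it could look like the sketch below. The method name "service.list" is hypothetical — the real method names come from hero_proc's OpenRPC spec — and the actual send is shown commented out since it needs a live node.

```shell
#!/bin/sh
# Sketch of an operator-side JSON-RPC call through the xinet TCP bridge.
# "service.list" is a hypothetical method name; nc stands in for a real
# JSON-RPC client. The node address is a placeholder.
set -eu

node_addr="${NODE_ADDR:-example-node-ipv6}"   # mycelium IPv6 of the target node

request='{"jsonrpc":"2.0","id":1,"method":"service.list","params":[]}'

# Against a live node this would be sent over the bridge:
#   printf '%s\n' "$request" | nc "$node_addr" 9999

printf '%s\n' "$request"
```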

Phase 4: Update Tests

  • Replace test/zinit_services.bats with hero_proc service validation tests
    • Validate new TOML files against hero_proc's expected format
    • Verify dependencies are correct
    • Check service names match filenames
  • Update test/test-call-home.sh E2E test for new payload format
  • Add integration test: boot image → hero_proc starts → mycelium connects → beacon receives registration → remote RPC call succeeds
  • Test with 5-node scenario (can use QEMU instances)
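
One of the planned checks — service names matching filenames — can be sketched in plain shell; a bats test would wrap the same logic in a `@test` block. The grep-based TOML parsing is a simplification for illustration.

```shell
#!/bin/sh
# Sketch of the "service name matches filename" validation check.
# Uses naive sed-based TOML parsing; a real test might use a TOML tool.
set -eu

check_service_names() {
    dir="$1"
    for f in "$dir"/*.toml; do
        base=$(basename "$f" .toml)
        name=$(sed -n 's/^name = "\(.*\)"$/\1/p' "$f" | head -n1)
        if [ "$name" != "$base" ]; then
            echo "MISMATCH: $f declares name '$name'" >&2
            return 1
        fi
    done
    echo "all service names match filenames"
}
```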

Phase 5: Multi-Node Deployment

  • Build image with hero_proc + mycelium + auth keys
  • Flash 5 USB drives (or netboot)
  • Install on 5 datacenter nodes
  • Verify all 5 nodes register with beacon
  • Verify remote hero_proc access works for all 5 nodes
  • Deploy initial services remotely via hero_proc RPC

Boot Sequence (New)

┌─────────────────────────────────────────────────────────┐
│ Node Powers On                                          │
│  └─ GRUB → Linux kernel → systemd                       │
│      └─ hero_proc_server.service starts                 │
│          ├─ install-mycelium (oneshot)                   │
│          │   └─ downloads mycelium if needed             │
│          ├─ mycelium (daemon)                            │
│          │   └─ connects to overlay, gets IPv6           │
│          ├─ hero-proc-remote (oneshot)                   │
│          │   └─ xinet: TCP [mycelium_ipv6]:9999 →        │
│          │          Unix /run/hero_proc.sock             │
│          └─ call-home (oneshot)                          │
│              └─ POST to beacon:                          │
│                  {hostname, ipv6, pubkey, rpc_port}      │
│                                                         │
│  Node is now remotely manageable via:                   │
│  hero_proc --socket tcp:[node_ipv6]:9999 <command>      │
└─────────────────────────────────────────────────────────┘

Files to Modify

  • builder/lib/rootfs.sh — Replace install_zinit() with install_hero_proc()
  • config/zinit.service — Replace with config/hero_proc.service (systemd unit)
  • config/zinit/*.toml — Replace with config/hero_proc/*.toml (new format)
  • config/zinit/call-home.sh — Update payload, move to config/hero_proc/call-home.sh
  • config/build.conf — Add hero_proc version/source config
  • config/ssh-keys.list — Potentially add operator Ed25519 public keys
  • test/zinit_services.bats — Rewrite for hero_proc TOML validation
  • test/test-call-home.sh — Update for new payload and hero_proc
  • ubuntu-installer-prd.md — Update architecture documentation
  • README.md — Update references from zinit to hero_proc

New Files

  • config/hero_proc.service — systemd unit for hero_proc_server
  • config/hero_proc/install-mycelium.toml — Download mycelium binary
  • config/hero_proc/mycelium.toml — Mycelium overlay daemon
  • config/hero_proc/call-home.toml — Beacon registration
  • config/hero_proc/hero-proc-remote.toml — TCP xinet bridge for remote access
  • config/hero_proc/authorized_keys/ — Operator Ed25519 public keys

Open Questions

  1. Auth mechanism: Pre-shared Ed25519 keys (recommended) vs symmetric token vs mTLS? See Phase 2 discussion above.
  2. hero_proc binary distribution: Download from Forgejo package registry at build time (like zinit currently) or cross-compile and embed?
  3. Per-node identity: Generate unique node keypair at first boot, or bake different keys per node at build time?
  4. Beacon updates: Does the beacon server need to be updated to accept the new payload fields (hero_proc_port, hero_proc_auth_pubkey)?
  5. hero_proc upstream changes: Does hero_proc need any changes to support authenticated TCP access, or do we implement auth as a middleware/proxy layer?
  6. Node naming: How are the 5 nodes distinguished? Sequential hostnames (node-01 through node-05)? Passed at install time via --hostname?

Acceptance Criteria

  • Image builds successfully with hero_proc instead of zinit
  • Installed node boots and hero_proc starts automatically
  • Mycelium overlay network connects and node gets IPv6 address
  • Node registers with beacon (including hero_proc endpoint info)
  • hero_proc is accessible remotely over mycelium IPv6
  • Remote RPC calls are authenticated (only authorized operators)
  • All 5 datacenter nodes are manageable remotely
  • Existing tests updated and passing
  • New integration test for remote access scenario
timur closed this issue 2026-04-01 09:13:35 +00:00
Reference
geomind_code/gubuntu-installer#1