lab: UX papercuts + hero_aibroker fresh-install + README audit #296
Reference
lhumina_code/hero_skills!296
Summary
Bundles four commits that together close issue #282 (the hero_aibroker fresh-install wall) and clean up six medium/low-severity UX papercuts from our recent fresh-Ubuntu-24 test pass. Verified end-to-end on a clean VM:
`curl … install.sh | bash` → `lab user init` → `lab install core` → `lab secrets set OPENROUTER_API_KEY <key>` → `lab service core` now succeeds on the first try with all smoke tests green.
This is a direct follow-up to merged PR #286 (`fresh-installer-fixes`), which closed issue #281.

Commits
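1.
fix(lab): UX polish — warnings, SIGPIPE, lab service no-args, sccache noise, polling spam
Six bundled fixes from the followup bug list:
- Clear all 6 cargo build warnings (unused params in `completions.rs`, missing `#[allow(unreachable_code)]` block, dead `run_opt`/`connect_raw`, leftover `Kind` import in `fast_teardown.rs`).
- Restore default SIGPIPE handling in `main()` so `lab repo find | head`, `lab secrets list | grep`, etc. stop crashing with "Broken pipe (os error 32)".
- `lab service` no-args: print a directory of installed services + usage hints instead of erroring with "no .git directory found" when called outside a git repo.
- Silence the benign sccache "Connection refused" noise on a cold `lab build` by sending sccache's stdout and stderr to /dev/null.
- Throttle `lab service` dep-wait WARN lines to ≤ 1 per socket per 5 s (after a 10 s warm-up), bringing aggregate noise down ~10× (from ~430 lines per dep-wait phase to ~42).
- As a side effect of the throttling, interleaved WARN and smoke-test output lines become rare.
2.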
fix(lab): hero_aibroker fresh-install — config auto-fetch + provider-key preflight + cleaner --stop
Closes the hero_aibroker chain (issue #282):
- `ensure_companion_config` in `acquire.rs`: when `acquire_binary` resolves `hero_aibroker_server`, also fetch `modelsconfig.yml` from the repo's main branch raw URL into `$PATH_VAR/hero_aibroker/modelsconfig.yml`. Idempotent (skip when present + non-empty). Match is strict: any other binary returns `Ok(())` immediately, so zero new behavior for hero_proc, hero_router, hero_db, etc. Fetch failure is non-fatal: warn and continue.
- `preflight_aibroker_provider_key` in `service_manager.rs`: before `do_start_validated` hands `hero_aibroker_server` off to hero_proc, query hero_proc secrets for any of the 10 supported provider key names (`OPENROUTER_API_KEY`, `ANTHROPIC_API_KEY`, `ANTHROPIC_API_KEYS`, `OPENAI_API_KEY`, `GROQ_API_KEY`, `CEREBRAS_API_KEY`, `DEEPSEEK_API_KEY`, `DEEPINFRA_API_KEY`, `SAMBANOVA_API_KEY`, `HF_API_TOKEN`). If none are set, bail up front with the actionable `lab secrets set …` invocation and full key-name list, instead of letting aibroker register, crash 5×, and bury the error in `$PATH_VAR/logs/core/hero_aibroker_server/<job>/<date>.log`.
- `do_stop` short-circuit: treat `state == "failed" | "error" | "halted"` as already-stopped (no live process, clean state). Replaces the self-contradictory "stop returned an error (may already be stopped): stop failed — state 'failed'" message.
3.
fix(lab acquire): fetch companion config on Forge-download path too
Bug fix on top of commit 2: the previous patch missed the most common path. When `try_forge_download` returns `Ok(true)` AND `host_path.exists()`, the early return inside the match arm bypassed the post-match `if forge_ok { ensure_companion_config(...) }` block. Moved the call inside the early return.
Verified on a second fresh VM run: `lab service core` now prints … followed by `smoke tests: 44 passed`.
4.
docs(lab/readme): audit retired/renamed surfaces; document new commands
Eight targeted edits to bring the README in line with the current binary:
- `lab`, adds `lab user init`, `lab install <component>`, `lab path`, `lab completions`, `lab infocheck`, all new `lab build` redeploy verbs (`--restart`, `--fast --restart`, `--fast --stop`, `--reset --start`), `lab service <name> --<verb>` syntax.
- `lab build [flags]` and prefixed with a deprecation note: bare `lab` and top-level `--start`/`--stop`/`--status` are retired, build lives under `lab build`, service lifecycle under `lab service <name>`. Every example flag updated to `lab build --flag`. Adds the destructive-redeploy flow.
- `ROOTDIR` → `PATH_ROOT`, `CODEROOT` → `PATH_CODE`. Note hero_cfg.toml hydration.
- (`lab build [REPO]`): add the new redeploy flag examples (`--restart`, `--fast --restart`, `--fast --stop`, `--reset --start`). Rename `$CODEROOT` → `$PATH_CODE`.
- (`Service lifecycle`): drop the retired `lab --start/stop/status` parenthetical from the heading. Replace the "Start / stop / status (no build)" subsection with a paragraph pointing at `lab service <name> --<verb>` and `lab build --status|--stop`, since the top-level flags are gone.
- `lab path` (incl. TTY-aware error form) and `lab completions`.
- `lab infocheck`.
- `BUILDDIR`/`ROOTDIR` with the `PATH_*` family; add `PATH_VAR`, `CARGO_HOME`, `RUSTUP_HOME`; note auto-hydration.

Test plan
Verified end-to-end on a fresh Ubuntu 24.04 root VM using the canonical 5-command onboarding sequence given in the Summary.
Confirmed observable behaviors:
- `lab repo find | head -5` exits 0 (no SIGPIPE panic)
- `lab service` outside a git repo prints the installed-services directory
- `lab build hero_router` cold output has no scary `sccache: Connection refused`
- `lab service core` dep-wait emits ≤ ~50 WARN lines per service (down from ~430)
- `cargo build -p lab` finishes with 0 warnings
- `lab service core` fetches modelsconfig.yml automatically: `installed companion config: /root/hero/var/hero_aibroker/modelsconfig.yml (20309 bytes)`
- `lab service core` finishes with `=== lab service core: all services started ===`, every service's smoke tests green (hero_proc 2/2, hero_proc_admin 2/2, hero_router 6/6, hero_db_server 4/4, hero_aibroker_server 44/44, hero_code_server 4/4, hero_code_admin 2/2, hero_db_admin 2/2)
- `lab service hero_aibroker_server --stop` on a `failed`-state service prints "state is failed — no live process, already stopped." (was the contradictory "stop returned an error (may already be stopped): stop failed")
- If the provider-key preflight fires (no AI provider key set), `lab service core` bails up front with: … instead of letting aibroker register, crash 5 times, and bury the message in a job log.
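
The provider-key preflight in the last item can be pictured roughly as follows. This is a minimal sketch, not the code in `service_manager.rs`: the function name `preflight_provider_key`, the `HashSet` input (the real check queries hero_proc secrets), and the wording of the error string are all assumptions made for illustration. Only the ten key names come from the PR description.

```rust
use std::collections::HashSet;

/// Provider key names as listed in the PR description.
const PROVIDER_KEYS: [&str; 10] = [
    "OPENROUTER_API_KEY",
    "ANTHROPIC_API_KEY",
    "ANTHROPIC_API_KEYS",
    "OPENAI_API_KEY",
    "GROQ_API_KEY",
    "CEREBRAS_API_KEY",
    "DEEPSEEK_API_KEY",
    "DEEPINFRA_API_KEY",
    "SAMBANOVA_API_KEY",
    "HF_API_TOKEN",
];

/// Hypothetical stand-in for the preflight: `configured` holds the secret
/// names that are already stored. Returns Ok(()) when at least one provider
/// key is present, otherwise an error that names every accepted key.
fn preflight_provider_key(configured: &HashSet<String>) -> Result<(), String> {
    if PROVIDER_KEYS.iter().any(|k| configured.contains(*k)) {
        return Ok(());
    }
    // Illustrative wording only; the message printed by lab is not reproduced here.
    Err(format!(
        "no AI provider key set; run `lab secrets set <KEY_NAME> <value>` with one of: {}",
        PROVIDER_KEYS.join(", ")
    ))
}
```

Failing before the service is handed to the process supervisor keeps the actionable message on stdout rather than in a per-job log file.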
Issues closed
Issues that stay open
None on the original report.
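
The dep-wait throttling from commit 1 (at most one WARN per socket per 5 s, only after a 10 s warm-up) could be sketched as below. The `WarnThrottle` type and its method names are hypothetical, and elapsed time is passed in as a `Duration` offset rather than read from a clock, which is an assumption made to keep the logic easy to test.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Hypothetical per-socket WARN throttle. All times are offsets from the
/// start of the wait phase.
struct WarnThrottle {
    warm_up: Duration,
    interval: Duration,
    last_warned: HashMap<String, Duration>,
}

impl WarnThrottle {
    fn new(warm_up: Duration, interval: Duration) -> Self {
        Self { warm_up, interval, last_warned: HashMap::new() }
    }

    /// Returns true if a "not ready yet" WARN should be emitted for `socket`
    /// at `elapsed` time since the wait began.
    fn should_warn(&mut self, socket: &str, elapsed: Duration) -> bool {
        // Stay quiet during the warm-up window.
        if elapsed < self.warm_up {
            return false;
        }
        let interval = self.interval;
        // Suppress the line if this socket already warned within the interval.
        let warned_recently = self
            .last_warned
            .get(socket)
            .map_or(false, |&last| elapsed < last + interval);
        if warned_recently {
            return false;
        }
        self.last_warned.insert(socket.to_string(), elapsed);
        true
    }
}
```

A caller would construct it with `WarnThrottle::new(Duration::from_secs(10), Duration::from_secs(5))` and log only when `should_warn` returns true. Tracking the last warning per socket keeps one slow dependency from hiding warnings about another.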
Bundles six small fixes uncovered during fresh-install testing on Ubuntu 24. Each is independently small but they all share the same "papercut that makes lab annoying to use day-to-day" character.

1. Clear all 6 cargo build warnings.
   - `installers/completions.rs:install`: gate unused params under `#[allow(unused_variables)]`. Signature kept so re-enabling completions is a single-line revert.
   - `installers/completions.rs:ensure_nu_config_uses_completions`: wrap dead body in `#[allow(unreachable_code)] { ... }` block to mirror the sibling `install` function.
   - `flow/uninstall.rs`: delete dead `run_opt`.
   - `secrets/client.rs`: delete dead `connect_raw` shim.
   - `service/fast_teardown.rs:start_repo_services`: remove the leftover `use herolib_core::base::Kind;` import (Kind is used in a sibling function, not this one).

2. Restore default SIGPIPE handling on Unix so `lab repo find | head`, `lab secrets list | head`, `lab X | grep` etc. stop crashing with:

       thread 'main' panicked at .../io/stdio.rs:1165:9:
       failed printing to stdout: Broken pipe (os error 32)

   Now the process exits 141 cleanly when its stdout reader goes away, like every other well-behaved CLI.

3. `lab service` (no name, outside a git repo) now prints a one-screen directory of installed services instead of erroring with "no .git directory found" or panicking with "PATH_ROOT is not set". New `service::list_installed_service_binaries` + `print_service_directory` helpers; both fallback sites in main.rs route through them. Three branches: no PATH_ROOT, empty bin dir, populated bin dir.

4. Silence sccache subprocess noise during routine restart. Every cold `lab build` printed:

       sccache: error: couldn't connect to server
       sccache: caused by: Connection refused (os error 111)
       sccache: Starting the server...

   The "Connection refused" is benign: it comes from `--stop-server` against a not-yet-running daemon. Pipe sccache's stdout+stderr to /dev/null on both --stop-server and --start-server. Lab still logs its own intent via `tracing::info!`/`warn!`.

5. Throttle `lab service` dep-wait WARN spam. Previously the 12-socket wait phase in `lab service core` emitted ~24 lines/sec for 18 s (~430 identical "not ready yet" lines per dependency). Now we warn at most once per socket per 5 s, and only after a 10 s warm-up. Reduces aggregate noise to ~2.4 lines/sec and makes the smoke-test output below it actually readable.

6. As a side effect of (5), the interleaved-stdout problem (parallel WARN + smoke-test ✗ lines fusing mid-line) becomes statistically rare. True atomicity would require routing tracing + println through a single channel; that is kept out of scope.

Three coupled fixes that together make `lab service core` work on a fresh Ubuntu 24 box without manual log-spelunking. Discovered while reproducing hero_skills#282 end-to-end:

1. ensure_companion_config: when acquire_binary resolves hero_aibroker_server, also fetch modelsconfig.yml from the repo's main branch raw URL into $PATH_VAR/hero_aibroker/modelsconfig.yml. Idempotent (skip when present + non-empty). Hits all three acquire paths (installed cache hit / Forge download / build-from-source). Match is strict: any other binary returns Ok(()) immediately, so zero new behavior for hero_proc, hero_router, hero_db, etc. Fetch failure is non-fatal: warn and continue so the binary install path never blocks on the config fetch. Without this, fresh boxes hit:

       Error: Failed to read config file: /root/hero/var/hero_aibroker/modelsconfig.yml

2. preflight_aibroker_provider_key: before do_start_validated hands hero_aibroker_server off to hero_proc, query hero_proc secrets for any of the 10 supported provider keys (OPENROUTER_API_KEY, ANTHROPIC_API_KEY, ANTHROPIC_API_KEYS, OPENAI_API_KEY, GROQ_API_KEY, CEREBRAS_API_KEY, DEEPSEEK_API_KEY, DEEPINFRA_API_KEY, SAMBANOVA_API_KEY, HF_API_TOKEN). If none are set, bail up front with the actionable `lab secrets set …` invocation. Fires for both the direct (`lab service hero_aibroker --start`) and transitive (lab service core → hero_code → hero_aibroker) paths. Without this, the binary registered with hero_proc and refused to start with a clear error message, but that message landed in /root/hero/var/logs/core/<service>/<job>/<date>.log, not stdout. Users had to grep job logs to find out which secret to set.

3. do_stop short-circuits on `failed`/`error`/`halted` state. Before:

       hero_aibroker_server: stop returned an error (may already be stopped): stop failed — state 'failed'

   …a self-contradictory message that left users guessing whether the service was stopped or not. Now: query state first, treat any of {failed, error, halted} as already-stopped and return success with "state is failed — no live process, already stopped." Closes hero_skills#281 Bug 7.

Plus a README block in the "Next steps after lab user init" section listing the provider-key step and the full supported-key list, with a forward reference to the preflight check so users know `lab service core` will tell them precisely what to set.

Refs: hero_skills#281, hero_skills#282

The previous PR5 commit added `ensure_companion_config` after the Forge-download branch in `acquire_binary`, but that block was unreachable for the most common path: when `try_forge_download` returns `Ok(true)` AND `host_path` already exists (which is the typical case after a successful Forge install), the function early-returns inside the match arm, never reaching the post-match `if forge_ok { … }` block.

Confirmed on a fresh Ubuntu 24 VM running lab-followup-fixes@3eff247: `lab service core` downloaded `hero_aibroker_server` from Forge, then went straight to start without any "fetching companion config:" line. Aibroker then failed its 44 smoke tests because `/root/hero/var/hero_aibroker/modelsconfig.yml` was missing.

Move the `ensure_companion_config` call inside the early return so it fires on the Forge-download fast path. The block after the match becomes dead code; left in place as defensive coverage for the forge_ok && !host_path.exists() edge case (which shouldn't normally happen but doesn't hurt to handle).

Refs: hero_skills#282
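
The idempotent, non-fatal companion-config step described in these commit messages might look roughly like the sketch below. It is an illustration under stated assumptions, not the code in `acquire.rs`: the signature, the `fetch` closure (standing in for the HTTP download), the boolean return value, and the warning text are all hypothetical. Whatever its exact shape, the call has to sit on every return path of the acquire function, including early returns, which is the point of the last commit.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical sketch of the companion-config step. `fetch` stands in for
/// the download of modelsconfig.yml. Returns Ok(true) only when a new file
/// was written.
fn ensure_companion_config<F>(binary: &str, dest: &Path, fetch: F) -> io::Result<bool>
where
    F: FnOnce() -> io::Result<Vec<u8>>,
{
    // Strict match: every other binary is a no-op.
    if binary != "hero_aibroker_server" {
        return Ok(false);
    }
    // Idempotent: skip when the file is already present and non-empty.
    if fs::metadata(dest).map(|m| m.len() > 0).unwrap_or(false) {
        return Ok(false);
    }
    match fetch() {
        Ok(bytes) => {
            if let Some(parent) = dest.parent() {
                fs::create_dir_all(parent)?;
            }
            fs::write(dest, bytes)?;
            Ok(true)
        }
        // Non-fatal: warn and let the binary install continue.
        Err(e) => {
            eprintln!("warning: could not fetch companion config: {}", e);
            Ok(false)
        }
    }
}
```

Passing the download in as a closure is a choice made for this sketch so the skip and failure branches can be exercised without network access.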