[META] Hero OS sandbox demo, functional readiness: onboarding pipeline + per-app verification

mik-tf (Owner) commented 2026-05-28 15:43:33 +00:00

Current state (session 222, 2026-06-08): We shipped and proved the two quick reliability fixes for the shared host. First, a tester's web address is no longer just their username, so re-adding a tester never collides with a leftover network-name registration from a failed attempt; each new machine gets an address unique to that machine, and an operator can also type a custom address. Existing testers keep their current address. Second, the dashboard's free-capacity readout now reports how many testers actually fit based on real disk space rather than a raw slot count, so it no longer claims room that is not there. We found that the current host was in fact full (it was wrongly showing room for two more), and the dashboard now correctly says it is full. Both fixes are on the stable branch and live on the admin machine. This included an update to the underlying capacity component there (which had been deliberately held back), made after backing it up for instant rollback. We proved the whole flow by removing one test machine and rebuilding it from scratch: the freed space showed up, the rebuild got a fresh unique address and finished setup, and its login page worked. We also re-planned the order with the operator. Next comes closing the internal sign-in gap on the private network, then adding the ability to run across more than one host (what unlocks growth now that this host is full), and only then keeping a small pool of ready machines (which needs spare room to prepare them). Next: gate the dashboard on the private network the same way as the public address.

Current state (session 221, 2026-06-08): We fixed the bug that stopped every tester from logging in, and a tester can now sign in end to end for the first time. The login was failing at the final identity check for three stacked reasons, which we found by adding precise logging and capturing the real error rather than guessing. First, the login token's audience field arrives as a list, but our code expected a single value and rejected it before any check. Second, the login token from our identity provider carries only the bare account id, not the username or email (those come from a separate profile lookup we were not making). As a result, every account resolved to a numeric id that never matched the allow list, and everyone was refused. Third, a restart could drop the setting that asks for the login token at all. We fixed all three and ran an independent security review that confirmed nothing was weakened. We then proved the fix with a real scripted sign-in that succeeded and returned the correct username. The fix is on the stable branch and republished, and both login issues are now closed. We also closed a gap where new testers received only one of the two assistant keys. The setup copies only the keys that are configured on the admin machine, and only the first was set. We therefore added the second on the admin machine (verified working), put it on all four existing testers, and updated the runbook so that both must be set at admin setup and every future tester gets both. We rebuilt despiegk's machine cleanly with the login fix and both keys, leaving his account untouched. His machine and login gate are correct and reachable from inside the grid. His public web link, however, is still down because of a grid network routing problem that is not in our software (rebuilding the gateway did not help, and the machine is reachable from other machines). We wrote it up for the ThreeFold team and left his machine running for them to inspect.
We also re-planned the roadmap with the operator: make the single shared host rock solid first with the quick high value fixes, then verify each app, with running across more than one host added as the explicit next scaling step since the current host is nearly full. Next: the quick reliability fixes (unique gateway names and an honest capacity count), then a small pool of ready machines, then close the internal sign in gap and verify each app.
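The two token fixes can be sketched in Python. RFC 7519 allows the `aud` claim to be either a single string or an array of strings, so the audience check has to accept both forms. The allow list also has to be matched on the username or email from the separate profile lookup, not on the token's bare `sub` id. This sketch assumes the token's signature has already been verified. The function names, the audience value `hero-gateway` and the profile field names are hypothetical.

```python
from typing import Mapping, Optional, Sequence, Set, Union


def audience_matches(aud: Union[str, Sequence[str], None], expected: str) -> bool:
    """Accept the 'aud' claim as a single string or as a list of strings."""
    if isinstance(aud, str):
        return aud == expected
    if isinstance(aud, (list, tuple)):
        return expected in aud
    return False  # missing or malformed claim


def allowed_username(claims: Mapping[str, object], profile: Mapping[str, str],
                     allow_list: Set[str]) -> Optional[str]:
    """Return the username to admit, or None to refuse.

    'sub' carries only the numeric account id, so matching it against an allow
    list of usernames never succeeds; match the profile's username or email instead.
    """
    sub = str(claims.get("sub") or "")
    # The profile (from a separate lookup) must belong to the token's subject.
    if not sub or str(profile.get("id", "")) != sub:
        return None
    for field in ("username", "email"):
        value = profile.get(field, "")
        if value and value.lower() in allow_list:  # allow list assumed lower-case
            return profile.get("username") or value
    return None
```

Rejecting a list-valued `aud` before any check was the first failure described above. Matching `sub` against usernames was the second.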

Current state (session 220, 2026-06-08): We rebuilt despiegk's test machine and fixed the bug that was stopping every new tester from logging in. The gateway's security library was missing a one-line setup step, so it crashed while verifying the login token and showed users an invalid session error. We fixed it, deployed it to a test machine, and confirmed the crash is gone. We also shipped three more fixes to the stable branch:
- the setup program now computes a machine's web address itself when the grid component fails to report it, so signups recover automatically;
- the build server is green again (it could not fetch a shared library that had been moved to a new location);
- a shared code generator no longer clashes with itself during parallel builds.
We proved the signup flow end to end by setting up a brand new test user, which came fully live in about nine minutes. despiegk's machine, services, and login gate are up, but its public web link is still waiting on a one-off slow grid network route (not system-wide; the new test user routed fine). We found and filed three more items: the dashboard can be reached without signing in over the internal grid network and should be gated like the public address; the capacity readout overcounts free space; and gateway names should be made unique so a failed signup never blocks a retry. With the login crash fixed, signup now gets all the way to the final identity check and fails one step later. That is the first task next session (most likely the gateway was not told where to fetch the login server's verification key). Next: finish that last login step, give despiegk a fresh machine, then close the dashboard sign-in gap, fix the capacity readout, and verify each app.
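Computing the web address when the grid does not report one only needs the gateway name and the gateway's domain. Making that name unique per machine means a retry never reuses a name left registered by a failed attempt. This Python sketch rests on several assumptions: gateway names allow only lowercase letters and digits, the address is `<name>.<gateway domain>`, and each machine has a stable id. The functions, the suffix length and the example domain are all hypothetical.

```python
import hashlib
import re
from typing import Optional


def gateway_name(username: str, machine_id: str, custom: Optional[str] = None) -> str:
    """Hypothetical: a gateway name unique per machine, or an operator-chosen one."""
    base = custom or username
    # Assumed naming rule: lowercase letters and digits only (drops capitals and dashes).
    slug = re.sub(r"[^a-z0-9]", "", base.lower()) or "tester"
    if custom:
        return slug
    # Short stable suffix from the machine id: a re-added tester gets a new machine,
    # so it never reuses a name left behind by a failed attempt.
    suffix = hashlib.sha256(machine_id.encode("utf-8")).hexdigest()[:6]
    return f"{slug}{suffix}"


def web_address(name: str, gateway_domain: str, reported_fqdn: Optional[str]) -> str:
    """Prefer the address the grid reports; otherwise compute it from name and domain."""
    if reported_fqdn:
        return reported_fqdn.rstrip(".")
    return f"{name}.{gateway_domain}"
```

A six-character hash suffix makes collisions unlikely but not impossible. A real implementation would still check whether the name is free before registering it.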

Current state (session 219, 2026-06-07): We fixed the urgent problem that was breaking every new tester setup, and proved a brand new tester now sets up cleanly entirely from the stable branch. The root cause was that our services were publishing their stable download from the work-in-progress branch instead of the stable branch, so a fresh machine pulled half-finished builds that no longer understood each other. We corrected the publishing rule (the stable branch publishes the stable release, the work branch publishes a separate pre-release) on the core build tool and two core services, which was enough to unblock setup, and we published the previously held setup-reliability fix to the stable branch and confirmed it on a real fresh machine. The voice service was the hardest part: its stable branch had been broken by a mistaken automated merge of the work branch into it, and with the owner's confirmation we undid that merge while keeping the recent voice work, then fixed a missing build credential so it can rebuild. We proved the whole thing end to end by setting up a throwaway tester entirely from stable builds: it reached ready, its login page worked, and the voice service answered. All eleven services the sandbox uses now build and publish from the stable branch. The four real test machines were left untouched, the throwaway was removed cleanly, and we confirmed no work-in-progress was pushed into any stable branch. Next: rebuild despiegk's machine now that setup works again (clear the leftover network-name registration, then re-create it), then check that each app actually works inside the cockpit after signing in.
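The corrected publishing rule is a fixed mapping from branch to release, which the build checks before uploading anything. The text only says "stable branch" and "work branch". In this Python sketch, the branch names `main` and `development`, the tags `latest` and `dev`, and the function itself are assumptions.

```python
from typing import NamedTuple, Optional


class Release(NamedTuple):
    tag: str
    prerelease: bool


def release_for_branch(branch: str, stable_branch: str = "main",
                       work_branch: str = "development") -> Optional[Release]:
    """Hypothetical publishing rule: which release a push to `branch` may update."""
    if branch == stable_branch:
        return Release(tag="latest", prerelease=False)  # what fresh machines install
    if branch == work_branch:
        return Release(tag="dev", prerelease=True)      # separate; never the stable release
    return None  # any other branch publishes nothing
```

The failure described above corresponds to the work branch being allowed to produce the stable release. With a rule like this, it can only ever produce the pre-release.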

Current state (session 218, 2026-06-07): The operator hit a real failure: adding the tester despiegk failed at the install step and their page showed an error. We traced it. The login gate on a brand new machine was not being set up, because the step that configures it reads a stored value that, on a fresh machine, comes back empty even though it is actually present (a version mismatch between two pieces of the tooling, not a mistake we introduced, which we confirmed by checking the history). We built a fix in the setup program that (1) waits for a brand new machine to become reachable on the private network instead of giving up after a few seconds, (2) sets the login gate up directly and reliably instead of depending on that racy step, and (3) retries the few service starts that can lose a start-up race, and we lengthened the overall setup time limit to cover the wait. The fix is written, fully checked by automated tests, and installed on the admin machine, but we are holding it back from the shared branch until we can prove it works on a real fresh setup. We could not prove it this session because two separate infrastructure problems got in the way. First, removing despiegk's old machine left a leftover network-name registration on the chain that blocks rebuilding his machine under the same name. Second, and more urgent, a freshly published version of the core tooling can no longer start the services on a brand new machine at all, which is breaking every new setup right now regardless of our fix; we filed that as an urgent issue for the build owners. We also filed a longer-term idea to keep a few machines pre-built and ready so the slow and flaky parts happen ahead of time, and a request for an in-dashboard view of the setup log so failures are visible without logging in to the server. The four real test machines were left untouched and working. 
Next: get the urgent tooling problem fixed, then prove our setup fix on a fresh machine and merge it, then rebuild despiegk's machine.
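The setup fix described above comes down to two patterns. The first polls for reachability with a deadline measured in minutes rather than seconds. The second retries a service start a few times when it loses a start-up race. This Python sketch rests on assumptions: reachability is taken to mean a TCP connection to port 22 on the machine's private address, and the function names and default timings are hypothetical. The clock and sleep are injectable so the logic can be tested without waiting.

```python
import socket
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def tcp_reachable(host: str, port: int = 22, timeout_s: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


def wait_until(probe: Callable[[], bool], deadline_s: float, interval_s: float = 10.0,
               sleep: Callable[[float], None] = time.sleep,
               now: Callable[[], float] = time.monotonic) -> bool:
    """Poll probe() until it succeeds or deadline_s elapses."""
    end = now() + deadline_s
    while True:
        if probe():
            return True
        if now() >= end:
            return False
        sleep(interval_s)


def retry(action: Callable[[], T], attempts: int = 3, delay_s: float = 5.0,
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Run action(), retrying if it raises; re-raise the last error when out of attempts."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for i in range(attempts):
        try:
            return action()
        except Exception:
            if i == attempts - 1:
                raise
            sleep(delay_s)
    raise AssertionError("unreachable")
```

For example, `wait_until(lambda: tcp_reachable(addr), deadline_s=600)` waits up to ten minutes for a new machine, where `addr` is a placeholder for its private address. The overall setup time limit then has to be longer than that deadline, which matches the lengthened limit mentioned above.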

Current state (session 217, 2026-06-07): This session was an investigation with no product code changed. We set out to find why setting up a brand new test machine is slow, and to confirm or rule out the assumption that downloading the programs is the main cost. It is not. Downloading the entire set of programs takes only about half a minute of a roughly six-minute setup. The planned speed-up of copying programs from the main machine instead of downloading them would therefore save only a minute or two while adding real complexity and upkeep. Most of the time goes elsewhere: installing the underlying system packages, and a step-by-step, one-program-at-a-time install that spends more time on per-program bookkeeping than on the actual download. We also surfaced two problems that matter more than raw speed. First, a brand new machine is not reachable on the private network for several minutes after it is created, and the setup currently gives up after about a minute instead of waiting for it, which can make a fresh setup fail outright. Second, the free-capacity readout can over-report room because it counts machine slots without checking disk space. Cheaper speed wins are available with no new moving parts, mainly installing the programs in parallel and shortening some fixed waits. We wrote up the full measurements and three clear options with a recommendation. Because the original plan's main assumption turned out to be wrong, we parked the build choice for the operator. The recommendation is to fix the reachability and pre-load the common system packages first, do the cheap parallel-install speed-up next, and treat copying programs from the main machine as optional polish. Next: the operator picks the approach, then we build it. The request to also choose which version to deploy is workable but needs a few groundwork fixes first.
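The cheap parallel-install speed-up replaces a one-program-at-a-time loop with a bounded worker pool. This Python sketch assumes each program's install is independent and safe to run alongside the others. Anything that takes a global lock, such as a system package manager, would still need to run serially. The function `install_all` and the `install_one` callback are hypothetical stand-ins for the real installer.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def install_all(components: Iterable[str], install_one: Callable[[str], str],
                max_workers: int = 4) -> Dict[str, str]:
    """Hypothetical: install independent components concurrently.

    Returns {component: result}. If any install raises, the first failure
    (in input order) is re-raised once the remaining installs have finished.
    """
    names = list(components)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() keeps input order; iterating it re-raises a worker's exception.
        results = list(pool.map(install_one, names))
    return dict(zip(names, results))
```

Because most of each install's time is spent waiting on the network and on per-program bookkeeping, threads are a reasonable fit here. `max_workers` bounds the load on the fresh machine.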

Current state (session 216, 2026-06-07): We put the assistant and voice onto both the tester screens and the operator admin screen, and rolled them out to all four test machines plus the admin machine. We reused the assistant widget a teammate had already built, which has voice built in and can use tools, and added it to the top of every cockpit and of the admin dashboard. It runs on the assistant key the operator already gives each tester, so a tester needs no extra setup, and we listed the assistant on the Apps page. We also made sure brand new machines pick it up automatically by fixing how the assistant component is published. One real limit surfaced and is worth flagging: the no-setup assistant can chat and listen but cannot yet take actions such as adding a tester by voice. Taking actions needs a capability the assistant engine does not yet offer on the no-setup option. We asked the component owner what it would take to add that; in the meantime, the chat and voice version is live. During the operator's review of the work so far, we also removed a confusing and broken field from the Add a tester form (a machine slot that only ever had one valid value). We then re-ordered the plan based on that feedback: the next big win is making a fresh install much faster and letting the operator choose which branch to deploy, with the library showcase and running across more machines coming after. Next: the faster install and branch choice.

Current state (session 215, 2026-06-07): We finished the dashboard controls for access and keys, so the administrator panel now covers that whole area. First, the team's support sign-in keys (the SSH keys we put on every tester machine so we can help and debug) can now be viewed, added, and removed straight from the Settings page instead of being set by hand on the server, and a change applies to the next machine with no restart. Second, each tester's page now has an Access and keys panel: the administrator can choose which extra accounts may sign in to that tester's screen (the administrators and the tester themselves are always allowed, so an edit can never lock anyone out), and can optionally give a single technical tester command line access by pasting their key (off by default, since a normal tester only uses the browser). Both settings are stored against the person, so they survive rebuilding the machine, and they take effect the next time the machine is set up; in this testing phase we simply rebuild a machine to apply them, which keeps things simple and avoids touching a running machine. We also corrected our own notes: setting the email service key from the dashboard was already done in the previous two sessions, so this session completed the remaining access and key items instead. Everything was tested, shipped to the stable branch, then installed and checked on the live admin machine, including proving that editing the access list can never remove the always allowed accounts, with the real test machines left untouched. Next: put the assistant and voice on both the tester screens and the admin screen, so a person can simply ask it to do things, with a clear confirm before anything that changes something or costs money.
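Two rules make these settings safe. The first is that the effective access list always contains the administrators and the tester, so no edit to the extra accounts can lock anyone out. The second is that a machine's SSH keys are the setup key plus the team's support keys, with a tester's own key added only when command-line access is switched on. This Python sketch assumes account names compare case-insensitively and keys are plain one-line public key strings. Both functions are hypothetical.

```python
from typing import Iterable, List, Optional


def effective_access(admins: Iterable[str], tester: str, extra: Iterable[str]) -> List[str]:
    """Hypothetical: accounts allowed to sign in to one tester's screen."""
    allowed = {a.strip().lower() for a in admins if a.strip()}
    allowed.add(tester.strip().lower())  # the tester is always allowed
    allowed.update(e.strip().lower() for e in extra if e.strip())
    return sorted(allowed)


def authorized_keys(setup_key: str, support_keys: Iterable[str],
                    tester_cli_key: Optional[str] = None) -> str:
    """Hypothetical: the authorized_keys content for one tester machine."""
    keys = [setup_key, *support_keys]
    if tester_cli_key:  # command-line access is off unless a key is pasted
        keys.append(tester_cli_key)
    seen, lines = set(), []
    for key in (k.strip() for k in keys):
        if key and key not in seen:  # drop blanks and duplicates, keep order
            seen.add(key)
            lines.append(key)
    return "\n".join(lines) + "\n"
```

Computing the list from stored settings at setup time, rather than editing a running machine, matches the rebuild-to-apply approach described above.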

Current state (session 214, 2026-06-06): We made the admin dashboard able to manage the assistant keys and the welcome email properly, and we made those controls easy to find. An administrator can now store a starting key for any of the supported assistant providers. When adding a tester, they can choose which of those keys to put on that person's machine, or none at all for someone who will bring their own. The welcome email can be turned on or off, both for a single tester and for the whole machine. Its wording (the subject, opening line, and sign-off) can be edited from the dashboard, and the sign-in details are always added automatically so the email can never be left broken; an administrator can also send themselves a test copy first. We also fixed a real usability gap: the place to do all this had no menu link and sat buried on the home page. The dashboard now has a proper top menu (Overview, Users, Settings, Manual), and the setup has moved onto its own Settings page. The per-tester key choices grey out any provider that has no key set, with a link to go set it, and a Manual page explains how to use the whole dashboard. All of this is live on the test admin machine. We also planned the next big step and confirmed it is mostly assembly rather than new building: putting the assistant and voice on both the tester screens and the admin screen, so a person can simply ask it to do things, with a clear confirmation before it carries out anything that changes something or costs money. We wrote that up along with two smaller follow-ups (dashboard polish, and managing the assistant key centrally). Next: start the assistant on the admin screen, with the ask-before-acting safety.
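The welcome email is never left broken because the editable parts (subject, opening, sign-off) are kept separate from a fixed sign-in block that is always appended. This Python sketch uses hypothetical field names, defaults and function. The `{username}` placeholder is substituted with `str.replace` rather than `str.format`, so a stray brace typed into the dashboard cannot make rendering fail.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class WelcomeTemplate:
    """The parts an administrator may edit (hypothetical names and defaults)."""
    subject: str = "Your Hero sandbox is ready"
    opening: str = "Hi {username},"
    sign_off: str = "The Hero team"


def render_welcome(t: WelcomeTemplate, username: str, dashboard_url: str,
                   otp: Optional[str]) -> Tuple[str, str]:
    """Build (subject, body); the sign-in block is always included and not editable."""
    opening = t.opening.replace("{username}", username)
    if otp:  # brand new account
        password_line = f"One-time password: {otp}"
    else:    # existing account
        password_line = "Password: the one you already use for this account"
    sign_in = f"Sign in at: {dashboard_url}\nUsername: {username}\n{password_line}"
    # Blank editable parts are skipped; the sign-in block never is.
    parts = [p for p in (opening, sign_in, t.sign_off) if p.strip()]
    return t.subject, "\n\n".join(parts) + "\n"
```

Even if an administrator clears every editable field, the body still carries the address, the username and the password instructions.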

Current state (session 212, 2026-06-06): We turned the three separate steps for adding a tester into a single action. The dashboard now has one Add and set up button that creates or registers the person, builds their machine, and installs everything in one go. It shows progress through each stage (adding, provisioning, installing, ready) and finishes with a ready-to-copy sign-in link; a brand new account's one-time password is shown right there in the flow. The old separate buttons stay on the person's page, so a step that fails can be retried on its own, and you can still register someone now and build their machine later. We also made the system aware of how full the shared host is. Before you add anyone, the form now shows how many more testers will fit on the current host. If the host is full or offline, it refuses to start rather than half-building a machine and leaving a stuck contract behind. We proved the whole thing end to end on the live system by adding a throwaway tester through the new button and watching it go all the way to ready (its address correctly required signing in, and the welcome email was sent). We saw the free-capacity count drop, then return when we tore the tester down cleanly, leaving the three real test machines untouched. Next: let the email service key be set from the dashboard, then dashboard controls for support access and keys.
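The single Add and set up action can be sketched as a capacity and health check that runs before any work starts, followed by the existing steps in order, with a progress callback for the dashboard. This Python sketch assumes each step is a callable that raises on failure. `Stage`, `HostUnavailable` and `add_and_set_up` are hypothetical names; the stage labels come from the text.

```python
from enum import Enum
from typing import Callable, Mapping


class Stage(Enum):
    ADDING = "adding"
    PROVISIONING = "provisioning"
    INSTALLING = "installing"
    READY = "ready"


class HostUnavailable(Exception):
    """Raised before any work starts, so no contract is left half-built."""


def add_and_set_up(capacity: Callable[[], int], host_online: Callable[[], bool],
                   steps: Mapping[Stage, Callable[[], None]],
                   on_progress: Callable[[Stage], None]) -> Stage:
    # Refuse up front instead of failing halfway and leaving a stuck contract.
    if not host_online():
        raise HostUnavailable("host is offline")
    if capacity() < 1:
        raise HostUnavailable("host is full")
    for stage in (Stage.ADDING, Stage.PROVISIONING, Stage.INSTALLING):
        on_progress(stage)
        steps[stage]()  # each step also stays available as its own retry button
    on_progress(Stage.READY)
    return Stage.READY
```

If a step raises, the function stops at that stage and the dashboard can offer that step's separate button to retry it.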

Current state (session 211, 2026-06-06): We built and shipped the dashboard feature that answers the recurring question of which version each tester machine is running and whether a newer one is available. Whenever a machine is set up, the system now records the exact version of every app and service it installed, and the admin dashboard shows that per machine. A new Check for updates button compares what a machine is running against the latest published versions and lists which apps have a newer build waiting. Updating a machine is the existing reinstall, which we made always pull the newest versions instead of skipping ones already present. The operator chose to track every installed component, arranged so that future apps such as the books and memory features are picked up automatically once they join the standard set. We proved the whole flow end to end on a brand new throwaway test machine built from the stable branch: setting it up recorded its versions, the dashboard correctly reported it as up to date immediately after install, and we then removed it cleanly, leaving the three real test machines untouched. Existing machines, which predate this feature, simply show their version as unknown until their next setup. Next: a single add and set up action so onboarding a tester is one click, then letting the email service key be set from the dashboard.
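Check for updates compares the versions recorded at setup with the latest published ones. This Python sketch rests on several assumptions. Versions are dotted numbers such as `v1.4.2` (any pre-release suffix is ignored). A component that is in the standard set but not yet installed counts as an available update. A machine with no recorded versions reports "unknown" (represented here as `None`). Both functions are hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple


def parse_version(v: str) -> Tuple[int, ...]:
    """'v1.4.2' -> (1, 4, 2); anything after the first '-' is ignored."""
    return tuple(int(n) for n in re.findall(r"\d+", v.split("-", 1)[0]))


def updates_available(installed: Optional[Dict[str, str]],
                      latest: Dict[str, str]) -> Optional[List[str]]:
    """Components with a newer (or not yet installed) published build, or None if unknown."""
    if not installed:
        return None  # machine predates version recording
    return sorted(
        name for name, newest in latest.items()
        if name not in installed or parse_version(newest) > parse_version(installed[name])
    )
```

Comparing numeric tuples rather than strings matters here, because as plain strings "1.10.0" sorts before "1.9.9".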

Current state (session 210, 2026-06-05): We did a full end to end check of the welcome email on a brand new test machine, with no person checking an inbox, and everything worked. We created a fresh test account and machine from the stable branch and set it up, then confirmed from the machine itself that the welcome email really was sent through the email service (a genuine message id came back, not a no-send placeholder), that the one time password was stored when the account was created and then erased the moment the email went out, that the email's wording carries the username to sign in as, that password, and a link straight to the dashboard, and that the brand new machine's entire web address requires signing in on its own (every part redirects to the login, while only the health check stays open). We then removed the throwaway test machine and its account cleanly, leaving the three real test machines untouched and still protected. This proves the welcome email and the whole address sign in protection both reproduce on a brand new machine straight from the stable branch, with no manual step. We then looked into the next item, showing in the dashboard which build each machine is running together with a reliable one click update, and found it needs a decision first: there is no clean record today of which version a machine is running, so rather than guess we wrote up the options with a recommendation and parked it for the operator to choose which set of components counts as the build. No product code changed this session. Next: make that build visibility decision and build it, then allow setting the email service key from the dashboard.
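The password handling checked above follows a simple rule. The one-time password is erased only after the email service returns a real message id; a no-send placeholder keeps it in place. This Python sketch assumes a plain key-value store and a `send` callback that returns the message id, or `None` when nothing was sent. Both the store layout and the function are hypothetical.

```python
from typing import Callable, Dict, Optional


def send_welcome_and_forget_otp(store: Dict[str, str], username: str,
                                send: Callable[[str, Optional[str]], Optional[str]]) -> str:
    """Hypothetical: send the welcome email, then erase the stored one-time password."""
    otp = store.get(username)  # None for an existing account: nothing to send or erase
    message_id = send(username, otp)
    if not message_id:
        # No real message id means nothing went out; keep the password so a retry can send it.
        raise RuntimeError("email not sent; one-time password kept")
    store.pop(username, None)  # erase the password the moment the email has gone out
    return message_id
```

Erasing only after a confirmed send avoids losing the only copy of a new tester's password when the email service fails.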

Current state (session 209, 2026-06-05): We closed the security gap flagged last session: a person who is not signed in can no longer load any part of a tester machine's web address. The whole address now requires signing in (it redirects to the Forge login), and only the health check and the login callback stay open; before, the bare address and a couple of internal status pages were reachable without signing in, which exposed the machine's internal service list. We proved this live on the admin machine and on all three running test machines. We also improved the welcome email so it now states the exact username to sign in as, plus the one time password for a brand new account (or tells an existing person to use their own password), and links straight to the dashboard app rather than the bare address. And any account name now works for onboarding, including names with capitals or dashes, because the web address is formed safely from the name. All of this is real code on the main branch that publishes automatically, so a brand new machine reproduces it on its own; the only manual step is providing the email service key, which we have now written into the setup guide along with the new whole address sign in behaviour. Two items remain for next time: a full end to end check of the welcome email on a freshly created test machine, and adding dashboard visibility for which build each machine is running together with a reliable one click update, so updating a machine is no longer done by hand.
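Requiring sign-in for the whole address reduces to a per-request decision: a short list of exact public paths passes, a signed-in session passes, and everything else is redirected to the login. The text only says "the health check and the login callback", so the paths `/health` and `/auth/callback` in this Python sketch are assumptions, as are the function name and the `next` parameter. The sketch also assumes the path arrives without a query string.

```python
from typing import Optional, Tuple
from urllib.parse import quote

# Hypothetical public paths.
PUBLIC_PATHS = frozenset({"/health", "/auth/callback"})


def gate(path: str, session_user: Optional[str], login_url: str) -> Tuple[int, Optional[str]]:
    """(200, None) lets the request through; (302, location) sends it to the login."""
    if path in PUBLIC_PATHS:  # exact match only, so e.g. '/healthx' stays gated
        return 200, None
    if session_user:
        return 200, None
    return 302, f"{login_url}?next={quote(path, safe='')}"
```

Using exact matches instead of prefixes keeps the open surface to precisely the listed paths. That is the property that closed the earlier exposure of the bare address and internal status pages.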

Current state (session 208, 2026-06-05): We built and shipped the welcome email and proved it works end to end on the live system. When a new tester's machine finishes setting itself up, the system now automatically emails that person their personal web address and how to sign in, sent from a Hero address on a new project domain. We tested it for real: a fresh test machine was created and set up, and the moment it was ready the email arrived in a normal Gmail inbox, not in spam. To do this properly, we settled the naming (the company is Lhumina and Hero is the product) and bought a company domain. We also verified a dedicated sending sub-address with the email service and asked the separate freezone project to do the same for its own domain. While testing, we found two things to fix next session. First and most important: a person who is not signed in can still load the bare web address of a tester's machine and see a blank landing page, even though the actual dashboard correctly requires signing in. The entire machine address should require signing in and show nothing to a logged-out visitor, so closing that gap is the next priority. Second, the email should always say which username to sign in as and include the one-time password for a brand new account (or tell an existing person to use their own password), and it should link straight to the dashboard instead of the blank page. We also cleaned up several leftover test machines and stuck grid contracts by hand; this cleanup keeps hitting a known grid teardown problem already flagged for the grid maintainer. Next: lock the whole tester address behind sign-in, then refine the welcome email's wording and link.

Current state (session 207, 2026-06-05): We built and shipped the simpler way to add a tester, and it is now live for newly created machines. A new tester no longer needs an SSH key at all: a normal tester only ever uses the dashboard apps in a browser and never opens a terminal, so we stopped asking for that technical credential, and a machine now gets only our own setup key plus the team's support keys. We can also register someone who already has an account instead of only ever creating a brand new one, so a colleague with an existing account just gets a machine and signs in with what they already have. We added a friendlier page for when someone signs in but has not been granted access, telling them which account they used and to ask the administrator, instead of a bare error. While building this we re-checked the security claim from last session and corrected it honestly: because the assistant on a tester machine can run commands, a determined tester could in principle read the shared AI keys we preload, so removing terminal access reduces but does not fully remove that exposure. The operator accepted this for now, since those keys are limited and are rotated after each demo, and we wrote it down for a proper later fix. Everything was tested and merged, the released builds refreshed successfully, and existing running machines were unaffected. Next: the welcome email that sends a new tester their address and first steps.

Current state (session 206, 2026-06-05): This session was planning plus a quick live fix, with no new product code shipped. We agreed how to make adding a tester much simpler for ordinary people. A tester will no longer need an SSH key, which is a technical credential most people do not have and do not understand, because that key is only for opening a terminal on the machine, something a normal tester never does, while signing in to the dashboard uses their ordinary account. We will also be able to reuse an existing account instead of only ever creating a new one, so a colleague who already has an account just gets a machine and signs in with what they already have. We confirmed this is safe: the sign-in only reads a person's basic profile, never their private projects, and because a tester has no terminal access they can never read the AI keys we preload for them. We wrote this up as the plan for the next session (#247) and pointed it at the exact parts of the code to change. Separately, we brought one existing test machine fully up to match the main demo machine: its assistant now uses the same fast model with live web search and working keys, and it can drive the planner, slides and whiteboard apps and use voice, just like the main one. We verified the assistant key works, the page is reachable behind sign-in, all three apps respond, the assistant program is the same fast build as the main machine, and voice is running; the only thing left unmatched is a cosmetic settings row, deliberately skipped. Next: build the simpler onboarding (no SSH key, reuse existing accounts, and a friendlier access-denied message), then the welcome email.

Current state (session 205, 2026-06-05): We made shared whiteboard links work for anyone, so a tester can send a link and an outside person can open and collaborate on that one board without needing an account, the same way a shared document works. We traced why it was not working: the whiteboard's sharing feature itself was fine, but the machine's sign-in wall was blocking every outside visitor, and it had been blocking them since before testers were ever put behind that wall, so shared links had only ever worked in local testing, never on a deployed machine. The fix lets a shared board link past the sign-in wall as an anonymous visitor, while the whiteboard itself re-checks the link's secret on every action and only ever allows that one board, read-only for a view link or editable for an edit link, and refuses everything else. We proved all of this on the live test machine from a signed-out browser: opening the link works, reading and editing the shared board works, live collaboration works, and trying to reach a different board, write with a view-only link, or use the page without the link are all correctly refused, while the rest of the machine still requires sign-in. We also added a box in the tester settings where each tester can paste their own assistant subscription key, and fixed a separate problem where a person's sign-in could fail if the gateway was restarted while they were partway through signing in. Two cleanups remain: rotate a tester access token and the shared assistant key, and confirm a brand new machine picks up the sharing setting automatically. Next: the welcome email that sends a new tester their address and first steps.
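The whiteboard's per-action re-check can be sketched as follows. The presented secret must match a known share, the request must target that share's board, and writing requires an edit link. This Python sketch assumes shares are kept as a mapping from secret to board and mode. `Share` and `authorize_shared` are hypothetical, and the action names `read` and `write` are assumptions. `hmac.compare_digest` is used so the comparison does not leak the secret through timing.

```python
import hmac
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Share:
    board_id: str
    mode: str  # "view" or "edit"


def authorize_shared(shares: Mapping[str, Share], secret: str,
                     board_id: str, action: str) -> bool:
    """Hypothetical check run on every action by an anonymous share-link visitor."""
    presented = secret.encode("utf-8")
    for known_secret, share in shares.items():
        # Constant-time comparison of the presented secret with each known one.
        if hmac.compare_digest(known_secret.encode("utf-8"), presented):
            if share.board_id != board_id:
                return False  # a link opens exactly one board
            if action == "read":
                return True
            if action == "write":
                return share.mode == "edit"
            return False  # every other action is refused
    return False
```

Running this check on every action, not just when the page loads, is what lets the gateway safely pass share-link visitors through as anonymous users.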

Current state (session 204, 2026-06-04): We finished the planned polish of the demo assistant on the live test machine, and a person checked all three user-facing results by hand. The assistant is now faster on its first action, and it can answer questions using live web search now that it points at the subscription search service, with the key kept off the disk. The voice feature works end to end: you can speak to the planning assistant and hear it answer back. We removed a leftover test item from the planner, and added a small fix to the admin screens so clicking a tester action (provision, install, reinstall, destroy or delete) now shows a spinner and a label instead of looking frozen. A person confirmed all three results in the browser and by microphone: web search, voice and the spinner all work. A few small follow-ups remain for next time: rotate the shared search key after the demo, save the assistant's web-search settings into the tester setup tool so a freshly built machine gets them automatically, and refresh a couple of background release builds so machines built later match the live one. Next: let people open shared board links without hitting the sign-in wall, and let each tester add their own assistant key.

Current state (session 203, 2026-06-04): While preparing for the demo we found that the first tester created by hand through the admin screens (add, provision, install) was reachable on the internet with its login protection turned off, while the other testers correctly required sign-in. We traced the cause: when a new tester web address is set up, the grid can report the step as failed even though the address was actually created, and the tester creation tool then treated the tester as having no address and skipped setting up its sign-in protection, so the install finished without it. We fixed the tool so creating a tester now repairs a missing address and sign-in protection automatically and, most importantly, refuses to finish unless sign-in protection is in place, so a tester can never again be published without it. The admin screen also gained a one-click button to set up a missing address. We deployed this to the admin machine, repaired the exposed tester in place without deleting it so it now requires sign-in like the others, and created a brand new throwaway tester from scratch to confirm the whole flow end to end (create, set up, install, sign-in required, then delete) before removing it. All three real testers now require sign-in, each with its own separate login credentials. We also logged the underlying grid behaviour for the maintainers to fix at the source, and noted a later idea to add live video meetings. Next: resume the planned assistant polish (search, speed, voice).

Current state (session 201, 2026-06-04): We proved the AI assistant can actually operate the demo apps on a live deployed machine, and we supported a live demo. On the demo machine the assistant, running on the newest Kimi model, read real content from the planner, the whiteboard and the slides app through its tool connection, and it created a planning item in the planner, so it both reads and writes across all three apps. During the live demo the assistant was not answering because its model key held a bad value, so we set a working key, moved it to the newest model and added the slides app to its tool set, after which it worked. We also confirmed the two release builds a fresh machine needs are published and correct. One gap remains for brand new machines: the installer on the admin machine is an older build that writes the assistant configuration with an unsupported provider option and does not include the slides app or the newest model, so a freshly created machine does not yet get a working assistant. A fresh test machine was created and shows exactly that gap. We filed a follow up (home#249) to make the assistant fast on its first action and to close these installer gaps. Next, in order: (1) update the installer so a brand new machine gets a working, app driving assistant with no manual fixing (home#244); (2) the welcome email (home#236); (3) confirm every app works standalone, by voice and driven by the assistant (home#248); (4) the documentation library and memory stack (home#246).

Current state (session 200, 2026-06-04): The Kimi assistant now fully works on a deployed tester and the gateway chat-connection bug is fixed, both merged to the stable branch and proven live. A browser check found the assistant page rendered unstyled at the address without a trailing slash, and chat hung on "Reconnecting". Both were fixed properly: the gateway now forwards the live chat connection for tester domains (previously only one of its two routing paths did, so every tester domain's chat reconnect-looped), and the assistant now embeds its own setup files inside its program and recreates them on the machine at startup, so a fresh install needs no manual staging (this was the remaining follow-up from the previous session). Next, in order: (1) confirm a brand-new tester provisioned from the stable release comes up working with no hand-fixing (home#244); (2) add the welcome email so testers can be onboarded automatically (home#236, now using Resend); (3) confirm every demo app works standalone, by voice, and driven by the assistant (home#248); (4) the documentation-library and memory Ask-the-Librarian stack (home#246).

Current state (s199): The Kimi AI assistant can now be opened and chatted with on the live demo machine. A tester's browser check (open it, say hi, nothing happens) led to a full investigation that found the assistant had never actually answered on a deployed machine before, only its parts had been checked. Three separate setup problems were found and fixed: the assistant announced its web address in a form the machine's router did not recognise, so its page would not load; the assistant looked for some of its own setup files at a location that only exists on the build computer; and it was configured with a model-provider option the assistant does not support, so it never connected to the AI service. With those fixed it now opens, takes a message, and returns a real streamed answer, whether or not the user picks a model first. The app launcher and installer fixes are shipped from the stable branch; the assistant's own two code fixes are ready on a branch and filed for the maintainer to merge, with one remaining follow-up to bundle the assistant's setup files inside its program. Next is the full-stack tester with the documentation library and grounded search, then the welcome-email step.

§0 Current state

s201 (2026-06-04): Proved the AI assistant operates the demo apps on a live deployed machine and supported a live demo. On the demo machine the assistant, on the newest Kimi model, read real content from the planner, whiteboard and slides through its tool connection and created an item in the planner, confirming read and write across all three. During a live demo the assistant was not answering because its model key held a bad value, so we set a working key, switched to the newest model and added the slides app to its tools, after which it worked. Both release builds a fresh machine needs are published and verified. Remaining gap for new machines: the admin installer is an older build that writes an unsupported provider option and omits the slides app and the newest model, so a freshly created machine does not yet get a working assistant; this was confirmed on a fresh test machine. Filed a follow-up to make the assistant fast on its first action and to close the installer gaps. Next: update the installer so a fresh machine gets a working, app-driving assistant with no manual fixing.

s199 (2026-06-04): The Kimi assistant can now be opened and used on the live demo machine, proven end to end. A tester opened it, typed a message, and nothing happened, so we investigated and found the assistant had in fact never answered on a real deployed machine, only its parts had been checked. Three separate problems were found and fixed. First, the assistant's web page would not load through the machine's router because the assistant announced its address in a form the router did not recognise; we aligned it to the standard form and the page now loads. Second, the assistant looked for some of its own setup files at a location that only exists on the build computer, so it stopped the moment someone sent a message; as a stopgap we placed those files where it looks, and noted that the proper fix is to bundle them inside the program. Third, the assistant was set up with a model-provider option it does not support, so it never connected; we switched it to the supported option (which speaks the standard AI interface and so works with the chosen provider) and let it read the AI key from the machine's environment so the key stays out of any file. After these changes the assistant opens, accepts a message, and streams back a real answer, with or without picking a model first. The app launcher now lists the assistant, and saving an AI key in the cockpit now also stores it under the name the assistant reads. The launcher and installer fixes are shipped from the stable branch; the assistant's own two code fixes are ready on a branch and filed for the maintainer to merge.

s198 (2026-06-04): The Kimi AI assistant was added to the standard tester install and wired so it can operate the planner and whiteboard for the user, proven on the live demo machine. Installing a tester now also brings up the assistant, writes its configuration, and points it at the planner and whiteboard through the machine's own local router, which exposes each app's actions as tools the assistant can use. On the live machine the assistant's tool list returned the full set of planner and whiteboard actions, and a test call created a workspace in each, so the assistant can genuinely drive both apps. The assistant uses the tester's own AI key, and we confirmed the key authenticates. Voice was checked and still works on the machine. We also started moving the voice engine to publish from the stable branch, but its build is currently failing for a routine dependency reason shared by a few other components, so finishing the voice move (the engine build fix plus a catch-up update on the voice app) is queued for next session. We corrected the plan too: the email step will use Resend, and onboarding will create fresh accounts rather than inviting existing ones, since a fresh account only has the access we grant it. Next: the full-stack tester with documentation libraries, the knowledge store and grounded search.

s196 (2026-06-03): Scoped the work to exactly the meeting demo and moved the first components onto the stable branch. We confirmed the deliverable is the meeting demo and nothing more, dropped everything else from the tracker, and (team offline) did the component changes ourselves, each being a one-file publish switch or the same small library update already proven on the proxy and base components. The router, planner and slides components now publish from the stable branch (builds green) and default to it; the whiteboard and the assistant got the same default switch as they were already publishing from stable; the supervisor was refreshed to publish from stable. Result: every component the first demo needs is on the stable branch and publishing, which unblocks the live bring-up on the dedicated node. We also brought the assistant's code repositories down locally for the small fixes coming next.

s195 (2026-06-03): Moved the three deployment components we own onto the stable branch so the demo deploys from stable, not the in-flux development branch. The grid deployer, the cockpit, and the demo-script repository are now on the stable branch by default and publishing their stable releases from it, all verified. The deployer was made self-contained: the installer script sent to each new tester machine is now built into the deployer instead of fetched from a separate repository (that repository also moved to a different team area). The cockpit could not build against the stable shared libraries until a teammate pointed the shared web-proxy's stable branch at them; we then completed a small migration to the libraries' relocated helper functions and it built clean and published. We opened a coordination tracker for the wider move. What remains for the whole demo to run from stable is the team's app and engine components doing the same small migration (one engine has no release yet), each tracked there. Next milestone: a full live bring-up from the stable branch on the dedicated node once enough components have moved, proving people can test there.

s194 (2026-06-01): Attempted to retire the old test accounts and their machines from the admin screen, and hit a grid-side block, then stopped early for another priority. The first delete error was on our side: the admin machine's grid-control service held a dead long-lived connection to the grid, so we restarted that service and confirmed it came back clean and could read the machine list again. The retry failed for a different reason outside our stack: cancelling the on-grid contracts is rejected with a 502 gateway error, reproducible in a fresh private browser window. Listing machines (a read from the chain) works, but deleting (which must reach the actual grid node to tear the machine down) does not, so the grid node or its relay looks unreachable. All three test accounts and machines were left running and untouched, with their contract cancellations stuck pending. Next: retry the delete when the grid node and relay are healthy, and if it stays stuck cancel the contracts directly on the chain, then complete last session's parked review-and-merge of the search fix and rotate the test access tokens.

s190 (2026-06-01): The changes held back for a maintainer go-ahead last session were merged, and the talk-to-it voice bar is now live on every app's screen, proven on the live test user. With the go-ahead, six held changes were merged: the safe, sandbox-only way for the assistant to view pages behind the login gate, plus the voice bar on five more app interfaces. The login bypass was deployed and confirmed on the test machine. A request carrying the sandbox secret is shown the page as the machine's own user, while a request with no secret, or the wrong secret, is still sent to the login screen. The five updated app screens were built and deployed, and the voice bar was confirmed showing on all seven app screens. Along the way we found and fixed a real bug carried in from last session: for one app (the whiteboard) the voice bar had been added to a page template the app does not actually use, so it never appeared; we moved it to the real page and confirmed it now shows. One caveat for next time: the automated release build for the memory app is failing for a separate, pre-existing reason, so a brand new machine could receive an older memory screen until that is fixed (the live test machine already runs the correct build, since we deployed it directly). Next: prove a freshly provisioned tester comes up with the memory and voice screens already installed and running, remove an outdated leftover voice binary so installs pick the correct one, and rotate the test access tokens while tidying the existing test user.

s189 (2026-05-31): Voice now works end to end for a tester, and the three maintainer decisions from last session were all resolved. The Memory app's screen was already published; it was simply not installed on the hand-maintained test machine, so we installed and started it and it now opens. The assistant gained a safe, off-by-default way to view pages behind the login gate on a sandbox machine, locked to a secret only sandbox machines hold (an adversarial review caught and corrected an unsafe first design that relied on which network port a request used). The talk-to-it voice bar was added to every app's interface. Most importantly, speech-to-text was failing while read-aloud and the rewrite tool worked; we traced it to two causes and fixed both: the test machine ran an outdated, wrong build of the voice service, and the shared voice engine rejected microphone audio because it only accepted one specific audio quality and did not convert other rates. We installed the correct build plus its missing audio library, and changed the engine to convert any incoming audio to the quality it needs; a full round trip now transcribes correctly on the live machine. The engine fix, the audio-library install, and the runbook notes were merged, and a tracking issue was filed. The login-bypass and the per-app voice-bar changes are committed and pushed but wait for a maintainer go-ahead before merging. Next: merge those waiting changes, deploy the updated app interfaces, prove a freshly provisioned tester comes up with all of this automatically, and remove an outdated leftover binary so installs pick the correct one.
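The engine fix, converting any incoming audio rate to the one rate the engine needs, can be illustrated with a simple resampler. This is a minimal sketch, assuming mono samples as floats and a 16 kHz target (a common speech-model rate, but an assumption here, not something the log states). It uses plain linear interpolation; a production engine would normally use a filtered resampler to avoid aliasing when downsampling.

```python
def resample_linear(samples: list[float], src_rate: int, dst_rate: int = 16_000) -> list[float]:
    """Resample mono samples from src_rate to dst_rate using linear interpolation."""
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if src_rate == dst_rate or not samples:
        return list(samples)  # already at the target rate: copy unchanged
    out_len = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # input samples advanced per output sample
    out: list[float] = []
    for i in range(out_len):
        pos = i * step
        left = int(pos)
        if left >= len(samples) - 1:
            out.append(samples[-1])  # past the last pair: hold the final sample
            continue
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[left + 1] * frac)
    return out
```

For example, 48 kHz browser audio is reduced to one output sample per three inputs, and 8 kHz audio gains an interpolated sample between each pair.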

s188 (2026-05-31): The save-once answer cache is now published to all four library repositories and loads automatically. We generated each library's question-and-answer set once and committed it into the public repository, so every future tester gets it for free, and the Books service now replays that committed cache on startup at no model cost, so a brand new machine has grounded, searchable answers immediately with no manual step (proven on the live test user: about eight and a half thousand answer pairs loaded in roughly thirteen seconds with zero model calls). We also fixed the Ask the Librarian AI summary, which had failed for testers who only set an OpenRouter key, so it now uses whichever AI provider key the tester configured. Separately the cockpit gained an Apps page: a simple app-store style launcher that shows installed apps as tiles, opens a running one in one click, marks a stopped one with a hint to start it from Services, and hides apps that are not installed. Parked for a maintainer decision: whether the Memory app should be hidden or have its interface published, a safe way for the assistant to verify pages behind the login gate, and making sure a fresh tester starts with all user apps already running.

s187 (2026-05-31): The save-once answer cache is built, merged, and proven working on the live test user. The expensive question-and-answer generation can now be done one time by a maintainer and saved into each library as a small portable file, then reused for free by every other machine, which only re-embeds locally at no model cost. Two switches control it, both off by default and easy to remove: one that saves the generated answers back into the library, and one that reuses a saved cache instead of regenerating. Nothing in the memory service changed; the cache sits on top of it. We proved both halves on the live test user against its real Hero OS guide library: turning on the save switch wrote a cache file for every page with the real questions and answers, and turning on the reuse switch rebuilt the full set of answers from those files with no model calls at all, after which asking what Hero OS is returned correct, grounded results just as the paid path does. The saved files are committed with a safe, non-personal author identity because the libraries are public. The remaining steps all publish into the four public libraries and so wait for a go-ahead: do the one-time generation and publish for the other three libraries, add the small browsable-book definitions for those three, and wire the voice widget so a user can ask by speaking.
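The two off-by-default switches compose as follows: reuse a saved cache if allowed and present, otherwise pay for generation and optionally save the result. This is a minimal sketch. The switch names `save_cache` and `reuse_cache`, the one-JSON-file-per-page layout and the `generate` callback are hypothetical stand-ins, not the Books service's real configuration.

```python
import json
from pathlib import Path
from typing import Callable

QAPairs = list[dict[str, str]]


def qa_for_page(
    page_id: str,
    cache_dir: Path,
    generate: Callable[[str], QAPairs],
    save_cache: bool = False,
    reuse_cache: bool = False,
) -> tuple[QAPairs, bool]:
    """Return (pairs, used_model). Both switches default to off, i.e. the paid path."""
    cache_file = cache_dir / f"{page_id}.json"
    if reuse_cache and cache_file.exists():
        # Replay the committed cache: no model call (embedding happens locally elsewhere).
        return json.loads(cache_file.read_text(encoding="utf-8")), False
    pairs = generate(page_id)  # the expensive, paid question-and-answer generation
    if save_cache:
        cache_dir.mkdir(parents=True, exist_ok=True)
        cache_file.write_text(json.dumps(pairs, indent=2), encoding="utf-8")
    return pairs, True
```

Keeping both switches off by default means removing the feature is just deleting the two branches; the underlying generation path is untouched.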

s186 (2026-05-31): The fix that makes a new user's documentation libraries work was shipped to the admin machine, and the reason a fresh machine showed no books was found and fixed. First, the change that pre-loads the four default libraries was deployed and confirmed: reinstalling the test user's machine now sets the default library list and clones all four libraries automatically. Second, we found why the library service never started on a fresh machine. It was set to depend on the AI-provider broker, which refuses to start until an AI key is present, so on a brand-new machine with no key yet the whole chain failed and the install errored. Since that broker is parked and the library service reaches AI through the memory service instead, we removed the dependency, and the library service now starts cleanly on a key-less fresh machine. Third, we fixed why the library web page showed no books: a fresh machine never turned its cloned libraries into browsable books because of a start-up ordering gap. We added a step that builds each library's books from the small definition the library carries, at no AI cost, and proved it live: the test user now shows the Hero OS Guide as a seven-page browsable book. We also sent the upstream team a concrete, easy-to-switch-off plan to restore saving the generated answers back into each library so the cost is paid once and never repeated. Next: do the same one-time book setup for the other three libraries, build the save-once answer cache behind an off switch, publish the answers into the four libraries, and wire the voice widget so a user can ask by speaking.

s185 (2026-05-30): A brand-new test user was created and provisioned from scratch to confirm onboarding works for a fresh user, not just the hand-maintained existing one. The grid provisioning error from earlier sessions was diagnosed and cleared (a wrong machine identifier was being passed; the correct one was already configured), so the new user's machine was created, the full stack installed automatically, and it reached the ready state. The core shared engine wiring is now proven on a genuinely fresh user: the machine came up pointed at both the shared embedding and voice engines with its own access tokens, the engines recognised it, and its dashboard is reachable. Two gaps that only a fresh install reveals were found and are being carried as fixes: the documentation library service does not start automatically and its default libraries are not pre-loaded on a new machine, and one memory sub-service failed to start. The existing test user was left in place as the known-good reference until the fresh path is fully green. Next: deploy the already merged change that pre-loads the default libraries, get the library service starting automatically on a fresh machine, fix the failed memory sub-service, rotate the test access token, then retire the old test user.

s184 (2026-05-30): The documentation-library experience is proven end to end on the live test user, and the app list in the tester dashboard now matches how services really work. We refreshed the Hero OS guide content to today's design, then loaded all four default public libraries (the Hero guide plus the public Geomind, OurWorld, and Mycelium docs) on the test user and confirmed that asking a question scoped to each library returns correct, grounded answers. As designed, you ask a question inside one library (or one book), never across all libraries at once. The embedding work is done for free by the shared engine on the admin machine; only the one-time question-and-answer generation uses a paid model. We also reworked the dashboard's service list so each app is installed as a whole (one Install button brings up its background service, its web page, and its admin page together) and, once installed, shows its parts as separate manageable rows; the deployment runbook was updated to match. Two important gaps were found and written up for the upstream team to decide on, because a recent rework removed them: the generated question-and-answer data is no longer saved back to the library and published, so today every new machine would redo that paid generation from scratch, and the library web page shows no books because the step that turns a cloned library into browsable books was also removed. Separately, the shared build tool had stopped building across all components after an upstream rename; we traced and fixed it so released builds work again. Next: a full walk on a freshly provisioned tester to confirm it all comes up automatically (a grid issue still blocks fresh provisioning), rotate the test access token, tidy the existing test user, and act on the team's decision about restoring the publish-once library data.

s183 (2026-05-29): Worked the follow-up list from last session. First, the automated release builds: audited every component's latest publish run and fixed the ones still failing, so released binaries refresh again. The failures fell into two kinds: build filters left pointing at component or package names that earlier restructuring had renamed or removed (including the assistant, which was still linked to a retired embedding component and was rerouted to use the current memory service for its tool search), and a couple of components that did not pin their dependency versions, so the build picked up a moving upstream piece and broke; those now pin their versions like the others. Five components were fixed, merged, and confirmed publishing successfully again. Two unrelated ones are left for follow-up (one tangled in cross-project version mismatches, one a brand-new example owned by another author) and are noted on the build follow-up issue. Second, the documentation library service now starts cleanly as a managed service for testers: it had been told to place its socket under the administrator account's home folder, which the unprivileged service user cannot write to, so managed startup failed while a hand-started copy worked; corrected to use the per-user runtime location like the other services, and proven running on the live test user. Next: refresh the documentation library content to match today's architecture and re-prove incremental re-indexing, pre-load every new tester with the default public libraries, then the full walk on a freshly provisioned tester.
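The socket fix amounts to deriving the path from the service user's own runtime directory rather than a fixed location under the administrator's home. This is a minimal sketch, assuming the per-user runtime location is the `XDG_RUNTIME_DIR` directory (typically `/run/user/<uid>` on systemd-based Linux); `service_socket_path` and the `books` service name are hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def service_socket_path(service: str, env: Optional[Mapping[str, str]] = None) -> Path:
    """Build a socket path in the per-user runtime directory, not the admin's home."""
    env = os.environ if env is None else env
    runtime = env.get("XDG_RUNTIME_DIR")
    if not runtime:
        # Conventional per-user runtime location on systemd-based Linux (assumption).
        runtime = f"/run/user/{os.getuid()}"
    return Path(runtime) / f"{service}.sock"
```

Because the directory belongs to whichever user runs the service, an unprivileged service user can always create its socket there, which is why managed startup and a hand-started copy now behave the same.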

s182 (2026-05-29): The written book summary now works end to end on a test user, the goal of this phase. The library service was aligned to the memory service's current workspace model (it had been omitting the workspace label, which is why an earlier search returned nothing), then proven live on the existing test user: ingesting a document produced question-and-answer pairs embedded through the shared engine on the admin machine, and asking what Hero OS is returned three correct, grounded results where before it returned nothing. While doing this we found the automated release build was broken across nearly every component, because a recent change to the build tool moved where it installs and the publish workflows still looked in the old location, so the build failed immediately and stopped refreshing the released binaries. We fixed the release workflows across about forty components and removed a dangerous cleanup step that would have erased already installed programs, and confirmed the fix by watching several components publish successfully again. A few components still fail later in their own build for unrelated reasons and are filed to fix. New follow-ups filed: audit and fix the remaining build failures, refresh the documentation library to match today's architecture, make the library service start cleanly as a managed service, and pre-load every new tester's library with the default public libraries. Next: work that list in order, then finish the full walk on a freshly provisioned tester (a grid issue blocked a fresh provision this session, so the proof ran on the existing test user).

s181 (2026-05-29): Phase 1 E3 is shipped and merged, and the shared engines are now consolidated and live. The deployer that wires each new tester to the shared services was generalized from the embedding engine alone to both engines (embedding and voice) in one loop: when a tester is provisioned it now issues that tester one access token, registers it for both engines on the admin machine, and points the tester's clients at both engines over the private network; when the tester is deleted the token is revoked for both engines (previously only one, which could have left a voice token live after deletion). On the admin machine the older duplicate embedding and memory services were stopped and removed, and the voice engine was switched from a hand-started process to a properly managed service that starts automatically. Both engines' security checks were re-confirmed live after the changes: no token refused, a valid token accepted with a real result, a wrong token refused, and a token used under another tester's identity refused, including real synthesized speech returned through the voice engine. The library-service alignment flagged last session was filed as its own task (the library service still omits the workspace label the memory service now requires). Next: align the library service, then a full end-to-end walk where a freshly provisioned tester gets a grounded summary of one of its own books (confirming both engines serve it and the security checks hold) plus retiring the last stale settings on the existing tester.
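The generalization from one engine to both, and the revocation bug it closed, can be shown with a loop over a single list of engines used by both provision and delete. This is a minimal sketch; `TokenRegistry`, `provision`, `deprovision` and the engine names are hypothetical, and an in-memory dictionary stands in for the admin machine's real token store.

```python
import secrets

ENGINES = ("embedding", "voice")  # the two shared engines on the admin machine


class TokenRegistry:
    """Hypothetical in-memory stand-in for the admin machine's per-engine token store."""

    def __init__(self) -> None:
        self._tokens: dict[str, dict[str, str]] = {engine: {} for engine in ENGINES}

    def register(self, engine: str, tester: str, token: str) -> None:
        self._tokens[engine][tester] = token

    def revoke(self, engine: str, tester: str) -> None:
        self._tokens[engine].pop(tester, None)

    def is_registered(self, engine: str, tester: str) -> bool:
        return tester in self._tokens[engine]


def provision(registry: TokenRegistry, tester: str) -> str:
    token = secrets.token_urlsafe(32)  # one token per tester, valid for both engines
    for engine in ENGINES:
        registry.register(engine, tester, token)
    return token


def deprovision(registry: TokenRegistry, tester: str) -> None:
    # Iterate the same engine list, so no token (e.g. voice) outlives the tester.
    for engine in ENGINES:
        registry.revoke(engine, tester)
```

Driving both operations from one shared list is the design point: adding a third engine later cannot leave a token registered but never revoked.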

s180 (2026-05-29): Phase 1 E2 (the shared voice engine) is shipped, merged, and proven live. The voice service now runs once on the admin machine and serves many testers over the private network with the same per-tester security as the embedding engine: every call carries that tester's own access token and identity, and the engine refuses any call with no token, a wrong token, or a token used under another tester's identity, and will not start its private listener if it cannot reach the secret store. One per-tester token now works for both the embedding and voice engines. Each tester's voice client sends speech to the shared engine when its endpoint is configured and falls back to a local engine otherwise, with the token never written to logs. Proven live on the test grid: the voice engine's security check passed every case (no token refused, valid accepted with real audio returned, wrong token refused, wrong identity refused), and from a separate tester machine the engine returned real synthesized speech across the private network using only that tester's token. Both code changes were merged. The voice engine was started by hand for the test and then stopped, and the next step makes it start automatically. While preparing the written book-summary demo we found the library service still talks to the older memory interface and does not send the workspace label the memory service now requires, which is why an earlier search returned nothing, so that needs a small alignment in the library service before the written summary works. Next: wire each new tester to both engines automatically at provision time, register the voice engine as a managed service, retire the older duplicate services, then the library-service alignment and the end-to-end written and spoken book summary.
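The client-side behaviour, shared engine when configured and local engine otherwise, with the token never logged, might look like this. This is a minimal sketch; the environment variable names `HERO_VOICE_ENDPOINT` and `HERO_VOICE_TOKEN` and the `VoiceTarget` type are hypothetical, not the voice client's real configuration keys.

```python
import logging
from dataclasses import dataclass, field
from typing import Mapping, Optional

log = logging.getLogger("voice_client")


@dataclass(frozen=True)
class VoiceTarget:
    kind: str  # "shared" or "local"
    endpoint: Optional[str] = None
    token: Optional[str] = field(default=None, repr=False)  # keep the token out of repr()


def choose_voice_target(env: Mapping[str, str]) -> VoiceTarget:
    endpoint = env.get("HERO_VOICE_ENDPOINT")  # hypothetical variable names
    token = env.get("HERO_VOICE_TOKEN")
    if endpoint and token:
        # Log the endpoint only; the token is never written to logs.
        log.info("using shared voice engine at %s", endpoint)
        return VoiceTarget("shared", endpoint, token)
    log.info("no shared voice endpoint configured, using local engine")
    return VoiceTarget("local")
```

This sketch also falls back to local when an endpoint is set without a token, since the shared engine would refuse such a call anyway; that choice is an assumption, not something the log specifies.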

s179 (2026-05-29): Phase 1 E1 is shipped, merged, and proven live. The shared embedding engine now runs on the admin machine with per-tester security: every call carries that tester's own access token and identity, and the engine refuses any call with no token, a wrong token, or a token used under another tester's identity, and will not start its private network listener at all if it cannot reach the secret store. Each tester's library service was installed on the test machine and pointed at the engine; it embedded documents through the engine across the private network using its own token, and the security check passed every case (no token refused, valid accepted with a real result, wrong token refused, wrong identity refused). Both code changes were merged. The AI broker stays parked; AI access goes through the direct client in the shared library plus each tester's own provider key. Next: the same shared model for the voice engine, then automatic wiring of each new tester at provision time, then the end-to-end demo where the assistant summarizes one of the tester's own books.

s178 (2026-05-29): Phase 1 was re-based onto the dedicated provider services the upstream team now ships, after syncing with the latest development showed the embedder, voice, and memory pieces had changed underneath the earlier plan. The design is unchanged in spirit (shared engines on the admin VM, each tester's own data and apps on their own VM); what changed is the building blocks. The embedding engine is now hero_embedder_provider (an OpenAI-compatible service) with hero_memory as the per-tester entry point that uses it; the voice engine is now hero_voice_provider, which runs once on the host; and each tester runs only the light clients (memory, books, the assistant, the voice widget) that reach the engines over the private network. Per-user security is now required from the start: every tester call to an engine carries that tester's own token and identity, and an engine refuses any call without a valid matching token, so no tester can act under another tester's identity. Combined with each tester's data staying on their own VM, isolation is airtight. The shared AI broker is deferred for now (its status is uncertain after recent changes); each tester's assistant uses its own provider key, and we validate the assistant by having it summarize one of the tester's own books. The full re-scope is in the comment below. No code or live infrastructure was changed this session; it was a planning and documentation realignment.

s177 (2026-05-29): Phase 1d (shared voice) is code-complete. The voice service now has an authenticated private-network listener, so one voice engine on the admin VM can serve many tester VMs for speech-to-text and text-to-speech, with a per-tester access token and tenant identity check, instead of every tester loading the large speech models locally (the same shared-hub pattern already proven for the embedding service). Testers that present no token, a wrong token, or a token under another tester's identity are refused; the listener will not even start if it cannot reach the secret store to validate tokens. The consumer side falls back to a local engine automatically when no shared endpoint is configured, so single-machine and self-hosted deployments are unaffected. The two voice hub binaries build and pass linting cleanly and the change is committed on a feature branch, but it is not merged yet: the voice repository as a whole currently does not compile because an unrelated, in-progress upstream refactor (moving the voice data API to async) is only half-landed, and the merge waits for that to finish. The work was isolated so the voice hub builds independently of that broken area. Remaining and deferred to the next session: wiring the deployer to hand each tester its voice token and endpoint at provision time, and the live end-to-end test on the sandbox grid (both gated on the upstream merge and the test VM). Phase 1b (welcome email) and 1c (shared AI broker) remain available to pick up independently if the voice merge stays blocked. No live infrastructure was changed this session.

s176 (2026-05-28): Phase 1a is now fully automated and live-proven end to end. The deployer wires a freshly provisioned tester VM to the shared embedding service automatically: when it provisions a tester it issues that tester a unique access token, registers the token on the admin VM, and configures the tester's assistant to use the shared embedder over the private network; when the tester VM is deleted the token is revoked. Verified on a throwaway tester created and torn down on the test grid: the four configuration values were set correctly, the assistant authenticated to the shared embedder and got a valid response, a wrong token and a missing token were both rejected, a token presented under a different tester's identity was rejected, and deletion removed the token. Shipped as one change in the deployer, merged. Next session: the welcome-email pipeline, or sharing the AI-provider broker the same way, reusing this pattern.

s175 (2026-05-28): Phase 1a hub-shared embedder is functionally proven end to end. A tester VM's assistant now uses the shared embedding service running on the admin VM over the private mycelium network, with per-tester token plus context authentication. Embeddings are computed by local models on the admin VM (no cloud embedding API), and a full document index plus semantic search works with correct ranking (a query about forgotten login credentials returns the password-reset document first). The assistant connects across VMs and answers using the tester's own AI provider key. Shipped in hero_embedder and hero_agent: the real embedding call now speaks the inference daemon's protocol, plus fixes for a build-dependency break and an environment-variable fallback bug that had been silently ignoring a pasted AI key. The embedder generation choice is recorded in a workspace-private decision. Next session automates the per-tester wiring in the deployer so a freshly provisioned tester VM comes up already connected to the shared embedder.
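Semantic search ranks documents by how close their embedding vectors are to the query's vector. This is a minimal sketch using cosine similarity, a common choice but an assumption here, since the log does not name the scoring function. The three-dimensional vectors are hand-made toy values, not output from the real embedding models.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query_vec: list[float], docs: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Return (doc_id, score) pairs, best match first."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in docs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy vectors only: a "forgot my login" query lands nearest the password-reset document.
docs = {
    "password-reset": [0.9, 0.1, 0.0],
    "billing": [0.1, 0.9, 0.1],
    "release-notes": [0.0, 0.2, 0.9],
}
print(rank([0.8, 0.2, 0.1], docs)[0][0])
```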

s173.5 (2026-05-28): Arc opened. home#238 (admin and tester UX) closed alongside the filing of this arc after the visible UX surface shipped across the admin path (deployer admin UI) and tester path (cockpit Services / Settings / Manual / About / Feedback pages, Bootstrap modals, dark-mode contrast, log_tail in install result modal, Manual completeness, Expose surface hidden in sandbox mode, connection-status dot fix in hero_admin_lib). About 25 commits across hero_cockpit, hero_os_tfgrid_deployer, hero_website_framework, hero_demo. D-36 minted (Resend supersedes SendGrid). hero_proxy#57 filed as a carried-independent item under this arc. Next session is s174 = Phase 1a welcome-email pipeline.

Context

home#238 (Phase 3: admin and tester UX) closes alongside the filing of this arc. That arc shipped the visible UX surface for both the admin path (the tfgrid_deployer admin UI: per-user VMs table, install state, provisioning) and the tester path (the cockpit: Services / Settings / Manual / About / Feedback pages, install-from-catalog flow, Bootstrap modals, dark-mode contrast).

This arc moves from "the UI is great" to "the sandbox actually works end-to-end as a tester would use it."

Scope of this arc

Hub-and-spoke architecture for sandbox (D-37)

A late-s173.5 architectural decision restructures Phase 1 around a hub-and-spoke model for resource-heavy stateless services. The admin VM runs the shared hub services: hero_embedder, hero_voice STT/TTS, and hero_aibroker. Tester VMs run only per-tenant spoke services and consume the hub services over mycelium IPv6 with per-tester bearer-token auth. Per-context isolation is enforced at the API layer through X-Hero-Context header injection.
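The hub-side check described above is a token lookup followed by a context comparison. The sketch below shows that check under stated assumptions. `TenantRegistry`, `AuthError` and the in-memory token → context map are hypothetical stand-ins for the real registry on the admin VM. The real hub may store tokens differently and should compare secrets in constant time, which a plain `HashMap` lookup does not do. A request is accepted only if its bearer token is known and was issued to the same context the request claims in `X-Hero-Context`.

```rust
use std::collections::HashMap;

/// Why a hub request was refused (hypothetical error set for this sketch).
#[derive(Debug, PartialEq)]
pub enum AuthError {
    MissingToken,
    MissingContext,
    UnknownToken,
    ContextMismatch,
}

/// Hypothetical in-memory view of the hub's tenant token registry:
/// bearer token -> the tester context it was issued to.
#[derive(Default)]
pub struct TenantRegistry {
    tokens: HashMap<String, String>,
}

impl TenantRegistry {
    pub fn new() -> Self {
        Self::default()
    }

    /// Register a token for a tester context (provision time).
    pub fn issue(&mut self, token: &str, context: &str) {
        self.tokens.insert(token.to_string(), context.to_string());
    }

    /// Forget a token (tester VM deleted).
    pub fn revoke(&mut self, token: &str) {
        self.tokens.remove(token);
    }

    /// Validate the raw `Authorization` and `X-Hero-Context` header values.
    /// On success, returns the authenticated tester context.
    pub fn authorize(
        &self,
        authorization: Option<&str>,
        hero_context: Option<&str>,
    ) -> Result<String, AuthError> {
        let token = authorization
            .and_then(|h| h.strip_prefix("Bearer "))
            .map(str::trim)
            .filter(|t| !t.is_empty())
            .ok_or(AuthError::MissingToken)?;
        let context = hero_context
            .map(str::trim)
            .filter(|c| !c.is_empty())
            .ok_or(AuthError::MissingContext)?;
        let owner = self.tokens.get(token).ok_or(AuthError::UnknownToken)?;
        // A valid token presented under another tester's identity is refused.
        if owner.as_str() != context {
            return Err(AuthError::ContextMismatch);
        }
        Ok(owner.clone())
    }
}
```

Binding the token to a context means the hub never trusts the header by itself. A tester who knows another tester's context name still cannot use it without that tester's token.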

Rationale: 10 testers × 6 GB embedder each = 60 GB of RAM holding ten identical copies of the same models. One shared 6 GB embedder serving all 10 testers saves 54 GB and is honest about sandbox economics. Sovereign-tier paid customers still get their own hub on their own admin VM (this is a sandbox-only decision that paid tiers do not inherit).
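The same arithmetic generalizes to any number of testers. The sketch assumes a fixed per-instance model footprint and ignores any extra per-request memory the shared hub needs under concurrent load.

```rust
/// RAM (GB) used when each of `testers` VMs runs its own embedder copy.
pub fn per_tester_ram_gb(testers: u32, model_gb: u32) -> u32 {
    testers * model_gb
}

/// RAM (GB) saved by one shared hub instance instead of one per tester.
pub fn hub_savings_gb(testers: u32, model_gb: u32) -> u32 {
    // One copy still runs on the admin VM, so the saving is (testers - 1) copies.
    testers.saturating_sub(1) * model_gb
}
```

With 10 testers and a 6 GB model this gives 60 GB for per-tester copies and 54 GB saved by the hub. The saving grows linearly with every tester added.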

Decision locked at decisions/D-37-hub-shared-services-for-sandbox.md (workspace-private).

Phase 1: onboarding plumbing

1a (P0). Hub-shared hero_embedder PoC. Validate the hub-and-spoke pattern end-to-end with the simplest hub-eligible service. Server side: bind the admin VM's hero_embedder_server HTTP listener to [admin_vm_mycelium_ipv6]:9988 in addition to whatever local UDS it has, and add bearer-token validation against an embedder/TENANT_TOKENS registry on the admin VM. Client side: every spoke service that calls hero_embedder (currently hero_indexer_server and hero_books_server) reads HERO_EMBEDDER_URL and SHARED_EMBEDDER_TOKEN from its cockpit/* hero_proc slots. If they are set, the service talks to the URL with the token; if they are empty, it falls back to the local embedder. Deployer side: handle_install_hero_stack sets the URL and token slots on the tester VM and skips installing hero_embedder on tester VMs (saving about 6 GB of RAM per tester). Verify end-to-end on alice123: confirm hero_embedder is not installed, confirm an indexing call from hero_books_server reaches the admin VM's embedder, and confirm round-trip latency is under 100 ms.
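Here is a sketch of the client-side choice and the hub listen address, under some assumptions. The `HashMap` stands in for however each service actually reads its cockpit/* hero_proc slots, and `EmbedderTarget`, `slot`, `select_embedder` and `hub_listen_addr` are hypothetical names. The sketch also assumes that a half-configured pair (a URL with no token, or the reverse) is treated as "not set". A real implementation might prefer to refuse to start so the misconfiguration is visible.

```rust
use std::collections::HashMap;
use std::net::{Ipv6Addr, SocketAddr, SocketAddrV6};

#[derive(Debug, PartialEq)]
pub enum EmbedderTarget {
    /// Talk to the hub over mycelium with a bearer token.
    Shared { url: String, token: String },
    /// No hub configured: use the embedder on this VM.
    Local,
}

/// Read one slot value, treating missing or blank values as unset.
fn slot<'a>(slots: &HashMap<&str, &'a str>, key: &str) -> Option<&'a str> {
    slots.get(key).copied().map(str::trim).filter(|v| !v.is_empty())
}

/// Pick the shared hub when both slots are set, else fall back to local.
pub fn select_embedder(slots: &HashMap<&str, &str>) -> EmbedderTarget {
    match (slot(slots, "HERO_EMBEDDER_URL"), slot(slots, "SHARED_EMBEDDER_TOKEN")) {
        (Some(url), Some(token)) => EmbedderTarget::Shared {
            url: url.to_string(),
            token: token.to_string(),
        },
        _ => EmbedderTarget::Local,
    }
}

/// Hub-side listen address: the admin VM's mycelium IPv6 on port 9988.
pub fn hub_listen_addr(mycelium_ip: Ipv6Addr) -> SocketAddr {
    SocketAddr::V6(SocketAddrV6::new(mycelium_ip, 9988, 0, 0))
}
```

For example, `hub_listen_addr("400::1".parse().unwrap())` formats as `[400::1]:9988`. Here `400::1` is only a placeholder for the admin VM's real mycelium address.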

1b. Welcome-email pipeline. Lift the EmailProvider trait and ResendProvider implementation from znzfreezone_backend/src/providers/email.rs (about 100 LOC, production-tested in the freezone workspace) into Hero. Port to ureq 3.x to match what shipped in hero_os_tfgrid_deployer/crates/hero_tfgrid_deployer_server/src/forge_oauth_admin.rs. Wire two send points: (a) deployer.create_user sends a welcome-and-how-to-register email to the operator-supplied address; (b) deployer.install_hero_stack sends a VM-is-ready email with the cockpit URL and initial password once install_state transitions to ready. The Resend API key lives in the hero_proc secret store at deployer/RESEND_API_KEY. Dev-mode fallback when the slot is empty: log to console and return dev-mode-no-send, matching the freezone reference pattern.
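This issue does not show the freezone trait's actual signature, so the sketch below is not that code. It is a minimal illustration of the dev-mode fallback shape the plan describes. `Email`, `SendOutcome`, `Transport` and their fields are hypothetical. The real HTTPS call to Resend would be made with ureq 3.x, as planned, and is hidden here behind a `Transport` seam so the fallback logic can be exercised without a network.

```rust
/// A message the deployer wants to send (fields assumed for this sketch).
pub struct Email {
    pub to: String,
    pub subject: String,
    pub body: String,
}

#[derive(Debug, PartialEq)]
pub enum SendOutcome {
    Sent { id: String },
    DevModeNoSend,
}

pub trait EmailProvider {
    fn send(&self, email: &Email) -> Result<SendOutcome, String>;
}

/// Hypothetical transport seam; the real port would do the HTTPS call here.
pub trait Transport {
    fn post_email(&self, api_key: &str, email: &Email) -> Result<String, String>;
}

pub struct ResendProvider<T: Transport> {
    /// Value read from the deployer/RESEND_API_KEY slot; blank means dev mode.
    api_key: String,
    transport: T,
}

impl<T: Transport> ResendProvider<T> {
    pub fn new(api_key: impl Into<String>, transport: T) -> Self {
        Self { api_key: api_key.into(), transport }
    }
}

impl<T: Transport> EmailProvider for ResendProvider<T> {
    fn send(&self, email: &Email) -> Result<SendOutcome, String> {
        if self.api_key.trim().is_empty() {
            // Dev-mode fallback: log the message to the console instead of sending.
            println!("[email dev-mode] to={} subject={}\n{}", email.to, email.subject, email.body);
            return Ok(SendOutcome::DevModeNoSend);
        }
        let id = self.transport.post_email(&self.api_key, email)?;
        Ok(SendOutcome::Sent { id })
    }
}
```

Both send points (create_user and the install_state=ready transition) would call the same `send`. An empty slot therefore degrades to console logging instead of failing the provisioning flow.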

Prerequisite for 1b: an operator-owned domain verified with Resend (SPF and DKIM records). Default sender: noreply@hero.ourworld.tf. The workspace already controls ourworld.tf (Forge runs on forge.ourworld.tf), and the hero. subdomain keeps Hero emails visually distinct from Forge platform emails. The default noreply address is send-only: there is no MX record for its domain, so replies bounce naturally, and Resend handles only outbound mail. The operator can substitute another sender at Phase 1b start (noreply@ourworld.tf or noreply@forge.ourworld.tf both work technically).

1c. Hub-shared hero_aibroker. Tester pastes an AI provider key (Anthropic, OpenAI, OpenRouter, Groq, etc.) into the cockpit Settings page. The save handler propagates the key to the hub aibroker's per-tenant key registry at aibroker/TENANT_<context>_<provider>_KEY on the admin VM. Dependent services on the tester VM (hero_books_server, hero_agent_server) start once at install time and stay running; they call HERO_AIBROKER_URL for inference and the hub looks up the calling tester's key from the registry. This collapses the originally-scoped BYO-key auto-start cascade entirely: no service-start dependency chain to manage, no service restart on key paste, the AI calls just succeed once the registry is populated.
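Here is a sketch of the hub-side key registry, following the aibroker/TENANT_<context>_<provider>_KEY layout above. Upper-casing the provider name is an assumption, and so is the in-memory `KeyRegistry`, which stands in for the hero_proc secret store on the admin VM. The context passed to `key_for` would be the authenticated caller's context, never a value taken from the request body.

```rust
use std::collections::HashMap;

/// Registry slot name for a tester's provider key (provider upper-cased by assumption).
pub fn tenant_key_slot(context: &str, provider: &str) -> String {
    format!("aibroker/TENANT_{}_{}_KEY", context, provider.to_ascii_uppercase())
}

/// Hypothetical stand-in for the hub's per-tenant key registry.
#[derive(Default)]
pub struct KeyRegistry {
    slots: HashMap<String, String>,
}

impl KeyRegistry {
    /// What the cockpit Settings save handler would propagate to the hub.
    pub fn set_key(&mut self, context: &str, provider: &str, key: &str) {
        self.slots.insert(tenant_key_slot(context, provider), key.to_string());
    }

    /// Look up the calling tester's key; a missing key is a clear error, not a restart.
    pub fn key_for(&self, context: &str, provider: &str) -> Result<&str, String> {
        let slot = tenant_key_slot(context, provider);
        self.slots
            .get(&slot)
            .map(String::as_str)
            .ok_or_else(|| format!("no {provider} key registered for {context} ({slot})"))
    }
}
```

Tester services never hold the key themselves in this shape. They call the broker, and the broker resolves the key per request, which is why pasting a key needs no service restart.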

1d. Hub-shared hero_voice STT/TTS. Same pattern as 1a but for the voice models. Lower priority than 1a/1c because voice is not on the critical path for most Phase 2 app walks; can slot later in Phase 1 or even after Phase 2 if needed.

Phase 2: per-app functional verification walks

Walk every catalog service end-to-end on the live tester VM (currently alice123) as a real tester would. One pass / fail outcome per app, in dependency order so infrastructure issues surface first:

| # | Service | Verification gate |
|---|---|---|
| 1 | `hero_db_server` | Writes accepted, reads return what was written |
| 2 | `hero_embedder` + `hero_indexer` | Document, embedding, searchable index round-trip |
| 3 | `hero_aibroker_server` (with BYO key set) | Provider connectivity, completion request returns text |
| 4 | `hero_agent` | Natural-language query lands on an answer grounded in available context |
| 5 | `hero_books` | Library renders, document indexes, ask-a-question returns grounded answer |
| 6 | `hero_slides` | Deck renders from markdown source, illustration regen returns new image |
| 7 | `hero_whiteboard` | Two-browser-tab live edit sync |
| 8 | `hero_planner` / `hero_biz` | Typed-record creation, persistence across restart, query-back |
| 9 | `hero_collab` | OnlyOffice editor loads, .docx round-trip edits |
| 10 | `hero_voice` | Mic input, transcript text |
| 11 | `hero_office` / `hero_foundry` / `hero_archipelagos` | Extended-catalog smoke tests (lower priority) |

Output: one Forge issue per service that fails, with the failure mode and reproduction steps captured live during the walk.
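Here is a sketch of turning the walk results into Forge issue drafts. `Outcome`, `WalkResult` and `IssueDraft` are hypothetical types, and the body layout is just one possible template. Results are sorted by walk order, so issues for infrastructure services (which the walk reaches first) come out first, and passing services produce nothing.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Outcome {
    Pass,
    Fail { failure_mode: String, repro_steps: Vec<String> },
}

pub struct WalkResult {
    /// Position in the dependency-ordered walk (the table's # column).
    pub order: u32,
    pub service: &'static str,
    pub outcome: Outcome,
}

pub struct IssueDraft {
    pub title: String,
    pub body: String,
}

/// One issue draft per failing service, in walk order.
pub fn issues_to_file(results: &[WalkResult]) -> Vec<IssueDraft> {
    let mut ordered: Vec<&WalkResult> = results.iter().collect();
    ordered.sort_by_key(|r| r.order);
    ordered
        .into_iter()
        .filter_map(|r| match &r.outcome {
            Outcome::Pass => None,
            Outcome::Fail { failure_mode, repro_steps } => {
                // Number the reproduction steps captured during the walk.
                let steps: Vec<String> = repro_steps
                    .iter()
                    .enumerate()
                    .map(|(i, s)| format!("{}. {}", i + 1, s))
                    .collect();
                Some(IssueDraft {
                    title: format!("{}: {}", r.service, failure_mode),
                    body: format!("Failure mode: {}\n\nReproduction:\n{}", failure_mode, steps.join("\n")),
                })
            }
        })
        .collect()
}
```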

Phase 3: close the gaps

Fix the per-app issues filed during Phase 2. Each app's issue gets its own ship cycle; some will be quick config drift, some will be deep product gaps in the app itself.

Carried-independent items (slot anywhere, not blocking arc closure)

hero_proxy#57: Bootstrap-styled SSO error page. About 2 to 3 hours of work. Visible only on auth failures so it can slip late.
hero_os_tfgrid_deployer#17: typed InstallManifest Rust crate that replaces hero_demo/deploy/single-vm/scripts/setup-binaries.sh. About 1 to 2 days of work. Best done after Phase 2 because the app walks may surface install-runner gaps that change the typed manifest shape.

Email-provider decision logged alongside this arc

The previously locked email provider choice (SendGrid via the EmailSender trait) is superseded by Resend via the EmailProvider trait, locked in a new workspace decision file. Rationale: the operator already has Resend credentials available, the freezone workspace has a production-tested reference implementation that can be ported, and that implementation's dev-mode fallback (log to console when no API key is configured) is friendlier for testers who don't have email credentials yet.

Definition of arc closure

This arc closes when:

A new tester provisioned through the admin VM receives a welcome email at create_user time and a VM-is-ready email at install_state=ready time.
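A tester who pastes an AI provider key into the cockpit Settings page sees the dependent services (hero_aibroker_server, hero_books_server, hero_agent_server) serve AI calls successfully with visible progress (under D-37 these services are already running from install time and pick up the key through the hub registry, without an auto-start cascade), and can then click Install on hero_books_admin and have it succeed end-to-end.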
Every catalog service has been walked through its Phase 2 verification gate, and every failure mode has either been fixed or filed as its own follow-up issue with a clear scope.


[META] Hero OS sandbox demo, functional readiness: onboarding pipeline + per-app verification #239