[META] Hero OS sandbox demo, functional readiness: onboarding pipeline + per-app verification #239
lhumina_code/home#239
Current state (session 222, 2026-06-08): We shipped and proved the two quick reliability fixes for the shared host. First, a tester's web address is no longer just their username, so re-adding a tester never collides with a leftover network-name registration from a failed attempt; each new machine gets an address unique to that machine, and an operator can also type a custom address. Existing testers keep their current address. Second, the dashboard's free-capacity readout now reports how many testers actually fit based on real disk space rather than a raw slot count, so it no longer claims room that is not there. We found the current host was in fact full (it was wrongly showing room for two more), and the dashboard now correctly says it is full. Both fixes are on the stable branch and live on the admin machine, including updating the underlying capacity component there (which had been deliberately held back) after backing it up for instant rollback. We proved the whole flow by removing one test machine and rebuilding it from scratch: the freed space showed up, the rebuild got a fresh unique address, finished setup, and its login page worked. We also re-planned the order with the operator: close the internal sign-in gap on the private network next, then add the ability to run across more than one host (the unlock now that this host is full), and only then keep a small pool of ready machines (which needs spare room to prepare them). Next: gate the dashboard on the private network the same as the public address.
Current state (session 221, 2026-06-08): We fixed the bug that stopped every tester from logging in, and a tester can now sign in end to end for the first time. The login was failing at the final identity check for three stacked reasons, which we found by adding precise logging and capturing the real error rather than guessing. First, the login token's audience field arrives as a list but our code expected a single value and rejected it before any check. Second, the login token from our identity provider carries only the bare account id, not the username or email (those come from a separate profile lookup we were not making), so an account resolved to a numeric id that never matched the allow list and everyone was refused. Third, a restart could drop the setting that asks for the login token at all. We fixed all three, ran an independent security review that confirmed nothing was weakened, and proved it with a real scripted sign in that returned successfully with the correct username. The fix is on the stable branch and republished, and both login issues are now closed. We also closed a gap where new testers received only one of the two assistant keys: the setup only copies keys that are configured on the admin machine, and only the first was set, so we added the second on the admin machine (verified working), put it on all four existing testers, and updated the runbook so both must be set at admin setup and every future tester gets both. We rebuilt despiegk's machine cleanly with the login fix and both keys, leaving his account untouched. His machine and login gate are correct and reachable from inside the grid, but his public web link is still down on a grid network routing problem that is not our software (rebuilding the gateway did not help, and the machine is reachable from other machines), so we wrote it up for the ThreeFold team and left his machine running for them to inspect. 
We also re-planned the roadmap with the operator: make the single shared host rock solid first with the quick high value fixes, then verify each app, with running across more than one host added as the explicit next scaling step since the current host is nearly full. Next: the quick reliability fixes (unique gateway names and an honest capacity count), then a small pool of ready machines, then close the internal sign in gap and verify each app.
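For the first of the three login causes in session 221, a hedged sketch of an audience check that accepts both shapes: a login token's audience may arrive as one string or as a list of strings. The function name and the claims dictionary are hypothetical; this is not the gateway's actual code.

```python
from typing import Any


def audience_matches(claims: dict[str, Any], expected: str) -> bool:
    """Accept an `aud` claim that is either one string or a list of strings."""
    aud = claims.get("aud")
    if isinstance(aud, str):
        audiences = [aud]
    elif isinstance(aud, list):
        # Ignore any non-string entries rather than failing on them.
        audiences = [a for a in aud if isinstance(a, str)]
    else:
        # Missing or malformed audience: nothing can match.
        audiences = []
    return expected in audiences
```

The check still refuses a token whose audience is missing or does not include the expected value, so accepting the list form does not loosen it.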
Current state (session 220, 2026-06-08): We rebuilt despiegk's test machine and fixed the bug that was stopping every new tester from logging in. The login bug: the gateway's security library needed a one line setup it was missing, so it crashed while verifying the login token and showed users an invalid session error; we fixed it, deployed it to a test machine, and confirmed the crash is gone. We also shipped three more fixes to the stable branch: the setup program now computes a machine's web address itself when the grid component fails to report it (so signups recover automatically), the build server is green again (it could not fetch a shared library that had been moved to a new location), and a shared code generator no longer clashes with itself during parallel builds. We proved the signup flow end to end by setting up a brand new test user, which came fully live in about nine minutes. despiegk's machine, services, and login gate are up, but its public web link is still waiting on a one off slow grid network route (not system wide; the new test user routed fine). We found and filed three more items: the dashboard can be reached without signing in over the internal grid network and should be gated like the public address, a capacity readout that overcounts free space, and a plan to make gateway names unique so a failed signup never blocks a retry. With the login crash fixed, signup now gets all the way to the final identity check and fails one step later, which is the first task next session (the gateway most likely was not told where to fetch the login server's verification key). Next: finish that last login step, give despiegk a fresh machine, then close the dashboard sign in gap, the capacity readout, and per app verification.
Current state (session 219, 2026-06-07): We fixed the urgent problem that was breaking every new tester setup, and proved a brand new tester now sets up cleanly entirely from the stable branch. The root cause was that our services were publishing their stable download from the work-in-progress branch instead of the stable branch, so a fresh machine pulled half-finished builds that no longer understood each other. We corrected the publishing rule (the stable branch publishes the stable release, the work branch publishes a separate pre-release) on the core build tool and two core services, which was enough to unblock setup, and we published the previously held setup-reliability fix to the stable branch and confirmed it on a real fresh machine. The voice service was the hardest part: its stable branch had been broken by a mistaken automated merge of the work branch into it, and with the owner's confirmation we undid that merge while keeping the recent voice work, then fixed a missing build credential so it can rebuild. We proved the whole thing end to end by setting up a throwaway tester entirely from stable builds: it reached ready, its login page worked, and the voice service answered. All eleven services the sandbox uses now build and publish from the stable branch. The four real test machines were left untouched, the throwaway was removed cleanly, and we confirmed no work-in-progress was pushed into any stable branch. Next: rebuild despiegk's machine now that setup works again (clear the leftover network-name registration, then re-create it), then check that each app actually works inside the cockpit after signing in.
Current state (session 218, 2026-06-07): The operator hit a real failure: adding the tester despiegk failed at the install step and their page showed an error. We traced it. The login gate on a brand new machine was not being set up, because the step that configures it reads a stored value that, on a fresh machine, comes back empty even though it is actually present (a version mismatch between two pieces of the tooling, not a mistake we introduced, which we confirmed by checking the history). We built a fix in the setup program that (1) waits for a brand new machine to become reachable on the private network instead of giving up after a few seconds, (2) sets the login gate up directly and reliably instead of depending on that racy step, and (3) retries the few service starts that can lose a start-up race, and we lengthened the overall setup time limit to cover the wait. The fix is written, fully checked by automated tests, and installed on the admin machine, but we are holding it back from the shared branch until we can prove it works on a real fresh setup. We could not prove it this session because two separate infrastructure problems got in the way. First, removing despiegk's old machine left a leftover network-name registration on the chain that blocks rebuilding his machine under the same name. Second, and more urgent, a freshly published version of the core tooling can no longer start the services on a brand new machine at all, which is breaking every new setup right now regardless of our fix; we filed that as an urgent issue for the build owners. We also filed a longer-term idea to keep a few machines pre-built and ready so the slow and flaky parts happen ahead of time, and a request for an in-dashboard view of the setup log so failures are visible without logging in to the server. The four real test machines were left untouched and working. 
Next: get the urgent tooling problem fixed, then prove our setup fix on a fresh machine and merge it, then rebuild despiegk's machine.
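The first part of the session 218 setup fix (wait for a new machine to become reachable instead of giving up after a few seconds) can be sketched as a poll with an overall deadline. The helper name, the injected `sleep` and `clock` parameters, and the default interval are assumptions made so the sketch is testable; the real setup program may be structured differently.

```python
import time
from typing import Callable


def wait_until(
    check: Callable[[], bool],
    timeout_s: float,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `check` until it returns True or `timeout_s` has elapsed."""
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            # Give up only after the full wait, not after the first failure.
            return False
        sleep(interval_s)
```

A caller would pass a reachability probe as `check` and a timeout of several minutes, since session 217 measured that a new machine can take that long to appear on the private network.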
Current state (session 217, 2026-06-07): This session was an investigation with no product code changed. We set out to find why setting up a brand new test machine is slow, and to confirm or rule out the assumption that downloading the programs is the main cost. It is not: downloading the entire set of programs takes only about half a minute of a roughly six minute setup, so the planned speed-up of copying programs from the main machine instead of downloading them would save only a minute or two and would add real complexity and upkeep. Most of the time goes elsewhere, on installing the underlying system packages and on a step by step, one program at a time install that spends more bookkeeping per program than the actual download. We also surfaced two problems that matter more than raw speed. First, a brand new machine is not reachable on the private network for several minutes after it is created, and the setup currently gives up after about a minute instead of waiting for it, which can make a fresh setup fail outright. Second, the free capacity readout can over report room because it counts machine slots without checking disk space. Cheaper speed wins are available with no new moving parts, mainly installing the programs in parallel and shortening some fixed waits. We wrote up the full measurements and three clear options with a recommendation, and parked the build choice for the operator, since the original plan's main assumption turned out to be wrong. The recommendation is to fix the reachability and pre-load the common system packages first, do the cheap parallel-install speed up next, and treat copying programs from the main machine as optional polish. Next: the operator picks the approach, then we build it; the request to also choose which version to deploy is workable but needs a few groundwork fixes first.
Current state (session 216, 2026-06-07): We put the assistant and voice onto both the tester screens and the operator admin screen, and rolled it out to all four test machines plus the admin machine. We reused the assistant widget a teammate already built, which has voice built in and can use tools, and added it to the top of every cockpit and the admin dashboard. It runs on the assistant key the operator already gives each tester, so a tester needs no extra setup, and we listed the assistant on the Apps page. We also made sure brand new machines pick it up automatically by fixing how the assistant component is published. One real limit surfaced and is worth flagging: the no-setup assistant can chat and listen but cannot yet take actions such as adding a tester by voice, because taking actions needs a capability the assistant engine does not offer on the no-setup option yet. We asked the component owner what it would take to add that, and the chat and voice version is live now. On the operator review of the work so far we also removed a confusing and broken field on the add a tester form (a machine slot that only ever had one valid value), and we re-ordered the plan from that feedback: the next big win is making a fresh install much faster and letting the operator choose which branch to deploy, with the library showcase and running across more machines coming after. Next: the faster install and branch choice.
Current state (session 215, 2026-06-07): We finished the dashboard controls for access and keys, so the administrator panel now covers that whole area. First, the team's support sign-in keys (the SSH keys we put on every tester machine so we can help and debug) can now be viewed, added, and removed straight from the Settings page instead of being set by hand on the server, and a change applies to the next machine with no restart. Second, each tester's page now has an Access and keys panel: the administrator can choose which extra accounts may sign in to that tester's screen (the administrators and the tester themselves are always allowed, so an edit can never lock anyone out), and can optionally give a single technical tester command line access by pasting their key (off by default, since a normal tester only uses the browser). Both settings are stored against the person, so they survive rebuilding the machine, and they take effect the next time the machine is set up; in this testing phase we simply rebuild a machine to apply them, which keeps things simple and avoids touching a running machine. We also corrected our own notes: setting the email service key from the dashboard was already done in the previous two sessions, so this session completed the remaining access and key items instead. Everything was tested, shipped to the stable branch, then installed and checked on the live admin machine, including proving that editing the access list can never remove the always allowed accounts, with the real test machines left untouched. Next: put the assistant and voice on both the tester screens and the admin screen, so a person can simply ask it to do things, with a clear confirm before anything that changes something or costs money.
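The "an edit can never lock anyone out" property from session 215 can be illustrated by computing the effective list as a union, so the always-allowed accounts are added back no matter what the edit contains. The function and parameter names are hypothetical.

```python
from typing import Iterable


def effective_allow_list(
    admins: Iterable[str], tester: str, extra_accounts: Iterable[str]
) -> list[str]:
    """Return who may sign in: admins and the tester always, plus any extras."""
    always_allowed = set(admins) | {tester}
    # The union means an edit can add accounts but can never drop the
    # always-allowed ones, even if the submitted list is empty.
    return sorted(always_allowed | set(extra_accounts))
```

Because the stored setting only holds the extra accounts, an empty or mistaken edit still leaves the administrators and the tester able to sign in.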
Current state (session 214, 2026-06-06): We made the admin dashboard able to manage the assistant keys and the welcome email properly, and we made it easy to find. An administrator can now store a starting key for any of the supported assistant providers, and when adding a tester choose which of those keys to put on that person's machine, or none at all for someone who will bring their own. The welcome email can be turned on or off, both for a single tester and for the whole machine, and its wording (the subject, opening line, and sign off) can be edited from the dashboard, with the sign in details always added automatically so it can never be left broken; an administrator can also send themselves a test copy first. We also fixed a real usability gap: the place to do all this had no menu link and sat buried on the home page, so the dashboard now has a proper top menu (Overview, Users, Settings, Manual), the setup moved onto its own Settings page, the per tester key choices grey out any provider that has no key set with a link to go set it, and there is a Manual page explaining how to use the whole dashboard. All of this is live on the test admin machine. We also planned the next big step and confirmed it is mostly assembly rather than new building: putting the assistant and voice on both the tester screens and the admin screen, so a person can simply ask it to do things, with a clear confirm before it carries out anything that changes something or costs money. We wrote that up along with two smaller follow ups (dashboard polish, and managing the assistant key centrally). Next: start the assistant on the admin screen, with the ask before acting safety.
Current state (session 212, 2026-06-06): We turned the three separate steps for adding a tester into a single action. The dashboard now has one Add and set up button that creates or registers the person, builds their machine, and installs everything in one go, showing progress through each stage (adding, provisioning, installing, ready) and finishing with a ready to copy sign in link; a brand new account's one time password is shown right there in the flow. The old separate buttons stay on the person's page, so a step that fails can be retried on its own and you can still register someone now and build their machine later. We also made the system aware of how full the shared host is: before you add anyone, the form now shows how many more testers will fit on the current host, and if the host is full or offline it refuses to start rather than half building a machine and leaving a stuck contract behind. We proved the whole thing end to end on the live system by adding a throwaway test tester through the new button, watching it go all the way to ready (its address correctly required signing in and the welcome email was sent), seeing the free capacity count drop and then return as we removed it, and tearing it down cleanly, leaving the three real test machines untouched. Next: let the email service key be set from the dashboard, then dashboard controls for support access and keys.
Current state (session 211, 2026-06-06): We built and shipped the dashboard feature that answers the recurring question of which version each tester machine is running and whether a newer one is available. Whenever a machine is set up, the system now records the exact version of every app and service it installed, and the admin dashboard shows that per machine. A new Check for updates button compares what a machine is running against the latest published versions and lists which apps have a newer build waiting. Updating a machine is the existing reinstall, which we made always pull the newest versions instead of skipping ones already present. The operator chose to track every installed component, arranged so that future apps such as the books and memory features are picked up automatically once they join the standard set. We proved the whole flow end to end on a brand new throwaway test machine built from the stable branch: setting it up recorded its versions, the dashboard correctly reported it as up to date immediately after install, and we then removed it cleanly, leaving the three real test machines untouched. Existing machines, which predate this feature, simply show their version as unknown until their next setup. Next: a single add and set up action so onboarding a tester is one click, then letting the email service key be set from the dashboard.
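A rough sketch of the comparison behind the Check for updates button in session 211, assuming versions are recorded as plain strings keyed by component name. It flags any difference from the latest published version rather than ordering versions, which is a simplification; the names are illustrative only.

```python
def find_updates(
    installed: dict[str, str], latest: dict[str, str]
) -> dict[str, tuple[str, str]]:
    """Map each installed component with a different published version
    to a (running, published) pair."""
    updates: dict[str, tuple[str, str]] = {}
    for name, running in installed.items():
        published = latest.get(name)
        # Components with no published version are skipped, not flagged.
        if published is not None and published != running:
            updates[name] = (running, published)
    return updates
```

An empty result corresponds to the dashboard reporting a machine as up to date; a machine with no recorded versions would need separate handling to show as unknown.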
Current state (session 210, 2026-06-05): We did a full end to end check of the welcome email on a brand new test machine, with no person checking an inbox, and everything worked. We created a fresh test account and machine from the stable branch and set it up, then confirmed from the machine itself that the welcome email really was sent through the email service (a genuine message id came back, not a no-send placeholder), that the one time password was stored when the account was created and then erased the moment the email went out, that the email's wording carries the username to sign in as, that password, and a link straight to the dashboard, and that the brand new machine's entire web address requires signing in on its own (every part redirects to the login, while only the health check stays open). We then removed the throwaway test machine and its account cleanly, leaving the three real test machines untouched and still protected. This proves the welcome email and the whole address sign in protection both reproduce on a brand new machine straight from the stable branch, with no manual step. We then looked into the next item, showing in the dashboard which build each machine is running together with a reliable one click update, and found it needs a decision first: there is no clean record today of which version a machine is running, so rather than guess we wrote up the options with a recommendation and parked it for the operator to choose which set of components counts as the build. No product code changed this session. Next: make that build visibility decision and build it, then allow setting the email service key from the dashboard.
Current state (session 209, 2026-06-05): We closed the security gap flagged last session: a person who is not signed in can no longer load any part of a tester machine's web address. The whole address now requires signing in (it redirects to the Forge login), and only the health check and the login callback stay open; before, the bare address and a couple of internal status pages were reachable without signing in, which exposed the machine's internal service list. We proved this live on the admin machine and on all three running test machines. We also improved the welcome email so it now states the exact username to sign in as, plus the one time password for a brand new account (or tells an existing person to use their own password), and links straight to the dashboard app rather than the bare address. And any account name now works for onboarding, including names with capitals or dashes, because the web address is formed safely from the name. All of this is real code on the main branch that publishes automatically, so a brand new machine reproduces it on its own; the only manual step is providing the email service key, which we have now written into the setup guide along with the new whole address sign in behaviour. Two items remain for next time: a full end to end check of the welcome email on a freshly created test machine, and adding dashboard visibility for which build each machine is running together with a reliable one click update, so updating a machine is no longer done by hand.
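One way to form a web address safely from an account name with capitals or dashes, as session 209 describes, is to reduce the name to a lowercase label of letters, digits, and single dashes. This is an illustrative sketch; the exact rules the grid gateway accepts are not stated in these notes, so the character set here is an assumption (the 63-character cap is the standard DNS label limit).

```python
import re


def address_label(account_name: str) -> str:
    """Turn an account name into a lowercase DNS-style label."""
    label = account_name.lower()
    # Replace every run of characters outside a-z, 0-9 and '-' with one dash.
    label = re.sub(r"[^a-z0-9-]+", "-", label)
    # Collapse repeated dashes and trim dashes from both ends.
    label = re.sub(r"-{2,}", "-", label).strip("-")
    # Keep within the 63-character DNS label limit, then re-trim.
    label = label[:63].strip("-")
    if not label:
        raise ValueError("account name has no usable characters")
    return label
```

Two different account names can reduce to the same label under a scheme like this, which is one reason a per-machine unique address is the more robust choice.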
Current state (session 208, 2026-06-05): We built and shipped the welcome email and proved it works end to end on the live system. When a new tester's machine finishes setting itself up, the system now automatically emails that person their personal web address and how to sign in, sent from a Hero address on a new project domain. We tested it for real: a fresh test machine was created and set up, and the moment it was ready the email arrived in a normal Gmail inbox, not spam. To do this properly we settled the naming (the company is Lhumina and Hero is the product), bought a company domain, and verified a dedicated sending sub-address with the email service (and asked the separate freezone project to do the same for its own domain). While testing we found two things to fix next session. First and most important: a person who is not signed in can still load the bare web address of a tester's machine and see a blank landing page, even though the actual dashboard correctly requires signing in. The entire machine address should require signing in, showing nothing to a logged out visitor, so closing that gap is the next priority. Second, the email should always say which username to sign in as and include the one time password for a brand new account (or tell an existing person to use their own password), and link straight to the dashboard instead of the blank page. We also cleaned up several leftover test machines and stuck grid contracts by hand, a cleanup that keeps hitting a known grid teardown problem already flagged for the grid maintainer. Next: lock the whole tester address behind sign in, then refine the welcome email's wording and link.
Current state (session 207, 2026-06-05): We built and shipped the simpler way to add a tester, and it is now live for newly created machines. A new tester no longer needs an SSH key at all: a normal tester only ever uses the dashboard apps in a browser and never opens a terminal, so we stopped asking for that technical credential, and a machine now gets only our own setup key plus the team's support keys. We can also register someone who already has an account instead of only ever creating a brand new one, so a colleague with an existing account just gets a machine and signs in with what they already have. We added a friendlier page for when someone signs in but has not been granted access, telling them which account they used and to ask the administrator, instead of a bare error. While building this we re-checked the security claim from last session and corrected it honestly: because the assistant on a tester machine can run commands, a determined tester could in principle read the shared AI keys we preload, so removing terminal access reduces but does not fully remove that exposure. The operator accepted this for now, since those keys are limited and are rotated after each demo, and we wrote it down for a proper later fix. Everything was tested and merged, the released builds refreshed successfully, and existing running machines were unaffected. Next: the welcome email that sends a new tester their address and first steps.
Current state (session 206, 2026-06-05): This session was planning plus a quick live fix, with no new product code shipped. We agreed how to make adding a tester much simpler for ordinary people. A tester will no longer need an SSH key, which is a technical credential most people do not have and do not understand, because that key is only for opening a terminal on the machine, something a normal tester never does, while signing in to the dashboard uses their ordinary account. We will also be able to reuse an existing account instead of only ever creating a new one, so a colleague who already has an account just gets a machine and signs in with what they already have. We confirmed this is safe: the sign-in only reads a person's basic profile, never their private projects, and because a tester has no terminal access they can never read the AI keys we preload for them. We wrote this up as the plan for the next session (#247) and pointed it at the exact parts of the code to change. Separately, we brought one existing test machine fully up to match the main demo machine: its assistant now uses the same fast model with live web search and working keys, and it can drive the planner, slides and whiteboard apps and use voice, just like the main one. We verified the assistant key works, the page is reachable behind sign-in, all three apps respond, the assistant program is the same fast build as the main machine, and voice is running; the only thing left unmatched is a cosmetic settings row, deliberately skipped. Next: build the simpler onboarding (no SSH key, reuse existing accounts, and a friendlier access-denied message), then the welcome email.
Current state (session 205, 2026-06-05): We made shared whiteboard links work for anyone, so a tester can send a link and an outside person can open and collaborate on that one board without needing an account, the same way a shared document works. We traced why it was not working: the whiteboard's sharing feature itself was fine, but the machine's sign-in wall was blocking every outside visitor, and it had been blocking them since before testers were ever put behind that wall, so shared links had only ever worked in local testing, never on a deployed machine. The fix lets a shared board link past the sign-in wall as an anonymous visitor, while the whiteboard itself re-checks the link's secret on every action and only ever allows that one board, read-only for a view link or editable for an edit link, and refuses everything else. We proved all of this on the live test machine from a signed-out browser: opening the link works, reading and editing the shared board works, live collaboration works, and trying to reach a different board, write with a view-only link, or use the page without the link are all correctly refused, while the rest of the machine still requires sign-in. We also added a box in the tester settings where each tester can paste their own assistant subscription key, and fixed a separate problem where a person's sign-in could fail if the gateway was restarted while they were partway through signing in. Two cleanups remain: rotate a tester access token and the shared assistant key, and confirm a brand new machine picks up the sharing setting automatically. Next: the welcome email that sends a new tester their address and first steps.
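The per-action check described in session 205 (re-check the link's secret every time, allow only that one board, read-only for a view link) might look roughly like the following. The `ShareLink` type, its fields, and the action names are hypothetical, not the whiteboard's real interface.

```python
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class ShareLink:
    board_id: str   # the single board this link grants access to
    secret: str     # secret embedded in the shared link
    can_edit: bool  # True for an edit link, False for a view link


def is_allowed(link: ShareLink, presented_secret: str, board_id: str, action: str) -> bool:
    """Decide one anonymous request against one share link."""
    # Constant-time comparison so the check does not leak the secret by timing.
    if not hmac.compare_digest(link.secret.encode(), presented_secret.encode()):
        return False
    if board_id != link.board_id:
        return False  # a link never opens any other board
    if action == "read":
        return True
    if action == "write":
        return link.can_edit
    return False  # anything else is refused
```

Each refused case in the live test above (a different board, writing with a view link, no link at all) corresponds to one of the early returns.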
Current state (session 204, 2026-06-04): We finished the planned polish of the demo assistant on the live test machine, and a person checked all three user-facing results by hand. The assistant is now faster on its first action, and it can answer questions using live web search now that it points at the subscription search service, with the key kept off the disk. The voice feature works end to end: you can speak to the planning assistant and hear it answer back. We removed a leftover test item from the planner, and added a small fix to the admin screens so clicking a tester action (provision, install, reinstall, destroy or delete) now shows a spinner and a label instead of looking frozen. A person confirmed all three results in the browser and by microphone: web search, voice and the spinner all work. A few small follow-ups remain for next time: rotate the shared search key after the demo, save the assistant's web-search settings into the tester setup tool so a freshly built machine gets them automatically, and refresh a couple of background release builds so machines built later match the live one. Next: let people open shared board links without hitting the sign-in wall, and let each tester add their own assistant key.
Current state (session 203, 2026-06-04): While preparing for the demo we found that the first tester created by hand through the admin screens (add, provision, install) was reachable on the internet with its login protection turned off, while the other testers correctly required sign-in. We traced the cause: when a new tester web address is set up, the grid can report the step as failed even though the address was actually created, and the tester creation tool then treated the tester as having no address and skipped setting up its sign-in protection, so the install finished without it. We fixed the tool so creating a tester now repairs a missing address and sign-in protection automatically and, most importantly, refuses to finish unless sign-in protection is in place, so a tester can never again be published without it. The admin screen also gained a one-click button to set up a missing address. We deployed this to the admin machine, repaired the exposed tester in place without deleting it so it now requires sign-in like the others, and created a brand new throwaway tester from scratch to confirm the whole flow end to end (create, set up, install, sign-in required, then delete) before removing it. All three real testers now require sign-in, each with its own separate login credentials. We also logged the underlying grid behaviour for the maintainers to fix at the source, and noted a later idea to add live video meetings. Next: resume the planned assistant polish (search, speed, voice).
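The session 203 fix has two parts: repair what is missing, then refuse to finish if sign-in protection is still absent. A minimal sketch of that shape, with hypothetical names and with the repair steps passed in as callables standing in for the real grid and gateway calls:

```python
from dataclasses import dataclass
from typing import Callable, Optional


class SetupError(RuntimeError):
    """Raised when a tester would otherwise be published unprotected."""


@dataclass
class Tester:
    name: str
    address: Optional[str] = None
    sign_in_protected: bool = False


def finish_setup(
    tester: Tester,
    repair_address: Callable[[Tester], Optional[str]],
    enable_protection: Callable[[Tester], bool],
) -> Tester:
    """Repair a missing address or protection, then fail closed."""
    if tester.address is None:
        tester.address = repair_address(tester)
    if not tester.sign_in_protected:
        tester.sign_in_protected = enable_protection(tester)
    # Final gate: never report success without both in place.
    if tester.address is None or not tester.sign_in_protected:
        raise SetupError(f"{tester.name}: sign-in protection is not in place")
    return tester
```

The final check is deliberately independent of whether the earlier steps reported success, since the original bug came from trusting a misleading failure report from the grid.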
Current state (session 201, 2026-06-04): We proved the AI assistant can actually operate the demo apps on a live deployed machine, and we supported a live demo. On the demo machine the assistant, running on the newest Kimi model, read real content from the planner, the whiteboard and the slides app through its tool connection, and it created a planning item in the planner, so it both reads and writes across all three apps. During the live demo the assistant was not answering because its model key held a bad value, so we set a working key, moved it to the newest model and added the slides app to its tool set, after which it worked. We also confirmed the two release builds a fresh machine needs are published and correct. One gap remains for brand new machines: the installer on the admin machine is an older build that writes the assistant configuration with an unsupported provider option and does not include the slides app or the newest model, so a freshly created machine does not yet get a working assistant. A fresh test machine was created and shows exactly that gap. We filed a follow up (home#249) to make the assistant fast on its first action and to close these installer gaps. Next, in order: (1) update the installer so a brand new machine gets a working, app driving assistant with no manual fixing (home#244); (2) the welcome email (home#236); (3) confirm every app works standalone, by voice and driven by the assistant (home#248); (4) the documentation library and memory stack (home#246).
Current state (session 200, 2026-06-04): The Kimi assistant now fully works on a deployed tester and the gateway chat-connection bug is fixed, both merged to the stable branch and proven live. A browser check found the assistant page rendered unstyled at the address without a trailing slash, and chat hung on "Reconnecting". Both were fixed properly: the gateway now forwards the live chat connection for tester domains (previously only one of its two routing paths did, so every tester domain's chat reconnect-looped), and the assistant now embeds its own setup files inside its program and recreates them on the machine at startup, so a fresh install needs no manual staging (this was the remaining follow-up from the previous session). Next, in order: (1) confirm a brand-new tester provisioned from the stable release comes up working with no hand-fixing (home#244); (2) add the welcome email so testers can be onboarded automatically (home#236, now using Resend); (3) confirm every demo app works standalone, by voice, and driven by the assistant (home#248); (4) the documentation-library and memory Ask-the-Librarian stack (home#246).
Current state (s199): The Kimi AI assistant can now be opened and chatted with on the live demo machine. A tester's browser check (open it, say hi, nothing happens) led to a full investigation that found the assistant had never actually answered on a deployed machine before, only its parts had been checked. Three separate setup problems were found and fixed: the assistant announced its web address in a form the machine's router did not recognise, so its page would not load; the assistant looked for some of its own setup files at a location that only exists on the build computer; and it was configured with a model-provider option the assistant does not support, so it never connected to the AI service. With those fixed it now opens, takes a message, and returns a real streamed answer, whether or not the user picks a model first. The app launcher and installer fixes are shipped from the stable branch; the assistant's own two code fixes are ready on a branch and filed for the maintainer to merge, with one remaining follow-up to bundle the assistant's setup files inside its program. Next is the full-stack tester with the documentation library and grounded search, then the welcome-email step.
§0 Current state
s201 (2026-06-04): Proved the AI assistant operates the demo apps on a live deployed machine and supported a live demo. On the demo machine the assistant, on the newest Kimi model, read real content from the planner, whiteboard and slides through its tool connection and created an item in the planner, confirming read and write across all three. During the live demo the assistant was not answering because its model key held a bad value, so we set a working key, switched to the newest model and added the slides app to its tools, after which it worked. Both release builds a fresh machine needs are published and verified. Remaining gap for new machines: the admin installer is an older build that writes an unsupported provider option and omits the slides app and the newest model, so a freshly created machine does not yet get a working assistant, as confirmed on a fresh test machine. Filed a follow-up to make the assistant fast on its first action and to close the installer gaps. Next: update the installer so a fresh machine gets a working, app-driving assistant with no manual fixing.
s199 (2026-06-04): The Kimi assistant can now be opened and used on the live demo machine, proven end to end. A tester opened it, typed a message, and nothing happened, so we investigated and found the assistant had in fact never answered on a real deployed machine, only its parts had been checked. Three separate problems were found and fixed. First, the assistant's web page would not load through the machine's router because the assistant announced its address in a form the router did not recognise; we aligned it to the standard form and the page now loads. Second, the assistant looked for some of its own setup files at a location that only exists on the build computer, so it stopped the moment someone sent a message; we placed those files where it looks for now and noted the proper fix is to bundle them inside the program. Third, the assistant was set up with a model-provider option it does not support, so it never connected; we switched it to the supported option (which speaks the standard AI interface and so works with the chosen provider) and let it read the AI key from the machine's environment so the key stays out of any file. After these changes the assistant opens, accepts a message, and streams back a real answer, with or without picking a model first. The app launcher now lists the assistant, and saving an AI key in the cockpit now also stores it under the name the assistant reads. The launcher and installer fixes are shipped from the stable branch; the assistant's own two code fixes are ready on a branch and filed for the maintainer to merge.
s198 (2026-06-04): The Kimi AI assistant was added to the standard tester install and wired so it can operate the planner and whiteboard for the user, proven on the live demo machine. Installing a tester now also brings up the assistant, writes its configuration, and points it at the planner and whiteboard through the machine's own local router, which exposes each app's actions as tools the assistant can use. On the live machine the assistant's tool list returned the full set of planner and whiteboard actions, and a test call created a workspace in each, so the assistant can genuinely drive both apps. The assistant uses the tester's own AI key, and we confirmed the key authenticates. Voice was checked and still works on the machine. We also started moving the voice engine to publish from the stable branch, but its build is currently failing for a routine dependency reason shared by a few other components, so finishing the voice move (the engine build fix plus a catch-up update on the voice app) is queued for next session. We corrected the plan too: the email step will use Resend, and onboarding will create fresh accounts rather than inviting existing ones, since a fresh account only has the access we grant it. Next: the full-stack tester with documentation libraries, the knowledge store and grounded search.
s196 (2026-06-03): Scoped the work to exactly the meeting demo and moved the first components onto the stable branch. We confirmed the deliverable is the meeting demo and nothing more, dropped everything else from the tracker, and (team offline) did the component changes ourselves, each being a one-file publish switch or the same small library update already proven on the proxy and base components. The router, planner and slides components now publish from the stable branch (builds green) and default to it; the whiteboard and the assistant got the same default switch as they were already publishing from stable; the supervisor was refreshed to publish from stable. Result: every component the first demo needs is on the stable branch and publishing, which unblocks the live bring-up on the dedicated node. We also brought the assistant's code repositories down locally for the small fixes coming next.
s195 (2026-06-03): Moved the three deployment components we own onto the stable branch so the demo deploys from stable, not the in-flux development branch. The grid deployer, the cockpit, and the demo-script repository are now on the stable branch by default and publishing their stable releases from it, all verified. The deployer was made self-contained: the installer script sent to each new tester machine is now built into the deployer instead of fetched from a separate repository (that repository also moved to a different team area). The cockpit could not build against the stable shared libraries until a teammate pointed the shared web-proxy's stable branch at them; we then completed a small migration to the libraries' relocated helper functions and it built clean and published. We opened a coordination tracker for the wider move. What remains for the whole demo to run from stable is the team's app and engine components doing the same small migration (one engine has no release yet), each tracked there. Next milestone: a full live bring-up from the stable branch on the dedicated node once enough components have moved, proving people can test there.
s194 (2026-06-01): Attempted to retire the old test accounts and their machines from the admin screen, and hit a grid-side block, then stopped early for another priority. The first delete error was on our side: the admin machine's grid-control service held a dead long-lived connection to the grid, so we restarted that service and confirmed it came back clean and could read the machine list again. The retry failed for a different reason outside our stack: cancelling the on-grid contracts is rejected with a 502 gateway error, reproducible in a fresh private browser window. Listing machines (a read from the chain) works, but deleting (which must reach the actual grid node to tear the machine down) does not, so the grid node or its relay looks unreachable. All three test accounts and machines were left running and untouched, with their contract cancellations stuck pending. Next: retry the delete when the grid node and relay are healthy, and if it stays stuck cancel the contracts directly on the chain, then complete last session's parked review-and-merge of the search fix and rotate the test access tokens.
s190 (2026-06-01): The changes held back for a maintainer go-ahead last session were merged, and the talk-to-it voice bar is now live on every app's screen, proven on the live test user. With the go-ahead, six held changes were merged: the safe, sandbox-only way for the assistant to view pages behind the login gate, plus the voice bar on five more app interfaces. The login bypass was deployed and confirmed on the test machine. A request carrying the sandbox secret is shown the page as the machine's own user, while a request with no secret, or the wrong secret, is still sent to the login screen. The five updated app screens were built and deployed, and the voice bar was confirmed showing on all seven app screens. Along the way we found and fixed a real bug carried in from last session: for one app (the whiteboard) the voice bar had been added to a page template the app does not actually use, so it never appeared; we moved it to the real page and confirmed it now shows. One caveat for next time: the automated release build for the memory app is failing for a separate, pre-existing reason, so a brand new machine could receive an older memory screen until that is fixed (the live test machine already runs the correct build, since we deployed it directly). Next: prove a freshly provisioned tester comes up with the memory and voice screens already installed and running, remove an outdated leftover voice binary so installs pick the correct one, and rotate the test access tokens while tidying the existing test user.
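The three outcomes above (correct secret shown the page, no secret or wrong secret sent to login) can be sketched as follows. This is an illustration under assumptions, not the merged change: the function name `route_request` and the string results are hypothetical, and how the secret travels on the request is not stated in these notes.

```python
import hmac


def route_request(presented_secret, sandbox_secret):
    """Return 'page' if the request may view the page, otherwise 'login'."""
    # The bypass is off by default: with no secret configured on the machine,
    # or none presented by the caller, every request goes to the login screen.
    if not sandbox_secret or not presented_secret:
        return "login"
    # Constant-time comparison so response timing does not leak the secret.
    if hmac.compare_digest(presented_secret.encode(), sandbox_secret.encode()):
        return "page"
    return "login"
```

The decision depends only on the secret, which is the property the adversarial review in s189 asked for in place of the earlier port-based design.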
s189 (2026-05-31): Voice now works end to end for a tester, and the three maintainer decisions from last session were all resolved. The Memory app's screen was already published; it was simply not installed on the hand-maintained test machine, so we installed and started it and it now opens. The assistant gained a safe, off-by-default way to view pages behind the login gate on a sandbox machine, locked to a secret only sandbox machines hold (an adversarial review caught and corrected an unsafe first design that relied on which network port a request used). The talk-to-it voice bar was added to every app's interface. Most importantly, speech-to-text was failing while read-aloud and the rewrite tool worked; we traced it to two causes and fixed both: the test machine ran an outdated, wrong build of the voice service, and the shared voice engine rejected microphone audio because it only accepted one specific audio quality and did not convert other rates. We installed the correct build plus its missing audio library, and changed the engine to convert any incoming audio to the quality it needs; a full round trip now transcribes correctly on the live machine. The engine fix, the audio-library install, and the runbook notes were merged, and a tracking issue was filed. The login-bypass and the per-app voice-bar changes are committed and pushed but wait for a maintainer go-ahead before merging. Next: merge those waiting changes, deploy the updated app interfaces, prove a freshly provisioned tester comes up with all of this automatically, and remove an outdated leftover binary so installs pick the correct one.
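To illustrate what "convert any incoming audio to the quality it needs" means in practice, here is a small sample-rate conversion sketch using linear interpolation. It is an assumption-laden example, not the engine's code: the function name is hypothetical, the notes do not say which rate the engine requires, and a production resampler would normally low-pass filter before downsampling to avoid aliasing.

```python
def resample_linear(samples, src_rate, dst_rate):
    """Resample mono PCM samples from src_rate to dst_rate (linear interpolation)."""
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if src_rate == dst_rate or len(samples) < 2:
        return list(samples)
    out_len = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # source samples advanced per output sample
    last = len(samples) - 1
    out = []
    for i in range(out_len):
        pos = i * step
        left = min(int(pos), last)
        right = min(left + 1, last)  # clamp at the end of the input
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out
```

For example, 480 samples recorded at 48000 Hz become 160 samples at 16000 Hz, so a microphone that records at a different rate from the engine's expected one is converted instead of rejected.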
s188 (2026-05-31): The save-once answer cache is now published to all four library repositories and loads automatically. We generated each library's question-and-answer set once and committed it into the public repository, so every future tester gets it for free, and the Books service now replays that committed cache on startup at no model cost, so a brand new machine has grounded, searchable answers immediately with no manual step (proven on the live test user: about eight and a half thousand answer pairs loaded in roughly thirteen seconds with zero model calls). We also fixed the Ask the Librarian AI summary, which had failed for testers who only set an OpenRouter key, so it now uses whichever AI provider key the tester configured. Separately the cockpit gained an Apps page: a simple app-store style launcher that shows installed apps as tiles, opens a running one in one click, marks a stopped one with a hint to start it from Services, and hides apps that are not installed. Parked for a maintainer decision: whether the Memory app should be hidden or have its interface published, a safe way for the assistant to verify pages behind the login gate, and making sure a fresh tester starts with all user apps already running.
s187 (2026-05-31): The save-once answer cache is built, merged, and proven working on the live test user. The expensive question-and-answer generation can now be done one time by a maintainer and saved into each library as a small portable file, then reused for free by every other machine, which only re-embeds locally at no model cost. Two switches control it, both off by default and easy to remove: one that saves the generated answers back into the library, and one that reuses a saved cache instead of regenerating. Nothing in the memory service changed; the cache sits on top of it. We proved both halves on the live test user against its real Hero OS guide library: turning on the save switch wrote a cache file for every page with the real questions and answers, and turning on the reuse switch rebuilt the full set of answers from those files with no model calls at all, after which asking what Hero OS is returned correct, grounded results just as the paid path does. The saved files are committed with a safe, non-personal author identity because the libraries are public. The remaining steps all publish into the four public libraries and so wait for a go-ahead: do the one-time generation and publish for the other three libraries, add the small browsable-book definitions for those three, and wire the voice widget so a user can ask by speaking.
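A minimal sketch of the two switches described above, assuming a per-page JSON cache file stored next to the page. The function name, the `.qa.json` suffix and the `generate` callable (standing in for the paid model call) are all hypothetical; the real cache format is not described in these notes.

```python
import json
from pathlib import Path


def answers_for_page(page_path, generate, save_cache=False, reuse_cache=False):
    """Return question-and-answer pairs for one library page.

    Both switches default to off. With reuse_cache on and a cache file
    present, no model call is made. With save_cache on, freshly generated
    pairs are written back next to the page.
    """
    page = Path(page_path)
    cache_file = page.with_name(page.name + ".qa.json")  # hypothetical naming
    if reuse_cache and cache_file.exists():
        return json.loads(cache_file.read_text(encoding="utf-8"))
    pairs = generate(page.read_text(encoding="utf-8"))  # the paid path
    if save_cache:
        cache_file.write_text(json.dumps(pairs, indent=2), encoding="utf-8")
    return pairs
```

A maintainer would run once with `save_cache=True` and commit the resulting files; every other machine runs with `reuse_cache=True` and only re-embeds locally.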
s186 (2026-05-31): The fix that makes a new user's documentation libraries work was shipped to the admin machine, and the reason a fresh machine showed no books was found and fixed. First, the change that pre-loads the four default libraries was deployed and confirmed: reinstalling the test user's machine now sets the default library list and clones all four libraries automatically. Second, we found why the library service never started on a fresh machine. It was set to depend on the AI-provider broker, which refuses to start until an AI key is present, so on a brand-new machine with no key yet the whole chain failed and the install errored. Since that broker is parked and the library service reaches AI through the memory service instead, we removed the dependency, and the library service now starts cleanly on a key-less fresh machine. Third, we fixed why the library web page showed no books: a fresh machine never turned its cloned libraries into browsable books because of a start-up ordering gap. We added a step that builds each library's books from the small definition the library carries, at no AI cost, and proved it live: the test user now shows the Hero OS Guide as a seven-page browsable book. We also sent the upstream team a concrete, easy-to-switch-off plan to restore saving the generated answers back into each library so the cost is paid once and never repeated. Next: do the same one-time book setup for the other three libraries, build the save-once answer cache behind an off switch, publish the answers into the four libraries, and wire the voice widget so a user can ask by speaking.
s185 (2026-05-30): A brand-new test user was created and provisioned from scratch to confirm onboarding works for a fresh user, not just the hand-maintained existing one. The grid provisioning error from earlier sessions was diagnosed and cleared (a wrong machine identifier was being passed; the correct one was already configured), so the new user's machine was created, the full stack installed automatically, and it reached the ready state. The core shared-engine wiring is now proven on a genuinely fresh user: the machine came up pointed at both the shared embedding and voice engines with its own access tokens, the engines recognised it, and its dashboard is reachable. Two gaps that only a fresh install reveals were found and are being carried as fixes: the documentation library service does not start automatically and its default libraries are not pre-loaded on a new machine, and one memory sub-service failed to start. The existing test user was left in place as the known-good reference until the fresh path is fully green. Next: deploy the already-merged change that pre-loads the default libraries, get the library service starting automatically on a fresh machine, fix the failed memory sub-service, rotate the test access token, then retire the old test user.
s184 (2026-05-30): The documentation-library experience is proven end to end on the live test user, and the app list in the tester dashboard now matches how services really work. We refreshed the Hero OS guide content to today's design, then loaded all four default public libraries (the Hero guide plus the public Geomind, OurWorld, and Mycelium docs) on the test user and confirmed that asking a question scoped to each library returns correct, grounded answers. As designed, you ask a question inside one library (or one book), never across all libraries at once. The embedding work is done for free by the shared engine on the admin machine; only the one-time question-and-answer generation uses a paid model. We also reworked the dashboard's service list so each app is installed as a whole (one Install button brings up its background service, its web page, and its admin page together) and, once installed, shows its parts as separate manageable rows; the deployment runbook was updated to match. Two important gaps were found and written up for the upstream team to decide on, because a recent rework removed them: the generated question-and-answer data is no longer saved back to the library and published, so today every new machine would redo that paid generation from scratch, and the library web page shows no books because the step that turns a cloned library into browsable books was also removed. Separately, the shared build tool had stopped building across all components after an upstream rename; we traced and fixed it so released builds work again. Next: a full walk on a freshly provisioned tester to confirm it all comes up automatically (a grid issue still blocks fresh provisioning), rotate the test access token, tidy the existing test user, and act on the team's decision about restoring the publish-once library data.
s183 (2026-05-29): Worked the follow-up list from last session. First, the automated release builds: audited every component's latest publish run and fixed the ones still failing, so released binaries refresh again. The failures fell into two kinds: build filters left pointing at component or package names that earlier restructuring had renamed or removed (including the assistant, which was still linked to a retired embedding component and was rerouted to use the current memory service for its tool search), and a couple of components that did not pin their dependency versions, so the build picked up a moving upstream piece and broke; those now pin their versions like the others. Five components were fixed, merged, and confirmed publishing successfully again. Two unrelated ones are left for follow-up (one tangled in cross-project version mismatches, one a brand-new example owned by another author) and are noted on the build follow-up issue. Second, the documentation library service now starts cleanly as a managed service for testers: it had been told to place its socket under the administrator account's home folder, which the unprivileged service user cannot write to, so managed startup failed while a hand-started copy worked; corrected to use the per-user runtime location like the other services, and proven running on the live test user. Next: refresh the documentation library content to match today's architecture and re-prove incremental re-indexing, pre-load every new tester with the default public libraries, then the full walk on a freshly provisioned tester.
s182 (2026-05-29): The written book summary now works end to end on a test user, the goal of this phase. The library service was aligned to the memory service's current workspace model (it had been omitting the workspace label, which is why an earlier search returned nothing), then proven live on the existing test user: ingesting a document produced question-and-answer pairs embedded through the shared engine on the admin machine, and asking what Hero OS is returned three correct, grounded results where before it returned nothing. While doing this we found the automated release build was broken across nearly every component, because a recent change to the build tool moved where it installs and the publish workflows still looked in the old location, so the build failed immediately and stopped refreshing the released binaries. We fixed the release workflows across about forty components and removed a dangerous cleanup step that would have erased already-installed programs, and confirmed the fix by watching several components publish successfully again. A few components still fail later in their own build for unrelated reasons and are filed to fix. New follow-ups filed: audit and fix the remaining build failures, refresh the documentation library to match today's architecture, make the library service start cleanly as a managed service, and pre-load every new tester's library with the default public libraries. Next: work that list in order, then finish the full walk on a freshly provisioned tester (a grid issue blocked a fresh provision this session, so the proof ran on the existing test user).
s181 (2026-05-29): Phase 1 E3 is shipped and merged, and the shared engines are now consolidated and live. The deployer that wires each new tester to the shared services was generalized from the embedding engine alone to both engines (embedding and voice) in one loop: when a tester is provisioned it now issues that tester one access token, registers it for both engines on the admin machine, and points the tester's clients at both engines over the private network; when the tester is deleted the token is revoked for both engines (previously only one, which could have left a voice token live after deletion). On the admin machine the older duplicate embedding and memory services were stopped and removed, and the voice engine was switched from a hand-started process to a properly managed service that starts automatically. Both engines' security checks were re-confirmed live after the changes: no token refused, a valid token accepted with a real result, a wrong token refused, and a token used under another tester's identity refused, including real synthesized speech returned through the voice engine. The library-service alignment flagged last session was filed as its own task (the library service still omits the workspace label the memory service now requires). Next: align the library service, then a full end-to-end walk where a freshly provisioned tester gets a grounded summary of one of its own books (confirming both engines serve it and the security checks hold) plus retiring the last stale settings on the existing tester.
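The provision and delete behaviour above (one token, registered for both engines, revoked for both) can be sketched with an in-memory registry. This is illustrative only: `TokenRegistry` and its methods are hypothetical names, and the real deployer registers tokens on the admin machine rather than in a dictionary.

```python
import secrets

ENGINES = ("embedding", "voice")  # the two shared engines


class TokenRegistry:
    """In-memory stand-in for the admin machine's per-engine token stores."""

    def __init__(self):
        self._tokens = {engine: {} for engine in ENGINES}

    def provision(self, tester_id):
        """Issue one token and register it for every engine in one loop."""
        token = secrets.token_urlsafe(32)
        for engine in ENGINES:
            self._tokens[engine][tester_id] = token
        return token

    def delete(self, tester_id):
        """Revoke the tester's token for every engine, not just one."""
        for engine in ENGINES:
            self._tokens[engine].pop(tester_id, None)

    def is_registered(self, engine, tester_id):
        return tester_id in self._tokens[engine]
```

Looping over the same `ENGINES` tuple in both `provision` and `delete` is what rules out the earlier failure mode, where a voice token could stay live after the tester was deleted.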
s180 (2026-05-29): Phase 1 E2 (the shared voice engine) is shipped, merged, and proven live. The voice service now runs once on the admin machine and serves many testers over the private network with the same per-tester security as the embedding engine: every call carries that tester's own access token and identity, and the engine refuses any call with no token, a wrong token, or a token used under another tester's identity, and will not start its private listener if it cannot reach the secret store. One per-tester token now works for both the embedding and voice engines. Each tester's voice client sends speech to the shared engine when its endpoint is configured and falls back to a local engine otherwise, with the token never written to logs. Proven live on the test grid: the voice engine's security check passed every case (no token refused, valid accepted with real audio returned, wrong token refused, wrong identity refused), and from a separate tester machine the engine returned real synthesized speech across the private network using only that tester's token. Both code changes were merged. The voice engine was started by hand for the test and then stopped, and the next step makes it start automatically. While preparing the written book-summary demo we found the library service still talks to the older memory interface and does not send the workspace label the memory service now requires, which is why an earlier search returned nothing, so that needs a small alignment in the library service before the written summary works. Next: wire each new tester to both engines automatically at provision time, register the voice engine as a managed service, retire the older duplicate services, then the library-service alignment and the end-to-end written and spoken book summary.
s179 (2026-05-29): Phase 1 E1 is shipped, merged, and proven live. The shared embedding engine now runs on the admin machine with per-tester security: every call carries that tester's own access token and identity, and the engine refuses any call with no token, a wrong token, or a token used under another tester's identity, and will not start its private network listener at all if it cannot reach the secret store. Each tester's library service was installed on the test machine and pointed at the engine; it embedded documents through the engine across the private network using its own token, and the security check passed every case (no token refused, valid accepted with a real result, wrong token refused, wrong identity refused). Both code changes were merged. The AI broker stays parked; AI access goes through the direct client in the shared library plus each tester's own provider key. Next: the same shared model for the voice engine, then automatic wiring of each new tester at provision time, then the end-to-end demo where the assistant summarizes one of the tester's own books.
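The four security cases above reduce to a single check: the presented token must match the token registered for the identity the caller claims. A minimal sketch, assuming a hypothetical `authorize` function and a plain dictionary in place of the secret store:

```python
import hmac


def authorize(tokens_by_tester, presented_token, claimed_identity):
    """Return True only if the token is the one registered for that identity."""
    if not presented_token or not claimed_identity:
        return False  # no token (or no identity): refused
    expected = tokens_by_tester.get(claimed_identity)
    if expected is None:
        return False  # unknown tester: refused
    # Constant-time comparison. A valid token presented under another
    # tester's identity fails here because it is compared against that
    # other tester's registered token.
    return hmac.compare_digest(presented_token.encode(), expected.encode())
```

Usage, mirroring the four cases:

```python
store = {"alice": "tok-alice", "bob": "tok-bob"}
authorize(store, None, "alice")          # no token: refused
authorize(store, "tok-alice", "alice")   # valid: accepted
authorize(store, "nope", "alice")        # wrong token: refused
authorize(store, "tok-alice", "bob")     # wrong identity: refused
```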
s178 (2026-05-29): Phase 1 was re-based onto the dedicated provider services the upstream team now ships, after syncing with the latest development showed the embedder, voice, and memory pieces had changed underneath the earlier plan. The design is unchanged in spirit (shared engines on the admin VM, each tester's own data and apps on their own VM); what changed is the building blocks. The embedding engine is now hero_embedder_provider (an OpenAI-compatible service) with hero_memory as the per-tester entry point that uses it; the voice engine is now hero_voice_provider, which runs once on the host; and each tester runs only the light clients (memory, books, the assistant, the voice widget) that reach the engines over the private network. Per-user security is now required from the start: every tester call to an engine carries that tester's own token and identity, and an engine refuses any call without a valid matching token, so no tester can act under another tester's identity. Combined with each tester's data staying on their own VM, isolation is airtight. The shared AI broker is deferred for now (its status is uncertain after recent changes); each tester's assistant uses its own provider key, and we validate the assistant by having it summarize one of the tester's own books. The full re-scope is in the comment below. No code or live infrastructure was changed this session; it was a planning and documentation realignment.
s177 (2026-05-29): Phase 1d (shared voice) is code-complete. The voice service now has an authenticated private-network listener, so one voice engine on the admin VM can serve many tester VMs for speech-to-text and text-to-speech, with a per-tester access token and tenant identity check, instead of every tester loading the large speech models locally (the same shared-hub pattern already proven for the embedding service). Testers that present no token, a wrong token, or a token under another tester's identity are refused; the listener will not even start if it cannot reach the secret store to validate tokens. The consumer side falls back to a local engine automatically when no shared endpoint is configured, so single-machine and self-hosted deployments are unaffected. The two voice hub binaries build and pass linting cleanly and the change is committed on a feature branch, but it is not merged yet: the voice repository as a whole currently does not compile because an unrelated, in-progress upstream refactor (moving the voice data API to async) is only half-landed, and the merge waits for that to finish. The work was isolated so the voice hub builds independently of that broken area. Remaining and deferred to the next session: wiring the deployer to hand each tester its voice token and endpoint at provision time, and the live end-to-end test on the sandbox grid (both gated on the upstream merge and the test VM). Phase 1b (welcome email) and 1c (shared AI broker) remain available to pick up independently if the voice merge stays blocked. No live infrastructure was changed this session.
s176 (2026-05-28): Phase 1a is now fully automated and live-proven end to end. The deployer wires a freshly provisioned tester VM to the shared embedding service automatically: when it provisions a tester it issues that tester a unique access token, registers the token on the admin VM, and configures the tester's assistant to use the shared embedder over the private network; when the tester VM is deleted the token is revoked. Verified on a throwaway tester created and torn down on the test grid: the four configuration values were set correctly, the assistant authenticated to the shared embedder and got a valid response, a wrong token and a missing token were both rejected, a token presented under a different tester's identity was rejected, and deletion removed the token. Shipped as one change in the deployer, merged. Next session: the welcome-email pipeline, or sharing the AI-provider broker the same way, reusing this pattern.
s175 (2026-05-28): Phase 1a hub-shared embedder is functionally proven end to end. A tester VM's assistant now uses the shared embedding service running on the admin VM over the private mycelium network, with per-tester token plus context authentication. Embeddings are computed by local models on the admin VM (no cloud embedding API), and a full document index plus semantic search works with correct ranking (a query about forgotten login credentials returns the password-reset document first). The assistant connects across VMs and answers using the tester's own AI provider key. Shipped in hero_embedder and hero_agent: the real embedding call now speaks the inference daemon's protocol, plus fixes for a build-dependency break and an environment-variable fallback bug that had been silently ignoring a pasted AI key. The embedder generation choice is recorded in a workspace-private decision. Next session automates the per-tester wiring in the deployer so a freshly provisioned tester VM comes up already connected to the shared embedder.
s173.5 (2026-05-28): Arc opened. home#238 (admin and tester UX) closed alongside the filing of this arc after the visible UX surface shipped across the admin path (deployer admin UI) and tester path (cockpit Services / Settings / Manual / About / Feedback pages, Bootstrap modals, dark-mode contrast, log_tail in install result modal, Manual completeness, Expose surface hidden in sandbox mode, connection-status dot fix in hero_admin_lib). About 25 commits across hero_cockpit, hero_os_tfgrid_deployer, hero_website_framework, hero_demo. D-36 minted (Resend supersedes SendGrid). hero_proxy#57 filed as a carried-independent item under this arc. Next session is s174 = Phase 1a welcome-email pipeline.
Context
home#238 (Phase 3: admin and tester UX) closes alongside the filing of this arc. That arc shipped the visible UX surface for both the admin path (the tfgrid_deployer admin UI: per-user VMs table, install state, provisioning) and the tester path (the cockpit: Services / Settings / Manual / About / Feedback pages, install-from-catalog flow, Bootstrap modals, dark-mode contrast).
This arc moves from "the UI is great" to "the sandbox actually works end-to-end as a tester would use it."
Scope of this arc
Hub-and-spoke architecture for sandbox (D-37)
A late-s173.5 architectural decision restructures Phase 1 around a hub-and-spoke model for resource-heavy stateless services. Admin VM runs shared hub services: hero_embedder, hero_voice STT/TTS, hero_aibroker. Tester VMs run only per-tenant spoke services and consume the hub services over mycelium IPv6 with per-tester bearer-token auth. Per-context isolation is enforced at the API layer via X-Hero-Context header injection.
Rationale: 10 testers × 6 GB embedder = 60 GB of RAM, most of it wasted; 1 shared 6 GB embedder serving 10 testers is honest about sandbox economics. Sovereign-tier paid customers still get their own hub on their own admin VM (sandbox-only decision, no inheritance).
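The arithmetic behind the rationale can be written out as a small sketch. The 6 GB embedder footprint and the 10-tester count come from the text above; the function names are hypothetical, and the sketch assumes the hub's footprint does not grow with the number of testers it serves.

```rust
/// RAM used when every tester VM runs its own embedder copy.
fn per_tester_ram_gb(testers: u64, embedder_gb: u64) -> u64 {
    testers * embedder_gb
}

/// RAM used when one hub embedder on the admin VM serves every tester.
/// Assumption: the hub's footprint is independent of the tester count.
fn shared_hub_ram_gb(testers: u64, embedder_gb: u64) -> u64 {
    if testers == 0 {
        0
    } else {
        embedder_gb
    }
}

/// RAM freed by moving from per-tester copies to the shared hub.
fn ram_saved_gb(testers: u64, embedder_gb: u64) -> u64 {
    per_tester_ram_gb(testers, embedder_gb) - shared_hub_ram_gb(testers, embedder_gb)
}
```

With 10 testers and a 6 GB embedder, the per-tester layout uses 60 GB, the hub layout uses 6 GB, and the difference is 54 GB.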
Decision locked at decisions/D-37-hub-shared-services-for-sandbox.md (workspace-private).
Phase 1: onboarding plumbing
1a (P0). Hub-shared hero_embedder PoC. Validate the hub-and-spoke pattern end-to-end with the simplest hub-eligible service. Server side: bind admin VM's hero_embedder_server HTTP listener to [admin_vm_mycelium_ipv6]:9988 in addition to whatever local UDS it has. Add bearer-token validation against an embedder/TENANT_TOKENS registry on the admin VM. Client side: every spoke service that calls hero_embedder (currently hero_indexer_server, hero_books_server) reads HERO_EMBEDDER_URL and SHARED_EMBEDDER_TOKEN from its cockpit/* hero_proc slots; if set, talk to the URL with the token; if empty, fall back to local embedder. Deployer side: handle_install_hero_stack sets the URL + token slots on the tester VM and skips installing hero_embedder on tester VMs (saves about 6 GB RAM per tester). Verify end-to-end on alice123: confirm hero_embedder is not installed, confirm an indexing call from hero_books_server reaches the admin VM's embedder, confirm round-trip latency is sub-100 ms.
1b. Welcome-email pipeline. Lift the EmailProvider trait and ResendProvider implementation from znzfreezone_backend/src/providers/email.rs (about 100 LOC, production-tested in the freezone workspace) into Hero. Port to ureq 3.x to match what shipped in hero_os_tfgrid_deployer/crates/hero_tfgrid_deployer_server/src/forge_oauth_admin.rs. Wire two send points: (a) deployer.create_user sends a welcome-and-how-to-register email to the operator-supplied address; (b) deployer.install_hero_stack sends a VM-is-ready email with the cockpit URL and initial password once install_state transitions to ready. The Resend API key lives in the hero_proc secret store at deployer/RESEND_API_KEY. Dev-mode fallback when the slot is empty: log to console and return dev-mode-no-send, matching the freezone reference pattern.
Prerequisite for 1b: an operator-owned domain verified with Resend (SPF + DKIM records). Default: noreply@hero.ourworld.tf. The workspace already controls ourworld.tf (Forge runs on forge.ourworld.tf); the hero. subdomain keeps Hero emails visually distinct from Forge platform emails. Non-usable noreply by default: no MX record on the from-address, replies bounce naturally; Resend only handles outbound. Operator can substitute (noreply@ourworld.tf or noreply@forge.ourworld.tf both work technically) at Phase 1b start.
1c. Hub-shared hero_aibroker. Tester pastes an AI provider key (Anthropic, OpenAI, OpenRouter, Groq, etc.) into the cockpit Settings page. The save handler propagates the key to the hub aibroker's per-tenant key registry at aibroker/TENANT_<context>_<provider>_KEY on the admin VM. Dependent services on the tester VM (hero_books_server, hero_agent_server) start once at install time and stay running; they call HERO_AIBROKER_URL for inference and the hub looks up the calling tester's key from the registry. This collapses the originally-scoped BYO-key auto-start cascade entirely: no service-start dependency chain to manage, no service restart on key paste, the AI calls just succeed once the registry is populated.
1d. Hub-shared hero_voice STT/TTS. Same pattern as 1a but for the voice models. Lower priority than 1a/1c because voice is not on the critical path for most Phase 2 app walks; can slot later in Phase 1 or even after Phase 2 if needed.
Phase 2: per-app functional verification walks
Walk every catalog service end-to-end on the live tester VM (currently alice123) as a real tester would. One pass / fail outcome per app, in dependency order so infrastructure issues surface first:
hero_db_server
hero_embedder + hero_indexer
hero_aibroker_server (with BYO key set)
hero_agent
hero_books
hero_slides
hero_whiteboard
hero_planner / hero_biz
hero_collab
hero_voice
hero_office / hero_foundry / hero_archipelagos
Output: one Forge issue per service that fails, with the failure mode and reproduction steps captured live during the walk.
Phase 3: close the gaps
Fix the per-app issues filed during Phase 2. Each app's issue gets its own ship cycle; some will be quick config drift, some will be deep product gaps in the app itself.
Carried-independent items (slot anywhere, not blocking arc closure)
InstallManifest Rust crate that replaces hero_demo/deploy/single-vm/scripts/setup-binaries.sh. About 1 to 2 days of work. Best done after Phase 2 because the app walks may surface install-runner gaps that change the typed manifest shape.
Email-provider decision logged alongside this arc
The previously locked email provider choice (SendGrid via EmailSender trait) is superseded by Resend via EmailProvider trait, locked in a new workspace decision file. Rationale: the operator already has Resend credentials available, the freezone workspace has a production-tested reference implementation that can be ported, and Resend's dev-mode fallback pattern (log to console when no API key) is friendlier for testers who don't have email credentials yet.
Definition of arc closure
This arc closes when:
Signed-by: mik-tf mik-tf@noreply.invalid
Update s173.5 close + 1: architectural restructure of Phase 1.
Mid-discussion at session close, the operator surfaced that running heavy AI services per-tester is wasteful when 10 testers can fit on one TFGrid node. A shared hub-and-spoke model where the admin VM hosts hero_embedder, hero_voice STT/TTS, and hero_aibroker while tester VMs only run per-tenant services is the right shape for sandbox economics and matches real-world AI infrastructure patterns. Mycelium IPv6 between co-located VMs gives us sub-millisecond intra-node transport without the public gateway path.
Decision locked at decisions/D-37-hub-shared-services-for-sandbox.md (workspace-private): hub set is embedder + voice + aibroker; spoke set is everything else; transport is mycelium IPv6 with per-tester bearer tokens; per-context isolation enforced at API layer via injected X-Hero-Context header; sandbox-only, sovereign tier inherits nothing.
This restructures Phase 1 of this arc:
hero_embedder PoC. Bind admin VM's embedder to mycelium IPv6, deployer sets HERO_EMBEDDER_URL + SHARED_EMBEDDER_TOKEN on tester VMs at install, tester VMs skip hero_embedder install (saving about 6 GB of RAM per tester). Verify alice123 can embed a document via admin VM's embedder.
hero_aibroker. Tester pastes provider key on cockpit Settings, save handler propagates to hub aibroker's tenant registry, dependent services stay running. Collapses the BYO-key auto-start cascade originally scoped as Phase 1b; that cascade is no longer needed because hub aibroker handles per-tenant credentials directly.
hero_voice STT/TTS. Same pattern as embedder.
The s174 session entry point moves from welcome-email to hub-shared hero_embedder PoC. Welcome-email shifts to s175. Mycelium IPv6 connectivity between admin VM and tester VMs on the same node is confirmed performant for embedder-class payloads.
Signed-by: mik-tf mik-tf@noreply.invalid
Session 176 complete: home#239 Phase 1a (shared embedding service for the sandbox) is fully automated and verified end to end.
A freshly provisioned tester VM now comes up already connected to the shared embedding service on the admin VM, with no manual wiring. On provision the deployer issues the tester a unique access token, registers it on the admin VM, and points the tester's assistant at the shared service over the private network; on deletion the token is revoked. This was verified live by creating a throwaway tester, installing, exercising the authentication path (valid request accepted, wrong token rejected, missing token rejected, and a token used under another tester's identity rejected), and deleting it, then confirming the token was removed and no leftover resources remained.
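The token lifecycle described above (issue at provision, accept only the matching tester, revoke at deletion) can be sketched as a small in-memory registry. This is an illustration under assumptions, not the deployer's code: TokenRegistry and its methods are hypothetical names, the token is supplied by the caller rather than generated, and a real service would keep tokens in the secret store and compare them in constant time.

```rust
use std::collections::HashMap;

/// In-memory stand-in for the admin VM's per-tester token registry.
#[derive(Default)]
struct TokenRegistry {
    // tenant identity -> access token
    tokens: HashMap<String, String>,
}

impl TokenRegistry {
    /// Provision time: record the tester's token.
    /// Returns false (and changes nothing) if the tenant already has one.
    fn register(&mut self, tenant: &str, token: &str) -> bool {
        if self.tokens.contains_key(tenant) {
            return false;
        }
        self.tokens.insert(tenant.to_string(), token.to_string());
        true
    }

    /// Deletion time: drop the tester's token.
    /// Returns true if a token was actually removed.
    fn revoke(&mut self, tenant: &str) -> bool {
        self.tokens.remove(tenant).is_some()
    }

    /// A request is accepted only when it carries the token registered
    /// for that same tenant. Missing, wrong, or borrowed tokens fail.
    /// Note: plain `==` is not constant-time; fine for a sketch only.
    fn accepts(&self, tenant: &str, token: Option<&str>) -> bool {
        match (self.tokens.get(tenant), token) {
            (Some(expected), Some(presented)) => expected.as_str() == presented,
            _ => false,
        }
    }
}
```

The four cases exercised in the live verification map onto accepts directly: a valid token under its own tenant passes, while a wrong token, a missing token, and another tester's token all fail, and every call fails once revoke has run.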
Shipped as a single merged change to the deployer. The shared-service pattern is now reusable for the remaining shared services (the AI-provider broker and the voice models).
Signed-by: mik-tf mik-tf@noreply.invalid
Starting 1d (shared voice). The direction follows what we already proved for embeddings in 1a, because the voice service is a good fit for the same model: it is heavy (it loads large speech models into memory) and it is stateless (it holds no per user data, it just turns audio into text and text back into audio). So instead of every tester VM running its own copy, one voice service runs on the admin VM and every tester reaches it over the private overlay network.
Each tester is wired to it automatically at provision time with its own access token, exactly like embeddings. A request is accepted only when it carries that tester's token bound to that tester's identity, so a tester cannot use the service without authorization or act as another tester; a wrong or missing token is rejected. There is no data leak between testers because the voice service stores nothing per user: each request is processed on its own and discarded. The local single machine path is unchanged, so a self hosted deployment still runs voice on its own box with no token.
Concretely: add the token check to the voice daemon's overlay listener while the local listener stays open for same machine use, add a switch in the voice consumer so it points at the shared service when configured and falls back to local otherwise, and extend the deployer so it mints, sets, and revokes each tester's voice token alongside the embeddings token it already manages. The gate before this is considered done is a fresh tester provision where the cockpit voice widget records and gets a transcription back from the shared service, the token is rejected when wrong or missing, and the token is revoked when the tester VM is deleted.
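The consumer-side switch could look roughly like the sketch below. The Endpoint type, the function names, and the two environment variable names are placeholders invented for illustration; the thread does not state the voice consumer's real configuration keys. The sketch also makes one design choice of its own: a URL without a token falls back to local rather than calling the shared engine unauthenticated.

```rust
/// Where a consumer sends its voice requests.
#[derive(Debug, PartialEq)]
enum Endpoint {
    /// Shared engine on the admin VM, reached with the tester's token.
    Shared { url: String, token: String },
    /// Engine on the same machine; no token involved.
    Local,
}

/// Picks the shared engine only when both a URL and a token are
/// configured and non-empty; anything else falls back to local.
fn select_endpoint(url: Option<&str>, token: Option<&str>) -> Endpoint {
    let url = url.map(str::trim).filter(|s| !s.is_empty());
    let token = token.map(str::trim).filter(|s| !s.is_empty());
    match (url, token) {
        (Some(url), Some(token)) => Endpoint::Shared {
            url: url.to_string(),
            token: token.to_string(),
        },
        _ => Endpoint::Local,
    }
}

/// Reads the configuration from the environment.
/// The variable names are placeholders, not the project's real slots.
fn select_endpoint_from_env() -> Endpoint {
    let url = std::env::var("SHARED_VOICE_URL").ok();
    let token = std::env::var("SHARED_VOICE_TOKEN").ok();
    select_endpoint(url.as_deref(), token.as_deref())
}
```

Because an unset or empty configuration yields Endpoint::Local, a self-hosted single machine that never sets either value keeps running voice on its own box.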
Signed-by: mik-tf mik-tf@noreply.invalid
Phase 1 re-scope (2026-05-29)
The shared-services design is unchanged in spirit, but the building blocks moved. Upstream now ships dedicated provider services for the heavy stateless engines, so we no longer hand-roll shared listeners on the embedder and voice services. This updates Phase 1 to use those providers and to place each piece correctly.
Admin VM runs the shared engines (compute only, no personal data, one copy for everyone):
Each tester VM runs that tester's own data and apps:
Rule: shared engines on the admin VM, personal data and UI on the tester VM. A tester's documents and vectors never leave their own VM. Only short-lived compute (audio to text, text to vector, text to answer) goes to the admin engines, over the internal mycelium network.
Per-user security (required): every call from a tester to a shared engine carries that tester's own access token and tenant identity. Each engine validates the token against the identity, refuses any call with no token, a wrong token, or a token used under another tenant's identity, and refuses to start its private-network listener at all if it cannot reach the secret store to validate tokens. With personal data also staying on each tester's own VM, this is airtight: no tester can read another tester's data or act under another tester's identity.
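A minimal sketch of those two rules, fail-closed startup and per-call validation, is below. SecretStore, Listener, and Refusal are hypothetical types for illustration; the real engines' interfaces are not shown in this thread. The sketch distinguishes the three refusal reasons to mirror the cases listed above; a production service might deliberately return one uniform error so callers learn nothing from the difference.

```rust
use std::collections::HashMap;

/// Why a call to a shared engine was refused.
#[derive(Debug, PartialEq)]
enum Refusal {
    MissingToken,
    WrongToken,
    WrongIdentity,
}

/// Stand-in for the secret store; `load` returning None models "unreachable".
struct SecretStore {
    reachable: bool,
    tokens: HashMap<String, String>, // tenant -> token
}

impl SecretStore {
    fn load(&self) -> Option<&HashMap<String, String>> {
        if self.reachable {
            Some(&self.tokens)
        } else {
            None
        }
    }
}

/// Private-network listener: exists only if tokens could be loaded.
struct Listener {
    tokens: HashMap<String, String>,
}

impl Listener {
    /// Fail closed: no reachable secret store, no listener.
    fn start(store: &SecretStore) -> Result<Listener, &'static str> {
        match store.load() {
            Some(tokens) => Ok(Listener { tokens: tokens.clone() }),
            None => Err("secret store unreachable; refusing to start listener"),
        }
    }

    /// Checks the presented token against the claimed tenant identity.
    fn authorize(&self, tenant: &str, token: Option<&str>) -> Result<(), Refusal> {
        let presented = match token {
            Some(t) if !t.is_empty() => t,
            _ => return Err(Refusal::MissingToken),
        };
        match self.tokens.get(tenant) {
            Some(expected) if expected.as_str() == presented => Ok(()),
            // The token is real, but it belongs to a different tenant.
            _ if self.tokens.values().any(|t| t.as_str() == presented) => {
                Err(Refusal::WrongIdentity)
            }
            _ => Err(Refusal::WrongToken),
        }
    }
}
```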
AI broker deferred: the AI broker has had many recent changes and its status is uncertain, so we are not sharing it this round. Instead each tester's assistant uses its own provider key directly. Validation: ask the assistant for a summary of one of the tester's own books (it retrieves passages from the tester's memory, grounded by the shared embedder, and writes the summary with the tester's own key). Sharing the AI broker is a later step.
Changes from the earlier Phase 1 text:
Done for this round: a freshly provisioned tester can ask their assistant (by voice or text) for a summary of one of their own books and get a grounded answer, with embedding and voice compute served by the admin engines; the negative security cases (no token, wrong token, wrong identity) are all refused; and deleting the tester revokes their tokens.
Signed-by: mik-tf mik-tf@noreply.invalid
Architecture confirmation (shared AI)
Confirming the shared AI design for the sandbox, now that it has settled and the core is proven on the test machine.
Shared engine on the admin machine: one embedding engine serves all testers. Every call from a tester carries that tester's own access token and identity, and the engine refuses any call with no token, a wrong token, or a token used under another tester's identity. It also will not start its private network listener at all if it cannot reach the secret store to validate tokens. Each tester's documents and vectors stay on the tester's own machine; only short lived compute (text to vector) crosses the private network.
No shared AI broker for now: the broker is parked. AI access goes through the direct AI client in the shared library, and each tester's assistant uses its own provider key. The tester's library service uses that same direct client to reach the shared embedding engine.
Proven this round on the test machine: the engine token check passed every case (no token refused, valid accepted with a real result, wrong token refused, wrong identity refused), and a tester's library service embedded documents through the shared engine across the private network using its own token.
Next: the same shared model for the voice engine, then automatic wiring of each new tester to the engine at provision time, then the end to end demo where the assistant summarizes one of the tester's own books.
Signed-by: mik-tf mik-tf@noreply.invalid
Session 179 complete. Phase 1 E1 (shared embedding engine with per-tester security + each tester's library service pointed at it) is merged and proven live on the test machine. Next session starts the voice engine on the same model, then automatic wiring at provision time and the book-summary demo.
Signed-by: mik-tf mik-tf@noreply.invalid
Session complete (2026-05-29): the shared voice engine work landed. Per-tester security was added to the voice engine the same way it was added to the embedding engine, and the tester voice client was wired to use the shared engine when configured. It was proven live on the test grid: the engine's security check passed every case and a tester machine got real synthesized speech back across the private network using only its own token. Both code changes were merged. Two follow-ups for the next session: make the voice engine start and register automatically at tester provision time (and retire the older duplicate services), and a small alignment in the library service so it sends the workspace label the memory service now requires, which is needed for the written book summary.
Signed-by: mik-tf mik-tf@noreply.invalid
Session complete. Item A (the release-build audit and fixes) is done: five components fixed, merged, and confirmed publishing again; two unrelated ones deferred with notes on the build follow-up issue. The documentation library service now starts cleanly as a managed service for testers, proven on the live test user. Next session refreshes the documentation library content to the current architecture and re-proves incremental re-indexing, then pre-loads new testers with the default public libraries, then the full fresh-tester walk.
Session complete. Shipped the deployer change that pre-loads the four default libraries to the admin machine and confirmed the library list and clones land on a reinstalled test user. Found and fixed the two reasons a fresh machine had no working library: the library service depended on the parked AI-provider broker (which will not start without an AI key, failing the whole install on a new machine), now removed since the service reaches AI through the memory service; and a fresh machine never turned its cloned libraries into browsable books, now fixed with a start-up step that builds each library's books from the definition it carries, at no AI cost. Proven live: the test user shows the Hero OS Guide as a seven-page browsable book. The library-service fixes are committed and proven on the live machine but not yet merged. A concrete, easy-to-switch-off plan to restore the save-once answer cache was sent to the library team for a decision.
Session 187 complete. The save-once answer cache shipped and merged, and both halves were proven on the live test user against its real documentation library. Saving wrote one small portable answer file per page; reusing rebuilt the full set of answers from those files with no model calls, and searching returned correct grounded results identical to the paid path. The feature is additive (the memory service is unchanged), off by default, and reversible with a single switch or revert. The remaining work all publishes into the four public libraries, so it is parked for a go-ahead: the one-time generation and publish for the other three libraries, their browsable-book definitions, and wiring the voice widget.
Session complete. The portable answer cache is published to all four libraries and now replays automatically at startup at no model cost, so a fresh tester gets free grounded search with no manual step. The Ask the Librarian summary is fixed to use the tester's own AI key, and the cockpit has a new app-store style Apps page. Parked for your call: the Memory app's missing interface (hide it or publish it), a safe way for the assistant to verify pages behind the login gate, and starting all user apps by default on a fresh tester.
Session s189 complete. Closed all three parked items and got voice working end to end (speech-to-text, read-aloud, and the rewrite tool) live on the test user. Speech-to-text was root-caused to an outdated voice build on the tester plus the shared engine rejecting non-standard audio rates; both fixed, the engine fix merged and deployed, and proven with a full round trip. The login-gate view-bypass for the assistant and the per-app voice bar are pushed and await a maintainer go-ahead to merge. Details: hero_voice_provider issue 1.
Session s191 update. Shipped and deployed the deployer fix so a freshly provisioned tester boots Hero Books in consume mode (grounded library search with no per-tester language-model cost), and removed stale duplicate voice binaries from the latest release that had been breaking speech-to-text. Confirmed the memory service publishes cleanly on the latest release. New-tester provisioning now runs end to end on a fresh node (account, VM, public gateway URL, and single-sign-on app all created), but the follow-on step that installs the full app stack onto the new VM errored partway through and is the first task for the next session. Because that new account is not yet fully working, the two existing demo accounts were deliberately left running and untouched. No design changes.
Session complete (s192). A brand-new test account was provisioned from scratch and taken all the way to a ready install: the four default libraries clone and browse as books, the per-account links to the shared engines are in place, the dashboard loads behind the login gate, and the talk-and-listen voice round-trip works (spoke a sentence, got it back as text). Reaching ready required fixing three breakages that only a genuinely fresh account hits, all now fixed and the corrected binaries republished so the next fresh account gets them: a stale published copy of the core process supervisor that crashed on start, and two start-up mis-configurations in the books service. One gap is left and tracked separately (hero_books#148): the instant no-cost AI answers do not appear yet on a fresh account because the prebuilt answer caches no longer line up with the current library text, so search returns empty until the caches are refreshed. Because of that the two older demo accounts were deliberately left running and untouched. Next session: refresh the caches so search works on a fresh account, then retire the two old demo accounts and rotate the shared-engine access token.
Signed-by: mik-tf mik-tf@noreply.invalid
Session 193 complete. Grounded, no-cost AI answers now work on a fresh test account. We disproved last session's guess that the saved answer caches had drifted from the library text: on the live machine all of the cached pages matched exactly. The real cause was a first-boot timing problem. The step that loads the saved answers ran before the memory service and the shared answer engine had finished starting, and those failures were swallowed silently, so search stayed empty until someone restarted the service by hand. The fix waits for the memory service, retries briefly while the engine warms up, loads the pages it can even if a few have changed, and logs clearly instead of silently. Proven on the live test machine: a restart loads about eight and a half thousand answer pairs at no model cost and all four libraries return grounded answers. The change is committed and proven but waits for a maintainer review and go-ahead before merging, because this app is outside the auto-merge set and a maintainer owns its working branch. Next: review and merge the fix so the published build picks it up, then retire the two older test accounts and rotate the test access token.
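The shape of that fix (bounded retry while a dependency warms up, partial loading, and logging instead of swallowing errors) can be sketched as follows. Both helpers are hypothetical illustrations, not the books service's code; the attempt count and delay are left to the caller because the thread does not give the real values.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Runs `op` until it succeeds or `max_attempts` is used up, sleeping
/// `delay` between tries. `op` always runs at least once, and every
/// failure is logged rather than swallowed.
fn retry<T, E: std::fmt::Display>(
    what: &str,
    max_attempts: u32,
    delay: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(err) if attempt >= max_attempts => {
                eprintln!("{what}: giving up after {attempt} attempts: {err}");
                return Err(err);
            }
            Err(err) => {
                eprintln!("{what}: attempt {attempt} failed: {err}; retrying");
                attempt += 1;
                sleep(delay);
            }
        }
    }
}

/// Loads cached pages one by one, keeping the ones that load and
/// reporting the ones that do not, instead of failing the whole batch.
/// Returns (loaded, skipped).
fn load_what_we_can<'a>(
    pages: &[&'a str],
    mut load: impl FnMut(&str) -> Result<(), String>,
) -> (Vec<&'a str>, Vec<&'a str>) {
    let (mut loaded, mut skipped) = (Vec::new(), Vec::new());
    for &page in pages {
        match load(page) {
            Ok(()) => loaded.push(page),
            Err(err) => {
                eprintln!("skipping page {page}: {err}");
                skipped.push(page);
            }
        }
    }
    (loaded, skipped)
}
```

A start-up step would wrap its readiness probe in retry and then hand the page list to load_what_we_can, so a slow engine delays the load and a few changed pages reduce it, but neither leaves search silently empty.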
Session 194 complete (short, live-ops only, no code changes).
Goal was to retire the three old test accounts and their machines from the admin screen. This is blocked by a grid-side problem and the session was cut short for another priority.
What happened: the first delete failed because the admin machine's grid-control service had a stale long-lived connection to the grid ("background task closed, restart required"). Restarting that service fixed it cleanly, and the machine list then read fine again. The retry failed differently: cancelling the on-grid contracts is rejected with a 502 gateway error, reproducible in a fresh private browser window. Listing machines (a read from the chain) works, but deleting (which must reach the actual grid node to tear the machine down) does not, so the grid node or its relay looks unreachable, not anything on our side.
State at close: all three test accounts and machines left running and untouched, their contract cancellations stuck pending. No code touched, no commits.
Next session: retry the delete once the grid node and its relay are healthy; if it stays stuck, cancel the contracts directly on the chain since the node teardown cannot complete anyway. Then finish the parked items from last session, the review-and-merge of the search fix and the test access token rotation.
mik-tf referenced this issue 2026-06-03 16:35:20 +00:00
Session s198 complete. The Kimi AI assistant is now part of the standard tester install and was proven driving the planner and whiteboard on the live demo machine: installing a tester brings the assistant up, writes its config, and reaches the planner and whiteboard as tools through the machine's local router, and a live test call created a workspace in each. The assistant uses the tester's own AI key. Voice was confirmed still working. The deployer change is merged. Carried to next session: finish moving voice to publish from the stable branch (a routine build fix on the voice engine plus a catch-up update on the voice app), a one-line fix so the pasted AI key reaches the assistant automatically, re-adding the assistant tile to the dashboard, and the full-stack tester with the documentation libraries and grounded search.
Signed-by: mik-tf mik-tf@noreply.invalid
Session 199 complete. The Kimi assistant now opens and chats on the live demo machine end to end (open the assistant, start a conversation, send a message, and it streams back a real answer). We traced why it had not responded to three deployment setup problems and fixed all three: the app launcher entry and the installer's provider setting are shipped from the stable branch, and the assistant's own two code fixes are ready on a branch and filed for the maintainer at lhumina_code/hero_kimi_rust#4 (with one follow-up to bundle the assistant's setup files into its program). Next: the full-stack tester with the documentation library and grounded search.
Signed-by: mik-tf mik-tf@noreply.invalid
Update and the plan from here. The gateway WebSocket fix and the assistant's setup-file embedding fix are merged to main, so the assistant now opens, is styled, and chats on a deployed machine with no manual staging, and a fresh install gets the same. Next, in order: (1) provision a fresh tester from the stable release and confirm the assistant and demo apps come up working with no hand-fixing (home#244); (2) add the welcome email so testers can be onboarded automatically (home#236); (3) confirm each demo app works standalone, by voice, and driven by the assistant (home#248); (4) the full documentation-library and memory Ask the Librarian stack (home#246).
Session 201 complete. We proved the AI assistant can read and write across the planner, whiteboard and slides apps on a live deployed machine, running on the newest Kimi model, and supported a live demo (fixed the assistant model key and added the slides app to its tools). Both release builds a fresh machine needs are published and verified. Open gap: the admin installer is an older build, so a freshly created machine does not yet get a working assistant, which the next session closes by updating the installer. Filed home#249 for assistant first action speed and the installer gaps.
s202 close (planning and issue session, no code or sandbox changes). The demo on the live tester went well; this session verified, read only, that the deployed assistant is healthy and on the intended model with its tools connected, brought all local repos up to date with main, and captured the new work as issues. Filed: web search and fetch need a Kimi subscription key plus a way for testers to add their own (#250), external viewers cannot open shared whiteboard links (#251), and the provision button shows no loading feedback (#252). Also noted on #249 that the assistant speed fix is merged upstream, and on #248 the plan to deploy the voice update and the orchestrator for the planner voice agent. Next is a short session to apply the Kimi key, deploy the faster assistant build, and switch on the planner voice agent.
Signed-by: mik-tf mik-tf@noreply.invalid
Session 203 complete. Closed a security gap found during demo prep: a tester created through the admin screens by hand was reachable with its login protection off. The tester creation tool now repairs a missing web address and sign-in protection automatically and refuses to finish a tester without sign-in protection, with a one-click repair button in the admin screen. Shipped and deployed to the admin machine, the previously exposed tester was repaired in place (kept, not deleted) and now requires sign-in, and a brand new throwaway tester was created from scratch to confirm the whole flow end to end and then removed. The underlying grid behaviour is logged for the maintainers at lhumina_code/hero_compute#133, the detail is tracked at #253, and a later idea for live video meetings is at #254. Next session resumes the planned assistant polish (search, speed, voice).
Signed-by: mik-tf mik-tf@noreply.invalid
Session 204 is complete. We finished the planned polish of the demo assistant on the live test machine: it is faster on its first action, it can now answer with live web search (pointed at the subscription search service, key kept off disk), and you can speak to the planning assistant and hear it answer back. We also removed a leftover test item and added a spinner and label to the admin tester-action buttons so they no longer look frozen. A person verified all three user-facing results by hand: web search, voice and the spinner all work. Small follow-ups for next time: rotate the shared search key after the demo, save the assistant's web-search settings into the tester setup tool so freshly built machines get them automatically, and refresh a couple of later release builds so other machines match the live one. Next up is opening shared board links without a sign-in wall, and letting each tester add their own assistant key.
Signed-by: mik-tf mik-tf@noreply.invalid
Session 206 complete. Planning plus a quick live fix, no new product code. We locked the plan to simplify tester onboarding: drop the SSH-key requirement for testers (it is only for terminal access, which a normal tester never uses, and sign-in is by ordinary account anyway), and allow reusing an existing account instead of always creating a new one. This is safe because sign-in reads only a person's basic profile, not their private projects, and a tester with no terminal access cannot read the AI keys we preload. The full plan and the code locations are written up in #247. Separately we brought one existing test machine up to match the main demo machine: same fast assistant with web search and working keys, able to drive the planner, slides and whiteboard and use voice. Next: implement the simpler onboarding, then the welcome email.
Session 207 complete. The simpler tester onboarding is built, tested, merged, and live for newly created machines: no SSH key is required, an existing account can be reused, and there is a friendlier page for someone who signs in without access. The released builds for the deployer and the gateway refreshed successfully, and existing running machines were unaffected (the live test machines still respond). One honest caveat was recorded for a later proper fix: a determined tester could read the shared AI keys through the on-machine assistant, accepted for now because those keys are limited and are rotated after each demo. Next: the welcome email.
Signed-by: mik-tf mik-tf@noreply.invalid
Roadmap from here, one session at a time in order from the next. Ordering: onboarding first, then admin confidence, then the impressive app features, then extra apps as backlog. (Flip the admin and app blocks if a demo prioritises the apps.)
7 and 8. The full documentation-library stack (Ask the Librarian): books, memory and grounded search, then browsable books with AI summary and voice (#246).
Backlog, after the core, all the same add-an-app pattern: the research app (#243), the full Hero OS desktop and the orchestrator (#257), and real-time audio and video meetings (#254).
Signed-by: mik-tf mik-tf@noreply.invalid
Session 209 complete. Shipped the whole-domain sign-in gate (hero_proxy, commit 17f1f68 on main): the bare address, the service dashboard, and the metadata endpoints now require Forge sign-in on any gated machine, with only the health and login-callback paths public. Proven live on the admin machine and all three running test machines (bare address went from a 200 to a sign-in redirect; health still responds; the dashboard apps still work for signed-in users). Welcome email improved (hero_os_tfgrid_deployer, commit b24c757 on main): one email at ready states the username and the temporary password for a new account, or the existing password otherwise, and links to the cockpit app; any account name (capitals or dashes) now onboards. The setup runbook was updated (home, commit d417ab3) with the email service key step and the whole-address gate behaviour, and this issue's admin-panel item was extended to let an admin set the email config from the dashboard. A fresh machine reproduces all of this from the main branch automatically; only the email key is a manual setup step, now documented. Remaining for next session: an end-to-end check of the welcome email on a freshly created machine, and dashboard visibility plus a reliable one-click update for keeping each machine current.
Signed-by: mik-tf mik-tf@noreply.invalid
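For readers who want the gate rule from session 209 in concrete form, here is a minimal sketch of the decision it describes: everything on a gated machine requires sign-in except the health and login-callback paths. This is not the hero_proxy code. The path strings and the function name are assumptions made for illustration only; the source names the two public paths but not their URLs.

```python
# Hypothetical path names: the notes only say "health" and "login-callback"
# are public, not what their actual URLs are.
PUBLIC_PATHS = ("/health", "/auth/callback")


def gate_decision(path: str, signed_in: bool) -> str:
    """Return "allow" or "redirect" for one request on a gated machine."""
    # A path is public only if it is exactly a public path or sits below it,
    # so "/healthz" does not slip through by sharing a prefix with "/health".
    is_public = any(path == p or path.startswith(p + "/") for p in PUBLIC_PATHS)
    if is_public or signed_in:
        return "allow"
    # Everything else, including the bare address, goes to sign-in.
    return "redirect"
```

Under these assumptions, an anonymous request for the bare address `/` is redirected to sign-in, while `/health` still answers, matching the behaviour reported above.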
Session 212 complete: one-click onboarding shipped and proven.
The admin users page now has a single "Add and set up" action that creates or registers the person, provisions their machine, and installs the stack in one go, with staged progress (adding, provisioning, installing, ready) and a ready-to-copy sign-in link at the end. A brand-new account's one-time password is surfaced inside the flow. The previous separate buttons stay on the person's page as the resume path for a failed or deferred step. We also added node capacity awareness: the form shows how many more testers fit on the host before you click, and a full or offline host is refused up front rather than half-building a machine and leaving a stuck contract.
Proven end to end on the live system: a throwaway test tester was added through the new button and went all the way to ready (its address required signing in, welcome email sent, build version recorded); the free capacity readout tracked the host dropping by one tester and returning after teardown; the throwaway was removed cleanly and the three real test machines were untouched.
This closes #255. Next: setting the email service key from the dashboard.
Signed-by: mik-tf mik-tf@noreply.invalid
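The capacity check from session 212 (refuse a full or offline host before provisioning starts) can be sketched roughly as below. This is an illustration, not the deployer's implementation: `HostStatus`, `HostUnavailable`, and `precheck_host` are hypothetical names, and how the real system measures free capacity is not shown here.

```python
from dataclasses import dataclass


@dataclass
class HostStatus:
    """Hypothetical snapshot of the shared host, as the form might see it."""
    online: bool
    free_slots: int  # how many more testers fit on the host


class HostUnavailable(Exception):
    """Raised before any provisioning work starts."""


def precheck_host(host: HostStatus) -> int:
    """Refuse up front; return the remaining capacity if the host is usable."""
    if not host.online:
        raise HostUnavailable("host is offline")
    if host.free_slots <= 0:
        raise HostUnavailable("host is full")
    return host.free_slots
```

The point of checking first is that a refusal costs nothing, whereas a failure partway through provisioning can leave a half-built machine and a stuck contract.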
Session 214 complete. Shipped to the admin dashboard and live on the test admin machine: per-provider default assistant keys with a per-tester choice of which to apply (or none, for bring-your-own-key), per-tester and instance-wide on/off for the welcome email, editable welcome email wording with a "send a test copy" preview, a proper top menu with a dedicated Settings page and a Manual page, and per-tester key checkboxes that grey out providers with no key set. Filed the next steps: #259 (assistant and voice on the tester and admin screens), #260 (dashboard polish), and #261 (manage the assistant key centrally).
Signed-by: mik-tf mik-tf@noreply.invalid
Session 215 complete. The administrator dashboard's access and key controls are done: managing the team's support sign-in keys from Settings, and a per-tester "Access and keys" panel for choosing who may sign in to a tester's machine (with the administrators and the tester always kept on the list so no one can be locked out), plus an optional, off-by-default command-line key for a technical tester. Shipped to the stable branch and verified on the live admin machine. Next up is putting the assistant and voice on the tester and admin screens.
Session 218 complete. Built the onboarding-reliability fix (wait for a fresh machine's network before installing, set up the login gate directly and reliably, retry racy service starts) on a work branch and installed it on the admin machine, but held it back from the shared branch because we could not prove it live this session: a freshly published version of the core tooling currently cannot start the services on any brand-new machine (see #268, urgent), and despiegk's rebuild is separately blocked by a leftover network-name registration from a prior teardown. Also filed #266 (keep machines pre-built and ready) and #267 (show the setup log in the dashboard). The four real test machines are untouched and working.
Signed-by: mik-tf mik-tf@noreply.invalid
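One part of the session 218 reliability fix is "retry racy service starts". As a rough sketch of that idea only, the helper below retries a start attempt a bounded number of times with a growing pause. The function name, the attempt count, and the delays are assumptions; the actual fix on the work branch may look quite different.

```python
import time
from typing import Callable


def start_with_retry(
    start: Callable[[], bool],
    attempts: int = 5,
    delay_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call start() until it reports success or attempts run out."""
    for attempt in range(1, attempts + 1):
        if start():
            return True
        # Pause a little longer after each failure, but not after the last one.
        if attempt < attempts:
            sleep(delay_s * attempt)
    return False
```

Passing `sleep` as a parameter keeps the helper easy to exercise without real waiting, for example `start_with_retry(try_start, sleep=lambda s: None)`, where `try_start` is whatever callable attempts the service start.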
Session 219 complete: the sandbox now fully builds and publishes from the stable branch (all eleven services), the urgent setup breakage is fixed and closed (#268), the held setup-reliability fix is merged, and the voice service is rescued and rebuilding from stable. Proven by a throwaway end-to-end setup (ready, login gate, voice answering). Status note: #269.
Signed-by: mik-tf mik-tf@noreply.invalid
Session 220 complete. Fixed the login crash that blocked every new tester (gateway security-library setup, lhumina_code/hero_proxy#60), rebuilt despiegk, and shipped three more stable-branch fixes (auto-compute the web address on a missing report, build server green again, code-generator parallel-build race). A brand-new test signup came fully live in about nine minutes. Login now reaches the final identity-check step and fails one layer later, which is the first task next session (lhumina_code/hero_proxy#61). Also filed: internal-network sign-in gap (#271), slow fresh-machine route (#272), capacity over-count (lhumina_code/hero_os_tfgrid_deployer#21), unique gateway names (lhumina_code/hero_os_tfgrid_deployer#22).
Session 221 complete. Tester login now works end to end (fixed the three stacked login-token problems, both login issues closed, proven with a real sign in). Both assistant keys now auto-provision to every new tester and were added to the existing four. despiegk's machine was rebuilt cleanly with the fix and both keys (account untouched); his public web link is still blocked by a grid network routing problem on ThreeFold's side, filed with the ThreeFold team and left running for them to inspect. Roadmap re-planned: harden the single host first (unique gateway names, honest capacity, a small pool of ready machines), then per-app verification, with multi-host as the explicit next scaling step since the current host is nearly full.