[deployer] Pre-warm a pool of tester VMs so onboarding is fast and reliable #266
Reference
lhumina_code/home#266
The demo runs on a dedicated node we already pay for, so leaving virtual machines idle on it costs nothing extra. Today we create a tester VM on demand each time we add someone, which makes a person wait while a brand-new machine boots and joins the network. Sometimes that network route never comes up in time and the install fails outright. Instead, we could pre-provision a pool of tester VMs up front, each already booted with the admin SSH keys and left ready. A periodic health check would ping each pool machine to confirm it is reachable, then tear down and recreate any that are unresponsive, so the pool stays known-good.

Adding a tester then becomes preparing their account and running the Hero stack setup on a machine that is already booted and reachable, which takes the slow and flaky part out of the moment someone is actually waiting. A natural follow-up is to also pre-install the Hero binaries on the pool machines so only the per-user configuration runs at assignment, which makes onboarding both reliable and fast.

This needs the machine records to carry a pool and assignment model rather than one machine created per user, and the provision step to split into two operations: create a pool machine, and assign a machine to a user. Capacity should be sized on real placement rather than raw slice counts, and the recreate path needs to handle teardown reliably.
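To make the pool and assignment model concrete, here is a minimal sketch in Python. It is not the deployer's actual code: `WarmPool`, `PoolMachine`, `State`, and the injected `create`, `destroy`, and `is_reachable` callables are all hypothetical names standing in for whatever the deployer really uses. It also assumes the health check only recycles unassigned machines, since tearing down a machine a tester is already using is a separate decision.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Optional


class State(Enum):
    READY = "ready"        # booted with admin SSH keys, not yet assigned
    ASSIGNED = "assigned"  # handed to a tester


@dataclass
class PoolMachine:
    machine_id: str
    state: State = State.READY
    assigned_to: Optional[str] = None


class WarmPool:
    """Hypothetical pool record keeper; VM operations are injected callables."""

    def __init__(
        self,
        create: Callable[[], str],            # boots a VM, returns its id
        destroy: Callable[[str], None],       # tears a VM down
        is_reachable: Callable[[str], bool],  # e.g. a ping or SSH probe
        target_ready: int,                    # how many idle machines to keep
    ) -> None:
        self.create = create
        self.destroy = destroy
        self.is_reachable = is_reachable
        self.target_ready = target_ready
        self.machines: Dict[str, PoolMachine] = {}

    def create_pool_machine(self) -> PoolMachine:
        """First half of the split provision step: create a pool machine."""
        machine = PoolMachine(self.create())
        self.machines[machine.machine_id] = machine
        return machine

    def assign(self, user: str) -> PoolMachine:
        """Second half: assign a ready, reachable machine to a user."""
        for machine in self.machines.values():
            if machine.state is State.READY and self.is_reachable(machine.machine_id):
                machine.state = State.ASSIGNED
                machine.assigned_to = user
                return machine
        raise RuntimeError("no ready machine in the pool")

    def health_check(self) -> List[str]:
        """Recreate unresponsive idle machines, then top up to the target.

        Returns the ids of the machines that were torn down.
        """
        replaced = []
        for machine in list(self.machines.values()):
            if machine.state is State.READY and not self.is_reachable(machine.machine_id):
                self.destroy(machine.machine_id)
                del self.machines[machine.machine_id]
                replaced.append(machine.machine_id)
        ready = sum(1 for m in self.machines.values() if m.state is State.READY)
        for _ in range(self.target_ready - ready):
            self.create_pool_machine()
        return replaced
```

In this sketch the slow step (`create`) only ever runs inside `health_check`, which would be called periodically, so `assign` never waits on a boot. The per-user account preparation and Hero stack setup would run after `assign` returns and are deliberately left out.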
Signed-by: mik-tf mik-tf@noreply.invalid
One refinement on the golden-image follow-up mentioned above: pre-installing the binaries on pool machines is probably not worth it. The binaries account for only about two minutes of the install, and main and development rebuild often, so a pre-baked image would go stale quickly for marginal gain. The warm pool on its own already takes onboarding from around twenty minutes down to a few minutes by removing the brand-new machine boot and network wait. So let's do the warm pool first and only revisit pre-baking if we find a clean way to keep it fresh.
Signed-by: mik-tf mik-tf@noreply.invalid