Phase 7 — VM allocation + PoolAssignmentProvisioner #8

Open
opened 2026-05-21 12:51:15 +00:00 by mik-tf · 0 comments
Owner

Tracks the Phase 7 work in hero_onboarding#1 — the first product the user can actually buy with their accumulated credit balance: a VM allocated from a pre-provisioned operator-managed pool.

Phase 6 / Phase 7 naming note: the meta-issue §Phase 6 bundles three things — Idenfy KYC integration + PoolAssignmentProvisioner + production keys. In practice this split across two sessions:

  • Phase 6a = Idenfy KYC integration, shipped in s2-008 (hero_onboarding#7, commit 360a942, D-15)
  • Phase 6b = PoolAssignmentProvisioner + VM allocation, this issue (s2-009)

The "production keys" half stays deferred until both sandbox loops are operator-validated end-to-end.

Scope (v0)

A single in-house PoolAssignmentProvisioner impl behind a Provisioner async-trait. v1 (TfchainAutoDeployProvisioner) is a deliberate scope cut from this session — it depends on hero_compute#116 gap closure (Mahmoud's wait_vm_ready, vm_exec streaming clarification, metadata field, deployer auth model) and on Kristof's hero_os_tfgrid_deployer arc landing first. The v0 ships the trait surface + an impl that operates against an operator-pre-provisioned pool, so v1 slots in as a pure additional impl without touching the trait or the gate logic.

Locked trait shape

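use async_trait::async_trait;

/// Provisioning backend. v0 ships PoolAssignmentProvisioner; v1 adds TfchainAutoDeployProvisioner.
#[async_trait]
pub trait Provisioner: Send + Sync {
    fn name(&self) -> &'static str;
    fn is_demo(&self) -> bool;
    /// Identity is the user's sid, not &User (deviation 1 below).
    async fn allocate(&self, user_sid: &str, plan: Plan) -> Result<VmAllocation>;
    async fn release(&self, allocation_sid: &str) -> Result<()>;
    /// Included now so callers can poll once allocate() returns early in v1 (deviation 2 below).
    async fn status(&self, allocation_sid: &str) -> Result<AllocationStatus>;
}

/// What the user is buying in a single allocation.
pub struct Plan {
    pub cost_cents: i64,
    pub ssh_pub_key: Option<String>,
    pub duration_hours: u32,
}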

Two deliberate deviations from the meta-issue §Provisioning sketch:

  1. Identity is user_sid: &str, not &User. Mirrors the two existing provider traits (PaymentProvider::create_top_up(user_sid, ...) in crates/hero_onboarding_server/src/payment.rs#L92, KycProvider::start_session(user_sid, ...) in crates/hero_onboarding_server/src/kyc.rs#L108). Keeps the trait file decoupled from the schema crate; if an impl needs richer user data it fetches from OSIS.
  2. status() included now, not deferred. v0 PoolAssignment status is a trivial OSIS read, but v1 Tfchain allocate() returns early and needs polling — pre-baking the surface means v1 is a pure impl-side change.

Decision lock + revisability notes will be captured in D-17 at session close.
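
To make deviation 2 concrete, here is a minimal sketch of the polling loop that status() enables. It is a synchronous stand-in, not the async trait itself; SlowProvisioner, polls_until_active and wait_until_active are hypothetical names invented for illustration, and the "becomes Active after N polls" behavior is an assumption about how a v1 backend might look from the caller's side.

```rust
// Illustrative only: a synchronous stand-in for the async status() surface.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AllocationStatus {
    Requested,
    Active,
    Suspended,
    Released,
}

/// Hypothetical provisioner whose allocation only becomes Active after
/// a number of status() polls (mimicking an early-returning allocate()).
pub struct SlowProvisioner {
    polls_seen: std::cell::Cell<u32>,
    polls_until_active: u32,
}

impl SlowProvisioner {
    pub fn new(polls_until_active: u32) -> Self {
        Self {
            polls_seen: std::cell::Cell::new(0),
            polls_until_active,
        }
    }

    /// Counts this call as one poll and reports Active once enough polls happened.
    pub fn status(&self, _allocation_sid: &str) -> AllocationStatus {
        let seen = self.polls_seen.get() + 1;
        self.polls_seen.set(seen);
        if seen >= self.polls_until_active {
            AllocationStatus::Active
        } else {
            AllocationStatus::Requested
        }
    }
}

/// Polls status() at most `max_polls` times and returns the last status seen.
pub fn wait_until_active(p: &SlowProvisioner, sid: &str, max_polls: u32) -> AllocationStatus {
    let mut last = AllocationStatus::Requested;
    for _ in 0..max_polls {
        last = p.status(sid);
        if last == AllocationStatus::Active {
            break;
        }
    }
    last
}
```

With the v0 pool impl the first poll would presumably already report the final state, so the same caller code should work for both backends.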

v0 PoolAssignmentProvisioner

Reads the operator's pre-provisioned VM pool from a JSON source. Two modes:

  • Production: VM_POOL_JSON hero_proc secret (context onboarding) containing Vec<PoolVm { vm_id, ssh_address, mycelium_address?, status }>. Operator pre-populates; assignments mutate via OSIS (the JSON source remains the operator's truth for VM identity, OSIS is the assignment ledger).
  • Demo: PROVISIONER_DEMO=true loads an in-memory 3-row pool — same escape-hatch pattern as IDENFY_DEV_MODE=true/DEMO_KYC=true (s2-008). Keeps scripts/smoke_vm_allocate.sh self-contained.

Allocation algorithm: linear scan for first status=available row, mark assigned, persist VmAllocation row in OSIS, return.
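
The demo pool and the linear scan described above can be sketched as follows. This is a rough illustration, not the shipped code: the PoolStatus enum, the demo VM ids and the 192.0.2.x documentation addresses are assumptions, and persisting the VmAllocation row in OSIS is left out.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum PoolStatus {
    Available,
    Assigned,
}

/// Mirrors the PoolVm fields named in the issue; field types are assumed.
#[derive(Debug, Clone)]
pub struct PoolVm {
    pub vm_id: String,
    pub ssh_address: String,
    pub mycelium_address: Option<String>,
    pub status: PoolStatus,
}

/// Hypothetical in-memory 3-row pool, as PROVISIONER_DEMO=true might load it.
pub fn demo_pool() -> Vec<PoolVm> {
    (1..=3)
        .map(|i| PoolVm {
            vm_id: format!("demo-vm-{}", i),
            ssh_address: format!("192.0.2.{}:22", i),
            mycelium_address: None,
            status: PoolStatus::Available,
        })
        .collect()
}

/// Linear scan: marks the first available row as assigned and returns its vm_id.
/// Returns None when every row is already assigned (pool exhausted).
pub fn claim_first_available(pool: &mut [PoolVm]) -> Option<String> {
    let vm = pool
        .iter_mut()
        .find(|vm| vm.status == PoolStatus::Available)?;
    vm.status = PoolStatus::Assigned;
    Some(vm.vm_id.clone())
}
```

A linear scan is O(n) per allocation, which seems a reasonable trade-off for a small operator-managed pool.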

New routes (server)

| Method | Path | Notes |
|---|---|---|
| POST | /vm/allocate | Cookie-auth → KYC re-gate (defense-in-depth, mirrors D-15) → balance pre-flight (Billing.credit_balance_cents >= plan.cost_cents → 402 if not) → provisioner.allocate() → billing.credit_balance_cents = saturating_sub(cost_cents) in same OSIS write. Returns 303 → /dashboard. |
| POST | /vm/release/{sid} | User releases their own allocation. |
| GET | /vm/list | User's own allocations (JSON). |
| GET | /admin/list-allocations | Admin-secret-gated JSON feed (mirror of s2-008 /admin/list-kyc-sessions). |
| POST | /admin/release/{sid} | Admin force-release (for stuck/abandoned allocations). |

Credit deduction (load-bearing): this is the FIRST request-time decrement of Billing.credit_balance_cents. Until now the balance has only been incremented via Stripe/ClickPesa webhooks (s2-004/5) and decremented via the aggregator's batched cron-time saturating_sub (s2-007 usage_aggregate.rs). Phase 7 is the first user-action-time decrement — smoke must exercise the live decrement path (the s2-007 deferred Stripe-sandbox-seed assertion lands here).
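
The gate order on POST /vm/allocate and the request-time decrement can be sketched as a pure function. This is an assumption-laden illustration: AllocateOutcome, allocate_gate and the boolean inputs are hypothetical names, the balance is assumed to be an i64 like Plan::cost_cents, and the status codes are taken from the route table and the smoke-check names in this issue.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum AllocateOutcome {
    /// KYC re-gate failed: 303 redirect to /kyc/start?reason=vm_allocate.
    KycRequired,
    /// Balance pre-flight failed: 402.
    InsufficientCredit,
    /// No available row in the pool: 503.
    PoolExhausted,
    /// Success: 303 redirect to /dashboard, with the decremented balance.
    Allocated { new_balance_cents: i64 },
}

impl AllocateOutcome {
    pub fn http_status(&self) -> u16 {
        match self {
            AllocateOutcome::KycRequired => 303,
            AllocateOutcome::InsufficientCredit => 402,
            AllocateOutcome::PoolExhausted => 503,
            AllocateOutcome::Allocated { .. } => 303,
        }
    }
}

/// Applies the gates in the documented order: KYC, then balance, then pool.
pub fn allocate_gate(
    kyc_approved: bool,
    balance_cents: i64,
    cost_cents: i64,
    pool_has_free_vm: bool,
) -> AllocateOutcome {
    if !kyc_approved {
        return AllocateOutcome::KycRequired;
    }
    if balance_cents < cost_cents {
        return AllocateOutcome::InsufficientCredit;
    }
    if !pool_has_free_vm {
        return AllocateOutcome::PoolExhausted;
    }
    // The pre-flight above guarantees balance >= cost, so the result is >= 0.
    AllocateOutcome::Allocated {
        new_balance_cents: balance_cents.saturating_sub(cost_cents),
    }
}
```

One detail worth keeping in mind: on a signed i64, saturating_sub clamps at i64::MIN rather than at zero, so it is the balance pre-flight, not the saturating arithmetic, that keeps the balance non-negative here.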

Schema additions

New crates/hero_onboarding_schema/schemas/onboarding/vm_allocation.oschema:

  • AllocationStatus enum: requested | active | suspended | released
  • VmAllocation rootobject: sid + user_sid @index + provisioner_kind + status: AllocationStatus + assigned_pool_vm_id? + tfchain_contract_id? (v1 placeholder) + cost_cents + ssh_pub_key? + expires_at: u64 + created_at: u64
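
For orientation, a hand-written Rust mirror of the two schema items might look roughly like this. It is a sketch only: the real types are presumably generated from the .oschema file, every field type other than the two u64 timestamps is an assumption, and expiry_from (including its Unix-seconds interpretation) is a hypothetical helper that does not appear in the issue.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AllocationStatus {
    Requested,
    Active,
    Suspended,
    Released,
}

impl AllocationStatus {
    /// Lower-case names as listed in the schema enum.
    pub fn as_str(self) -> &'static str {
        match self {
            AllocationStatus::Requested => "requested",
            AllocationStatus::Active => "active",
            AllocationStatus::Suspended => "suspended",
            AllocationStatus::Released => "released",
        }
    }

    pub fn parse(s: &str) -> Option<Self> {
        match s {
            "requested" => Some(AllocationStatus::Requested),
            "active" => Some(AllocationStatus::Active),
            "suspended" => Some(AllocationStatus::Suspended),
            "released" => Some(AllocationStatus::Released),
            _ => None,
        }
    }
}

/// Field names follow the schema bullet; String/i64 types are assumed.
#[derive(Debug, Clone)]
pub struct VmAllocation {
    pub sid: String,
    pub user_sid: String,
    pub provisioner_kind: String,
    pub status: AllocationStatus,
    pub assigned_pool_vm_id: Option<String>,
    pub tfchain_contract_id: Option<String>, // v1 placeholder
    pub cost_cents: i64,
    pub ssh_pub_key: Option<String>,
    pub expires_at: u64,
    pub created_at: u64,
}

/// Hypothetical helper: expiry as created_at plus the plan duration,
/// assuming both timestamps are Unix seconds.
pub fn expiry_from(created_at: u64, duration_hours: u32) -> u64 {
    created_at.saturating_add(u64::from(duration_hours) * 3600)
}
```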

User schema unchanged. 7 trigger stubs (*_trigger_new_post/get_pre/get_post/list_pre/save_pre/save_post/delete_pre) added in rpc.rs per s2-007 Phase B finding 1. Avoid updated_at: otime / created_at: otime field names per s2-007 Phase B finding 2 + s2-008 Phase B finding 2 collision rule.

Dashboard surface

Flip the placeholder "Active services" card into a real per-allocation table, with an Allocate VM button shown when credit_balance_cents >= default_plan.cost_cents. Two redirect cycles come with it: insufficient-credit → /dashboard?error=insufficient_credit, and KYC-required → /kyc/start?reason=vm_allocate.

Admin surface

New Allocations view in hero_onboarding_admin, appended to the nav (Overview / Users / Payments / Aggregator / KYC / Allocations). Per-allocation table populated via reqwest from the server's /admin/list-allocations (no second OSIS handle on the admin process, per the s2-007 architectural rule). Manual force-release button.

Acceptance

Mirrors prior phases:

  • cargo check across the workspace
  • cargo test (expect ~45/45 unit tests = 39 carry-over + ~5 provisioner + 1 schema CRUD auto for VmAllocation)
  • lab build --release --install --workspace VICTORY 3/3
  • lab infocheck 3/3 clean
  • cargo fmt --check
  • cargo clippy --workspace --all-targets -- -D warnings clean
  • scripts/smoke_vm_allocate.sh ~25/25 GREEN against PROVISIONER_DEMO=true + IDENFY_DEV_MODE=true

Smoke checks: allocate-happy / insufficient-credit-402 / KYC-required-303 / pool-exhausted-503 / double-allocate-blocked / release-then-re-allocate / admin-list-allocations / admin-force-release / live-balance-decrement (Stripe-sandbox webhook self-sign seeds the balance — addresses the s2-007 deferred live-balance-decrement assertion).

Open questions surfaced to Kristof/Emre

  • Q#6 (pool size for v0): how many VMs does ops pre-deploy for launch? Affects demo-day pool exhaustion behavior. v0 PROVISIONER_DEMO uses 3 rows; production VM_POOL_JSON content is operator-defined.
  • Q#8 (refunds): out of v0 scope. release() does NOT credit cost back to balance — this is a deliberate Phase 7 scoping choice; refund logic lives in a future ops-tooling phase. Smoke documents this behavior.
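
The no-refund behavior in Q#8 can be illustrated with a toy ledger. This is a sketch under stated assumptions: DemoLedger and its counters are hypothetical, and returning the VM to the pool on release is inferred from the release-then-re-allocate smoke check rather than stated outright.

```rust
/// Toy model of one user's balance plus pool counters (illustrative only).
pub struct DemoLedger {
    pub credit_balance_cents: i64,
    pub free_vms: u32,
    pub active_allocations: u32,
}

impl DemoLedger {
    /// Charges the plan cost up front; fails if credit or pool capacity is missing.
    pub fn allocate(&mut self, cost_cents: i64) -> bool {
        if self.free_vms == 0 || self.credit_balance_cents < cost_cents {
            return false;
        }
        self.free_vms -= 1;
        self.active_allocations += 1;
        self.credit_balance_cents -= cost_cents;
        true
    }

    /// Frees the VM but deliberately leaves the balance untouched (no refund in v0).
    pub fn release(&mut self) -> bool {
        if self.active_allocations == 0 {
            return false;
        }
        self.active_allocations -= 1;
        self.free_vms += 1;
        true
    }
}
```

Under this model a release followed by a new allocation charges the plan cost a second time, which is the behavior the smoke run is meant to document.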

Cross-references

  • Meta-issue: hero_onboarding#1 (https://forge.ourworld.tf/lhumina_code/hero_onboarding/issues/1) §Provisioning + §Phase 6
  • Previous phase: hero_onboarding#7 Phase 6 KYC (https://forge.ourworld.tf/lhumina_code/hero_onboarding/issues/7); gate site pattern at payment_intent_post
  • v1 dependency: hero_compute#116 (https://forge.ourworld.tf/lhumina_code/hero_compute/issues/116), Mahmoud's deployer-integration gaps (deliberately out of v0 critical path)
  • Track A demo-deployer arc: hero_cockpit#1 (https://forge.ourworld.tf/lhumina_code/hero_cockpit/issues/1) + hero_os_tfgrid_deployer#1 (https://forge.ourworld.tf/lhumina_code/hero_os_tfgrid_deployer/issues/1); zero file overlap with this issue's scope.
  • Decision artifact: D-17 (locked at session close)

Edit 2026-05-21 (post-Track-A-s135 race resolution): All references to D-16 in this issue body have been re-numbered to D-17. Track A's s135 minted D-16 (cockpit-byok-user-forge-token-namespacing) 40 minutes before this issue was filed; per CLAUDE.md ID-NN race rule (first-minted wins), Track A keeps D-16 and this trait-shape decision lock is now D-17. The decision file is decisions/D-17-provisioner-trait-shape.md in the workspace (the file was renamed by Track A at /stop after the squash-merge race was resolved).
Reference
lhumina_code/hero_onboarding#8