Hero onboarding website + billing backend — full plan #1

Open
opened 2026-05-20 15:54:04 +00:00 by mik-tf · 3 comments
Owner

Hero onboarding website + billing backend — full plan

Source: meeting Kristof + Emre, 2026-05-20.

This issue captures the full plan for the onboarding website (login + payment + dashboard) and the billing backend (per-node forge billing-record repos + centralized aggregator). It is intentionally one meta-issue rather than many small ones — the parts are tightly coupled and the cross-cutting design decisions need a single review surface.

Sub-tasks are listed under "Phased delivery" below. Each phase will get its own follow-up issue once this meta-plan is agreed.


Scope

In scope (this repo):

  • (A) Onboarding website — mycelium-address login, payment (Stripe + ClickPesa, sandbox first then live), dashboard with credit balance + active services + usage breakdown.
  • (B) Billing backend — usage logging on every Hero node, per-node Forge repos that hold the billing records, centralized aggregator service that pulls all node repos and consolidates per-user balances.

Out of scope (tracked elsewhere):

  • (C) Common-services migration: hero_embedder / hero_aibroker / hero_voice / hero_proxy moving from per-user to multi-tenant common hosts attached over Mycelium. Owners: embedder / voice / aibroker / proxy teams.
  • (D) macOS CI runner: needed for full-matrix builds; needs an owner who keeps a Mac up and running. Operational, not code.
  • Auto-deploy of the underlying VM — when a user pays, the VM itself may come from a pre-deployed pool (v0) or be spun up via TFchain (v1+). The provisioner mechanism is abstracted (see "Architecture > Provisioning"); the actual TFchain auto-deploy work is a separate effort.

Pricing model (from meeting notes)

| Resource | Price |
|---|---|
| VMs (2 GB each) | $10 / month for 5 machines · $20 / month for 10 machines |
| Hero OS per-instance | $0.10 / hour + 10% margin |
| LLM passthrough (OpenRouter / Groq / etc.) | provider list price + 10% margin |

Discount ladder (applies to Hero OS hourly + LLM):

  • After 1 week of continuous usage: −50%
  • After 1 month of continuous usage: an additional −50% on top of the first discount
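As a worked illustration of the ladder arithmetic, here is a minimal sketch. It assumes the 10% margin is added to the list price first and the discount then applies to the marked-up price, and it reads "an additional −50%" as multiplicative (25% of the marked-up price at the month tier). The names `DiscountTier` and `effective_rate_micro_usd` are hypothetical, and prices are held in micro-USD to avoid float rounding.

```rust
/// Position on the discount ladder (hypothetical type, not from the schema).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum DiscountTier {
    Base,  // no discount yet
    Week,  // after 1 week: -50%
    Month, // after 1 month: a further -50% on top
}

/// Hero OS list price from the table: $0.10 per hour, in micro-USD.
pub const HERO_OS_BASE_MICRO_USD_PER_HOUR: u64 = 100_000;

/// Adds the 10% margin, then applies the ladder.
/// Assumption: margin first, discount second.
pub fn effective_rate_micro_usd(base: u64, tier: DiscountTier) -> u64 {
    let with_margin = base * 110 / 100;
    match tier {
        DiscountTier::Base => with_margin,
        DiscountTier::Week => with_margin / 2,
        DiscountTier::Month => with_margin / 4,
    }
}
```

Under these assumptions a Hero OS hour costs $0.11 at full price, $0.055 at the week tier, and $0.0275 at the month tier.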

Auth model

Mycelium-address only. Users authenticate by proving control of a Mycelium address. No email/password fallback in v0.

This matches the hero_login design pattern: every Hero context is mycelium-addressable, so the same identity flows through onboarding, billing, and later service install / start-stop admin actions.

Email/password / OAuth providers can be added later if a real user requirement surfaces — kept out of v0 to avoid surface area we don't yet need.


Architecture

Service shape

  • Website itself ships as a Hero service (kind=web per the canonical lab / hero_proc convention). Deployed via lab + hero_proc on a VM. No fancy orchestration in v0 — single instance, redeploy via the existing lab build --upload + lab build --download --install flow.
  • Workspace layout follows the canonical D-10 / D-11 pattern: hero_onboarding_server/ (web), hero_onboarding_admin/ (admin daemon), hero_onboarding/ (CLI launcher), hero_onboarding_schema/ (oschema definitions), 3× service.toml, build.rs for codegen.

Schema

User data + billing state live in an OSIS schema written in oschema (the existing Hero schema definition language under hero_skills/skills/oschema/). build.rs runs oschema_code_generation to produce Rust types + OSIS handlers + an RPC server.

Entities (v0):

  • User — mycelium address (canonical identifier), display name, created-at, last-active.
  • Billing — credit balance, lifetime paid, current discount tier (week / month).
  • UsageRecord — append-only log of "user X consumed Y units of resource Z at timestamp T on node N." This is the atomic unit pushed to per-node Forge repos.
  • PaymentEvent — Stripe / ClickPesa webhook outcome: provider, amount, currency, status, external-ref, applied-to-user.

Schema is the canonical interface; everything else (RPC, UI, billing aggregator) generates from it.

Per-node billing-record repos (the "real" design, no shortcuts)

Each Hero node — embedder host, aibroker host, hero_proxy host, individual user VM host, etc. — runs a hero_proc cron service that pushes its local usage log to a dedicated per-node Forge repo every hour.

  • One Forge repo per node, under a service account (proposed: hero_ops or hero_deploy — naming TBD; see Open questions).
  • Standard naming so the aggregator can enumerate without manual config (proposed: hero_ops/billing-<node-name> — naming TBD).
  • Cron pushes use the existing lab repo push mechanic (or equivalent).
  • Records are append-only TSV / JSONL files committed in chronological chunks (one file per hour, or rotated by size — TBD in Phase 4 design).
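To make the wire-format discussion concrete, here is a rough sketch of a TSV encoding for one record. The field set and order are assumptions (the plan only fixes the idea of "user X consumed Y units of resource Z at timestamp T on node N" plus a local sequence number), and the sketch assumes no field contains a tab or newline.

```rust
/// One usage record as it might appear on a node (field names are hypothetical).
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct UsageRecord {
    pub node_id: String,
    pub local_seq: u64,
    pub timestamp: u64, // unix seconds
    pub user: String,   // mycelium address
    pub resource: String,
    pub units: u64,
}

// Parses a numeric field and names it in the error message.
fn parse_u64(s: &str, name: &str) -> Result<u64, String> {
    s.parse::<u64>().map_err(|e| format!("bad {}: {}", name, e))
}

impl UsageRecord {
    /// Serializes to a single tab-separated line (no trailing newline).
    pub fn to_tsv_line(&self) -> String {
        format!(
            "{}\t{}\t{}\t{}\t{}\t{}",
            self.node_id, self.local_seq, self.timestamp, self.user, self.resource, self.units
        )
    }

    /// Parses one line; any malformed field is an error, never a silent default.
    pub fn from_tsv_line(line: &str) -> Result<Self, String> {
        let f: Vec<&str> = line.split('\t').collect();
        if f.len() != 6 {
            return Err(format!("expected 6 fields, got {}", f.len()));
        }
        Ok(UsageRecord {
            node_id: f[0].to_string(),
            local_seq: parse_u64(f[1], "local_seq")?,
            timestamp: parse_u64(f[2], "timestamp")?,
            user: f[3].to_string(),
            resource: f[4].to_string(),
            units: parse_u64(f[5], "units")?,
        })
    }
}
```

A strict parser matters here because a parse error is one of the failure modes that must stop the aggregator from advancing its hash.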

Hash-resume semantics (the "HARD part" called out in the meeting):

  • The aggregator tracks the last-consumed commit hash per node-repo.
  • On each poll: git log <last-hash>..HEAD enumerates new commits → new usage records.
  • Successful aggregation advances the hash; failed aggregation does NOT advance — the next poll retries from the same hash.
  • Explicit success / failure modes:
    • Success: every new record consumed, balance updated, hash advanced.
    • Failure: any parse error, any consistency violation (e.g. negative balance attempt without explicit refund), or any duplicate idempotency-key collision → hash stays put, alert emitted, no partial application.
  • Double-charge prevention: every UsageRecord carries an idempotency key (proposed: (node_id, local_seq, timestamp)). The aggregator rejects duplicates.
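The all-or-nothing rule above can be sketched as follows. This is an in-memory illustration only: `Ledger`, `Charge`, and `apply_poll` are hypothetical names, records are assumed to be already parsed and priced, and cloning the ledger stands in for whatever transaction mechanism the central store would actually provide.

```rust
use std::collections::{HashMap, HashSet};

/// Idempotency key as proposed in the plan: (node_id, local_seq, timestamp).
pub type IdemKey = (String, u64, u64);

#[derive(Clone, Debug)]
pub struct Charge {
    pub key: IdemKey,
    pub user: String,
    pub cost_micro_usd: i64,
}

#[derive(Clone, Debug, Default)]
pub struct Ledger {
    pub balances: HashMap<String, i64>,
    pub seen: HashSet<IdemKey>,
    pub last_hash: HashMap<String, String>, // node repo -> last consumed commit
}

#[derive(Debug, PartialEq, Eq)]
pub enum AggregateError {
    DuplicateKey(IdemKey),
    NegativeBalance(String),
}

/// Applies one poll's worth of charges for `repo`.
/// Either every charge lands and the hash advances to `head`,
/// or the ledger is left exactly as it was.
pub fn apply_poll(
    ledger: &mut Ledger,
    repo: &str,
    head: &str,
    charges: &[Charge],
) -> Result<(), AggregateError> {
    let mut staged = ledger.clone(); // work on a copy, commit at the end
    for c in charges {
        if !staged.seen.insert(c.key.clone()) {
            return Err(AggregateError::DuplicateKey(c.key.clone()));
        }
        let bal = staged.balances.entry(c.user.clone()).or_insert(0);
        *bal -= c.cost_micro_usd;
        if *bal < 0 {
            return Err(AggregateError::NegativeBalance(c.user.clone()));
        }
    }
    staged.last_hash.insert(repo.to_string(), head.to_string());
    *ledger = staged;
    Ok(())
}
```

Because nothing is written until the whole batch validates, a retry from the same hash sees the same unseen keys and cannot double-charge.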

Centralized aggregator

A separate Hero service (hero_onboarding_aggregator/ — naming TBD) that:

  1. Enumerates known node-repos (configured list or service-account-repo-pattern discovery).
  2. For each repo: git pull → walk new commits → parse new UsageRecord entries → apply to the corresponding user's Billing row in the central OSIS schema.
  3. Emits aggregation metrics (records consumed, balances updated, errors).
  4. Runs on the same host as the onboarding website in v0 (separate host later when scale demands).
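The enumeration loop in steps 1 and 2 might look roughly like this. The sketch abstracts the git work behind a caller-supplied closure (`poll_one`, a hypothetical name) that receives the repo name and the last consumed hash and returns either the new head hash or a failure reason; how the pull and parse actually happen is left open.

```rust
use std::collections::HashMap;

/// Ok(new head hash) on success, Err(reason) on failure (hypothetical shape).
pub type PollResult = Result<String, String>;

/// Polls each repo once. A failing repo keeps its old cursor and is reported;
/// the remaining repos are still processed.
pub fn poll_all<F>(
    cursors: &mut HashMap<String, String>,
    repos: &[&str],
    mut poll_one: F,
) -> Vec<(String, String)>
where
    F: FnMut(&str, Option<&str>) -> PollResult,
{
    let mut failures = Vec::new();
    for &repo in repos {
        // None means the repo has never been consumed before.
        let last = cursors.get(repo).map(|s| s.as_str());
        match poll_one(repo, last) {
            Ok(head) => {
                cursors.insert(repo.to_string(), head);
            }
            Err(reason) => failures.push((repo.to_string(), reason)),
        }
    }
    failures
}
```

The returned failure list is what would feed the alerting and the error metric in step 3.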

Provisioning (agnostic, plug-in)

Define a Provisioner trait / interface. v0 ships one impl + room for more:

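```rust
/// Abstracts how a paying user gets a VM (pool assignment in v0, TFchain later).
trait Provisioner {
    /// Reserve a resource of the given SKU for `user`.
    fn allocate(&self, user: &User, sku: ResourceSku) -> Result<Allocation, ProvisionError>;
    /// Hand a previously allocated resource back.
    fn release(&self, allocation: &Allocation) -> Result<(), ProvisionError>;
    /// Report the current state of an allocation.
    fn status(&self, allocation: &Allocation) -> Result<AllocationStatus, ProvisionError>;
}
```
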
  • v0 impl: PoolAssignmentProvisioner — ops pre-deploys a pool of VMs (the same way the existing herodemo VM was deployed); the website assigns an unallocated VM from the pool when a user pays.
  • v1+ impl: TfchainAutoDeployProvisioner — full TFchain contract creation on demand, no pre-deployed pool. Separate effort, separate session.

This way the website's payment-success flow and the underlying VM acquisition flow are decoupled; we don't lock in a design that's wrong for v1.
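For illustration, a pool-backed implementation of the trait could be as small as the sketch below. Every supporting type (`User`, `ResourceSku`, `Allocation`, `AllocationStatus`, `ProvisionError`) is a placeholder invented here, since the plan only names them, and the pool is an in-memory list rather than whatever ops will actually maintain.

```rust
use std::sync::Mutex;

// Placeholder types: the plan names them but does not define them.
pub struct User { pub mycelium_address: String }
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ResourceSku { Vm2Gb }
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Allocation { pub vm_id: String, pub owner: String }
#[derive(Debug, PartialEq, Eq)]
pub enum AllocationStatus { Assigned, Released }
#[derive(Debug, PartialEq, Eq)]
pub enum ProvisionError { PoolExhausted, UnknownAllocation }

pub trait Provisioner {
    fn allocate(&self, user: &User, sku: ResourceSku) -> Result<Allocation, ProvisionError>;
    fn release(&self, allocation: &Allocation) -> Result<(), ProvisionError>;
    fn status(&self, allocation: &Allocation) -> Result<AllocationStatus, ProvisionError>;
}

struct Pool { free: Vec<String>, assigned: Vec<Allocation> }

/// Hands out pre-deployed VM ids; the Mutex allows `&self` methods to mutate.
pub struct PoolAssignmentProvisioner { pool: Mutex<Pool> }

impl PoolAssignmentProvisioner {
    pub fn new(vm_ids: Vec<String>) -> Self {
        Self { pool: Mutex::new(Pool { free: vm_ids, assigned: Vec::new() }) }
    }
}

impl Provisioner for PoolAssignmentProvisioner {
    fn allocate(&self, user: &User, _sku: ResourceSku) -> Result<Allocation, ProvisionError> {
        let mut pool = self.pool.lock().unwrap();
        let vm_id = pool.free.pop().ok_or(ProvisionError::PoolExhausted)?;
        let a = Allocation { vm_id, owner: user.mycelium_address.clone() };
        pool.assigned.push(a.clone());
        Ok(a)
    }

    fn release(&self, allocation: &Allocation) -> Result<(), ProvisionError> {
        let mut pool = self.pool.lock().unwrap();
        let pos = pool.assigned.iter().position(|a| a == allocation)
            .ok_or(ProvisionError::UnknownAllocation)?;
        let a = pool.assigned.remove(pos);
        // Assumption: the VM is reusable as-is; a real pool would wipe or retire it.
        pool.free.push(a.vm_id);
        Ok(())
    }

    fn status(&self, allocation: &Allocation) -> Result<AllocationStatus, ProvisionError> {
        let pool = self.pool.lock().unwrap();
        if pool.assigned.contains(allocation) {
            Ok(AllocationStatus::Assigned)
        } else {
            Ok(AllocationStatus::Released)
        }
    }
}
```

The payment-success handler would only ever see `&dyn Provisioner`, so swapping in a TFchain-backed implementation later should not touch the website code.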

Reused in-house components

A neighboring in-house product has already shipped, end-to-end-tested, and proven in production all three external integrations Phase 2 / 3 / 6 need. We lift, we don't reinvent.

  • Stripe + ClickPesa top-up flows — wallet top-up paths covering checkout, webhook signature verification, idempotent payment recording, currency handling, refund mechanics. Implementation includes provider abstractions, SDK models, end-to-end Playwright coverage of the happy path + retry-on-failure + duplicate-webhook protection. Lift posture: SDK models + provider abstractions + webhook handlers transfer cleanly (same wire contracts as Stripe / ClickPesa public APIs). UI components are Dioxus-based and get rewritten into the hero_website_framework Tera-template idiom — the business logic doesn't change, only the presentation layer.
  • Idenfy KYC integration — production-grade flow covering session creation, browser redirect, callback handling, retry-on-failure, KYC-reset-and-re-create, and tier-based access gating. Already-shipped artifacts:
    • kyc.oschema schema — written in the same canonical Hero schema language we're using for hero_onboarding. Direct copy possible at the schema layer; zero rewrite.
    • Backend crate with OpenRPC interface (openrpc.json + rpc.rs), code-generated types (types_generated.rs), and core logic cleanly separated into a core/ module. Transfers cleanly modulo renaming.
    • Frontend Dioxus components (step_kyc_terms.rs + multi-step wizard integration) — same UI impedance-mismatch as Stripe / ClickPesa; logic transfers, UI rewrites.
    • 6+ Playwright e2e scenarios including KYC session creation, retry on failed verification, browser-flow happy path, full reset → re-create → complete arc.
    • Provider: Idenfy. Same vendor account / tenant credentials can be reused (subject to account-level confirmation) or fresh Idenfy credentials provisioned for the Hero tenant.
  • oschema + oschema_code_generation skills (under hero_skills/skills/oschema/) — canonical Hero schema definition language. No separate template to wait on.
  • hero_website_framework: kind=web scaffold with pages, blog, auth, admin.
  • hero_proc cron — built-in scheduled-job mechanism for the hourly billing-record push.
  • lab repo push — for the cron's actual git push to each node-repo.

Phased delivery

Each phase will get its own follow-up issue linked back to this meta-issue once this plan is agreed.

Phase 1 — Schema scaffold + skeleton web service

  • Workspace layout (Cargo.toml, 3 crates, 3 service.toml, build.rs).
  • hero_onboarding_schema/: User, Billing, UsageRecord, PaymentEvent in oschema.
  • hero_onboarding_server/: kind=web skeleton, mycelium-address login, dashboard (balance + placeholder for active services), payment stubbed (button → 200 OK → fake credit applied).
  • hero_onboarding_admin/: kind=admin daemon skeleton.
  • README + service.toml entries follow canonical D-10 / D-11 shape.
  • Acceptance: lab build --release --install clean; lab infocheck clean; lab service hero_onboarding_server --start opens the page; mycelium-address login + dashboard render.

Phase 2 — Stripe sandbox integration

  • Lift Stripe SDK + provider components from the in-house wallet top-up flow.
  • Sandbox keys in hero_proc secrets context (per existing META env convention).
  • Top-up flow: user clicks pay → Stripe Checkout sandbox → webhook → PaymentEvent recorded → Billing.credit_balance incremented.
  • Acceptance: end-to-end sandbox top-up applies a credit visible on the dashboard; webhook idempotency tested (replay does NOT double-credit).
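The replay requirement in the acceptance line can be sketched like this. It is not the in-house implementation being lifted: the struct fields and the choice of (provider, external-ref) as the dedupe key are assumptions, and webhook signature verification is assumed to have happened before this function is called.

```rust
use std::collections::HashMap;

/// Outcome of a provider webhook (field names are hypothetical).
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct PaymentEvent {
    pub provider: String,     // e.g. "stripe" or "clickpesa"
    pub external_ref: String, // provider-side event or payment id
    pub user: String,         // mycelium address
    pub amount_minor: u64,    // smallest currency unit
    pub succeeded: bool,
}

#[derive(Default)]
pub struct Billing {
    pub credit_balance: HashMap<String, u64>,
    // Every event ever recorded, keyed by (provider, external_ref).
    recorded: HashMap<(String, String), PaymentEvent>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum WebhookOutcome {
    Credited,     // first delivery of a successful payment
    RecordedOnly, // first delivery of a failed payment
    Replay,       // already seen: nothing changes
}

impl Billing {
    /// Records the event once and credits the user at most once per event.
    pub fn handle_webhook(&mut self, ev: PaymentEvent) -> WebhookOutcome {
        let key = (ev.provider.clone(), ev.external_ref.clone());
        if self.recorded.contains_key(&key) {
            return WebhookOutcome::Replay;
        }
        let outcome = if ev.succeeded {
            *self.credit_balance.entry(ev.user.clone()).or_insert(0) += ev.amount_minor;
            WebhookOutcome::Credited
        } else {
            WebhookOutcome::RecordedOnly
        };
        self.recorded.insert(key, ev);
        outcome
    }
}
```

Providers may deliver the same webhook more than once, so the check has to sit in the same transaction as the credit; the in-memory map here only illustrates the ordering.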

Phase 3 — ClickPesa sandbox integration

  • Symmetric to Phase 2, sibling provider.
  • Acceptance: same as Phase 2 against ClickPesa sandbox.

Phase 4 — Per-node billing-record push (hero_proc cron + lab repo push)

  • Define the UsageRecord wire format (TSV or JSONL; TBD).
  • Implement a small library / nu module a Hero node embeds to write local usage records.
  • hero_proc cron service that, every hour, commits the new records and pushes to the node's Forge repo.
  • Acceptance: a test node generates synthetic usage records; cron pushes them on schedule; the per-node Forge repo shows hourly commits.

Phase 5 — Centralized aggregator + hash-resume idempotency

  • hero_onboarding_aggregator/ service.
  • Hash-resume logic: per-node last-consumed-commit tracking + safe retry semantics.
  • Idempotency-key enforcement.
  • Failure modes (parse error / consistency violation / dup key) → alert + no partial application.
  • Acceptance: aggregator consumes records from a real test node-repo; balance updates appear in the central schema; failure injection (corrupt a commit) → aggregation halts cleanly on that repo, others continue.

Phase 6 — Production keys, Idenfy KYC, pool-assignment Provisioner, live wiring

  • Stripe + ClickPesa production keys (after sandbox cycles are clean).
  • Idenfy KYC integration — lift from the in-house implementation (see "Reused in-house components" above). Direct copy of kyc.oschema into hero_onboarding_schema/; lift the backend crate (rename + adapt to hero_onboarding's OSIS dispatcher); rewrite the Dioxus wizard step into a Tera template + small JS for the Idenfy browser redirect; reuse the 6+ Playwright e2e scenarios as the acceptance gate. Decision: same Idenfy account or fresh tenant credentials.
  • PoolAssignmentProvisioner impl: ops pre-deploys VMs, the website assigns one on payment success.
  • End-to-end happy-path: user logs in → completes KYC → tops up → an actual VM is assigned + provisioned with their Hero context.

Optional rescope: Idenfy KYC could move earlier (e.g. between Phase 3 and Phase 4) if KYC is a hard gate before any payment; that decision is one of the open questions below. The default keeps it in Phase 6 so that it lands after the sandbox payment flows are clean.


Open questions for Kristof + Emre

  1. Service account name for the per-node billing-record repos: hero_ops / hero_deploy / hero_billing / something else?
  2. Per-node repo naming convention: hero_ops/billing-<node-name> / hero_ops/<node-name>-billing / something else? The aggregator's discoverability depends on this being stable.
  3. UsageRecord wire format: TSV / JSONL / oschema-OTOML? Trade-off: TSV is easiest for git diff, JSONL is easiest for tooling, OTOML matches the rest of the Hero stack.
  4. Idenfy KYC tenancy: reuse the same Idenfy account / tenant credentials as the in-house wallet product, or provision fresh Idenfy credentials for the Hero tenant? Both are technically viable; question is account / billing / branding.
  5. KYC gating order: is KYC a hard gate before any payment (move to Phase 4 or earlier), or after the first top-up (stays in Phase 6)? Affects flow shape.
  6. Pool size for v0: how many VMs does ops pre-deploy for the launch? (Phase 6 question.)
  7. Discount ladder mechanics: is "1 week of continuous usage" measured wall-clock since first payment, or total active hours? Edge case: user pays, doesn't use for 8 days, then uses heavily — do they get the −50%?
  8. Refunds: out of v0 entirely (manual ops intervention), or do we wire a refund path through Stripe / ClickPesa from day 1?
  9. Multi-currency: Stripe defaults to USD; ClickPesa is regional. Do we display unified USD balances or per-currency balances?

Out of scope / follow-up issues to file

  • Common-services migration (embedder / aibroker / voice / proxy → multi-tenant common hosts attached over Mycelium) — separate META.
  • macOS CI runner — operational, needs Mac-equipped owner.
  • TFchain auto-deploy of underlying VMs — replaces the PoolAssignmentProvisioner impl with TfchainAutoDeployProvisioner impl; separate session post-v0.
  • Multi-region / HA aggregator — single instance is fine until scale demands otherwise.

Related repos / patterns

  • hero_website_framework: kind=web skeleton.
  • hero_skills/skills/oschema/ + oschema_code_generation — schema → codegen pipeline.
  • hero_proc — supervision + cron + secrets context.
  • hero_router — service entry / discovery / MCP gateway (the onboarding website registers behind it).
  • hero_login pattern (when / if it exists as a separate crate; otherwise wire mycelium-address auth directly using the documented context + claim format).
  • lab — build / install / publish / repo-push orchestrator used by Phase 4's cron.
Author
Owner

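Cross-link: email / notifications strategy (cross-arc)

Filed on the free-demo arc side: home#236, META Email / notifications strategy (https://forge.ourworld.tf/lhumina_code/home/issues/236), locked at D-20 (decisions/D-20-email-provider-sendgrid.md).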

Decision: SendGrid for all transactional emails originated by either arc. No Resend, no self-hosted SMTP. Provider abstraction (EmailSender trait) wraps the SendGrid impl.

Acceptance criteria for the first hero_onboarding session that ships an email-sending code path are listed in home#236 — pick sender domain, source API key + DNS records operationally, implement the trait + SendGrid impl, smoke-test deliverability, append the chosen domain to D-20 as a follow-up note.

No immediate action: the rule applies whenever a hero_onboarding phase first wires transactional email.

Author
Owner

## Cross-link: two-channels overview

Added a cross-arc overview doc at [home/docs/channels/free-and-paid.md](https://forge.ourworld.tf/lhumina_code/home/src/branch/development/docs/channels/free-and-paid.md) (commit [`bfbf552`](https://forge.ourworld.tf/lhumina_code/home/commit/bfbf552) on the home repo).

Audience: engineers + stakeholders. Walks through the paid commercial product (this issue's scope) and the free testing channel ([home#235](https://forge.ourworld.tf/lhumina_code/home/issues/235)) end-to-end: four UX flows, shared substrate, where the channels touch each other, and explicit out-of-scope per channel.

Not a replacement for this issue. This issue stays the engineering tracker (per-phase status, design decision lockfile, scope splits). The new doc is the cross-arc reader's-eye view this issue intentionally doesn't try to be.
Author
Owner

**Q#7 (discount ladder mechanics): LOCKED pre-session for s2-014 Phase 12** (2026-05-21).

Documenting here so the meta-issue open-Qs section reflects the lock without needing a body edit. Phase 12 implementation (next session) builds against these exact answers.

## Lock summary

**(a) "Continuous usage" measurement** = wall-clock since `Billing.first_paid_at`. No inactivity reset in v0.

- Simplest to ship; easiest to explain in support tickets ("you paid on X, week-tier kicks in on X+7").
- Schema slot `Billing.last_active_at: otime` reserved (cron will write it from the start, but doesn't consume it for tier promotion in v0).
- **Revisability**: flip to inactivity-reset (e.g. >7 days no usage resets the clock) → +15 LOC in the daily cron + a `last_active_at` write at `vm_allocate_post` + the aggregator decrement site.

**(b) Discount applicability** = `hero_os_hourly` + `llm` ONLY; VM cost NEVER discounted.

- Matches the meta-issue's literal interpretation ("applies to Hero OS hourly + LLM passthrough").
- Cost-recovery floor logic: VM allocation is the operator's TFGrid pass-through ($10/month = TFGrid cost recovery). Discounting VM cost would push the user below operator cost.
- Schema addition: new `ResourceCategory` enum on `UsageRecord.applies_to` with variants `hero_os_hourly | llm | vm`; default `hero_os_hourly` for back-compat with the existing rows from Phase 5 aggregator.
- **Revisability**: flip to all-categories-discounted → -20 LOC (drop the `applies_to` enum + use a single multiplier).

**(c) Stacking math** = multiplicative.

- Canonical multiplier table: `none` = ×1.0, `week` = ×0.5, `month` = ×0.25 (total 75% off at month-tier).
- Only valid parse of meta-issue's "additional -50% on top after 1 month"; an additive cap would be a single-tier scheme dressed up as a ladder.
- Implementation: 2-tuple match against `(record.applies_to, billing.discount_tier)` at the `usage_aggregate.rs:367` decrement site.
- **Revisability**: flip to additive cap (max 50% off total) → 1-character change (`0.25` → `0.5` at the `Month` arm).
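
As a worked check of the stacking math in (c), assuming each tier removes 50% of whatever price remains after the previous tier (the symbols $m_{\text{week}}$ and $m_{\text{month}}$ are introduced here for illustration):

$$
m_{\text{week}} = 1 - 0.5 = 0.5, \qquad
m_{\text{month}} = m_{\text{week}} \times (1 - 0.5) = 0.25
$$

So a month-tier user pays 25% of the raw cost, which is the 75% total discount quoted above. Under the additive-cap alternative the month multiplier would instead stay at 0.5.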

## Canonical multiplier table (s2-014 ships this verbatim)

```rust
let multiplier = match (record.applies_to, billing.discount_tier) {
    (ResourceCategory::Vm, _)         => 1.0,    // (b): VM cost never discounted
    (_, DiscountTier::None)           => 1.0,
    (_, DiscountTier::Week)           => 0.5,
    (_, DiscountTier::Month)          => 0.25,   // (c): multiplicative stacking
};
let effective_cost = (record.cost_cents as f64 * multiplier).round() as i64;
billing.credit_balance_cents = billing.credit_balance_cents.saturating_sub(effective_cost);
```
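
To make the table above easier to check by hand, here is a self-contained sketch that wraps the same match in two free functions and applies it to concrete numbers. The enum definitions and the function names `multiplier` and `apply_usage` are assumptions for illustration; the real `ResourceCategory` and `DiscountTier` types come from the generated schema and may differ.

```rust
/// Assumed shape of the schema enum; variant names mirror `hero_os_hourly | llm | vm`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ResourceCategory {
    HeroOsHourly,
    Llm,
    Vm,
}

/// Assumed shape of the tier enum used by the match in the canonical table.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DiscountTier {
    None,
    Week,
    Month,
}

/// Same arms, in the same order, as the canonical table.
pub fn multiplier(applies_to: ResourceCategory, tier: DiscountTier) -> f64 {
    match (applies_to, tier) {
        (ResourceCategory::Vm, _) => 1.0, // (b): VM cost never discounted
        (_, DiscountTier::None) => 1.0,
        (_, DiscountTier::Week) => 0.5,
        (_, DiscountTier::Month) => 0.25, // (c): multiplicative stacking
    }
}

/// Returns the new balance after charging one usage record.
pub fn apply_usage(
    balance_cents: i64,
    cost_cents: i64,
    applies_to: ResourceCategory,
    tier: DiscountTier,
) -> i64 {
    // Round to whole cents; f64::round rounds half away from zero.
    let effective = (cost_cents as f64 * multiplier(applies_to, tier)).round() as i64;
    // saturating_sub only guards against i64 overflow; it does not clamp at zero.
    balance_cents.saturating_sub(effective)
}
```

With a 5000-cent balance and a 1000-cent record, a week-tier `Llm` charge leaves 4500, a month-tier charge leaves 4750, and a `Vm` charge leaves 4000 at any tier. Note that the balance can go negative in this sketch, since `saturating_sub` on `i64` saturates at `i64::MIN` rather than at zero; whether a floor is wanted is not stated in this thread.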

## What lands at s2-014 Phase 12

- Schema: `Billing.first_paid_at: otime` + `Billing.last_active_at: otime` (additive); `UsageRecord.applies_to: ResourceCategory` (additive); `VmAllocation.plan_id: str` + `VmAllocation.plan_allocation_sid: str` (additive, for multi-VM grouping).
- New cron action `hero_onboarding_discount_ladder_cron` (daily): walks Billing rows; promotes `discount_tier` when `now - first_paid_at` crosses the 7-day / 30-day thresholds; writes `last_active_at` opportunistically from the aggregator's per-user touch.
- New `Plan` config (NOT a schema rootobject; env-var JSON loaded at startup): defaults `{"5vms": {cost_cents: 1000, duration_hours: 720, vm_count: 5}, "10vms": {cost_cents: 2000, duration_hours: 720, vm_count: 10}}` matching the meta-issue pricing table.
- `vm_allocate_post`: multi-VM allocate (`vm_count` rows in a single transaction sharing `plan_allocation_sid`); atomicity TBD at s2-014 design pass (single OSIS write batch vs per-VM with rollback).
- Discount-aware aggregator decrement using the multiplier table above.
- Dashboard plan-selector UI + admin /allocations per-plan grouping.
- New smoke `scripts/smoke_discount_ladder.sh` (~20 checks): seed billing with `first_paid_at` 8 days ago + add hero_os_hourly UsageRecord + run aggregate → assert balance decremented by 50% of raw cost; 31-day tenure → assert 75% discount; VM allocation NOT discounted; week-1 user NOT discounted.
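
The tier-promotion rule in the cron item above can be sketched as a pure function of wall-clock tenure. This is an illustration under stated assumptions: timestamps are taken as Unix seconds (the actual `otime` representation is not specified here), the threshold comparison uses `>=`, and `promote` never demotes a tier. The names `tier_for_tenure`, `promote`, and `DAY_SECS` are hypothetical.

```rust
/// Seconds in one day; timestamps are assumed to be Unix seconds.
const DAY_SECS: i64 = 24 * 60 * 60;

/// Declaration order gives None < Week < Month via the derived Ord.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum DiscountTier {
    None,
    Week,
    Month,
}

/// Tier implied by wall-clock time since first payment (answer (a): no inactivity reset).
/// Treating exactly 7 or 30 days as "crossed" is an assumption.
pub fn tier_for_tenure(first_paid_at: i64, now: i64) -> DiscountTier {
    let elapsed = now.saturating_sub(first_paid_at);
    if elapsed >= 30 * DAY_SECS {
        DiscountTier::Month
    } else if elapsed >= 7 * DAY_SECS {
        DiscountTier::Week
    } else {
        DiscountTier::None
    }
}

/// Promote-only update: keeps the higher of the stored tier and the tenure-implied tier.
/// Never demoting is an assumption of this sketch.
pub fn promote(current: DiscountTier, first_paid_at: i64, now: i64) -> DiscountTier {
    current.max(tier_for_tenure(first_paid_at, now))
}
```

This lines up with the smoke-test cases listed above: an 8-day tenure maps to the week tier, a 31-day tenure to the month tier, and a user still in week 1 stays undiscounted.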

## D-20 expected

s2-014 /stop2 mints **D-20** locking the three Q#7 answers above + the Plan registry shape + multi-VM allocation atomicity semantics.
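
For orientation only, here is one possible shape for the Plan registry and the multi-VM grouping mentioned above. The field names and default values are taken from this comment; the struct names, `default_plans`, and `rows_for_plan` are hypothetical, the defaults are hard-coded rather than parsed from the env-var JSON, and nothing here settles the atomicity question that the design pass is expected to decide.

```rust
use std::collections::HashMap;

/// One plan entry; field names follow the default config quoted in this comment.
#[derive(Debug, Clone, PartialEq)]
pub struct Plan {
    pub cost_cents: i64,
    pub duration_hours: i64,
    pub vm_count: u32,
}

/// Hard-coded defaults for the sketch; the real service loads these from env-var JSON.
pub fn default_plans() -> HashMap<String, Plan> {
    let mut plans = HashMap::new();
    plans.insert(
        "5vms".to_string(),
        Plan { cost_cents: 1000, duration_hours: 720, vm_count: 5 },
    );
    plans.insert(
        "10vms".to_string(),
        Plan { cost_cents: 2000, duration_hours: 720, vm_count: 10 },
    );
    plans
}

/// Only the two grouping fields of a VM allocation row are modeled here.
#[derive(Debug, Clone, PartialEq)]
pub struct VmAllocationRow {
    pub plan_id: String,
    pub plan_allocation_sid: String,
}

/// Builds every row for a plan up front, all sharing one grouping id.
/// Returns None for an unknown plan so nothing is partially allocated.
pub fn rows_for_plan(
    plans: &HashMap<String, Plan>,
    plan_id: &str,
    plan_allocation_sid: &str,
) -> Option<Vec<VmAllocationRow>> {
    let plan = plans.get(plan_id)?;
    Some(
        (0..plan.vm_count)
            .map(|_| VmAllocationRow {
                plan_id: plan_id.to_string(),
                plan_allocation_sid: plan_allocation_sid.to_string(),
            })
            .collect(),
    )
}
```

Preparing all rows before any write fits either atomicity option (a single write batch, or per-VM writes with rollback), since both need the full set of rows for one `plan_allocation_sid` known in advance.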

This comment supersedes the previous Q#7 phrasing in the meta-issue body's open-Qs section as the canonical Phase 12 input.

Signed-by: mik-tf <mik-tf@noreply.invalid>

Reference
lhumina_code/hero_onboarding#1