Phase 5 — centralized aggregator + admin surface #6

Open
opened 2026-05-21 11:23:37 +00:00 by mik-tf · 0 comments
Owner

Consumer side of the billing pipeline. Reads what Phase 4 (#5) writes to lhumina_code/hero_onboarding_billing and applies each UsageRecord to the central Billing.credit_balance_cents. Also lifts the placeholder admin page into the real Phase 5 admin surface (read-only views).

Tracking parent: #1 §Centralized aggregator + §Admin.

Wire-format + repo-layout contract is locked by D-14: single repo lhumina_code/hero_onboarding_billing, per-node subdir nodes/<node-name>/usage_<YYYY-MM-DD-HH>.otoml, OTOML wire format with UsageBatch { records: Vec<UsageRecord> } wrapper.
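The per-node path layout above can be illustrated with a small sketch that builds and parses `nodes/<node-name>/usage_<YYYY-MM-DD-HH>.otoml`. The `HourBucket` type and both function names are hypothetical and are not taken from the codebase or from D-14; only the path shape comes from this issue.

```rust
/// Hour bucket encoded in a `usage_<YYYY-MM-DD-HH>.otoml` file name.
/// (Hypothetical helper type, for illustration only.)
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub struct HourBucket {
    pub year: u16,
    pub month: u8,
    pub day: u8,
    pub hour: u8,
}

/// Build the repo-relative path for one node's hourly usage file.
pub fn usage_path(node_name: &str, b: HourBucket) -> String {
    format!(
        "nodes/{}/usage_{:04}-{:02}-{:02}-{:02}.otoml",
        node_name, b.year, b.month, b.day, b.hour
    )
}

/// Split a repo-relative path back into (node name, hour bucket).
/// Returns None for anything that does not match the layout.
pub fn parse_usage_path(path: &str) -> Option<(&str, HourBucket)> {
    let rest = path.strip_prefix("nodes/")?;
    let (node, file) = rest.split_once('/')?;
    if node.is_empty() || file.contains('/') {
        return None;
    }
    let stamp = file.strip_prefix("usage_")?.strip_suffix(".otoml")?;
    // Only digits and dashes are allowed in the timestamp part.
    if !stamp.bytes().all(|c| c.is_ascii_digit() || c == b'-') {
        return None;
    }
    let parts: Vec<&str> = stamp.split('-').collect();
    if parts.len() != 4 || parts[0].len() != 4 || parts[1..].iter().any(|p| p.len() != 2) {
        return None;
    }
    let b = HourBucket {
        year: parts[0].parse().ok()?,
        month: parts[1].parse().ok()?,
        day: parts[2].parse().ok()?,
        hour: parts[3].parse().ok()?,
    };
    // Range checks only; calendar validity (e.g. Feb 30) is not verified here.
    if !(1..=12).contains(&b.month) || !(1..=31).contains(&b.day) || b.hour > 23 {
        return None;
    }
    Some((node, b))
}
```

A parser like this lets the aggregator ignore stray files in a node subdir instead of failing on them, though whether the real implementation filters by name this strictly is not stated here.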

Scope

Aggregator core

  • Two new oschema rootobjects:
    • BillingAggregatorState — per node-subdir: last_commit_sha, last_run_at, records_consumed_total. @index on node_name.
    • ConsumedRecordIndex — per applied UsageRecord: idempotency_key @index, applied_at, cost_cents, user_sid, node_name. Drives the dedup of already-applied records on commit-replay / retry.
  • crates/hero_onboarding_server/src/usage_aggregate.rs (~350 LOC + unit tests):
    • AggregateConfig (mirror of PushConfig — same forge token + repo identity)
    • aggregate_unconsumed_records(osis, http, cfg) enumerates nodes/ in the billing repo, then for each node-subdir:
      • GET /repos/.../commits?path=nodes/<node>&sha=main&limit=1; if the latest sha equals the stored last_commit_sha, skip the node.
      • Otherwise list nodes/<node>/ contents, GET each usage_*.otoml, and parse it as a UsageBatch.
      • For each record, check ConsumedRecordIndex by idempotency_key; if absent, decrement Billing.credit_balance_cents by cost_cents and append a ConsumedRecordIndex row.
      • Finally, update BillingAggregatorState.
    • A per-node failure is logged at warn level and the run continues with the next node (mirrors push_unsent_records error semantics).
    • Balance may go negative — credit-gating happens at the consume-request layer (Phase 6+ when VM allocation lands), not at aggregate time.
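The aggregation rules above (commit-sha skip, idempotency-key dedup, signed balance) can be sketched with an in-memory model. This is a minimal illustration and not the planned `usage_aggregate.rs`: the `Ledger` and `NodeState` types and the `aggregate_node` method are hypothetical stand-ins for the OSIS-backed `Billing`, `ConsumedRecordIndex`, and `BillingAggregatorState` objects, and the forge HTTP calls are replaced by plain arguments.

```rust
use std::collections::{HashMap, HashSet};

/// Subset of a usage record needed for aggregation (field names from this issue).
#[derive(Debug, Clone)]
pub struct UsageRecord {
    pub idempotency_key: String,
    pub user_sid: String,
    pub cost_cents: i64,
}

/// Hypothetical in-memory stand-in for one BillingAggregatorState row.
#[derive(Debug, Default)]
pub struct NodeState {
    pub last_commit_sha: Option<String>,
    pub records_consumed_total: u64,
}

/// Hypothetical in-memory stand-in for the OSIS-backed state.
#[derive(Debug, Default)]
pub struct Ledger {
    /// user_sid -> credit balance in cents; signed, so it may go negative.
    pub balances: HashMap<String, i64>,
    /// Stand-in for ConsumedRecordIndex, keyed by idempotency_key.
    pub consumed: HashSet<String>,
    /// Stand-in for BillingAggregatorState, keyed by node name.
    pub nodes: HashMap<String, NodeState>,
}

impl Ledger {
    /// Apply all records currently present for `node` at commit `latest_sha`.
    /// Returns how many records were newly applied.
    pub fn aggregate_node(
        &mut self,
        node: &str,
        latest_sha: &str,
        records: &[UsageRecord],
    ) -> usize {
        // Commit-sha skip: nothing changed since the last successful run.
        let unchanged = self
            .nodes
            .get(node)
            .and_then(|s| s.last_commit_sha.as_deref())
            == Some(latest_sha);
        if unchanged {
            return 0;
        }
        let mut applied = 0usize;
        for r in records {
            // `insert` returns false if the key was already consumed.
            if !self.consumed.insert(r.idempotency_key.clone()) {
                continue;
            }
            // No credit gating here: the balance is allowed to go below zero.
            *self.balances.entry(r.user_sid.clone()).or_insert(0) -= r.cost_cents;
            applied += 1;
        }
        // Update per-node state only after all records were processed.
        let state = self.nodes.entry(node.to_string()).or_default();
        state.last_commit_sha = Some(latest_sha.to_string());
        state.records_consumed_total += applied as u64;
        applied
    }
}
```

Because the sha is stored only after the loop, a run that fails midway would re-read the same files on retry, and the dedup set is what keeps the retry from double-charging. In the real aggregator the balance update and the index append would presumably need to be persisted together, which this sketch does not model.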

Server + admin wiring

  • POST /admin/aggregate-now (admin-secret-gated) — mirror of /admin/push-usage-now. Returns aggregate summary JSON.
  • GET /admin/list-users, GET /admin/list-payments, GET /admin/aggregator-state, GET /admin/list-billing (admin-secret-gated) — read-only JSON feeds for the admin UI.
  • hero_onboarding_admin aggregate --once CLI subcommand (mirror of push-usage --once).
  • hero_onboarding_aggregator_cron action with SchedulePolicy.interval_ms = 300_000 (5 min — balance updates feel live).
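The issue does not say how the admin secret is transmitted or checked, so the following is only a sketch of what an "admin-secret-gated" check could look like. The function names are hypothetical, and the caller is assumed to have already extracted the presented secret from the request.

```rust
/// Compare two byte strings without returning early on the first mismatch.
/// Note: this is a best-effort sketch; the compiler gives no hard
/// constant-time guarantee, and a vetted crate may be preferable.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y;
    }
    diff == 0
}

/// Hypothetical gate: `provided` is whatever secret the request carried,
/// `expected` is the configured admin secret.
pub fn admin_authorized(provided: Option<&str>, expected: &str) -> bool {
    match provided {
        // An empty configured secret never authorizes anyone.
        Some(p) if !expected.is_empty() => {
            constant_time_eq(p.as_bytes(), expected.as_bytes())
        }
        _ => false,
    }
}
```

Rejecting an empty configured secret is a design choice in this sketch: it means a missing configuration value fails closed rather than opening every admin route.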

Real Phase 5 admin UI

Replaces the Phase 4 placeholder at crates/hero_onboarding_admin/src/main.rs. All routes admin-secret-gated; admin reads via reqwest against the server's new list endpoints (no second OSIS handle on the admin process).

  • GET / — overview tiles (total users, total balance, consumed records, aggregator health)
  • GET /users — list of User + Billing rows
  • GET /payments — PaymentEvent audit log
  • GET /aggregator — per-node BillingAggregatorState rows (last commit, last run, records consumed)
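As a rough illustration of the overview tiles, the sketch below folds rows from the read-only feeds into one summary. All type and function names are hypothetical, and "aggregator health" is not defined in this issue. The sketch assumes a node counts as stale when its last run is older than a caller-chosen threshold, and that `last_run_at` is available as milliseconds.

```rust
/// Hypothetical projection of a Billing row as returned by a list endpoint.
pub struct BillingRow {
    pub credit_balance_cents: i64,
}

/// Hypothetical projection of a BillingAggregatorState row.
pub struct NodeStateRow {
    pub node_name: String,
    pub last_run_at_ms: u64,
    pub records_consumed_total: u64,
}

/// Values behind the four overview tiles.
#[derive(Debug, PartialEq, Eq)]
pub struct Overview {
    pub total_users: usize,
    pub total_balance_cents: i64,
    pub consumed_records: u64,
    /// Assumed health signal: nodes whose last run is older than the threshold.
    pub stale_nodes: Vec<String>,
}

pub fn build_overview(
    total_users: usize,
    billing: &[BillingRow],
    nodes: &[NodeStateRow],
    now_ms: u64,
    stale_after_ms: u64,
) -> Overview {
    let stale_nodes = nodes
        .iter()
        // saturating_sub guards against a last_run_at slightly in the future.
        .filter(|n| now_ms.saturating_sub(n.last_run_at_ms) > stale_after_ms)
        .map(|n| n.node_name.clone())
        .collect();
    Overview {
        total_users,
        // Balances are signed, so negative balances reduce the total.
        total_balance_cents: billing.iter().map(|b| b.credit_balance_cents).sum(),
        consumed_records: nodes.iter().map(|n| n.records_consumed_total).sum(),
        stale_nodes,
    }
}
```

With the 5-minute cron interval, a threshold of two intervals (600_000 ms) would tolerate one missed tick before a node is flagged; that value is an assumption, not part of the issue.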

Smoke

  • scripts/smoke_aggregate.sh:
    • Seed the test billing repo with synthetic nodes/A/usage_*.otoml.
    • Run aggregate --once; assert the balance is decremented by the sum of cost_cents and that ConsumedRecordIndex is populated.
    • Assert a second aggregate is a no-op (commit-sha skip).
    • Add a new record via direct forge commit; assert the next aggregate consumes only the new record.
    • Clean up.

Acceptance

  • cargo check --workspace green
  • cargo test --workspace green (existing 20 + new aggregator unit tests)
  • lab build --release --install --workspace VICTORY 3/3
  • lab infocheck 3/3 clean
  • cargo fmt --check clean
  • cargo clippy --workspace --all-targets -- -D warnings clean
  • scripts/smoke_aggregate.sh end-to-end green against real forge.ourworld.tf

Possible follow-ups

  • Decision lock as D-15 (aggregator-hash-resume) if recovery semantics (per-node halt-on-failure, idempotency-key dedup index) turn out to be load-bearing across operators.
  • L-NN if commit-listing pagination edges need a coping-fix note.
  • End-to-end cron loop exercise (smoke triggers via HTTP route, not the hero_proc scheduler tick) — deferred to launch-day rehearsal.
Reference
lhumina_code/hero_onboarding#6