Phase 10 — Production keys prep + operator runbook (s2-012) #11
Summary
Phase 10 of #1. Pre-flight gate for production deployment: validate that every payment/KYC/VM/login surface is shaped for production before a deploy lands. No live charges or KYC verifications — those stay on the launch-day go/no-go checklist.
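As a rough illustration of what "shaped for production" can mean for a single surface, the sketch below shows an additive trait method with a conservative default and one Stripe-style implementation. It is not the code from this PR: the `name()` method, the struct fields, and `MockProvider` are assumptions made for the example; only the `is_production` name, the `false` default, and the `sk_live_*` + webhook secret rule come from the description.

```rust
/// Hypothetical, trimmed-down payment surface. The real `PaymentProvider`
/// trait has other methods that are omitted here.
pub trait PaymentProvider {
    /// Assumed helper, used only to label the provider in this sketch.
    fn name(&self) -> &'static str;

    /// Defaults to `false`, so a provider that never opts in can never
    /// make the pre-flight gate pass by accident.
    fn is_production(&self) -> bool {
        false
    }
}

/// Field names are assumptions for illustration.
pub struct StripeProvider {
    pub secret_key: String,
    pub webhook_secret: String,
}

impl PaymentProvider for StripeProvider {
    fn name(&self) -> &'static str {
        "stripe"
    }

    /// Production-shaped means: a live-mode key AND a non-empty webhook secret.
    fn is_production(&self) -> bool {
        self.secret_key.starts_with("sk_live_") && !self.webhook_secret.trim().is_empty()
    }
}

/// Hypothetical provider that keeps the default and therefore reports `false`.
pub struct MockProvider;

impl PaymentProvider for MockProvider {
    fn name(&self) -> &'static str {
        "mock"
    }
}
```

Because the method has a default body, existing implementors compile unchanged, which is consistent with the change being purely additive.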
Landed in s2-012
13 files, +987 LOC across worktree `hero_onboarding-track-agent-2/` on branch `track-agent-2/phase-10-prod-keys`, squash-merged to `development`.

`is_production() -> bool` on every provider/config surface (~80 LOC)
- `PaymentProvider` trait method (default `false`); impls on `StripeProvider` (`sk_live_*` + webhook secret) and `ClickPesaProvider` (creds + non-sandbox api_url + non-empty webhook_url).
- `KycProvider` trait method (default `false`); impl on `IdenfyProvider` (no demo/dev escape hatch + creds + webhook secret + non-sandbox api_url).
- `Provisioner` trait method (default `false`); impl on `PoolAssignmentProvisioner`, which delegates to the pure-fn helper `pool_assignment_is_production(demo, pool_size)` so it is unit-testable without OSIS.
- `forge_oauth::is_production(&ForgeOAuthConfig)` (non-localhost base_url + non-loopback redirect_uri + non-placeholder creds).

`--check-prod-config` validator (~130 LOC)
- `hero_onboarding_server`: builds the full provider set (using the existing `build_*` functions, no duplication), iterates `is_production()` on each registered surface, prints one line per surface with an explicit `OK` / `FAIL` / `SKIP` prefix + reason, prints `verdict: READY` or `verdict: NOT READY`, and exits 0 or 1.
- `hero_onboarding` CLI: execs `~/hero/bin/hero_onboarding_server --check-prod-config` and proxies the exit code. Both forms work; the runbook documents either.

Operator runbook (`docs/operator-runbook.md`, ~532 LOC)
- Per-environment config matrix for dev / staging / prod × 5 trait surfaces = 15 rows. Layout: quick comparison table → per-surface section (purpose, required slots, hero_proc keys + env-var fallbacks, how-to-get-credentials, sandbox vs prod, gotchas) → `--check-prod-config` usage → launch-day go/no-go checklist → per-verdict troubleshooting → reference key list.
- Production Forge OAuth section marked TBD: the production hostname for hero_onboarding is not yet locked. Until then, `--check-prod-config` reports `forge_oauth FAIL` because the dev redirect URI still loops back to `127.0.0.1`. The runbook documents the "register prod OAuth app once hostname is locked" follow-up as a launch-day item.

Smoke (`scripts/smoke_prod_config.sh`, 32 checks)
Drives `--check-prod-config` across 12 env-var combos: all-skip → exit 1 with 5 SKIPs; Stripe sandbox key → FAIL; Stripe live key without webhook secret → FAIL; Stripe live + webhook → OK; KYC demo flag → FAIL; KYC live creds + non-sandbox URL → OK; KYC sandbox URL → FAIL; Provisioner demo flag → FAIL; Provisioner real pool → OK; Forge OAuth localhost → FAIL; Forge OAuth remote → OK; all five surfaces OK → exit 0 + READY; ClickPesa sandbox URL → FAIL. 32 individual assertions across the per-surface output prefixes + final verdict + exit code.

Acceptance gates
- `cargo test --workspace`: 77/77 (62 from s2-011 + 15 new `is_production` unit tests: 2 in payment.rs, 5 in kyc.rs, 3 in provisioner.rs, 5 in forge_oauth.rs).
- `lab build --release --install --workspace`: VICTORY 3/3 (27.2s, build #13).
- `lab infocheck`: 3/3 clean / 0 findings.
- `cargo fmt --check`: clean (after autofix on 3 multi-arg println sites + one function signature).
- `cargo clippy --workspace --all-targets -- -D warnings`: clean.
- `scripts/smoke_prod_config.sh`: 32/32 GREEN.
- Direct curl-based route regression (`/`, `/login`, `/dashboard`, `/account`, `/login/forge/start`, `/vm/list`, `/kyc/start`, plus all 7 admin-secret-gated endpoints with + without secret, `/logout`): all return expected status codes; the additive trait method is transparent to existing route handlers.

Note on existing smoke scripts: the workstation was under heavy load (load avg ~38) during the regression window, and the 6s server-startup wait windows in the existing `scripts/smoke_*.sh` scripts were too tight under that load (each `hero_proc secret.get` returning empty took ~700ms instead of ~50ms). This is environmental, not a code regression: the direct curl-based route regression above confirms behavior. The smoke scripts pass cleanly on a less-loaded host (and passed at s2-011's acceptance).

No D-NN or L-NN minted
Phase 10 is pure-additive, with no design lock-in. The check semantics (per-surface `is_production` + AND across the registered set + SKIP-counts-as-NOT-READY) are the obvious shape. ID slots stay at D-19 / L-09.

What `--check-prod-config` does NOT cover
- […] `scripts/smoke_forge_oauth.sh` against a mock; live test with prod OAuth app is a go/no-go item).
- […] `find_or_create_user` exercises that).

Launch-day go/no-go checklist in §5 of the runbook covers the gaps.
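The verdict rule stated earlier (per-surface `is_production`, AND across the registered set, SKIP counting as NOT READY) can be sketched as follows. This is a minimal illustration, not the validator's actual code: the `Status` and `SurfaceReport` types, the exact line layout, and treating an empty report set as NOT READY are all assumptions; only the `OK` / `FAIL` / `SKIP` prefixes, the `verdict:` strings, and the 0/1 exit codes come from the description.

```rust
/// Outcome of checking one surface (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Status {
    Ok,
    Fail(String),
    Skip(String),
}

/// One surface plus its outcome (hypothetical type).
pub struct SurfaceReport {
    pub surface: &'static str,
    pub status: Status,
}

impl SurfaceReport {
    /// One output line per surface: prefix, surface name, then the reason.
    /// The exact layout is assumed for this sketch.
    pub fn line(&self) -> String {
        match &self.status {
            Status::Ok => format!("OK {}", self.surface),
            Status::Fail(why) => format!("FAIL {}: {}", self.surface, why),
            Status::Skip(why) => format!("SKIP {}: {}", self.surface, why),
        }
    }
}

/// AND across the registered set. A SKIP is not an OK, so it blocks READY.
/// An empty set is treated as not ready (an assumption of this sketch).
pub fn is_ready(reports: &[SurfaceReport]) -> bool {
    !reports.is_empty() && reports.iter().all(|r| r.status == Status::Ok)
}

/// Returns the verdict line and the process exit code.
pub fn verdict(reports: &[SurfaceReport]) -> (String, i32) {
    if is_ready(reports) {
        ("verdict: READY".to_string(), 0)
    } else {
        ("verdict: NOT READY".to_string(), 1)
    }
}
```

Treating SKIP like FAIL for the final verdict keeps an unconfigured surface from silently passing the gate, while the distinct prefix still tells the operator whether a slot is missing or misconfigured.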
Open follow-ups
- […] onboarding). `--check-prod-config` then reports `forge_oauth OK` instead of `FAIL`.
- `scripts/smoke_payments.sh` and friends are timing-flaky under load: bump the server-startup wait window from `seq 1 30` × 0.2s to `seq 1 90` × 0.2s, or add an explicit `--ready-probe` server flag that prints `READY` on stdout once `listening` happens. Out of scope for s2-012.

Next
s2-013 Phase 11 — Refund posture + multi-currency (Q#8 + Q#9).
- `RELEASE_REFUNDS_ENABLED=true` env flag (default false; the freezone-aligned no-refund posture stays default).
- `Billing.balance_by_currency: map<str, i64>` for multi-currency rollup. D-19 candidate if refund posture locks load-bearing.
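To make the `map<str, i64>` shape concrete, here is a speculative sketch of a per-currency rollup. Nothing here is designed yet: `LedgerEntry`, its fields, the use of signed minor units, and the upper-casing of currency codes are assumptions for illustration only, and the free function merely borrows the field name `balance_by_currency`.

```rust
use std::collections::BTreeMap;

/// Hypothetical ledger entry: a signed amount in minor units (for example
/// cents), tagged with a currency code.
pub struct LedgerEntry {
    pub currency: String,
    pub amount_minor: i64,
}

/// Sums entries per currency without converting between currencies.
/// Codes are upper-cased so "usd" and "USD" land in the same bucket
/// (an assumption of this sketch).
pub fn balance_by_currency(entries: &[LedgerEntry]) -> BTreeMap<String, i64> {
    let mut balances = BTreeMap::new();
    for entry in entries {
        *balances.entry(entry.currency.to_uppercase()).or_insert(0i64) += entry.amount_minor;
    }
    balances
}
```

Keeping one integer balance per currency avoids both floating-point rounding and any implicit exchange-rate conversion; a `BTreeMap` is used here only so that iteration order is stable.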