feat: implementation gaps to match PRD (resumable flows, generic pause, herolib_base, unified sub-flow API) #29
Tracks the work to bring the codebase in line with the new PRD.md. Concrete deltas grouped by surface.
1. Generic pause / resume primitive (subsumes the original `ask_user` design from #28)
Rename and generalize:
- `flow.pause(name, *, schema=None, ui=None)` in `hero_tracing.py`. Returns the resume payload on replay. Exits subprocess with code 75 on first hit, persists a `ResumeRequest`.
- `ask_user.text / number / choice / multi_choice / confirm` become UI-flavored helpers over `flow.pause(..., ui={...})`. They do not become a separate mechanism.
- `play_resume(play_sid, resume_id, payload_json) -> {ok, resumed_at}`: one method, used by UI form posts, webhooks, cron triggers, and inter-service signals alike.
- `play_pending_resumes(play_sid) -> [ResumeRequest]`.
- `awaiting_resume`. Lifecycle: `pending → running → awaiting_resume → running → ... → success | failed | cancelled | timed_out`.
- `ResumeRequest` schema: `{id, name, schema, ui, asked_at_span_id, asked_at, payload, resumed_at}`. `id` is deterministic from span path + call sequence.
- `ui != null` filtering; non-UI resumes don't render forms.
2. Step memoization + replay
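As a rough illustration of the step key and replay behaviour this section specifies, the sketch below hashes the four key components and skips the function body when the key is already cached. It is not the `hero_tracing.py` implementation. `WORKFLOW_VERSION_SID`, `SPAN_LOG`, the `parent_path` keyword, the `success` span status, and the exact meaning of `canonical_json` (sorted keys, compact separators) are all assumptions made for the example.

```python
import hashlib
import json
from functools import wraps

_STEP_CACHE = {}                    # step_key -> previously recorded output
WORKFLOW_VERSION_SID = "wfv_demo"   # hypothetical placeholder value
SPAN_LOG = []                       # stand-in for the span socket: (flow_name, status)


def canonical_json(value):
    # Assumption: "canonical" means sorted keys and compact separators.
    return json.dumps(value, sort_keys=True, separators=(",", ":"))


def step_key(workflow_version_sid, parent_path, flow_name, kwargs):
    # sha1 over the '|'-joined components, as in the formula in this section.
    raw = "|".join(
        [workflow_version_sid, parent_path, flow_name, canonical_json(kwargs)]
    )
    return hashlib.sha1(raw.encode("utf-8")).hexdigest()


def flow(fn):
    @wraps(fn)
    def wrapper(*, parent_path="", **kwargs):
        key = step_key(WORKFLOW_VERSION_SID, parent_path, fn.__name__, kwargs)
        if key in _STEP_CACHE:
            # Cache hit: do not run the body, report the span as replayed.
            SPAN_LOG.append((fn.__name__, "replayed"))
            return _STEP_CACHE[key]
        result = fn(**kwargs)
        json.dumps(result)  # raises TypeError on non-serializable returns
        _STEP_CACHE[key] = result
        SPAN_LOG.append((fn.__name__, "success"))  # status name is an assumption
        return result

    return wrapper


CALLS = []  # records real executions so replays are observable


@flow
def add(a, b):
    CALLS.append((a, b))
    return a + b
```

Calling `add(a=1, b=2)` and then `add(b=2, a=1)` runs the body once; the second call is served from `_STEP_CACHE` because keyword order does not affect the canonical JSON. Changing the workflow version changes every key, which is what makes a version mismatch invalidate the cache globally.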
- `step_key = sha1(workflow_version_sid + '|' + parent_path + '|' + flow_name + '|' + canonical_json(sorted_kwargs))` in the `@flow` wrapper.
- `@flow` calls whose key is in `_STEP_CACHE`. Emit the span with `status=replayed`.
- `step_output` event over the span socket; server writes `Play.step_outputs`.
- `HERO_REPLAY_STEP_OUTPUTS_FILE` and `HERO_REPLAY_RESUMES_FILE` (env-pointed) into `_STEP_CACHE` and `_ANSWER_CACHE` before the user's flow runs.
- `@flow(outputs=...)` declarations on cache write; raise on non-serializable returns.
- `workflow_version_sid` is part of the step key; mismatch invalidates globally and the UI prompts to restart fresh or roll back.
- `@flow` signatures (defeats memoization). Pull non-determinism inside the body.
3. Unified sub-flow API:
`flow.invoke(name, *, spawn=False, **inputs)`
- `spawn=False` = in-process (current behaviour from F7).
- `spawn=True` = calls `LogicService.play_run_async` over the local RPC socket, waits via `play_wait(timeout_ms=0)`, returns the child's `output_data`.
- `kind=subflow` span on the parent; spawned variant additionally carries `child_play_sid`.
- `from <flow_name> import <fn>` meta-path importer: either keep as sugar that compiles to `flow.invoke(name, ...)`, or remove in favour of one explicit API. Recommend removing to keep one mental model.
4. Schema additions (`logic.oschema`)
- `Play.pending_resumes: [ResumeRequest]`
- `Play.received_resumes: str` (JSON `{resume_id: payload}`)
- `Play.step_outputs: str` (JSON `{step_key: output}`)
- `Play.total_cost_usd: f64`
- `ResumeRequest` (shape above).
- `PlayStatus` (renamed from `ExecutionStatus`) adds `awaiting_resume`.
- `SpanStatus` adds `replayed`.
- `Span` adds `kind: SpanKind`, `source_file: str`, `source_line: u32`.
- `SpanKind = flow_root | step | rpc | subflow | other`.
- `FlowField` (rename of `FlowInput`); `Workflow.outputs: [FlowField]` is populated from `@flow(outputs=...)` on save.
5. RPC additions
- `play_start(workflow_sid, input_data, name, prefill_resumes) -> Play`
- `play_run_async(workflow_sid, input_data, parent_span_id, prefill_resumes) -> str`
- `play_resume(play_sid, resume_id, payload_json) -> {ok, resumed_at}`
- `play_pending_resumes(play_sid) -> [ResumeRequest]`
6. Admin UI: bottom-bar island
In the play-detail page, below the flow tree:
- `ResumeRequest` with `ui != null`, badged with count, renders form per `ui.kind`, posts to `play_resume`), Events (full `Play.spans` history).
7. `herolib_base` lifecycle migration
All three binaries (`hero_logic`, `hero_logic_server`, `hero_logic_admin`) move to the latest hero standard:
- `service.toml` at each crate root declaring kind, sockets, protocols, env vars.
- `service_base!();` macro at module scope.
- `main.rs` order: `validate_service_toml → handle_info_flag → Args::parse → print_startup_banner → prepare_sockets` (server/admin only).
- `print_startup_info()`, socket dir resolution, stale-socket cleanup.
- `BUILD_NR` constant via `option_env!("HERO_BUILD_NR")`.
- `hero_logic` CLI keeps `--start` / `--stop` semantics; replace the `HeroServices::new(...)` registration path with the lab/proc pattern from `herolib_base` and `hero_service`.
8. Pre-filled resumes for non-interactive runs
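One possible reading of the `prefill_resumes` map in this section (`resume_name_or_pattern → payload`) is sketched below. The issue does not define the pattern syntax, so shell-style globs via `fnmatch` are an assumption, as are the helper name `find_prefill` and the rule that an exact name beats a pattern.

```python
from fnmatch import fnmatchcase


def find_prefill(prefill_resumes, resume_name):
    """Return (found, payload) for a pause name.

    Hypothetical helper: glob patterns and exact-match precedence are
    assumptions for this sketch, not documented behaviour.
    """
    # An exact name wins over any pattern.
    if resume_name in prefill_resumes:
        return True, prefill_resumes[resume_name]
    # Otherwise take the first glob pattern that matches, in insertion order.
    for pattern, payload in prefill_resumes.items():
        if fnmatchcase(resume_name, pattern):
            return True, payload
    return False, None
```

With `{"approve": True, "ask_*": "yes"}`, a pause named `approve` resolves to `True`, `ask_name` resolves to `"yes"`, and any other name reports no prefill, so the play would still stop in `awaiting_resume`. Returning a `(found, payload)` pair keeps a legitimate `None` payload distinguishable from a missing entry.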
- `play_start` / `play_run_async` accept `prefill_resumes: {resume_name_or_pattern → payload}`.
- `/examples/` use this so plays run headlessly.
9. `/examples/` E2E scaffold
- `/examples/` directory exists (`README.md` already added). Populate with at least one driver script per major scenario: simple play, sub-flow composition, pause/resume cycle, multi-pause replay.
Acceptance
- `flow.pause("approve", ui={"kind":"confirm"})` exits with `awaiting_resume`; UI renders a confirm button; `play_resume` resumes the play and the flow returns the payload.
- `*_set` followed by a pause does NOT double-create the OSIS record on resume.
- `flow.invoke(name, **inputs)` and `flow.invoke(name, spawn=True, **inputs)` both work; the latter shows the child Play expandable inline in the graph view.
- `lab infocheck` reports 0 issues for all three binaries; startup banners are consistent with `service.toml`; `--info` returns the manifest.
- `/examples/` directory has at least one runnable driver covering pause/resume with `prefill_resumes`.
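The first acceptance item can be pictured with a small in-memory model of the pause contract from section 1: first hit records a request and exits with code 75, replay returns the payload. This is only a sketch. The `Flow` class, its constructor arguments, and the way the deterministic `id` is derived (sha1 of span path and call sequence, truncated) are assumptions; the real primitive persists the `ResumeRequest` server-side rather than in a Python list.

```python
import hashlib
import sys

EXIT_AWAITING_RESUME = 75  # exit code named in section 1


class Flow:
    """Hypothetical stand-in for the flow object; not the hero_tracing API."""

    def __init__(self, span_path, received_resumes=None):
        self.span_path = span_path
        # resume_id -> payload, mirroring Play.received_resumes
        self.received_resumes = dict(received_resumes or {})
        # recorded ResumeRequest-like dicts, mirroring Play.pending_resumes
        self.pending_resumes = []
        self._call_seq = 0

    def _resume_id(self):
        # Deterministic from span path + call sequence; the hashing and
        # truncation are assumptions made for this sketch.
        raw = f"{self.span_path}#{self._call_seq}"
        return hashlib.sha1(raw.encode("utf-8")).hexdigest()[:16]

    def pause(self, name, *, schema=None, ui=None):
        self._call_seq += 1
        resume_id = self._resume_id()
        if resume_id in self.received_resumes:
            # Replay: the answer already exists, so return it and continue.
            return self.received_resumes[resume_id]
        # First hit: record the request, then stop the process.
        self.pending_resumes.append(
            {"id": resume_id, "name": name, "schema": schema, "ui": ui}
        )
        sys.exit(EXIT_AWAITING_RESUME)


def approval_flow(flow):
    approved = flow.pause("approve", ui={"kind": "confirm"})
    return "shipped" if approved else "rejected"
```

A first run of `approval_flow` raises `SystemExit(75)` and leaves one pending request. A second run constructed with that request's `id` mapped to `True` returns `"shipped"` without pausing, which is the replay path `play_resume` is meant to trigger.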