Story 1: Foundation — Python-flow executor + schema + span socket #11
## Goal

Replace the DAG executor with a Python-flow executor that listens on a span socket and persists the span tree to the Play record. Also ship `hero_tracing.py` — the generic Python module that flow authors import to get `@flow`, `flow.step`, `instrument()`, etc.

Parent epic: #10
Prerequisite: hero_rpc#29 (service methods in generated clients). hero_rpc#30 was closed — its scope moved here.
## Scope

### Schema (`schemas/logic/logic.oschema`)

- `WorkflowVersion.python_source: str` replaces `nodes: [Node]` + `edges: [Edge]`
- `Play.spans: [Span]` replaces `node_runs: [NodeRun]`
- `SpanStatus` enum replaces `ExecutionStatus` (values: `pending | running | ok | failed | timed_out | cancelled`)
- New `Span` type; the DAG types `Node`, `Edge`, `NodeConfig`, `NodeRun`, `NodeType`, `ExecutionStatus` are removed

### New service methods on LogicService

- `play_run_async(workflow_sid, input_data, parent_span_id) -> str` — non-blocking; returns play_sid
- `play_wait(play_sid, timeout_ms) -> Play` — block until terminal
- `span_push(play_sid, event_json) -> bool` — internal, used by the executor's socket listener but exposed for tests
- `flow_library_search(query, limit) -> [str]` — keyword first; Story 3 lands the semantic-match version
- (`play_start`, `play_status`, `play_retry`, `play_cancel`, `play_resume` stay)
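The fan-out pattern these methods enable can be sketched as follows. `HeroLogicClientStub` is a stand-in written for this example; the real generated client comes from hero_rpc (#29), and the method signatures below simply follow the list above.

```python
# Sketch of the intended fan-out pattern: start child plays without blocking,
# then wait for each to reach a terminal state. The stub fakes the server.
import json

class HeroLogicClientStub:
    """Stand-in that mimics play_run_async / play_wait semantics."""
    def __init__(self):
        self._plays = {}
        self._next = 0

    def play_run_async(self, workflow_sid, input_data, parent_span_id=None):
        self._next += 1
        play_sid = f"play_{self._next}"
        # A real server would start the flow; the stub records it as done.
        self._plays[play_sid] = {"sid": play_sid, "status": "ok",
                                 "parent_span_id": parent_span_id,
                                 "input": json.loads(input_data)}
        return play_sid

    def play_wait(self, play_sid, timeout_ms):
        return self._plays[play_sid]  # the real client blocks until terminal

client = HeroLogicClientStub()
# Kick off two child plays in parallel, then wait for both.
sids = [client.play_run_async("wf_abc", json.dumps({"n": i}),
                              parent_span_id="span_root")
        for i in (1, 2)]
plays = [client.play_wait(sid, timeout_ms=30_000) for sid in sids]
assert all(p["status"] == "ok" for p in plays)
```

Passing `parent_span_id` is what lets the child plays hang under the parent flow's span tree, per the opt-in propagation rules below.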
### `hero_tracing.py` — new Python module

Lives at `sdk/python/hero_tracing.py` in the hero_logic repo. Embedded in the hero_logic_server binary via `include_str!` or rust-embed. Staged on startup at `~/.hero/var/flows/sdk/hero_tracing.py`. The executor adds that path to `PYTHONPATH` when spawning flow subprocesses.

Contains `@flow`, `flow.step`, `flow.Failed`, `instrument()`, and the rest of the tracing API. Plus plumbing:

- Reads the `HERO_FLOW_SPAN_SOCK` env var; if set, connects and writes JSONL span events
- Outside a flow (no `@flow`-decorated function running), the helpers are silent no-ops
- Safe under `asyncio.create_task`/`gather`
- Generic, service-agnostic, ~400-500 lines of stdlib-only Python
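A minimal sketch of the nesting and no-op behavior described above, assuming a `contextvars`-based implementation (context variables also propagate into `asyncio.create_task` children). The `EVENTS` list and `_emit` helper are illustrative only; the real module writes JSONL to `HERO_FLOW_SPAN_SOCK` instead.

```python
# Not the real hero_tracing.py: a sketch of two specified behaviors --
# flow.step nests spans under the current flow, and helpers are silent
# no-ops when no @flow function is running.
import contextlib
import contextvars
import uuid

_current = contextvars.ContextVar("current_span", default=None)
EVENTS = []  # real module writes JSONL to the span socket instead

def _emit(kind, span_id, name, parent):
    EVENTS.append({"event": kind, "span_id": span_id,
                   "name": name, "parent_span_id": parent})

@contextlib.contextmanager
def step(name, **tags):
    parent = _current.get()
    if parent is None:            # outside a flow: silent no-op
        yield None
        return
    span_id = uuid.uuid4().hex
    _emit("span_start", span_id, name, parent)
    token = _current.set(span_id)
    try:
        yield span_id
    finally:
        _current.reset(token)
        _emit("span_end", span_id, name, parent)

def flow(fn):                     # decorator opening the root span
    def wrapper(*args, **kwargs):
        root = uuid.uuid4().hex
        _emit("span_start", root, fn.__name__, None)
        token = _current.set(root)
        try:
            return fn(*args, **kwargs)
        finally:
            _current.reset(token)
            _emit("span_end", root, fn.__name__, None)
    return wrapper

with step("orphan"):              # no flow active -> nothing recorded
    pass
assert EVENTS == []

@flow
def demo():
    with step("load"):
        pass
    with step("write"):
        pass

demo()
names = [(e["event"], e["name"]) for e in EVENTS]
assert names[0] == ("span_start", "demo")
assert EVENTS[1]["parent_span_id"] == EVENTS[0]["span_id"]
```

The no-op check is what keeps a flow's helper calls harmless when the same code is imported and run outside the executor.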
### Executor rewrite (`src/engine/executor.rs`)

- On startup, stage `hero_tracing.py` to `~/.hero/var/flows/sdk/hero_tracing.py` (mkdir, overwrite on each start so updates land)
- On `play_start`: open a Unix domain socket at `/tmp/spans-{play_sid}.sock`
- Spawn the flow subprocess with `HERO_FLOW_SPAN_SOCK=/tmp/spans-{play_sid}.sock`, and `PYTHONPATH` prepended with `~/.hero/var/flows/sdk` (so `from hero_tracing import flow` works) and `~/.hero/var/router/python` (so service clients work)
- Run `WorkflowVersion.python_source` with a boot-stub appended: load inputs from an env var, call the decorated function
- Listen for span events (`span_start`, `span_end`, `span_log`) — persist each to `Play.spans` incrementally; publish over SSE
- If the play was started with a `parent_span_id`, its root span uses that as parent
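The executor itself is Rust; the Python sketch below just pins down an assumed JSONL wire shape for the three event types and shows how they could fold into `Play.spans` incrementally. The field names (`event`, `span_id`, `parent_span_id`, `status`, `message`) are assumptions for illustration, not a finalized schema.

```python
# Assumed JSONL span-event shape on the span socket, and how each line
# folds into the persisted span map one event at a time.
import json

def apply_event(spans, line):
    """Fold one JSONL span event into the spans dict, incrementally."""
    ev = json.loads(line)
    if ev["event"] == "span_start":
        spans[ev["span_id"]] = {"name": ev["name"],
                                "parent_span_id": ev.get("parent_span_id"),
                                "status": "running", "logs": []}
    elif ev["event"] == "span_log":
        spans[ev["span_id"]]["logs"].append(ev["message"])
    elif ev["event"] == "span_end":
        spans[ev["span_id"]]["status"] = ev.get("status", "ok")
    return spans

stream = [
    '{"event": "span_start", "span_id": "a", "name": "demo", "parent_span_id": null}',
    '{"event": "span_start", "span_id": "b", "name": "load", "parent_span_id": "a"}',
    '{"event": "span_log", "span_id": "b", "message": "42 rows"}',
    '{"event": "span_end", "span_id": "b", "status": "ok"}',
    '{"event": "span_end", "span_id": "a", "status": "ok"}',
]
spans = {}
for line in stream:
    apply_event(spans, line)

assert spans["b"]["parent_span_id"] == "a"
assert spans["a"]["status"] == "ok" and spans["b"]["logs"] == ["42 rows"]
```

Folding per event (rather than on flow exit) is what makes the live SSE stream and crash-resilient persistence possible.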
### Delete

- `src/engine/node_executors.rs`
- `src/engine/template_loader.rs`
- `src/engine/schema.rs` (DAG-specific validation)
- The `{{outputs.X}}` / `{{var}}` interpolation paths
- `NodeType::Action | Conditional | Transform | Wait | HumanInput`
### Done when

- `cargo build` is clean after all deletions
- `~/.hero/var/flows/sdk/hero_tracing.py` is written on server startup
- A flow with `@flow` + 3 `with flow.step(...)` blocks + 1 `instrument()`ed client + 6 RPC calls produces a 9-span tree persisted correctly in `Play.spans`
- Two `play_run_async` calls in parallel from a parent flow (with an `instrument()`ed HeroLogicClient) produce a nested tree (parent → 2 children → their spans)
- The same without `instrument()` produces 2 disconnected child-play trees (confirming opt-in propagation)
- `GET /api/plays/:sid/spans` streams span events live as the flow runs
- Existing endpoints (`play_status`, `play_cancel`, etc.) keep working — just against the new data model

Estimate: ~2 weeks (unchanged — the tracing module is ~500 lines of straightforward Python)
Clarification on the tracing model (reflected in hero_rpc#30 after a design-review pushback):

- All tracing primitives (`@flow`, `flow.step`, `flow.Failed`, `instrument()`, socket plumbing) live in the shared `hero_tracing.py` module that hero_rpc emits alongside each service's client.
- Instrumentation is opt-in per client: `wb = instrument(HeroWhiteboardClient())`. Unwrapped clients don't emit RPC spans.
- The executor side is unchanged: still `Span` + `Play.spans`, still listens on `HERO_FLOW_SPAN_SOCK`, still collects events. What changes is where those events come from: `flow.step` always, `instrument()`-wrapped clients conditionally, unwrapped clients never.
- The test fixture for the story's done-criterion updates accordingly: the sample flow should exercise all three tiers — some clients instrumented, some not — and produce the correct mix of step spans and rpc spans.
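The opt-in wrapper can be sketched as a thin proxy: wrapped clients record an rpc span per method call, unwrapped clients stay silent. In the real `hero_tracing.py` the spans would go to the span socket; here they go to a list, and `WhiteboardClientStub` is a made-up stand-in for a generated client.

```python
# Sketch of opt-in instrument(): a proxy that records one rpc span per
# method call on the wrapped client. Unwrapped clients emit nothing.
RPC_SPANS = []

class Instrumented:
    def __init__(self, client):
        self._client = client

    def __getattr__(self, name):
        method = getattr(self._client, name)
        def traced(*args, **kwargs):
            result = method(*args, **kwargs)
            RPC_SPANS.append({"kind": "rpc", "name": name,
                              "args": kwargs, "result": result})
            return result
        return traced

def instrument(client):
    return Instrumented(client)

class WhiteboardClientStub:            # stand-in for a generated client
    def note_create(self, text=""):
        return {"sid": "note_1", "text": text}

wb = instrument(WhiteboardClientStub())    # tier: instrumented
plain = WhiteboardClientStub()             # tier: unwrapped, no spans

wb.note_create(text="hello")
plain.note_create(text="silent")

assert len(RPC_SPANS) == 1
assert RPC_SPANS[0]["name"] == "note_create"
assert RPC_SPANS[0]["args"] == {"text": "hello"}
```

This also previews the `args`/`result` capture described in the schema addendum below: the proxy sees both the request kwargs and the response value.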
## Schema addendum — enriched Span fields

Story 2's viewer (#12) is being scoped as a graph-style rendering (Cytoscape, with the same visual richness as the old DAG editor — node cards with inputs/outputs/function-sig, edges showing parent-child + fan-out + sequential, a drawer for details). To feed that viewer, the `Span` type needs fields beyond the minimal set in the issue body.

Add to `Span`: `args`, `result`, `source_file`, `source_line`, `kind`.
Why:

- `args` + `result` let the viewer render input/output previews directly on the card, not just hidden behind a drawer.
- `source_file` + `source_line` let clicking a node in the graph jump to the code that opened it — a critical UX for Story 2's split view.
- `kind` lets the viewer style cards distinctly (flow roots get a big prominent card; step cards look different from rpc cards; subflow cards show a "nested play" affordance).

Size concerns: `args` and `result` can be large. hero_tracing.py should truncate them with a reasonable cap (suggest 8 KB each; configurable via env var). Full bodies are still in the logs stream if users need them.

hero_tracing.py contract update:

- `flow.step(name, **tags)` — `tags` is the args payload at span open
- `flow.current_span.record(**kv)` — records the result payload (overwrites if called multiple times)
- `instrument(client)` proxy — captures request args (method kwargs) as the span's `args`, captures the response as `result`
- `@flow` span — args = the flow's declared inputs resolved from the input_data JSON, result = the function's return value

The done-criteria in the original issue body stay; just extend the test-fixture flow to confirm the enriched fields land correctly.
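The truncation cap above could look like the sketch below. The env var name `HERO_SPAN_PAYLOAD_CAP` is made up for illustration; the issue only says "configurable via env var".

```python
# Sketch of the suggested 8 KB cap on serialized args/result payloads,
# overridable via an env var (name here is hypothetical).
import json
import os

DEFAULT_CAP = 8 * 1024

def truncate_payload(value, cap=None):
    """Serialize a payload and cut it at the cap; returns (blob, truncated)."""
    cap = cap or int(os.environ.get("HERO_SPAN_PAYLOAD_CAP", DEFAULT_CAP))
    blob = json.dumps(value, default=str)
    if len(blob) <= cap:
        return blob, False
    return blob[:cap], True   # full body stays in the logs stream

small, cut = truncate_payload({"rows": 3})
assert not cut and json.loads(small) == {"rows": 3}

big, cut = truncate_payload({"data": "x" * 20_000})
assert cut and len(big) == DEFAULT_CAP
```

Returning a truncated flag (rather than silently cutting) lets the viewer show a "payload truncated, see logs" affordance on the card.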