feat(11-C3): python3 subprocess + Tier 0 sandbox + boot stub #19

Merged
timur merged 1 commit from feat/11-phase-c3-spawn-sandbox into development 2026-05-05 12:50:20 +00:00

Summary

Phase C3 of #11. This PR implements the whole runtime path: PythonFlowExecutor::execute(opts) stages the user's flow source to a per-Play workdir, binds the Phase C2 span socket, spawns python3 under the Tier 0 sandbox (per hero_logic#14), and awaits both subprocess exit and listener drain.

What lands

| Component | Purpose |
|---|---|
| PythonFlowExecutor::execute(opts) | Full lifecycle. Returns ExecuteOutcome { exit_code, timed_out, stderr }. |
| BOOT_STUB (Python) | Walks globals() for __hero_flow__-stamped functions, decodes HERO_FLOW_INPUT, calls _bootstrap_run. Exits 2 with a diagnostic if no @flow is present. |
| Tier0Sandbox | env_clear + allowlist, per-Play workdir, wall-clock timeout (SIGTERM → grace → SIGKILL), best-effort setrlimit on Linux only (RLIMIT_AS / RLIMIT_CPU / RLIMIT_NOFILE / RLIMIT_NPROC). |
| ExecuteOptions | Overrides for every external dep (sdk_dir, clients_dir, workdir_root, span_socket_path, python_bin) so tests don't touch the operator's real paths. |
| SpanListener::with_socket_path | Now public (was cfg(test) in C2). Used whenever ExecuteOptions::span_socket_path is set. |
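
To make the BOOT_STUB row concrete, here is a minimal sketch of the discovery step it describes. It is not the stub shipped in this PR: the `flow` decorator below is a stand-in for the SDK's, `find_flows` and `boot` are hypothetical names, and the `run` parameter stands in for `_bootstrap_run`, whose real signature is not shown here. The exit code is returned rather than passed to `sys.exit` so the logic can be exercised directly.

```python
import json
import sys


def flow(fn):
    """Stand-in for the SDK's @flow decorator: stamps the marker attribute."""
    fn.__hero_flow__ = True
    return fn


def find_flows(namespace):
    """Return the callables in `namespace` that carry the __hero_flow__ stamp."""
    return [
        obj
        for obj in namespace.values()
        if callable(obj) and getattr(obj, "__hero_flow__", False)
    ]


def boot(namespace, environ, run=lambda fn, payload: fn(payload)):
    """Pick the first stamped flow, decode HERO_FLOW_INPUT, and run it.

    `run` is a placeholder for _bootstrap_run (assumed signature).
    Returns the exit code the stub would use.
    """
    flows = find_flows(namespace)
    if not flows:
        print("no @flow-decorated function found", file=sys.stderr)
        return 2
    if len(flows) > 1:
        print("warning: multiple @flow functions; using the first", file=sys.stderr)
    # Assumption: a missing HERO_FLOW_INPUT is treated as JSON null.
    payload = json.loads(environ.get("HERO_FLOW_INPUT", "null"))
    run(flows[0], payload)
    return 0
```

In a real stub the call would look roughly like `sys.exit(boot(globals(), os.environ))` at the end of the staged flow.py.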

Tier 0 details

Per the hero_logic#14 threat model:

  • (A) Mistake: time.sleep(60) inside a flow with wall_clock = 2s gets SIGTERMed at 2s and SIGKILLed at 2.5s. ✓
  • Env scrub: the user's flow can't read secrets the operator set on the hero_logic_server process. Allowlist: PATH, HOME, LANG, PYTHONPATH, PYTHONDONTWRITEBYTECODE, HERO_FLOW_*.
  • Workdir: ~/.hero/var/plays/{play_sid}/work becomes cwd. flow.py is staged there.
  • ulimits: Linux only; macOS-side ulimits (and bwrap/sandbox-exec FS isolation) land in Tier 1, a separate PR per hero_logic#14.
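
The env scrub above can be illustrated with a small Python analogue. The PR itself does this in Rust with env_clear plus an allowlist; the function names here are hypothetical, and whether HERO_FLOW_* values are inherited from the parent or set fresh by the executor is not stated, so this sketch simply passes them through by prefix.

```python
import subprocess

# Names taken from the allowlist in this PR description.
ALLOWED_NAMES = {"PATH", "HOME", "LANG", "PYTHONPATH", "PYTHONDONTWRITEBYTECODE"}
ALLOWED_PREFIX = "HERO_FLOW_"


def sanitized_env(parent_env):
    """Keep only allowlisted variables; everything else is dropped."""
    return {
        name: value
        for name, value in parent_env.items()
        if name in ALLOWED_NAMES or name.startswith(ALLOWED_PREFIX)
    }


def spawn_scrubbed(argv, parent_env):
    """Run `argv` with the scrubbed environment instead of the inherited one."""
    return subprocess.run(
        argv,
        env=sanitized_env(parent_env),  # env= replaces the environment entirely
        capture_output=True,
        text=True,
    )
```

Passing `env=` replaces the child's environment rather than extending it, which is the same default-deny shape as clearing first and re-adding an allowlist.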

Listener drain bound

After subprocess exit, the listener task gets grace + 2s to finish naturally; if still stuck, abort. macOS doesn't always deliver EOF promptly to a UnixStream whose peer was SIGKILLed. Losing late span events on a killed flow is acceptable; hanging the executor is not.
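
The drain bound can be sketched with asyncio as a rough analogue of the tokio code. `drain_with_bound` and its parameters are hypothetical names; only the `grace + 2s` budget comes from this PR.

```python
import asyncio


async def drain_with_bound(listener_task, grace, slack=2.0):
    """Give the listener `grace + slack` seconds to finish, then abort it.

    Returns True if the listener finished on its own, False if it was aborted.
    """
    try:
        await asyncio.wait_for(listener_task, timeout=grace + slack)
        return True
    except asyncio.TimeoutError:
        # wait_for cancels the task on timeout, so nothing is left running.
        return False
```

Returning a flag rather than raising keeps the "lost late spans" case distinguishable from a hard failure, which matches the stated trade-off: dropped events are tolerated, a hung executor is not.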

Tests

13 total — 9 unit + 4 integration. Integration tests actually run python3 and skip cleanly if it's not on PATH.

  • end_to_end_flow_writes_spans: @flow + flow.step → spans persisted with names + parent linkage + SpanStatus::Ok
  • flow_failure_records_failed_root_span: raise flow.Failed("nope") → root span persists as Failed with the exception text
  • wall_clock_timeout_kills_runaway_flow: time.sleep(60) killed at 2s; outcome reports timed_out=true
  • missing_flow_decoration_exits_with_diagnostic: boot stub exits 2 with "no @flow-decorated function" message
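
The behavior the wall-clock test exercises (SIGTERM at the deadline, SIGKILL after the grace period) can be sketched in Python as follows. This is an analogue of the escalation, not the Rust implementation; `run_with_wall_clock` is a hypothetical name and the return shape only loosely mirrors ExecuteOutcome.

```python
import subprocess


def run_with_wall_clock(argv, wall_clock, grace):
    """Run `argv`; return (exit_code, timed_out).

    On timeout, send SIGTERM, wait `grace` seconds, then SIGKILL.
    """
    proc = subprocess.Popen(argv)
    try:
        return proc.wait(timeout=wall_clock), False
    except subprocess.TimeoutExpired:
        proc.terminate()  # SIGTERM on POSIX
        try:
            proc.wait(timeout=grace)
        except subprocess.TimeoutExpired:
            proc.kill()  # SIGKILL on POSIX
            proc.wait()  # reap the child so it does not linger as a zombie
        return proc.returncode, True
```

With `wall_clock=2` and `grace=0.5`, this reproduces the 2s / 2.5s timeline described under Tier 0 details.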

What this PR is NOT

  • New RPC methods (C4)
  • SSE endpoint (C4)
  • play_start routing by python_source non-empty (C4)
  • Migration / DAG deletion (D)
  • Tier 1 sandbox (separate PR — hero_logic#14)

Phase plan (#11)

  • A — schema additive (#15, merged)
  • B — hero_tracing.py SDK (#16, merged)
  • C1 — staging (#17, merged)
  • C2 — span socket listener (#18, merged)
  • C3 — this PR — subprocess + Tier 0 sandbox
  • C4 — new RPCs + SSE + play_start routing
  • D — migration tool + delete legacy DAG

Test plan

  • cargo test -p hero_logic --lib python_executor::tests — 9/9 pass
  • cargo test -p hero_logic --lib python_executor::integration_tests — 4/4 pass
  • cargo test --workspace --lib — 32 total, all green
  • cargo build --workspace clean

🤖 Generated with Claude Code

Phase C3 of #11. Spawns the Python flow subprocess, points it at the
Phase C2 span listener, applies Tier 0 sandbox per hero_logic#14, and
provides the boot stub that calls the user's @flow function.

What landed:

- `PythonFlowExecutor::execute(opts)` — full lifecycle: stage flow.py
  to per-Play workdir, bind span socket, spawn python3, await both
  subprocess exit and listener drain, return ExecuteOutcome with
  exit_code / timed_out / stderr.

- BOOT_STUB — short Python snippet appended to the user's flow source.
  Walks globals() for callables with `__hero_flow__` (stamped by the
  @flow decorator from Phase B's SDK) and calls _bootstrap_run with
  the JSON-decoded HERO_FLOW_INPUT. Picks the first decorated function
  by convention (one @flow per file); warns and uses the first if the
  file declares more than one. Diagnostic message + exit 2 when no
  @flow function is found.

- Tier 0 sandbox knobs (Tier0Sandbox struct):
  * env_clear() + allowlisted env vars (PATH, HOME, LANG, PYTHONPATH,
    PYTHONDONTWRITEBYTECODE, HERO_FLOW_*). The user's flow can't
    inherit secrets the operator set on the hero_logic_server process.
  * Per-Play workdir (`~/.hero/var/plays/{play_sid}/work`) used as
    cwd. Stages flow.py there.
  * Wall-clock timeout: tokio::time::timeout drives the SIGTERM path,
    grace period, then SIGKILL. ExecuteOutcome.timed_out reports the
    kill path so the caller can mark Play.status = timed_out.
  * Best-effort setrlimit for RLIMIT_AS / RLIMIT_CPU / RLIMIT_NOFILE /
    RLIMIT_NPROC via pre_exec on Linux only. macOS-side ulimits land
    in Tier 1 — see hero_logic#14 for the threat model.

- Listener drain bound: after subprocess exit, the listener task gets
  `grace + 2s` to finish naturally; if still stuck, we abort it. macOS
  doesn't always deliver EOF promptly to the read side of a UnixStream
  whose peer was SIGKILLed — losing late span events on a killed flow
  is acceptable; hanging the executor is not.

- ExecuteOptions has overrides for every external dependency
  (sdk_dir, clients_dir, workdir_root, span_socket_path, python_bin)
  so tests can run against a tempdir without touching the operator's
  real `~/.hero/var/{flows,plays,router}` paths.

- SpanListener::with_socket_path is now a public production method
  (was cfg(test)-only in C2). The executor uses it whenever
  ExecuteOptions::span_socket_path is set, primarily for tests.
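
The best-effort setrlimit knob listed above runs in a pre_exec hook via libc. A rough Python analogue uses the standard `resource` module with `preexec_fn`; the function names are hypothetical, and lowering only the soft limit is an assumption, since the PR does not say how soft and hard limits are chosen.

```python
import resource
import subprocess


def apply_limits(limits):
    """Best-effort: lower each soft limit in the child; ignore failures."""
    for which, wanted in limits.items():
        try:
            _soft, hard = resource.getrlimit(which)
            # Never ask for more than the hard limit allows.
            soft = wanted if hard == resource.RLIM_INFINITY else min(wanted, hard)
            resource.setrlimit(which, (soft, hard))
        except (ValueError, OSError):
            pass  # best-effort: a refused limit must not block the spawn


def spawn_limited(argv, limits):
    """Run `argv` with `limits` applied between fork and exec."""
    return subprocess.run(
        argv,
        preexec_fn=lambda: apply_limits(limits),
        capture_output=True,
        text=True,
    )
```

A call such as `spawn_limited(["python3", "flow.py"], {resource.RLIMIT_NOFILE: 64})` would cap open file descriptors for the child only; the parent's limits are untouched because the change happens after fork.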

Tests:

- 9 unit tests (existing C1 coverage + sanitized_env + build_pythonpath +
  boot_stub sanity).
- 4 integration tests that actually invoke python3:
  * end-to-end: @flow + flow.step → spans persisted to Play with names,
    parent linkage, and SpanStatus::Ok
  * flow_failure: raise flow.Failed("nope") → root span persists as
    SpanStatus::Failed with the exception text
  * wall_clock_timeout: time.sleep(60) inside a flow → killed at 2s,
    ExecuteOutcome reports timed_out=true
  * missing_flow_decoration: source with no @flow → boot stub exits
    with code 2 and a "no @flow-decorated function" diagnostic
- All integration tests skip cleanly when python3 isn't on PATH (CI
  image compatibility).

Cargo:

- libc 0.2 added under [target.'cfg(target_os = "linux")'.dependencies]
  for the setrlimit pre_exec hook. macOS / Windows builds don't pull in
  the libc crate as a direct dependency.

Phase plan:

- A — schema additive (#15, merged)
- B — hero_tracing.py SDK (#16, merged)
- C1 — staging (#17, merged)
- C2 — span socket listener (#18, merged)
- C3 — this PR — subprocess + Tier 0 sandbox
- C4 — new RPC methods + SSE endpoint + play_start routing
- D — migration tool + delete legacy DAG

Refs hero_logic#11 (Story 1: Foundation), hero_logic#14 (sandbox
roadmap), hero_logic#10 (epic).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
timur merged commit 07c28cada6 into development 2026-05-05 12:50:20 +00:00
Reference
lhumina_code/hero_logic!19