Generated Python clients should emit a shared tracing module + auto-span every RPC call #30

Closed
opened 2026-04-21 14:10:45 +00:00 by timur · 1 comment
Owner

Companion to #29

#29 surfaces service methods in generated Python clients (currently only rootobject CRUD is emitted). This issue is about a second capability: generating the tracing helpers that flow authors need.

Important revision (after discussion): the original draft of this issue proposed auto-wrapping every generated client method in a span. That was the wrong answer — it couples tracing into every client, pollutes standalone-script use, and leaks observability concerns into plain RPC code. The cleaner design, reflected below, is that clients stay pure and tracing is a wrapping concern applied explicitly via instrument(client).

Problem

Hero is moving toward Python flows as the primary authoring artifact (see hero_logic docs/flows-as-python.md, hero_logic#10). Flow authors need:

  • A decorator @flow(...) that registers a function as a flow with declared inputs
  • A context manager with flow.step(name): that marks a named step
  • An exception type flow.Failed for typed failures
  • Socket plumbing that ships span events to the flow's parent runtime
  • Parent-span context propagation across asyncio tasks and across sub-flow spawns
  • An instrument(client) helper for flow authors who want per-RPC spans

None of this exists today in generated Python clients. Every hero_logic template re-invents JSON-RPC plumbing by hand.

What to generate

Per repo that calls hero_rpc client codegen, emit two Python artifacts:

1. The service client (extended, per #29)

One per service, as today. Pure RPC client — no tracing code. Includes every service method (not just CRUD). Zero runtime imports beyond stdlib. Works identically whether called from a standalone script, a Rhai-like sandbox, a test, or a @flow-decorated Python function.

2. A shared hero_tracing.py module (new — identical bytes per repo)

Boilerplate generator output; same source code every time. Imports stdlib only. Contains:

```python
# --- Decorator + context managers --------------------------------------

@flow(name: str, inputs: dict = None, description: str = '')    # decorator
flow.step(name: str, **tags) -> context manager
flow.span(name: str, **tags) -> context manager   # generic, no "step" framing
flow.Failed(reason: str)                          # exception
flow.current_span                                 # for mid-step tags / logs

# --- Opt-in RPC-level instrumentation ----------------------------------

instrument(client) -> proxy                       # wraps any client so every
                                                  # method call opens a
                                                  # rpc:{service}.{method} span

# --- Plumbing (internals, not public API) ------------------------------

# Reads HERO_FLOW_SPAN_SOCK env var; connects and writes JSONL span events.
# If unset (e.g. standalone script), helpers are no-ops. No stderr pollution.
# contextvars-based parent span propagation across asyncio.create_task / gather.
```

The three tiers of use

Tier 1 — standalone script (plain RPC, no tracing)

```python
from hero_whiteboard_client import HeroWhiteboardClient
wb = HeroWhiteboardClient()
wb.workspace_create(name='foo')
```

No hero_tracing import. No sockets. No spans. Just RPC.

Tier 2 — flow with step-level tracing

```python
from hero_tracing import flow
from hero_whiteboard_client import HeroWhiteboardClient

@flow('Create workspace', inputs={...})
async def run(inputs):
    wb = HeroWhiteboardClient()
    with flow.step('create'):
        wb.workspace_create(name=inputs['name'])
```

One span per with flow.step block. RPC calls inside are opaque to the viewer. Fine for most flows.

Tier 3 — flow with step + RPC tracing (opt-in)

```python
from hero_tracing import flow, instrument
from hero_whiteboard_client import HeroWhiteboardClient

@flow('...')
async def run(inputs):
    wb = instrument(HeroWhiteboardClient())   # explicit opt-in
    with flow.step('create'):
        wb.workspace_create(name=inputs['name'])
        # ^ emits a child rpc:hero_whiteboard.workspace_create span
```

One line of opt-in. Generic instrument() works on any client.
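One possible shape for instrument() is a __getattr__ proxy; the emit callback below is an illustrative stand-in for the real span-event plumbing, so the sketch runs standalone:

```python
# Hypothetical sketch: wrap any client so every method call is announced as an
# rpc:{service}.{method} span. Only the proxy shape is real; emit is a stub.
class _Instrumented:
    def __init__(self, client, emit=print):
        self._client = client
        self._emit = emit  # real module would open/close a span here

    def __getattr__(self, name):
        attr = getattr(self._client, name)
        if not callable(attr):
            return attr  # plain attributes pass through untouched
        service = type(self._client).__name__
        def wrapper(*args, **kwargs):
            self._emit(f"rpc:{service}.{name}")  # span_start in the real module
            return attr(*args, **kwargs)
        return wrapper

def instrument(client, emit=print):
    return _Instrumented(client, emit)
```

Because the proxy resolves methods dynamically, it needs no knowledge of any particular client, which is what keeps instrument() generic.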

Wire protocol — span events (JSONL over UDS)

```json
{"type":"span_start","span_id":"a1","parent_id":null,"name":"create","t_ms":1776760000123}
{"type":"span_start","span_id":"a2","parent_id":"a1","name":"rpc:hero_whiteboard.workspace_create","t_ms":1776760000163}
{"type":"span_end","span_id":"a2","t_ms":1776760000203,"status":"ok","tags":{"workspace_id":3}}
{"type":"span_end","span_id":"a1","t_ms":1776760000443,"status":"ok"}
```

Stable and forward-compatible: no schema version needed, because new fields are only ever added as optional and consumers skip fields they don't recognize.
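A hypothetical collector-side fold of these events into a span tree; the fold reads only the fields it knows, which is exactly why optional new fields cost nothing:

```python
# Illustrative sketch, not the actual hero_logic collector: rebuild the span
# tree from JSONL lines, keyed by span_id.
import json

def fold_spans(lines):
    spans = {}
    for line in lines:
        ev = json.loads(line)
        if ev["type"] == "span_start":
            spans[ev["span_id"]] = {"name": ev["name"],
                                    "parent_id": ev["parent_id"],
                                    "children": []}
        elif ev["type"] == "span_end":
            spans[ev["span_id"]]["status"] = ev["status"]
    roots = []
    for sid, span in spans.items():
        parent = spans.get(span["parent_id"])
        (parent["children"] if parent else roots).append(sid)
    return spans, roots
```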

Cross-process span propagation

When a parent flow calls HeroLogicClient.play_run_async(flow_name='X', inputs=...), the parent flow author is expected to instrument() the client if they want the spawn site traced. If instrumented:

  • Wrapper reads flow.current_span.id from contextvars
  • Sends it as X-Hero-Parent-Span header with the JSON-RPC request
  • hero_logic's executor injects it into the child Play's span socket
  • Child's @flow root span uses it as parent_id

Result: nested tree in the viewer. Same mechanism whether the child is another Python flow or a generated client call.

Scope / definition of done

  1. hero_tracing.py emitted by the Python generator, identical content per repo
  2. Generated clients remain pure — no tracing imports or decoration (this is a delta from the original issue draft)
  3. instrument(client) helper wraps any client with per-method spans via __getattr__ proxy
  4. When HERO_FLOW_SPAN_SOCK is unset, all tracing helpers are silent no-ops (no stderr, no stdout)
  5. contextvars-based parent-span propagation works across asyncio.create_task / gather
  6. Cross-process parent-span propagation via X-Hero-Parent-Span header in outgoing RPC
  7. Sample end-to-end: a Python flow with 3 with flow.step(...) blocks and 1 instrumented client making 6 RPC calls produces 9 spans (3 step + 6 RPC) nested under the @flow root span
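Item 5 leans on a standard contextvars property: asyncio.create_task snapshots the caller's context, so a parent-span variable set in the flow body is visible inside child tasks. A quick self-contained check:

```python
# Demonstrates the mechanism item 5 relies on: the child task sees the value
# the parent set, because create_task copies the current context.
import asyncio
import contextvars

parent_span = contextvars.ContextVar("parent_span", default=None)

async def child():
    return parent_span.get()  # reads the parent's value via the copied context

async def main():
    parent_span.set("root")
    task = asyncio.create_task(child())
    return await task

result = asyncio.run(main())  # → "root"
```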

Downstream

hero_logic gets:

  • Span rootobject (parent_id, start/end, name, status, tags, logs)
  • Play.spans: [Span] (replaces node_runs)
  • A span collector on the executor that listens on HERO_FLOW_SPAN_SOCK per-Play

These land in hero_logic#11.

Related

  • hero_rpc#29 (prerequisite — service methods in generated client)
  • hero_logic#10 (epic tracking the Python-flow work)
  • hero_logic docs/flows-as-python.md
Author
Owner

Closing — scope moves to hero_logic.

After further design review: the tracing helpers (@flow, flow.step, flow.Failed, instrument(), socket plumbing, contextvars propagation) are service-agnostic generic code. Having hero_rpc emit them alongside every service's client would mean identical-bytes duplication across every Hero repo.

The cleaner architecture: hero_tracing.py is a hand-written module in hero_logic (the service that owns the flow concept), embedded in the hero_logic_server binary, staged at ~/.hero/var/flows/sdk/hero_tracing.py on startup, and PYTHONPATH-injected into flow subprocesses by the executor.

hero_rpc stays focused on its actual job: typed RPC client codegen. Pure clients, per service, that work standalone without any tracing knowledge. Issue #29 is the only hero_rpc work needed.

Scope moved to hero_logic#11 (Foundation story), which now includes:

  • Authoring hero_tracing.py (generic module)
  • Staging it at a well-known path
  • Injecting PYTHONPATH when spawning flow subprocesses
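Under those assumptions, the executor-side spawn could be sketched like this (paths and helper names are illustrative, not hero_logic's actual API):

```python
# Hypothetical executor sketch: stage dir on PYTHONPATH + per-Play span socket.
import os
import subprocess

SDK_DIR = os.path.expanduser("~/.hero/var/flows/sdk")

def flow_env(span_sock):
    """Build the child environment for one flow subprocess."""
    env = dict(os.environ)
    env["PYTHONPATH"] = SDK_DIR + os.pathsep + env.get("PYTHONPATH", "")
    env["HERO_FLOW_SPAN_SOCK"] = span_sock
    return env

def spawn_flow(script, span_sock):
    # Flow code can now `import hero_tracing` without any install step.
    return subprocess.Popen(["python3", script], env=flow_env(span_sock))
```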
timur closed this issue 2026-04-21 14:37:15 +00:00
Reference: lhumina_code/hero_rpc#30