Populate action depends_on from service.toml [[dependencies]] so the supervisor enforces start-ordering (not just lab) #135

Open
opened 2026-06-03 16:41:10 +00:00 by sameh-farouk · 1 comment
Member

Summary

hero_proc's action model has a runtime dependency mechanism (ActionSpec.depends_on: Vec<ActionDep>, the WaitingDeps run state, dependency-aware shutdown ordering), but nothing populates depends_on from a service's declared service.toml [[dependencies]]. As a result, start-ordering / "ensure dependency running first" is enforced only by lab (via ensure_dependency_running), not by the supervisor itself — so it doesn't apply when a service is started by its own CLI or by anything other than lab.

Evidence

service.toml already carries a first-class dependency declaration (herolib_core::base::Dependency { repo, crate, bin, socket }). Example — hero_collab's web/CLI service.toml:

[[dependencies]]
repo   = "lhumina_code/hero_collab"
crate  = "hero_collab_server"
bin    = "hero_collab_server"
socket = "rpc.sock"

⚠️ Key-form caveat (added 2026-06-09): the on-disk .toml files actually use the full-name keys (crate_name / bin_name / socket_name, matching lab's CanonicalDep), not the short crate/bin/socket shown above. herolib_core::Dependency uses the short keys (#[serde(rename="crate")], no deny_unknown_fields), so it parses the full-name keys as unknown and captures only repo — a naive supervisor-side parse resolves every dep to "repo only" and silently no-ops. Reconcile (serde alias on herolib_core::Dependency, or migrate the files to short keys) before implementing. See the key-form comment below.

But the resulting hero_proc action has:

action.get hero_collab_web  →  depends_on: null

Meanwhile hero_proc clearly supports the concept:

  • crates/hero_proc_server/src/db/actions/model.rs:190pub depends_on: Vec<ActionDep>
  • crates/hero_proc_server/src/supervisor/mod.rs:733if !job.spec.depends_on.is_empty() { … }
  • crates/hero_proc_server/src/supervisor/mod.rs:969 — "Phase 0: advance Created/WaitingDeps runs whose cross-run dependencies are satisfied."
  • crates/hero_proc_server/src/supervisor/shutdown.rs:277 — dependency-aware shutdown ordering ("requires"/"after").

So the supervisor can order by depends_on, but the field arrives empty because the registration path (lab / per-service CLI) doesn't translate service.toml [[dependencies]] into it.

Why this matters

Today, start-ordering is enforced only on the lab path (hero_skills service_manager.rs:1171 → ensure_dependency_running, which auto-starts a missing dependency before starting the service). When a service is started by:

  • its own CLI (e.g. hero_collab --start, which registers actions directly via ServiceBuilder), or
  • a direct service.start / run.submit,

…the declared dependencies are ignored, because the supervisor's depends_on was never populated. The behaviour then differs by which tool started the service — the same root split tracked in hero_skills#308 (lab SERVICE_MAP vs service.toml/registry).

Concrete case: hero_collab in proxy auth mode requires hero_proxy (X-Hero-User injection). Declaring that as a service.toml dependency only gates the lab start path; via the CLI or a raw run-submit, collab can start before hero_proxy with no ordering guarantee.

Proposal

Populate hero_proc's action depends_on from service.toml [[dependencies]] at registration time, so the supervisor itself owns dependency ordering / "ensure running before start," independent of which client registered the service.

  • Map each Dependency { repo, crate, bin, socket } → an ActionDep referencing the dependency action/service (+ optional socket-ready gate).
  • Have the registration paths (lab ActionBuilder, the per-service-CLI ServiceBuilder, and service.define) carry the deps through, OR have hero_proc derive them from the embedded service.toml it already has access to.
  • The WaitingDeps machinery (mod.rs:969) then gates start until the dependency's socket/job is ready — uniformly, for every registrant.

This makes dependency ordering a property of the supervisor (the source of truth), not of one particular client tool, and removes the lab-vs-CLI behavioural split for dependencies.

Related: hero_skills#308 (lab/CLI register two different service shapes; dependency enforcement is one of the behaviours that diverges).

Found 2026-06-03 while investigating whether collab's service.toml could ensure hero_proxy is up before starting collab — the mechanism exists in service.toml + lab, but the supervisor doesn't enforce it.


Note (2026-06-09): the service-config DependencyDef.requires/after is also inert on start

Beyond service.toml [[dependencies]] not reaching the supervisor, the hero_proc service config's own DependencyDef { requires, wants, after, conflicts } is likewise ignored on the start path. handle_start (rpc/service.rs) only enqueues the service's own actions and never reads requires/after. Those fields are consumed only by:

  • set-time existence validation (service.set checks a requires name is defined),
  • boot autostart topological_order (orders status: start services by requires+after — but this is job-creation/dispatch order, not a readiness barrier; the poll loop then spawns concurrently and never pulls in an unwanted dependency),
  • shutdown ordering (shutdown.rs / stop.rs — the one place it genuinely works).

So service/SPECS.md documenting requires as "hard service deps" is a spec-vs-impl gap on the start path: the supervisor enforces neither service.toml [[dependencies]] nor DependencyDef.requires/after on service.start — the only supervisor-enforced ordering is action-level depends_on within a single service's run. Fixing this issue (populate action depends_on) should close both facets; worth asserting that in the acceptance criteria.

## Summary hero_proc's action model has a runtime dependency mechanism (`ActionSpec.depends_on: Vec<ActionDep>`, the `WaitingDeps` run state, dependency-aware shutdown ordering), but **nothing populates `depends_on` from a service's declared `service.toml [[dependencies]]`**. As a result, start-ordering / "ensure dependency running first" is enforced only by **lab** (via `ensure_dependency_running`), not by the supervisor itself — so it doesn't apply when a service is started by its own CLI or by anything other than lab. ## Evidence `service.toml` already carries a first-class dependency declaration (`herolib_core::base::Dependency { repo, crate, bin, socket }`). Example — hero_collab's web/CLI service.toml: ```toml [[dependencies]] repo = "lhumina_code/hero_collab" crate = "hero_collab_server" bin = "hero_collab_server" socket = "rpc.sock" ``` > ⚠️ **Key-form caveat (added 2026-06-09):** the on-disk `.toml` files actually use the **full-name** keys (`crate_name` / `bin_name` / `socket_name`, matching lab's `CanonicalDep`), **not** the short `crate`/`bin`/`socket` shown above. `herolib_core::Dependency` uses the short keys (`#[serde(rename="crate")]`, no `deny_unknown_fields`), so it parses the full-name keys as *unknown* and captures only `repo` — a naive supervisor-side parse resolves every dep to "repo only" and silently no-ops. Reconcile (serde `alias` on `herolib_core::Dependency`, or migrate the files to short keys) before implementing. See the key-form comment below. But the resulting hero_proc action has: ``` action.get hero_collab_web → depends_on: null ``` Meanwhile hero_proc clearly supports the concept: - `crates/hero_proc_server/src/db/actions/model.rs:190` — `pub depends_on: Vec<ActionDep>` - `crates/hero_proc_server/src/supervisor/mod.rs:733` — `if !job.spec.depends_on.is_empty() { … }` - `crates/hero_proc_server/src/supervisor/mod.rs:969` — "Phase 0: advance Created/WaitingDeps runs whose cross-run dependencies are satisfied." - `crates/hero_proc_server/src/supervisor/shutdown.rs:277` — dependency-aware shutdown ordering ("requires"/"after"). So the supervisor can order by `depends_on`, but the field arrives empty because the registration path (lab / per-service CLI) doesn't translate `service.toml [[dependencies]]` into it. ## Why this matters Today, start-ordering is enforced **only on the lab path** (`hero_skills` `service_manager.rs:1171 → ensure_dependency_running`, which auto-starts a missing dependency before starting the service). When a service is started by: - its **own CLI** (e.g. `hero_collab --start`, which registers actions directly via `ServiceBuilder`), or - a direct `service.start` / `run.submit`, …the declared dependencies are ignored, because the supervisor's `depends_on` was never populated. The behaviour then differs by *which tool* started the service — the same root split tracked in hero_skills#308 (lab `SERVICE_MAP` vs `service.toml`/registry). Concrete case: hero_collab in `proxy` auth mode requires `hero_proxy` (X-Hero-User injection). Declaring that as a `service.toml` dependency only gates the lab start path; via the CLI or a raw run-submit, collab can start before hero_proxy with no ordering guarantee. ## Proposal Populate hero_proc's action `depends_on` from `service.toml [[dependencies]]` at registration time, so the **supervisor itself** owns dependency ordering / "ensure running before start," independent of which client registered the service. - Map each `Dependency { repo, crate, bin, socket }` → an `ActionDep` referencing the dependency action/service (+ optional socket-ready gate). - Have the registration paths (lab `ActionBuilder`, the per-service-CLI `ServiceBuilder`, and `service.define`) carry the deps through, OR have hero_proc derive them from the embedded `service.toml` it already has access to. - The `WaitingDeps` machinery (`mod.rs:969`) then gates start until the dependency's socket/job is ready — uniformly, for every registrant. This makes dependency ordering a property of the supervisor (the source of truth), not of one particular client tool, and removes the lab-vs-CLI behavioural split for dependencies. Related: hero_skills#308 (lab/CLI register two different service shapes; dependency enforcement is one of the behaviours that diverges). Found 2026-06-03 while investigating whether collab's service.toml could ensure hero_proxy is up before starting collab — the mechanism exists in service.toml + lab, but the supervisor doesn't enforce it. --- ## Note (2026-06-09): the service-config `DependencyDef.requires/after` is *also* inert on start Beyond `service.toml [[dependencies]]` not reaching the supervisor, the hero_proc **service config's own** `DependencyDef { requires, wants, after, conflicts }` is likewise ignored on the start path. `handle_start` (`rpc/service.rs`) only enqueues the service's *own* actions and never reads `requires`/`after`. Those fields are consumed only by: - **set-time existence validation** (`service.set` checks a `requires` name is defined), - **boot autostart** `topological_order` (orders `status: start` services by `requires`+`after` — but this is *job-creation/dispatch order*, not a readiness barrier; the poll loop then spawns concurrently and never pulls in an unwanted dependency), - **shutdown ordering** (`shutdown.rs` / `stop.rs` — the one place it genuinely works). So `service/SPECS.md` documenting `requires` as "hard service deps" is a spec-vs-impl gap on the start path: **the supervisor enforces neither `service.toml [[dependencies]]` nor `DependencyDef.requires/after` on `service.start`** — the *only* supervisor-enforced ordering is **action-level `depends_on`** within a single service's run. Fixing this issue (populate action `depends_on`) should close both facets; worth asserting that in the acceptance criteria.
Author
Member

Heads-up before implementing: the on-disk dependency key form differs from herolib_core::Dependency

While applying this model to hero_livekit, I hit a key-name split that will silently break the supervisor-side enforcement this issue proposes, unless reconciled.

Two parsers, two key forms — for the same [[dependencies]] block

herolib_core's Dependency (what the supervisor/binary would parse) expects the short keys:

// hero_lib/crates/core/src/base/service.rs
pub struct Dependency {
    pub repo: String,
    #[serde(rename = "crate")] pub crate_name: Option<String>, // TOML key: `crate`
    pub bin: Option<String>,                                    // TOML key: `bin`
    pub socket: Option<String>,                                 // TOML key: `socket`
}

lab's CanonicalDep expects the full-name keys, by design:

// hero_skills lab/src/service/model.rs:12-13
// "The on-disk keys are the full-name form (`crate_name`, `socket_type`) — no serde renames."

But the real files use the full-name keys

hero_collab and hero_livekit both write crate_name / bin_name / socket_name:

# hero_collab_admin/service.toml
[[dependencies]]
repo        = "lhumina_code/hero_collab"
crate_name  = "hero_collab_server"
socket_name = "rpc.sock"
# hero_livekit_admin/service.toml
[[dependencies]]
repo       = "lhumina_code/hero_livekit"
crate_name = "hero_livekit_server"
bin_name   = "hero_livekit_server"

Consequence

  • lab parses these correctly today (it's the current enforcer via ensure_dependency_running).
  • herolib_core's Dependency sees crate_name/bin_name/socket_name as unknown fields (no deny_unknown_fields), so it captures only repo and drops crate/bin/socketNone.

So if the supervisor populates depends_on by parsing the embedded service.toml with herolib_core::Dependency, every existing [[dependencies]] block resolves to "repo only, no target" and the enforcement silently no-ops. Note this issue's own example above uses the short keys (crate/bin/socket), which is not what the files on disk actually contain.

Suggestion

Pick one canonical key form and reconcile before/with this work:

  • add #[serde(alias = "crate_name")] / alias = "bin_name" / alias = "socket_name" to herolib_core::Dependency (lowest-churn, keeps existing files valid), or
  • migrate the files to the short keys and align lab's CanonicalDep.

Either way, add a fixture test asserting a real on-disk [[dependencies]] block round-trips through whichever struct the supervisor uses.

Related: hero_skills#308 (lab vs service.toml/registry shape split — this is the dependency-field instance of it), hero_livekit#46 / #47 (where this surfaced).

## Heads-up before implementing: the on-disk dependency key form differs from `herolib_core::Dependency` While applying this model to hero_livekit, I hit a key-name split that will silently break the supervisor-side enforcement this issue proposes, unless reconciled. ### Two parsers, two key forms — for the same `[[dependencies]]` block **herolib_core's `Dependency`** (what the supervisor/binary would parse) expects the *short* keys: ```rust // hero_lib/crates/core/src/base/service.rs pub struct Dependency { pub repo: String, #[serde(rename = "crate")] pub crate_name: Option<String>, // TOML key: `crate` pub bin: Option<String>, // TOML key: `bin` pub socket: Option<String>, // TOML key: `socket` } ``` **lab's `CanonicalDep`** expects the *full-name* keys, by design: ``` // hero_skills lab/src/service/model.rs:12-13 // "The on-disk keys are the full-name form (`crate_name`, `socket_type`) — no serde renames." ``` ### But the real files use the full-name keys hero_collab and hero_livekit both write `crate_name` / `bin_name` / `socket_name`: ```toml # hero_collab_admin/service.toml [[dependencies]] repo = "lhumina_code/hero_collab" crate_name = "hero_collab_server" socket_name = "rpc.sock" ``` ```toml # hero_livekit_admin/service.toml [[dependencies]] repo = "lhumina_code/hero_livekit" crate_name = "hero_livekit_server" bin_name = "hero_livekit_server" ``` ### Consequence - **lab** parses these correctly today (it's the current enforcer via `ensure_dependency_running`). - **herolib_core's `Dependency`** sees `crate_name`/`bin_name`/`socket_name` as *unknown* fields (no `deny_unknown_fields`), so it captures only `repo` and drops `crate`/`bin`/`socket` → `None`. So if the supervisor populates `depends_on` by parsing the embedded `service.toml` with `herolib_core::Dependency`, **every existing `[[dependencies]]` block resolves to "repo only, no target"** and the enforcement silently no-ops. Note this issue's own example above uses the short keys (`crate`/`bin`/`socket`), which is *not* what the files on disk actually contain. ### Suggestion Pick one canonical key form and reconcile before/with this work: - add `#[serde(alias = "crate_name")]` / `alias = "bin_name"` / `alias = "socket_name"` to `herolib_core::Dependency` (lowest-churn, keeps existing files valid), **or** - migrate the files to the short keys and align lab's `CanonicalDep`. Either way, add a fixture test asserting a real on-disk `[[dependencies]]` block round-trips through whichever struct the supervisor uses. Related: hero_skills#308 (lab vs service.toml/registry shape split — this is the dependency-field instance of it), hero_livekit#46 / #47 (where this surfaced).
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
lhumina_code/hero_proc#135
No description provided.