Populate action depends_on from service.toml [[dependencies]] so the supervisor enforces start-ordering (not just lab) #135
Labels
No labels
prio_critical
prio_low
type_bug
type_contact
type_issue
type_lead
type_question
type_story
type_task
No milestone
No project
No assignees
1 participant
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
lhumina_code/hero_proc#135
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
hero_proc's action model has a runtime dependency mechanism (
ActionSpec.depends_on: Vec<ActionDep>, theWaitingDepsrun state, dependency-aware shutdown ordering), but nothing populatesdepends_onfrom a service's declaredservice.toml [[dependencies]]. As a result, start-ordering / "ensure dependency running first" is enforced only by lab (viaensure_dependency_running), not by the supervisor itself — so it doesn't apply when a service is started by its own CLI or by anything other than lab.Evidence
service.tomlalready carries a first-class dependency declaration (herolib_core::base::Dependency { repo, crate, bin, socket }). Example — hero_collab's web/CLI service.toml:But the resulting hero_proc action has:
Meanwhile hero_proc clearly supports the concept:
crates/hero_proc_server/src/db/actions/model.rs:190—pub depends_on: Vec<ActionDep>crates/hero_proc_server/src/supervisor/mod.rs:733—if !job.spec.depends_on.is_empty() { … }crates/hero_proc_server/src/supervisor/mod.rs:969— "Phase 0: advance Created/WaitingDeps runs whose cross-run dependencies are satisfied."crates/hero_proc_server/src/supervisor/shutdown.rs:277— dependency-aware shutdown ordering ("requires"/"after").So the supervisor can order by
depends_on, but the field arrives empty because the registration path (lab / per-service CLI) doesn't translateservice.toml [[dependencies]]into it.Why this matters
Today, start-ordering is enforced only on the lab path (
hero_skillsservice_manager.rs:1171 → ensure_dependency_running, which auto-starts a missing dependency before starting the service). When a service is started by:hero_collab --start, which registers actions directly viaServiceBuilder), orservice.start/run.submit,…the declared dependencies are ignored, because the supervisor's
depends_onwas never populated. The behaviour then differs by which tool started the service — the same root split tracked in hero_skills#308 (labSERVICE_MAPvsservice.toml/registry).Concrete case: hero_collab in
proxyauth mode requireshero_proxy(X-Hero-User injection). Declaring that as aservice.tomldependency only gates the lab start path; via the CLI or a raw run-submit, collab can start before hero_proxy with no ordering guarantee.Proposal
Populate hero_proc's action
depends_onfromservice.toml [[dependencies]]at registration time, so the supervisor itself owns dependency ordering / "ensure running before start," independent of which client registered the service.Dependency { repo, crate, bin, socket }→ anActionDepreferencing the dependency action/service (+ optional socket-ready gate).ActionBuilder, the per-service-CLIServiceBuilder, andservice.define) carry the deps through, OR have hero_proc derive them from the embeddedservice.tomlit already has access to.WaitingDepsmachinery (mod.rs:969) then gates start until the dependency's socket/job is ready — uniformly, for every registrant.This makes dependency ordering a property of the supervisor (the source of truth), not of one particular client tool, and removes the lab-vs-CLI behavioural split for dependencies.
Related: hero_skills#308 (lab/CLI register two different service shapes; dependency enforcement is one of the behaviours that diverges).
Found 2026-06-03 while investigating whether collab's service.toml could ensure hero_proxy is up before starting collab — the mechanism exists in service.toml + lab, but the supervisor doesn't enforce it.
Note (2026-06-09): the service-config
DependencyDef.requires/afteris also inert on startBeyond
service.toml [[dependencies]]not reaching the supervisor, the hero_proc service config's ownDependencyDef { requires, wants, after, conflicts }is likewise ignored on the start path.handle_start(rpc/service.rs) only enqueues the service's own actions and never readsrequires/after. Those fields are consumed only by:service.setchecks arequiresname is defined),topological_order(ordersstatus: startservices byrequires+after— but this is job-creation/dispatch order, not a readiness barrier; the poll loop then spawns concurrently and never pulls in an unwanted dependency),shutdown.rs/stop.rs— the one place it genuinely works).So
service/SPECS.mddocumentingrequiresas "hard service deps" is a spec-vs-impl gap on the start path: the supervisor enforces neitherservice.toml [[dependencies]]norDependencyDef.requires/afteronservice.start— the only supervisor-enforced ordering is action-leveldepends_onwithin a single service's run. Fixing this issue (populate actiondepends_on) should close both facets; worth asserting that in the acceptance criteria.Heads-up before implementing: the on-disk dependency key form differs from
herolib_core::DependencyWhile applying this model to hero_livekit, I hit a key-name split that will silently break the supervisor-side enforcement this issue proposes, unless reconciled.
Two parsers, two key forms — for the same
[[dependencies]]blockherolib_core's
Dependency(what the supervisor/binary would parse) expects the short keys:lab's
CanonicalDepexpects the full-name keys, by design:But the real files use the full-name keys
hero_collab and hero_livekit both write
crate_name/bin_name/socket_name:Consequence
ensure_dependency_running).Dependencyseescrate_name/bin_name/socket_nameas unknown fields (nodeny_unknown_fields), so it captures onlyrepoand dropscrate/bin/socket→None.So if the supervisor populates
depends_onby parsing the embeddedservice.tomlwithherolib_core::Dependency, every existing[[dependencies]]block resolves to "repo only, no target" and the enforcement silently no-ops. Note this issue's own example above uses the short keys (crate/bin/socket), which is not what the files on disk actually contain.Suggestion
Pick one canonical key form and reconcile before/with this work:
#[serde(alias = "crate_name")]/alias = "bin_name"/alias = "socket_name"toherolib_core::Dependency(lowest-churn, keeps existing files valid), orCanonicalDep.Either way, add a fixture test asserting a real on-disk
[[dependencies]]block round-trips through whichever struct the supervisor uses.Related: hero_skills#308 (lab vs service.toml/registry shape split — this is the dependency-field instance of it), hero_livekit#46 / #47 (where this surfaced).