service add-job overwrites existing actions; daemon exit-0 skips retry_policy #106
Labels
No labels
prio_critical
prio_low
type_bug
type_contact
type_issue
type_lead
type_question
type_story
type_task
No milestone
No project
No assignees
3 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
lhumina_code/hero_proc#106
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
cmd_add_job builds a fresh ServiceSpec with actions: vec![name], so a second add-job under the same service replaces the start trigger (e.g. hero_aibroker_admin clobbers hero_aibroker_server).
apply_exit_status short-circuits on status.success() for process jobs and marks them Failed without consulting retry_policy, so a daemon that voluntarily exits 0 never retries.
03e7ed8Mixed: code fix landed but end-to-end retry still doesn't behave as expected.
Kristof's
03e7ed8cleanly fixed both halves at the source level:cmd_add_jobincrates/hero_proc/src/cli/commands.rs:775-830now fetches the existing service and merges intomerged_actionsinstead of clobbering.apply_exit_statusincrates/hero_proc_server/src/supervisor/executor.rs:1095-1170treatsis_processdaemons exiting 0 asunexpectedand routes throughretry_policy.But the covering test fails empirically against a freshly built
hero_proc_serverfromorigin/development719ba10:So a job that should retry once (max_attempts: 2) only runs a single attempt. The
should_retrybranch evaluatesjob.attempt < rp.max_attempts— possibly an attempt-counter issue (related to HP-04, restart attempt counter resets across daemon restarts). Not isolated yet. Issue stays open while the retry path is end-to-end-debugged.Re-verified on kristof5 under canonical setup — CONFIRMED real defect, not sandbox artifact.
Ran
hero_proc_test --basic --functionalon kristof5 (canonical bootstrap, hero_proc auto-started via lab, binary from yesterday'slatestpublish). Result:Same failure shape as local sandbox. So Kristof's
03e7ed8fixed the code-level routing (exit-0 now goes through retry_policy branch), but end-to-end the retry doesn't fire a second attempt. Likely related to HP-04 (restart attempt counter resets every daemon boot —supervisor/mod.rs:405-498); when the supervisor crashes/restarts mid-retry it loses the attempt counter.Real defect, not a sandbox artifact. Reverting my earlier "provisional" caveat.
Done — both halves are fixed on
development:cmd_add_job(hero_proc_clicommands.rs) now buildsmerged_actions.push(name)(merges into the existing action set) instead ofactions: vec![name].apply_exit_status(executor.rs) now computesunexpected = !status.success() || job.is_process, so daemon exit-0 routes through the same retry path as non-zero andretry_policyapplies.Closing as resolved.