Integration tests leak scheduled actions into persistent hero_proc.db #126
lhumina_code/hero_proc#126
Integration tests register scheduled actions (cron and interval) into the persistent `actions` table at `/home/pctwo/hero/var/hero_proc.db`. The tests never clean up on exit, so the actions survive in the database. On the next `hero_proc_server` restart the supervisor loads them from disk and immediately begins firing them at full cadence.

Today a single workstation had 45 such leaked actions (names like `sched-interval-short-8398`, `sched-rapid-8398`, `sched-multi-context`, `sched19-66171`, plus 14 `sched-window-*-8398` variants) that together created 2,429 jobs in 1h17m of supervisor uptime, 9,276 invalid-cron WARN entries, sustained 83% CPU, and exhausted the RPC budget within roughly 45 minutes of every restart.

Recovery is `DELETE FROM actions WHERE name LIKE 'sched-%' OR name LIKE 'sched%-66171'` plus `VACUUM`, which reduces the database from 352 MB to 909 KB.

Two structural fix options to consider: (a) the integration test harness uses an ephemeral test database under a temp directory and never touches the operator database, or (b) every test-registered action carries a `test=true` tag and the supervisor purges all `test=true` actions on startup before resuming the scheduler.

Signed-by: mik-tf mik-tf@noreply.invalid
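The recovery statement and both fix options can be sketched against a throwaway SQLite file. This is a minimal illustration under stated assumptions, not the project's code: only the `actions` table name, its `name` column, and the two LIKE patterns come from the report. The schema, the `is_test` column (standing in for the proposed `test=true` tag), the helper names, and the `backup-nightly` operator action are all hypothetical.

```python
import os
import sqlite3
import tempfile

# Assumed schema for illustration; the real actions table likely has more columns.
# is_test is a hypothetical column standing in for the proposed test=true tag.
SCHEMA = """
CREATE TABLE IF NOT EXISTS actions (
    name    TEXT PRIMARY KEY,
    is_test INTEGER NOT NULL DEFAULT 0
)
"""


def open_db(path):
    """Open (or create) a database file and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    conn.commit()
    return conn


def register_action(conn, name, is_test=False):
    """Register one action; tests would pass is_test=True."""
    conn.execute(
        "INSERT INTO actions (name, is_test) VALUES (?, ?)", (name, int(is_test))
    )
    conn.commit()


def recover_by_name(conn):
    """Manual recovery from the report: delete by name pattern, then VACUUM."""
    cur = conn.execute(
        "DELETE FROM actions WHERE name LIKE 'sched-%' OR name LIKE 'sched%-66171'"
    )
    conn.commit()  # VACUUM cannot run inside an open transaction
    conn.execute("VACUUM")
    return cur.rowcount


def purge_test_actions(conn):
    """Option (b): meant to run at startup, before the scheduler resumes."""
    cur = conn.execute("DELETE FROM actions WHERE is_test = 1")
    conn.commit()
    return cur.rowcount


def remaining(conn):
    """Names of the actions still registered, sorted."""
    return [row[0] for row in conn.execute("SELECT name FROM actions ORDER BY name")]


def demo():
    # Option (a): the database lives under a temp directory that is removed
    # on exit, so the operator database is never touched.
    with tempfile.TemporaryDirectory() as tmp:
        conn = open_db(os.path.join(tmp, "hero_proc.db"))
        try:
            register_action(conn, "backup-nightly")  # hypothetical operator action
            register_action(conn, "sched-rapid-8398", is_test=True)
            register_action(conn, "sched19-66171", is_test=True)
            purged = purge_test_actions(conn)
            return purged, remaining(conn)
        finally:
            conn.close()
```

The two options are complementary in this sketch: the temp directory keeps new leaks from occurring, while a startup purge would also clear actions that leaked before the fix. The name-pattern delete only works while test names stay distinguishable from operator names, which is one argument for an explicit tag.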
`hero_proc.db` has no retention or VACUUM scheduling, grows unboundedly on long-lived daemons #131