logs.get returns empty for completed jobs — likely log-flush race in executor (uc15 + runs::structured_logs/wildcard cluster) #117
Reference
lhumina_code/hero_proc#117
Symptom
A service_define_oneshot job that completes successfully has empty logs.get output, even when the script body explicitly echoes a recognizable marker. uc15_oneshot_service_runs_to_completion in hero_proc_test exhibits this.
After today's 477889a fix (which got the service-define+children plumbing working), uc15 now reaches the log-assertion step, but the logs vector is empty. The job ran (state transitioned to inactive/succeeded) and the script did exit 0, but logs.get returns nothing.
Repro
Likely scope
Smells like a log-flush ordering bug: the executor marks the job terminal before the log sink finishes flushing the stdout buffer to SQLite. Related symptoms in the same test run:
- runs::structured_logs_all_levels: level=0 (debug) should have >=1 log entry, got 0
- runs::logs_query_by_service_src_wildcard: wildcard src query should return >=12 entries, got 0
All three are "wrote logs, can't read them back" failures. Worth investigating as a cluster, since they possibly share the same root cause.
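To make the suspected ordering concrete, here is a minimal sketch of the fix direction, not the actual executor code. It assumes the log sink runs on its own thread and receives lines over a channel; `Store`, `Job`, `Phase`, and this `run_job` are hypothetical stand-ins (a `Vec` replaces SQLite). The point is only the order of operations: the sink is joined, and therefore fully drained, before the job is marked terminal.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for the persisted log store (SQLite in the real server).
type Store = Arc<Mutex<Vec<String>>>;

#[derive(Debug, PartialEq, Clone, Copy)]
enum Phase {
    Running,
    Succeeded,
}

/// Hypothetical job record; only the phase matters for this sketch.
struct Job {
    phase: Phase,
}

/// Emits `lines` as if they were the job's stdout, forwards them to `store`
/// on a separate sink thread, and marks the job terminal only after the
/// sink thread has drained the channel and exited.
fn run_job(lines: &[&str], store: &Store) -> Job {
    let mut job = Job { phase: Phase::Running };

    let (tx, rx) = mpsc::channel::<String>();
    let sink_store = Arc::clone(store);
    let sink = thread::spawn(move || {
        // The loop ends once every sender is dropped and the queue is empty.
        for line in rx {
            sink_store.lock().unwrap().push(line);
        }
    });

    for line in lines {
        tx.send(line.to_string()).expect("sink thread gone");
    }
    drop(tx); // stdout reached EOF: close the channel so the sink can finish

    // Drain first, then transition. Swapping these two steps reproduces the
    // suspected race: a reader could see a terminal phase with an empty store.
    sink.join().expect("log sink panicked");
    job.phase = Phase::Succeeded;
    job
}
```

If the real executor follows a different structure (for example async tasks instead of threads), the same idea would apply as awaiting the sink task before the phase transition.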
Surface to investigate
- crates/hero_proc_server/src/supervisor/executor.rs::run_job: where stdout/stderr are piped and apply_exit_status is called. Verify the log sink is awaited/flushed before the phase transitions to terminal.
- crates/hero_proc_server/src/logging/: LogSink/LogStore flush semantics.
- crates/hero_proc_server/src/rpc/logs.rs (if it exists): what logs.get actually reads.
Why I didn't bundle this with #113
#113 was the service.define interpreter + service.children job_id, both in rpc/service.rs + service/define.rs. This is logging/ + supervisor/executor.rs: different surface, different root cause, different reviewer (likely @maintainer for the executor half).
Surfaced while landing #113. Worth its own ticket.
Closing as superseded: uc15_oneshot_service_runs_to_completion and the runs::structured_logs/wildcard cluster now pass (test-level read-after-write polling added during the suite hardening). The underlying root cause (the logger has no drain/flush-sync API) is tracked in #141; the symptom-level repro here is resolved.
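For readers unfamiliar with the workaround, read-after-write polling in a test could look roughly like the sketch below. This is an illustration under assumptions, not the helper that was actually added to the suite: `poll_logs` is a hypothetical name, and the `read` closure stands in for whatever call the tests use to fetch logs (for example a logs.get request).

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Calls `read` repeatedly until it returns at least `min` entries or
/// `timeout` elapses, then returns the last result. The caller asserts on
/// the returned vector, so a timeout surfaces as a normal test failure.
fn poll_logs<F>(mut read: F, min: usize, timeout: Duration) -> Vec<String>
where
    F: FnMut() -> Vec<String>,
{
    let deadline = Instant::now() + timeout;
    loop {
        let logs = read();
        if logs.len() >= min || Instant::now() >= deadline {
            return logs;
        }
        // Short fixed back-off between reads; the interval is arbitrary here.
        thread::sleep(Duration::from_millis(10));
    }
}
```

Polling like this only hides the window between job completion and log visibility from the tests; clients that read logs immediately after a job turns terminal can still see an empty result until the root cause tracked in #141 is addressed.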