Logger lacks a drain/flush-sync API (root cause of read-after-write races) #141
Labels
No labels
prio_critical
prio_low
type_bug
type_contact
type_issue
type_lead
type_question
type_story
type_task
No milestone
No project
No assignees
1 participant
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
lhumina_code/hero_proc#141
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Problem
herolib_core's file logger has no drain / flush-sync / read-after-write API. Writes are buffered and flushed asynchronously, and there is no way for a caller to wait until "everything written so far is on disk and queryable."This single gap is the root cause behind several work-arounds in the hero_proc integration suite (and a real, if minor, product sharp edge):
run_allconcurrency cap (default 3,HERO_PROC_TEST_CONCURRENCY) — needed because unbounded concurrency outruns the logger.poll_untilscattered across ~40 read sites — tests must poll because a log/count/file read immediately after a job can see stale/missing data.clean_test_data_removes_everythingFS→store assertion shift — a deleted log dir can be recreated by a late buffered flush, so a raw-filesystem "is it empty" check is inherently racy. The test now asserts the queryable store instead.Why it keeps resurfacing
It is a product gap, not just test flakiness: any consumer that writes logs and then immediately reads/deletes them can observe the same race. The suite just hits it hardest because it does delete-then-assert-empty in tight loops.
Proposed follow-up
Add a flush/drain primitive to the logger in
herolib_core(e.g.LogStore::flush()/drain()that blocks until the in-flight buffer is persisted), then:poll_untilshims and revert the cleanup test to a direct assertion.Context
Surfaced while stabilizing the integration suite for #136 (PR #139). PR #139 fixes the real server-side log-leak bugs and works around the logger limitation; this issue tracks the underlying drain-API gap as the durable fix.