hero_log: sources_in_root cache (5s TTL) makes newly-created source dirs invisible to wildcard queries #142
Reference
lhumina_code/hero_lib#142
Symptom
functional::runs::logs_query_by_service_src_wildcard passes in a fresh sandbox but fails when run after another test that inserts logs against the same daemon. Specifically: after structured_logs_all_levels populates core.run_log_levels.* entries, a subsequent wildcard query for core.run-svc-query.* (a different src family) returns 0 entries even though the test inserts 12 of them.
Repro
Root cause
In herolib_core/src/logger/query.rs::search_logs (lines 138-164), there is a fast path that skips the source-tree walk only when src_prefix resolves to an exact existing source directory.
For a wildcard query like core.run_svc_query (a 2-segment prefix that doesn't name a single directory, because entries are stored as core/run_svc_query/main/ and .../worker/), the code falls back to sources_in_root, a cached source listing with a 5-second TTL.
The cache is populated the first time anyone queries that root. Once populated, source directories created after that point are invisible until the TTL expires. The test inserts 12 entries (creating new src dirs run_svc_query.main and run_svc_query.worker) and waits only 300 ms before querying. That is well within the 5 s TTL, so the new dirs aren't returned by sources_in_root and the wildcard match retains nothing.
The fast path comment even documents this.
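The staleness window described above can be reproduced in isolation. The following is a minimal sketch, not the herolib_core code: TtlSourceCache, match_prefix, and the dotted source names are hypothetical stand-ins, an in-memory set replaces the directory walk, and the clock is passed in so the timing is deterministic.

```rust
use std::collections::BTreeSet;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a cached source listing (not the herolib_core type).
struct TtlSourceCache {
    ttl: Duration,
    snapshot: Option<(Instant, BTreeSet<String>)>,
}

impl TtlSourceCache {
    fn new(ttl: Duration) -> Self {
        Self { ttl, snapshot: None }
    }

    /// Returns the cached listing while it is younger than `ttl`;
    /// otherwise re-reads `on_disk`, which stands in for a directory walk.
    fn sources(&mut self, on_disk: &BTreeSet<String>, now: Instant) -> BTreeSet<String> {
        if let Some((taken_at, cached)) = &self.snapshot {
            if now.duration_since(*taken_at) < self.ttl {
                // Still inside the TTL: sources created since `taken_at` are not seen.
                return cached.clone();
            }
        }
        self.snapshot = Some((now, on_disk.clone()));
        on_disk.clone()
    }
}

/// Wildcard match: keep sources equal to `prefix` or nested below it.
fn match_prefix(sources: &BTreeSet<String>, prefix: &str) -> Vec<String> {
    let nested = format!("{prefix}.");
    sources
        .iter()
        .filter(|s| s.as_str() == prefix || s.starts_with(nested.as_str()))
        .cloned()
        .collect()
}
```

Under these assumptions, a first query at time t0 populates the snapshot. Two sources added afterwards are missing from a query at t0 + 300 ms and only appear once the 5 s TTL has elapsed, which matches the sequence the failing test goes through.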
Why this is in herolib_core, not hero_proc
The cache lives in herolib_core::logger::query, shared by every hero_log consumer. Any wildcard query on a freshly-created source family will hit the same staleness window across the whole ecosystem (not just hero_proc tests). Fix must land in herolib_core.
Possible fixes (in priority order)
1. Invalidate the sources_in_root cache on log writes: when a new src is registered (a new directory created), bump a counter that the cache reader checks before returning. Most direct.
2. Use the cache only when src_prefix = "" (list-all). Anything else walks the directory tree once.
3. Add a force_refresh() API: let consumers opt out per query.
Related
Surfaced while fixing the rest of the runs/logs cluster (hero_proc commits land separately).
The structured_logs and wildcard tests individually pass; only sequential runs trigger this. uc15 oneshot logs (hero_proc#117) is a different bug (executor flush ordering, not cache TTL).
cc the herolib_core logger owner.
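As an illustration of the first option under "Possible fixes" (invalidating the cache when a new src is registered), here is a minimal sketch of a generation counter checked by the cache reader. All names (SourceRegistry, GenerationCache, Snapshot, register) are hypothetical and do not reflect the actual herolib_core API; the in-memory set again stands in for the source tree.

```rust
use std::collections::BTreeSet;
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the on-disk source tree plus a write-side counter.
struct SourceRegistry {
    sources: BTreeSet<String>,
    /// Bumped whenever a previously unseen source is registered.
    generation: AtomicU64,
}

impl SourceRegistry {
    fn new() -> Self {
        Self { sources: BTreeSet::new(), generation: AtomicU64::new(0) }
    }

    /// Write path: only a brand-new source bumps the generation,
    /// so repeated writes to existing sources keep the cache warm.
    fn register(&mut self, src: &str) {
        if self.sources.insert(src.to_string()) {
            self.generation.fetch_add(1, Ordering::Release);
        }
    }
}

struct Snapshot {
    taken_at: Instant,
    generation: u64,
    sources: BTreeSet<String>,
}

/// TTL cache that is also invalidated when the registry generation changes.
struct GenerationCache {
    ttl: Duration,
    snapshot: Option<Snapshot>,
}

impl GenerationCache {
    fn new(ttl: Duration) -> Self {
        Self { ttl, snapshot: None }
    }

    fn sources(&mut self, reg: &SourceRegistry, now: Instant) -> BTreeSet<String> {
        let current = reg.generation.load(Ordering::Acquire);
        if let Some(s) = &self.snapshot {
            let within_ttl = now.duration_since(s.taken_at) < self.ttl;
            // Serve the snapshot only if no new source appeared since it was taken.
            if within_ttl && s.generation == current {
                return s.sources.clone();
            }
        }
        let sources = reg.sources.clone();
        self.snapshot = Some(Snapshot { taken_at: now, generation: current, sources: sources.clone() });
        sources
    }
}
```

With this shape, a query issued 300 ms after two new sources are registered sees them immediately, because the generation no longer matches the snapshot. The cost is one atomic load per query. Whether the real write path has a single place where new source directories are created, and where such a counter could be bumped, is an assumption that would need checking against the herolib_core logger.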