Grounded answers are empty on a fresh install (cache replay never populates search) #148

Open
opened 2026-06-01 03:22:30 +00:00 by mik-tf · 0 comments
Owner

On a freshly provisioned machine the four default libraries clone and browse as books, but semantic search and the AI summary return nothing because no vectors are ever written for them.

The startup cache replay does run, but it loads a library's committed question-and-answer cache only when every page in that library matches. A page counts as a match only when the content hash stored in the cache equals the hash the memory service computes for the freshly scanned page. On a clean clone those hashes do not line up (the source content has moved on since the caches were generated), so the replay silently skips every library and writes nothing, leaving search empty. The all-or-nothing rule per library makes this worse, because a single changed page disables replay for the whole library.

Options to fix: (1) regenerate and commit fresh caches that match the current content; (2) make the page hashing stable between publish time and scan time so the hashes agree; or (3) replay the pages that do match instead of requiring all of them. Until one of these lands, a fresh tester has browsable books but no grounded answers without a manual, paid re-ingest.
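To make the difference between the current all-or-nothing rule and the third option concrete, here is a minimal sketch in Python. It is not taken from the hero_books code: `CachedPage`, `page_hash`, `match_pages`, `replay_all_or_nothing`, and `replay_matching` are hypothetical names, and SHA-256 over UTF-8 text is only an assumption about how a content hash might be computed.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class CachedPage:
    """One entry of a library's committed cache (hypothetical shape)."""
    path: str
    content_hash: str
    payload: str  # stands in for the cached question-and-answer data


def page_hash(content: str) -> str:
    # Assumption: SHA-256 over UTF-8 text; the real service may hash differently.
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def match_pages(cache, scanned):
    """Return cache entries whose stored hash equals the scanned page's hash."""
    by_path = {entry.path: entry for entry in cache}
    matched = []
    for path, content in scanned.items():
        entry = by_path.get(path)
        if entry is not None and entry.content_hash == page_hash(content):
            matched.append(entry)
    return matched


def replay_all_or_nothing(cache, scanned):
    """Behavior as described in this issue: replay only if every page matches."""
    matched = match_pages(cache, scanned)
    return matched if len(matched) == len(scanned) else []


def replay_matching(cache, scanned):
    """Option 3: replay whichever pages still match."""
    return match_pages(cache, scanned)


# A two-page library where one page changed after the cache was generated.
cache = [
    CachedPage("intro.md", page_hash("hello"), "qa-intro"),
    CachedPage("setup.md", page_hash("old text"), "qa-setup"),
]
scanned = {"intro.md": "hello", "setup.md": "new text"}

print(len(replay_all_or_nothing(cache, scanned)))          # 0
print([e.path for e in replay_matching(cache, scanned)])   # ['intro.md']
```

In this sketch the single changed page leaves the all-or-nothing replay with nothing to write, while the per-page variant still recovers the unchanged page. The changed page would still need a re-ingest or a regenerated cache.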

Signed-by: mik-tf <mik-tf@noreply.invalid>

Reference
lhumina_code/hero_books#148