[P1] 150s reconciler stale-kill regression has no test gate #39

Open
opened 2026-05-23 21:52:24 +00:00 by thabeta · 0 comments
Owner

Problem
The reconciler once killed live-but-slow jobs at the 150s stale threshold. That was fixed by heartbeating the running job row, but no regression test pins the threshold or the heartbeat cadence, so a future "optimization" could silently reintroduce the bug.

Evidence

  • crates/hero_shrimp_server/src/rpc/methods/job/proof_run.rs (heartbeat); ARCHITECTURE_CLEANUP_PLAN.md.

Proposed fix
Add a test that simulates a slow job emitting heartbeats and asserts the reconciler does not mark it stale as long as heartbeats keep arriving inside the 150s threshold.
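
A minimal sketch of what such a test could look like, using a simulated clock so it runs instantly. Everything here is hypothetical and stands in for the real code: `JobRow`, `is_stale`, `first_stale_at`, and the 30s `HEARTBEAT_EVERY` cadence are illustrative names and values, not taken from `proof_run.rs` or the reconciler. Only the 150s threshold comes from this issue.

```rust
use std::time::Duration;

/// Stale threshold cited in this issue.
const STALE_AFTER: Duration = Duration::from_secs(150);
/// Assumed heartbeat cadence (hypothetical value, not from the codebase).
const HEARTBEAT_EVERY: Duration = Duration::from_secs(30);

/// Hypothetical stand-in for the running job row.
struct JobRow {
    /// Time of the last heartbeat, as an offset on the simulated clock.
    last_heartbeat: Duration,
}

/// Hypothetical stand-in for the reconciler's staleness check.
fn is_stale(job: &JobRow, now: Duration) -> bool {
    now.saturating_sub(job.last_heartbeat) >= STALE_AFTER
}

/// Simulates a job running for `run_for`, one second per tick.
/// The job heartbeats every `heartbeat_every` (or never, if `None`).
/// Returns the first simulated time at which the job would be marked stale.
fn first_stale_at(run_for: Duration, heartbeat_every: Option<Duration>) -> Option<Duration> {
    let mut job = JobRow { last_heartbeat: Duration::ZERO };
    for t in 0..=run_for.as_secs() {
        let now = Duration::from_secs(t);
        if let Some(every) = heartbeat_every {
            // Clamp to 1s so a zero cadence cannot divide by zero.
            let period = every.as_secs().max(1);
            if t > 0 && t % period == 0 {
                job.last_heartbeat = now;
            }
        }
        if is_stale(&job, now) {
            return Some(now);
        }
    }
    None
}

#[test]
fn slow_job_with_heartbeats_is_not_marked_stale() {
    let run_for = Duration::from_secs(600); // far longer than the threshold

    // A heartbeating job survives the whole run.
    assert_eq!(first_stale_at(run_for, Some(HEARTBEAT_EVERY)), None);
    // Without heartbeats the same job is reaped exactly at the threshold,
    // which shows the test would catch a removed heartbeat.
    assert_eq!(first_stale_at(run_for, None), Some(STALE_AFTER));
    // Fence the cadence: at least one heartbeat may be missed safely.
    assert!(HEARTBEAT_EVERY * 2 < STALE_AFTER);
}
```

The last assertion is the actual gate on the cadence: raising the heartbeat interval or lowering the threshold past a 2x margin fails the test. A real version would call the reconciler and the job row update directly, ideally with an injectable clock instead of this loop.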


Filed from a comparative audit of Hero Shrimp vs Qwen-Code / kimi-cli / picoclaw (2026-05-23). Severity in title: P0=correctness/trust, P1=reliability/UX, P2=cleanup.

Reference
lhumina_code/hero_shrimp#39