[SIGNIFICANT] prepare_restart leaves gap with no monitoring or log capture #26

Open
opened 2026-05-11 10:52:02 +00:00 by thabeta · 1 comment
Owner

Problem

During prepare_restart, services are left as orphans while the old server exits and the new one starts. The reconnection process has several issues:

  1. No log capture during the gap -- the new server initializes a fresh log buffer, so the contents of the old ring buffer are lost
  2. No health checks run during the gap -- a service could die undetected
  3. 500 ms polling interval for restored process monitoring -- a fast-crashing service could restart multiple times before detection
  4. No event stream during the gap -- clients polling status see stale data

Impact

If a critical service crashes during the restart window, it goes undetected until the new supervisor polls. Log history from the old instance is lost.
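
One way to narrow that window is to reconcile restored services against exit records as soon as the new instance starts, rather than waiting for the first poll. The sketch below assumes PID 1 can hand the new instance a list of child exits it reaped during the gap. Nothing in this issue confirms such a handoff exists, and `ExitReport`, `Restored`, and `reconcile` are hypothetical names, not types from `my_init`.

```rust
use std::collections::HashMap;

/// Exit observed by PID 1 while no supervisor was running.
/// Hypothetical record; the real handoff format is not defined in this issue.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ExitReport {
    pub pid: u32,
    pub exit_code: i32,
}

/// State of a restored service after reconciliation (hypothetical).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Restored {
    Running { pid: u32 },
    DiedDuringGap { pid: u32, exit_code: i32 },
}

/// Match the PIDs inherited from the old instance against the exits
/// reported by PID 1, so deaths in the gap are known before the first poll.
pub fn reconcile(
    restored: &HashMap<String, u32>,
    reports: &[ExitReport],
) -> HashMap<String, Restored> {
    // Index reports by PID. If a PID shows up twice (PID reuse), keep the
    // first report, which is the one belonging to the original service.
    let mut by_pid: HashMap<u32, i32> = HashMap::new();
    for report in reports {
        by_pid.entry(report.pid).or_insert(report.exit_code);
    }

    restored
        .iter()
        .map(|(name, &pid)| {
            let state = match by_pid.get(&pid) {
                Some(&exit_code) => Restored::DiedDuringGap { pid, exit_code },
                None => Restored::Running { pid },
            };
            (name.clone(), state)
        })
        .collect()
}
```

Reports for PIDs that are not in the restored set are ignored. Services that come back as `Running` would still need the regular monitoring loop, since they can die after the handoff.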

Files

  • crates/my_init_server/src/main.rs -- PrepareRestart handling
  • crates/my_init_server/src/supervisor/mod.rs -- reconnect_restored_services

Suggested Fix

  • Preserve log buffers across restart (via FD preservation or state serialization)
  • Reduce polling interval for restored services
  • Have PID1 detect service death during the gap and report it to the new instance
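
The state-serialization option in the first bullet could look roughly like the sketch below: the old server writes the ring buffer to a snapshot file while handling `PrepareRestart`, and the new server reloads it instead of starting empty. `LogRing`, `save`, `restore`, and the one-record-per-line snapshot format are all assumptions for illustration. The actual buffer type in `my_init` may differ.

```rust
use std::collections::VecDeque;
use std::fs;
use std::io::{self, BufRead, BufReader, Write};
use std::path::Path;

/// Hypothetical bounded log buffer; stands in for the real ring buffer.
pub struct LogRing {
    capacity: usize,
    lines: VecDeque<String>,
}

impl LogRing {
    pub fn new(capacity: usize) -> Self {
        Self { capacity, lines: VecDeque::with_capacity(capacity) }
    }

    /// Append one line, evicting the oldest line when the buffer is full.
    pub fn push(&mut self, line: &str) {
        if self.capacity == 0 {
            return;
        }
        if self.lines.len() == self.capacity {
            self.lines.pop_front();
        }
        // Embedded newlines would break the one-record-per-line snapshot.
        self.lines.push_back(line.replace('\n', " "));
    }

    /// Iterate over buffered lines, oldest first.
    pub fn lines(&self) -> impl Iterator<Item = &str> {
        self.lines.iter().map(String::as_str)
    }

    /// Write the snapshot to a temporary file, then rename it into place,
    /// so a crash mid-write does not leave a truncated snapshot at `path`.
    pub fn save(&self, path: &Path) -> io::Result<()> {
        let tmp = path.with_extension("tmp");
        {
            let mut file = fs::File::create(&tmp)?;
            for line in &self.lines {
                writeln!(file, "{line}")?;
            }
            file.sync_all()?;
        }
        fs::rename(&tmp, path)
    }

    /// Rebuild a buffer from a snapshot; a missing file yields an empty buffer.
    pub fn restore(path: &Path, capacity: usize) -> io::Result<Self> {
        let mut ring = Self::new(capacity);
        match fs::File::open(path) {
            Ok(file) => {
                for line in BufReader::new(file).lines() {
                    ring.push(&line?);
                }
            }
            Err(e) if e.kind() == io::ErrorKind::NotFound => {}
            Err(e) => return Err(e),
        }
        Ok(ring)
    }
}
```

This only preserves history written before the old server exits. Output produced by services during the gap would still be lost unless the log file descriptors themselves are kept open across the restart, which is the FD-preservation option.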
Member

Classification: valid-bug — prepare_restart leaves services unmonitored during gap, log history lost, no health checks.

Per the issue description: during the restart gap, no health checks or monitoring run, so a service crash goes undetected. Ring buffer logs are lost when the new server initializes a fresh buffer.


Reference
geomind_code/my_init#26