[SIGNIFICANT] max_restarts=0 means unlimited -- no crash-loop protection #27

Open
opened 2026-05-11 10:52:02 +00:00 by thabeta · 1 comment
Owner

Problem

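max_restarts = 0 means unlimited restarts, so a service with restart: "always" that crash-loops will restart forever. Exponential backoff caps at restart_delay_max_ms (typically 60s), which bounds the delay between restarts but not their number. For example, a service that crashes after 1s of uptime with a 1s max backoff completes a restart cycle every 2s, or about 30 restarts per minute (approaching 60 per minute if it exits immediately), forever.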

There's no rate-limiting beyond the backoff cap. No "crash-loop detected" state. No emergency shutdown after N crashes in M minutes.
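To make the restart rate concrete, here is a minimal sketch that counts how many restarts fit into a time window when nothing ever stops the loop. It is not the code in service.rs: the function name next_restart_delay is borrowed from the issue, but its signature, the doubling policy, and the restarts_within helper are assumptions made for illustration.

```rust
use std::time::Duration;

/// Hypothetical stand-in for the backoff policy described above: the delay
/// doubles on every restart and is clamped to `max` (restart_delay_max_ms).
/// The real signature in service.rs may differ.
fn next_restart_delay(attempt: u32, initial: Duration, max: Duration) -> Duration {
    // checked_shl and saturating_mul keep large attempt counts from overflowing.
    let factor = 1u32.checked_shl(attempt).unwrap_or(u32::MAX);
    initial.saturating_mul(factor).min(max)
}

/// Counts the restarts that complete within `window` when the service always
/// crashes after `uptime` and there is no restart limit (max_restarts = 0).
fn restarts_within(
    window: Duration,
    uptime: Duration,
    initial: Duration,
    max: Duration,
) -> u32 {
    let mut elapsed = Duration::ZERO;
    let mut restarts = 0u32;
    loop {
        // One cycle: the service runs for `uptime`, crashes, then waits out the delay.
        elapsed += uptime + next_restart_delay(restarts, initial, max);
        if elapsed > window {
            return restarts;
        }
        restarts += 1;
    }
}
```

With 1s of uptime and a 1s cap, each cycle takes 2s, so restarts_within returns 30 for a 60s window; the rate approaches 60 per minute only as uptime approaches zero. Either way, the count grows without bound as the window grows, which is the core of the problem.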

Impact

  • CPU waste from infinite restart loops
  • Log volume from constant restart messages
  • Could mask real issues (the service is broken but keeps restarting)
  • Potential resource exhaustion

Files

  • crates/my_init_server/src/graph/service.rs -- should_restart, next_restart_delay

Suggested Fix

  • Add a crash-loop detection mechanism: if a service restarts N times within M minutes, enter a CrashLoopBackoff state with extended delays
  • Add max_crashes_per_hour config option
  • After extended crash-looping, transition to Failed and alert
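
The suggestions above could be sketched as a sliding-window detector. This is only one possible shape, not an existing API: CrashLoopDetector, RestartDecision, and record_crash are hypothetical names, and the sketch assumes the N-in-M window is no longer than one hour so that a single crash history can serve both checks.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Hypothetical outcome of recording a crash.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RestartDecision {
    /// Restart with the normal exponential backoff.
    Restart,
    /// N crashes within the window: restart, but with extended delays.
    CrashLoopBackoff,
    /// Hourly crash budget exhausted: stop restarting and alert.
    Failed,
}

/// Hypothetical detector; assumes `window` is at most one hour.
struct CrashLoopDetector {
    crashes: VecDeque<Instant>, // crash timestamps from the last hour, oldest first
    threshold: usize,           // N
    window: Duration,           // M minutes
    max_crashes_per_hour: usize,
}

impl CrashLoopDetector {
    fn new(threshold: usize, window: Duration, max_crashes_per_hour: usize) -> Self {
        Self { crashes: VecDeque::new(), threshold, window, max_crashes_per_hour }
    }

    /// Records a crash at `now` and decides how the supervisor should react.
    fn record_crash(&mut self, now: Instant) -> RestartDecision {
        const HOUR: Duration = Duration::from_secs(3600);

        // Forget crashes that are more than an hour old.
        while let Some(&oldest) = self.crashes.front() {
            if now.duration_since(oldest) > HOUR {
                self.crashes.pop_front();
            } else {
                break;
            }
        }
        self.crashes.push_back(now);

        if self.crashes.len() > self.max_crashes_per_hour {
            return RestartDecision::Failed;
        }

        // Count crashes inside the shorter window, newest first.
        let recent = self
            .crashes
            .iter()
            .rev()
            .take_while(|&&t| now.duration_since(t) <= self.window)
            .count();

        if recent >= self.threshold {
            RestartDecision::CrashLoopBackoff
        } else {
            RestartDecision::Restart
        }
    }
}
```

Passing the timestamp in explicitly, rather than calling Instant::now() inside record_crash, keeps the detector deterministic under test. A supervisor would call it once per crash and map CrashLoopBackoff to a longer delay and Failed to the existing Failed state.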
Member

Classification: valid-bug — max_restarts=0 means unlimited restarts; a crash-looping service with restart:always restarts forever with no rate-limiting beyond backoff cap.

Per the issue description, which references crates/my_init_server/src/graph/service.rs: exponential backoff caps at restart_delay_max_ms, but there is no crash-loop detection and no max_crashes_per_hour config. With a 1s backoff cap, a service that crashes after 1s of uptime restarts roughly every 2s (about 30 times per minute) indefinitely.

Reference
geomind_code/my_init#27