[CRITICAL] waitpid on non-child processes in stop_and_wait #17

Open
opened 2026-05-11 10:52:00 +00:00 by thabeta · 1 comment
Owner

Problem

stop_and_wait in api.rs calls waitpid(Pid::from_raw(pid as i32), WNOHANG) on processes that were not spawned by the current server instance. This happens for services restored after prepare_restart.

waitpid only works on direct children of the calling process. For any other PID it fails immediately with ECHILD, regardless of whether that process is alive. The code treats ECHILD as "process already reaped" (sets reaped = true), so stop_and_wait skips SIGKILL and returns success while the process is still running.
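To make the misinterpretation concrete, here is a minimal sketch of how a non-blocking wait result could be classified so that ECHILD is never read as "exited". It does not call waitpid itself and is not the code in api.rs; `WaitReport`, `NextStep`, and `next_step` are hypothetical names introduced only for illustration.

```rust
/// Hypothetical summary of what a `waitpid(pid, WNOHANG)` call reported.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WaitReport {
    /// The child exists and has not changed state yet.
    StillRunning,
    /// The child exited and its status was collected.
    Exited,
    /// The call failed with ECHILD: `pid` is not a waitable child of the caller.
    NoChild,
}

/// What the stop loop should do next (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NextStep {
    /// Keep waiting and escalate to SIGKILL on timeout.
    KeepWaiting,
    /// The process is confirmed gone.
    Done,
    /// waitpid cannot answer; liveness must be checked some other way.
    ProbeLiveness,
}

/// Maps a wait report to the next action without conflating ECHILD with exit.
pub fn next_step(report: WaitReport) -> NextStep {
    match report {
        WaitReport::StillRunning => NextStep::KeepWaiting,
        WaitReport::Exited => NextStep::Done,
        // ECHILD says nothing about whether the process is alive,
        // so it must not be mapped to `Done`.
        WaitReport::NoChild => NextStep::ProbeLiveness,
    }
}
```

The behaviour described in this issue corresponds to mapping `NoChild` to `Done`, which is what lets a running restored process be reported as stopped.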

Impact

During cascade stops for remove_service or similar operations, restored services that are actually running are incorrectly reported as stopped. The calling code proceeds to delete the service while the process is still alive, leaving an orphan.

Files

  • crates/my_init_server/src/supervisor/api.rs -- stop_and_wait method

Suggested Fix

Use process_exists(pid) polling instead of waitpid for non-child processes. Alternatively, track which processes were spawned by the current instance vs restored, and use different wait strategies.
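A rough sketch of how the two suggestions could fit together, assuming Linux with /proc mounted. The `process_exists` below is a hypothetical stand-in, not necessarily the helper that already exists in the codebase; `Origin`, `WaitStrategy`, `strategy_for`, and `wait_for_exit` are likewise names made up for this example. Sending SIGTERM/SIGKILL is left to the caller.

```rust
use std::fs;
use std::io;
use std::thread;
use std::time::{Duration, Instant};

/// How the supervisor came to know about a process (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Origin {
    /// Spawned by this server instance, so waitpid is valid.
    Spawned,
    /// Restored after a restart; not our child, so waitpid fails with ECHILD.
    Restored,
}

/// Wait strategy selected from the origin (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WaitStrategy {
    Waitpid,
    PollExistence,
}

pub fn strategy_for(origin: Origin) -> WaitStrategy {
    match origin {
        Origin::Spawned => WaitStrategy::Waitpid,
        Origin::Restored => WaitStrategy::PollExistence,
    }
}

/// Linux-only liveness probe based on /proc/<pid>/stat.
/// A missing entry or a zombie ('Z') counts as gone; any other
/// read or parse problem is conservatively reported as "still exists".
pub fn process_exists(pid: u32) -> bool {
    let bytes = match fs::read(format!("/proc/{}/stat", pid)) {
        Ok(b) => b,
        Err(e) => return e.kind() != io::ErrorKind::NotFound,
    };
    // Layout is "pid (comm) state ..."; comm may contain ')' or spaces,
    // so the state field is located after the last ')'.
    match bytes.iter().rposition(|&b| b == b')') {
        Some(i) => {
            let state = bytes[i + 1..]
                .iter()
                .copied()
                .find(|b| !b.is_ascii_whitespace());
            state != Some(b'Z')
        }
        None => true,
    }
}

/// Polls until the process is gone or the timeout elapses.
/// Returns true if the process exited in time, false on timeout,
/// in which case the caller should escalate to SIGKILL.
pub fn wait_for_exit(pid: u32, timeout: Duration, interval: Duration) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        if !process_exists(pid) {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        thread::sleep(interval);
    }
}
```

One caveat with any PID-based polling is PID reuse: if the PID is recycled during the wait, the probe sees an unrelated process. Recording the process start time alongside the PID would be one way to detect that, though it is not shown here.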

Member

Classification: valid-bug — waitpid on restored (non-child) processes treats ECHILD as "already reaped", leaving orphaned processes.

Confirmed by code inspection at crates/my_init_server/src/supervisor/api.rs:327-346. stop_and_wait calls waitpid(WNOHANG) on every process, including restored (non-child) ones. For a non-child, waitpid fails with ECHILD, which lines 336-338 treat as "already reaped", so SIGKILL is skipped. The process keeps running while the supervisor believes it is stopped. Only services restored after prepare_restart are affected.

Reference
geomind_code/my_init#17