zinit shutdown feature #38
source scripts/build_lib.sh 2>/dev/null && cargo_env && cargo build --release --workspace
Finished `release` profile [optimized] target(s) in 0.11s
Installed zinit -> /Users/despiegk/hero/bin/zinit
Installed zinit_server -> /Users/despiegk/hero/bin/zinit_server
Installed zinit_ui -> /Users/despiegk/hero/bin/zinit_ui
Installed zinit_pid1 -> /Users/despiegk/hero/bin/zinit_pid1
Requesting graceful shutdown via zinit CLI...
Requesting zinit daemon shutdown...
Shutdown accepted, waiting for all services to stop...
zinit daemon stopped (0.0s)
Waiting for zinit_server to exit (max 30s)...
Check the following behavior
Do a detailed study of what is happening
Make an integration test for this
Implementation Spec for Issue #38 — Zinit Graceful Shutdown
Current State Analysis
The `system.shutdown` RPC handler is a no-op stub: it logs and returns `{"ok": true}` but performs no actual shutdown. The `Supervisor::shutdown()` method exists but is never called from the RPC handler. The existing `cancel_job()` sends a single SIGTERM but does NOT wait, does NOT send SIGKILL on timeout, does NOT handle child processes, and ignores per-action `stop_signal`/`stop_timeout_ms`.
Key building blocks exist but are unused during shutdown:
- `kill_process_tree()` in process.rs: deepest-first SIGTERM then SIGKILL
- `get_child_processes()`: recursive descendant discovery via sysinfo
- `ActionSpec.stop_signal`/`stop_timeout_ms`: stored but ignored
Objective
Implement a robust graceful shutdown sequence that stops all services/jobs using per-action metadata, kills deepest children first, enforces a 30-second global timeout, verifies all processes are gone, and cleanly exits.
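The "deepest children first" ordering in the objective can be pictured with a small, std-only sketch. It is not the project's `kill_process_tree()` or `get_child_processes()`; the function name `descendants_deepest_first` and the pid-to-parent map are assumptions made for illustration, standing in for whatever snapshot sysinfo provides.

```rust
use std::collections::{HashMap, HashSet};

/// Returns every descendant of `root`, ordered deepest-first, so that leaf
/// processes can be signalled before their parents.
/// `parent_of` maps pid -> parent pid (a hypothetical process-table snapshot).
fn descendants_deepest_first(root: u32, parent_of: &HashMap<u32, u32>) -> Vec<u32> {
    // Build the reverse map: parent pid -> child pids.
    let mut children: HashMap<u32, Vec<u32>> = HashMap::new();
    for (&pid, &ppid) in parent_of {
        children.entry(ppid).or_default().push(pid);
    }

    // Walk the tree from `root`, recording each descendant with its depth.
    // `seen` guards against malformed snapshots that contain a cycle.
    let mut seen: HashSet<u32> = HashSet::from([root]);
    let mut found: Vec<(usize, u32)> = Vec::new();
    let mut stack = vec![(root, 0usize)];
    while let Some((pid, depth)) = stack.pop() {
        if let Some(kids) = children.get(&pid) {
            for &kid in kids {
                if seen.insert(kid) {
                    found.push((depth + 1, kid));
                    stack.push((kid, depth + 1));
                }
            }
        }
    }

    // Deepest first; ties are broken by pid so the order is deterministic.
    found.sort_by(|a, b| b.0.cmp(&a.0).then(a.1.cmp(&b.1)));
    found.into_iter().map(|(_, pid)| pid).collect()
}
```

Signalling pids in the returned order means a grandchild receives its stop signal before the child that spawned it, which avoids leaving re-parented orphans behind.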
Requirements
- Send each service's configured `stop_signal`
- Wait up to `stop_timeout_ms` for each service to exit
- Cap `stop_timeout_ms` at 30 seconds
Files to Modify/Create
- `crates/zinit_server/src/supervisor/shutdown.rs`
- `crates/zinit_server/src/supervisor/mod.rs`
- `crates/zinit_server/src/supervisor/executor.rs`: `graceful_stop_job()` respecting stop_signal/timeout
- `crates/zinit_server/src/rpc/system.rs`
- `crates/zinit_server/src/rpc/mod.rs`
- `crates/zinit_server/src/web.rs`
- `crates/zinit_server/src/main.rs`
- `tests/integration/tests/shutdown.rs`
Implementation Plan
Step 1: Shutdown channel infrastructure (main.rs, web.rs, rpc/)
- `tokio::sync::watch::channel` for shutdown signal
Step 2: `graceful_stop_job()` in executor.rs (parallel with Step 1)
- `get_child_processes()`
Step 3: GracefulShutdown orchestrator (shutdown.rs)
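One way to picture a dependency-aware, wave-based stop order for the orchestrator is sketched below. This is an illustration only: `shutdown_waves`, the `depends_on` map, and the service names used with it are hypothetical and not taken from the zinit code base. The idea is that a service is stopped only once everything depending on it has already been stopped.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Groups services into shutdown waves. `depends_on` maps a service to the
/// services it needs. Each wave contains services that no remaining service
/// depends on. Returns `None` if the dependency graph contains a cycle.
fn shutdown_waves<'a>(depends_on: &BTreeMap<&'a str, Vec<&'a str>>) -> Option<Vec<Vec<String>>> {
    // Collect every service mentioned, either as a key or as a dependency.
    let mut remaining: BTreeSet<&'a str> = depends_on.keys().copied().collect();
    for deps in depends_on.values() {
        remaining.extend(deps.iter().copied());
    }

    let mut waves = Vec::new();
    while !remaining.is_empty() {
        // A service is still needed if some remaining service depends on it.
        let needed: BTreeSet<&str> = remaining
            .iter()
            .filter_map(|s| depends_on.get(s))
            .flatten()
            .copied()
            .collect();
        let wave: Vec<&str> = remaining
            .iter()
            .copied()
            .filter(|s| !needed.contains(s))
            .collect();
        if wave.is_empty() {
            return None; // dependency cycle: no safe stop order exists
        }
        for s in &wave {
            remaining.remove(s);
        }
        waves.push(wave.into_iter().map(String::from).collect());
    }
    Some(waves)
}
```

Services inside one wave do not depend on each other, so they could be stopped in parallel, and the next wave starts only after the current one has finished.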
Step 4: Wire shutdown into main.rs
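The wiring between the RPC handler and the main loop can be illustrated without tokio. The sketch below deliberately swaps the planned `tokio::sync::watch` channel for a std-only `Mutex` plus `Condvar` pair so that it compiles with no dependencies. `ShutdownSignal`, `trigger`, and `wait_timeout` are hypothetical names, not the project's API.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::time::Duration;

/// Hypothetical std-only stand-in for a watch channel carrying a shutdown
/// flag: one side sets the flag, the other side blocks until it is set.
#[derive(Clone, Default)]
struct ShutdownSignal {
    inner: Arc<(Mutex<bool>, Condvar)>,
}

impl ShutdownSignal {
    /// Called from the RPC handler side to request shutdown.
    fn trigger(&self) {
        let (flag, cvar) = &*self.inner;
        *flag.lock().unwrap() = true;
        cvar.notify_all();
    }

    /// Called from the main loop. Returns true once shutdown was requested,
    /// or false if `timeout` elapsed first.
    fn wait_timeout(&self, timeout: Duration) -> bool {
        let (flag, cvar) = &*self.inner;
        let guard = flag.lock().unwrap();
        let (guard, _) = cvar
            .wait_timeout_while(guard, timeout, |requested| !*requested)
            .unwrap();
        *guard
    }
}
```

In this picture the RPC handler holds one clone and calls `trigger()`, while the main loop holds another and starts the shutdown sequence when `wait_timeout` returns true. A watch channel gives the same "latest value wins" behavior in async code.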
Step 5: Update integration tests
Step 6: CLI improvements
Acceptance Criteria
- `system.shutdown` RPC triggers actual graceful shutdown
Test Results
zinit_server: 37/37 passed ✅
zinit_lib: all passed ✅
Integration tests: 2 passed, 3 failed (pre-existing)
The 3 failing tests (`test_server_sighup_reload`, `test_server_sighup_add_remove`, `test_server_sigterm_child_propagation`) in `binary_signals.rs` are pre-existing failures caused by a `ServiceListOutput` deserialization mismatch and are not related to the shutdown feature changes.
Changes Made
- `tokio::sync::watch` channel from RPC handler to main loop
- `graceful_stop_job()`: respects per-action `stop_signal` and `stop_timeout_ms` (capped at 30s), kills deepest children first, SIGKILL fallback
- `GracefulShutdown` orchestrator: dependency-aware wave-based shutdown, final sweep for port/process filter cleanup, orphan killing
Implementation committed: `efcaffa` on branch `development_kristof`
Ready for review and merge.
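For reviewers, the stop sequence summarized above (per-action signal, timeout capped at 30s, SIGKILL fallback) can be sketched roughly as follows. This is not the committed `graceful_stop_job()`. The `Stoppable` trait, `StopOutcome`, `effective_timeout`, and `graceful_stop` are hypothetical names, and the trait exists only so the escalation logic can be exercised without spawning real processes.

```rust
use std::time::{Duration, Instant};

/// Cap on how long a single job may take to stop (30 s, per the spec).
const MAX_STOP_TIMEOUT: Duration = Duration::from_secs(30);

/// Hypothetical abstraction over a running job.
trait Stoppable {
    fn send_signal(&mut self, signal: &str);
    fn has_exited(&mut self) -> bool;
    fn force_kill(&mut self); // SIGKILL equivalent
}

#[derive(Debug, PartialEq)]
enum StopOutcome {
    ExitedGracefully,
    Killed,
}

/// Clamps a per-action `stop_timeout_ms` to the 30 s cap.
fn effective_timeout(stop_timeout_ms: u64) -> Duration {
    Duration::from_millis(stop_timeout_ms).min(MAX_STOP_TIMEOUT)
}

/// Sends the configured stop signal, polls until the job exits or the
/// capped timeout elapses, then falls back to a forced kill.
fn graceful_stop(job: &mut dyn Stoppable, stop_signal: &str, stop_timeout_ms: u64) -> StopOutcome {
    job.send_signal(stop_signal);
    let deadline = Instant::now() + effective_timeout(stop_timeout_ms);
    loop {
        if job.has_exited() {
            return StopOutcome::ExitedGracefully;
        }
        if Instant::now() >= deadline {
            break;
        }
        std::thread::sleep(Duration::from_millis(10));
    }
    job.force_kill();
    StopOutcome::Killed
}
```

A real implementation would additionally apply this to child processes deepest-first and would likely use async timers instead of a sleep loop. The sketch only shows the signal, wait, and kill escalation.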