lab service --start kills tmux session when login shell is ~/hero/bin/nu #310

Open
opened 2026-06-03 17:04:44 +00:00 by thabeta · 0 comments
Owner

Problem

Running lab service --start (or lab service hero_proc --start) causes the caller's tmux session to exit. This happens when the user's login shell is /home/driver/hero/bin/nu.

Root Cause

start_hero_proc() in crates/lab/src/service/hero_proc_exception.rs calls kill_all_hero_processes_except_self() as a pre-start cleanup sweep. This function walks /proc, reads each /proc/<pid>/exe symlink, and SIGKILLs every process whose executable lives under ~/hero/bin/, except processes running the calling lab binary itself.

The exclusion logic skips only processes whose exe matches the caller's own executable (lab). It does not exclude:

  • The user's login shell (when it is a hero binary like nu)
  • Processes that are ancestors of the calling lab process
  • Session leaders of active terminal sessions

When the login shell is ~/hero/bin/nu, the shell process inside the tmux session matches the kill pattern. sigkill_with_group() then kills that shell together with its entire process group; once tmux has lost its shell, the session exits.
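
The matching rule described above can be illustrated with a small standalone sketch. It is not the code from hero_proc_exception.rs: the helper name is_kill_candidate is hypothetical, and the location of the lab binary (/home/driver/hero/bin/lab) is an assumption made only for the example.

```rust
use std::path::Path;

/// Hypothetical helper mirroring the two checks in the sweep:
/// a process is a kill candidate if its exe lives under the hero bin
/// directory and is not the caller's own executable.
fn is_kill_candidate(exe: &Path, hero_bin: &Path, my_exe: Option<&Path>) -> bool {
    // Path::starts_with compares whole path components, not raw bytes.
    if !exe.starts_with(hero_bin) {
        return false;
    }
    // Only the caller's own binary is spared; nothing else is excluded.
    if my_exe.is_some_and(|me| me == exe) {
        return false;
    }
    true
}
```

With hero_bin set to /home/driver/hero/bin, an exe of /home/driver/hero/bin/nu passes both checks, so a login shell installed there is treated like any other stale hero process.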

Call Chain

lab service hero_proc --start
  → start_hero_proc()  [hero_proc_exception.rs]
    → kill_all_hero_processes_except_self()  [line ~413]
      → walks /proc/<pid>/exe
      → finds nu at /home/driver/hero/bin/nu (tmux shell)
      → sigkill_with_group(pid)  [line ~444]
        → kills PID + process group
          → tmux loses its shell → session exits

Relevant Code

crates/lab/src/service/hero_proc_exception.rs:

pub fn kill_all_hero_processes_except_self() {
    let hero_bin = crate::repo::paths::hero_bin_dir();
    let my_pid = std::process::id();
    let my_exe = std::fs::read_link("/proc/self/exe").ok();

    for entry in std::fs::read_dir("/proc").flatten() {
        // ... skips self (abridged; `pid` is taken from the /proc entry name here)
        let Ok(exe) = std::fs::read_link(entry.path().join("exe")) else { continue; };
        if !exe.starts_with(&hero_bin) { continue; }
        if my_exe.as_ref().is_some_and(|me| me == &exe) { continue; }
        sigkill_with_group(pid);  // ← kills tmux shell
    }
}

Evidence

# tmux session tree:
tmux (457614)
  └── nu -c /home/driver/hero/bin/nu (457615) ← exe = ~/hero/bin/nu → MATCHES → KILLED
        └── nu (457618) ← also killed

# User's shell:
SHELL=/home/driver/hero/bin/nu

Suggested Fix

kill_all_hero_processes_except_self() should exclude ancestor processes of the calling lab process. Walk the PPID chain from my_pid up to init, collect all ancestor PIDs, and skip them in the kill loop.

This is the safest approach because:

  • The shell that launched lab is always an ancestor and must be preserved
  • It does not rely on fragile heuristics about session leaders or multiplexer detection
  • It handles arbitrary nesting (SSH → tmux → nu → lab)
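
A minimal sketch of the ancestor walk, assuming a Linux /proc filesystem. The function names (parse_ppid, read_ppid, ancestor_pids) are hypothetical and do not exist in the lab crate; the sketch only shows one way the PPID chain could be collected.

```rust
use std::collections::HashSet;
use std::fs;

/// Extract the parent PID from the contents of /proc/<pid>/stat.
/// The second field (comm) is wrapped in parentheses and may itself
/// contain spaces or ')', so parsing starts after the LAST ')'.
fn parse_ppid(stat: &str) -> Option<u32> {
    let rest = &stat[stat.rfind(')')? + 1..];
    // Fields after comm: state, ppid, pgrp, ...
    rest.split_whitespace().nth(1)?.parse().ok()
}

/// Read the parent PID of `pid`, or None if the process is gone or unreadable.
fn read_ppid(pid: u32) -> Option<u32> {
    let stat = fs::read_to_string(format!("/proc/{pid}/stat")).ok()?;
    parse_ppid(&stat)
}

/// Collect every ancestor of `pid`: parent, grandparent, and so on up to PID 1.
/// `pid` itself is not included.
fn ancestor_pids(pid: u32) -> HashSet<u32> {
    let mut ancestors = HashSet::new();
    let mut current = pid;
    while let Some(ppid) = read_ppid(current) {
        // A PPID of 0 means the top of the tree was reached; a failed
        // insert means the PID was already seen, which guards against loops.
        if ppid == 0 || !ancestors.insert(ppid) {
            break;
        }
        current = ppid;
    }
    ancestors
}
```

In the kill loop, the set could be computed once with ancestor_pids(my_pid) before iterating, and any pid contained in it skipped before sigkill_with_group(pid) is reached. Because sigkill_with_group() also signals the target's process group, a PID-only guard may not be sufficient if a non-ancestor hero process shares a process group with an ancestor; whether that can occur in practice is not established by this issue.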

Impact

  • Anyone whose login shell is ~/hero/bin/nu (or any other hero binary) will lose their terminal session every time they run lab service --start
  • This also affects any wrapper scripts or SSH sessions that use hero binaries as their shell
omarz self-assigned this 2026-06-04 08:21:36 +00:00
omarz removed their assignment 2026-06-04 08:22:17 +00:00
Reference
lhumina_code/hero_skills#310