service.status RPC latency under concurrent load limits effective parallelism to ~9x

mik-tf commented

2026-05-23 05:09:37 +00:00

Owner

On a local hero_proc 0.7.0 with 101 supervised services, a single service.status call over UDS takes ~225ms with curl overhead (roughly 50-100ms server-side). Running 20 sequential service.status calls totals 37 seconds (1.8s per call), and 100 concurrent service.status calls complete in 13.7s, implying the daemon caps effective parallelism at roughly 9x regardless of how many futures the client fans out. service.stats shows the same scaling. hero_cockpit#6 (lhumina_code/hero_cockpit#6) surfaced this because hero_cockpit_server.handle_list_services issues N service.status plus N service.stats per service.list_full row, and even with futures::future::join_all parallelization the round-trip exceeds the hero_router 10s upstream timeout for any reasonably sized cockpit stack. Two viable directions: (1) add a bulk service.status_all RPC that returns state plus restarts plus pid plus memory_bytes plus cpu_percent for every supervised service in one call, or (2) profile and reduce the per-call latency of service.status under concurrent load (the daemon appears to serialize on something inside the supervisor handler). Either path resolves the cockpit /services page rendering and unblocks similar fan-out patterns in other admin UIs.

On a local hero_proc 0.7.0 with 101 supervised services, a single service.status call over UDS takes ~225ms with curl overhead (roughly 50-100ms server-side). Running 20 sequential service.status calls totals 37 seconds (1.8s per call), and 100 concurrent service.status calls complete in 13.7s, implying the daemon caps effective parallelism at roughly 9x regardless of how many futures the client fans out. service.stats shows the same scaling. hero_cockpit#6 (https://forge.ourworld.tf/lhumina_code/hero_cockpit/issues/6) surfaced this because hero_cockpit_server.handle_list_services issues N service.status plus N service.stats per service.list_full row, and even with futures::future::join_all parallelization the round-trip exceeds the hero_router 10s upstream timeout for any reasonably sized cockpit stack. Two viable directions: (1) add a bulk service.status_all RPC that returns state plus restarts plus pid plus memory_bytes plus cpu_percent for every supervised service in one call, or (2) profile and reduce the per-call latency of service.status under concurrent load (the daemon appears to serialize on something inside the supervisor handler). Either path resolves the cockpit /services page rendering and unblocks similar fan-out patterns in other admin UIs.

mik-tf referenced this issue from lhumina_code/hero_cockpit

2026-05-23 05:10:03 +00:00

cockpit.list_services times out: serialized N+1 RPC fan-out to hero_proc #6

mik-tf referenced this issue from a commit

2026-05-23 05:13:00 +00:00

docs(channels/free): s147 fix pass — flip e2e_checklist rationales after hero_cockpit#6 + hero_router#110 closure

mik-tf referenced this issue from lhumina_code/home

2026-05-23 12:27:04 +00:00

[META] Hero OS demo-deployer arc tracker (cockpit + proxy + content + deployer + manifest + integration) #235

mik-tf referenced this issue from a commit

2026-05-23 17:23:25 +00:00

feat(rpc): add service.status_all bulk RPC

mik-tf closed this issue

2026-05-23 17:23:25 +00:00

mik-tf commented

2026-05-23 17:41:38 +00:00

Author

Owner

Closed by e833dc9 — adds service.status_all bulk RPC.

Took option (a) from the issue (bulk RPC) over option (b) (profile + reduce per-call latency) because option (a) has structural amortization that option (b) cannot match without rewriting the SQL layer:

Single SYSMON mutex grab via monitoring::processes_stats(&pids) covers every running PID in one sysinfo refresh (vs one mutex per service in the fan-out).
Single SQL chain per service: get_for_service + get_job_ids + N x jobs.get. The existing service.status handler walks the same chain 3 times per call (via service_running_jobs, service_last_terminal_state, count_restarts); the new bulk handler walks it once and derives state, pid, and restarts inline from the materialized job list.

Local smoke at 105 supervised services, steady state:

pattern	wall-clock
OLD: serial N x service.status + N x service.stats (daemon-side, direct UDS, single client)	347 ms
NEW: bulk service.status_all (daemon-side, direct UDS)	10 ms
NEW: bulk service.status_all (cockpit -> hero_proc, through hero_router)	14 ms

The bulk row for any given service matches service.status byte-for-byte on the overlapping fields (pid, current_run_id, restarts, state, health_reason); the extra fields (memory_bytes, cpu_percent, context) fold in service.stats.

Cockpit adoption landed alongside in lhumina_code/hero_cockpit@722ace2; that closes hero_cockpit#6 too.

Closed by https://forge.ourworld.tf/lhumina_code/hero_proc/commit/e833dc9 — adds `service.status_all` bulk RPC. Took option (a) from the issue (bulk RPC) over option (b) (profile + reduce per-call latency) because option (a) has structural amortization that option (b) cannot match without rewriting the SQL layer: - Single SYSMON mutex grab via `monitoring::processes_stats(&pids)` covers every running PID in one sysinfo refresh (vs one mutex per service in the fan-out). - Single SQL chain per service: `get_for_service` + `get_job_ids` + N x `jobs.get`. The existing `service.status` handler walks the same chain 3 times per call (via `service_running_jobs`, `service_last_terminal_state`, `count_restarts`); the new bulk handler walks it once and derives state, pid, and restarts inline from the materialized job list. Local smoke at 105 supervised services, steady state: | pattern | wall-clock | |---|---| | OLD: serial N x service.status + N x service.stats (daemon-side, direct UDS, single client) | 347 ms | | NEW: bulk service.status_all (daemon-side, direct UDS) | 10 ms | | NEW: bulk service.status_all (cockpit -> hero_proc, through hero_router) | 14 ms | The bulk row for any given service matches `service.status` byte-for-byte on the overlapping fields (pid, current_run_id, restarts, state, health_reason); the extra fields (memory_bytes, cpu_percent, context) fold in `service.stats`. Cockpit adoption landed alongside in https://forge.ourworld.tf/lhumina_code/hero_cockpit/commit/722ace2; that closes hero_cockpit#6 too.

sameh-farouk referenced this issue

2026-05-24 10:14:10 +00:00

db/factory.rs Arc<std::sync::Mutex<Connection>> + 57 hot-path lock-in-async sites is a real defect, not a trade-off (see #121 evidence; #122 candidate) #124

sameh-farouk referenced this issue

2026-05-24 10:55:47 +00:00

hero_proc_server wedges in futex_do_wait after several hours, drops all RPCs (2nd recurrence) #122

sameh-farouk referenced this issue from a commit

2026-05-25 03:12:17 +00:00

feat(db): #124 migrate HeroProcDb from Arc<Mutex<Connection>> to deadpool-sqlite pool

sameh-farouk referenced this issue

2026-05-25 10:30:06 +00:00

db/factory.rs Arc<std::sync::Mutex<Connection>> + 57 hot-path lock-in-async sites is a real defect, not a trade-off (see #121 evidence; #122 candidate) #124

sameh-farouk commented

2026-05-25 10:31:17 +00:00

Member

For audit trail: the underlying serialization point this issue identified ("the daemon appears to serialize on something inside the supervisor handler") was the Arc<std::sync::Mutex<rusqlite::Connection>> in db/factory.rs, which forced all 57 sub-API call sites to lock-then-blocking-SQLite from inside async tasks. That root cause was removed in #124 (a58bdf0 on development) — the daemon now uses deadpool_sqlite::Pool with interact(), so multiple SQLite operations can be in flight concurrently and async workers are no longer parked on a sync mutex.

The e833dc9 feat(rpc): add service.status_all bulk RPC workaround that closed this issue stays in place — it's still a valid latency optimization for any client that needs N service statuses at once, even after the underlying cap is removed.

Stage 6 verification on a release daemon (50 parallel service.status_all calls): completed in ~120–300ms total (vs the ~9× serial cap = ~13.7s for 100 calls measured in the original repro). The cap is gone empirically too, not just architecturally.

Informational only — leaving this closed.

For audit trail: the underlying serialization point this issue identified ("the daemon appears to serialize on something inside the supervisor handler") was the `Arc<std::sync::Mutex<rusqlite::Connection>>` in `db/factory.rs`, which forced all 57 sub-API call sites to lock-then-blocking-SQLite from inside async tasks. That root cause was removed in #124 (`a58bdf0` on `development`) — the daemon now uses `deadpool_sqlite::Pool` with `interact()`, so multiple SQLite operations can be in flight concurrently and async workers are no longer parked on a sync mutex. The `e833dc9 feat(rpc): add service.status_all bulk RPC` workaround that closed this issue stays in place — it's still a valid latency optimization for any client that needs N service statuses at once, even after the underlying cap is removed. Stage 6 verification on a release daemon (50 parallel `service.status_all` calls): completed in ~120–300ms total (vs the ~9× serial cap = ~13.7s for 100 calls measured in the original repro). The cap is gone empirically too, not just architecturally. Informational only — leaving this closed.

~~sameh-farouk referenced this issue 2026-05-25 13:38:06 +00:00~~

hero_proc_server wedges in futex_do_wait after several hours, drops all RPCs (2nd recurrence) #122

sameh-farouk referenced this issue

2026-05-26 00:38:10 +00:00

db/factory.rs Arc<std::sync::Mutex<Connection>> + 57 hot-path lock-in-async sites is a real defect, not a trade-off (see #121 evidence; #122 candidate) #124

Rows
Columns

service.status RPC latency under concurrent load limits effective parallelism to ~9x #121