cockpit.list_services times out: serialized N+1 RPC fan-out to hero_proc #5
Found while running a verification pass against a local cockpit install. The `cockpit.list_services` handler at `crates/hero_cockpit_server/src/main.rs:460` calls `service.list_full`, then for each returned service it awaits a serialized `service_status` followed by a serialized `service_stats`. With around 90 services registered in hero_proc on a realistic VM, that is roughly 180 sequential RPC round-trips per page load, and the request reliably exceeds the router's default 10-second upstream timeout.

The visible symptom is that opening `/services` (the primary cockpit page after login) returns the plaintext string `upstream timeout` instead of the services table, which makes every cockpit lifecycle button unreachable.

Likely fix: fan the per-service `service_status` and `service_stats` calls out concurrently with `futures::future::join_all`, or extend `service.list_full` to return state and stats inline and drop the secondary calls entirely. Reproduced locally with 91 services discovered by hero_router. Happy to open a PR once a preferred shape is confirmed.
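To make the difference concrete, here is a minimal, dependency-free sketch of the two shapes. It is not the handler's actual code: `service_status` and `service_stats` below are hypothetical stand-ins that sleep to model one round-trip each, and scoped threads are used in place of `futures::future::join_all` only so the example compiles with the standard library alone. In the real async handler the equivalent would presumably be a `join_all` over the per-service futures.

```rust
use std::thread;
use std::time::{Duration, Instant};

// Hypothetical stand-ins for the hero_proc RPCs. Each call sleeps for
// `latency` to model a single round-trip; the return values are made up.
fn service_status(name: &str, latency: Duration) -> String {
    thread::sleep(latency);
    format!("{name}:running")
}

fn service_stats(name: &str, latency: Duration) -> u64 {
    thread::sleep(latency);
    name.len() as u64
}

/// Shape described in the issue: per service, one status call and then
/// one stats call, each waiting for the previous one to finish.
fn list_sequential(names: &[String], latency: Duration) -> Vec<(String, u64)> {
    names
        .iter()
        .map(|n| (service_status(n, latency), service_stats(n, latency)))
        .collect()
}

/// Proposed shape: start every per-service call first, then collect the
/// results. Output order matches input order, as with `join_all`.
fn list_concurrent(names: &[String], latency: Duration) -> Vec<(String, u64)> {
    thread::scope(|scope| {
        let handles: Vec<_> = names
            .iter()
            .map(|n| {
                let status = scope.spawn(move || service_status(n, latency));
                let stats = scope.spawn(move || service_stats(n, latency));
                (status, stats)
            })
            .collect();
        handles
            .into_iter()
            .map(|(s, t)| (s.join().unwrap(), t.join().unwrap()))
            .collect()
    })
}

fn main() {
    // 90 services, matching the rough count in the report.
    let names: Vec<String> = (0..90).map(|i| format!("svc{i}")).collect();
    let latency = Duration::from_millis(5); // assumed, not measured

    let start = Instant::now();
    let seq = list_sequential(&names, latency);
    let seq_elapsed = start.elapsed();

    let start = Instant::now();
    let conc = list_concurrent(&names, latency);
    let conc_elapsed = start.elapsed();

    // Both shapes return the same data; only the wall-clock time differs.
    assert_eq!(seq, conc);
    println!("sequential: {seq_elapsed:?}, concurrent: {conc_elapsed:?}");
}
```

The sequential version costs about 2 × N round-trips of wall-clock time, so with 180 calls any average latency above roughly 55 ms per call (10 s / 180) crosses the 10-second timeout. The concurrent version costs about one round-trip plus scheduling overhead. Whether hero_proc tolerates 180 simultaneous calls is not established here, so a concurrency cap may be worth considering in the PR.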