[High][Performance/Architecture] Admin APIs serialize behind a single Node mutex and hold it across long waits #31
Summary
The HTTP and JSON-RPC layers store the node as Arc<Mutex<Node<_>>>, and several handlers hold that mutex while awaiting long-running work.
This is especially problematic for the long-poll message receive and proxy connect handlers listed under Evidence.
Why this matters
A coarse node-wide mutex turns independent admin operations into a serialized queue. In practice, one slow or waiting request can block unrelated state inspection or admin actions.
This is not just a micro-optimization issue. It changes the runtime behavior of the control surface under load.
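The blocking effect can be reproduced in a few lines. The following is a minimal sketch, not the project's code: Node, its peers field, and status_readable_during_slow_call are hypothetical names, and it uses std threads and std::sync::Mutex in place of the async handlers so that it runs without dependencies. A "slow handler" takes the node lock and keeps the guard alive while it waits; an unrelated status read then fails to get the lock.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for the real node type (assumption, not from the source).
struct Node {
    peers: usize,
}

/// Returns true if an unrelated status read could take the node lock
/// while a slow handler was still waiting with the guard held.
fn status_readable_during_slow_call(node: Arc<Mutex<Node>>) -> bool {
    let (started_tx, started_rx) = mpsc::channel::<()>();
    let (finish_tx, finish_rx) = mpsc::channel::<()>();

    let slow_node = Arc::clone(&node);
    let slow = thread::spawn(move || {
        // The slow handler locks the whole node...
        let guard = slow_node.lock().unwrap();
        started_tx.send(()).unwrap();
        // ...and keeps the guard alive across its long wait
        // (standing in for a long-poll timeout or a proxy connect).
        finish_rx.recv().unwrap();
        guard.peers
    });

    // Wait until the slow handler is definitely holding the lock.
    started_rx.recv().unwrap();
    // An unrelated read-only request arrives and cannot get the lock.
    let readable = node.try_lock().is_ok();

    // Let the slow handler finish so the thread can be joined.
    finish_tx.send(()).unwrap();
    slow.join().unwrap();
    readable
}
```

Calling status_readable_during_slow_call(Arc::new(Mutex::new(Node { peers: 3 }))) returns false: the status read is refused for as long as the slow handler waits, even though it never touches the data the slow handler is waiting on.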
Evidence
Shared state type:
mycelium-api/src/lib.rs:48-50
HTTP long-poll message receive holds the node lock through timeout/wait:
mycelium-api/src/message.rs:142-149
HTTP proxy connect holds the node lock across async connect:
mycelium-api/src/lib.rs:359-365
JSON-RPC long-poll message receive similarly holds the node lock:
mycelium-api/src/rpc.rs:396-403
JSON-RPC proxy connect does the same:
mycelium-api/src/rpc.rs:301-305
Expected behavior
Slow API calls should not block unrelated read-only or control operations on the whole node.
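One way to get this behavior is to hold the node lock only long enough to clone a cheap handle, then wait on that handle with the lock released. The following is a minimal sketch under assumptions, not the project's API: Inbox, Node, recv_message, and peer_count are hypothetical names, and it uses std threads, Mutex, and Condvar instead of the async runtime so that it runs without dependencies.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Condvar, Mutex};
use std::time::Duration;

/// Hypothetical message queue that can be waited on independently of the node.
#[derive(Default)]
struct Inbox {
    queue: Mutex<VecDeque<String>>,
    ready: Condvar,
}

impl Inbox {
    /// Enqueues a message and wakes one waiter.
    fn push(&self, msg: String) {
        self.queue.lock().unwrap().push_back(msg);
        self.ready.notify_one();
    }

    /// Blocks until a message arrives or `timeout` elapses.
    fn pop_timeout(&self, timeout: Duration) -> Option<String> {
        let queue = self.queue.lock().unwrap();
        let (mut queue, _timed_out) = self
            .ready
            .wait_timeout_while(queue, timeout, |q| q.is_empty())
            .unwrap();
        queue.pop_front()
    }
}

/// Hypothetical stand-in for the real node type (assumption, not from the source).
struct Node {
    peers: usize,
    inbox: Arc<Inbox>,
}

/// Long-poll receive that locks the node only to clone the inbox handle.
fn recv_message(node: &Mutex<Node>, timeout: Duration) -> Option<String> {
    let inbox = {
        let guard = node.lock().unwrap();
        Arc::clone(&guard.inbox)
    }; // The node guard is dropped here, before the wait starts.
    inbox.pop_timeout(timeout)
}

/// Unrelated read-only operation that needs the node lock briefly.
fn peer_count(node: &Mutex<Node>) -> usize {
    node.lock().unwrap().peers
}
```

With this shape, a thread blocked in recv_message does not stop another thread from calling peer_count, because the only lock held during the wait belongs to the inbox. Whether the real Node can expose such handles is an open design question for this issue.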
Actual behavior
The API surface is effectively serialized on a single mutex in key paths.
Suggested fix
Mutex<Node>.
Risk
High for operability and perceived reliability. Under contention, the admin surface can appear hung even when the node is otherwise healthy.