[High] Race Condition in Route Update Propagation #22

Open
opened 2026-02-11 19:31:45 +00:00 by thabeta · 0 comments
Owner

Issue

Concurrent updates to the routing table during peer synchronization can leave stale metrics cached in the Babel protocol implementation, so that later route decisions are made on outdated values.

Location

mycelium/src/babel/route_request.rs

Problem Description

When multiple peers send route updates simultaneously, the route_request handler does not use atomic operations or sufficient locking to ensure all metric updates are consistently applied. This can lead to:

  • Incorrect path metrics being used for route decisions
  • Packets being routed through suboptimal paths
  • Transient routing loops during network topology changes

Impact

  • Severity: HIGH (affects routing correctness)
  • Frequency: Occurs under high peer churn or in large mesh networks
  • User Impact: Unstable routing, higher latency, potential packet loss

Remediation

  1. Use a version-based or epoch-based scheme so that each route update is applied atomically
  2. Guard route table access with a read-write lock (e.g. RwLock)
  3. Add integration tests that stress-test concurrent route updates
  4. Document the thread-safety guarantees of the routing table
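A minimal sketch of how remediation items 1 and 2 could fit together, assuming a route table keyed by prefix. `RouteTable`, `RouteEntry`, `apply_update`, and the `version` field are hypothetical names for illustration and are not taken from the mycelium code base. The `version` counter is a simplification and is not Babel's seqno or feasibility check.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Hypothetical route entry; field names are illustrative only.
#[derive(Clone, Debug, PartialEq)]
pub struct RouteEntry {
    pub next_hop: String,
    pub metric: u16,
    /// Assumed monotonically increasing version of the update that produced this entry.
    pub version: u64,
}

/// Hypothetical route table guarded by a single read-write lock.
#[derive(Default)]
pub struct RouteTable {
    routes: RwLock<HashMap<String, RouteEntry>>,
}

impl RouteTable {
    /// Applies `update` only if it is newer than the stored entry.
    /// The version check and the write happen under one write lock, so two
    /// concurrent updates cannot interleave between the check and the store.
    /// Returns true if the table changed.
    pub fn apply_update(&self, prefix: &str, update: RouteEntry) -> bool {
        let mut routes = self.routes.write().unwrap();
        let is_stale = routes
            .get(prefix)
            .map_or(false, |current| current.version >= update.version);
        if is_stale {
            return false;
        }
        routes.insert(prefix.to_string(), update);
        true
    }

    /// Returns a copy of the entry; many readers can hold the read lock at once.
    pub fn lookup(&self, prefix: &str) -> Option<RouteEntry> {
        self.routes.read().unwrap().get(prefix).cloned()
    }
}
```

With this shape, an update carrying an older version than the stored entry is rejected instead of overwriting a newer metric. Whether one table-wide lock is acceptable depends on the contention seen in practice.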

Testing

  • Create a chaos test with 100+ peers sending contradictory routes
  • Verify no stale metrics are observed in routing decisions
  • Measure update propagation latency under concurrent load
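The first two test items could be approximated with a small thread-based stress test. This is only a sketch under stated assumptions: `Table`, `apply_update`, and `run_stress` are hypothetical stand-ins rather than mycelium APIs, each simulated peer is a thread, and the highest version is assumed to be the update that must win.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical stand-in for the route table: prefix -> (version, metric).
type Table = Arc<RwLock<HashMap<String, (u64, u16)>>>;

/// Stores the update only if its version is newer than the stored one.
/// Check and write share one write lock, so they cannot be interleaved.
fn apply_update(table: &Table, prefix: &str, version: u64, metric: u16) {
    let mut routes = table.write().unwrap();
    let stale = routes.get(prefix).map_or(false, |(v, _)| *v >= version);
    if !stale {
        routes.insert(prefix.to_string(), (version, metric));
    }
}

/// Spawns `peers` threads that each announce a different metric for the same
/// prefix, waits for all of them, and returns the entry that survived.
fn run_stress(peers: u64) -> (u64, u16) {
    let table: Table = Arc::new(RwLock::new(HashMap::new()));
    let handles: Vec<_> = (1..=peers)
        .map(|peer| {
            let table = Arc::clone(&table);
            thread::spawn(move || {
                // Each peer's version equals its id, so the highest id must win
                // regardless of the order in which the threads run.
                apply_update(&table, "10.0.0.0/8", peer, (peer * 10) as u16);
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let entry = *table
        .read()
        .unwrap()
        .get("10.0.0.0/8")
        .expect("route missing after stress run");
    entry
}
```

Calling `run_stress(100)` and asserting that the result is the entry from the highest-version peer gives a deterministic pass condition even though thread scheduling is not deterministic. A real chaos test would drive the actual handler and also record propagation latency.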
Reference
geomind_code/mycelium_network#22