rpc.discover can return a stale OpenRPC spec — regenerated clients then drift #32

Open
opened 2026-04-21 16:46:36 +00:00 by timur · 0 comments
Owner

Problem

rpc.discover is implemented by reading a cached OpenRPC document that the server holds in memory. In practice, this document can fall out of sync with the live binary in three ways: if the server registers a service after the cache is built, if the cache is populated from a static include_str!'d file that predates the last regeneration, or if the service registry mutates between cache build and request time.

Observed: a running hero_logic binary (built Apr 20) serves an rpc.discover response that omits LogicService.play_start and its siblings, even though those methods are present in both the compiled code and the .oschema. Every downstream tool that regenerates from rpc.discover then inherits the omission.
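A drift like this can be caught mechanically by comparing the method names the schema declares against the names the discover response lists. The sketch below assumes both lists have already been extracted (JSON parsing is left out); `missing_methods` is a hypothetical helper, not part of hero_rpc.

```rust
use std::collections::HashSet;

/// Returns every expected method name that the discover response does not list,
/// preserving the order of `expected`.
/// Hypothetical helper: both slices are assumed to be extracted elsewhere
/// (e.g. from the .oschema and from the `methods[].name` fields of the response).
fn missing_methods<'a>(expected: &[&'a str], discovered: &[&str]) -> Vec<&'a str> {
    let served: HashSet<&str> = discovered.iter().copied().collect();
    expected
        .iter()
        .copied()
        .filter(|name| !served.contains(*name))
        .collect()
}
```

A non-empty result for a freshly built binary would indicate that the served document is stale.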

Expected

rpc.discover should return the exact same bytes as the OpenRPC spec compiled into the running binary. No caching layer between "what the code says" and "what the discover endpoint returns." If caching is needed for performance, it should invalidate on server start, not persist across a restart.
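The "exact same bytes" requirement lends itself to a simple regression check. The following is a minimal sketch, assuming the test harness can obtain both the served response body and the compiled-in spec as byte slices; `check_discover_matches` is a hypothetical name.

```rust
/// Compares the served discover document with the compiled-in spec byte for byte.
/// Returns Ok(()) when they are identical, otherwise Err with the offset of the
/// first differing byte (or the length of the shorter input when one is a
/// prefix of the other).
fn check_discover_matches(served: &[u8], compiled: &[u8]) -> Result<(), usize> {
    if served == compiled {
        return Ok(());
    }
    let first_diff = served
        .iter()
        .zip(compiled.iter())
        .position(|(a, b)| a != b)
        .unwrap_or_else(|| served.len().min(compiled.len()));
    Err(first_diff)
}
```

Reporting the offset rather than a plain boolean makes it easier to see where the served document diverges.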

Proposed fix

Option A: Serve rpc.discover directly from the include_str!'d spec constant. No runtime rebuild, no cache. O(1), always correct.

Option B: Keep the current cache, but rebuild it on every server start (clear on startup, first call triggers rebuild from schema). Slightly more work on first call, safe across restarts.

Either is fine. Option A is simpler.
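The two options can be sketched roughly as follows. This is an illustration under stated assumptions, not the hero_rpc implementation: `OPENRPC_SPEC` stands in for the include_str!'d constant (an inline placeholder literal is used so the snippet compiles on its own), and `discover_static`, `discover_cached`, and `build_spec_from_schema` are hypothetical names.

```rust
use std::sync::OnceLock;

// Placeholder for the compiled-in spec. In a real server this would likely be
// something like `include_str!("path/to/openrpc.json")`; the content below is
// illustrative only.
const OPENRPC_SPEC: &str = r#"{"openrpc":"1.2.6","info":{"title":"placeholder","version":"0.0.0"},"methods":[{"name":"LogicService.play_start","params":[]}]}"#;

/// Option A: return the compiled-in constant directly. No cache, no rebuild.
fn discover_static() -> &'static str {
    OPENRPC_SPEC
}

/// Hypothetical stand-in for regenerating the document from the schema.
fn build_spec_from_schema() -> String {
    OPENRPC_SPEC.to_owned()
}

// Option B: a process-lifetime cache. It is empty whenever the process starts,
// so nothing can persist across a restart; the first call fills it.
static DISCOVER_CACHE: OnceLock<String> = OnceLock::new();

fn discover_cached() -> &'static str {
    DISCOVER_CACHE.get_or_init(build_spec_from_schema).as_str()
}
```

With Option B the cache lives only in process memory, so a restart always starts from an empty state. Note that neither variant helps if the included file itself was not regenerated before the build.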

Related

  • #29 (the issue that led to discovery of this drift; tactical fix)
  • #(big-move) (the proper fix: move Python codegen to build time so rpc.discover is no longer the canonical source of truth)

This issue is specifically about the discover endpoint's correctness, regardless of what consumes it.

Reference
lhumina_code/hero_rpc#32