baobab/docs/RPC_IMPLEMENTATION.md
Maxime Van Hees 0ebda7c1aa Updates
2025-08-14 14:14:34 +02:00

# RPC Implementation (jsonrpsee) for Supervisor
## Objective
- Provide an HTTP/WS JSON-RPC server with jsonrpsee that exposes all Supervisor job operations.
- Use the current Supervisor and job model directly; methods should map 1:1 to Supervisor APIs.
- Keep the implementation simple: a single transport (jsonrpsee::server::Server on SocketAddr).
## Canonical model
- Jobs are stored and updated in Redis under hero:job:{job_id}.
- Work is dispatched to type queues hero:q:work:type:{script_type}.
- Actors consume by script type and update the job hash status/output.
- Server-side types and queues are already aligned in code (see keys in [rust.keys](core/job/src/lib.rs:392)).
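
The key layout above can be sketched with two small helpers. This is only an illustration: `job_key` and `work_queue_key` are hypothetical names, and rendering the script type in lowercase inside the queue key is an assumption; the authoritative definitions are the ones referenced in core/job/src/lib.rs.

```rust
use std::fmt;

/// Script types named in this document (OSIS | SAL | V | Python).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ScriptType {
    OSIS,
    SAL,
    V,
    Python,
}

impl fmt::Display for ScriptType {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Assumption: the queue suffix is the lowercase variant name.
        let s = match self {
            ScriptType::OSIS => "osis",
            ScriptType::SAL => "sal",
            ScriptType::V => "v",
            ScriptType::Python => "python",
        };
        f.write_str(s)
    }
}

/// Redis key of the hash that stores a job: hero:job:{job_id}.
pub fn job_key(job_id: &str) -> String {
    format!("hero:job:{}", job_id)
}

/// Redis key of the work queue for one script type: hero:q:work:type:{script_type}.
pub fn work_queue_key(script_type: ScriptType) -> String {
    format!("hero:q:work:type:{}", script_type)
}
```

Keeping key construction in one place means the RPC layer, Supervisor, and actors cannot drift apart on naming.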
## What exists today (summary)
- Server state and registry
  - [rust.OpenRpcServer](interfaces/openrpc/server/src/lib.rs:37) holds a Supervisor inside RwLock.
  - Methods are registered manually with jsonrpsee::RpcModule in [rust.OpenRpcServer::start()](interfaces/openrpc/server/src/lib.rs:117).
- Methods wired vs. stubbed
  - Wired: create_job, start_job, get_job_status, get_job_output, stop_job, delete_job, clear_all_jobs.
  - Stubbed or partial: run_job (returns a formatted string), play (returns canned output), get_job_logs (mocked), list_jobs (returns fabricated Job objects from IDs).
- Transports
  - start() supports a Unix transport through reth-ipc and a WebSocket SocketAddr. We only need HTTP/WS via jsonrpsee::server::Server::builder().build(addr).
## Target surface (final)
- Methods
  - fetch_nonce(pubkey: String) -> String [optional now]
  - authenticate(pubkey: String, signature: String, nonce: String) -> bool [optional now]
  - whoami() -> String [optional now]
  - play(script: String) -> PlayResult { output: String } [maps to run_job with a chosen default ScriptType]
  - create_job(job: JobParams) -> String (job_id)
  - start_job(job_id: String) -> { success: bool }
  - run_job(script: String, script_type: ScriptType, prerequisites?: Vec<String>) -> String (output)
  - get_job_status(job_id: String) -> JobStatus
  - get_job_output(job_id: String) -> String
  - get_job_logs(job_id: String) -> JobLogsResult { logs: String | null }
  - list_jobs() -> Vec<String>
  - stop_job(job_id: String) -> null
  - delete_job(job_id: String) -> null
  - clear_all_jobs() -> null
- Types
  - ScriptType = OSIS | SAL | V | Python ([rust.ScriptType](core/job/src/lib.rs:16))
  - JobParams: script, script_type, caller_id, context_id, timeout?, prerequisites?
  - JobStatus: Dispatched | WaitingForPrerequisites | Started | Error | Finished
  - DTOs in [rust.interfaces/openrpc/server/src/types.rs](interfaces/openrpc/server/src/types.rs:1)
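
The types listed above could look roughly like the following plain-Rust sketch. It is not the content of types.rs: the serde derives the real DTOs presumably carry are omitted, the unit of `timeout` (seconds) is an assumption, and the string form of JobStatus used by `as_str`/`from_str` is assumed to match the variant names.

```rust
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ScriptType {
    OSIS,
    SAL,
    V,
    Python,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JobStatus {
    Dispatched,
    WaitingForPrerequisites,
    Started,
    Error,
    Finished,
}

impl JobStatus {
    /// Assumed wire/storage form: the variant name itself.
    pub fn as_str(self) -> &'static str {
        match self {
            JobStatus::Dispatched => "Dispatched",
            JobStatus::WaitingForPrerequisites => "WaitingForPrerequisites",
            JobStatus::Started => "Started",
            JobStatus::Error => "Error",
            JobStatus::Finished => "Finished",
        }
    }
}

impl FromStr for JobStatus {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "Dispatched" => Ok(JobStatus::Dispatched),
            "WaitingForPrerequisites" => Ok(JobStatus::WaitingForPrerequisites),
            "Started" => Ok(JobStatus::Started),
            "Error" => Ok(JobStatus::Error),
            "Finished" => Ok(JobStatus::Finished),
            other => Err(format!("unknown job status: {}", other)),
        }
    }
}

/// Parameters accepted by create_job; `?` fields in the list above become Option.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct JobParams {
    pub script: String,
    pub script_type: ScriptType,
    pub caller_id: String,
    pub context_id: String,
    pub timeout: Option<u64>, // assumption: seconds
    pub prerequisites: Option<Vec<String>>,
}
```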
## Required changes
1) Transport: simplify to HTTP/WS on SocketAddr
   - Remove Unix transport: in [rust.OpenRpcServer::start()](interfaces/openrpc/server/src/lib.rs:247), delete Transport::Unix and reth-ipc usage.
   - Use jsonrpsee::server::Server::builder().build(addr) and server.start(module), per upstream examples:
     - [rust.http](reference_jsonrpsee_crate_examples/http.rs:53)
     - [rust.ws](reference_jsonrpsee_crate_examples/ws.rs:55)
2) ScriptType consistency end-to-end
   - Ensure ScriptType is hero_job::ScriptType (OSIS | SAL | V | Python) in request/response types (already used in [rust.JobParams](interfaces/openrpc/server/src/types.rs:6)). If openrpc.json is used to generate docs or clients, update its enum to match.
3) Implement run_job (one-shot)
   - In [rust.OpenRpcApiServer::run_job](interfaces/openrpc/server/src/lib.rs:366):
     - Build a hero_job::JobBuilder with caller_id/context_id placeholders (or accept them as parameters later).
     - Set script, script_type, optional prerequisites, and a default timeout.
     - Call supervisor.run_job_and_await_result(&job) and return the output string.
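
The run_job steps above can be sketched as follows. Everything here is a stand-in: `Job`, `JobBuilder`, `JobRunner`, and `EchoRunner` are hypothetical types (the real hero_job::JobBuilder and Supervisor APIs may differ), the call is synchronous whereas the real one is presumably async, and the "rpc" placeholders and 30-second default timeout are assumptions.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ScriptType {
    OSIS,
    SAL,
    V,
    Python,
}

/// Hypothetical stand-in for the job produced by hero_job::JobBuilder.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Job {
    pub caller_id: String,
    pub context_id: String,
    pub script: String,
    pub script_type: ScriptType,
    pub prerequisites: Vec<String>,
    pub timeout_secs: u64,
}

/// Hypothetical builder; the real builder API may differ.
pub struct JobBuilder {
    job: Job,
}

impl JobBuilder {
    pub fn new() -> Self {
        JobBuilder {
            job: Job {
                caller_id: String::new(),
                context_id: String::new(),
                script: String::new(),
                script_type: ScriptType::SAL,
                prerequisites: Vec::new(),
                timeout_secs: 30, // assumed default timeout
            },
        }
    }
    pub fn caller_id(mut self, id: &str) -> Self { self.job.caller_id = id.to_string(); self }
    pub fn context_id(mut self, id: &str) -> Self { self.job.context_id = id.to_string(); self }
    pub fn script(mut self, script: &str) -> Self { self.job.script = script.to_string(); self }
    pub fn script_type(mut self, t: ScriptType) -> Self { self.job.script_type = t; self }
    pub fn prerequisites(mut self, p: Vec<String>) -> Self { self.job.prerequisites = p; self }

    /// Rejects an empty script so bad input fails before dispatch.
    pub fn build(self) -> Result<Job, String> {
        if self.job.script.trim().is_empty() {
            return Err("script must not be empty".to_string());
        }
        Ok(self.job)
    }
}

/// Stand-in for the one Supervisor call that run_job needs.
pub trait JobRunner {
    fn run_job_and_await_result(&self, job: &Job) -> Result<String, String>;
}

/// One-shot execution: build the job, run it, return its output.
pub fn run_job<R: JobRunner>(
    runner: &R,
    script: &str,
    script_type: ScriptType,
    prerequisites: Option<Vec<String>>,
) -> Result<String, String> {
    let job = JobBuilder::new()
        .caller_id("rpc") // placeholder until callers are identified
        .context_id("rpc") // placeholder
        .script(script)
        .script_type(script_type)
        .prerequisites(prerequisites.unwrap_or_default())
        .build()?;
    runner.run_job_and_await_result(&job)
}

/// Fake runner for illustration: echoes the script type and script.
pub struct EchoRunner;

impl JobRunner for EchoRunner {
    fn run_job_and_await_result(&self, job: &Job) -> Result<String, String> {
        Ok(format!("{:?}: {}", job.script_type, job.script))
    }
}
```

Hiding the Supervisor behind a small trait is only a convenience for this sketch; it also happens to make the handler testable without Redis.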
4) Implement play as a thin wrapper
   - In [rust.OpenRpcApiServer::play](interfaces/openrpc/server/src/lib.rs:304):
     - Choose a default ScriptType (recommendation: SAL), then delegate to run_job(script, SAL, None).
     - Return PlayResult { output }.
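
A minimal sketch of the play wrapper described above, assuming run_job is available as a callable with the signature from the target surface. `DEFAULT_PLAY_SCRIPT_TYPE` is a hypothetical constant name, and String is used as a placeholder error type.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ScriptType {
    OSIS,
    SAL,
    V,
    Python,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PlayResult {
    pub output: String,
}

/// Default script type for play, following the SAL recommendation.
pub const DEFAULT_PLAY_SCRIPT_TYPE: ScriptType = ScriptType::SAL;

/// play delegates to run_job with the default type and no prerequisites,
/// then wraps the output string into PlayResult.
pub fn play<F>(run_job: F, script: String) -> Result<PlayResult, String>
where
    F: Fn(String, ScriptType, Option<Vec<String>>) -> Result<String, String>,
{
    let output = run_job(script, DEFAULT_PLAY_SCRIPT_TYPE, None)?;
    Ok(PlayResult { output })
}
```

Because play contains no logic of its own, any fix to run_job (timeouts, error mapping) applies to it automatically.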
5) Implement get_job_logs via Supervisor
   - Replace the mocked return in [rust.get_job_logs](interfaces/openrpc/server/src/lib.rs:400) with a call to:
     - supervisor.get_job_logs(&job_id) -> Option<String> and wrap into JobLogsResult { logs }.
6) list_jobs should return Vec<String> (IDs only)
   - Replace placeholder construction in [rust.list_jobs](interfaces/openrpc/server/src/lib.rs:407) with:
     - supervisor.list_jobs() returning Vec<String> directly.
   - Optionally add get_job(job_id) later if needed.
7) Error handling
   - Map SupervisorError to jsonrpsee error codes:
     - Invalid input → ErrorCode::InvalidParams
     - Timeout → a custom code such as -32002 (JSON-RPC 2.0 reserves -32000 to -32099 for implementation-defined server errors); InvalidParams is a possible fallback but does not describe a timeout.
     - Internal IO/Redis errors → ErrorCode::InternalError
   - Keep server logs descriptive; return minimal error messages to clients.
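
The mapping above can be sketched in plain Rust. The `SupervisorError` variants and the `RpcError` struct are hypothetical stand-ins (the real error enum and the jsonrpsee error object will differ); the numeric codes -32602 and -32603 are the standard JSON-RPC 2.0 values for invalid params and internal error, and -32002 is the custom timeout code suggested above.

```rust
/// Hypothetical shape of the Supervisor error; real variants may differ.
#[derive(Debug)]
pub enum SupervisorError {
    InvalidInput(String),
    Timeout(String),
    Redis(String),
    Io(String),
}

pub const INVALID_PARAMS: i32 = -32602; // JSON-RPC 2.0 "Invalid params"
pub const INTERNAL_ERROR: i32 = -32603; // JSON-RPC 2.0 "Internal error"
pub const JOB_TIMEOUT: i32 = -32002; // custom, implementation-defined range

/// Stand-in for the error object returned to RPC clients.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RpcError {
    pub code: i32,
    pub message: String,
}

/// Logs the full error server-side and returns a minimal client-facing error.
pub fn to_rpc_error(err: &SupervisorError) -> RpcError {
    // Descriptive log for operators; details are not sent to the client.
    eprintln!("supervisor error: {:?}", err);
    let (code, message) = match err {
        SupervisorError::InvalidInput(_) => (INVALID_PARAMS, "invalid params"),
        SupervisorError::Timeout(_) => (JOB_TIMEOUT, "job timed out"),
        SupervisorError::Redis(_) | SupervisorError::Io(_) => (INTERNAL_ERROR, "internal error"),
    };
    RpcError { code, message: message.to_string() }
}
```

A dedicated timeout code lets clients distinguish "retry later" from "fix your request" without parsing message text.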
8) Server lifecycle
   - Keep OpenRpcServer::new() to build with TOML or builder defaults (see [rust.OpenRpcServer::new()](interfaces/openrpc/server/src/lib.rs:98)).
   - Expose a “start_on(addr)” function that returns a ServerHandle (just like upstream examples).
   - Optional: expose Supervisor::start_rpc_server(host, port) to own lifecycle from Supervisor; or leave it in interfaces/openrpc with a thin cmd binary to start it.
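
If start_rpc_server(host, port) is added, it needs to turn its arguments into the SocketAddr that start_on(addr) expects. A minimal sketch of that step, using only the standard library; `rpc_socket_addr` is a hypothetical helper name, and it accepts IP literals only (no DNS names), which is an assumption about what the host argument will contain.

```rust
use std::net::{IpAddr, SocketAddr};

/// Builds the bind address for the RPC server from host and port.
/// Only IP literals are accepted; hostnames would need a resolver.
pub fn rpc_socket_addr(host: &str, port: u16) -> Result<SocketAddr, String> {
    let ip: IpAddr = host
        .parse()
        .map_err(|e| format!("invalid host '{}': {}", host, e))?;
    Ok(SocketAddr::new(ip, port))
}
```

Validating the address before building the server keeps configuration mistakes out of the jsonrpsee startup path.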
## Non-goals (for this phase)
- Unix IPC transport (reth-ipc).
- Advanced middleware (CORS, host filters, rate-limiting).
- RPC auth flows (fetch_nonce/authenticate/whoami) beyond placeholders.
- Pub/Sub over RPC.
## Reference mapping (clickable)
- Server core and methods:
  - [rust.OpenRpcServer](interfaces/openrpc/server/src/lib.rs:37)
  - [rust.OpenRpcApi](interfaces/openrpc/server/src/lib.rs:45)
  - [rust.OpenRpcServer::start()](interfaces/openrpc/server/src/lib.rs:117)
  - [rust.JobParams](interfaces/openrpc/server/src/types.rs:6)
  - [rust.StartJobResult](interfaces/openrpc/server/src/types.rs:23)
  - [rust.JobLogsResult](interfaces/openrpc/server/src/types.rs:29)
- Supervisor backend:
  - [rust.Supervisor::create_job()](core/supervisor/src/lib.rs:660)
  - [rust.Supervisor::start_job()](core/supervisor/src/lib.rs:675)
  - [rust.Supervisor::run_job_and_await_result()](core/supervisor/src/lib.rs:689)
  - [rust.Supervisor::get_job_status()](core/supervisor/src/lib.rs:723)
  - [rust.Supervisor::get_job_output()](core/supervisor/src/lib.rs:758)
  - [rust.Supervisor::get_job_logs()](core/supervisor/src/lib.rs:817)
  - [rust.Supervisor::list_jobs()](core/supervisor/src/lib.rs:780)
  - [rust.Supervisor::stop_job()](core/supervisor/src/lib.rs:789)
  - [rust.Supervisor::delete_job()](core/supervisor/src/lib.rs:850)
  - [rust.Supervisor::clear_all_jobs()](core/supervisor/src/lib.rs:862)
- jsonrpsee examples to replicate transport and registration patterns:
  - HTTP: [rust.http example](reference_jsonrpsee_crate_examples/http.rs:53)
  - WS: [rust.ws example](reference_jsonrpsee_crate_examples/ws.rs:55)
## Acceptance checklist
- Server starts on a host:port using jsonrpsee::server::Server.
- All Supervisor operations are callable over RPC with a 1:1 mapping and return the correct DTOs.
- ScriptType uses OSIS|SAL|V|Python.
- list_jobs returns Vec<String> and no fake job objects.
- run_job and play perform real execution and return actual outputs.
- No Unix IPC code path remains in start().