Baobab Project Overview
This document explains the system architecture and execution model: what a supervisor is, what an actor is (including each actor type and how they are used), how jobs flow through Redis, and how the various interfaces expose functionality over WebSocket and Unix IPC.
References point directly into the codebase for quick lookup.
1. Core Concepts
- Supervisor
- A long-lived orchestrator that:
- Supervises actor lifecycles (start/restart/stop/health checks),
- Dispatches jobs to actors via Redis queues,
- Exposes a high-level API for creating, starting, running, and inspecting jobs.
- Key types and entry points:
- Supervisor
- SupervisorBuilder
- SupervisorBuilder::from_toml()
- Supervisor::start_actors()
- Supervisor::run_job_and_await_result()
- Supervisor::start_job()
- Supervisor::get_job_status()
- Supervisor::get_job_output()
- Supervisor::list_jobs()
- Supervisor::stop_job()
- Supervisor::delete_job()
- Supervisor::clear_all_jobs()
- Actor
- A worker service that pulls jobs from a Redis queue and executes the job’s script with the appropriate engine/runtime for its type.
- Trait and common loop:
- Job and Redis schema
- A job encapsulates a unit of work: script, script type (which selects the actor queue), caller/context IDs, timeout, etc.
- Canonical data and status types are re-exported by the supervisor:
- The Redis schema used by the supervisor for job supervision is documented in:
- core/supervisor/README.md
- Keys overview (jobs, actor work queues, reply queues): see lines 95–100 in that file.
2. Actors and Script Execution
The system defines four actor types. Each actor has its own queue and executes scripts differently, with standardized context variables injected into script execution (e.g., CALLER_ID, CONTEXT_ID).
- Design summary:
Actor types and behavior:
- OSIS (Rhai, non-blocking, sequential)
- Executes Rhai scripts one after another on a single thread using the Rhai engine.
- Intended for non-blocking tasks.
- SAL (Rhai, blocking async, concurrent)
- Executes blocking asynchronous Rhai scripts concurrently by spawning a new thread per evaluation.
- Intended for IO-bound or blocking tasks requiring concurrency.
- V (HeroScript via V engine) and Python (HeroScript via Python engine)
- Execute HeroScript scripts in their respective engines.
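Since the script type is what routes a job to the right actor, the dispatch can be pictured as a simple mapping. The sketch below is illustrative only: the enum variants mirror the four actor types described above, but the queue-key format and actor names are assumptions, not the project's actual constants.

```rust
// Illustrative sketch: mapping a job's script type to the Redis work
// queue consumed by the matching actor. Queue names are hypothetical.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ScriptType {
    Osis,   // Rhai, non-blocking, sequential
    Sal,    // Rhai, blocking async, concurrent
    V,      // HeroScript via the V engine
    Python, // HeroScript via the Python engine
}

// Hypothetical queue-key scheme under the "hero:" namespace.
fn actor_queue_key(script_type: ScriptType) -> String {
    let actor = match script_type {
        ScriptType::Osis => "osis",
        ScriptType::Sal => "sal",
        ScriptType::V => "v",
        ScriptType::Python => "python",
    };
    format!("hero:work_queue:{actor}")
}

fn main() {
    println!("{}", actor_queue_key(ScriptType::Osis));
}
```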
Execution context:
- Both CALLER_ID and CONTEXT_ID are injected in scope for scripts. See description at:
Actor implementation surface:
- Actors implement Actor and plug into the provided Actor::spawn() loop.
- The common loop:
- Connects to Redis (per-actor id),
- Blocks on the actor’s queue with BLPOP,
- Handles a special “ping” script inline (health check),
- Delegates other jobs to Actor::process_job().
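The common loop above can be mocked without Redis to show the control flow. In this self-contained sketch a `VecDeque` stands in for BLPOP on the actor's queue, the "ping" health check is answered inline, and everything else goes through a `process_job` hook; all names are illustrative, not the project's real trait surface.

```rust
// Mock of the common actor loop: pop a job (BLPOP stand-in), answer
// "ping" inline, delegate other work to process_job.

use std::collections::VecDeque;

#[derive(Debug, PartialEq)]
enum Outcome {
    Pong,              // health check answered inline
    Processed(String), // output that would be written back to Redis
}

fn process_job(script: &str) -> String {
    // Stand-in for engine-specific execution (Rhai, V, Python).
    format!("ran: {script}")
}

fn actor_step(queue: &mut VecDeque<String>) -> Option<Outcome> {
    let script = queue.pop_front()?; // BLPOP stand-in (no blocking here)
    if script == "ping" {
        Some(Outcome::Pong) // must reply immediately for health checks
    } else {
        Some(Outcome::Processed(process_job(&script)))
    }
}

fn main() {
    let mut queue = VecDeque::from(vec!["ping".to_string(), "print(1)".to_string()]);
    while let Some(outcome) = actor_step(&mut queue) {
        println!("{outcome:?}");
    }
}
```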
3. Supervisor Responsibilities and Guarantees
- Lifecycle management
- Starts actors and registers them with zinit, monitors health, restarts actors that become unhealthy or unresponsive, and cleans up services on shutdown.
- Health checking includes dispatching a ping job when an actor is idle; the actor must respond “pong” immediately.
- Key entry points:
- Supervisor::start_actors()
- Background lifecycle manager (health loop):
- Per-actor health handling and restart:
- Uses zinit as the process manager; see the supervisor readme:
- Job supervision
- Create, start, run-and-await, inspect, stop, delete jobs; dispatch based on script type using hardcoded per-type queues:
- Job dependency utilities
- Check prerequisites and update dependents upon completion:
- Redis naming and keys (namespace “hero:”)
- See “Redis Schema” section:
4. Interfaces (APIs and Transports)
The project exposes two complementary ways to interact with the supervisor and job system.
A. OpenRPC Server (JSON-RPC 2.0 over WebSocket or Unix IPC)
- Core types:
- Server lifecycle:
- Methods exposed (selected):
- Authentication: fetch_nonce, authenticate, whoami
- Script execution: play
- Job management: create_job, start_job, run_job, get_job_status, get_job_output, get_job_logs, list_jobs, stop_job, delete_job, clear_all_jobs
- All are registered inside OpenRpcServer::start() using jsonrpsee.
- Transports:
- WebSocket server binding is provided via jsonrpsee when using Transport::WebSocket.
- Unix Domain Socket (IPC) is implemented using reth-ipc when using Transport::Unix.
- Launchers:
- IPC server binary:
- IPC client (manual testing tool):
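Because both transports speak plain JSON-RPC 2.0, a request looks the same over WebSocket and Unix IPC. A hypothetical create_job call might look like the following (the field names inside params are illustrative, not the server's confirmed schema):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "create_job",
  "params": {
    "script": "40 + 2",
    "script_type": "OSIS"
  }
}
```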
B. WebSocket Server (Actix)
- A dedicated Actix-based WebSocket server that serves a multi-circle endpoint: each circle is reached at its own path “/{circle_pk}”, and each connection is handled by a dedicated Actix actor.
- Server runtime and session actor:
- Server
- Starts HTTP/WS server, binds routes, and spawns the WS actor per connection:
- Server::spawn_circle_server()
- per-connection handler:
- Auth and flow:
- Signature-based auth and session lifecycle are documented in:
- Nonce issuing, signature verification, and circle membership checks gate protected actions (e.g., play).
- Integration with supervisor:
- The WS server issues job requests via the supervisor (e.g., a “play” call builds and runs a job through Supervisor).
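The nonce/signature gate described above can be sketched with an in-memory mock. This is not the project's real crypto: the “signature” check below is a trivial stand-in for verifying a cryptographic signature over the nonce against the circle's membership, and all type and method names are illustrative.

```rust
// Mock of the nonce-based auth gate: fetch_nonce issues a nonce,
// authenticate verifies a (mock) signature over it, and protected
// actions such as "play" check the session flag.

use std::collections::HashMap;

struct Session {
    nonce: Option<String>,
    authenticated: bool,
}

#[derive(Default)]
struct AuthServer {
    sessions: HashMap<String, Session>,
    counter: u64,
}

impl AuthServer {
    // fetch_nonce: issue and remember a fresh nonce for this client.
    fn fetch_nonce(&mut self, client: &str) -> String {
        self.counter += 1;
        let nonce = format!("nonce-{}", self.counter);
        self.sessions.insert(
            client.to_string(),
            Session { nonce: Some(nonce.clone()), authenticated: false },
        );
        nonce
    }

    // authenticate: a real server verifies a signature by the client's
    // key over the nonce; here "signed:{nonce}" is the mock proof.
    fn authenticate(&mut self, client: &str, signature: &str) -> bool {
        if let Some(session) = self.sessions.get_mut(client) {
            let ok = session
                .nonce
                .as_deref()
                .map(|n| signature == format!("signed:{n}"))
                .unwrap_or(false);
            if ok {
                session.authenticated = true;
                return true;
            }
        }
        false
    }

    // Gate for protected actions such as "play".
    fn is_authenticated(&self, client: &str) -> bool {
        self.sessions.get(client).map(|s| s.authenticated).unwrap_or(false)
    }
}

fn main() {
    let mut server = AuthServer::default();
    let nonce = server.fetch_nonce("alice");
    assert!(server.authenticate("alice", &format!("signed:{nonce}")));
    assert!(server.is_authenticated("alice"));
}
```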
5. End-to-End Job Flow
- Creating and starting a job via the OpenRPC server
- Client calls OpenRPC “create_job”, which builds a Job and stores it in Redis via Supervisor::create_job().
- Client then calls “start_job”, which reads the job to determine its ScriptType, computes the actor queue via Supervisor::get_actor_queue_key(), and pushes the job ID to the actor’s Redis list via Supervisor::start_job().
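The create/start split can be sketched with a `HashMap` standing in for Redis: create_job stores the job under its key, and start_job looks up the job's script type to push the job ID onto the matching work queue. The key formats here are illustrative assumptions, not the project's actual schema.

```rust
// Sketch of the create_job / start_job split over a mock Redis.

use std::collections::HashMap;

#[derive(Clone)]
struct Job {
    id: String,
    script_type: String, // e.g. "osis", "sal", "v", "python"
    script: String,
}

#[derive(Default)]
struct MockRedis {
    hashes: HashMap<String, Job>,         // hero:job:{id} -> job data
    queues: HashMap<String, Vec<String>>, // work queues of job IDs
}

// create_job: persist the job under its (hypothetical) key.
fn create_job(redis: &mut MockRedis, job: Job) {
    redis.hashes.insert(format!("hero:job:{}", job.id), job);
}

// start_job: read the stored job, compute the actor queue from its
// script type, and push the job ID onto that queue.
fn start_job(redis: &mut MockRedis, job_id: &str) -> Option<String> {
    let job = redis.hashes.get(&format!("hero:job:{job_id}"))?.clone();
    let queue = format!("hero:work_queue:{}", job.script_type);
    redis.queues.entry(queue.clone()).or_default().push(job.id);
    Some(queue)
}

fn main() {
    let mut redis = MockRedis::default();
    let job = Job { id: "1".into(), script_type: "osis".into(), script: "40 + 2".into() };
    println!("script: {}", job.script);
    create_job(&mut redis, job);
    let queue = start_job(&mut redis, "1").unwrap();
    println!("pushed to {queue}");
}
```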
- Running and awaiting a job in one step
- Client calls “run_job” or equivalent flow; the server uses Supervisor::run_job_and_await_result():
- Stores the job,
- Pushes to the appropriate actor queue,
- Waits for the result on a dedicated reply queue “hero::reply:{job_id}”.
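The run-and-await step can be mocked with a channel standing in for the blocking pop on the reply queue. The reply-queue format “hero::reply:{job_id}” is taken from the flow above; everything else in this sketch (function names, the spawned “actor” thread) is illustrative.

```rust
// Mock of run-and-await: a spawned thread plays the actor pushing
// onto the reply queue; the channel stands in for BLPOP on
// hero::reply:{job_id}.

use std::sync::mpsc;
use std::thread;

// Reply-queue naming as described in the flow above.
fn reply_queue_key(job_id: &str) -> String {
    format!("hero::reply:{job_id}")
}

fn run_job_and_await_result(job_id: &str, script: String) -> String {
    let (tx, rx) = mpsc::channel();
    let key = reply_queue_key(job_id);
    // "Actor": executes the script and pushes the result back.
    thread::spawn(move || {
        tx.send(format!("result for {script}")).unwrap();
    });
    // "Supervisor": blocks on the dedicated reply queue.
    let result = rx.recv().unwrap();
    println!("received on {key}");
    result
}

fn main() {
    let out = run_job_and_await_result("42", "40 + 2".into());
    println!("{out}");
}
```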
- Actor processing loop
- The actor issues BLPOP on its queue (with a timeout), receives a job ID, loads the job, handles “ping” inline, otherwise calls Actor::process_job() for execution, and writes status/output back to Redis.
- The common loop is provided by Actor::spawn().
- Health checks
- The supervisor periodically checks zinit state and may issue ping jobs if idle; failure to respond leads to restart. See lifecycle logic:
- Redis schema pointers (namespace hero:)
- See section “Redis Schema for Job Supervision”:
6. How the Interfaces Fit Together
- The OpenRPC server provides a JSON-RPC 2.0 façade for programmatic control (automation, services).
- Choose between WebSocket and Unix IPC transports via Transport.
- It wraps the Supervisor, delegating all job and lifecycle supervision calls.
- The WebSocket (Actix) server provides a multi-circle, session-based, interactive API well-suited to browsers and persistent WS clients.
- It authenticates users per-circle, then issues supervisor-backed job calls within the authenticated context.
- Session isolation is per WS actor instance; see:
Both interfaces ultimately converge on the same core abstraction: the Supervisor orchestrating jobs and actors over Redis with zinit-backed lifecycle guarantees.
7. Additional References
- Architecture summary for actor types and scripting:
- Supervisor documentation and prerequisites (Redis, zinit):
- TUI/CLI examples and lifecycle demos:
- Actor README (queue consumption, Rhai execution, context variables):