
1 Objects Used

| Component | What it stores | Where it lives (Redis key) | Main responsibilities |
|---|---|---|---|
| Actor | Public key, reachable addresses, timestamps | `actor:<id>` (hash) | An identity that can request work, receive results and act as an administrator of a Context. |
| Context | Permission lists (admins, readers, executors), timestamps | `context:<id>` (hash) | An isolated “tenant”: a separate Redis DB and filesystem area. All objects (flows, messages, jobs, runners) belonging to a given workflow are stored under this context. The permission lists control who may read, execute or administer the context. |
| Flow | DAG of job IDs, env vars, result map, status, timestamps | `flow:<id>` (hash) | A high-level workflow created by a single Actor. It groups many RunnerJob objects, records their execution order, supplies common environment variables and aggregates the final result. |
| Message | Payload, type (`job\|chat\|mail`), format (`html\|text\|md`), timeouts, embedded Job objects, log stream, status, timestamps | `message:<caller_id>:<id>` (hash) | The transport unit that travels over Mycelium (the pub/sub message bus). A message can contain a RunnerJob (or a list of jobs) and is queued in two generic Redis lists: `msg_out` (to be sent) and `msg_in` (already received). |
| Runner | Public key, Mycelium address, topic name, type (`v\|python\|osis\|rust`), local flag, timestamps | `runner:<id>` (hash) | The worker that actually executes RunnerJob scripts. It subscribes to a Mycelium topic (normally `runner<id>`). If `local == true` the runner also consumes jobs directly from a Redis queue named after the script-type suffix (`v`, `python`, …). |
| RunnerJob | Script source, type (`osis\|sal\|v\|python`), env vars, prerequisites, dependencies, status, timestamps, result map | `job:<caller_id>:<id>` (hash) | A single executable unit. It lives inside a Context, belongs to a Runner, and is queued according to its `script_type` (e.g. `queue:python`). Its status moves through the lifecycle `dispatched → waiting_for_prerequisites → started → finished` (or `error` on failure). |

Key idea: All objects are persisted as hashes. Context-scoped objects (Context, Flow, Message, Runner, RunnerJob) live in a Redis database dedicated to that context. Actors are global and are stored in Redis DB 0 under `actor:<id>`. The system is completely decentralised: each actor owns its own context and can spin up as many runners as needed. Communication between actors, runners and the rest of the system happens over Mycelium, a message bus that uses Redis lists as queues.
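
To make the layout concrete, here is a minimal sketch in Python with redis-py (the connection details, the context-id-to-DB mapping and all field values are illustrative assumptions, not part of the spec):

```python
import redis

# Actors are global: they always live in Redis DB 0.
global_db = redis.Redis(host="localhost", port=6379, db=0, decode_responses=True)
global_db.hset("actor:12", mapping={
    "id": 12,
    "pubkey": "ed25519:...",  # hypothetical encoding of the public key
    "created_at": 1724918972,
    "updated_at": 1724918972,
})

# Context-scoped objects live in the Redis DB dedicated to that context.
# How a context id maps to a DB number is an assumption of this sketch.
ctx_db = redis.Redis(host="localhost", port=6379, db=7, decode_responses=True)
ctx_db.hset("context:7", mapping={"id": 7, "admins": "[12]"})
ctx_db.hset("flow:33", mapping={"id": 33, "caller_id": 12, "status": "dispatched"})
```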


2 Interaction diagram (who talks to who)

2.1 Sequence diagram: “Submit a flow and run it”

```mermaid
%%{init: {"theme":"dark"}}%%
sequenceDiagram
    participant A as Actor
    participant L as LocalContext (Redis)
    participant M as Mycelium (msg_out / msg_in)
    participant R as RemoteContext (Redis)
    participant W as Runner (worker)

    %% 1. Actor creates everything locally
    A->>L: create Flow + RunnerJob (J)
    A->>L: LPUSH msg_out  Message{type=job, payload=J, target=Remote}

    %% 2. Mycelium transports the message
    M->>R: LPUSH msg_in  (Message key)

    %% 3. Remote context materialises the job
    R->>R: HSET Message hash
    R->>R: HSET RunnerJob (J')   // copy of payload
    R->>R: LPUSH queue:v  (job key)

    %% 4. Runner consumes and executes
    W->>R: BRPOP queue:v  (job key)
    W->>R: HSET job status = started
    W->>W: execute script
    W->>R: HSET job result + status = finished

    %% 5. Result is sent back
    W->>M: LPUSH msg_out  Message{type=result, payload=result, target=Local}
    M->>L: LPUSH msg_in  (result Message key)

    %% 6. Actor receives the result
    A->>L: RPOP msg_in → read result
```
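
In code, the actor's part of this sequence (step 1) reduces to a handful of Redis commands: write the Message hash, then push its key onto msg_out. A sketch in Python with redis-py; the payload encoding and the DB number are assumptions:

```python
import redis

ctx = redis.Redis(db=7, decode_responses=True)  # the actor's context DB (illustrative)

msg_key = "message:12:101"
ctx.hset(msg_key, mapping={
    "id": 101,
    "caller_id": 12,
    "message_type": "job",
    "message": '{"job": "job:12:2001"}',  # hypothetical embedding of the RunnerJob
})
ctx.lpush("msg_out", msg_key)  # Mycelium picks the key up from here
```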

2.2 Object-relationship graph

```mermaid
%%{init: {"theme":"dark"}}%%
graph LR
    subgraph Redis["Redis (per Context)"]
        A[Actor] -->|stores| Ctx[Context]
        Ctx -->|stores| Fl[Flow]
        Ctx -->|stores| Msg[Message]
        Ctx -->|stores| Rnr[Runner]
        Ctx -->|stores| Job[RunnerJob]
    end

    subgraph Mycelium["Mycelium (Pub/Sub)"]
        MsgOut["queue:msg_out"] -->|outgoing| Mcel[Mycelium Bus]
        Mcel -->|incoming| MsgIn["queue:msg_in"]
        RnrTopic["topic:runnerX"] -->|subscribed by| Rnr
        queueV["queue:v"] -->|local jobs| Rnr
        queuePython["queue:python"] -->|local jobs| Rnr
    end

    A -->|creates / reads| Fl
    A -->|creates / reads| Msg
    A -->|creates / reads| Rnr
    A -->|creates / reads| Job
    Fl -->|references| Job
    Msg -->|may embed| Job
    Rnr -->|executes| Job
    Job -->|updates| Fl
    Msg -->|carries result back to| A
```

2.3 Flow-status lifecycle (state diagram)

```mermaid
%%{init: {"theme":"dark"}}%%
stateDiagram-v2
    [*] --> dispatched
    dispatched --> waiting_for_prerequisites : has prereqs
    waiting_for_prerequisites --> started : prereqs met
    dispatched --> started : no prereqs
    started --> finished : success
    started --> error : failure
    waiting_for_prerequisites --> error : timeout / impossible
    error --> [*]
    finished --> [*]
```
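
The same lifecycle can be encoded as a transition table, which makes it easy to reject illegal status updates; a sketch (the real implementation may enforce this differently):

```python
# Allowed status transitions, mirroring the state diagram above.
TRANSITIONS: dict[str, set[str]] = {
    "dispatched": {"waiting_for_prerequisites", "started"},
    "waiting_for_prerequisites": {"started", "error"},
    "started": {"finished", "error"},
    "finished": set(),  # terminal
    "error": set(),     # terminal
}

def advance(current: str, new: str) -> str:
    """Return the new status, raising if the transition is not in the diagram."""
    if new not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current} -> {new}")
    return new
```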

3 Redis objects: concrete key & data layout

All objects are stored as hashes (HSET). Below is a concise catalog that can be copied into a design doc.

| Key pattern | Example | Fields (type) | Comments |
|---|---|---|---|
| `actor:${id}` | `actor:12` | `id` (u32), `pubkey` (str), `address` (`list<Address>`), `created_at` (u32), `updated_at` (u32) | One hash per actor. |
| `context:${id}` | `context:7` | `id` (u32), `admins` (`list<u32>`), `readers` (`list<u32>`), `executors` (`list<u32>`), `created_at` (u32), `updated_at` (u32) | Holds permission lists for a tenant. |
| `flow:${id}` | `flow:33` | `id` (u32), `caller_id` (u32), `context_id` (u32), `jobs` (`list<u32>`), `env_vars` (`map<str,str>`), `result` (`map<str,str>`), `created_at` (u32), `updated_at` (u32), `status` (str: `dispatched\|started\|finished\|error`) | |
| `message:${caller_id}:${id}` | `message:12:101` | `id` (u32), `caller_id` (u32), `context_id` (u32), `message` (str), `message_type` (str: `job\|chat\|mail`), … | Remaining fields (format, timeouts, embedded jobs, logs, status, timestamps) follow the Objects Used table above. |
| `runner:${id}` | `runner:20` | `id` (u32), `pubkey` (str), `address` (str), `topic` (str), `local` (bool), `created_at` (u32), `updated_at` (u32) | |
| `job:${caller_id}:${id}` | `job:12:2001` | `id` (u32), `caller_id` (u32), `context_id` (u32), `script` (str), `script_type` (str: `osis\|sal\|v\|python`), … | Remaining fields (env vars, prerequisites, dependencies, status, result) follow the Objects Used table above. |
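
Redis hash values are plain strings, so the typed fields above (`list<u32>`, `map<str,str>`) have to be flattened on write; encoding them as JSON strings, as in the sketch below, is one plausible scheme (an assumption of this sketch, the spec does not mandate an encoding):

```python
import json
import redis

ctx = redis.Redis(db=7, decode_responses=True)

# Hypothetical flattening of a Flow's typed fields into string-valued hash entries.
ctx.hset("flow:33", mapping={
    "id": "33",
    "caller_id": "12",
    "context_id": "7",
    "jobs": json.dumps([2001, 2002]),               # list<u32>    -> JSON array
    "env_vars": json.dumps({"LOG_LEVEL": "info"}),  # map<str,str> -> JSON object
    "status": "dispatched",
})

flow = ctx.hgetall("flow:33")
jobs = json.loads(flow["jobs"])  # back to a Python list: [2001, 2002]
```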

Queue objects (lists)

| Queue name | Purpose |
|---|---|
| `msg_out` | Outbound generic queue: every Message that an actor wants to send is pushed here. |
| `msg_in` | Inbound generic queue: every message received from Mycelium is placed here for the local consumer to process. |
| `queue:${suffix}` (e.g. `queue:v`, `queue:python`) | Local job queues used by a Runner when `local == true`. The suffix comes from `ScriptType.queue_suffix()`. |
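
For a runner with `local == true`, consuming from such a queue is a simple blocking loop; a sketch (assuming the suffix a real runner would obtain from `ScriptType.queue_suffix()`):

```python
import redis

ctx = redis.Redis(db=7, decode_responses=True)
queue_suffix = "python"  # in the real system this comes from ScriptType.queue_suffix()

while True:
    # BRPOP blocks until a job key is pushed onto queue:<suffix>.
    _queue, job_key = ctx.brpop(f"queue:{queue_suffix}")
    job = ctx.hgetall(job_key)
    print("picked up", job_key, "of type", job.get("script_type"))
```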

4 System specification (as a concise “specs” section)

4.1 Naming conventions

  • All Redis hashes are prefixed with the object name (actor:, context:, …).
  • All queues are simple Redis lists (LPUSH / RPOP).
  • Message keys embed both the caller and a locally unique message id; this guarantees global uniqueness across contexts (see the key-builder sketch below).
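
These conventions translate into a handful of one-line key builders; the helper names below are hypothetical:

```python
def actor_key(actor_id: int) -> str:
    return f"actor:{actor_id}"

def context_key(context_id: int) -> str:
    return f"context:{context_id}"

def message_key(caller_id: int, msg_id: int) -> str:
    # Embedding the caller id keeps the key unique across all contexts,
    # even when two actors reuse the same local message id.
    return f"message:{caller_id}:{msg_id}"

def job_key(caller_id: int, job_id: int) -> str:
    return f"job:{caller_id}:{job_id}"
```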

4.2 Permissions & security

  • Only IDs present in Context.admins may create or delete any object inside that context.
  • Context.readers can GET any hash but not modify it.
  • Context.executors are allowed to update RunnerJob.status, result and to pop from local job queues.
  • Every Actor must present a pubkey that can be verified by the receiving side (Mycelium uses asymmetric crypto).
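
A permission gate that follows these rules could look like the sketch below (the `Context` shape and the operation names are illustrative, not the project's actual API):

```python
from dataclasses import dataclass, field

@dataclass
class Context:
    admins: set[int] = field(default_factory=set)
    readers: set[int] = field(default_factory=set)
    executors: set[int] = field(default_factory=set)

def may_perform(ctx: Context, actor_id: int, op: str) -> bool:
    """op is one of 'create', 'delete', 'read', 'execute' (illustrative names)."""
    if op in ("create", "delete"):
        return actor_id in ctx.admins
    if op == "read":
        # Whether admins implicitly hold read rights is an assumption here.
        return actor_id in ctx.readers or actor_id in ctx.admins
    if op == "execute":
        return actor_id in ctx.executors
    return False

assert may_perform(Context(admins={12}), 12, "create")
```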

4.3 Message flow (publish / consume)

Below is a rewritten “Message flow (publish/consume)” that reflects the real runtime components:

  • Supervisor daemon: runs on the node that owns the Flow (the actor's side).
    It is the only process that ever RPOPs from the global msg_out queue, adds the proper routing information and hands the message to Mycelium.

  • Mycelium: the pure pub/sub message bus. It never touches Redis directly; it only receives a payload key from the coordinator and delivers that key to the remote tenant's msg_in list.

  • Remote-side runner / service: consumes from its own msg_in, materialises the job and executes it.

The table now uses the exact component names and adds a short note about the permission check that the coordinator (the Supervisor daemon) performs before it releases a message.

| # | Action (what the system does) | Component that performs it | Redis interaction (exact commands) |
|---|---|---|---|
| 1 | Publish: Actor creates a Message hash and LPUSHes its key onto the outbound queue. | Actor (client code) | `HSET message:12:101 …`; `LPUSH msg_out message:12:101` |
| 2 | Coordinate & route: the Supervisor daemon (running at the source) RPOPs the key, checks the actor's permissions, adds the target-context and topic fields, then forwards the key to Mycelium. | Supervisor daemon (per actor) | `RPOP msg_out` → (in-process) → `LPUSH msg_out_coordinator <key>` (internal buffer) |
| 3 | Transport: Mycelium receives the key, looks at `Message.message_type` (or the explicit topic) and pushes the key onto the inbound queue of the remote tenant. | Mycelium bus (network layer) | `LPUSH msg_in:<remote-ctx> <key>` |
| 4 | Consume: the remote side (runner or service) RPOPs from its msg_in, loads the full hash, verifies the actor's signature and decides what to do based on `message_type`. | Remote consumer (runner/service) | `RPOP msg_in:<remote-ctx>`; `HGETALL message:<key>` |
| 5 | Job materialisation: if `message_type == "job"` the consumer creates a RunnerJob entry inside the remote context and adds the job key to the proper script-type queue (`queue:v`, `queue:python`, …). | Remote consumer | `HSET job:<caller_id>:<job_id> …`; `LPUSH queue:<script_type> job:<caller_id>:<job_id>` |
| 6 | Runner execution loop: a Runner attached to that remote context BRPOPs from its script-type queue, sets `status = started`, runs the script, writes result and final status. | Runner | `BRPOP queue:<script_type>`; `HSET job:<…> status started` → … → `HSET job:<…> result … status finished` |
| 7 | Result notification: the runner builds a new Message (type chat, result, …) and pushes it onto msg_out again; the Supervisor daemons then route it back to the original actor. | Runner → Supervisor (remote side) → Mycelium → Supervisor (origin side) → Actor | `HSET message:<res_key> …`; `LPUSH msg_out message:<res_key>` (steps 2–3 repeat in the reverse direction) |
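
Step 2 is the only place where msg_out is drained; one iteration of that routing loop might look like the following sketch (the permission check and the Mycelium hand-off are stubbed out):

```python
import redis

ctx = redis.Redis(db=7, decode_responses=True)

def actor_may_send(actor_id: int, msg: dict) -> bool:
    return True  # stand-in for the real Context permission check

def hand_to_mycelium(msg_key: str, msg: dict) -> None:
    pass  # stand-in for the network hand-off to the Mycelium bus

def route_outbound_once() -> None:
    """One iteration of the Supervisor daemon's routing loop (sketch)."""
    msg_key = ctx.rpop("msg_out")
    if msg_key is None:
        return  # nothing queued
    msg = ctx.hgetall(msg_key)
    if not actor_may_send(int(msg["caller_id"]), msg):
        return  # drop messages the actor is not allowed to send
    ctx.lpush("msg_out_coordinator", msg_key)  # internal buffer from the table
    hand_to_mycelium(msg_key, msg)
```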

Tiny end-to-end sequence (still simple enough to render)

```mermaid
%%{init: {"theme":"dark"}}%%
sequenceDiagram
    participant A as Actor
    participant L as Local Redis (Flow ctx)
    participant C as Supervisor daemon (local)
    participant M as Mycelium bus
    participant R as Remote Redis (target ctx)
    participant W as Runner (remote)

    %% 1⃣ publish
    A->>L: HSET message:12:101 …
    A->>L: LPUSH msg_out message:12:101

    %% 2⃣ coordinate
    C->>L: RPOP msg_out
    C->>C: check permissions / add routing info
    C->>M: push key to Mycelium (msg_out_coordinator)

    %% 3⃣ transport
    M->>R: LPUSH msg_in message:12:101

    %% 4⃣ consume
    W->>R: RPOP msg_in
    W->>R: HGETALL message:12:101
    W->>W: verify signature
    alt message_type == job
        W->>R: HSET job:12:2001 …
        W->>R: LPUSH queue:v job:12:2001
    end

    %% 5⃣ runner loop
    W->>R: BRPOP queue:v (job:12:2001)
    W->>R: HSET job:12:2001 status started
    W->>W: execute script
    W->>R: HSET job:12:2001 result … status finished

    %% 6⃣ result back
    W->>R: HSET message:12:900 result …
    W->>R: LPUSH msg_out message:12:900
    C->>M: (coordinator on remote side) routes back
    M->>L: LPUSH msg_in message:12:900
    A->>L: RPOP msg_in → read result
```
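
The remote-consumer half of this diagram (steps 4 and 5) in the same style; signature verification is stubbed and the payload is assumed to be a JSON object carrying the job fields:

```python
import json
import redis

remote = redis.Redis(db=9, decode_responses=True)  # target context DB (illustrative)

def verify_signature(msg: dict) -> bool:
    return True  # stand-in for checking the sending actor's pubkey signature

def consume_once() -> None:
    msg_key = remote.rpop("msg_in")
    if msg_key is None:
        return
    msg = remote.hgetall(msg_key)
    if not verify_signature(msg):
        return
    if msg.get("message_type") == "job":
        job = json.loads(msg["message"])  # assumed payload encoding
        jkey = f"job:{msg['caller_id']}:{job['id']}"
        remote.hset(jkey, mapping={**job, "status": "dispatched"})
        remote.lpush(f"queue:{job['script_type']}", jkey)
```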

5 What the system is trying to achieve

| Goal | How it is realised |
|---|---|
| Decentralised execution | Every actor owns a Context; any number of Runners can be attached to that context, possibly on different machines, and they all talk over the same Mycelium/Redis backend. |
| Fine-grained permissions | `Context.admins` / `readers` / `executors` enforce who can create, view or run jobs. |
| Loose coupling via messages | All actions (job submission, result propagation, chat, mail, …) use the generic Message object; the same transport pipeline handles all of them. |
| Workflow orchestration | The Flow object models a DAG of jobs, tracks collective status and aggregates results, without needing a central scheduler. |
| Pluggable runtimes | ScriptType and RunnerType let a runner choose the proper execution environment (V, Python, OSIS, Rust, …); adding a new language only means adding a new ScriptType and a corresponding worker. |
| Observability | Log arrays attached to a Message and the timestamps on every hash give a complete audit trail. |
| Resilience | Jobs are idempotent hash entries; queues are persisted in Redis, and status changes are atomic (HSET). Retries and timeouts guarantee eventual consistency. |
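
The "no central scheduler" claim works because readiness is a purely local computation over the DAG of jobs; a sketch of that check:

```python
def ready_jobs(prereqs: dict[int, set[int]],
               finished: set[int],
               started: set[int]) -> list[int]:
    """Job ids whose prerequisites are all finished and that have not started yet."""
    return [
        job_id
        for job_id, deps in prereqs.items()
        if job_id not in started and deps <= finished
    ]

# Job 2002 depends on 2001; it only becomes ready once 2001 has finished.
print(ready_jobs({2001: set(), 2002: {2001}}, finished={2001}, started={2001}))
# -> [2002]
```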

6 Diagram summary (quick visual cheat-sheet)

```mermaid
%%{init: {"theme":"dark"}}%%
graph TD
    A[Actor] -->|creates| Ctx[Context]
    A -->|creates| Flow
    A -->|creates| Msg
    A -->|creates| Rnr[Runner]
    A -->|creates| Job[RunnerJob]

    subgraph Redis["Redis (per Context)"]
        Ctx --> A
        Ctx --> Flow
        Ctx --> Msg
        Ctx --> Rnr
        Ctx --> Job
    end

    Msg -->|push to| OutQ[msg_out]
    OutQ --> Myc[Mycelium Bus]
    Myc -->|deliver| InQ[msg_in]
    InQ --> Rnr
    Rnr -->|pop from| Qv["queue:v"]
    Rnr -->|pop from| Qpy["queue:python"]

    Rnr -->|updates| Job
    Job -->|updates| Flow
    Flow -->|result Message| Msg
```