add some documentation for blue book

docs/.collection (new file, empty)

docs/README.md (new file, 67 lines)

# Horus Documentation

**Hierarchical Orchestration Runtime for Universal Scripts**

Horus is a distributed job execution system with three layers: Coordinator, Supervisor, and Runner.

## Quick Links

- **[Getting Started](./getting-started.md)** - Install and run your first job
- **[Architecture](./architecture.md)** - System design and components
- **[Etymology](./ethymology.md)** - The meaning behind the name

## Components

### Coordinator
Workflow orchestration engine for DAG-based execution.

- [Overview](./coordinator/overview.md)

### Supervisor
Job dispatcher with authentication and routing.

- [Overview](./supervisor/overview.md)
- [Authentication](./supervisor/auth.md)
- [OpenRPC API](./supervisor/openrpc.json)

### Runners
Job executors for different workload types.

- [Runner Overview](./runner/overview.md)
- [Hero Runner](./runner/hero.md) - Heroscript execution
- [SAL Runner](./runner/sal.md) - System operations
- [Osiris Runner](./runner/osiris.md) - Database operations

## Core Concepts

### Jobs
Units of work executed by runners. Each job contains:
- Target runner ID
- Payload (script/command)
- Cryptographic signature
- Optional timeout and environment variables

### Workflows
Multi-step DAGs executed by the Coordinator. Steps can:
- Run in parallel or sequence
- Pass data between steps
- Target different runners
- Handle errors and retries

### Signatures
All jobs must be cryptographically signed:
- Ensures job authenticity
- Prevents tampering
- Enables authorization

## Use Cases

- **Automation**: Execute system tasks and scripts
- **Data Pipelines**: Multi-step ETL workflows
- **CI/CD**: Build, test, and deployment pipelines
- **Infrastructure**: Manage cloud resources and containers
- **Integration**: Connect systems via scripted workflows

## Repository

[git.ourworld.tf/herocode/horus](https://git.ourworld.tf/herocode/horus)

docs/architecture.md (updated: -15/+185 lines)

# Architecture

Horus is a hierarchical orchestration runtime with three layers: Coordinator, Supervisor, and Runner.

## Overview

```
┌─────────────────────────────────────────────────────────┐
│                       Coordinator                       │
│            (Workflow Engine - DAG Execution)            │
│                                                         │
│  • Parses workflow definitions                          │
│  • Resolves dependencies                                │
│  • Dispatches ready steps                               │
│  • Tracks workflow state                                │
└────────────────────┬────────────────────────────────────┘
                     │ OpenRPC (HTTP/Mycelium)
                     │
┌────────────────────▼────────────────────────────────────┐
│                       Supervisor                        │
│            (Job Dispatcher & Authenticator)             │
│                                                         │
│  • Verifies job signatures                              │
│  • Routes jobs to runners                               │
│  • Manages runner registry                              │
│  • Tracks job lifecycle                                 │
└────────────────────┬────────────────────────────────────┘
                     │ Redis Queue Protocol
                     │
┌────────────────────▼────────────────────────────────────┐
│                        Runners                          │
│                    (Job Executors)                      │
│                                                         │
│   ┌──────────┐    ┌──────────┐    ┌──────────┐          │
│   │   Hero   │    │   SAL    │    │  Osiris  │          │
│   │  Runner  │    │  Runner  │    │  Runner  │          │
│   └──────────┘    └──────────┘    └──────────┘          │
└─────────────────────────────────────────────────────────┘
```

## Layers

### 1. Coordinator (Optional)
**Purpose:** Workflow orchestration and DAG execution

**Responsibilities:**
- Parse and validate workflow definitions
- Execute DAG-based flows
- Manage step dependencies
- Route jobs to appropriate supervisors
- Handle multi-step workflows

**Use When:**
- You need multi-step workflows
- Jobs have dependencies
- Parallel execution is required
- You are building complex data pipelines

[→ Coordinator Documentation](./coordinator/overview.md)

### 2. Supervisor (Required)
**Purpose:** Job admission, authentication, and routing

**Responsibilities:**
- Receive jobs via OpenRPC interface
- Verify cryptographic signatures
- Route jobs to appropriate runners
- Manage runner registry
- Track job status and results

**Features:**
- OpenRPC API for job management
- HTTP and Mycelium transport
- Signature-based authentication
- Runner health monitoring

[→ Supervisor Documentation](./supervisor/overview.md)

### 3. Runners (Required)
**Purpose:** Execute actual job workloads

**Available Runners:**
- **Hero Runner**: Executes heroscripts via Hero CLI
- **SAL Runner**: System operations (OS, K8s, cloud, etc.)
- **Osiris Runner**: Database operations with Rhai scripts

**Common Features:**
- Redis queue-based job polling
- Signature verification
- Timeout support
- Environment variable handling

[→ Runner Documentation](./runner/overview.md)

## Communication Protocols

### Client ↔ Coordinator
- **Protocol:** OpenRPC
- **Transport:** HTTP or Mycelium
- **Operations:** Submit workflow, check status, retrieve results

### Coordinator ↔ Supervisor
- **Protocol:** OpenRPC
- **Transport:** HTTP or Mycelium
- **Operations:** Create job, get status, retrieve logs

### Supervisor ↔ Runner
- **Protocol:** Redis Queue
- **Transport:** Redis pub/sub and lists
- **Operations:** Push job, poll queue, store result
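
For illustration, here is a minimal sketch of the supervisor side of this queue protocol, using the `redis` crate and the key names documented in the [runner docs](./runner/overview.md). The JSON job encoding is an assumption, not the confirmed wire format.

```rust
// Hypothetical sketch of the supervisor-side queue protocol.
use redis::Commands;

fn dispatch_job(conn: &mut redis::Connection, runner_id: &str, job_json: &str) -> redis::RedisResult<()> {
    // Push the serialized job onto the runner's queue.
    let _: () = conn.lpush(format!("runner:{}:jobs", runner_id), job_json)?;
    Ok(())
}

fn fetch_result(conn: &mut redis::Connection, job_id: &str) -> redis::RedisResult<Option<String>> {
    // Results appear under a per-job key once the runner finishes.
    conn.get(format!("job:{}:result", job_id))
}

fn main() -> redis::RedisResult<()> {
    let client = redis::Client::open("redis://127.0.0.1:6379")?;
    let mut conn = client.get_connection()?;
    dispatch_job(&mut conn, "my-runner", r#"{"id":"job-1","payload":"print('hi')"}"#)?;
    println!("{:?}", fetch_result(&mut conn, "job-1")?);
    Ok(())
}
```
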
## Job Flow

### Simple Job (No Coordinator)
```
1. Client → Supervisor: create_job()
2. Supervisor: Verify signature
3. Supervisor → Redis: Push to runner queue
4. Runner ← Redis: Pop job
5. Runner: Execute job
6. Runner → Redis: Store result
7. Client ← Supervisor: get_job_result()
```

### Workflow (With Coordinator)
```
1. Client → Coordinator: submit_workflow()
2. Coordinator: Parse DAG
3. Coordinator: Identify ready steps
4. Coordinator → Supervisor: create_job() for each ready step
5. Supervisor → Runner: Route via Redis
6. Runner: Execute and return result
7. Coordinator: Update workflow state
8. Coordinator: Dispatch next ready steps
9. Repeat until workflow complete
```

## Security Model

### Authentication
- Jobs must be cryptographically signed
- Signatures verified at Supervisor layer
- Public key infrastructure for identity

### Authorization
- Runners only execute signed jobs
- Signature verification before execution
- Untrusted jobs rejected

### Transport Security
- Optional TLS for HTTP transport
- End-to-end encryption via Mycelium
- No plaintext credentials

[→ Authentication Details](./supervisor/auth.md)

## Deployment Patterns

### Minimal Setup
```
Redis + Supervisor + Runner(s)
```
Single machine, simple job execution.

### Distributed Setup
```
Redis Cluster + Multiple Supervisors + Runner Pool
```
High availability, load balancing.

### Full Orchestration
```
Coordinator + Multiple Supervisors + Runner Pool
```
Complex workflows, multi-step pipelines.

## Design Principles

1. **Hierarchical**: Clear separation of concerns across layers
2. **Secure**: Signature-based authentication throughout
3. **Scalable**: Horizontal scaling at each layer
4. **Observable**: Comprehensive logging and status tracking
5. **Flexible**: Multiple runners for different workload types

docs/coordinator/overview.md (new file, 145 lines)

# Coordinator Overview

The Coordinator is the workflow orchestration layer in Horus. It executes DAG-based flows by managing job dependencies and dispatching ready steps to supervisors.

## Architecture

```
Client → Coordinator → Supervisor(s) → Runner(s)
```

## Responsibilities

### 1. **Workflow Management**
- Parse and validate DAG workflow definitions
- Track workflow execution state
- Manage step dependencies

### 2. **Job Orchestration**
- Determine which steps are ready to execute
- Dispatch jobs to appropriate supervisors
- Handle step failures and retries

### 3. **Dependency Resolution**
- Track step completion
- Resolve data dependencies between steps
- Pass outputs from completed steps to dependent steps

### 4. **Multi-Supervisor Coordination**
- Route jobs to specific supervisors
- Handle supervisor failures
- Load balance across supervisors

## Workflow Definition

Workflows are defined as Directed Acyclic Graphs (DAGs):

```yaml
workflow:
  name: "data-pipeline"
  steps:
    - id: "fetch"
      runner: "hero"
      payload: "!!http.get url:'https://api.example.com/data'"

    - id: "process"
      runner: "sal"
      depends_on: ["fetch"]
      payload: |
        let data = input.fetch;
        let processed = process_data(data);
        processed

    - id: "store"
      runner: "osiris"
      depends_on: ["process"]
      payload: |
        let model = osiris.model("results");
        model.create(input.process);
```

## Features

### DAG Execution
- Parallel execution of independent steps
- Sequential execution of dependent steps
- Automatic dependency resolution

### Error Handling
- Step-level retry policies
- Workflow-level error handlers
- Partial workflow recovery

### Data Flow
- Pass outputs between steps
- Transform data between steps
- Aggregate results from parallel steps

### Monitoring
- Real-time workflow status
- Step-level progress tracking
- Execution metrics and logs

## Workflow Lifecycle

1. **Submission**: Client submits workflow definition
2. **Validation**: Coordinator validates DAG structure
3. **Scheduling**: Determine ready steps (no pending dependencies; see the sketch below)
4. **Dispatch**: Send jobs to supervisors
5. **Tracking**: Monitor step completion
6. **Progression**: Execute next ready steps
7. **Completion**: Workflow finishes when all steps complete
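
To make step 3 concrete, here is a small, illustrative sketch of ready-step resolution over a dependency map. The function and types are hypothetical, not the Coordinator's actual internals.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical helper: given each step's dependencies and the set of
/// completed steps, return the steps ready to dispatch (all dependencies
/// satisfied, step itself not yet completed).
fn ready_steps(
    deps: &HashMap<String, Vec<String>>,
    completed: &HashSet<String>,
) -> Vec<String> {
    deps.iter()
        .filter(|(id, _)| !completed.contains(*id))
        .filter(|(_, ds)| ds.iter().all(|d| completed.contains(d)))
        .map(|(id, _)| id.clone())
        .collect()
}

fn main() {
    // The "data-pipeline" example above: fetch → process → store.
    let deps: HashMap<String, Vec<String>> = HashMap::from([
        ("fetch".into(), vec![]),
        ("process".into(), vec!["fetch".into()]),
        ("store".into(), vec!["process".into()]),
    ]);
    let completed: HashSet<String> = HashSet::from(["fetch".into()]);
    println!("{:?}", ready_steps(&deps, &completed)); // ["process"]
}
```
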
## Use Cases

### Data Pipelines
```
Extract → Transform → Load
```

### CI/CD Workflows
```
Build → Test → Deploy
```

### Multi-Stage Processing
```
Fetch Data → Process → Validate → Store → Notify
```

### Parallel Execution
```
        ┌─ Task A ─┐
Start ──┼─ Task B ─┼── Aggregate → Finish
        └─ Task C ─┘
```

## Configuration

```bash
# Start coordinator
coordinator --port 9090 --redis-url redis://localhost:6379

# With multiple supervisors
coordinator --port 9090 \
  --supervisor http://supervisor1:8080 \
  --supervisor http://supervisor2:8080
```

## API

The Coordinator exposes an OpenRPC API:

- `submit_workflow`: Submit a new workflow
- `get_workflow_status`: Check workflow progress
- `list_workflows`: List all workflows
- `cancel_workflow`: Stop a running workflow
- `get_workflow_logs`: Retrieve execution logs
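
As a hedged illustration of calling this API over HTTP, here is a sketch using the `jsonrpsee` client crate. The parameter and return shapes for `submit_workflow` are assumptions; consult the OpenRPC spec for the real schema.

```rust
// Illustrative only: assumes the coordinator accepts the workflow YAML as a
// single string parameter, which is NOT confirmed by this documentation.
use jsonrpsee::core::client::ClientT;
use jsonrpsee::http_client::HttpClientBuilder;
use jsonrpsee::rpc_params;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = HttpClientBuilder::default().build("http://localhost:9090")?;

    let workflow_yaml = std::fs::read_to_string("data-pipeline.yaml")?;

    // Method names come from the API list above.
    let workflow_id: String = client
        .request("submit_workflow", rpc_params![workflow_yaml])
        .await?;
    let status: serde_json::Value = client
        .request("get_workflow_status", rpc_params![&workflow_id])
        .await?;
    println!("workflow {workflow_id}: {status}");
    Ok(())
}
```
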
## Advantages

- **Declarative**: Define what to do, not how
- **Scalable**: Parallel execution across multiple supervisors
- **Resilient**: Automatic retry and error handling
- **Observable**: Real-time status and logging
- **Composable**: Reuse workflows as steps in larger workflows

docs/getting-started.md (new file, 186 lines)

# Getting Started with Horus

Quick start guide to running your first Horus job.

## Prerequisites

- Redis server running
- Rust toolchain installed
- Horus repository cloned

## Installation

### Build from Source

```bash
# Clone repository
git clone https://git.ourworld.tf/herocode/horus
cd horus

# Build all components
cargo build --release

# Binaries will be in target/release/
```

## Quick Start

### 1. Start Redis

```bash
# Using Docker
docker run -d -p 6379:6379 redis:latest

# Or install locally
redis-server
```

### 2. Start a Runner

```bash
# Start Hero runner
./target/release/herorunner my-runner

# Or SAL runner
./target/release/runner_sal my-sal-runner

# Or Osiris runner
./target/release/runner_osiris my-osiris-runner
```

### 3. Start the Supervisor

```bash
./target/release/supervisor --port 8080
```

### 4. Submit a Job

Using the Supervisor client:

```rust
use hero_supervisor_client::SupervisorClient;
use hero_job::Job;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = SupervisorClient::new("http://localhost:8080")?;

    let job = Job::new(
        "my-runner",
        "print('Hello from Horus!')".to_string(),
    );

    let result = client.create_job(job).await?;
    println!("Job ID: {}", result.id);

    Ok(())
}
```

## Example Workflows

### Simple Heroscript Execution

```heroscript
print("Hello World")
!!git.list
```

### SAL System Operation

```rhai
// List files in directory
let files = os.list_dir("/tmp");
for file in files {
    print(file);
}
```

### Osiris Data Storage

```rhai
// Store user data
let users = osiris.model("users");
let user = users.create(#{
    name: "Alice",
    email: "alice@example.com"
});
print(`Created user: ${user.id}`);
```

## Architecture Overview

```
┌──────────────┐
│ Coordinator  │  (Optional: for workflows)
└──────┬───────┘
       │
┌──────▼───────┐
│  Supervisor  │  (Job dispatcher)
└──────┬───────┘
       │
       │ Redis
       │
┌──────▼───────┐
│   Runners    │  (Job executors)
│   - Hero     │
│   - SAL      │
│   - Osiris   │
└──────────────┘
```

## Next Steps

- [Architecture Details](./architecture.md)
- [Runner Documentation](./runner/overview.md)
- [Supervisor API](./supervisor/overview.md)
- [Coordinator Workflows](./coordinator/overview.md)
- [Authentication](./supervisor/auth.md)

## Common Issues

### Runner Not Receiving Jobs

1. Check Redis connection
2. Verify runner ID matches job target
3. Check supervisor logs

### Job Signature Verification Failed

1. Ensure job is properly signed
2. Verify public key is registered
3. Check signature format

### Timeout Errors

1. Increase job timeout value
2. Check runner resource availability
3. Optimize job payload

## Development

### Running Tests

```bash
# All tests
cargo test

# Specific component
cargo test -p hero-supervisor
cargo test -p runner-hero
```

### Debug Mode

```bash
# Enable debug logging
RUST_LOG=debug ./target/release/supervisor --port 8080
```

## Support

- Documentation: [docs.ourworld.tf/horus](https://docs.ourworld.tf/horus)
- Repository: [git.ourworld.tf/herocode/horus](https://git.ourworld.tf/herocode/horus)
- Issues: Report on the repository

docs/job-format.md (new file, 179 lines)

# Job Format

Jobs are the fundamental unit of work in Horus.

## Structure

```rust
pub struct Job {
    pub id: String,                        // Unique job identifier
    pub runner_id: String,                 // Target runner ID
    pub payload: String,                   // Job payload (script/command)
    pub timeout: Option<u64>,              // Timeout in seconds
    pub env_vars: HashMap<String, String>, // Environment variables
    pub signatures: Vec<Signature>,        // Cryptographic signatures
    pub created_at: i64,                   // Creation timestamp
    pub status: JobStatus,                 // Current status
}
```

## Job Status

```rust
pub enum JobStatus {
    Pending,   // Queued, not yet started
    Running,   // Currently executing
    Completed, // Finished successfully
    Failed,    // Execution failed
    Timeout,   // Exceeded timeout
    Cancelled, // Manually cancelled
}
```

## Signature Format

```rust
pub struct Signature {
    pub public_key: String, // Signer's public key
    pub signature: String,  // Cryptographic signature
    pub algorithm: String,  // Signature algorithm (e.g., "ed25519")
}
```

## Creating a Job

### Minimal Job

```rust
use hero_job::Job;

let job = Job::new(
    "my-runner",
    "print('Hello World')".to_string(),
);
```

### With Timeout

```rust
let job = Job::builder()
    .runner_id("my-runner")
    .payload("long_running_task()")
    .timeout(300) // 5 minutes
    .build();
```

### With Environment Variables

```rust
use std::collections::HashMap;

let mut env_vars = HashMap::new();
env_vars.insert("API_KEY".to_string(), "secret".to_string());
env_vars.insert("ENV".to_string(), "production".to_string());

let job = Job::builder()
    .runner_id("my-runner")
    .payload("deploy_app()")
    .env_vars(env_vars)
    .build();
```

### With Signature

```rust
use hero_job::{Job, Signature};

let job = Job::builder()
    .runner_id("my-runner")
    .payload("important_task()")
    .signature(Signature {
        public_key: "ed25519:abc123...".to_string(),
        signature: "sig:xyz789...".to_string(),
        algorithm: "ed25519".to_string(),
    })
    .build();
```

## Payload Format

The payload format depends on the target runner:

### Hero Runner
Heroscript content:
```heroscript
!!git.list
print("Repositories listed")
!!docker.ps
```

### SAL Runner
Rhai script with SAL modules:
```rhai
let files = os.list_dir("/tmp");
for file in files {
    print(file);
}
```

### Osiris Runner
Rhai script with Osiris database:
```rhai
let users = osiris.model("users");
let user = users.create(#{
    name: "Alice",
    email: "alice@example.com"
});
```

## Job Result

```rust
pub struct JobResult {
    pub job_id: String,
    pub status: JobStatus,
    pub output: String,        // Stdout
    pub error: Option<String>, // Stderr or error message
    pub exit_code: Option<i32>,
    pub started_at: Option<i64>,
    pub completed_at: Option<i64>,
}
```

## Best Practices

### Timeouts
- Always set timeouts for jobs
- Default: 60 seconds
- Long-running jobs: Set appropriate timeout
- Infinite jobs: Use separate monitoring

### Environment Variables
- Don't store secrets in env vars in production
- Use vault/secret management instead
- Keep env vars minimal
- Document required variables

### Signatures
- Always sign jobs in production (see the sketch below)
- Use strong algorithms (ed25519)
- Rotate keys regularly
- Store private keys securely
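
For concreteness, here is a hedged sketch of producing such a signature with the `ed25519-dalek` crate (v2 API). Which bytes of the job are canonically signed, and how keys are registered with the supervisor, are assumptions this documentation does not specify.

```rust
// Illustrative only. Assumptions: the payload bytes are what gets signed,
// and keys/signatures are hex-encoded; the real canonical form may differ.
use ed25519_dalek::{Signer, SigningKey};
use rand::rngs::OsRng;

fn main() {
    // Generate a keypair (in practice, load a securely stored private key).
    let signing_key = SigningKey::generate(&mut OsRng);

    let payload = b"important_task()";
    let signature = signing_key.sign(payload);

    // These strings would populate the `Signature` struct above.
    println!("public_key: ed25519:{}", hex::encode(signing_key.verifying_key().to_bytes()));
    println!("signature:  sig:{}", hex::encode(signature.to_bytes()));
}
```
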
### Payloads
- Keep payloads concise
- Validate input data
- Handle errors gracefully
- Log important operations

## Validation

Jobs are validated before execution (a sketch of these checks follows the list):

1. **Structure**: All required fields present
2. **Signature**: Valid cryptographic signature
3. **Runner**: Target runner exists and is available
4. **Payload**: Non-empty payload
5. **Timeout**: Reasonable timeout value

Invalid jobs are rejected before execution.
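
A minimal, hypothetical rendering of that checklist, using the `Job` and `Signature` structs defined above; the error type, timeout bounds, and the two helper stubs are illustrative, not the actual validation code.

```rust
// Hypothetical validation pass mirroring the five rules above.
fn validate(job: &Job) -> Result<(), String> {
    // 1. Structure: required identifiers must be present.
    if job.id.is_empty() || job.runner_id.is_empty() {
        return Err("missing required fields".into());
    }
    // 2. Signature: at least one signature must verify.
    if !job.signatures.iter().any(|s| verify_signature(s, job.payload.as_bytes())) {
        return Err("invalid or missing signature".into());
    }
    // 3. Runner: the target runner must be registered and available.
    if !runner_is_available(&job.runner_id) {
        return Err(format!("runner {} unavailable", job.runner_id));
    }
    // 4. Payload: must be non-empty.
    if job.payload.trim().is_empty() {
        return Err("empty payload".into());
    }
    // 5. Timeout: reject absurd values (bounds here are illustrative).
    if let Some(t) = job.timeout {
        if t == 0 || t > 86_400 {
            return Err("unreasonable timeout".into());
        }
    }
    Ok(())
}

// Stubs standing in for the real signature check and runner registry.
fn verify_signature(_s: &Signature, _msg: &[u8]) -> bool { true }
fn runner_is_available(_id: &str) -> bool { true }
```
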

docs/runner/hero.md (new file, 71 lines)

# Hero Runner

Executes heroscripts using the Hero CLI tool.

## Overview

The Hero runner pipes job payloads directly to `hero run -s` via stdin, making it ideal for executing Hero automation tasks and heroscripts.
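
Conceptually, the execution step looks like the following sketch (using `tokio::process`); the queue and signature plumbing around it is omitted, and this is not the runner's actual code.

```rust
// Illustrative sketch: pipe a heroscript payload to `hero run -s` via stdin
// and enforce the job timeout. Not the runner's actual implementation.
use std::process::Stdio;
use std::time::Duration;
use tokio::io::AsyncWriteExt;
use tokio::process::Command;

async fn run_heroscript(payload: &str, timeout_secs: u64) -> Result<String, Box<dyn std::error::Error>> {
    let mut child = Command::new("hero")
        .args(["run", "-s"])
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .kill_on_drop(true) // ensure the process dies if we time out
        .spawn()?;

    // Write the payload to stdin; no temp files touch the filesystem.
    child.stdin.take().unwrap().write_all(payload.as_bytes()).await?;

    // Respect the job timeout; on timeout the dropped child is killed.
    let output = tokio::time::timeout(
        Duration::from_secs(timeout_secs),
        child.wait_with_output(),
    )
    .await??;

    if !output.status.success() {
        return Err(String::from_utf8_lossy(&output.stderr).into_owned().into());
    }
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}
```
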
## Features

- **Heroscript Execution**: Direct stdin piping to `hero run -s`
- **No Temp Files**: Secure execution without filesystem artifacts
- **Environment Variables**: Full environment variable support
- **Timeout Support**: Respects job timeout settings
- **Signature Verification**: Cryptographic job verification

## Usage

```bash
# Start the runner
herorunner my-hero-runner

# With custom Redis
herorunner my-hero-runner --redis-url redis://custom:6379
```

## Job Payload

The payload should contain the heroscript content:

```heroscript
!!git.list
print("Repositories listed")
!!docker.ps
```

## Examples

### Simple Print
```heroscript
print("Hello from heroscript!")
```

### Hero Actions
```heroscript
!!git.list
!!docker.start name:"myapp"
```

### With Environment Variables
```json
{
  "payload": "print(env.MY_VAR)",
  "env_vars": {
    "MY_VAR": "Hello World"
  }
}
```

## Requirements

- `hero` CLI must be installed and in PATH
- Redis server accessible
- Valid job signatures

## Error Handling

- **Hero CLI Not Found**: Returns error if `hero` command unavailable
- **Timeout**: Kills process if timeout exceeded
- **Non-zero Exit**: Returns error with hero CLI output
- **Invalid Signature**: Rejects job before execution

docs/runner/osiris.md (new file, 142 lines)

# Osiris Runner

Database-backed runner for structured data storage and retrieval.

## Overview

The Osiris runner executes Rhai scripts with access to a model-based database system, enabling structured data operations and persistence.

## Features

- **Rhai Scripting**: Execute Rhai scripts with Osiris database access
- **Model-Based Storage**: Define and use data models
- **CRUD Operations**: Create, read, update, delete records
- **Query Support**: Search and filter data
- **Schema Validation**: Type-safe data operations
- **Transaction Support**: Atomic database operations

## Usage

```bash
# Start the runner
runner_osiris my-osiris-runner

# With custom Redis
runner_osiris my-osiris-runner --redis-url redis://custom:6379
```

## Job Payload

The payload should contain a Rhai script using Osiris operations:

```rhai
// Example: Store data
let model = osiris.model("users");
let user = model.create(#{
    name: "Alice",
    email: "alice@example.com",
    age: 30
});
print(user.id);

// Example: Retrieve data
let found = model.get(user.id);
print(found.name);
```

## Examples

### Create Model and Store Data
```rhai
// Define model
let posts = osiris.model("posts");

// Create record
let post = posts.create(#{
    title: "Hello World",
    content: "First post",
    author: "Alice",
    published: true
});

print(`Created post with ID: ${post.id}`);
```

### Query Data
```rhai
let posts = osiris.model("posts");

// Find by field
let published = posts.find(#{
    published: true
});

for post in published {
    print(post.title);
}
```

### Update Records
```rhai
let posts = osiris.model("posts");

// Get record
let post = posts.get("post-123");

// Update fields
post.content = "Updated content";
posts.update(post);
```

### Delete Records
```rhai
let posts = osiris.model("posts");

// Delete by ID
posts.delete("post-123");
```

### Transactions
```rhai
osiris.transaction(|| {
    let users = osiris.model("users");
    let posts = osiris.model("posts");

    let user = users.create(#{ name: "Bob" });
    let post = posts.create(#{
        title: "Bob's Post",
        author_id: user.id
    });

    // Both operations commit together
});
```

## Data Models

Models are defined dynamically through Rhai scripts:

```rhai
let model = osiris.model("products");

// Model automatically handles:
// - ID generation
// - Timestamps (created_at, updated_at)
// - Schema validation
// - Indexing
```

## Requirements

- Redis server accessible
- Osiris database configured
- Valid job signatures
- Sufficient storage for data operations

## Use Cases

- **Configuration Storage**: Store application configs
- **User Data**: Manage user profiles and preferences
- **Workflow State**: Persist workflow execution state
- **Metrics & Logs**: Store structured logs and metrics
- **Cache Management**: Persistent caching layer

docs/runner/overview.md (new file, 96 lines)

# Runners Overview

Runners are the execution layer in the Horus architecture. They receive jobs from the Supervisor via Redis queues and execute the actual workload.

## Architecture

```
Supervisor → Redis Queue → Runner → Execute Job → Return Result
```

## Available Runners

Horus provides three specialized runners:

### 1. **Hero Runner**
Executes heroscripts using the Hero CLI ecosystem.

**Use Cases:**
- Running Hero automation tasks
- Executing heroscripts from job payloads
- Integration with Hero CLI tools

**Binary:** `herorunner`

[→ Hero Runner Documentation](./hero.md)

### 2. **SAL Runner**
System Abstraction Layer runner for system-level operations.

**Use Cases:**
- OS operations (file, process, network)
- Infrastructure management (Kubernetes, VMs)
- Cloud provider operations (Hetzner)
- Database operations (Redis, Postgres)

**Binary:** `runner_sal`

[→ SAL Runner Documentation](./sal.md)

### 3. **Osiris Runner**
Database-backed runner for data storage and retrieval using Rhai scripts.

**Use Cases:**
- Structured data storage
- Model-based data operations
- Rhai script execution with database access

**Binary:** `runner_osiris`

[→ Osiris Runner Documentation](./osiris.md)

## Common Features

All runners implement the `Runner` trait and provide:

- **Job Execution**: Process jobs from Redis queues
- **Signature Verification**: Verify job signatures before execution
- **Timeout Support**: Respect job timeout settings
- **Environment Variables**: Pass environment variables to jobs
- **Error Handling**: Comprehensive error reporting
- **Logging**: Structured logging for debugging

## Runner Protocol

Runners communicate with the Supervisor using a Redis-based protocol (sketched after this list):

1. **Job Queue**: Supervisor pushes jobs to `runner:{runner_id}:jobs`
2. **Job Processing**: Runner pops job, validates signature, executes
3. **Result Storage**: Runner stores result in `job:{job_id}:result`
4. **Status Updates**: Runner updates job status throughout execution
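
Here is a hedged sketch of the runner side of this protocol, using the `redis` crate with the queue keys named above. The job wire format, and the ID extraction and execution steps, are assumptions represented by stubs.

```rust
// Illustrative runner loop: block-pop jobs, execute, store the result.
// Key names follow the protocol above; the JSON shape is an assumption.
use redis::Commands;

fn run_loop(runner_id: &str) -> redis::RedisResult<()> {
    let client = redis::Client::open("redis://127.0.0.1:6379")?;
    let mut conn = client.get_connection()?;
    let queue = format!("runner:{}:jobs", runner_id);

    loop {
        // BRPOP blocks until a job arrives (timeout 0 = wait forever).
        let (_key, job_json): (String, String) =
            redis::cmd("BRPOP").arg(&queue).arg(0).query(&mut conn)?;

        // Hypothetical steps: verify the signature, then run the payload.
        let job_id = extract_job_id(&job_json);
        let result = execute(&job_json);

        // Store the result where the supervisor expects it.
        let _: () = conn.set(format!("job:{}:result", job_id), result)?;
    }
}

// Stubs standing in for parsing, signature checks, and real execution.
fn extract_job_id(_job_json: &str) -> String { "job-1".into() }
fn execute(_job_json: &str) -> String { "ok".into() }
```
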
## Starting a Runner

```bash
# Hero Runner
herorunner <runner_id> [--redis-url <url>]

# SAL Runner
runner_sal <runner_id> [--redis-url <url>]

# Osiris Runner
runner_osiris <runner_id> [--redis-url <url>]
```

## Configuration

All runners accept:
- `runner_id`: Unique identifier for the runner (required)
- `--redis-url`: Redis connection URL (default: `redis://localhost:6379`)

## Security

- Jobs must be cryptographically signed
- Runners verify signatures before execution
- Untrusted jobs are rejected
- Environment variables should not contain sensitive data in production

docs/runner/sal.md (new file, 123 lines)

# SAL Runner

System Abstraction Layer runner for system-level operations.

## Overview

The SAL runner executes Rhai scripts with access to system abstraction modules for OS operations, infrastructure management, and cloud provider interactions.

## Features

- **Rhai Scripting**: Execute Rhai scripts with SAL modules
- **System Operations**: File, process, and network management
- **Infrastructure**: Kubernetes, VM, and container operations
- **Cloud Providers**: Hetzner and other cloud integrations
- **Database Access**: Redis and Postgres client operations
- **Networking**: Mycelium and network configuration

## Available SAL Modules

### Core Modules
- **sal-os**: Operating system operations
- **sal-process**: Process management
- **sal-text**: Text processing utilities
- **sal-net**: Network operations

### Infrastructure
- **sal-virt**: Virtualization management
- **sal-kubernetes**: Kubernetes cluster operations
- **sal-zinit-client**: Zinit process manager

### Storage & Data
- **sal-redisclient**: Redis operations
- **sal-postgresclient**: PostgreSQL operations
- **sal-vault**: Secret management

### Networking
- **sal-mycelium**: Mycelium network integration

### Cloud Providers
- **sal-hetzner**: Hetzner cloud operations

### Version Control
- **sal-git**: Git repository operations

## Usage

```bash
# Start the runner
runner_sal my-sal-runner

# With custom Redis
runner_sal my-sal-runner --redis-url redis://custom:6379
```

## Job Payload

The payload should contain a Rhai script using SAL modules:

```rhai
// Example: List files
let files = os.list_dir("/tmp");
print(files);

// Example: Process management
let pid = process.spawn("ls", ["-la"]);
let output = process.wait(pid);
print(output);
```

## Examples

### File Operations
```rhai
// Read file
let content = os.read_file("/path/to/file");
print(content);

// Write file
os.write_file("/path/to/output", "Hello World");
```

### Kubernetes Operations
```rhai
// List pods
let pods = k8s.list_pods("default");
for pod in pods {
    print(pod.name);
}
```

### Redis Operations
```rhai
// Set value
redis.set("key", "value");

// Get value
let val = redis.get("key");
print(val);
```

### Git Operations
```rhai
// Clone repository
git.clone("https://github.com/user/repo", "/tmp/repo");

// Get status
let status = git.status("/tmp/repo");
print(status);
```

## Requirements

- Redis server accessible
- System permissions for requested operations
- Valid job signatures
- SAL modules available in runtime

## Security Considerations

- SAL operations have system-level access
- Jobs must be from trusted sources
- Signature verification is mandatory
- Limit runner permissions in production

docs/supervisor/overview.md (new file, 88 lines)

# Supervisor Overview

The Supervisor is the job dispatcher layer in Horus. It receives jobs, verifies signatures, and routes them to appropriate runners.

## Architecture

```
Client → Supervisor → Redis Queue → Runner
```

## Responsibilities

### 1. **Job Admission**
- Receive jobs via OpenRPC interface
- Validate job structure and required fields
- Verify cryptographic signatures

### 2. **Authentication & Authorization**
- Verify job signatures using public keys
- Ensure jobs are from authorized sources
- Reject unsigned or invalid jobs

### 3. **Job Routing**
- Route jobs to appropriate runner queues
- Maintain runner registry
- Load balance across available runners

### 4. **Job Management**
- Track job status and lifecycle
- Provide job query and listing APIs
- Store job results and logs

### 5. **Runner Management**
- Register and track available runners
- Monitor runner health and availability
- Handle runner disconnections

## OpenRPC Interface

The Supervisor exposes an OpenRPC API for job management (a client sketch follows the method lists):

### Job Operations
- `create_job`: Submit a new job
- `get_job`: Retrieve job details
- `list_jobs`: List all jobs
- `delete_job`: Remove a job
- `get_job_logs`: Retrieve job execution logs

### Runner Operations
- `register_runner`: Register a new runner
- `list_runners`: List available runners
- `get_runner_status`: Check runner health
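
As an illustration, here is a sketch driving the job operations through the `hero_supervisor_client` crate used in [Getting Started](../getting-started.md). Only `SupervisorClient::new` and `create_job` appear in that guide; the other client methods here are assumed to mirror the RPC names above and may not exist in this form.

```rust
// Illustrative client usage. `get_job` and `get_job_logs` are assumptions
// mirroring the RPC method names listed above, not confirmed client APIs.
use hero_supervisor_client::SupervisorClient;
use hero_job::Job;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = SupervisorClient::new("http://localhost:8080")?;

    // Grounded in the Getting Started guide.
    let job = Job::new("my-runner", "print('status check')".to_string());
    let created = client.create_job(job).await?;

    // Hypothetical follow-up calls mirroring the RPC names.
    let details = client.get_job(&created.id).await?;
    let logs = client.get_job_logs(&created.id).await?;
    println!("{:?}\n{}", details, logs);
    Ok(())
}
```
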
## Job Lifecycle

1. **Submission**: Client submits job via OpenRPC
2. **Validation**: Supervisor validates structure and signature
3. **Queueing**: Job pushed to runner's Redis queue
4. **Execution**: Runner processes job
5. **Completion**: Result stored in Redis
6. **Retrieval**: Client retrieves result via OpenRPC

## Transport Options

The Supervisor supports multiple transport layers:

- **HTTP**: Standard HTTP/HTTPS transport
- **Mycelium**: Peer-to-peer encrypted transport

## Configuration

```bash
# Start supervisor
supervisor --port 8080 --redis-url redis://localhost:6379

# With Mycelium
supervisor --port 8080 --mycelium --redis-url redis://localhost:6379
```

## Security

- All jobs must be cryptographically signed
- Signatures verified before job admission
- Public key infrastructure for identity
- Optional TLS for HTTP transport
- End-to-end encryption via Mycelium

[→ Authentication Documentation](./auth.md)