add some documentation for blue book
@@ -1,15 +1,185 @@
# Architecture

Horus is a hierarchical orchestration runtime. Its architecture consists of three layers:

1. Coordinator: A workflow engine that executes DAG-based flows by sending ready job steps to the targeted supervisors.
2. Supervisor: A job dispatcher that routes jobs to the appropriate runners.
3. Runner: A job executor that runs the actual job steps.

## Overview

```
┌─────────────────────────────────────────────────────────┐
│                       Coordinator                       │
│            (Workflow Engine - DAG Execution)            │
│                                                         │
│  • Parses workflow definitions                          │
│  • Resolves dependencies                                │
│  • Dispatches ready steps                               │
│  • Tracks workflow state                                │
└────────────────────┬────────────────────────────────────┘
                     │ OpenRPC (HTTP/Mycelium)
                     │
┌────────────────────▼────────────────────────────────────┐
│                       Supervisor                        │
│            (Job Dispatcher & Authenticator)             │
│                                                         │
│  • Verifies job signatures                              │
│  • Routes jobs to runners                               │
│  • Manages runner registry                              │
│  • Tracks job lifecycle                                 │
└────────────────────┬────────────────────────────────────┘
                     │ Redis Queue Protocol
                     │
┌────────────────────▼────────────────────────────────────┐
│                         Runners                         │
│                     (Job Executors)                     │
│                                                         │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐               │
│  │   Hero   │  │   SAL    │  │  Osiris  │               │
│  │  Runner  │  │  Runner  │  │  Runner  │               │
│  └──────────┘  └──────────┘  └──────────┘               │
└─────────────────────────────────────────────────────────┘
```

## Networking

- The client (user) talks to the coordinator over an OpenRPC interface, using either regular HTTP transport or Mycelium.
- The coordinator talks to the supervisor over an OpenRPC interface, using either regular HTTP transport or Mycelium.
- The supervisor talks to runners over a Redis-based job execution protocol.

## Layers

### 1. Coordinator (Optional)
**Purpose:** Workflow orchestration and DAG execution

**Responsibilities:**
- Parse and validate workflow definitions
- Execute DAG-based flows
- Manage step dependencies
- Route jobs to appropriate supervisors
- Handle multi-step workflows

**Use When:**
- You need multi-step workflows
- Jobs have dependencies
- Parallel execution is required
- You are building complex data pipelines

[→ Coordinator Documentation](./coordinator/overview.md)

### 2. Supervisor (Required)
**Purpose:** Job admission, authentication, and routing

**Responsibilities:**
- Receive jobs via OpenRPC interface
- Verify cryptographic signatures
- Route jobs to appropriate runners
- Manage runner registry
- Track job status and results

**Features:**
- OpenRPC API for job management
- HTTP and Mycelium transport
- Signature-based authentication
- Runner health monitoring

[→ Supervisor Documentation](./supervisor/overview.md)
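
The routing and registry responsibilities can be pictured with a small in-memory sketch. This is only an illustration: the `RunnerRegistry` class, its method names, and the `runner` field on a job are hypothetical, and plain Python lists stand in for the per-runner Redis queues.

```python
class RunnerRegistry:
    """Hypothetical stand-in for the supervisor's runner registry."""

    def __init__(self):
        # runner name -> list of pending jobs (stands in for a Redis queue)
        self._queues = {}

    def register(self, name):
        """Make a runner known to the supervisor; repeated calls are harmless."""
        self._queues.setdefault(name, [])

    def route(self, job):
        """Append the job to the queue of the runner it names."""
        name = job["runner"]  # assumed field naming the target runner
        if name not in self._queues:
            raise KeyError(f"no runner registered as {name!r}")
        self._queues[name].append(job)
        return name

    def pending(self, name):
        """Number of jobs waiting for the given runner."""
        return len(self._queues[name])


registry = RunnerRegistry()
registry.register("hero")
registry.register("sal")
registry.route({"id": "job-1", "runner": "sal", "payload": "uname -a"})
```

Rejecting jobs for unknown runners at routing time keeps a job from sitting in a queue that nothing polls.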

### 3. Runners (Required)
**Purpose:** Execute actual job workloads

**Available Runners:**
- **Hero Runner**: Executes heroscripts via Hero CLI
- **SAL Runner**: System operations (OS, K8s, cloud, etc.)
- **Osiris Runner**: Database operations with Rhai scripts

**Common Features:**
- Redis queue-based job polling
- Signature verification
- Timeout support
- Environment variable handling

[→ Runner Documentation](./runner/overview.md)
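
As a rough illustration of the timeout and environment variable features, the sketch below runs one job step as a child process. The `run_step` helper and its return shape are hypothetical and are not taken from the Horus runners; only the Python standard library is used.

```python
import os
import subprocess
import sys


def run_step(argv, env_overrides, timeout_s):
    """Run one step as a child process with extra env vars and a time limit."""
    # Job-specific variables are layered on top of the inherited environment.
    env = {**os.environ, **env_overrides}
    try:
        proc = subprocess.run(
            argv, env=env, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left running.
        return {"status": "timeout", "stdout": ""}
    status = "ok" if proc.returncode == 0 else "error"
    return {"status": status, "stdout": proc.stdout.strip()}


result = run_step(
    [sys.executable, "-c", "import os; print(os.environ['JOB_NAME'])"],
    {"JOB_NAME": "demo"},
    timeout_s=10.0,
)
```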

## Communication Protocols

### Client ↔ Coordinator
- **Protocol:** OpenRPC
- **Transport:** HTTP or Mycelium
- **Operations:** Submit workflow, check status, retrieve results

### Coordinator ↔ Supervisor
- **Protocol:** OpenRPC
- **Transport:** HTTP or Mycelium
- **Operations:** Create job, get status, retrieve logs
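
OpenRPC describes JSON-RPC 2.0 APIs, so a call on either of these links is a JSON-RPC request envelope. The sketch below builds such an envelope and unpacks a response. The method name `create_job` appears in the job flow of this document, but the parameter shape and the sample response are assumptions made for illustration.

```python
import itertools
import json

_request_ids = itertools.count(1)


def make_request(method, params):
    """Build a JSON-RPC 2.0 request envelope."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": method,
        "params": params,
    }


def parse_response(raw):
    """Return the result of a JSON-RPC 2.0 response, or raise on an error object."""
    message = json.loads(raw)
    if "error" in message:
        error = message["error"]
        raise RuntimeError(f"{error['code']}: {error['message']}")
    return message["result"]


# Hypothetical parameters; the real schema is defined by the supervisor's OpenRPC document.
request = make_request("create_job", {"runner": "sal", "payload": "uname -a"})
body = json.dumps(request)
```

The same envelope can be carried over HTTP or Mycelium, since JSON-RPC itself is transport-agnostic.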

### Supervisor ↔ Runner
- **Protocol:** Redis Queue
- **Transport:** Redis pub/sub and lists
- **Operations:** Push job, poll queue, store result

## Job Flow

### Simple Job (No Coordinator)
```
1. Client → Supervisor: create_job()
2. Supervisor: Verify signature
3. Supervisor → Redis: Push to runner queue
4. Runner ← Redis: Pop job
5. Runner: Execute job
6. Runner → Redis: Store result
7. Client ← Supervisor: get_job_result()
```
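
The seven steps above can be walked through with an in-memory simulation. This is a sketch under stated assumptions: a `deque` stands in for the Redis list used as the runner queue, a dict stands in for result storage, and every function name and the job dictionary layout are hypothetical.

```python
from collections import deque

runner_queue = deque()  # stands in for a Redis list
results = {}            # stands in for results stored in Redis


def supervisor_create_job(job_id, script, signature_valid):
    """Steps 1-3: admit the job and push it onto the runner queue."""
    if not signature_valid:
        raise PermissionError("invalid signature")
    runner_queue.appendleft({"id": job_id, "script": script})  # like LPUSH


def runner_poll_once(execute):
    """Steps 4-6: pop the oldest job, execute it, store the result."""
    if not runner_queue:
        return None
    job = runner_queue.pop()  # like RPOP, so jobs leave in FIFO order
    results[job["id"]] = execute(job["script"])
    return job["id"]


def supervisor_get_job_result(job_id):
    """Step 7: look up the stored result, or None if the job has not finished."""
    return results.get(job_id)


supervisor_create_job("job-1", "hello", signature_valid=True)
supervisor_create_job("job-2", "world", signature_valid=True)
first = runner_poll_once(str.upper)  # str.upper stands in for real job execution
```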

### Workflow (With Coordinator)
```
1. Client → Coordinator: submit_workflow()
2. Coordinator: Parse DAG
3. Coordinator: Identify ready steps
4. Coordinator → Supervisor: create_job() for each ready step
5. Supervisor → Runner: Route via Redis
6. Runner: Execute and return result
7. Coordinator: Update workflow state
8. Coordinator: Dispatch next ready steps
9. Repeat until workflow complete
```
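
The "identify ready steps, dispatch, update state, repeat" loop is essentially a wave-by-wave traversal of the DAG. The following sketch shows one way to write it, assuming the workflow is given as a mapping from each step to the set of steps it depends on. The `run_workflow` function and the step names are hypothetical, and the `execute` callback stands in for the whole supervisor and runner round trip.

```python
def run_workflow(deps, execute):
    """Dispatch steps in dependency order and return the waves that were dispatched.

    deps maps each step name to the set of step names it depends on.
    """
    remaining = {step: set(needed) for step, needed in deps.items()}
    done = set()
    waves = []
    while remaining:
        # A step is ready once all of its dependencies have completed.
        ready = sorted(step for step, needed in remaining.items() if needed <= done)
        if not ready:
            raise ValueError("cycle or missing dependency in workflow")
        for step in ready:
            execute(step)  # stands in for create_job() plus waiting for the result
            del remaining[step]
        done.update(ready)
        waves.append(ready)
    return waves


executed = []
waves = run_workflow(
    {
        "fetch": set(),
        "parse": {"fetch"},
        "lint": {"fetch"},
        "report": {"parse", "lint"},
    },
    executed.append,
)
```

Steps within one wave have no dependencies on each other, which is where parallel execution becomes possible.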

## Security Model

### Authentication
- Jobs must be cryptographically signed
- Signatures verified at Supervisor layer
- Public key infrastructure for identity

### Authorization
- Runners only execute signed jobs
- Signature verification before execution
- Untrusted jobs rejected
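
The "verify before execute" rule can be illustrated with a short sketch. Note the substitution: the source describes public-key signatures but does not name a scheme, and the Python standard library has no asymmetric signing primitive, so HMAC-SHA256 with a shared key is used here purely as a stand-in. The function names and the job layout are hypothetical.

```python
import hashlib
import hmac
import json


def canonical_bytes(job):
    """Serialize the job deterministically so signer and verifier hash the same bytes."""
    return json.dumps(job, sort_keys=True, separators=(",", ":")).encode()


def sign(job, key):
    """Stand-in signature: HMAC-SHA256 over the canonical job bytes."""
    return hmac.new(key, canonical_bytes(job), hashlib.sha256).hexdigest()


def admit(job, signature, key):
    """Accept the job only if the signature matches its current content."""
    expected = sign(job, key)
    return hmac.compare_digest(expected, signature)  # constant-time comparison


key = b"demo-key"  # placeholder; a real deployment would verify with the signer's public key
job = {"id": "job-1", "runner": "sal", "payload": "uname -a"}
signature = sign(job, key)
tampered = {**job, "payload": "rm -rf /tmp/x"}
```

Signing a canonical serialization matters in any scheme: without it, two encodings of the same job could produce different signatures.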

### Transport Security
- Optional TLS for HTTP transport
- End-to-end encryption via Mycelium
- No plaintext credentials

[→ Authentication Details](./supervisor/auth.md)

## Deployment Patterns

### Minimal Setup
```
Redis + Supervisor + Runner(s)
```
Single machine, simple job execution.

### Distributed Setup
```
Redis Cluster + Multiple Supervisors + Runner Pool
```
High availability, load balancing.

### Full Orchestration
```
Coordinator + Multiple Supervisors + Runner Pool
```
Complex workflows, multi-step pipelines.

## Design Principles

1. **Hierarchical**: Clear separation of concerns across layers
2. **Secure**: Signature-based authentication throughout
3. **Scalable**: Horizontal scaling at each layer
4. **Observable**: Comprehensive logging and status tracking
5. **Flexible**: Multiple runners for different workload types