add some documentation for blue book

docs/coordinator/overview.md (new file, 145 lines)

# Coordinator Overview

The Coordinator is the workflow orchestration layer in Horus. It executes DAG-based flows by managing job dependencies and dispatching ready steps to supervisors.

## Architecture

```
Client → Coordinator → Supervisor(s) → Runner(s)
```

## Responsibilities

### 1. **Workflow Management**
- Parse and validate DAG workflow definitions
- Track workflow execution state
- Manage step dependencies

### 2. **Job Orchestration**
- Determine which steps are ready to execute
- Dispatch jobs to appropriate supervisors
- Handle step failures and retries

### 3. **Dependency Resolution**
- Track step completion
- Resolve data dependencies between steps
- Pass outputs from completed steps to dependent steps

### 4. **Multi-Supervisor Coordination**
- Route jobs to specific supervisors
- Handle supervisor failures
- Load-balance across supervisors

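The ready-step computation at the heart of these responsibilities can be sketched as follows. This is an illustrative sketch, not the actual Horus implementation; the step dictionaries mirror the `id`/`depends_on` fields of the workflow format:

```python
def ready_steps(steps, completed):
    """Return ids of steps whose dependencies have all completed
    and which have not yet run themselves."""
    return [
        s["id"]
        for s in steps
        if s["id"] not in completed
        and all(dep in completed for dep in s.get("depends_on", []))
    ]

steps = [
    {"id": "fetch"},
    {"id": "process", "depends_on": ["fetch"]},
    {"id": "store", "depends_on": ["process"]},
]

ready_steps(steps, completed=set())      # → ["fetch"]
ready_steps(steps, completed={"fetch"})  # → ["process"]
```

Independent steps become ready in the same pass, which is what enables the parallel dispatch described below.
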
## Workflow Definition

Workflows are defined as Directed Acyclic Graphs (DAGs):

```yaml
workflow:
  name: "data-pipeline"
  steps:
    - id: "fetch"
      runner: "hero"
      payload: "!!http.get url:'https://api.example.com/data'"

    - id: "process"
      runner: "sal"
      depends_on: ["fetch"]
      payload: |
        let data = input.fetch;
        let processed = process_data(data);
        processed

    - id: "store"
      runner: "osiris"
      depends_on: ["process"]
      payload: |
        let model = osiris.model("results");
        model.create(input.process);
```

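Validating that a definition like the one above really is a DAG amounts to checking that every dependency names a known step and that the graph is cycle-free. A minimal sketch using Kahn's algorithm (illustrative only; function and field names are assumptions, not the Coordinator's API):

```python
from collections import deque

def validate_dag(steps):
    """Reject workflows with unknown dependencies or dependency cycles."""
    ids = {s["id"] for s in steps}
    deps = {s["id"]: set(s.get("depends_on", [])) for s in steps}
    for step_id, d in deps.items():
        unknown = d - ids
        if unknown:
            raise ValueError(f"{step_id!r} depends on unknown steps: {unknown}")
    # Kahn's algorithm: repeatedly retire steps with no remaining deps.
    queue = deque(i for i, d in deps.items() if not d)
    seen = 0
    while queue:
        current = queue.popleft()
        seen += 1
        for step_id, d in deps.items():
            if current in d:
                d.remove(current)
                if not d:
                    queue.append(step_id)
    if seen != len(steps):
        raise ValueError("workflow contains a dependency cycle")
```

If the loop cannot retire every step, some subset of steps depend on each other in a cycle and the workflow is rejected.
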
## Features

### DAG Execution
- Parallel execution of independent steps
- Sequential execution of dependent steps
- Automatic dependency resolution

### Error Handling
- Step-level retry policies
- Workflow-level error handlers
- Partial workflow recovery

### Data Flow
- Pass outputs between steps
- Transform data between steps
- Aggregate results from parallel steps

### Monitoring
- Real-time workflow status
- Step-level progress tracking
- Execution metrics and logs

## Workflow Lifecycle

1. **Submission**: Client submits workflow definition
2. **Validation**: Coordinator validates DAG structure
3. **Scheduling**: Determine ready steps (no pending dependencies)
4. **Dispatch**: Send jobs to supervisors
5. **Tracking**: Monitor step completion
6. **Progression**: Execute next ready steps
7. **Completion**: Workflow finishes when all steps complete

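The scheduling/dispatch/progression loop above can be sketched as follows. For clarity this sketch dispatches ready steps sequentially, whereas a real coordinator sends them to supervisors in parallel; `dispatch` is a hypothetical stand-in for the supervisor call:

```python
def run_workflow(steps, dispatch):
    """Drive a workflow to completion: repeatedly find ready steps,
    dispatch them, and record outputs for dependent steps."""
    completed = {}  # step id → output
    pending = {s["id"]: s for s in steps}
    while pending:
        ready = [
            s for s in pending.values()
            if all(dep in completed for dep in s.get("depends_on", []))
        ]
        if not ready:
            raise RuntimeError("no runnable steps: cycle or missing dependency")
        for step in ready:
            # Inputs are the outputs of the step's dependencies
            # (this is what `input.fetch` etc. refer to in payloads).
            inputs = {dep: completed[dep] for dep in step.get("depends_on", [])}
            completed[step["id"]] = dispatch(step, inputs)
            del pending[step["id"]]
    return completed
```

Each pass through the loop corresponds to one round of Scheduling → Dispatch → Tracking → Progression; the loop exits at Completion, when no steps remain pending.
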
## Use Cases

### Data Pipelines
```
Extract → Transform → Load
```

### CI/CD Workflows
```
Build → Test → Deploy
```

### Multi-Stage Processing
```
Fetch Data → Process → Validate → Store → Notify
```

### Parallel Execution
```
        ┌─ Task A ─┐
Start ──┼─ Task B ─┼── Aggregate → Finish
        └─ Task C ─┘
```

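The fan-out/fan-in shape above could be written in the workflow format shown earlier. Step ids and payloads here are placeholders, not real Horus payloads; the point is that the three tasks share one dependency (so they fan out in parallel) and the aggregate step depends on all three (so it fans them back in):

```yaml
workflow:
  name: "parallel-example"
  steps:
    - id: "start"
      payload: "..."
    - id: "task-a"
      depends_on: ["start"]
      payload: "..."
    - id: "task-b"
      depends_on: ["start"]
      payload: "..."
    - id: "task-c"
      depends_on: ["start"]
      payload: "..."
    - id: "aggregate"
      depends_on: ["task-a", "task-b", "task-c"]
      payload: "..."
    - id: "finish"
      depends_on: ["aggregate"]
      payload: "..."
```
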
## Configuration

```bash
# Start coordinator
coordinator --port 9090 --redis-url redis://localhost:6379

# With multiple supervisors
coordinator --port 9090 \
  --supervisor http://supervisor1:8080 \
  --supervisor http://supervisor2:8080
```

## API

The Coordinator exposes an OpenRPC API:

- `submit_workflow`: Submit a new workflow
- `get_workflow_status`: Check workflow progress
- `list_workflows`: List all workflows
- `cancel_workflow`: Stop a running workflow
- `get_workflow_logs`: Retrieve execution logs

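Assuming these methods follow the standard JSON-RPC 2.0 envelope that OpenRPC documents describe, a `submit_workflow` call might look like the following. The `params` shape is an assumption based on the workflow format above, not taken from the source:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "submit_workflow",
  "params": {
    "workflow": {
      "name": "data-pipeline",
      "steps": []
    }
  }
}
```
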
## Advantages

- **Declarative**: Define what to do, not how
- **Scalable**: Parallel execution across multiple supervisors
- **Resilient**: Automatic retry and error handling
- **Observable**: Real-time status and logging
- **Composable**: Reuse workflows as steps in larger workflows