# Hero Dispatcher Protocol

This document describes the Redis-based protocol used by the Hero Dispatcher for job management and worker communication.
## Overview

The Hero Dispatcher uses Redis as a message broker and data store for managing distributed job execution. Jobs are stored as Redis hashes, and communication with workers happens through Redis lists (queues).
## Redis Namespace

All dispatcher-related keys use the `hero:` namespace prefix to avoid conflicts with other Redis usage.
## Data Structures

### Job Storage

Jobs are stored as Redis hashes with the following key pattern:

```
hero:job:{job_id}
```
**Job Hash Fields:**

- `id`: Unique job identifier (UUID v4)
- `caller_id`: Identifier of the client that created the job
- `worker_id`: Target worker identifier
- `context_id`: Execution context identifier
- `script`: Script content to execute (Rhai or HeroScript)
- `timeout`: Execution timeout in seconds
- `retries`: Number of retry attempts
- `concurrent`: Whether to execute in a separate thread (`true`/`false`)
- `log_path`: Optional path to a log file for job output
- `created_at`: Job creation timestamp (ISO 8601)
- `updated_at`: Job last-update timestamp (ISO 8601)
- `status`: Current job status (`dispatched`/`started`/`error`/`finished`)
- `env_vars`: Environment variables as a JSON object (optional)
- `prerequisites`: JSON array of job IDs that must complete before this job (optional)
- `dependents`: JSON array of job IDs that depend on this job completing (optional)
- `output`: Job execution result (set by the worker)
- `error`: Error message if the job failed (set by the worker)
- `dependencies`: List of job IDs that this job depends on
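As a concrete illustration, the hash fields above can be assembled client-side before the `HSET`. This is a minimal Python sketch, not the dispatcher's actual implementation; the helper name and default values are illustrative, and `dispatched` is taken as the initial status from the status list above.

```python
import uuid
from datetime import datetime, timezone

def build_job_fields(caller_id, worker_id, context_id, script,
                     timeout=60, retries=0):
    """Assemble the field mapping for HSET hero:job:{job_id}.

    All values are strings, since Redis hash fields store strings.
    Helper name and defaults are illustrative, not part of the API.
    """
    now = datetime.now(timezone.utc).isoformat()
    job_id = str(uuid.uuid4())  # UUID v4, per the `id` field spec
    fields = {
        "id": job_id,
        "caller_id": caller_id,
        "worker_id": worker_id,
        "context_id": context_id,
        "script": script,
        "timeout": str(timeout),
        "retries": str(retries),
        "concurrent": "false",
        "created_at": now,
        "updated_at": now,
        "status": "dispatched",  # assumed initial status
    }
    return job_id, fields

job_id, fields = build_job_fields("client-1", "worker-1", "ctx-1", "40 + 2")
```

The client would then issue `HSET hero:job:{job_id}` with this mapping.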
### Job Dependencies

Jobs can have dependencies on other jobs, stored in the `dependencies` field. A job will not be dispatched until all of its dependencies have completed successfully.
### Work Queues

Jobs are queued for execution using Redis lists:

```
hero:work_queue:{worker_id}
```

Workers listen on their specific queue using `BLPOP` for job IDs to process.
### Stop Queues

Job stop requests are sent through dedicated stop queues:

```
hero:stop_queue:{worker_id}
```

Workers monitor these queues to receive stop requests for running jobs.
### Reply Queues

For synchronous job execution, dedicated reply queues are used:

```
hero:reply:{job_id}
```

Workers send results to these queues when jobs complete.
## Job Lifecycle

### 1. Job Creation

```
Client -> Redis: HSET hero:job:{job_id} {job_fields}
```

### 2. Job Submission

```
Client -> Redis: LPUSH hero:work_queue:{worker_id} {job_id}
```

### 3. Job Processing

```
Worker -> Redis: BLPOP hero:work_queue:{worker_id}
Worker -> Redis: HSET hero:job:{job_id} status "started"
Worker:          Execute script
Worker -> Redis: HSET hero:job:{job_id} status "finished" output "{result}"
```

### 4. Job Completion (Async)

```
Worker -> Redis: LPUSH hero:reply:{job_id} {result}
```
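The four lifecycle steps can be traced end to end. The sketch below uses a tiny dict-backed stand-in for Redis so it runs without a server; `FakeRedis` and the stand-in result are illustrative only, and the real `BLPOP` blocks rather than assuming an element is present.

```python
from collections import defaultdict

class FakeRedis:
    """Dict-backed stand-in for the Redis commands the protocol uses."""
    def __init__(self):
        self.hashes = defaultdict(dict)  # backs HSET / HGETALL
        self.lists = defaultdict(list)   # backs LPUSH / BLPOP

    def hset(self, key, mapping):
        self.hashes[key].update(mapping)

    def hgetall(self, key):
        return dict(self.hashes[key])

    def lpush(self, key, value):
        self.lists[key].insert(0, value)  # LPUSH pushes onto the head

    def blpop(self, key):
        # Real BLPOP blocks until an element arrives; this sketch
        # assumes one is already queued.
        return self.lists[key].pop(0)

r = FakeRedis()
job_id, worker_id = "job-123", "worker-1"

# 1. Job creation
r.hset(f"hero:job:{job_id}",
       {"id": job_id, "script": "40 + 2", "status": "dispatched"})

# 2. Job submission
r.lpush(f"hero:work_queue:{worker_id}", job_id)

# 3. Job processing (worker side)
picked = r.blpop(f"hero:work_queue:{worker_id}")
r.hset(f"hero:job:{picked}", {"status": "started"})
result = "42"  # stand-in for executing the Rhai/HeroScript content
r.hset(f"hero:job:{picked}", {"status": "finished", "output": result})

# 4. Job completion (reply queue for synchronous callers)
r.lpush(f"hero:reply:{picked}", result)
```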
## API Operations

### List Jobs

```
dispatcher.list_jobs() -> Vec<String>
```

Redis operations:

1. `KEYS hero:job:*` - get all job keys
2. Extract job IDs from the key names
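Note that `KEYS` blocks Redis while it walks the whole keyspace, so the cursor-based `SCAN` with a `MATCH hero:job:*` pattern is generally preferred in production. Either way, recovering the job IDs is a matter of stripping the key prefix; a sketch with an illustrative helper name:

```python
def job_ids_from_keys(keys, prefix="hero:job:"):
    """Strip the namespace prefix from KEYS/SCAN results to get job IDs.

    Keys that do not match the prefix are ignored defensively.
    """
    return [k[len(prefix):] for k in keys if k.startswith(prefix)]

ids = job_ids_from_keys(["hero:job:abc-1", "hero:job:def-2", "hero:other"])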
### Stop Job

```
dispatcher.stop_job(job_id) -> Result<(), DispatcherError>
```

Redis operations:

1. `LPUSH hero:stop_queue:{worker_id} {job_id}` - send the stop request
### Get Job Status

```
dispatcher.get_job_status(job_id) -> Result<JobStatus, DispatcherError>
```

Redis operations:

1. `HGETALL hero:job:{job_id}` - get the job data
2. Parse the `status` field
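Parsing the `status` field is worth validating against the four statuses the protocol defines, so a corrupt or missing value surfaces as an error rather than propagating. A sketch (the helper and exception choice are illustrative; the real API returns a `Result<JobStatus, DispatcherError>`):

```python
VALID_STATUSES = {"dispatched", "started", "error", "finished"}

def parse_status(job_hash):
    """Extract and validate `status` from an HGETALL result dict."""
    status = job_hash.get("status")
    if status not in VALID_STATUSES:
        raise ValueError(f"unknown or missing status: {status!r}")
    return status

parse_status({"id": "job-123", "status": "finished"})  # "finished"
```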
### Get Job Logs

```
dispatcher.get_job_logs(job_id) -> Result<Option<String>, DispatcherError>
```

Redis operations:

1. `HGETALL hero:job:{job_id}` - get the job data
2. Read the `log_path` field
3. Read the log file from the filesystem
### Run Job and Await Result

```
dispatcher.run_job_and_await_result(job, worker_id) -> Result<String, DispatcherError>
```

Redis operations:

1. `HSET hero:job:{job_id} {job_fields}` - store the job
2. `LPUSH hero:work_queue:{worker_id} {job_id}` - submit the job
3. `BLPOP hero:reply:{job_id} {timeout}` - wait for the result
## Worker Protocol

### Job Processing Loop

```
loop {
    // 1. Wait for a job
    job_id = BLPOP hero:work_queue:{worker_id}

    // 2. Get job details
    job_data = HGETALL hero:job:{job_id}

    // 3. Update status
    HSET hero:job:{job_id} status "started"

    // 4. Check for stop requests. LRANGE + LREM removes only the
    //    matching request, so requests for other jobs are preserved
    //    (a bare LPOP would discard them).
    stop_requests = LRANGE hero:stop_queue:{worker_id} 0 -1
    if stop_requests.contains(job_id) {
        LREM hero:stop_queue:{worker_id} 1 job_id
        HSET hero:job:{job_id} status "error" error "stopped"
        continue
    }

    // 5. Execute the script
    result = execute_script(job_data.script)

    // 6. Update the job with the result
    HSET hero:job:{job_id} status "finished" output result

    // 7. Send a reply if a synchronous caller is waiting
    if reply_queue_exists(hero:reply:{job_id}) {
        LPUSH hero:reply:{job_id} result
    }
}
```
### Stop Request Handling

Workers should periodically check the stop queue during long-running jobs:

```
if LLEN hero:stop_queue:{worker_id} > 0 {
    stop_requests = LRANGE hero:stop_queue:{worker_id} 0 -1
    if stop_requests.contains(current_job_id) {
        // Stop the current job's execution
        HSET hero:job:{current_job_id} status "error" error "stopped_by_request"

        // Remove the stop request
        LREM hero:stop_queue:{worker_id} 1 current_job_id
        return
    }
}
```
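The decision logic in that check is independent of Redis and can be sketched over a plain list standing in for the stop queue (the helper name is illustrative):

```python
def should_stop(stop_queue, current_job_id):
    """Return True and remove the matching stop request if the current
    job should stop; requests for other jobs are left in place,
    mirroring LRANGE followed by LREM ... 1 current_job_id.
    """
    if current_job_id in stop_queue:
        stop_queue.remove(current_job_id)  # removes first occurrence only
        return True
    return False

queue = ["job-a", "job-b"]
should_stop(queue, "job-b")  # True; queue is now ["job-a"]
```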
## Error Handling

### Job Timeouts

- The client sets the timeout when creating the job
- The worker should respect the timeout and stop execution
- If the timeout is exceeded:

```
HSET hero:job:{job_id} status "error" error "timeout"
```
### Worker Failures

- If a worker crashes, the job remains in "started" status
- Monitoring systems can detect stale jobs and retry them
- Jobs can be requeued:

```
LPUSH hero:work_queue:{worker_id} {job_id}
```
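One way a monitor can detect such stale jobs is to compare `updated_at` against the job's own `timeout`, plus a grace margin. A sketch under that assumption (the helper, the grace period, and the heuristic itself are illustrative, not prescribed by the protocol):

```python
from datetime import datetime, timedelta, timezone

def is_stale(job_hash, grace=timedelta(seconds=30), now=None):
    """Heuristic: a job stuck in "started" longer than its own timeout
    (plus a grace period) likely belongs to a crashed worker and is a
    candidate for requeueing.
    """
    if job_hash.get("status") != "started":
        return False
    updated = datetime.fromisoformat(job_hash["updated_at"])  # ISO 8601
    timeout = timedelta(seconds=int(job_hash["timeout"]))
    now = now or datetime.now(timezone.utc)
    return now - updated > timeout + grace
```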
### Redis Connection Issues

- Clients should implement retry logic with exponential backoff
- Workers should reconnect and resume processing
- Use Redis persistence to survive Redis restarts
## Monitoring and Observability

### Queue Monitoring

```
# Check the work queue length
LLEN hero:work_queue:{worker_id}

# Check the stop queue length
LLEN hero:stop_queue:{worker_id}

# List all jobs
KEYS hero:job:*

# Get job details
HGETALL hero:job:{job_id}
```
### Metrics to Track
- Jobs created per second
- Jobs completed per second
- Average job execution time
- Queue depths
- Worker availability
- Error rates by job type
## Security Considerations

### Redis Security
- Use Redis AUTH for authentication
- Enable TLS for Redis connections
- Restrict Redis network access
- Use Redis ACLs to limit worker permissions
### Job Security
- Validate script content before execution
- Sandbox script execution environment
- Limit resource usage (CPU, memory, disk)
- Log all job executions for audit
### Log File Security
- Ensure log paths are within allowed directories
- Validate log file permissions
- Rotate and archive logs regularly
- Sanitize sensitive data in logs
## Performance Considerations

### Redis Optimization
- Use Redis pipelining for batch operations
- Configure appropriate Redis memory limits
- Use Redis clustering for high availability
- Monitor Redis memory usage and eviction
### Job Optimization
- Keep job payloads small
- Use efficient serialization formats
- Batch similar jobs when possible
- Implement job prioritization if needed
### Worker Optimization
- Pool worker connections to Redis
- Use async I/O for Redis operations
- Implement graceful shutdown handling
- Monitor worker resource usage