baobab/core/supervisor/LIFECYCLE.md
2025-08-05 15:44:33 +02:00

Actor Lifecycle Management

The Hero Supervisor manages actor lifecycles using Zinit as the process manager. This lets the supervisor start and stop actor processes, monitor their health, and balance load by scaling actor instances up or down.

Overview

The lifecycle management system provides:

  • Actor Process Management: Start, stop, restart, and monitor actor binaries
  • Health Monitoring: Automatic ping jobs for actors that have been idle for 10 minutes or more
  • Graceful Shutdown: Clean termination of actor processes

Architecture

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Supervisor    │    │ ActorLifecycle   │    │     Zinit       │
│                 │◄──►│    Manager       │◄──►│   (Process      │
│  (Job Dispatch) │    │                  │    │    Manager)     │
└─────────────────┘    └──────────────────┘    └─────────────────┘
         │                       │                       │
         │                       │                       │
         ▼                       ▼                       ▼
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│     Redis       │    │ Health Monitor   │    │ Actor Binaries  │
│   (Job Queue)   │    │  (Ping Jobs)     │    │  (OSIS/SAL/V)   │
└─────────────────┘    └──────────────────┘    └─────────────────┘

Components

ActorConfig

Defines the configuration for an actor binary:

use hero_supervisor::{ActorConfig, ScriptType};
use std::path::PathBuf;
use std::collections::HashMap;

let config = ActorConfig::new(
    "osis_actor_0".to_string(),
    PathBuf::from("/usr/local/bin/osis_actor"),
    ScriptType::OSIS,
)
.with_args(vec![
    "--redis-url".to_string(),
    "redis://localhost:6379".to_string(),
    "--actor-id".to_string(),
    "osis_actor_0".to_string(),
])
.with_env({
    let mut env = HashMap::new();
    env.insert("RUST_LOG".to_string(), "info".to_string());
    env.insert("ACTOR_TYPE".to_string(), "osis".to_string());
    env
})
.with_health_check("/usr/local/bin/osis_actor --health-check".to_string())
.with_dependencies(vec!["redis".to_string()]);

ActorLifecycleManager

Main component for managing actor lifecycles:

use hero_supervisor::{ActorLifecycleManagerBuilder, SupervisorBuilder};

let supervisor = SupervisorBuilder::new()
    .redis_url("redis://localhost:6379")
    .caller_id("my_supervisor")
    .context_id("production")
    .build()?;

let mut lifecycle_manager = ActorLifecycleManagerBuilder::new("/var/run/zinit.sock".to_string())
    .with_supervisor(supervisor.clone())
    .add_actor(osis_actor_config)
    .add_actor(sal_actor_config)
    .add_actor(v_actor_config)
    .build();

Supported Script Types

The lifecycle manager supports all Hero script types:

  • OSIS: Rhai/HeroScript execution actors
  • SAL: System Abstraction Layer actors
  • V: HeroScript execution in V language
  • Python: HeroScript execution in Python

Key Features

1. Actor Management

// Start all configured actors
lifecycle_manager.start_all_actors().await?;

// Stop all actors
lifecycle_manager.stop_all_actors().await?;

// Restart specific actor
lifecycle_manager.restart_actor("osis_actor_0").await?;

// Get actor status
let status = lifecycle_manager.get_actor_status("osis_actor_0").await?;
println!("Actor state: {:?}, PID: {}", status.state, status.pid);

2. Health Monitoring

The system automatically monitors actor health:

  • Tracks last job execution time for each actor
  • Sends ping jobs to actors idle for 10+ minutes
  • Restarts actors that fail ping checks 3 times
  • Updates job times when actors receive tasks

// Manual health check
lifecycle_manager.monitor_actor_health().await?;

// Update job time (called automatically by supervisor)
lifecycle_manager.update_actor_job_time("osis_actor_0");

// Start continuous health monitoring
lifecycle_manager.start_health_monitoring().await; // Runs forever
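
The rules above (ping after 10 idle minutes, restart after 3 failed pings, reset when a job arrives) can be sketched as a small decision function. This is a minimal illustration using only the standard library; `HealthTracker`, `ActorHealth`, and `HealthAction` are hypothetical names, not the crate's actual types, and the real monitor may track state differently.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Idle time after which an actor is pinged (10 minutes, as described above).
const IDLE_THRESHOLD: Duration = Duration::from_secs(10 * 60);
/// Number of failed pings after which an actor is restarted.
const MAX_FAILED_PINGS: u32 = 3;

/// What the monitor should do with one actor on this pass.
#[derive(Debug, PartialEq)]
enum HealthAction {
    None,
    Ping,
    Restart,
}

/// Hypothetical per-actor bookkeeping (not the crate's real type).
struct ActorHealth {
    last_job: Instant,
    failed_pings: u32,
}

/// Hypothetical tracker keyed by actor name.
struct HealthTracker {
    actors: HashMap<String, ActorHealth>,
}

impl HealthTracker {
    fn new() -> Self {
        HealthTracker { actors: HashMap::new() }
    }

    /// An actor received a job: reset its idle clock and failure count.
    fn record_job(&mut self, actor: &str, now: Instant) {
        self.actors.insert(
            actor.to_string(),
            ActorHealth { last_job: now, failed_pings: 0 },
        );
    }

    /// Record the outcome of a ping job for a known actor.
    fn record_ping(&mut self, actor: &str, ok: bool, now: Instant) {
        if let Some(health) = self.actors.get_mut(actor) {
            if ok {
                health.last_job = now;
                health.failed_pings = 0;
            } else {
                health.failed_pings += 1;
            }
        }
    }

    /// Decide the next step; unknown actors yield `HealthAction::None`.
    fn next_action(&self, actor: &str, now: Instant) -> HealthAction {
        match self.actors.get(actor) {
            Some(h) if h.failed_pings >= MAX_FAILED_PINGS => HealthAction::Restart,
            Some(h) if now.duration_since(h.last_job) >= IDLE_THRESHOLD => HealthAction::Ping,
            _ => HealthAction::None,
        }
    }
}
```

Passing `now` in explicitly, rather than calling `Instant::now()` inside the tracker, keeps the decision logic deterministic and easy to test.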

3. Dynamic Scaling

Scale actors up or down based on demand:

// Scale OSIS actors to 5 instances
lifecycle_manager.scale_actors(&ScriptType::OSIS, 5).await?;

// Scale down SAL actors to 1 instance
lifecycle_manager.scale_actors(&ScriptType::SAL, 1).await?;

// Check current running count
let count = lifecycle_manager.get_running_actor_count(&ScriptType::V).await;
println!("Running V actors: {}", count);
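
To make the scaling step concrete, here is one way the difference between the running set and the target count could be turned into start and stop decisions. This is a sketch under assumptions: `ScalePlan` and `plan_scaling` are hypothetical, and the `{prefix}_{index}` naming only mirrors the `osis_actor_0` example above. How `scale_actors` actually names or selects instances is not specified in this document.

```rust
use std::cmp::Ordering;

/// Hypothetical outcome of a scaling request.
#[derive(Debug, PartialEq)]
enum ScalePlan {
    /// Names of new instances to start.
    Start(Vec<String>),
    /// Names of running instances to stop.
    Stop(Vec<String>),
    /// Already at the target count.
    Unchanged,
}

/// Compute which instances to start or stop so that `target` are running.
/// `running` lists the currently running instance names.
fn plan_scaling(prefix: &str, running: &[String], target: usize) -> ScalePlan {
    match running.len().cmp(&target) {
        Ordering::Less => {
            // Fill the gap with the lowest indices that are not in use yet.
            let mut names = Vec::new();
            let mut index = 0;
            while running.len() + names.len() < target {
                let candidate = format!("{}_{}", prefix, index);
                if !running.contains(&candidate) {
                    names.push(candidate);
                }
                index += 1;
            }
            ScalePlan::Start(names)
        }
        // Keep the first `target` instances and stop the rest.
        Ordering::Greater => ScalePlan::Stop(running[target..].to_vec()),
        Ordering::Equal => ScalePlan::Unchanged,
    }
}
```

Separating the plan from its execution means the decision can be checked without touching Zinit.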

4. Service Dependencies

Actors can depend on other services:

let config = ActorConfig::new(name, binary, script_type)
    .with_dependencies(vec![
        "redis".to_string(),
        "database".to_string(),
        "auth_service".to_string(),
    ]);

Zinit ensures dependencies start before the actor.
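
The ordering guarantee can be illustrated with a depth-first topological sort. This sketch does not describe how Zinit resolves dependencies internally; `start_order` is a hypothetical helper that only shows what "dependencies start first" means for a dependency map, and it can serve as a pre-flight check for cycles.

```rust
use std::collections::{HashMap, HashSet};

/// Return the services in an order where every dependency comes before the
/// services that depend on it, or an error if the graph contains a cycle.
/// Services that appear only as dependencies are treated as having none.
fn start_order(deps: &HashMap<String, Vec<String>>) -> Result<Vec<String>, String> {
    fn visit(
        name: &str,
        deps: &HashMap<String, Vec<String>>,
        done: &mut HashSet<String>,
        in_progress: &mut HashSet<String>,
        order: &mut Vec<String>,
    ) -> Result<(), String> {
        if done.contains(name) {
            return Ok(());
        }
        // Seeing a service again while it is still being visited means a cycle.
        if !in_progress.insert(name.to_string()) {
            return Err(format!("dependency cycle involving {}", name));
        }
        if let Some(children) = deps.get(name) {
            for dep in children {
                visit(dep, deps, done, in_progress, order)?;
            }
        }
        in_progress.remove(name);
        done.insert(name.to_string());
        order.push(name.to_string());
        Ok(())
    }

    let mut done = HashSet::new();
    let mut in_progress = HashSet::new();
    let mut order = Vec::new();

    // Sort the keys so the result does not depend on HashMap iteration order.
    let mut names: Vec<&String> = deps.keys().collect();
    names.sort();
    for name in names {
        visit(name, deps, &mut done, &mut in_progress, &mut order)?;
    }
    Ok(order)
}
```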

Integration with Supervisor

The lifecycle manager is built with a handle to the supervisor, so jobs created through the supervisor are routed to the actors it manages:

use hero_supervisor::{ActorLifecycleManagerBuilder, ScriptType, SupervisorBuilder};

// Create supervisor and lifecycle manager
let supervisor = SupervisorBuilder::new().build()?;
let mut lifecycle_manager = ActorLifecycleManagerBuilder::new("/var/run/zinit.sock".to_string())
    .with_supervisor(supervisor.clone())
    .build();

// Start actors
lifecycle_manager.start_all_actors().await?;

// Create and execute jobs (supervisor automatically routes to actors)
let job = supervisor
    .new_job()
    .script_type(ScriptType::OSIS)
    .script_content("print(\"Hello World!\");".to_string()) // Rhai uses print(), not Rust's println!
    .build()?;

let result = supervisor.run_job_and_await_result(&job).await?;
println!("Job result: {}", result);

Zinit Service Configuration

The lifecycle manager automatically creates Zinit service configurations:

# Generated service config for osis_actor_0
exec: "/usr/local/bin/osis_actor --redis-url redis://localhost:6379 --actor-id osis_actor_0"
test: "/usr/local/bin/osis_actor --health-check"
oneshot: false  # Restart on exit
after:
  - redis
env:
  RUST_LOG: "info"
  ACTOR_TYPE: "osis"
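
As a rough illustration of how an actor configuration maps onto the fields above, the following sketch renders the same layout from a plain struct. `ServiceSpec` and `render_service` are hypothetical stand-ins; the lifecycle manager's actual generation code is not shown in this document, and this version does no YAML escaping of special characters.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the `ActorConfig` fields that end up in the
/// service file.
struct ServiceSpec {
    binary: String,
    args: Vec<String>,
    health_check: Option<String>,
    dependencies: Vec<String>,
    env: BTreeMap<String, String>,
}

/// Render a service definition in the layout shown above.
/// Values are written as-is; quotes inside values are not escaped.
fn render_service(spec: &ServiceSpec) -> String {
    // The exec line is the binary followed by its space-separated arguments.
    let mut exec = spec.binary.clone();
    for arg in &spec.args {
        exec.push(' ');
        exec.push_str(arg);
    }

    let mut out = format!("exec: \"{}\"\n", exec);
    if let Some(test) = &spec.health_check {
        out.push_str(&format!("test: \"{}\"\n", test));
    }
    out.push_str("oneshot: false\n");
    if !spec.dependencies.is_empty() {
        out.push_str("after:\n");
        for dep in &spec.dependencies {
            out.push_str(&format!("  - {}\n", dep));
        }
    }
    if !spec.env.is_empty() {
        out.push_str("env:\n");
        // BTreeMap iterates in key order, so the output is deterministic.
        for (key, value) in &spec.env {
            out.push_str(&format!("  {}: \"{}\"\n", key, value));
        }
    }
    out
}
```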

Error Handling

Lifecycle operations return a Result whose error type is SupervisorError, so callers can match on specific failure variants:

use hero_supervisor::SupervisorError;

match lifecycle_manager.start_actor(&config).await {
    Ok(_) => println!("Actor started successfully"),
    Err(SupervisorError::ActorStartFailed(actor, reason)) => {
        eprintln!("Failed to start {}: {}", actor, reason);
    }
    Err(e) => eprintln!("Other error: {}", e),
}
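
For readers unfamiliar with this pattern, an error enum with a two-field variant like the one matched above might look roughly as follows. `LifecycleError` is a hypothetical type written for illustration; the real `SupervisorError` may define different variants and messages, and `ZinitUnavailable` and `is_retryable` are assumptions, not part of the crate.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical error type; the real `SupervisorError` may differ.
#[derive(Debug)]
enum LifecycleError {
    /// Mirrors the `ActorStartFailed(actor, reason)` shape matched above.
    ActorStartFailed(String, String),
    /// Assumed variant for an unreachable Zinit socket.
    ZinitUnavailable(String),
}

impl fmt::Display for LifecycleError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            LifecycleError::ActorStartFailed(actor, reason) => {
                write!(f, "failed to start {}: {}", actor, reason)
            }
            LifecycleError::ZinitUnavailable(socket) => {
                write!(f, "cannot reach Zinit at {}", socket)
            }
        }
    }
}

impl Error for LifecycleError {}

/// Whether retrying could plausibly help. This is a policy assumption:
/// a missing socket may appear later, a bad binary will not fix itself.
fn is_retryable(err: &LifecycleError) -> bool {
    matches!(err, LifecycleError::ZinitUnavailable(_))
}
```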

Example Usage

See examples/lifecycle_demo.rs for a comprehensive demonstration:

# Run the lifecycle demo
cargo run --example lifecycle_demo

# Run with custom Redis URL
REDIS_URL=redis://localhost:6379 cargo run --example lifecycle_demo

Prerequisites

  1. Zinit: Install and run Zinit process manager

    curl https://raw.githubusercontent.com/threefoldtech/zinit/refs/heads/master/install.sh | bash
    zinit init --config /etc/zinit/ --socket /var/run/zinit.sock
    
  2. Redis: Running Redis instance for job queues

    redis-server
    
  3. Actor Binaries: Compiled actor binaries for each script type

    • /usr/local/bin/osis_actor
    • /usr/local/bin/sal_actor
    • /usr/local/bin/v_actor
    • /usr/local/bin/python_actor

Configuration Best Practices

  1. Resource Limits: Configure appropriate resource limits in Zinit
  2. Health Checks: Implement meaningful health check commands
  3. Dependencies: Define proper service dependencies
  4. Environment: Set appropriate environment variables
  5. Logging: Configure structured logging for debugging
  6. Monitoring: Use health monitoring for production deployments

Troubleshooting

Common Issues

  1. Zinit Connection Failed

    • Ensure Zinit is running: ps aux | grep zinit
    • Check socket permissions: ls -la /var/run/zinit.sock
    • Verify socket path in configuration

  2. Actor Start Failed

    • Check binary exists and is executable
    • Verify dependencies are running
    • Review Zinit logs: zinit logs <service-name>

  3. Health Check Failures

    • Implement proper health check endpoint in actors
    • Verify health check command syntax
    • Check actor responsiveness

  4. Redis Connection Issues

    • Ensure Redis is running and accessible
    • Verify Redis URL configuration
    • Check network connectivity
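
The "binary exists and is executable" check from item 2 can be automated before handing a configuration to the lifecycle manager. The sketch below is Unix-only and uses only the standard library; `is_executable` is a hypothetical helper, not part of hero_supervisor, and it checks permission bits rather than whether the current user may actually run the file.

```rust
use std::fs;
use std::os::unix::fs::PermissionsExt;
use std::path::Path;

/// True if `path` is a regular file with at least one execute bit set.
/// Missing paths, directories, and unreadable metadata all yield false.
fn is_executable(path: &Path) -> bool {
    match fs::metadata(path) {
        Ok(meta) => meta.is_file() && meta.permissions().mode() & 0o111 != 0,
        Err(_) => false,
    }
}
```

Calling `is_executable(Path::new("/usr/local/bin/osis_actor"))` before starting actors turns a late start failure into an early, explicit one.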

Debug Commands

# Check Zinit status
zinit list

# View service logs
zinit logs osis_actor_0

# Check service status
zinit status osis_actor_0

# Monitor Redis queues
redis-cli --scan --pattern "hero:job:*"  # SCAN iterates incrementally; KEYS blocks Redis on large datasets

Performance Considerations

  • Scaling: Start with minimal actors and scale based on queue depth
  • Health Monitoring: Adjust ping intervals based on workload patterns
  • Resource Usage: Monitor CPU/memory usage of actor processes
  • Queue Depth: Monitor Redis queue lengths for scaling decisions
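
One simple way to turn queue depth into a scaling target is to aim for a fixed number of queued jobs per actor and clamp the result. The policy and the name `target_actor_count` are assumptions made for illustration; the document does not prescribe a specific formula.

```rust
/// Hypothetical policy: one actor per `jobs_per_actor` queued jobs,
/// clamped to the range `min..=max`.
fn target_actor_count(queue_depth: usize, jobs_per_actor: usize, min: usize, max: usize) -> usize {
    assert!(jobs_per_actor > 0, "jobs_per_actor must be positive");
    assert!(min <= max, "min must not exceed max");
    // Ceiling division: a partially filled batch still needs an actor.
    let needed = (queue_depth + jobs_per_actor - 1) / jobs_per_actor;
    needed.clamp(min, max)
}
```

The result could be passed as the count argument to scale_actors, with `min` set to 1 so that an empty queue still leaves one actor running.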

Security

  • Process Isolation: Zinit runs each actor as a separate, supervised process
  • User Permissions: Run actors with appropriate user permissions
  • Network Security: Secure Redis and Zinit socket access
  • Binary Validation: Verify actor binary integrity before deployment

Future

  • Load Balancing: Dynamic scaling of actors based on demand
  • Service Dependencies: Proper startup ordering with dependency management