sal/service_manager/plan_to_fix.md
Mahmoud-Emad 131d978450 feat: Add service manager support
- Add a new service manager crate for dynamic service management
- Integrate service manager with Rhai for scripting
- Provide examples for circle worker management and basic usage
- Add comprehensive tests for service lifecycle and error handling
- Implement cross-platform support for macOS and Linux (zinit/systemd)
2025-07-01 18:00:21 +03:00

6.2 KiB

🔧 Service Manager Production Readiness Fix Plan

📋 Executive Summary

This plan addresses all critical issues found in the service manager code review to achieve production readiness. The approach prioritizes quick fixes while maintaining code quality and stability.

🚨 Critical Issues Identified

  1. Runtime Creation Anti-Pattern - Creating new tokio runtimes for every operation
  2. Trait Design Inconsistency - Sync/async method signature mismatches
  3. Placeholder SystemdServiceManager - Incomplete Linux implementation
  4. Dangerous Patterns - unwrap() calls and silent error handling
  5. API Duplication - Confusing run() method that duplicates start_and_confirm()
  6. Dead Code - Unused fields and methods
  7. Broken Documentation - Non-compiling README examples

🎯 Solution Strategy: Fully Synchronous API

Decision: Redesign the API to be fully synchronous for quick fixes and better resource management.

Rationale:

  • Eliminates runtime creation anti-pattern
  • Simplifies the API surface
  • Reduces complexity and potential for async-related bugs
  • Easier to test and debug
  • Better performance for service management operations

📝 Detailed Fix Plan

Phase 1: Core API Redesign (2 hours)

1.1 Fix Trait Definition

  • File: src/lib.rs
  • Action: Remove all async keywords from trait methods
  • Remove: run() method (duplicate of start_and_confirm())
  • Simplify: Timeout handling in synchronous context

1.2 Update ZinitServiceManager

  • File: src/zinit.rs
  • Action:
    • Remove runtime creation anti-pattern
    • Use single shared runtime or blocking operations
    • Remove execute_async_with_timeout (dead code)
    • Remove unused socket_path field
    • Replace unwrap() calls with proper error handling
    • Fix silent error handling in remove() method

1.3 Update LaunchctlServiceManager

  • File: src/launchctl.rs
  • Action:
    • Remove runtime creation from every method
    • Use tokio::task::block_in_place for async operations
    • Remove run() method duplication
    • Fix silent error handling patterns

Phase 2: Complete SystemdServiceManager (3 hours)

2.1 Implement Core Functionality

  • File: src/systemd.rs
  • Action: Replace all placeholder implementations with real systemd integration
  • Methods to implement:
    • start() - Use systemctl start
    • stop() - Use systemctl stop
    • restart() - Use systemctl restart
    • status() - Parse systemctl status output
    • logs() - Use journalctl for log retrieval
    • list() - Parse systemctl list-units
    • remove() - Use systemctl disable and remove unit files

2.2 Add Unit File Management

  • Action: Implement systemd unit file creation and management
  • Features:
    • Generate .service files from ServiceConfig
    • Handle user vs system service placement
    • Proper file permissions and ownership

Phase 3: Error Handling & Safety (1 hour)

3.1 Remove Dangerous Patterns

  • Action: Replace all unwrap() calls with proper error handling
  • Files: All implementation files
  • Pattern: unwrap()map_err() with descriptive errors

3.2 Fix Silent Error Handling

  • Action: Replace let _ = ... patterns with proper error handling
  • Strategy: Log errors or propagate them appropriately

Phase 4: Testing & Documentation (1 hour)

4.1 Fix README Examples

  • File: README.md
  • Action: Update all code examples to match synchronous API
  • Add: Proper error handling examples
  • Add: Platform-specific usage notes

4.2 Update Tests

  • Action: Ensure all tests work with synchronous API
  • Fix: Remove async test patterns where not needed
  • Add: Error handling test cases

🔧 Implementation Details

Runtime Strategy for Async Operations

Since we're keeping the API synchronous but some underlying operations (like zinit-client) are async:

// Use a lazy static runtime for async operations
use once_cell::sync::Lazy;
use tokio::runtime::Runtime;

static ASYNC_RUNTIME: Lazy<Runtime> = Lazy::new(|| {
    Runtime::new().expect("Failed to create async runtime")
});

// In methods that need async operations:
fn some_method(&self) -> Result<T, ServiceManagerError> {
    ASYNC_RUNTIME.block_on(async {
        // async operations here
    })
}

SystemdServiceManager Implementation Strategy

// Use std::process::Command for systemctl operations
fn run_systemctl(&self, args: &[&str]) -> Result<String, ServiceManagerError> {
    let output = std::process::Command::new("systemctl")
        .args(args)
        .output()
        .map_err(|e| ServiceManagerError::Other(format!("systemctl failed: {}", e)))?;
    
    if !output.status.success() {
        let stderr = String::from_utf8_lossy(&output.stderr);
        return Err(ServiceManagerError::Other(format!("systemctl error: {}", stderr)));
    }
    
    Ok(String::from_utf8_lossy(&output.stdout).to_string())
}

⏱️ Timeline

  • Phase 1: 2 hours - Core API redesign
  • Phase 2: 3 hours - SystemdServiceManager implementation
  • Phase 3: 1 hour - Error handling & safety
  • Phase 4: 1 hour - Testing & documentation

Total Estimated Time: 7 hours

Success Criteria

  1. No Runtime Creation Anti-Pattern - Single shared runtime or proper blocking
  2. Complete Linux Support - Fully functional SystemdServiceManager
  3. No Placeholder Code - All methods have real implementations
  4. No Dangerous Patterns - No unwrap() or silent error handling
  5. Clean API - No duplicate methods, clear purpose for each method
  6. Working Documentation - All README examples compile and run
  7. Comprehensive Tests - All tests pass and cover error cases

🚀 Post-Fix Validation

  1. Compile Check: cargo check passes without warnings
  2. Test Suite: cargo test passes all tests
  3. Documentation: cargo doc generates without errors
  4. Example Validation: README examples compile and run
  5. Performance Test: No resource leaks under repeated operations

📦 Dependencies to Add

[dependencies]
once_cell = "1.19"  # For lazy static runtime

This plan ensures production readiness while maintaining the quick-fix approach requested by the team lead.