sal/service_manager/plan_to_fix.md
Mahmoud-Emad 131d978450 feat: Add service manager support
- Add a new service manager crate for dynamic service management
- Integrate service manager with Rhai for scripting
- Provide examples for circle worker management and basic usage
- Add comprehensive tests for service lifecycle and error handling
- Implement cross-platform support for macOS and Linux (zinit/systemd)
2025-07-01 18:00:21 +03:00

178 lines
6.2 KiB
Markdown

# 🔧 Service Manager Production Readiness Fix Plan
## 📋 Executive Summary
This plan addresses all critical issues found in the service manager code review to achieve production readiness. The approach prioritizes quick fixes while maintaining code quality and stability.
## 🚨 Critical Issues Identified
1. **Runtime Creation Anti-Pattern** - Creating new tokio runtimes for every operation
2. **Trait Design Inconsistency** - Sync/async method signature mismatches
3. **Placeholder SystemdServiceManager** - Incomplete Linux implementation
4. **Dangerous Patterns** - `unwrap()` calls and silent error handling
5. **API Duplication** - Confusing `run()` method that duplicates `start_and_confirm()`
6. **Dead Code** - Unused fields and methods
7. **Broken Documentation** - Non-compiling README examples
## 🎯 Solution Strategy: Fully Synchronous API
**Decision**: Redesign the API to be fully synchronous for quick fixes and better resource management.
**Rationale**:
- Eliminates runtime creation anti-pattern
- Simplifies the API surface
- Reduces complexity and potential for async-related bugs
- Easier to test and debug
- Better performance for service management operations
## 📝 Detailed Fix Plan
### Phase 1: Core API Redesign (2 hours)
#### 1.1 Fix Trait Definition
- **File**: `src/lib.rs`
- **Action**: Remove all `async` keywords from trait methods
- **Remove**: `run()` method (duplicate of `start_and_confirm()`)
- **Simplify**: Timeout handling in synchronous context
#### 1.2 Update ZinitServiceManager
- **File**: `src/zinit.rs`
- **Action**:
- Remove runtime creation anti-pattern
- Use single shared runtime or blocking operations
- Remove `execute_async_with_timeout` (dead code)
- Remove unused `socket_path` field
- Replace `unwrap()` calls with proper error handling
- Fix silent error handling in `remove()` method
#### 1.3 Update LaunchctlServiceManager
- **File**: `src/launchctl.rs`
- **Action**:
- Remove runtime creation from every method
- Use `tokio::task::block_in_place` for async operations
- Remove `run()` method duplication
- Fix silent error handling patterns
### Phase 2: Complete SystemdServiceManager (3 hours)
#### 2.1 Implement Core Functionality
- **File**: `src/systemd.rs`
- **Action**: Replace all placeholder implementations with real systemd integration
- **Methods to implement**:
- `start()` - Use `systemctl start`
- `stop()` - Use `systemctl stop`
- `restart()` - Use `systemctl restart`
- `status()` - Parse `systemctl status` output
- `logs()` - Use `journalctl` for log retrieval
- `list()` - Parse `systemctl list-units`
- `remove()` - Use `systemctl disable` and remove unit files
#### 2.2 Add Unit File Management
- **Action**: Implement systemd unit file creation and management
- **Features**:
- Generate `.service` files from `ServiceConfig`
- Handle user vs system service placement
- Proper file permissions and ownership
### Phase 3: Error Handling & Safety (1 hour)
#### 3.1 Remove Dangerous Patterns
- **Action**: Replace all `unwrap()` calls with proper error handling
- **Files**: All implementation files
- **Pattern**: `unwrap()``map_err()` with descriptive errors
#### 3.2 Fix Silent Error Handling
- **Action**: Replace `let _ = ...` patterns with proper error handling
- **Strategy**: Log errors or propagate them appropriately
### Phase 4: Testing & Documentation (1 hour)
#### 4.1 Fix README Examples
- **File**: `README.md`
- **Action**: Update all code examples to match synchronous API
- **Add**: Proper error handling examples
- **Add**: Platform-specific usage notes
#### 4.2 Update Tests
- **Action**: Ensure all tests work with synchronous API
- **Fix**: Remove async test patterns where not needed
- **Add**: Error handling test cases
## 🔧 Implementation Details
### Runtime Strategy for Async Operations
Since we're keeping the API synchronous but some underlying operations (like zinit-client) are async:
```rust
// Use a lazy static runtime for async operations
use once_cell::sync::Lazy;
use tokio::runtime::Runtime;
static ASYNC_RUNTIME: Lazy<Runtime> = Lazy::new(|| {
Runtime::new().expect("Failed to create async runtime")
});
// In methods that need async operations:
fn some_method(&self) -> Result<T, ServiceManagerError> {
ASYNC_RUNTIME.block_on(async {
// async operations here
})
}
```
### SystemdServiceManager Implementation Strategy
```rust
// Use std::process::Command for systemctl operations
fn run_systemctl(&self, args: &[&str]) -> Result<String, ServiceManagerError> {
let output = std::process::Command::new("systemctl")
.args(args)
.output()
.map_err(|e| ServiceManagerError::Other(format!("systemctl failed: {}", e)))?;
if !output.status.success() {
let stderr = String::from_utf8_lossy(&output.stderr);
return Err(ServiceManagerError::Other(format!("systemctl error: {}", stderr)));
}
Ok(String::from_utf8_lossy(&output.stdout).to_string())
}
```
## ⏱️ Timeline
- **Phase 1**: 2 hours - Core API redesign
- **Phase 2**: 3 hours - SystemdServiceManager implementation
- **Phase 3**: 1 hour - Error handling & safety
- **Phase 4**: 1 hour - Testing & documentation
**Total Estimated Time**: 7 hours
## ✅ Success Criteria
1. **No Runtime Creation Anti-Pattern** - Single shared runtime or proper blocking
2. **Complete Linux Support** - Fully functional SystemdServiceManager
3. **No Placeholder Code** - All methods have real implementations
4. **No Dangerous Patterns** - No `unwrap()` or silent error handling
5. **Clean API** - No duplicate methods, clear purpose for each method
6. **Working Documentation** - All README examples compile and run
7. **Comprehensive Tests** - All tests pass and cover error cases
## 🚀 Post-Fix Validation
1. **Compile Check**: `cargo check` passes without warnings
2. **Test Suite**: `cargo test` passes all tests
3. **Documentation**: `cargo doc` generates without errors
4. **Example Validation**: README examples compile and run
5. **Performance Test**: No resource leaks under repeated operations
## 📦 Dependencies to Add
```toml
[dependencies]
once_cell = "1.19" # For lazy static runtime
```
This plan ensures production readiness while maintaining the quick-fix approach requested by the team lead.