add complete binary and benchmarking
This commit is contained in:
21
Cargo.toml
21
Cargo.toml
@@ -2,6 +2,7 @@
|
|||||||
resolver = "2"
|
resolver = "2"
|
||||||
members = [
|
members = [
|
||||||
"bin/coordinator",
|
"bin/coordinator",
|
||||||
|
"bin/horus",
|
||||||
"bin/osiris",
|
"bin/osiris",
|
||||||
"bin/runners/osiris",
|
"bin/runners/osiris",
|
||||||
"bin/runners/sal",
|
"bin/runners/sal",
|
||||||
@@ -35,6 +36,26 @@ lazy_static = { workspace = true }
|
|||||||
escargot = "0.5"
|
escargot = "0.5"
|
||||||
ctrlc = "3.4"
|
ctrlc = "3.4"
|
||||||
|
|
||||||
|
[dev-dependencies]
|
||||||
|
criterion = { version = "0.5", features = ["async_tokio", "html_reports"] }
|
||||||
|
osiris-client = { path = "lib/clients/osiris" }
|
||||||
|
reqwest = { version = "0.12", features = ["json"] }
|
||||||
|
serde_json = { workspace = true }
|
||||||
|
uuid = { workspace = true }
|
||||||
|
chrono = { workspace = true }
|
||||||
|
|
||||||
|
[[bench]]
|
||||||
|
name = "horus_stack"
|
||||||
|
harness = false
|
||||||
|
|
||||||
|
[[bench]]
|
||||||
|
name = "stress_test"
|
||||||
|
harness = false
|
||||||
|
|
||||||
|
[[bench]]
|
||||||
|
name = "memory_usage"
|
||||||
|
harness = false
|
||||||
|
|
||||||
[workspace.package]
|
[workspace.package]
|
||||||
version = "0.1.0"
|
version = "0.1.0"
|
||||||
edition = "2024"
|
edition = "2024"
|
||||||
|
|||||||
26
README.md
26
README.md
@@ -4,6 +4,17 @@ Horus is a comprehensive workspace for Hero infrastructure components.
|
|||||||
|
|
||||||
## Structure
|
## Structure
|
||||||
|
|
||||||
|
In '/bin' you have the binaries for the components of Horus
|
||||||
|
- [Supervisor](./bin/supervisor)
|
||||||
|
- [Osiris Runner](./bin/runner)
|
||||||
|
- [SAL Runner](./bin/runner)
|
||||||
|
- [Hero Runner](./bin/runner)
|
||||||
|
|
||||||
|
In '/lib' you have shared libraries.
|
||||||
|
- [Clients](./lib/clients)
|
||||||
|
|
||||||
|
## Structure
|
||||||
|
|
||||||
```
|
```
|
||||||
horus/
|
horus/
|
||||||
├── bin/
|
├── bin/
|
||||||
@@ -15,7 +26,7 @@ horus/
|
|||||||
|
|
||||||
## Components
|
## Components
|
||||||
|
|
||||||
### Hero Supervisor (`bin/supervisor`)
|
### Supervisor (`bin/supervisor`)
|
||||||
|
|
||||||
The Hero Supervisor manages job execution across distributed runners with:
|
The Hero Supervisor manages job execution across distributed runners with:
|
||||||
- Job lifecycle management (create, start, stop, delete)
|
- Job lifecycle management (create, start, stop, delete)
|
||||||
@@ -25,6 +36,19 @@ The Hero Supervisor manages job execution across distributed runners with:
|
|||||||
- OpenRPC JSON-RPC API with authentication
|
- OpenRPC JSON-RPC API with authentication
|
||||||
- CORS-enabled HTTP server
|
- CORS-enabled HTTP server
|
||||||
|
|
||||||
|
### Coordinator (`bin/supervisor`)
|
||||||
|
|
||||||
|
The Hero Supervisor manages job execution across distributed runners with:
|
||||||
|
- Job lifecycle management (create, start, stop, delete)
|
||||||
|
- Runner registration and management
|
||||||
|
- Redis-based job queuing
|
||||||
|
- Osiris integration for persistent storage
|
||||||
|
- OpenRPC JSON-RPC API with authentication
|
||||||
|
- CORS-enabled HTTP server
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
### Supervisor Client (`lib/clients/supervisor`)
|
### Supervisor Client (`lib/clients/supervisor`)
|
||||||
|
|
||||||
OpenRPC client library for Hero Supervisor with dual-target support:
|
OpenRPC client library for Hero Supervisor with dual-target support:
|
||||||
|
|||||||
217
benches/MEMORY_BENCHMARKS.md
Normal file
217
benches/MEMORY_BENCHMARKS.md
Normal file
@@ -0,0 +1,217 @@
|
|||||||
|
# Memory Usage Benchmarks
|
||||||
|
|
||||||
|
Benchmarks for measuring memory consumption of the Horus stack components.
|
||||||
|
|
||||||
|
## Overview
|
||||||
|
|
||||||
|
The memory benchmarks measure heap memory usage for various operations:
|
||||||
|
- Job creation and storage
|
||||||
|
- Client instantiation
|
||||||
|
- Payload size impact
|
||||||
|
- Memory growth under load
|
||||||
|
|
||||||
|
## Benchmarks
|
||||||
|
|
||||||
|
### 1. `memory_job_creation`
|
||||||
|
Measures memory usage when creating multiple Job objects in memory.
|
||||||
|
|
||||||
|
**Test sizes**: 10, 50, 100, 200 jobs
|
||||||
|
|
||||||
|
**What it measures**:
|
||||||
|
- Memory allocated per job object
|
||||||
|
- Heap growth with increasing job count
|
||||||
|
- Memory efficiency of Job structure
|
||||||
|
|
||||||
|
**Expected results**:
|
||||||
|
- Linear memory growth with job count
|
||||||
|
- ~1-2 KB per job object (depending on payload)
|
||||||
|
|
||||||
|
### 2. `memory_client_creation`
|
||||||
|
Measures memory overhead of creating multiple Supervisor client instances.
|
||||||
|
|
||||||
|
**Test sizes**: 1, 10, 50, 100 clients
|
||||||
|
|
||||||
|
**What it measures**:
|
||||||
|
- Memory per client instance
|
||||||
|
- Connection pool overhead
|
||||||
|
- HTTP client memory footprint
|
||||||
|
|
||||||
|
**Expected results**:
|
||||||
|
- ~10-50 KB per client instance
|
||||||
|
- Includes HTTP client, connection pools, and buffers
|
||||||
|
|
||||||
|
### 3. `memory_payload_sizes`
|
||||||
|
Measures memory usage with different payload sizes.
|
||||||
|
|
||||||
|
**Test sizes**: 1KB, 10KB, 100KB, 1MB payloads
|
||||||
|
|
||||||
|
**What it measures**:
|
||||||
|
- Memory overhead of JSON serialization
|
||||||
|
- String allocation costs
|
||||||
|
- Payload storage efficiency
|
||||||
|
|
||||||
|
**Expected results**:
|
||||||
|
- Memory usage should scale linearly with payload size
|
||||||
|
- Small overhead for JSON structure (~5-10%)
|
||||||
|
|
||||||
|
## Running Memory Benchmarks
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Run all memory benchmarks
|
||||||
|
cargo bench --bench memory_usage
|
||||||
|
|
||||||
|
# Run specific memory test
|
||||||
|
cargo bench --bench memory_usage -- memory_job_creation
|
||||||
|
|
||||||
|
# Run with verbose output to see memory deltas
|
||||||
|
cargo bench --bench memory_usage -- --verbose
|
||||||
|
```
|
||||||
|
|
||||||
|
## Interpreting Results
|
||||||
|
|
||||||
|
The benchmarks print memory deltas to stderr during execution:
|
||||||
|
|
||||||
|
```
|
||||||
|
Memory delta for 100 jobs: 156 KB
|
||||||
|
Memory delta for 50 clients: 2048 KB
|
||||||
|
Memory delta for 100KB payload: 105 KB
|
||||||
|
```
|
||||||
|
|
||||||
|
### Memory Delta Interpretation
|
||||||
|
|
||||||
|
- **Positive delta**: Memory was allocated during the operation
|
||||||
|
- **Zero delta**: No significant memory change (may be reusing existing allocations)
|
||||||
|
- **Negative delta**: Memory was freed (garbage collection, deallocations)
|
||||||
|
|
||||||
|
### Platform Differences
|
||||||
|
|
||||||
|
**macOS**: Uses `ps` command to read RSS (Resident Set Size)
|
||||||
|
**Linux**: Reads `/proc/self/status` for VmRSS
|
||||||
|
|
||||||
|
RSS includes:
|
||||||
|
- Heap allocations
|
||||||
|
- Stack memory
|
||||||
|
- Shared libraries (mapped into process)
|
||||||
|
- Memory-mapped files
|
||||||
|
|
||||||
|
## Limitations
|
||||||
|
|
||||||
|
1. **Granularity**: OS-level memory reporting may not capture small allocations
|
||||||
|
2. **Timing**: Memory measurements happen before/after operations, not continuously
|
||||||
|
3. **GC effects**: Rust's allocator may not immediately release memory to OS
|
||||||
|
4. **Shared memory**: RSS includes shared library memory
|
||||||
|
|
||||||
|
## Best Practices
|
||||||
|
|
||||||
|
### For Accurate Measurements
|
||||||
|
|
||||||
|
1. **Run multiple iterations**: Criterion handles this automatically
|
||||||
|
2. **Warm up**: First iterations may show higher memory due to lazy initialization
|
||||||
|
3. **Isolate tests**: Run memory benchmarks separately from performance benchmarks
|
||||||
|
4. **Monitor trends**: Compare results over time, not absolute values
|
||||||
|
|
||||||
|
### Memory Optimization Tips
|
||||||
|
|
||||||
|
If benchmarks show high memory usage:
|
||||||
|
|
||||||
|
1. **Check payload sizes**: Large payloads consume proportional memory
|
||||||
|
2. **Limit concurrent operations**: Too many simultaneous jobs/clients increase memory
|
||||||
|
3. **Review data structures**: Ensure efficient serialization
|
||||||
|
4. **Profile with tools**: Use `heaptrack` (Linux) or `instruments` (macOS) for detailed analysis
|
||||||
|
|
||||||
|
## Advanced Profiling
|
||||||
|
|
||||||
|
For detailed memory profiling beyond these benchmarks:
|
||||||
|
|
||||||
|
### macOS
|
||||||
|
```bash
|
||||||
|
# Use Instruments
|
||||||
|
instruments -t Allocations -D memory_trace.trace ./target/release/horus
|
||||||
|
|
||||||
|
# Use heap profiler
|
||||||
|
cargo install cargo-instruments
|
||||||
|
cargo instruments --bench memory_usage --template Allocations
|
||||||
|
```
|
||||||
|
|
||||||
|
### Linux
|
||||||
|
```bash
|
||||||
|
# Use Valgrind massif
|
||||||
|
valgrind --tool=massif --massif-out-file=massif.out \
|
||||||
|
./target/release/deps/memory_usage-*
|
||||||
|
|
||||||
|
# Visualize with massif-visualizer
|
||||||
|
massif-visualizer massif.out
|
||||||
|
|
||||||
|
# Use heaptrack
|
||||||
|
heaptrack ./target/release/deps/memory_usage-*
|
||||||
|
heaptrack_gui heaptrack.memory_usage.*.gz
|
||||||
|
```
|
||||||
|
|
||||||
|
### Cross-platform
|
||||||
|
```bash
|
||||||
|
# Use dhat (heap profiler)
|
||||||
|
cargo install dhat
|
||||||
|
# Add dhat to your benchmark and run
|
||||||
|
cargo bench --bench memory_usage --features dhat-heap
|
||||||
|
```
|
||||||
|
|
||||||
|
## Continuous Monitoring
|
||||||
|
|
||||||
|
Integrate memory benchmarks into CI/CD:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Run and save baseline
|
||||||
|
cargo bench --bench memory_usage -- --save-baseline memory-main
|
||||||
|
|
||||||
|
# Compare in PR
|
||||||
|
cargo bench --bench memory_usage -- --baseline memory-main
|
||||||
|
|
||||||
|
# Fail if memory usage increases >10%
|
||||||
|
# (requires custom scripting to parse Criterion output)
|
||||||
|
```
|
||||||
|
|
||||||
|
## Troubleshooting
|
||||||
|
|
||||||
|
### "Memory delta is always 0"
|
||||||
|
- OS may not update RSS immediately
|
||||||
|
- Allocations might be too small to measure
|
||||||
|
- Try increasing iteration count or operation size
|
||||||
|
|
||||||
|
### "Memory keeps growing"
|
||||||
|
- Check for memory leaks
|
||||||
|
- Verify objects are being dropped
|
||||||
|
- Use `cargo clippy` to find potential issues
|
||||||
|
|
||||||
|
### "Results are inconsistent"
|
||||||
|
- Other processes may be affecting measurements
|
||||||
|
- Run benchmarks on idle system
|
||||||
|
- Increase sample size in benchmark code
|
||||||
|
|
||||||
|
## Example Output
|
||||||
|
|
||||||
|
```
|
||||||
|
memory_job_creation/10 time: [45.2 µs 46.1 µs 47.3 µs]
|
||||||
|
Memory delta for 10 jobs: 24 KB
|
||||||
|
|
||||||
|
memory_job_creation/50 time: [198.4 µs 201.2 µs 204.8 µs]
|
||||||
|
Memory delta for 50 jobs: 98 KB
|
||||||
|
|
||||||
|
memory_job_creation/100 time: [387.6 µs 392.1 µs 397.4 µs]
|
||||||
|
Memory delta for 100 jobs: 187 KB
|
||||||
|
|
||||||
|
memory_client_creation/1 time: [234.5 µs 238.2 µs 242.6 µs]
|
||||||
|
Memory delta for 1 clients: 45 KB
|
||||||
|
|
||||||
|
memory_payload_sizes/1KB time: [12.3 µs 12.6 µs 13.0 µs]
|
||||||
|
Memory delta for 1KB payload: 2 KB
|
||||||
|
|
||||||
|
memory_payload_sizes/100KB time: [156.7 µs 159.4 µs 162.8 µs]
|
||||||
|
Memory delta for 100KB payload: 105 KB
|
||||||
|
```
|
||||||
|
|
||||||
|
## Related Documentation
|
||||||
|
|
||||||
|
- [Performance Benchmarks](./README.md)
|
||||||
|
- [Stress Tests](./README.md#stress-tests)
|
||||||
|
- [Rust Performance Book](https://nnethercote.github.io/perf-book/)
|
||||||
|
- [Criterion.rs Documentation](https://bheisler.github.io/criterion.rs/book/)
|
||||||
129
benches/QUICK_START.md
Normal file
129
benches/QUICK_START.md
Normal file
@@ -0,0 +1,129 @@
|
|||||||
|
# Horus Benchmarks - Quick Start
|
||||||
|
|
||||||
|
## 1. Start the Stack
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Terminal 1: Start Redis
|
||||||
|
redis-server
|
||||||
|
|
||||||
|
# Terminal 2: Start Horus
|
||||||
|
cd /Users/timurgordon/code/git.ourworld.tf/herocode/horus
|
||||||
|
RUST_LOG=info ./target/release/horus all --admin-secret SECRET --kill-ports
|
||||||
|
```
|
||||||
|
|
||||||
|
## 2. Run Benchmarks
|
||||||
|
|
||||||
|
### Option A: Use the helper script (recommended)
|
||||||
|
```bash
|
||||||
|
./benches/run_benchmarks.sh
|
||||||
|
```
|
||||||
|
|
||||||
|
### Option B: Run directly with cargo
|
||||||
|
```bash
|
||||||
|
# All benchmarks
|
||||||
|
cargo bench
|
||||||
|
|
||||||
|
# Specific benchmark suite
|
||||||
|
cargo bench --bench horus_stack
|
||||||
|
cargo bench --bench stress_test
|
||||||
|
|
||||||
|
# Specific test
|
||||||
|
cargo bench --bench horus_stack -- supervisor_discovery
|
||||||
|
|
||||||
|
# Quick run (fewer samples)
|
||||||
|
cargo bench -- --quick
|
||||||
|
```
|
||||||
|
|
||||||
|
## 3. View Results
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Open HTML report in browser
|
||||||
|
open target/criterion/report/index.html
|
||||||
|
|
||||||
|
# Or on Linux
|
||||||
|
xdg-open target/criterion/report/index.html
|
||||||
|
```
|
||||||
|
|
||||||
|
## Available Benchmark Suites
|
||||||
|
|
||||||
|
### `horus_stack` - Standard Performance Tests
|
||||||
|
- API discovery and metadata
|
||||||
|
- Runner management
|
||||||
|
- Job operations
|
||||||
|
- Concurrency tests
|
||||||
|
- Health checks
|
||||||
|
- API latency measurements
|
||||||
|
|
||||||
|
### `stress_test` - Load & Stress Tests
|
||||||
|
- High-frequency job submissions (50-200 jobs)
|
||||||
|
- Sustained load testing
|
||||||
|
- Large payload handling (1KB-100KB)
|
||||||
|
- Rapid API calls (100 calls/test)
|
||||||
|
- Mixed workload scenarios
|
||||||
|
- Connection pool exhaustion (10-100 clients)
|
||||||
|
|
||||||
|
### `memory_usage` - Memory Profiling
|
||||||
|
- Job object memory footprint (10-200 jobs)
|
||||||
|
- Client instance memory overhead (1-100 clients)
|
||||||
|
- Payload size impact on memory (1KB-1MB)
|
||||||
|
- Memory growth patterns under load
|
||||||
|
|
||||||
|
## Common Commands
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Run only fast benchmarks
|
||||||
|
cargo bench -- --quick
|
||||||
|
|
||||||
|
# Save baseline for comparison
|
||||||
|
cargo bench -- --save-baseline main
|
||||||
|
|
||||||
|
# Compare against baseline
|
||||||
|
cargo bench -- --baseline main
|
||||||
|
|
||||||
|
# Run with verbose output
|
||||||
|
cargo bench -- --verbose
|
||||||
|
|
||||||
|
# Filter by name
|
||||||
|
cargo bench -- concurrent
|
||||||
|
cargo bench -- stress
|
||||||
|
|
||||||
|
# Run specific benchmark group
|
||||||
|
cargo bench --bench horus_stack -- api_latency
|
||||||
|
|
||||||
|
# Run memory benchmarks
|
||||||
|
cargo bench --bench memory_usage
|
||||||
|
|
||||||
|
# Run memory benchmarks with verbose output (shows memory deltas)
|
||||||
|
cargo bench --bench memory_usage -- --verbose
|
||||||
|
```
|
||||||
|
|
||||||
|
## Troubleshooting
|
||||||
|
|
||||||
|
**"Connection refused"**
|
||||||
|
- Make sure Horus stack is running
|
||||||
|
- Check ports: 3030 (supervisor), 8081 (osiris), 9652/9653 (coordinator)
|
||||||
|
|
||||||
|
**"Job timeout"**
|
||||||
|
- Increase timeout in benchmark code
|
||||||
|
- Check that runners are registered: `curl http://127.0.0.1:3030` (requires POST)
|
||||||
|
|
||||||
|
**Slow benchmarks**
|
||||||
|
- Close other applications
|
||||||
|
- Use `--quick` flag for faster runs
|
||||||
|
- Reduce sample size in benchmark code
|
||||||
|
|
||||||
|
## Performance Expectations
|
||||||
|
|
||||||
|
| Test | Expected Time |
|
||||||
|
|------|---------------|
|
||||||
|
| supervisor_discovery | < 10ms |
|
||||||
|
| supervisor_get_info | < 5ms |
|
||||||
|
| job_full_lifecycle | < 100ms |
|
||||||
|
| concurrent_jobs (10) | < 500ms |
|
||||||
|
| stress_high_frequency (50) | < 2s |
|
||||||
|
|
||||||
|
## Next Steps
|
||||||
|
|
||||||
|
- See `benches/README.md` for detailed documentation
|
||||||
|
- Modify `benches/horus_stack.rs` to add custom tests
|
||||||
|
- Check `target/criterion/` for detailed reports
|
||||||
206
benches/README.md
Normal file
206
benches/README.md
Normal file
@@ -0,0 +1,206 @@
|
|||||||
|
# Horus Stack Benchmarks
|
||||||
|
|
||||||
|
Comprehensive benchmark suite for the entire Horus stack, testing performance through the client APIs.
|
||||||
|
|
||||||
|
## Overview
|
||||||
|
|
||||||
|
These benchmarks test the full Horus system including:
|
||||||
|
- **Supervisor API** - Job management, runner coordination
|
||||||
|
- **Coordinator API** - Job routing and execution
|
||||||
|
- **Osiris API** - REST API for data queries
|
||||||
|
|
||||||
|
All benchmarks interact with the stack through the official client libraries in `/lib/clients`, which is the only supported way to interact with the system.
|
||||||
|
|
||||||
|
## Prerequisites
|
||||||
|
|
||||||
|
Before running benchmarks, you must have the Horus stack running:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Start Redis
|
||||||
|
redis-server
|
||||||
|
|
||||||
|
# Start all Horus services
|
||||||
|
cd /Users/timurgordon/code/git.ourworld.tf/herocode/horus
|
||||||
|
RUST_LOG=info ./target/release/horus all --admin-secret SECRET --kill-ports
|
||||||
|
```
|
||||||
|
|
||||||
|
The benchmarks expect:
|
||||||
|
- **Supervisor** running on `http://127.0.0.1:3030`
|
||||||
|
- **Coordinator** running on `http://127.0.0.1:9652` (HTTP) and `ws://127.0.0.1:9653` (WebSocket)
|
||||||
|
- **Osiris** running on `http://127.0.0.1:8081`
|
||||||
|
- **Redis** running on `127.0.0.1:6379`
|
||||||
|
- Admin secret: `SECRET`
|
||||||
|
|
||||||
|
## Running Benchmarks
|
||||||
|
|
||||||
|
### Run all benchmarks
|
||||||
|
```bash
|
||||||
|
cargo bench --bench horus_stack
|
||||||
|
```
|
||||||
|
|
||||||
|
### Run specific benchmark
|
||||||
|
```bash
|
||||||
|
cargo bench --bench horus_stack -- supervisor_discovery
|
||||||
|
```
|
||||||
|
|
||||||
|
### Run with specific filter
|
||||||
|
```bash
|
||||||
|
cargo bench --bench horus_stack -- concurrent
|
||||||
|
```
|
||||||
|
|
||||||
|
### Generate detailed reports
|
||||||
|
```bash
|
||||||
|
cargo bench --bench horus_stack -- --verbose
|
||||||
|
```
|
||||||
|
|
||||||
|
## Benchmark Categories
|
||||||
|
|
||||||
|
### 1. API Discovery & Metadata (`horus_stack`)
|
||||||
|
- `supervisor_discovery` - OpenRPC metadata retrieval
|
||||||
|
- `supervisor_get_info` - Supervisor information and stats
|
||||||
|
|
||||||
|
### 2. Runner Management (`horus_stack`)
|
||||||
|
- `supervisor_list_runners` - List all registered runners
|
||||||
|
- `get_all_runner_status` - Get status of all runners
|
||||||
|
|
||||||
|
### 3. Job Operations (`horus_stack`)
|
||||||
|
- `supervisor_job_create` - Create job without execution
|
||||||
|
- `supervisor_job_list` - List all jobs
|
||||||
|
- `job_full_lifecycle` - Complete job lifecycle (create → execute → result)
|
||||||
|
|
||||||
|
### 4. Concurrency Tests (`horus_stack`)
|
||||||
|
- `concurrent_jobs` - Submit multiple jobs concurrently (1, 5, 10, 20 jobs)
|
||||||
|
|
||||||
|
### 5. Health & Monitoring (`horus_stack`)
|
||||||
|
- `osiris_health_check` - Osiris server health endpoint
|
||||||
|
|
||||||
|
### 6. API Latency (`horus_stack`)
|
||||||
|
- `api_latency/supervisor_info` - Supervisor info latency
|
||||||
|
- `api_latency/runner_list` - Runner list latency
|
||||||
|
- `api_latency/job_list` - Job list latency
|
||||||
|
|
||||||
|
### 7. Stress Tests (`stress_test`)
|
||||||
|
- `stress_high_frequency_jobs` - High-frequency submissions (50-200 jobs)
|
||||||
|
- `stress_sustained_load` - Continuous load testing
|
||||||
|
- `stress_large_payloads` - Large payload handling (1KB-100KB)
|
||||||
|
- `stress_rapid_api_calls` - Rapid API calls (100 calls/iteration)
|
||||||
|
- `stress_mixed_workload` - Mixed operation scenarios
|
||||||
|
- `stress_connection_pool` - Connection pool exhaustion (10-100 clients)
|
||||||
|
|
||||||
|
### 8. Memory Usage (`memory_usage`)
|
||||||
|
- `memory_job_creation` - Memory per job object (10-200 jobs)
|
||||||
|
- `memory_client_creation` - Memory per client instance (1-100 clients)
|
||||||
|
- `memory_payload_sizes` - Memory vs payload size (1KB-1MB)
|
||||||
|
|
||||||
|
See [MEMORY_BENCHMARKS.md](./MEMORY_BENCHMARKS.md) for detailed memory profiling documentation.
|
||||||
|
|
||||||
|
## Interpreting Results
|
||||||
|
|
||||||
|
Criterion outputs detailed statistics including:
|
||||||
|
- **Mean time** - Average execution time
|
||||||
|
- **Std deviation** - Variability in measurements
|
||||||
|
- **Median** - Middle value (50th percentile)
|
||||||
|
- **MAD** - Median Absolute Deviation
|
||||||
|
- **Throughput** - Operations per second
|
||||||
|
|
||||||
|
Results are saved in `target/criterion/` with:
|
||||||
|
- HTML reports with graphs
|
||||||
|
- JSON data for further analysis
|
||||||
|
- Historical comparison with previous runs
|
||||||
|
|
||||||
|
## Performance Targets
|
||||||
|
|
||||||
|
Expected performance (on modern hardware):
|
||||||
|
|
||||||
|
| Benchmark | Target | Notes |
|
||||||
|
|-----------|--------|-------|
|
||||||
|
| supervisor_discovery | < 10ms | Metadata retrieval |
|
||||||
|
| supervisor_get_info | < 5ms | Simple info query |
|
||||||
|
| supervisor_list_runners | < 5ms | List operation |
|
||||||
|
| supervisor_job_create | < 10ms | Job creation only |
|
||||||
|
| job_full_lifecycle | < 100ms | Full execution cycle |
|
||||||
|
| osiris_health_check | < 2ms | Health endpoint |
|
||||||
|
| concurrent_jobs (10) | < 500ms | 10 parallel jobs |
|
||||||
|
|
||||||
|
## Customization
|
||||||
|
|
||||||
|
To modify benchmark parameters, edit `benches/horus_stack.rs`:
|
||||||
|
|
||||||
|
```rust
|
||||||
|
// Change URLs
|
||||||
|
const SUPERVISOR_URL: &str = "http://127.0.0.1:3030";
|
||||||
|
const OSIRIS_URL: &str = "http://127.0.0.1:8081";
|
||||||
|
|
||||||
|
// Change admin secret
|
||||||
|
const ADMIN_SECRET: &str = "SECRET";
|
||||||
|
|
||||||
|
// Adjust concurrent job counts
|
||||||
|
for num_jobs in [1, 5, 10, 20, 50].iter() {
|
||||||
|
// ...
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
## CI/CD Integration
|
||||||
|
|
||||||
|
To run benchmarks in CI without the full stack:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Run only fast benchmarks
|
||||||
|
cargo bench --bench horus_stack -- --quick
|
||||||
|
|
||||||
|
# Save baseline for comparison
|
||||||
|
cargo bench --bench horus_stack -- --save-baseline main
|
||||||
|
|
||||||
|
# Compare against baseline
|
||||||
|
cargo bench --bench horus_stack -- --baseline main
|
||||||
|
```
|
||||||
|
|
||||||
|
## Troubleshooting
|
||||||
|
|
||||||
|
### "Connection refused" errors
|
||||||
|
- Ensure the Horus stack is running
|
||||||
|
- Check that all services are listening on expected ports
|
||||||
|
- Verify firewall settings
|
||||||
|
|
||||||
|
### "Job execution timeout" errors
|
||||||
|
- Increase timeout values in benchmark code
|
||||||
|
- Check that runners are properly registered
|
||||||
|
- Verify Redis is accessible
|
||||||
|
|
||||||
|
### Inconsistent results
|
||||||
|
- Close other applications to reduce system load
|
||||||
|
- Run benchmarks multiple times for statistical significance
|
||||||
|
- Use `--warm-up-time` flag to increase warm-up period
|
||||||
|
|
||||||
|
## Adding New Benchmarks
|
||||||
|
|
||||||
|
To add a new benchmark:
|
||||||
|
|
||||||
|
1. Create a new function in `benches/horus_stack.rs`:
|
||||||
|
```rust
|
||||||
|
fn bench_my_feature(c: &mut Criterion) {
|
||||||
|
let rt = create_runtime();
|
||||||
|
let client = /* create client */;
|
||||||
|
|
||||||
|
c.bench_function("my_feature", |b| {
|
||||||
|
b.to_async(&rt).iter(|| async {
|
||||||
|
// Your benchmark code
|
||||||
|
});
|
||||||
|
});
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
2. Add to the criterion_group:
|
||||||
|
```rust
|
||||||
|
criterion_group!(
|
||||||
|
benches,
|
||||||
|
// ... existing benchmarks
|
||||||
|
bench_my_feature,
|
||||||
|
);
|
||||||
|
```
|
||||||
|
|
||||||
|
## Resources
|
||||||
|
|
||||||
|
- [Criterion.rs Documentation](https://bheisler.github.io/criterion.rs/book/)
|
||||||
|
- [Horus Client Documentation](../lib/clients/)
|
||||||
|
- [Performance Tuning Guide](../docs/performance.md)
|
||||||
195
benches/SUMMARY.md
Normal file
195
benches/SUMMARY.md
Normal file
@@ -0,0 +1,195 @@
|
|||||||
|
# Horus Stack Benchmarks - Summary
|
||||||
|
|
||||||
|
## ✅ Created Comprehensive Benchmark Suite
|
||||||
|
|
||||||
|
Successfully created a complete benchmark suite for the Horus stack that tests the entire system through the official client APIs.
|
||||||
|
|
||||||
|
### Files Created
|
||||||
|
|
||||||
|
1. **`benches/horus_stack.rs`** - Main benchmark suite
|
||||||
|
- API discovery and metadata retrieval
|
||||||
|
- Runner management operations
|
||||||
|
- Job lifecycle testing
|
||||||
|
- Concurrent job submissions (1, 5, 10, 20 jobs)
|
||||||
|
- Health checks
|
||||||
|
- API latency measurements
|
||||||
|
|
||||||
|
2. **`benches/stress_test.rs`** - Stress and load testing
|
||||||
|
- High-frequency job submissions (50-200 jobs)
|
||||||
|
- Sustained load testing
|
||||||
|
- Large payload handling (1KB-100KB)
|
||||||
|
- Rapid API calls (100 calls/iteration)
|
||||||
|
- Mixed workload scenarios
|
||||||
|
- Connection pool exhaustion tests (10-100 clients)
|
||||||
|
|
||||||
|
3. **`benches/memory_usage.rs`** - Memory profiling
|
||||||
|
- Job object memory footprint (10-200 jobs)
|
||||||
|
- Client instance memory overhead (1-100 clients)
|
||||||
|
- Payload size impact on memory (1KB-1MB)
|
||||||
|
- Real-time memory delta reporting
|
||||||
|
|
||||||
|
4. **`benches/README.md`** - Comprehensive documentation
|
||||||
|
- Setup instructions
|
||||||
|
- Benchmark descriptions
|
||||||
|
- Performance targets
|
||||||
|
- Customization guide
|
||||||
|
- Troubleshooting tips
|
||||||
|
|
||||||
|
5. **`benches/QUICK_START.md`** - Quick reference guide
|
||||||
|
- Fast setup steps
|
||||||
|
- Common commands
|
||||||
|
- Expected performance metrics
|
||||||
|
|
||||||
|
6. **`benches/MEMORY_BENCHMARKS.md`** - Memory profiling guide
|
||||||
|
- Memory benchmark descriptions
|
||||||
|
- Platform-specific measurement details
|
||||||
|
- Advanced profiling tools
|
||||||
|
- Memory optimization tips
|
||||||
|
|
||||||
|
7. **`benches/run_benchmarks.sh`** - Helper script
|
||||||
|
- Automated prerequisite checking
|
||||||
|
- Service health verification
|
||||||
|
- One-command benchmark execution
|
||||||
|
|
||||||
|
### Architecture
|
||||||
|
|
||||||
|
The benchmarks interact with the Horus stack exclusively through the client libraries:
|
||||||
|
|
||||||
|
- **`hero-supervisor-openrpc-client`** - Supervisor API (job management, runner coordination)
|
||||||
|
- **`osiris-client`** - Osiris REST API (data queries)
|
||||||
|
- **`hero-job`** - Job model definitions
|
||||||
|
|
||||||
|
This ensures benchmarks test the real-world API surface that users interact with.
|
||||||
|
|
||||||
|
### Key Features
|
||||||
|
|
||||||
|
✅ **Async/await support** - Uses Criterion's async_tokio feature
|
||||||
|
✅ **Realistic workloads** - Tests actual job submission and execution
|
||||||
|
✅ **Concurrent testing** - Measures performance under parallel load
|
||||||
|
✅ **Stress testing** - Pushes system limits with high-frequency operations
|
||||||
|
✅ **HTML reports** - Beautiful visualizations with historical comparison
|
||||||
|
✅ **Automated checks** - Helper script verifies stack is running
|
||||||
|
|
||||||
|
### Benchmark Categories
|
||||||
|
|
||||||
|
#### Performance Benchmarks (`horus_stack`)
|
||||||
|
- `supervisor_discovery` - OpenRPC metadata (target: <10ms)
|
||||||
|
- `supervisor_get_info` - Info retrieval (target: <5ms)
|
||||||
|
- `supervisor_list_runners` - List operations (target: <5ms)
|
||||||
|
- `supervisor_job_create` - Job creation (target: <10ms)
|
||||||
|
- `supervisor_job_list` - Job listing (target: <10ms)
|
||||||
|
- `osiris_health_check` - Health endpoint (target: <2ms)
|
||||||
|
- `job_full_lifecycle` - Complete job cycle (target: <100ms)
|
||||||
|
- `concurrent_jobs` - Parallel submissions (target: <500ms for 10 jobs)
|
||||||
|
- `get_all_runner_status` - Status queries
|
||||||
|
- `api_latency/*` - Detailed latency measurements
|
||||||
|
|
||||||
|
#### Stress Tests (`stress_test`)
|
||||||
|
- `stress_high_frequency_jobs` - 50-200 concurrent jobs
|
||||||
|
- `stress_sustained_load` - Continuous submissions over time
|
||||||
|
- `stress_large_payloads` - 1KB-100KB payload handling
|
||||||
|
- `stress_rapid_api_calls` - 100 rapid calls per iteration
|
||||||
|
- `stress_mixed_workload` - Combined operations
|
||||||
|
- `stress_connection_pool` - 10-100 concurrent clients
|
||||||
|
|
||||||
|
#### Memory Profiling (`memory_usage`)
|
||||||
|
- `memory_job_creation` - Memory footprint per job (10-200 jobs)
|
||||||
|
- `memory_client_creation` - Memory per client instance (1-100 clients)
|
||||||
|
- `memory_payload_sizes` - Memory vs payload size (1KB-1MB)
|
||||||
|
- Reports memory deltas in real-time during execution
|
||||||
|
|
||||||
|
### Usage
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Quick start
|
||||||
|
./benches/run_benchmarks.sh
|
||||||
|
|
||||||
|
# Run specific suite
|
||||||
|
cargo bench --bench horus_stack
|
||||||
|
cargo bench --bench stress_test
|
||||||
|
cargo bench --bench memory_usage
|
||||||
|
|
||||||
|
# Run specific test
|
||||||
|
cargo bench -- supervisor_discovery
|
||||||
|
|
||||||
|
# Run memory benchmarks with verbose output (shows memory deltas)
|
||||||
|
cargo bench --bench memory_usage -- --verbose
|
||||||
|
|
||||||
|
# Save baseline
|
||||||
|
cargo bench -- --save-baseline main
|
||||||
|
|
||||||
|
# Compare against baseline
|
||||||
|
cargo bench -- --baseline main
|
||||||
|
```
|
||||||
|
|
||||||
|
### Prerequisites
|
||||||
|
|
||||||
|
The benchmarks require the full Horus stack to be running:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Start Redis
|
||||||
|
redis-server
|
||||||
|
|
||||||
|
# Start Horus (with auto port cleanup)
|
||||||
|
RUST_LOG=info ./target/release/horus all --admin-secret SECRET --kill-ports
|
||||||
|
```
|
||||||
|
|
||||||
|
### Configuration
|
||||||
|
|
||||||
|
All benchmarks use these defaults (configurable in source):
|
||||||
|
- Supervisor: `http://127.0.0.1:3030`
|
||||||
|
- Osiris: `http://127.0.0.1:8081`
|
||||||
|
- Coordinator HTTP: `http://127.0.0.1:9652`
|
||||||
|
- Coordinator WS: `ws://127.0.0.1:9653`
|
||||||
|
- Admin secret: `SECRET`
|
||||||
|
|
||||||
|
### Results
|
||||||
|
|
||||||
|
Results are saved to `target/criterion/` with:
|
||||||
|
- HTML reports with graphs and statistics
|
||||||
|
- JSON data for programmatic analysis
|
||||||
|
- Historical comparison with previous runs
|
||||||
|
- Detailed performance metrics (mean, median, std dev, throughput)
|
||||||
|
|
||||||
|
### Integration
|
||||||
|
|
||||||
|
The benchmarks are integrated into the workspace:
|
||||||
|
- Added to `Cargo.toml` with proper dependencies
|
||||||
|
- Uses workspace-level dependencies for consistency
|
||||||
|
- Configured with `harness = false` for Criterion
|
||||||
|
- Includes all necessary dev-dependencies
|
||||||
|
|
||||||
|
### Next Steps
|
||||||
|
|
||||||
|
1. Run benchmarks to establish baseline performance
|
||||||
|
2. Monitor performance over time as code changes
|
||||||
|
3. Use stress tests to identify bottlenecks
|
||||||
|
4. Customize benchmarks for specific use cases
|
||||||
|
5. Integrate into CI/CD for automated performance tracking
|
||||||
|
|
||||||
|
## Technical Details
|
||||||
|
|
||||||
|
### Dependencies Added
|
||||||
|
- `criterion` v0.5 with async_tokio and html_reports features
|
||||||
|
- `osiris-client` from workspace
|
||||||
|
- `reqwest` v0.12 with json feature
|
||||||
|
- `serde_json`, `uuid`, `chrono` from workspace
|
||||||
|
|
||||||
|
### Benchmark Harness
|
||||||
|
Uses Criterion.rs for:
|
||||||
|
- Statistical analysis
|
||||||
|
- Historical comparison
|
||||||
|
- HTML report generation
|
||||||
|
- Configurable sample sizes
|
||||||
|
- Warm-up periods
|
||||||
|
- Outlier detection
|
||||||
|
|
||||||
|
### Job Creation
|
||||||
|
Helper function `create_test_job()` creates properly structured Job instances:
|
||||||
|
- Unique UUIDs for each job
|
||||||
|
- Proper timestamps
|
||||||
|
- JSON-serialized payloads
|
||||||
|
- Empty signatures (for testing)
|
||||||
|
- Configurable runner and command
|
||||||
|
|
||||||
|
This ensures benchmarks test realistic job structures that match production usage.
|
||||||
324
benches/horus_stack.rs
Normal file
324
benches/horus_stack.rs
Normal file
@@ -0,0 +1,324 @@
|
|||||||
|
use criterion::{black_box, criterion_group, criterion_main, Criterion, BenchmarkId};
|
||||||
|
use hero_supervisor_openrpc_client::SupervisorClientBuilder;
|
||||||
|
use hero_job::Job;
|
||||||
|
use tokio::runtime::Runtime;
|
||||||
|
use std::time::Duration;
|
||||||
|
use std::collections::HashMap;
|
||||||
|
use uuid::Uuid;
|
||||||
|
use chrono::Utc;
|
||||||
|
|
||||||
|
/// Benchmark configuration
|
||||||
|
const SUPERVISOR_URL: &str = "http://127.0.0.1:3030";
|
||||||
|
const OSIRIS_URL: &str = "http://127.0.0.1:8081";
|
||||||
|
const ADMIN_SECRET: &str = "SECRET";
|
||||||
|
|
||||||
|
/// Helper to create a tokio runtime for benchmarks
|
||||||
|
fn create_runtime() -> Runtime {
|
||||||
|
Runtime::new().unwrap()
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Helper to create a test job
|
||||||
|
fn create_test_job(runner: &str, command: &str, args: Vec<String>) -> Job {
|
||||||
|
Job {
|
||||||
|
id: Uuid::new_v4().to_string(),
|
||||||
|
caller_id: "benchmark".to_string(),
|
||||||
|
context_id: "test".to_string(),
|
||||||
|
payload: serde_json::json!({
|
||||||
|
"command": command,
|
||||||
|
"args": args
|
||||||
|
}).to_string(),
|
||||||
|
runner: runner.to_string(),
|
||||||
|
timeout: 30,
|
||||||
|
env_vars: HashMap::new(),
|
||||||
|
created_at: Utc::now(),
|
||||||
|
updated_at: Utc::now(),
|
||||||
|
signatures: vec![],
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Benchmark: Supervisor discovery (OpenRPC metadata)
|
||||||
|
fn bench_supervisor_discovery(c: &mut Criterion) {
|
||||||
|
let rt = create_runtime();
|
||||||
|
|
||||||
|
let client = rt.block_on(async {
|
||||||
|
SupervisorClientBuilder::new()
|
||||||
|
.url(SUPERVISOR_URL)
|
||||||
|
.secret(ADMIN_SECRET)
|
||||||
|
.build()
|
||||||
|
.expect("Failed to create supervisor client")
|
||||||
|
});
|
||||||
|
|
||||||
|
c.bench_function("supervisor_discovery", |b| {
|
||||||
|
b.to_async(&rt).iter(|| async {
|
||||||
|
black_box(client.discover().await.expect("Discovery failed"))
|
||||||
|
});
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Benchmark: Supervisor info retrieval
|
||||||
|
fn bench_supervisor_info(c: &mut Criterion) {
|
||||||
|
let rt = create_runtime();
|
||||||
|
|
||||||
|
let client = rt.block_on(async {
|
||||||
|
SupervisorClientBuilder::new()
|
||||||
|
.url(SUPERVISOR_URL)
|
||||||
|
.secret(ADMIN_SECRET)
|
||||||
|
.build()
|
||||||
|
.expect("Failed to create supervisor client")
|
||||||
|
});
|
||||||
|
|
||||||
|
c.bench_function("supervisor_get_info", |b| {
|
||||||
|
b.to_async(&rt).iter(|| async {
|
||||||
|
black_box(client.get_supervisor_info().await.expect("Get info failed"))
|
||||||
|
});
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Benchmark: List runners
|
||||||
|
fn bench_list_runners(c: &mut Criterion) {
|
||||||
|
let rt = create_runtime();
|
||||||
|
|
||||||
|
let client = rt.block_on(async {
|
||||||
|
SupervisorClientBuilder::new()
|
||||||
|
.url(SUPERVISOR_URL)
|
||||||
|
.secret(ADMIN_SECRET)
|
||||||
|
.build()
|
||||||
|
.expect("Failed to create supervisor client")
|
||||||
|
});
|
||||||
|
|
||||||
|
c.bench_function("supervisor_list_runners", |b| {
|
||||||
|
b.to_async(&rt).iter(|| async {
|
||||||
|
black_box(client.runner_list().await.expect("List runners failed"))
|
||||||
|
});
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Benchmark: Job creation (without execution)
|
||||||
|
fn bench_job_create(c: &mut Criterion) {
|
||||||
|
let rt = create_runtime();
|
||||||
|
|
||||||
|
let client = rt.block_on(async {
|
||||||
|
SupervisorClientBuilder::new()
|
||||||
|
.url(SUPERVISOR_URL)
|
||||||
|
.secret(ADMIN_SECRET)
|
||||||
|
.build()
|
||||||
|
.expect("Failed to create supervisor client")
|
||||||
|
});
|
||||||
|
|
||||||
|
// Ensure runner exists
|
||||||
|
rt.block_on(async {
|
||||||
|
let _ = client.runner_create("hero").await;
|
||||||
|
});
|
||||||
|
|
||||||
|
c.bench_function("supervisor_job_create", |b| {
|
||||||
|
b.to_async(&rt).iter(|| async {
|
||||||
|
let job = create_test_job("hero", "echo", vec!["hello".to_string()]);
|
||||||
|
black_box(client.job_create(job).await.expect("Job create failed"))
|
||||||
|
});
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Benchmark: Job listing
|
||||||
|
fn bench_job_list(c: &mut Criterion) {
|
||||||
|
let rt = create_runtime();
|
||||||
|
|
||||||
|
let client = rt.block_on(async {
|
||||||
|
SupervisorClientBuilder::new()
|
||||||
|
.url(SUPERVISOR_URL)
|
||||||
|
.secret(ADMIN_SECRET)
|
||||||
|
.build()
|
||||||
|
.expect("Failed to create supervisor client")
|
||||||
|
});
|
||||||
|
|
||||||
|
c.bench_function("supervisor_job_list", |b| {
|
||||||
|
b.to_async(&rt).iter(|| async {
|
||||||
|
black_box(client.job_list().await.expect("Job list failed"))
|
||||||
|
});
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Benchmark: Osiris health check
|
||||||
|
fn bench_osiris_health(c: &mut Criterion) {
|
||||||
|
let rt = create_runtime();
|
||||||
|
let client = reqwest::Client::new();
|
||||||
|
|
||||||
|
c.bench_function("osiris_health_check", |b| {
|
||||||
|
b.to_async(&rt).iter(|| async {
|
||||||
|
let url = format!("{}/health", OSIRIS_URL);
|
||||||
|
black_box(
|
||||||
|
client
|
||||||
|
.get(&url)
|
||||||
|
.send()
|
||||||
|
.await
|
||||||
|
.expect("Health check failed")
|
||||||
|
.json::<serde_json::Value>()
|
||||||
|
.await
|
||||||
|
.expect("JSON parse failed")
|
||||||
|
)
|
||||||
|
});
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Benchmark: Full job lifecycle (create, start, wait for result)
|
||||||
|
fn bench_job_lifecycle(c: &mut Criterion) {
|
||||||
|
let rt = create_runtime();
|
||||||
|
|
||||||
|
let client = rt.block_on(async {
|
||||||
|
SupervisorClientBuilder::new()
|
||||||
|
.url(SUPERVISOR_URL)
|
||||||
|
.secret(ADMIN_SECRET)
|
||||||
|
.timeout(Duration::from_secs(60))
|
||||||
|
.build()
|
||||||
|
.expect("Failed to create supervisor client")
|
||||||
|
});
|
||||||
|
|
||||||
|
// First ensure we have a runner registered
|
||||||
|
rt.block_on(async {
|
||||||
|
let _ = client.runner_create("hero").await;
|
||||||
|
});
|
||||||
|
|
||||||
|
c.bench_function("job_full_lifecycle", |b| {
|
||||||
|
b.to_async(&rt).iter(|| async {
|
||||||
|
let job = create_test_job("hero", "echo", vec!["benchmark_test".to_string()]);
|
||||||
|
|
||||||
|
// Start job and wait for result
|
||||||
|
black_box(
|
||||||
|
client
|
||||||
|
.job_run(job, Some(30))
|
||||||
|
.await
|
||||||
|
.expect("Job run failed")
|
||||||
|
)
|
||||||
|
});
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Benchmark: Concurrent job submissions
|
||||||
|
fn bench_concurrent_jobs(c: &mut Criterion) {
|
||||||
|
let rt = create_runtime();
|
||||||
|
|
||||||
|
let client = rt.block_on(async {
|
||||||
|
SupervisorClientBuilder::new()
|
||||||
|
.url(SUPERVISOR_URL)
|
||||||
|
.secret(ADMIN_SECRET)
|
||||||
|
.timeout(Duration::from_secs(60))
|
||||||
|
.build()
|
||||||
|
.expect("Failed to create supervisor client")
|
||||||
|
});
|
||||||
|
|
||||||
|
// Ensure runner is registered
|
||||||
|
rt.block_on(async {
|
||||||
|
let _ = client.runner_create("hero").await;
|
||||||
|
});
|
||||||
|
|
||||||
|
let mut group = c.benchmark_group("concurrent_jobs");
|
||||||
|
|
||||||
|
for num_jobs in [1, 5, 10, 20].iter() {
|
||||||
|
group.bench_with_input(
|
||||||
|
BenchmarkId::from_parameter(num_jobs),
|
||||||
|
num_jobs,
|
||||||
|
|b, &num_jobs| {
|
||||||
|
b.to_async(&rt).iter(|| async {
|
||||||
|
let mut handles = vec![];
|
||||||
|
|
||||||
|
for i in 0..num_jobs {
|
||||||
|
let client = client.clone();
|
||||||
|
let handle = tokio::spawn(async move {
|
||||||
|
let job = create_test_job("hero", "echo", vec![format!("job_{}", i)]);
|
||||||
|
client.job_create(job).await
|
||||||
|
});
|
||||||
|
handles.push(handle);
|
||||||
|
}
|
||||||
|
|
||||||
|
// Wait for all jobs to be submitted
|
||||||
|
for handle in handles {
|
||||||
|
black_box(handle.await.expect("Task failed").expect("Job start failed"));
|
||||||
|
}
|
||||||
|
});
|
||||||
|
},
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
group.finish();
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Benchmark: Runner status checks
|
||||||
|
fn bench_runner_status(c: &mut Criterion) {
|
||||||
|
let rt = create_runtime();
|
||||||
|
|
||||||
|
let client = rt.block_on(async {
|
||||||
|
SupervisorClientBuilder::new()
|
||||||
|
.url(SUPERVISOR_URL)
|
||||||
|
.secret(ADMIN_SECRET)
|
||||||
|
.build()
|
||||||
|
.expect("Failed to create supervisor client")
|
||||||
|
});
|
||||||
|
|
||||||
|
// Ensure we have runners
|
||||||
|
rt.block_on(async {
|
||||||
|
let _ = client.runner_create("hero").await;
|
||||||
|
let _ = client.runner_create("osiris").await;
|
||||||
|
});
|
||||||
|
|
||||||
|
c.bench_function("get_all_runner_status", |b| {
|
||||||
|
b.to_async(&rt).iter(|| async {
|
||||||
|
black_box(
|
||||||
|
client
|
||||||
|
.get_all_runner_status()
|
||||||
|
.await
|
||||||
|
.expect("Get status failed")
|
||||||
|
)
|
||||||
|
});
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Benchmark: API response time under load
|
||||||
|
fn bench_api_latency(c: &mut Criterion) {
|
||||||
|
let rt = create_runtime();
|
||||||
|
|
||||||
|
let client = rt.block_on(async {
|
||||||
|
SupervisorClientBuilder::new()
|
||||||
|
.url(SUPERVISOR_URL)
|
||||||
|
.secret(ADMIN_SECRET)
|
||||||
|
.build()
|
||||||
|
.expect("Failed to create supervisor client")
|
||||||
|
});
|
||||||
|
|
||||||
|
let mut group = c.benchmark_group("api_latency");
|
||||||
|
group.measurement_time(Duration::from_secs(10));
|
||||||
|
|
||||||
|
group.bench_function("supervisor_info", |b| {
|
||||||
|
b.to_async(&rt).iter(|| async {
|
||||||
|
black_box(client.get_supervisor_info().await.expect("Failed"))
|
||||||
|
});
|
||||||
|
});
|
||||||
|
|
||||||
|
group.bench_function("runner_list", |b| {
|
||||||
|
b.to_async(&rt).iter(|| async {
|
||||||
|
black_box(client.runner_list().await.expect("Failed"))
|
||||||
|
});
|
||||||
|
});
|
||||||
|
|
||||||
|
group.bench_function("job_list", |b| {
|
||||||
|
b.to_async(&rt).iter(|| async {
|
||||||
|
black_box(client.job_list().await.expect("Failed"))
|
||||||
|
});
|
||||||
|
});
|
||||||
|
|
||||||
|
group.finish();
|
||||||
|
}
|
||||||
|
|
||||||
|
criterion_group!(
|
||||||
|
benches,
|
||||||
|
bench_supervisor_discovery,
|
||||||
|
bench_supervisor_info,
|
||||||
|
bench_list_runners,
|
||||||
|
bench_job_create,
|
||||||
|
bench_job_list,
|
||||||
|
bench_osiris_health,
|
||||||
|
bench_job_lifecycle,
|
||||||
|
bench_concurrent_jobs,
|
||||||
|
bench_runner_status,
|
||||||
|
bench_api_latency,
|
||||||
|
);
|
||||||
|
|
||||||
|
criterion_main!(benches);
|
||||||
210
benches/memory_usage.rs
Normal file
210
benches/memory_usage.rs
Normal file
@@ -0,0 +1,210 @@
|
|||||||
|
use criterion::{black_box, criterion_group, criterion_main, Criterion, BenchmarkId};
|
||||||
|
use hero_supervisor_openrpc_client::SupervisorClientBuilder;
|
||||||
|
use hero_job::Job;
|
||||||
|
use tokio::runtime::Runtime;
|
||||||
|
use std::time::Duration;
|
||||||
|
use std::collections::HashMap;
|
||||||
|
use uuid::Uuid;
|
||||||
|
use chrono::Utc;
|
||||||
|
|
||||||
|
const SUPERVISOR_URL: &str = "http://127.0.0.1:3030";
|
||||||
|
const ADMIN_SECRET: &str = "SECRET";
|
||||||
|
|
||||||
|
fn create_runtime() -> Runtime {
|
||||||
|
Runtime::new().unwrap()
|
||||||
|
}
|
||||||
|
|
||||||
|
fn create_test_job(runner: &str, command: &str, args: Vec<String>) -> Job {
|
||||||
|
Job {
|
||||||
|
id: Uuid::new_v4().to_string(),
|
||||||
|
caller_id: "benchmark".to_string(),
|
||||||
|
context_id: "test".to_string(),
|
||||||
|
payload: serde_json::json!({
|
||||||
|
"command": command,
|
||||||
|
"args": args
|
||||||
|
}).to_string(),
|
||||||
|
runner: runner.to_string(),
|
||||||
|
timeout: 30,
|
||||||
|
env_vars: HashMap::new(),
|
||||||
|
created_at: Utc::now(),
|
||||||
|
updated_at: Utc::now(),
|
||||||
|
signatures: vec![],
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
#[cfg(target_os = "macos")]
|
||||||
|
fn get_memory_usage() -> Option<usize> {
|
||||||
|
use std::process::Command;
|
||||||
|
let output = Command::new("ps")
|
||||||
|
.args(&["-o", "rss=", "-p", &std::process::id().to_string()])
|
||||||
|
.output()
|
||||||
|
.ok()?;
|
||||||
|
String::from_utf8(output.stdout)
|
||||||
|
.ok()?
|
||||||
|
.trim()
|
||||||
|
.parse::<usize>()
|
||||||
|
.ok()
|
||||||
|
.map(|kb| kb * 1024)
|
||||||
|
}
|
||||||
|
|
||||||
|
#[cfg(target_os = "linux")]
|
||||||
|
fn get_memory_usage() -> Option<usize> {
|
||||||
|
use std::fs;
|
||||||
|
let status = fs::read_to_string("/proc/self/status").ok()?;
|
||||||
|
for line in status.lines() {
|
||||||
|
if line.starts_with("VmRSS:") {
|
||||||
|
let kb = line.split_whitespace().nth(1)?.parse::<usize>().ok()?;
|
||||||
|
return Some(kb * 1024);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
None
|
||||||
|
}
|
||||||
|
|
||||||
|
fn memory_job_creation(c: &mut Criterion) {
|
||||||
|
let rt = create_runtime();
|
||||||
|
let client = rt.block_on(async {
|
||||||
|
SupervisorClientBuilder::new()
|
||||||
|
.url(SUPERVISOR_URL)
|
||||||
|
.secret(ADMIN_SECRET)
|
||||||
|
.build()
|
||||||
|
.expect("Failed to create client")
|
||||||
|
});
|
||||||
|
|
||||||
|
rt.block_on(async {
|
||||||
|
let _ = client.runner_create("hero").await;
|
||||||
|
});
|
||||||
|
|
||||||
|
let mut group = c.benchmark_group("memory_job_creation");
|
||||||
|
|
||||||
|
for num_jobs in [10, 50, 100, 200].iter() {
|
||||||
|
group.bench_with_input(
|
||||||
|
BenchmarkId::from_parameter(num_jobs),
|
||||||
|
num_jobs,
|
||||||
|
|b, &num_jobs| {
|
||||||
|
b.iter_custom(|iters| {
|
||||||
|
let mut total_duration = Duration::ZERO;
|
||||||
|
|
||||||
|
for _ in 0..iters {
|
||||||
|
let mem_before = get_memory_usage().unwrap_or(0);
|
||||||
|
|
||||||
|
let start = std::time::Instant::now();
|
||||||
|
rt.block_on(async {
|
||||||
|
let mut jobs = Vec::new();
|
||||||
|
for i in 0..num_jobs {
|
||||||
|
let job = create_test_job("hero", "echo", vec![format!("mem_test_{}", i)]);
|
||||||
|
jobs.push(job);
|
||||||
|
}
|
||||||
|
black_box(jobs);
|
||||||
|
});
|
||||||
|
total_duration += start.elapsed();
|
||||||
|
|
||||||
|
let mem_after = get_memory_usage().unwrap_or(0);
|
||||||
|
let mem_delta = mem_after.saturating_sub(mem_before);
|
||||||
|
|
||||||
|
if mem_delta > 0 {
|
||||||
|
eprintln!("Memory delta for {} jobs: {} KB", num_jobs, mem_delta / 1024);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
total_duration
|
||||||
|
});
|
||||||
|
},
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
group.finish();
|
||||||
|
}
|
||||||
|
|
||||||
|
fn memory_client_creation(c: &mut Criterion) {
|
||||||
|
let rt = create_runtime();
|
||||||
|
|
||||||
|
let mut group = c.benchmark_group("memory_client_creation");
|
||||||
|
|
||||||
|
for num_clients in [1, 10, 50, 100].iter() {
|
||||||
|
group.bench_with_input(
|
||||||
|
BenchmarkId::from_parameter(num_clients),
|
||||||
|
num_clients,
|
||||||
|
|b, &num_clients| {
|
||||||
|
b.iter_custom(|iters| {
|
||||||
|
let mut total_duration = Duration::ZERO;
|
||||||
|
|
||||||
|
for _ in 0..iters {
|
||||||
|
let mem_before = get_memory_usage().unwrap_or(0);
|
||||||
|
|
||||||
|
let start = std::time::Instant::now();
|
||||||
|
rt.block_on(async {
|
||||||
|
let mut clients = Vec::new();
|
||||||
|
for _ in 0..num_clients {
|
||||||
|
let client = SupervisorClientBuilder::new()
|
||||||
|
.url(SUPERVISOR_URL)
|
||||||
|
.secret(ADMIN_SECRET)
|
||||||
|
.build()
|
||||||
|
.expect("Failed to create client");
|
||||||
|
clients.push(client);
|
||||||
|
}
|
||||||
|
black_box(clients);
|
||||||
|
});
|
||||||
|
total_duration += start.elapsed();
|
||||||
|
|
||||||
|
let mem_after = get_memory_usage().unwrap_or(0);
|
||||||
|
let mem_delta = mem_after.saturating_sub(mem_before);
|
||||||
|
|
||||||
|
if mem_delta > 0 {
|
||||||
|
eprintln!("Memory delta for {} clients: {} KB", num_clients, mem_delta / 1024);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
total_duration
|
||||||
|
});
|
||||||
|
},
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
group.finish();
|
||||||
|
}
|
||||||
|
|
||||||
|
fn memory_payload_sizes(c: &mut Criterion) {
|
||||||
|
let mut group = c.benchmark_group("memory_payload_sizes");
|
||||||
|
|
||||||
|
for size_kb in [1, 10, 100, 1000].iter() {
|
||||||
|
group.bench_with_input(
|
||||||
|
BenchmarkId::from_parameter(format!("{}KB", size_kb)),
|
||||||
|
size_kb,
|
||||||
|
|b, &size_kb| {
|
||||||
|
b.iter_custom(|iters| {
|
||||||
|
let mut total_duration = Duration::ZERO;
|
||||||
|
|
||||||
|
for _ in 0..iters {
|
||||||
|
let mem_before = get_memory_usage().unwrap_or(0);
|
||||||
|
|
||||||
|
let start = std::time::Instant::now();
|
||||||
|
let large_data = "x".repeat(size_kb * 1024);
|
||||||
|
let job = create_test_job("hero", "echo", vec![large_data]);
|
||||||
|
black_box(job);
|
||||||
|
total_duration += start.elapsed();
|
||||||
|
|
||||||
|
let mem_after = get_memory_usage().unwrap_or(0);
|
||||||
|
let mem_delta = mem_after.saturating_sub(mem_before);
|
||||||
|
|
||||||
|
if mem_delta > 0 {
|
||||||
|
eprintln!("Memory delta for {}KB payload: {} KB", size_kb, mem_delta / 1024);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
total_duration
|
||||||
|
});
|
||||||
|
},
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
group.finish();
|
||||||
|
}
|
||||||
|
|
||||||
|
criterion_group!(
|
||||||
|
memory_benches,
|
||||||
|
memory_job_creation,
|
||||||
|
memory_client_creation,
|
||||||
|
memory_payload_sizes,
|
||||||
|
);
|
||||||
|
|
||||||
|
criterion_main!(memory_benches);
|
||||||
113
benches/run_benchmarks.sh
Executable file
113
benches/run_benchmarks.sh
Executable file
@@ -0,0 +1,113 @@
|
|||||||
|
#!/bin/bash
|
||||||
|
# Horus Stack Benchmark Runner
|
||||||
|
# This script ensures the Horus stack is running before executing benchmarks
|
||||||
|
|
||||||
|
set -e
|
||||||
|
|
||||||
|
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||||
|
PROJECT_ROOT="$(dirname "$SCRIPT_DIR")"
|
||||||
|
|
||||||
|
# Colors for output
|
||||||
|
RED='\033[0;31m'
|
||||||
|
GREEN='\033[0;32m'
|
||||||
|
YELLOW='\033[1;33m'
|
||||||
|
NC='\033[0m' # No Color
|
||||||
|
|
||||||
|
# Configuration
|
||||||
|
SUPERVISOR_URL="http://127.0.0.1:3030"
|
||||||
|
OSIRIS_URL="http://127.0.0.1:8081"
|
||||||
|
REDIS_URL="127.0.0.1:6379"
|
||||||
|
|
||||||
|
echo -e "${GREEN}=== Horus Stack Benchmark Runner ===${NC}\n"
|
||||||
|
|
||||||
|
# Function to check if a service is running
|
||||||
|
check_service() {
|
||||||
|
local url=$1
|
||||||
|
local name=$2
|
||||||
|
|
||||||
|
if curl -s -f "$url/health" > /dev/null 2>&1 || curl -s -f "$url" > /dev/null 2>&1; then
|
||||||
|
echo -e "${GREEN}✓${NC} $name is running"
|
||||||
|
return 0
|
||||||
|
else
|
||||||
|
echo -e "${RED}✗${NC} $name is not running"
|
||||||
|
return 1
|
||||||
|
fi
|
||||||
|
}
|
||||||
|
|
||||||
|
# Function to check if Redis is running
|
||||||
|
check_redis() {
|
||||||
|
if redis-cli -h 127.0.0.1 -p 6379 ping > /dev/null 2>&1; then
|
||||||
|
echo -e "${GREEN}✓${NC} Redis is running"
|
||||||
|
return 0
|
||||||
|
else
|
||||||
|
echo -e "${RED}✗${NC} Redis is not running"
|
||||||
|
return 1
|
||||||
|
fi
|
||||||
|
}
|
||||||
|
|
||||||
|
# Check prerequisites
|
||||||
|
echo "Checking prerequisites..."
|
||||||
|
echo ""
|
||||||
|
|
||||||
|
REDIS_OK=false
|
||||||
|
OSIRIS_OK=false
|
||||||
|
SUPERVISOR_OK=false
|
||||||
|
|
||||||
|
if check_redis; then
|
||||||
|
REDIS_OK=true
|
||||||
|
fi
|
||||||
|
|
||||||
|
if check_service "$OSIRIS_URL" "Osiris"; then
|
||||||
|
OSIRIS_OK=true
|
||||||
|
fi
|
||||||
|
|
||||||
|
if check_service "$SUPERVISOR_URL" "Supervisor"; then
|
||||||
|
SUPERVISOR_OK=true
|
||||||
|
fi
|
||||||
|
|
||||||
|
echo ""
|
||||||
|
|
||||||
|
# If any service is not running, provide instructions
|
||||||
|
if [ "$REDIS_OK" = false ] || [ "$OSIRIS_OK" = false ] || [ "$SUPERVISOR_OK" = false ]; then
|
||||||
|
echo -e "${YELLOW}Some services are not running. Please start the Horus stack:${NC}"
|
||||||
|
echo ""
|
||||||
|
|
||||||
|
if [ "$REDIS_OK" = false ]; then
|
||||||
|
echo " 1. Start Redis:"
|
||||||
|
echo " redis-server"
|
||||||
|
echo ""
|
||||||
|
fi
|
||||||
|
|
||||||
|
echo " 2. Start Horus stack:"
|
||||||
|
echo " cd $PROJECT_ROOT"
|
||||||
|
echo " RUST_LOG=info ./target/release/horus all --admin-secret SECRET --kill-ports"
|
||||||
|
echo ""
|
||||||
|
echo " Or run in the background:"
|
||||||
|
echo " RUST_LOG=info ./target/release/horus all --admin-secret SECRET --kill-ports &"
|
||||||
|
echo ""
|
||||||
|
|
||||||
|
read -p "Do you want to continue anyway? (y/N) " -n 1 -r
|
||||||
|
echo
|
||||||
|
if [[ ! $REPLY =~ ^[Yy]$ ]]; then
|
||||||
|
echo -e "${RED}Benchmark cancelled.${NC}"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Build the project first
|
||||||
|
echo -e "${GREEN}Building project...${NC}"
|
||||||
|
cd "$PROJECT_ROOT"
|
||||||
|
cargo build --release
|
||||||
|
|
||||||
|
echo ""
|
||||||
|
echo -e "${GREEN}Running benchmarks...${NC}"
|
||||||
|
echo ""
|
||||||
|
|
||||||
|
# Run benchmarks with any additional arguments passed to this script
|
||||||
|
cargo bench --bench horus_stack "$@"
|
||||||
|
|
||||||
|
echo ""
|
||||||
|
echo -e "${GREEN}=== Benchmark Complete ===${NC}"
|
||||||
|
echo ""
|
||||||
|
echo "Results saved to: target/criterion/"
|
||||||
|
echo "View HTML reports: open target/criterion/report/index.html"
|
||||||
300
benches/stress_test.rs
Normal file
300
benches/stress_test.rs
Normal file
@@ -0,0 +1,300 @@
|
|||||||
|
use criterion::{black_box, criterion_group, criterion_main, Criterion, BenchmarkId};
|
||||||
|
use hero_supervisor_openrpc_client::SupervisorClientBuilder;
|
||||||
|
use hero_job::Job;
|
||||||
|
use tokio::runtime::Runtime;
|
||||||
|
use std::time::Duration;
|
||||||
|
use std::collections::HashMap;
|
||||||
|
use uuid::Uuid;
|
||||||
|
use chrono::Utc;
|
||||||
|
|
||||||
|
/// Benchmark configuration
|
||||||
|
const SUPERVISOR_URL: &str = "http://127.0.0.1:3030";
|
||||||
|
const ADMIN_SECRET: &str = "SECRET";
|
||||||
|
|
||||||
|
/// Helper to create a tokio runtime for benchmarks
|
||||||
|
fn create_runtime() -> Runtime {
|
||||||
|
Runtime::new().unwrap()
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Helper to create a test job
|
||||||
|
fn create_test_job(runner: &str, command: &str, args: Vec<String>) -> Job {
|
||||||
|
Job {
|
||||||
|
id: Uuid::new_v4().to_string(),
|
||||||
|
caller_id: "benchmark".to_string(),
|
||||||
|
context_id: "test".to_string(),
|
||||||
|
payload: serde_json::json!({
|
||||||
|
"command": command,
|
||||||
|
"args": args
|
||||||
|
}).to_string(),
|
||||||
|
runner: runner.to_string(),
|
||||||
|
timeout: 30,
|
||||||
|
env_vars: HashMap::new(),
|
||||||
|
created_at: Utc::now(),
|
||||||
|
updated_at: Utc::now(),
|
||||||
|
signatures: vec![],
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Stress test: High-frequency job submissions
|
||||||
|
fn stress_high_frequency_jobs(c: &mut Criterion) {
|
||||||
|
let rt = create_runtime();
|
||||||
|
|
||||||
|
let client = rt.block_on(async {
|
||||||
|
SupervisorClientBuilder::new()
|
||||||
|
.url(SUPERVISOR_URL)
|
||||||
|
.secret(ADMIN_SECRET)
|
||||||
|
.timeout(Duration::from_secs(120))
|
||||||
|
.build()
|
||||||
|
.expect("Failed to create supervisor client")
|
||||||
|
});
|
||||||
|
|
||||||
|
// Ensure runner is registered
|
||||||
|
rt.block_on(async {
|
||||||
|
let _ = client.runner_create("hero").await;
|
||||||
|
});
|
||||||
|
|
||||||
|
let mut group = c.benchmark_group("stress_high_frequency");
|
||||||
|
group.sample_size(10); // Fewer samples for stress tests
|
||||||
|
group.measurement_time(Duration::from_secs(20));
|
||||||
|
|
||||||
|
for num_jobs in [50, 100, 200].iter() {
|
||||||
|
group.bench_with_input(
|
||||||
|
BenchmarkId::from_parameter(num_jobs),
|
||||||
|
num_jobs,
|
||||||
|
|b, &num_jobs| {
|
||||||
|
b.to_async(&rt).iter(|| async {
|
||||||
|
let mut handles = vec![];
|
||||||
|
|
||||||
|
for i in 0..num_jobs {
|
||||||
|
let client = client.clone();
|
||||||
|
let handle = tokio::spawn(async move {
|
||||||
|
let job = create_test_job("hero", "echo", vec![format!("stress_{}", i)]);
|
||||||
|
client.job_create(job).await
|
||||||
|
});
|
||||||
|
handles.push(handle);
|
||||||
|
}
|
||||||
|
|
||||||
|
// Wait for all jobs to be submitted
|
||||||
|
for handle in handles {
|
||||||
|
let _ = black_box(handle.await);
|
||||||
|
}
|
||||||
|
});
|
||||||
|
},
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
group.finish();
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Stress test: Sustained load over time
|
||||||
|
fn stress_sustained_load(c: &mut Criterion) {
|
||||||
|
let rt = create_runtime();
|
||||||
|
|
||||||
|
let client = rt.block_on(async {
|
||||||
|
SupervisorClientBuilder::new()
|
||||||
|
.url(SUPERVISOR_URL)
|
||||||
|
.secret(ADMIN_SECRET)
|
||||||
|
.timeout(Duration::from_secs(120))
|
||||||
|
.build()
|
||||||
|
.expect("Failed to create supervisor client")
|
||||||
|
});
|
||||||
|
|
||||||
|
// Ensure runner is registered
|
||||||
|
rt.block_on(async {
|
||||||
|
let _ = client.runner_create("hero").await;
|
||||||
|
});
|
||||||
|
|
||||||
|
let mut group = c.benchmark_group("stress_sustained_load");
|
||||||
|
group.sample_size(10);
|
||||||
|
group.measurement_time(Duration::from_secs(30));
|
||||||
|
|
||||||
|
group.bench_function("continuous_submissions", |b| {
|
||||||
|
b.to_async(&rt).iter(|| async {
|
||||||
|
// Submit jobs continuously for the measurement period
|
||||||
|
for i in 0..20 {
|
||||||
|
let job = create_test_job("hero", "echo", vec![format!("sustained_{}", i)]);
|
||||||
|
let _ = black_box(client.job_create(job).await);
|
||||||
|
}
|
||||||
|
});
|
||||||
|
});
|
||||||
|
|
||||||
|
group.finish();
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Stress test: Large payload handling
|
||||||
|
fn stress_large_payloads(c: &mut Criterion) {
|
||||||
|
let rt = create_runtime();
|
||||||
|
|
||||||
|
let client = rt.block_on(async {
|
||||||
|
SupervisorClientBuilder::new()
|
||||||
|
.url(SUPERVISOR_URL)
|
||||||
|
.secret(ADMIN_SECRET)
|
||||||
|
.timeout(Duration::from_secs(120))
|
||||||
|
.build()
|
||||||
|
.expect("Failed to create supervisor client")
|
||||||
|
});
|
||||||
|
|
||||||
|
// Ensure runner is registered
|
||||||
|
rt.block_on(async {
|
||||||
|
let _ = client.runner_create("hero").await;
|
||||||
|
});
|
||||||
|
|
||||||
|
let mut group = c.benchmark_group("stress_large_payloads");
|
||||||
|
group.sample_size(10);
|
||||||
|
|
||||||
|
for size_kb in [1, 10, 100].iter() {
|
||||||
|
group.bench_with_input(
|
||||||
|
BenchmarkId::from_parameter(format!("{}KB", size_kb)),
|
||||||
|
size_kb,
|
||||||
|
|b, &size_kb| {
|
||||||
|
b.to_async(&rt).iter(|| async {
|
||||||
|
// Create a large payload
|
||||||
|
let large_data = "x".repeat(size_kb * 1024);
|
||||||
|
let job = create_test_job("hero", "echo", vec![large_data]);
|
||||||
|
black_box(client.job_create(job).await.expect("Job create failed"))
|
||||||
|
});
|
||||||
|
},
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
group.finish();
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Stress test: Rapid API calls
|
||||||
|
fn stress_rapid_api_calls(c: &mut Criterion) {
|
||||||
|
let rt = create_runtime();
|
||||||
|
|
||||||
|
let client = rt.block_on(async {
|
||||||
|
SupervisorClientBuilder::new()
|
||||||
|
.url(SUPERVISOR_URL)
|
||||||
|
.secret(ADMIN_SECRET)
|
||||||
|
.build()
|
||||||
|
.expect("Failed to create supervisor client")
|
||||||
|
});
|
||||||
|
|
||||||
|
let mut group = c.benchmark_group("stress_rapid_api");
|
||||||
|
group.sample_size(10);
|
||||||
|
group.measurement_time(Duration::from_secs(15));
|
||||||
|
|
||||||
|
group.bench_function("rapid_info_calls", |b| {
|
||||||
|
b.to_async(&rt).iter(|| async {
|
||||||
|
// Make 100 rapid API calls
|
||||||
|
for _ in 0..100 {
|
||||||
|
let _ = black_box(client.get_supervisor_info().await);
|
||||||
|
}
|
||||||
|
});
|
||||||
|
});
|
||||||
|
|
||||||
|
group.bench_function("rapid_list_calls", |b| {
|
||||||
|
b.to_async(&rt).iter(|| async {
|
||||||
|
// Make 100 rapid list calls
|
||||||
|
for _ in 0..100 {
|
||||||
|
let _ = black_box(client.runner_list().await);
|
||||||
|
}
|
||||||
|
});
|
||||||
|
});
|
||||||
|
|
||||||
|
group.finish();
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Stress test: Mixed workload
|
||||||
|
fn stress_mixed_workload(c: &mut Criterion) {
|
||||||
|
let rt = create_runtime();
|
||||||
|
|
||||||
|
let client = rt.block_on(async {
|
||||||
|
SupervisorClientBuilder::new()
|
||||||
|
.url(SUPERVISOR_URL)
|
||||||
|
.secret(ADMIN_SECRET)
|
||||||
|
.timeout(Duration::from_secs(120))
|
||||||
|
.build()
|
||||||
|
.expect("Failed to create supervisor client")
|
||||||
|
});
|
||||||
|
|
||||||
|
// Ensure runner is registered
|
||||||
|
rt.block_on(async {
|
||||||
|
let _ = client.runner_create("hero").await;
|
||||||
|
});
|
||||||
|
|
||||||
|
let mut group = c.benchmark_group("stress_mixed_workload");
|
||||||
|
group.sample_size(10);
|
||||||
|
group.measurement_time(Duration::from_secs(25));
|
||||||
|
|
||||||
|
group.bench_function("mixed_operations", |b| {
|
||||||
|
b.to_async(&rt).iter(|| async {
|
||||||
|
let mut handles = vec![];
|
||||||
|
|
||||||
|
// Mix of different operations
|
||||||
|
for i in 0..10 {
|
||||||
|
let client = client.clone();
|
||||||
|
|
||||||
|
// Job submission
|
||||||
|
let handle1 = tokio::spawn(async move {
|
||||||
|
let job = create_test_job("hero", "echo", vec![format!("mixed_{}", i)]);
|
||||||
|
client.job_create(job).await.map(|_| ())
|
||||||
|
});
|
||||||
|
handles.push(handle1);
|
||||||
|
}
|
||||||
|
|
||||||
|
// Wait for all operations
|
||||||
|
for handle in handles {
|
||||||
|
let _ = black_box(handle.await);
|
||||||
|
}
|
||||||
|
});
|
||||||
|
});
|
||||||
|
|
||||||
|
group.finish();
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Stress test: Connection pool exhaustion
|
||||||
|
fn stress_connection_pool(c: &mut Criterion) {
|
||||||
|
let rt = create_runtime();
|
||||||
|
|
||||||
|
let mut group = c.benchmark_group("stress_connection_pool");
|
||||||
|
group.sample_size(10);
|
||||||
|
group.measurement_time(Duration::from_secs(20));
|
||||||
|
|
||||||
|
for num_clients in [10, 50, 100].iter() {
|
||||||
|
group.bench_with_input(
|
||||||
|
BenchmarkId::from_parameter(num_clients),
|
||||||
|
num_clients,
|
||||||
|
|b, &num_clients| {
|
||||||
|
b.to_async(&rt).iter(|| async {
|
||||||
|
let mut handles = vec![];
|
||||||
|
|
||||||
|
// Create many clients and make concurrent requests
|
||||||
|
for _ in 0..num_clients {
|
||||||
|
let handle = tokio::spawn(async move {
|
||||||
|
let client = SupervisorClientBuilder::new()
|
||||||
|
.url(SUPERVISOR_URL)
|
||||||
|
.secret(ADMIN_SECRET)
|
||||||
|
.build()
|
||||||
|
.expect("Failed to create client");
|
||||||
|
|
||||||
|
client.get_supervisor_info().await
|
||||||
|
});
|
||||||
|
handles.push(handle);
|
||||||
|
}
|
||||||
|
|
||||||
|
// Wait for all requests
|
||||||
|
for handle in handles {
|
||||||
|
let _ = black_box(handle.await);
|
||||||
|
}
|
||||||
|
});
|
||||||
|
},
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
group.finish();
|
||||||
|
}
|
||||||
|
|
||||||
|
criterion_group!(
|
||||||
|
stress_tests,
|
||||||
|
stress_high_frequency_jobs,
|
||||||
|
stress_sustained_load,
|
||||||
|
stress_large_payloads,
|
||||||
|
stress_rapid_api_calls,
|
||||||
|
stress_mixed_workload,
|
||||||
|
stress_connection_pool,
|
||||||
|
);
|
||||||
|
|
||||||
|
criterion_main!(stress_tests);
|
||||||
40
bin/horus/Cargo.toml
Normal file
40
bin/horus/Cargo.toml
Normal file
@@ -0,0 +1,40 @@
|
|||||||
|
[package]
|
||||||
|
name = "horus-mono"
|
||||||
|
version.workspace = true
|
||||||
|
edition.workspace = true
|
||||||
|
authors.workspace = true
|
||||||
|
license.workspace = true
|
||||||
|
repository.workspace = true
|
||||||
|
|
||||||
|
[[bin]]
|
||||||
|
name = "horus"
|
||||||
|
path = "src/main.rs"
|
||||||
|
|
||||||
|
[dependencies]
|
||||||
|
# Workspace dependencies
|
||||||
|
tokio = { workspace = true }
|
||||||
|
clap = { workspace = true }
|
||||||
|
log = { workspace = true }
|
||||||
|
env_logger = { workspace = true }
|
||||||
|
anyhow = { workspace = true }
|
||||||
|
|
||||||
|
# Internal dependencies - coordinator
|
||||||
|
hero-coordinator = { path = "../coordinator" }
|
||||||
|
hero-supervisor-openrpc-client = { path = "../../lib/clients/supervisor" }
|
||||||
|
|
||||||
|
# Internal dependencies - supervisor
|
||||||
|
hero-supervisor = { path = "../supervisor" }
|
||||||
|
|
||||||
|
# Internal dependencies - osiris server
|
||||||
|
osiris-core = { path = "../../lib/osiris/core" }
|
||||||
|
axum = "0.7"
|
||||||
|
tower = "0.4"
|
||||||
|
tower-http = { workspace = true }
|
||||||
|
serde = { workspace = true }
|
||||||
|
serde_json = { workspace = true }
|
||||||
|
tracing = { workspace = true }
|
||||||
|
tracing-subscriber = { workspace = true }
|
||||||
|
|
||||||
|
# Internal dependencies - runners
|
||||||
|
hero-runner = { path = "../../lib/runner" }
|
||||||
|
hero-job = { path = "../../lib/models/job" }
|
||||||
145
bin/horus/README.md
Normal file
145
bin/horus/README.md
Normal file
@@ -0,0 +1,145 @@
|
|||||||
|
# Horus - Hero System Mono Binary
|
||||||
|
|
||||||
|
A unified binary that runs all Hero system components: coordinator, supervisor, osiris server, and runners.
|
||||||
|
|
||||||
|
## Installation
|
||||||
|
|
||||||
|
Build the binary:
|
||||||
|
```bash
|
||||||
|
cargo build -p horus-mono --release
|
||||||
|
```
|
||||||
|
|
||||||
|
The binary will be available at `target/release/horus`.
|
||||||
|
|
||||||
|
## Usage
|
||||||
|
|
||||||
|
### Run Individual Services
|
||||||
|
|
||||||
|
#### Coordinator
|
||||||
|
Manages job execution across runners:
|
||||||
|
```bash
|
||||||
|
horus coordinator \
|
||||||
|
--mycelium-ip 127.0.0.1 \
|
||||||
|
--mycelium-port 8990 \
|
||||||
|
--redis-addr 127.0.0.1:6379 \
|
||||||
|
--api-http-ip 127.0.0.1 \
|
||||||
|
--api-http-port 9652 \
|
||||||
|
--api-ws-ip 127.0.0.1 \
|
||||||
|
--api-ws-port 9653
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Supervisor
|
||||||
|
Manages actors and dispatches jobs:
|
||||||
|
```bash
|
||||||
|
horus supervisor \
|
||||||
|
--redis-url redis://127.0.0.1:6379 \
|
||||||
|
--admin-secret your-admin-secret \
|
||||||
|
--port 3030 \
|
||||||
|
--bind-address 127.0.0.1 \
|
||||||
|
--runners osiris,sal,hero
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Osiris Server
|
||||||
|
REST API server for Osiris data structures:
|
||||||
|
```bash
|
||||||
|
horus osiris \
|
||||||
|
--bind-address 0.0.0.0 \
|
||||||
|
--port 8081
|
||||||
|
```
|
||||||
|
|
||||||
|
### Run All Services Together
|
||||||
|
|
||||||
|
Start all services with a single command:
|
||||||
|
```bash
|
||||||
|
horus all \
|
||||||
|
--redis-url redis://127.0.0.1:6379 \
|
||||||
|
--admin-secret your-admin-secret
|
||||||
|
```
|
||||||
|
|
||||||
|
**Kill existing processes on ports before starting:**
|
||||||
|
```bash
|
||||||
|
horus all \
|
||||||
|
--redis-url redis://127.0.0.1:6379 \
|
||||||
|
--admin-secret your-admin-secret \
|
||||||
|
--kill-ports
|
||||||
|
```
|
||||||
|
|
||||||
|
This will start:
|
||||||
|
- **Supervisor** on `http://127.0.0.1:3030`
|
||||||
|
- **Coordinator HTTP** on `http://127.0.0.1:9652`
|
||||||
|
- **Coordinator WebSocket** on `ws://127.0.0.1:9653`
|
||||||
|
- **Osiris Server** on `http://0.0.0.0:8081`
|
||||||
|
|
||||||
|
The `--kill-ports` flag will automatically kill any processes using ports 3030, 8081, 9652, and 9653 before starting the services.
|
||||||
|
|
||||||
|
## Environment Variables
|
||||||
|
|
||||||
|
You can also configure services using environment variables:
|
||||||
|
|
||||||
|
### Coordinator
|
||||||
|
- `MYCELIUM_IP` - Mycelium IP address (default: 127.0.0.1)
|
||||||
|
- `MYCELIUM_PORT` - Mycelium port (default: 8990)
|
||||||
|
- `REDIS_ADDR` - Redis address (default: 127.0.0.1:6379)
|
||||||
|
- `API_HTTP_IP` - HTTP API bind IP (default: 127.0.0.1)
|
||||||
|
- `API_HTTP_PORT` - HTTP API port (default: 9652)
|
||||||
|
- `API_WS_IP` - WebSocket API bind IP (default: 127.0.0.1)
|
||||||
|
- `API_WS_PORT` - WebSocket API port (default: 9653)
|
||||||
|
|
||||||
|
### Logging
|
||||||
|
|
||||||
|
Set the `RUST_LOG` environment variable to control logging:
|
||||||
|
```bash
|
||||||
|
RUST_LOG=info horus all --admin-secret your-secret
|
||||||
|
```
|
||||||
|
|
||||||
|
Available levels: `error`, `warn`, `info`, `debug`, `trace`
|
||||||
|
|
||||||
|
## Prerequisites
|
||||||
|
|
||||||
|
- Redis server running on localhost:6379 (or specify custom address)
|
||||||
|
- For coordinator: Mycelium service running (if using Mycelium transport)
|
||||||
|
|
||||||
|
## Architecture
|
||||||
|
|
||||||
|
The horus binary consolidates the following components:
|
||||||
|
|
||||||
|
1. **Coordinator** - Routes jobs between contexts and manages job execution
|
||||||
|
2. **Supervisor** - Manages runner registration and job dispatching
|
||||||
|
3. **Osiris Server** - Provides REST API for Osiris data structures
|
||||||
|
4. **Runners** (not included in mono binary, run separately):
|
||||||
|
- OSIRIS runner - Script execution with Osiris support
|
||||||
|
- SAL runner - Script execution with SAL support
|
||||||
|
- Hero runner - Command execution
|
||||||
|
|
||||||
|
## Examples
|
||||||
|
|
||||||
|
### Development Setup
|
||||||
|
```bash
|
||||||
|
# Start Redis
|
||||||
|
redis-server
|
||||||
|
|
||||||
|
# Run all services (kills any existing processes on required ports)
|
||||||
|
RUST_LOG=info horus all --admin-secret dev-secret --kill-ports
|
||||||
|
```
|
||||||
|
|
||||||
|
### Production Setup
|
||||||
|
```bash
|
||||||
|
# Build release binary
|
||||||
|
cargo build -p horus-mono --release
|
||||||
|
|
||||||
|
# Run with production settings
|
||||||
|
RUST_LOG=warn ./target/release/horus all \
|
||||||
|
--redis-url redis://prod-redis:6379 \
|
||||||
|
--admin-secret $ADMIN_SECRET
|
||||||
|
```
|
||||||
|
|
||||||
|
## Help
|
||||||
|
|
||||||
|
For detailed help on any command:
|
||||||
|
```bash
|
||||||
|
horus --help
|
||||||
|
horus coordinator --help
|
||||||
|
horus supervisor --help
|
||||||
|
horus osiris --help
|
||||||
|
horus all --help
|
||||||
|
```
|
||||||
569
bin/horus/src/main.rs
Normal file
569
bin/horus/src/main.rs
Normal file
@@ -0,0 +1,569 @@
|
|||||||
|
//! Horus - Mono binary for running all Hero components
|
||||||
|
//!
|
||||||
|
//! This binary provides subcommands to run:
|
||||||
|
//! - coordinator: Job coordination service
|
||||||
|
//! - supervisor: Actor and job management
|
||||||
|
//! - osiris: REST API server
|
||||||
|
//! - runner-osiris: Osiris script runner
|
||||||
|
//! - runner-sal: SAL script runner
|
||||||
|
//! - runner-hero: Command execution runner
|
||||||
|
//! - all: Run all components together
|
||||||
|
|
||||||
|
use clap::{Parser, Subcommand};
|
||||||
|
|
||||||
|
#[derive(Parser)]
|
||||||
|
#[command(name = "horus")]
|
||||||
|
#[command(about = "Horus - Hero system mono binary", long_about = None)]
|
||||||
|
struct Cli {
|
||||||
|
#[command(subcommand)]
|
||||||
|
command: Commands,
|
||||||
|
}
|
||||||
|
|
||||||
|
#[derive(Subcommand)]
|
||||||
|
enum Commands {
|
||||||
|
/// Run the coordinator service
|
||||||
|
Coordinator {
|
||||||
|
#[arg(long, default_value = "127.0.0.1")]
|
||||||
|
mycelium_ip: String,
|
||||||
|
|
||||||
|
#[arg(long, default_value = "8990")]
|
||||||
|
mycelium_port: u16,
|
||||||
|
|
||||||
|
#[arg(long, default_value = "127.0.0.1:6379")]
|
||||||
|
redis_addr: String,
|
||||||
|
|
||||||
|
#[arg(long, default_value = "127.0.0.1")]
|
||||||
|
api_http_ip: String,
|
||||||
|
|
||||||
|
#[arg(long, default_value = "9652")]
|
||||||
|
api_http_port: u16,
|
||||||
|
|
||||||
|
#[arg(long, default_value = "127.0.0.1")]
|
||||||
|
api_ws_ip: String,
|
||||||
|
|
||||||
|
#[arg(long, default_value = "9653")]
|
||||||
|
api_ws_port: u16,
|
||||||
|
},
|
||||||
|
|
||||||
|
/// Run the supervisor service
|
||||||
|
Supervisor {
|
||||||
|
#[arg(long, default_value = "redis://127.0.0.1:6379")]
|
||||||
|
redis_url: String,
|
||||||
|
|
||||||
|
#[arg(long, default_value = "")]
|
||||||
|
namespace: String,
|
||||||
|
|
||||||
|
#[arg(long = "admin-secret", required = true)]
|
||||||
|
admin_secrets: Vec<String>,
|
||||||
|
|
||||||
|
#[arg(long = "user-secret")]
|
||||||
|
user_secrets: Vec<String>,
|
||||||
|
|
||||||
|
#[arg(long = "register-secret")]
|
||||||
|
register_secrets: Vec<String>,
|
||||||
|
|
||||||
|
#[arg(long, default_value = "3030")]
|
||||||
|
port: u16,
|
||||||
|
|
||||||
|
#[arg(long, default_value = "127.0.0.1")]
|
||||||
|
bind_address: String,
|
||||||
|
|
||||||
|
#[arg(long, value_delimiter = ',')]
|
||||||
|
runners: Vec<String>,
|
||||||
|
},
|
||||||
|
|
||||||
|
/// Run the Osiris REST API server
|
||||||
|
Osiris {
|
||||||
|
#[arg(long, default_value = "0.0.0.0")]
|
||||||
|
bind_address: String,
|
||||||
|
|
||||||
|
#[arg(long, default_value = "8081")]
|
||||||
|
port: u16,
|
||||||
|
},
|
||||||
|
|
||||||
|
/// Run all services together
|
||||||
|
All {
|
||||||
|
#[arg(long, default_value = "redis://127.0.0.1:6379")]
|
||||||
|
redis_url: String,
|
||||||
|
|
||||||
|
#[arg(long = "admin-secret", required = true)]
|
||||||
|
admin_secrets: Vec<String>,
|
||||||
|
|
||||||
|
#[arg(long, help = "Kill processes using required ports before starting")]
|
||||||
|
kill_ports: bool,
|
||||||
|
},
|
||||||
|
}
|
||||||
|
|
||||||
|
#[tokio::main]
|
||||||
|
async fn main() -> Result<(), Box<dyn std::error::Error>> {
|
||||||
|
let cli = Cli::parse();
|
||||||
|
|
||||||
|
match cli.command {
|
||||||
|
Commands::Coordinator {
|
||||||
|
mycelium_ip,
|
||||||
|
mycelium_port,
|
||||||
|
redis_addr,
|
||||||
|
api_http_ip,
|
||||||
|
api_http_port,
|
||||||
|
api_ws_ip,
|
||||||
|
api_ws_port,
|
||||||
|
} => {
|
||||||
|
run_coordinator(
|
||||||
|
mycelium_ip,
|
||||||
|
mycelium_port,
|
||||||
|
redis_addr,
|
||||||
|
api_http_ip,
|
||||||
|
api_http_port,
|
||||||
|
api_ws_ip,
|
||||||
|
api_ws_port,
|
||||||
|
false,
|
||||||
|
).await?;
|
||||||
|
}
|
||||||
|
|
||||||
|
Commands::Supervisor {
|
||||||
|
redis_url,
|
||||||
|
namespace,
|
||||||
|
admin_secrets,
|
||||||
|
user_secrets,
|
||||||
|
register_secrets,
|
||||||
|
port,
|
||||||
|
bind_address,
|
||||||
|
runners,
|
||||||
|
} => {
|
||||||
|
run_supervisor(
|
||||||
|
redis_url,
|
||||||
|
namespace,
|
||||||
|
admin_secrets,
|
||||||
|
user_secrets,
|
||||||
|
register_secrets,
|
||||||
|
port,
|
||||||
|
bind_address,
|
||||||
|
runners,
|
||||||
|
false,
|
||||||
|
).await?;
|
||||||
|
}
|
||||||
|
|
||||||
|
Commands::Osiris { bind_address, port } => {
|
||||||
|
run_osiris(bind_address, port, false).await?;
|
||||||
|
}
|
||||||
|
|
||||||
|
Commands::All {
|
||||||
|
redis_url,
|
||||||
|
admin_secrets,
|
||||||
|
kill_ports,
|
||||||
|
} => {
|
||||||
|
run_all(redis_url, admin_secrets, kill_ports).await?;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
Ok(())
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn run_coordinator(
|
||||||
|
mycelium_ip: String,
|
||||||
|
mycelium_port: u16,
|
||||||
|
redis_addr: String,
|
||||||
|
api_http_ip: String,
|
||||||
|
api_http_port: u16,
|
||||||
|
api_ws_ip: String,
|
||||||
|
api_ws_port: u16,
|
||||||
|
skip_logging_init: bool,
|
||||||
|
) -> Result<(), Box<dyn std::error::Error>> {
|
||||||
|
use std::net::{IpAddr, SocketAddr};
|
||||||
|
use std::sync::Arc;
|
||||||
|
use tracing::{error, info};
|
||||||
|
use tracing_subscriber::EnvFilter;
|
||||||
|
|
||||||
|
if !skip_logging_init {
|
||||||
|
let filter = EnvFilter::try_from_default_env().unwrap_or_else(|_| EnvFilter::new("info"));
|
||||||
|
tracing_subscriber::fmt()
|
||||||
|
.with_env_filter(filter)
|
||||||
|
.pretty()
|
||||||
|
.with_target(true)
|
||||||
|
.with_level(true)
|
||||||
|
.init();
|
||||||
|
}
|
||||||
|
|
||||||
|
let mycelium_ip: IpAddr = mycelium_ip.parse()?;
|
||||||
|
let api_http_ip: IpAddr = api_http_ip.parse()?;
|
||||||
|
let api_ws_ip: IpAddr = api_ws_ip.parse()?;
|
||||||
|
let redis_addr: SocketAddr = redis_addr.parse()?;
|
||||||
|
|
||||||
|
let http_addr = SocketAddr::new(api_http_ip, api_http_port);
|
||||||
|
let ws_addr = SocketAddr::new(api_ws_ip, api_ws_port);
|
||||||
|
|
||||||
|
let redis = hero_coordinator::storage::RedisDriver::new(redis_addr.to_string())
|
||||||
|
.await
|
||||||
|
.expect("Failed to connect to Redis");
|
||||||
|
|
||||||
|
let service = hero_coordinator::service::AppService::new(redis);
|
||||||
|
let service_for_router = service.clone();
|
||||||
|
|
||||||
|
let state = Arc::new(hero_coordinator::rpc::AppState::new(service));
|
||||||
|
|
||||||
|
// Only initialize router if not skipping logging (i.e., not in "all" mode)
|
||||||
|
// In "all" mode, we skip Mycelium since everything is local
|
||||||
|
if !skip_logging_init {
|
||||||
|
let base_url = format!("http://{}:{}", mycelium_ip, mycelium_port);
|
||||||
|
let mycelium = Arc::new(
|
||||||
|
hero_supervisor_openrpc_client::transports::MyceliumClient::new(&base_url)
|
||||||
|
.expect("Failed to create MyceliumClient")
|
||||||
|
);
|
||||||
|
let hub = hero_supervisor_openrpc_client::transports::SupervisorHub::new_with_client(
|
||||||
|
mycelium,
|
||||||
|
"supervisor.rpc".to_string(),
|
||||||
|
);
|
||||||
|
let cfg = hero_coordinator::router::RouterConfig {
|
||||||
|
context_ids: Vec::new(),
|
||||||
|
concurrency: 32,
|
||||||
|
base_url,
|
||||||
|
topic: "supervisor.rpc".to_string(),
|
||||||
|
sup_hub: hub.clone(),
|
||||||
|
transport_poll_interval_secs: 2,
|
||||||
|
transport_poll_timeout_secs: 300,
|
||||||
|
};
|
||||||
|
let _auto_handle = hero_coordinator::router::start_router_auto(service_for_router, cfg);
|
||||||
|
}
|
||||||
|
|
||||||
|
let http_module = hero_coordinator::rpc::build_module(state.clone());
|
||||||
|
let ws_module = hero_coordinator::rpc::build_module(state.clone());
|
||||||
|
|
||||||
|
info!(%http_addr, %ws_addr, %redis_addr, "Starting Coordinator JSON-RPC servers");
|
||||||
|
|
||||||
|
let _http_handle = match hero_coordinator::rpc::start_http(http_addr, http_module).await {
|
||||||
|
Ok(handle) => handle,
|
||||||
|
Err(e) => {
|
||||||
|
error!("Failed to start HTTP server on {}: {}", http_addr, e);
|
||||||
|
return Err(format!("Failed to start HTTP server: {}", e).into());
|
||||||
|
}
|
||||||
|
};
|
||||||
|
let _ws_handle = match hero_coordinator::rpc::start_ws(ws_addr, ws_module).await {
|
||||||
|
Ok(handle) => handle,
|
||||||
|
Err(e) => {
|
||||||
|
error!("Failed to start WS server on {}: {}", ws_addr, e);
|
||||||
|
return Err(format!("Failed to start WS server: {}", e).into());
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
if let Err(e) = tokio::signal::ctrl_c().await {
|
||||||
|
error!(error=%e, "Failed to listen for shutdown signal");
|
||||||
|
}
|
||||||
|
info!("Shutdown signal received, exiting.");
|
||||||
|
|
||||||
|
Ok(())
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn run_supervisor(
|
||||||
|
_redis_url: String,
|
||||||
|
_namespace: String,
|
||||||
|
admin_secrets: Vec<String>,
|
||||||
|
user_secrets: Vec<String>,
|
||||||
|
register_secrets: Vec<String>,
|
||||||
|
port: u16,
|
||||||
|
bind_address: String,
|
||||||
|
runners: Vec<String>,
|
||||||
|
skip_logging_init: bool,
|
||||||
|
) -> Result<(), Box<dyn std::error::Error>> {
|
||||||
|
use hero_supervisor::SupervisorBuilder;
|
||||||
|
use log::{error, info};
|
||||||
|
|
||||||
|
if !skip_logging_init {
|
||||||
|
env_logger::init();
|
||||||
|
}
|
||||||
|
|
||||||
|
let mut builder = SupervisorBuilder::new()
|
||||||
|
.admin_secrets(admin_secrets);
|
||||||
|
|
||||||
|
if !user_secrets.is_empty() {
|
||||||
|
builder = builder.user_secrets(user_secrets);
|
||||||
|
}
|
||||||
|
|
||||||
|
if !register_secrets.is_empty() {
|
||||||
|
builder = builder.register_secrets(register_secrets);
|
||||||
|
}
|
||||||
|
|
||||||
|
let supervisor = builder.build().await?;
|
||||||
|
|
||||||
|
if !runners.is_empty() {
|
||||||
|
for runner_name in &runners {
|
||||||
|
match supervisor.runner_create(runner_name.clone()).await {
|
||||||
|
Ok(_) => {},
|
||||||
|
Err(e) => error!("Failed to register runner '{}': {}", runner_name, e),
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
use hero_supervisor::openrpc::start_http_openrpc_server;
|
||||||
|
|
||||||
|
let supervisor_clone = supervisor.clone();
|
||||||
|
let bind_addr = bind_address.clone();
|
||||||
|
|
||||||
|
tokio::spawn(async move {
|
||||||
|
match start_http_openrpc_server(supervisor_clone, &bind_addr, port).await {
|
||||||
|
Ok(handle) => {
|
||||||
|
handle.stopped().await;
|
||||||
|
error!("OpenRPC server stopped unexpectedly");
|
||||||
|
}
|
||||||
|
Err(e) => {
|
||||||
|
error!("OpenRPC server error: {}", e);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
tokio::time::sleep(tokio::time::Duration::from_millis(500)).await;
|
||||||
|
|
||||||
|
println!("📡 Supervisor: http://{}:{}", bind_address, port);
|
||||||
|
info!("Hero Supervisor is running. Press Ctrl+C to shutdown.");
|
||||||
|
|
||||||
|
tokio::spawn(async move {
|
||||||
|
tokio::signal::ctrl_c().await.expect("Failed to listen for ctrl+c");
|
||||||
|
info!("Received shutdown signal");
|
||||||
|
std::process::exit(0);
|
||||||
|
});
|
||||||
|
|
||||||
|
loop {
|
||||||
|
tokio::time::sleep(tokio::time::Duration::from_secs(1)).await;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn run_osiris(
|
||||||
|
bind_address: String,
|
||||||
|
port: u16,
|
||||||
|
skip_logging_init: bool,
|
||||||
|
) -> Result<(), Box<dyn std::error::Error>> {
|
||||||
|
use axum::{
|
||||||
|
extract::{Path, Query, State},
|
||||||
|
http::StatusCode,
|
||||||
|
response::{IntoResponse, Json},
|
||||||
|
routing::get,
|
||||||
|
Router,
|
||||||
|
};
|
||||||
|
use serde_json::{json, Value};
|
||||||
|
use std::collections::HashMap;
|
||||||
|
use std::sync::Arc;
|
||||||
|
use tower_http::cors::{Any, CorsLayer};
|
||||||
|
use tracing::{info, warn};
|
||||||
|
|
||||||
|
if !skip_logging_init {
|
||||||
|
tracing_subscriber::fmt()
|
||||||
|
.with_target(false)
|
||||||
|
.compact()
|
||||||
|
.init();
|
||||||
|
}
|
||||||
|
|
||||||
|
#[derive(Clone)]
|
||||||
|
struct AppState {
|
||||||
|
store: Arc<tokio::sync::RwLock<HashMap<String, HashMap<String, Value>>>>,
|
||||||
|
}
|
||||||
|
|
||||||
|
impl AppState {
|
||||||
|
fn new() -> Self {
|
||||||
|
Self {
|
||||||
|
store: Arc::new(tokio::sync::RwLock::new(HashMap::new())),
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn health_check() -> impl IntoResponse {
|
||||||
|
Json(json!({
|
||||||
|
"status": "healthy",
|
||||||
|
"service": "osiris-server",
|
||||||
|
"version": "0.1.0"
|
||||||
|
}))
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn get_struct(
|
||||||
|
State(state): State<AppState>,
|
||||||
|
Path((struct_name, id)): Path<(String, String)>,
|
||||||
|
) -> Result<Json<Value>, (StatusCode, String)> {
|
||||||
|
info!("GET /api/{}/{}", struct_name, id);
|
||||||
|
|
||||||
|
let store = state.store.read().await;
|
||||||
|
|
||||||
|
if let Some(struct_store) = store.get(&struct_name) {
|
||||||
|
if let Some(data) = struct_store.get(&id) {
|
||||||
|
return Ok(Json(data.clone()));
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
warn!("Not found: {}/{}", struct_name, id);
|
||||||
|
Err((
|
||||||
|
StatusCode::NOT_FOUND,
|
||||||
|
format!("{}/{} not found", struct_name, id),
|
||||||
|
))
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn list_structs(
|
||||||
|
State(state): State<AppState>,
|
||||||
|
Path(struct_name): Path<String>,
|
||||||
|
Query(params): Query<HashMap<String, String>>,
|
||||||
|
) -> Result<Json<Vec<Value>>, (StatusCode, String)> {
|
||||||
|
info!("GET /api/{} with params: {:?}", struct_name, params);
|
||||||
|
|
||||||
|
let store = state.store.read().await;
|
||||||
|
|
||||||
|
if let Some(struct_store) = store.get(&struct_name) {
|
||||||
|
let mut results: Vec<Value> = struct_store.values().cloned().collect();
|
||||||
|
|
||||||
|
if !params.is_empty() {
|
||||||
|
results.retain(|item| {
|
||||||
|
params.iter().all(|(key, value)| {
|
||||||
|
item.get(key)
|
||||||
|
.and_then(|v| v.as_str())
|
||||||
|
.map(|v| v == value)
|
||||||
|
.unwrap_or(false)
|
||||||
|
})
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
return Ok(Json(results));
|
||||||
|
}
|
||||||
|
|
||||||
|
Ok(Json(vec![]))
|
||||||
|
}
|
||||||
|
|
||||||
|
let state = AppState::new();
|
||||||
|
|
||||||
|
let app = Router::new()
|
||||||
|
.route("/health", get(health_check))
|
||||||
|
.route("/api/:struct_name", get(list_structs))
|
||||||
|
.route("/api/:struct_name/:id", get(get_struct))
|
||||||
|
.layer(
|
||||||
|
CorsLayer::new()
|
||||||
|
.allow_origin(Any)
|
||||||
|
.allow_methods(Any)
|
||||||
|
.allow_headers(Any),
|
||||||
|
)
|
||||||
|
.with_state(state);
|
||||||
|
|
||||||
|
let addr = format!("{}:{}", bind_address, port);
|
||||||
|
info!("🚀 Osiris Server starting on {}", addr);
|
||||||
|
|
||||||
|
let listener = tokio::net::TcpListener::bind(&addr)
|
||||||
|
.await
|
||||||
|
.expect("Failed to bind address");
|
||||||
|
|
||||||
|
axum::serve(listener, app)
|
||||||
|
.await
|
||||||
|
.expect("Server failed");
|
||||||
|
|
||||||
|
Ok(())
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Kill any process using the specified port
|
||||||
|
async fn kill_port(port: u16) -> Result<(), Box<dyn std::error::Error>> {
|
||||||
|
use std::process::Command;
|
||||||
|
use log::info;
|
||||||
|
|
||||||
|
// Use lsof to find the process using the port
|
||||||
|
let output = Command::new("lsof")
|
||||||
|
.args(&["-ti", &format!(":{}", port)])
|
||||||
|
.output()?;
|
||||||
|
|
||||||
|
if !output.status.success() || output.stdout.is_empty() {
|
||||||
|
// No process found on this port
|
||||||
|
return Ok(());
|
||||||
|
}
|
||||||
|
|
||||||
|
let pid_str = String::from_utf8_lossy(&output.stdout);
|
||||||
|
let pids: Vec<&str> = pid_str.trim().lines().collect();
|
||||||
|
|
||||||
|
for pid in pids {
|
||||||
|
if let Ok(pid_num) = pid.trim().parse::<i32>() {
|
||||||
|
info!("Killing process {} on port {}", pid_num, port);
|
||||||
|
let _ = Command::new("kill")
|
||||||
|
.arg(pid)
|
||||||
|
.output();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
Ok(())
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn run_all(
|
||||||
|
redis_url: String,
|
||||||
|
admin_secrets: Vec<String>,
|
||||||
|
kill_ports: bool,
|
||||||
|
) -> Result<(), Box<dyn std::error::Error>> {
|
||||||
|
use log::{info, warn};
|
||||||
|
|
||||||
|
// Initialize logging once for all services
|
||||||
|
env_logger::init();
|
||||||
|
|
||||||
|
// Kill processes on required ports if requested
|
||||||
|
if kill_ports {
|
||||||
|
let ports = vec![3030, 8081, 9652, 9653];
|
||||||
|
info!("🔪 Killing processes on ports: {:?}", ports);
|
||||||
|
|
||||||
|
for port in ports {
|
||||||
|
if let Err(e) = kill_port(port).await {
|
||||||
|
warn!("Failed to kill port {}: {}", port, e);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Give the OS a moment to release the ports
|
||||||
|
tokio::time::sleep(tokio::time::Duration::from_millis(500)).await;
|
||||||
|
}
|
||||||
|
|
||||||
|
info!("🚀 Starting all Horus services...");
|
||||||
|
|
||||||
|
// Start Osiris server
|
||||||
|
let osiris_handle = tokio::spawn(async move {
|
||||||
|
if let Err(e) = run_osiris("0.0.0.0".to_string(), 8081, true).await {
|
||||||
|
eprintln!("Osiris server error: {}", e);
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
// Start Supervisor
|
||||||
|
let redis_url_clone = redis_url.clone();
|
||||||
|
let admin_secrets_clone = admin_secrets.clone();
|
||||||
|
let supervisor_handle = tokio::spawn(async move {
|
||||||
|
if let Err(e) = run_supervisor(
|
||||||
|
redis_url_clone,
|
||||||
|
"".to_string(),
|
||||||
|
admin_secrets_clone,
|
||||||
|
vec![],
|
||||||
|
vec![],
|
||||||
|
3030,
|
||||||
|
"127.0.0.1".to_string(),
|
||||||
|
vec!["osiris".to_string(), "sal".to_string(), "hero".to_string()],
|
||||||
|
true,
|
||||||
|
).await {
|
||||||
|
eprintln!("Supervisor error: {}", e);
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
// Give supervisor time to start
|
||||||
|
tokio::time::sleep(tokio::time::Duration::from_secs(2)).await;
|
||||||
|
|
||||||
|
// Start Coordinator
|
||||||
|
let coordinator_handle = tokio::spawn(async move {
|
||||||
|
if let Err(e) = run_coordinator(
|
||||||
|
"127.0.0.1".to_string(),
|
||||||
|
8990,
|
||||||
|
"127.0.0.1:6379".to_string(),
|
||||||
|
"127.0.0.1".to_string(),
|
||||||
|
9652,
|
||||||
|
"127.0.0.1".to_string(),
|
||||||
|
9653,
|
||||||
|
true,
|
||||||
|
).await {
|
||||||
|
eprintln!("Coordinator error: {}", e);
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
info!("✅ All services started:");
|
||||||
|
info!(" 📡 Supervisor: http://127.0.0.1:3030");
|
||||||
|
info!(" 🔗 Coordinator HTTP: http://127.0.0.1:9652");
|
||||||
|
info!(" 🔗 Coordinator WS: ws://127.0.0.1:9653");
|
||||||
|
info!(" 🌐 Osiris: http://0.0.0.0:8081");
|
||||||
|
|
||||||
|
// Wait for all services
|
||||||
|
tokio::select! {
|
||||||
|
_ = osiris_handle => {},
|
||||||
|
_ = supervisor_handle => {},
|
||||||
|
_ = coordinator_handle => {},
|
||||||
|
}
|
||||||
|
|
||||||
|
Ok(())
|
||||||
|
}
|
||||||
@@ -159,7 +159,7 @@ mod tests {
|
|||||||
.payload("test payload")
|
.payload("test payload")
|
||||||
.build()
|
.build()
|
||||||
.unwrap();
|
.unwrap();
|
||||||
job.id = id.to_string(); // Set ID manually
|
// job.id = id.to_string(); // Set ID manually
|
||||||
job
|
job
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|||||||
46
scripts/configure.md
Normal file
46
scripts/configure.md
Normal file
@@ -0,0 +1,46 @@
|
|||||||
|
# Horus Configuration Heroscript
|
||||||
|
|
||||||
|
## Configure Coordinator
|
||||||
|
|
||||||
|
!!coordinator.configure
|
||||||
|
name:'default'
|
||||||
|
binary_path:'/hero/var/bin/coordinator'
|
||||||
|
redis_addr:'127.0.0.1:6379'
|
||||||
|
http_port:8081
|
||||||
|
ws_port:9653
|
||||||
|
log_level:'info'
|
||||||
|
repo_path:'/root/code/git.ourworld.tf/herocode/horus'
|
||||||
|
|
||||||
|
## Configure Supervisor
|
||||||
|
!!supervisor.configure
|
||||||
|
name:'default'
|
||||||
|
binary_path:'/hero/var/bin/supervisor'
|
||||||
|
redis_addr:'127.0.0.1:6379'
|
||||||
|
http_port:8082
|
||||||
|
ws_port:9654
|
||||||
|
log_level:'info'
|
||||||
|
repo_path:'/root/code/git.ourworld.tf/herocode/horus'
|
||||||
|
|
||||||
|
## Configure Hero Runner
|
||||||
|
!!herorunner.configure
|
||||||
|
name:'default'
|
||||||
|
binary_path:'/hero/var/bin/herorunner'
|
||||||
|
redis_addr:'127.0.0.1:6379'
|
||||||
|
log_level:'info'
|
||||||
|
repo_path:'/root/code/git.ourworld.tf/herocode/horus'
|
||||||
|
|
||||||
|
## Configure Osiris Runner
|
||||||
|
!!osirisrunner.configure
|
||||||
|
name:'default'
|
||||||
|
binary_path:'/hero/var/bin/runner_osiris'
|
||||||
|
redis_addr:'127.0.0.1:6379'
|
||||||
|
log_level:'info'
|
||||||
|
repo_path:'/root/code/git.ourworld.tf/herocode/horus'
|
||||||
|
|
||||||
|
## Configure SAL Runner
|
||||||
|
!!salrunner.configure
|
||||||
|
name:'default'
|
||||||
|
binary_path:'/hero/var/bin/runner_sal'
|
||||||
|
redis_addr:'127.0.0.1:6379'
|
||||||
|
log_level:'info'
|
||||||
|
repo_path:'/root/code/git.ourworld.tf/herocode/horus'
|
||||||
@@ -0,0 +1,6 @@
|
|||||||
|
// Install all components
|
||||||
|
!!herocoordinator.install
|
||||||
|
!!supervisor.install
|
||||||
|
!!herorunner.install
|
||||||
|
!!osirisrunner.install
|
||||||
|
!!salrunner.install
|
||||||
|
|||||||
12
scripts/start.md
Normal file
12
scripts/start.md
Normal file
@@ -0,0 +1,12 @@
|
|||||||
|
# Horus Start Script
|
||||||
|
|
||||||
|
Starts all horus binaries
|
||||||
|
|
||||||
|
!!include install.md
|
||||||
|
|
||||||
|
// Start all services
|
||||||
|
!!herocoordinator.start name:'default'
|
||||||
|
!!supervisor.start name:'default'
|
||||||
|
!!herorunner.start name:'default'
|
||||||
|
!!osirisrunner.start name:'default'
|
||||||
|
!!salrunner.start name:'default'
|
||||||
Reference in New Issue
Block a user