All benchmarks talk to the stack through the official client libraries in /lib/clients, which are the only supported way to interact with the system.

Prerequisites

Before running benchmarks, you must have the Horus stack running:

# Start Redis
redis-server

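# Start all Horus services (adjust the path below to your local horus checkout;
# the command assumes the release binary has already been built)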
cd /Users/timurgordon/code/git.ourworld.tf/herocode/horus
RUST_LOG=info ./target/release/horus all --admin-secret SECRET --kill-ports

The benchmarks expect:

Supervisor running on http://127.0.0.1:3030
Coordinator running on http://127.0.0.1:9652 (HTTP) and ws://127.0.0.1:9653 (WebSocket)
Osiris running on http://127.0.0.1:8081
Redis running on 127.0.0.1:6379
Admin secret: SECRET
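
Before starting a long run, it can help to confirm that every expected port is accepting connections. The following is a minimal sketch using only the Rust standard library; it is not part of the Horus code base, and the names EXPECTED, is_listening and missing_services are illustrative. It only checks that a TCP connection can be opened, not that the service behind the port is healthy.

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Endpoints the benchmarks expect, copied from the list above.
const EXPECTED: [(&str, &str); 5] = [
    ("Supervisor", "127.0.0.1:3030"),
    ("Coordinator (HTTP)", "127.0.0.1:9652"),
    ("Coordinator (WebSocket)", "127.0.0.1:9653"),
    ("Osiris", "127.0.0.1:8081"),
    ("Redis", "127.0.0.1:6379"),
];

/// Returns true if a TCP connection to `addr` succeeds within `timeout`.
/// An address that cannot be parsed counts as not listening.
fn is_listening(addr: &str, timeout: Duration) -> bool {
    match addr.parse::<SocketAddr>() {
        Ok(sock) => TcpStream::connect_timeout(&sock, timeout).is_ok(),
        Err(_) => false,
    }
}

/// Names of the expected services that are not accepting connections.
fn missing_services(timeout: Duration) -> Vec<&'static str> {
    EXPECTED
        .iter()
        .filter(|(_, addr)| !is_listening(addr, timeout))
        .map(|(name, _)| *name)
        .collect()
}
```

Calling missing_services(Duration::from_millis(200)) returns an empty list when all five ports are reachable; any name it returns points at the service to start first.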

Running Benchmarks

Run all benchmarks

cargo bench --bench horus_stack

Run a specific benchmark

cargo bench --bench horus_stack -- supervisor_discovery

Run all benchmarks whose name matches a filter

cargo bench --bench horus_stack -- concurrent

Print additional statistical detail

cargo bench --bench horus_stack -- --verbose
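
The argument after `--` is a filter on benchmark names, so a short word can select more than one benchmark. The sketch below approximates this with a plain substring match over the horus_stack benchmark names listed under Benchmark Categories. This is an assumption made for illustration: Criterion treats the filter as a regular expression, which behaves like a substring match for plain words. The function name selected is hypothetical.

```rust
/// Benchmark names of the horus_stack target, as listed in this README.
const HORUS_STACK_BENCHMARKS: [&str; 12] = [
    "supervisor_discovery",
    "supervisor_get_info",
    "supervisor_list_runners",
    "get_all_runner_status",
    "supervisor_job_create",
    "supervisor_job_list",
    "job_full_lifecycle",
    "concurrent_jobs",
    "osiris_health_check",
    "api_latency/supervisor_info",
    "api_latency/runner_list",
    "api_latency/job_list",
];

/// Names that a plain-word filter would select (substring approximation).
fn selected<'a>(names: &[&'a str], filter: &str) -> Vec<&'a str> {
    names
        .iter()
        .copied()
        .filter(|name| name.contains(filter))
        .collect()
}
```

Under this approximation, the filter job_list selects both supervisor_job_list and api_latency/job_list, so a longer filter is needed to run only one of them.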

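Benchmark Categories

The name in parentheses after each category is the bench target to pass to cargo bench --bench (horus_stack in the commands above).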

1. API Discovery & Metadata (`horus_stack`)

supervisor_discovery - OpenRPC metadata retrieval
supervisor_get_info - Supervisor information and stats

2. Runner Management (`horus_stack`)

supervisor_list_runners - List all registered runners
get_all_runner_status - Get status of all runners

3. Job Operations (`horus_stack`)

supervisor_job_create - Create job without execution
supervisor_job_list - List all jobs
job_full_lifecycle - Complete job lifecycle (create → execute → result)

4. Concurrency Tests (`horus_stack`)

concurrent_jobs - Submit multiple jobs concurrently (1, 5, 10, 20 jobs)

5. Health & Monitoring (`horus_stack`)

osiris_health_check - Osiris server health endpoint

6. API Latency (`horus_stack`)

api_latency/supervisor_info - Supervisor info latency
api_latency/runner_list - Runner list latency
api_latency/job_list - Job list latency

7. Stress Tests (`stress_test`)

stress_high_frequency_jobs - High-frequency submissions (50-200 jobs)
stress_sustained_load - Continuous load testing
stress_large_payloads - Large payload handling (1KB-100KB)
stress_rapid_api_calls - Rapid API calls (100 calls/iteration)
stress_mixed_workload - Mixed operation scenarios
stress_connection_pool - Connection pool exhaustion (10-100 clients)

8. Memory Usage (`memory_usage`)

memory_job_creation - Memory per job object (10-200 jobs)
memory_client_creation - Memory per client instance (1-100 clients)
memory_payload_sizes - Memory vs payload size (1KB-1MB)

See MEMORY_BENCHMARKS.md for detailed memory profiling documentation.

Interpreting Results

Criterion outputs detailed statistics including:

Mean time - Average execution time
Standard deviation - Variability in measurements
Median - Middle value (50th percentile)
MAD - Median Absolute Deviation
Throughput - Operations per second (reported only for benchmarks that configure a throughput)
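
To make these statistics concrete, here is a small sketch of the textbook definitions in plain Rust. It is not Criterion's implementation (Criterion runs its own analysis and reports confidence intervals); it only shows what each number means and why the median and MAD are less sensitive to outliers than the mean and standard deviation.

```rust
/// Arithmetic mean of the samples.
fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

/// Sample standard deviation (n - 1 in the denominator).
fn std_dev(xs: &[f64]) -> f64 {
    let m = mean(xs);
    let sum_sq: f64 = xs.iter().map(|x| (x - m).powi(2)).sum();
    (sum_sq / (xs.len() - 1) as f64).sqrt()
}

/// Median: the middle value of the sorted samples
/// (average of the two middle values for an even count).
fn median(xs: &[f64]) -> f64 {
    let mut sorted = xs.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let n = sorted.len();
    if n % 2 == 1 {
        sorted[n / 2]
    } else {
        (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0
    }
}

/// Median absolute deviation: the median distance from the median.
fn mad(xs: &[f64]) -> f64 {
    let m = median(xs);
    let deviations: Vec<f64> = xs.iter().map(|x| (x - m).abs()).collect();
    median(&deviations)
}
```

For the made-up samples [4.0, 5.0, 5.0, 6.0, 30.0] (in ms), the single slow run pulls the mean up to 10.0 while the median stays at 5.0 and the MAD at 1.0. When the mean and median of a benchmark disagree strongly, outliers are a likely cause.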

Results are saved in target/criterion/ with:

HTML reports with graphs
JSON data for further analysis
Historical comparison with previous runs

Performance Targets

Expected performance (on modern hardware):

Benchmark                  Target      Notes
supervisor_discovery       < 10 ms     Metadata retrieval
supervisor_get_info        < 5 ms      Simple info query
supervisor_list_runners    < 5 ms      List operation
supervisor_job_create      < 10 ms     Job creation only
job_full_lifecycle         < 100 ms    Full execution cycle
osiris_health_check        < 2 ms      Health endpoint
concurrent_jobs (10)       < 500 ms    10 parallel jobs
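
The targets above can be checked mechanically against a measured time. The sketch below encodes the table as data and compares one measurement against its target; it is illustrative only, and the names TARGETS and meets_target are hypothetical. How the measured time is obtained (for example, read by hand from a Criterion report) is left to the caller.

```rust
use std::time::Duration;

/// Targets copied from the table above: (benchmark, upper bound).
const TARGETS: [(&str, Duration); 7] = [
    ("supervisor_discovery", Duration::from_millis(10)),
    ("supervisor_get_info", Duration::from_millis(5)),
    ("supervisor_list_runners", Duration::from_millis(5)),
    ("supervisor_job_create", Duration::from_millis(10)),
    ("job_full_lifecycle", Duration::from_millis(100)),
    ("osiris_health_check", Duration::from_millis(2)),
    ("concurrent_jobs (10)", Duration::from_millis(500)),
];

/// Some(true) if `measured` is strictly below the target for `name`,
/// Some(false) if it is not, and None if the table has no such benchmark.
fn meets_target(name: &str, measured: Duration) -> Option<bool> {
    TARGETS
        .iter()
        .find(|(bench, _)| *bench == name)
        .map(|(_, limit)| measured < *limit)
}
```

The comparison is strict because the table states the targets as "less than", so a measurement of exactly 100 ms for job_full_lifecycle does not meet its target.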

Customization

To modify benchmark parameters, edit benches/horus_stack.rs:

// Change URLs
const SUPERVISOR_URL: &str = "http://127.0.0.1:3030";
const OSIRIS_URL: &str = "http://127.0.0.1:8081";

// Change admin secret
const ADMIN_SECRET: &str = "SECRET";

// Adjust concurrent job counts (50 is added here as an example of an extra size)
for num_jobs in [1, 5, 10, 20, 50].iter() {
    // ...
}
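
If editing the constants for every environment becomes tedious, one option is to fall back to them only when no override is supplied. The following sketch shows that pattern with the standard library. It is an assumption about how the benchmark file could be extended, not something the current code does, and the helper names and any environment variable names (such as HORUS_SUPERVISOR_URL) are hypothetical.

```rust
/// Defaults mirroring the constants shown above.
const SUPERVISOR_URL: &str = "http://127.0.0.1:3030";
const OSIRIS_URL: &str = "http://127.0.0.1:8081";
const ADMIN_SECRET: &str = "SECRET";

/// Returns the override if it is present and non-blank, else the default.
fn setting(override_value: Option<String>, default: &str) -> String {
    match override_value {
        Some(value) if !value.trim().is_empty() => value,
        _ => default.to_string(),
    }
}

/// Reads a setting from the environment variable `var`, falling back to
/// `default`. The variable names passed in are chosen by the caller and
/// are not defined by Horus.
fn setting_from_env(var: &str, default: &str) -> String {
    setting(std::env::var(var).ok(), default)
}
```

With this in place, a call such as setting_from_env("HORUS_SUPERVISOR_URL", SUPERVISOR_URL) would return the default URL on a developer machine and the overridden one wherever the variable is set.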

CI/CD Integration

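To keep CI runs short and track regressions (the Horus stack must still be running, as described in Prerequisites):

# Quick mode: every benchmark still runs, but with a shorter, less precise measurement
cargo bench --bench horus_stack -- --quick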

# Save baseline for comparison
cargo bench --bench horus_stack -- --save-baseline main

# Compare against baseline
cargo bench --bench horus_stack -- --baseline main
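
Criterion performs the baseline comparison itself and reports the change with its own statistical test. The arithmetic behind a "percent change" figure is simple, though, and the sketch below shows it for anyone who wants to apply a custom CI threshold to numbers taken from the reports. The function names and the tolerance parameter are hypothetical and not part of Criterion.

```rust
/// Relative change of `current_ms` versus `baseline_ms`.
/// 0.10 means 10% slower, -0.10 means 10% faster.
fn relative_change(baseline_ms: f64, current_ms: f64) -> f64 {
    (current_ms - baseline_ms) / baseline_ms
}

/// True if the slowdown exceeds `tolerance` (for example 0.05 for 5%).
/// Speed-ups are never flagged.
fn is_regression(baseline_ms: f64, current_ms: f64, tolerance: f64) -> bool {
    relative_change(baseline_ms, current_ms) > tolerance
}
```

A tolerance that is too tight will flag ordinary run-to-run noise, which tends to be larger on shared CI runners, so the threshold should be chosen with the observed variability in mind.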

Troubleshooting

"Connection refused" errors

Ensure the Horus stack is running
Check that all services are listening on expected ports
Verify firewall settings

"Job execution timeout" errors

Increase timeout values in benchmark code
Check that runners are properly registered
Verify Redis is accessible
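
For context, a job execution timeout usually comes from a wait loop of roughly the following shape: poll for a result until it arrives or a deadline passes. This is a generic, synchronous sketch using only the standard library, not the actual benchmark or client code (which is asynchronous); poll_until and its parameters are hypothetical. It shows which value to raise when jobs legitimately take longer.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Calls `check` every `interval` until it yields a value or `timeout`
/// has elapsed. On timeout, returns the time actually waited as the error.
fn poll_until<T>(
    timeout: Duration,
    interval: Duration,
    mut check: impl FnMut() -> Option<T>,
) -> Result<T, Duration> {
    let start = Instant::now();
    loop {
        if let Some(value) = check() {
            return Ok(value);
        }
        if start.elapsed() >= timeout {
            return Err(start.elapsed());
        }
        sleep(interval);
    }
}
```

If no runner is registered or Redis is unreachable, the check never succeeds and raising the timeout only delays the error, which is why the other two items in the list are worth verifying first.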

Inconsistent results

Close other applications to reduce system load
Run benchmarks multiple times for statistical significance
Use the --warm-up-time flag (value in seconds) to lengthen the warm-up period, for example: cargo bench --bench horus_stack -- --warm-up-time 5

Adding New Benchmarks

To add a new benchmark:

Create a new function in benches/horus_stack.rs:

fn bench_my_feature(c: &mut Criterion) {
    let rt = create_runtime();
    let client = /* create client */;
    
    c.bench_function("my_feature", |b| {
        b.to_async(&rt).iter(|| async {
            // Your benchmark code
        });
    });
}

Then add the new function to the criterion_group! invocation:

criterion_group!(
    benches,
    // ... existing benchmarks
    bench_my_feature,
);


Horus Stack Benchmarks

Overview