add complete binary and benchmarking
217
benches/MEMORY_BENCHMARKS.md
Normal file
@@ -0,0 +1,217 @@
# Memory Usage Benchmarks

Benchmarks for measuring memory consumption of the Horus stack components.

## Overview

The memory benchmarks measure memory usage, observed as changes in process RSS, for various operations:

- Job creation and storage
- Client instantiation
- Payload size impact
- Memory growth under load

## Benchmarks

### 1. `memory_job_creation`
Measures memory usage when creating multiple Job objects in memory.

**Test sizes**: 10, 50, 100, 200 jobs

**What it measures**:
- Memory allocated per job object
- Heap growth with increasing job count
- Memory efficiency of Job structure

**Expected results**:
- Linear memory growth with job count
- ~1-2 KB per job object (depending on payload)
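
If growth is linear, two measurements are enough to separate the marginal cost per job from any fixed overhead. The sketch below fits a line through two (job count, delta) points; `fit_two_points` is a hypothetical helper, not part of the benchmark code. Using the sample figures from the Example Output section (24 KB for 10 jobs, 187 KB for 100 jobs), it gives roughly 1.8 KB per job plus about 5.9 KB of fixed overhead, which is consistent with the 1-2 KB expectation.

```rust
/// Fits `delta = slope * count + intercept` through two measurements.
/// Hypothetical helper for illustration only.
/// Returns (KB per item, fixed overhead in KB), or None if the counts are equal.
fn fit_two_points(a: (u32, f64), b: (u32, f64)) -> Option<(f64, f64)> {
    if a.0 == b.0 {
        return None; // a vertical line has no defined slope
    }
    let (n1, d1) = (f64::from(a.0), a.1);
    let (n2, d2) = (f64::from(b.0), b.1);
    let slope = (d2 - d1) / (n2 - n1);
    let intercept = d1 - slope * n1;
    Some((slope, intercept))
}
```

A fit over only two points is sensitive to noise in either measurement, so treat the result as a sanity check rather than a precise figure.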

### 2. `memory_client_creation`
Measures memory overhead of creating multiple Supervisor client instances.

**Test sizes**: 1, 10, 50, 100 clients

**What it measures**:
- Memory per client instance
- Connection pool overhead
- HTTP client memory footprint

**Expected results**:
- ~10-50 KB per client instance
- Includes HTTP client, connection pools, and buffers

### 3. `memory_payload_sizes`
Measures memory usage with different payload sizes.

**Test sizes**: 1KB, 10KB, 100KB, 1MB payloads

**What it measures**:
- Memory overhead of JSON serialization
- String allocation costs
- Payload storage efficiency

**Expected results**:
- Memory usage should scale linearly with payload size
- Small overhead for JSON structure (~5-10%)

## Running Memory Benchmarks

```bash
# Run all memory benchmarks
cargo bench --bench memory_usage

# Run specific memory test
cargo bench --bench memory_usage -- memory_job_creation

# Run with verbose output to see memory deltas
cargo bench --bench memory_usage -- --verbose
```

## Interpreting Results

The benchmarks print memory deltas to stderr during execution:

```
Memory delta for 100 jobs: 156 KB
Memory delta for 50 clients: 2048 KB
Memory delta for 100KB payload: 105 KB
```

### Memory Delta Interpretation

- **Positive delta**: Memory was allocated during the operation
- **Zero delta**: No significant memory change (may be reusing existing allocations)
- **Negative delta**: Memory was returned to the OS (values were dropped and the allocator released their pages)
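
Because a delta can be negative, it has to be computed as a signed value: subtracting two unsigned RSS readings directly would underflow when memory shrinks. The following is a minimal sketch of that computation and of the three cases above. The names `signed_delta_kb`, `classify`, and `DeltaKind` are assumptions made for illustration and may not match the benchmark source.

```rust
/// The three outcomes described above (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
enum DeltaKind {
    Allocated, // positive delta
    Unchanged, // zero delta
    Released,  // negative delta
}

/// Signed difference between two RSS readings in KB.
/// Readings are converted to i64 first so that `after < before` does not underflow.
fn signed_delta_kb(before_kb: u64, after_kb: u64) -> i64 {
    after_kb as i64 - before_kb as i64
}

/// Maps a signed delta onto the three cases.
fn classify(delta_kb: i64) -> DeltaKind {
    match delta_kb {
        d if d > 0 => DeltaKind::Allocated,
        0 => DeltaKind::Unchanged,
        _ => DeltaKind::Released,
    }
}
```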

### Platform Differences

- **macOS**: Uses the `ps` command to read RSS (Resident Set Size)
- **Linux**: Reads `VmRSS` from `/proc/self/status`

RSS counts the pages currently resident in physical memory, which includes:
- Heap allocations
- Stack memory
- Shared libraries (mapped into process)
- Memory-mapped files
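
For the Linux path, reading RSS amounts to finding the `VmRSS:` line in `/proc/self/status` and parsing the number in front of `kB`. The sketch below shows one way to do that. The function names are assumptions and the benchmark's own implementation may differ.

```rust
use std::fs;

/// Extracts the VmRSS value (in KB) from the text of a /proc/<pid>/status file.
/// The relevant line looks like "VmRSS:\t    1234 kB".
fn parse_vm_rss_kb(status: &str) -> Option<u64> {
    status
        .lines()
        .find_map(|line| line.strip_prefix("VmRSS:"))
        .and_then(|rest| rest.split_whitespace().next())
        .and_then(|value| value.parse::<u64>().ok())
}

/// Reads the current process RSS in KB.
/// Returns None on platforms without /proc or if the file cannot be parsed.
fn current_rss_kb() -> Option<u64> {
    let status = fs::read_to_string("/proc/self/status").ok()?;
    parse_vm_rss_kb(&status)
}
```

Keeping the parser separate from the file read makes it testable on any platform, including macOS where `/proc` does not exist.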

## Limitations

1. **Granularity**: OS-level memory reporting works at page granularity and may not capture small allocations
2. **Timing**: Memory measurements happen before/after operations, not continuously
3. **Allocator effects**: Rust has no garbage collector, but its allocator may not immediately release freed memory to the OS
4. **Shared memory**: RSS includes shared library memory
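
When RSS is too coarse, one alternative is to count bytes at the allocator level instead. This is a different technique from the one these benchmarks use: the sketch wraps the system allocator with `std::alloc::GlobalAlloc` and tracks live bytes in an atomic counter. `CountingAlloc` and `live_bytes` are hypothetical names introduced here.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Bytes currently allocated through the global allocator.
static LIVE_BYTES: AtomicUsize = AtomicUsize::new(0);

/// Thin wrapper around the system allocator that counts live bytes.
struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            LIVE_BYTES.fetch_add(layout.size(), Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        LIVE_BYTES.fetch_sub(layout.size(), Ordering::Relaxed);
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Snapshot of the live byte count.
fn live_bytes() -> usize {
    LIVE_BYTES.load(Ordering::Relaxed)
}
```

Unlike RSS, this counter sees every heap allocation regardless of size, but it ignores stack memory, memory-mapped files, and allocator bookkeeping overhead.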

## Best Practices

### For Accurate Measurements

1. **Run multiple iterations**: Criterion handles this automatically
2. **Warm up**: First iterations may show higher memory due to lazy initialization
3. **Isolate tests**: Run memory benchmarks separately from performance benchmarks
4. **Monitor trends**: Compare results over time rather than relying on absolute values

### Memory Optimization Tips

If benchmarks show high memory usage:

1. **Check payload sizes**: Large payloads consume proportional memory
2. **Limit concurrent operations**: Too many simultaneous jobs/clients increase memory
3. **Review data structures**: Ensure efficient serialization
4. **Profile with tools**: Use `heaptrack` (Linux) or Instruments (macOS) for detailed analysis

## Advanced Profiling

For detailed memory profiling beyond these benchmarks:

### macOS
```bash
# Use Instruments (Xcode 11 and earlier)
instruments -t Allocations -D memory_trace.trace ./target/release/horus

# Xcode 12 and later replace the `instruments` CLI with `xctrace`
xcrun xctrace record --template Allocations --output memory_trace.trace --launch -- ./target/release/horus

# Use cargo-instruments to profile the benchmark directly
cargo install cargo-instruments
cargo instruments --bench memory_usage --template Allocations
```

### Linux
```bash
# Use Valgrind massif
valgrind --tool=massif --massif-out-file=massif.out \
  ./target/release/deps/memory_usage-*

# Visualize with massif-visualizer
massif-visualizer massif.out

# Use heaptrack (the output file name includes the binary's hash suffix and the PID)
heaptrack ./target/release/deps/memory_usage-*
heaptrack_gui heaptrack.memory_usage-*
```

### Cross-platform
```bash
# Use dhat (heap profiler); it is a library crate, so add it as a dependency
cargo add dhat
# Add dhat to your benchmark behind a Cargo feature (here named dhat-heap) and run
cargo bench --bench memory_usage --features dhat-heap
```

## Continuous Monitoring

Integrate memory benchmarks into CI/CD:

```bash
# Run and save baseline
cargo bench --bench memory_usage -- --save-baseline memory-main

# Compare in PR
cargo bench --bench memory_usage -- --baseline memory-main

# Fail if memory usage increases >10%
# (requires custom scripting: Criterion baselines compare timings only,
# so the "Memory delta" lines on stderr must be parsed separately)
```
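
The custom scripting mentioned above could be a small Rust helper that parses the `Memory delta for ...` lines captured from stderr and applies the 10% threshold. The sketch below assumes the line format shown under Interpreting Results. `parse_delta_line` and `is_regression` are hypothetical names, and how the baseline values are stored between CI runs is left open.

```rust
/// Parses a line such as "Memory delta for 100 jobs: 156 KB"
/// into its label ("100 jobs") and value in KB.
fn parse_delta_line(line: &str) -> Option<(String, i64)> {
    let rest = line.trim().strip_prefix("Memory delta for ")?;
    let (label, value) = rest.rsplit_once(": ")?;
    let kb = value.strip_suffix(" KB")?.trim().parse::<i64>().ok()?;
    Some((label.to_string(), kb))
}

/// Returns true when `current_kb` exceeds `baseline_kb` by more than
/// `max_increase` (0.10 means 10%). Non-positive baselines are skipped,
/// since a relative increase is not meaningful for them.
fn is_regression(baseline_kb: i64, current_kb: i64, max_increase: f64) -> bool {
    if baseline_kb <= 0 {
        return false;
    }
    let allowed = baseline_kb as f64 * max_increase;
    (current_kb - baseline_kb) as f64 > allowed
}
```

A CI step could run the benchmark, feed each stderr line through `parse_delta_line`, and exit with a non-zero status if `is_regression` returns true for any label.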

## Troubleshooting

### "Memory delta is always 0"
- OS may not update RSS immediately
- Allocations might be too small to measure
- Try increasing iteration count or operation size

### "Memory keeps growing"
- Check for memory leaks
- Verify objects are being dropped
- Use a heap profiler (see Advanced Profiling) to locate the growth; `cargo clippy` can flag some suspicious patterns but does not detect leaks
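
One lightweight way to verify that objects are being dropped is to count live instances with a `Drop` implementation. The sketch below uses a stand-in type, `TrackedJob`, because the real Job type is not shown in this document. The counter and helper names are likewise assumptions.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Number of TrackedJob values currently alive.
static LIVE_JOBS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for a job that owns a heap-allocated payload.
struct TrackedJob {
    _payload: Vec<u8>,
}

impl TrackedJob {
    /// Creates a job and increments the live counter.
    fn new(payload_len: usize) -> Self {
        LIVE_JOBS.fetch_add(1, Ordering::SeqCst);
        TrackedJob { _payload: vec![0; payload_len] }
    }
}

impl Drop for TrackedJob {
    /// Decrements the live counter when the job is dropped.
    fn drop(&mut self) {
        LIVE_JOBS.fetch_sub(1, Ordering::SeqCst);
    }
}

/// Snapshot of the live job count.
fn live_jobs() -> usize {
    LIVE_JOBS.load(Ordering::SeqCst)
}
```

If `live_jobs()` does not return to zero after a benchmark iteration ends, something is still holding the values, for example a long-lived collection or a reference cycle.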

### "Results are inconsistent"
- Other processes may be affecting measurements
- Run benchmarks on an idle system
- Increase sample size in benchmark code

## Example Output

```
memory_job_creation/10 time: [45.2 µs 46.1 µs 47.3 µs]
Memory delta for 10 jobs: 24 KB

memory_job_creation/50 time: [198.4 µs 201.2 µs 204.8 µs]
Memory delta for 50 jobs: 98 KB

memory_job_creation/100 time: [387.6 µs 392.1 µs 397.4 µs]
Memory delta for 100 jobs: 187 KB

memory_client_creation/1 time: [234.5 µs 238.2 µs 242.6 µs]
Memory delta for 1 clients: 45 KB

memory_payload_sizes/1KB time: [12.3 µs 12.6 µs 13.0 µs]
Memory delta for 1KB payload: 2 KB

memory_payload_sizes/100KB time: [156.7 µs 159.4 µs 162.8 µs]
Memory delta for 100KB payload: 105 KB
```

## Related Documentation

- [Performance Benchmarks](./README.md)
- [Stress Tests](./README.md#stress-tests)
- [Rust Performance Book](https://nnethercote.github.io/perf-book/)
- [Criterion.rs Documentation](https://bheisler.github.io/criterion.rs/book/)