upgrade: per-rollout JSONL event log for tail -f #35

Merged
zaelgohary merged 2 commits from development_add_event_log into development 2026-05-25 12:12:11 +00:00

Summary

Operators can now watch a live rollout via ssh herodev tail -f /var/lib/hero_codescalers/event-logs/<upgrade_id>.jsonl instead of polling the RPC. One append-only JSONL file per upgrade, no web/UI dependency.

Changes

  • Added EVENT_LOG_ROOT = /var/lib/hero_codescalers/event-logs/.
  • New emit_event(upgrade_id, json) helper. A single write_all of {...}\n to a file opened with O_APPEND keeps each line intact as long as the record stays small (< PIPE_BUF), so concurrent cell writers can't interleave bytes. Best-effort: I/O failures are only logged via tracing::warn!.
  • Emit points: upgrade.start, snapshot.ready, snapshot.failed, cell.start, cell.finish (covers enqueue_fail, grace, poll-budget, terminal phase), upgrade.cancel, upgrade.terminal (with status + succeeded/failed/total counts).
  • fail_cell now takes username + service so it can self-emit the cell.finish record.
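
As a rough illustration of the helper described above, here is a minimal std-only sketch, not the code from this PR. The `root` parameter and the `try_emit_event` split are hypothetical additions so the sketch can be pointed at a temp directory, and `eprintln!` stands in for `tracing::warn!`.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Append one JSON record to `<root>/<upgrade_id>.jsonl`.
/// `root` is a parameter only for this sketch; the PR uses EVENT_LOG_ROOT.
fn try_emit_event(root: &Path, upgrade_id: &str, json: &str) -> io::Result<()> {
    fs::create_dir_all(root)?;
    let path = root.join(format!("{}.jsonl", upgrade_id));
    // append(true) opens the file with O_APPEND on Unix.
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;

    // Build the whole line first so the record and its newline
    // are handed to the OS in one write_all call.
    let mut line = String::with_capacity(json.len() + 1);
    line.push_str(json);
    line.push('\n');
    file.write_all(line.as_bytes())
}

/// Best-effort wrapper: failures are reported, never propagated.
fn emit_event(root: &Path, upgrade_id: &str, json: &str) {
    if let Err(err) = try_emit_event(root, upgrade_id, json) {
        // Stand-in for tracing::warn! to keep the sketch dependency-free.
        eprintln!("event log write failed for {}: {}", upgrade_id, err);
    }
}
```

Keeping the fallible part in its own function makes the best-effort policy a one-line decision at the call site, which is one plausible way to keep logging failures from affecting the rollout itself.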

Test Results

91-cell rollout upg_1779708686825_c04b296c produced 185 lines: 1x upgrade.start, 1x snapshot.ready, 91x cell.start, 91x cell.finish (all succeeded), 1x upgrade.terminal. Every line parses as JSON. An earlier 13-cell run caught an atomic-write bug: concurrent writeln! calls issued two syscalls per line, which occasionally interleaved two records on one line. The fix formats {json}\n into one buffer and issues a single write_all; the retest was clean.
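
The check described above can be approximated with a small simulation. This is a sketch under assumptions, not the test actually run for this PR: the `event` and `cell` field names are hypothetical, each simulated cell runs on its own thread, and `is_single_record` is a cheap structural heuristic rather than a real JSON parse.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::{Path, PathBuf};
use std::thread;

/// Append one record with a single write_all of "{json}\n".
fn append_line(path: &Path, json: &str) -> io::Result<()> {
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    file.write_all(format!("{}\n", json).as_bytes())
}

/// Simulate a rollout of `cells` cells writing concurrently,
/// then return every line of the resulting log.
fn simulate_rollout(path: &Path, cells: usize) -> io::Result<Vec<String>> {
    append_line(path, r#"{"event":"upgrade.start"}"#)?;
    append_line(path, r#"{"event":"snapshot.ready"}"#)?;

    let workers: Vec<_> = (0..cells)
        .map(|i| {
            let path: PathBuf = path.to_path_buf();
            thread::spawn(move || -> io::Result<()> {
                append_line(&path, &format!(r#"{{"event":"cell.start","cell":{}}}"#, i))?;
                append_line(&path, &format!(r#"{{"event":"cell.finish","cell":{}}}"#, i))
            })
        })
        .collect();
    for worker in workers {
        worker.join().expect("worker panicked")?;
    }

    append_line(path, r#"{"event":"upgrade.terminal"}"#)?;
    let text = fs::read_to_string(path)?;
    Ok(text.lines().map(str::to_owned).collect())
}

/// Heuristic: a line holds exactly one flat record if it starts with '{',
/// ends with '}', and has no "}{" seam from two records fused together.
fn is_single_record(line: &str) -> bool {
    line.starts_with('{') && line.ends_with('}') && !line.contains("}{")
}
```

For `cells` cells the expected line count is `2 * cells + 3`, which matches the 185 lines reported for 91 cells.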

zaelgohary merged commit f5e34a434b into development 2026-05-25 12:12:11 +00:00
zaelgohary deleted branch development_add_event_log 2026-05-25 12:12:11 +00:00
Reference
lhumina_code/hero_codescalers!35