Manifest parse failure panics hero_router on startup (bad fallback) #48

Closed
opened 2026-04-22 14:54:01 +00:00 by rawdaGastan · 5 comments
Member

Category: error-handling / high

What

If ~/.hero/router/manual_sources.json exists but fails to deserialize, hero_router panics and refuses to start.

Where

crates/hero_router/src/main.rs:297-301

let manifest = Arc::new(Mutex::new(
    Manifest::load(&cfg.manifest_path).unwrap_or_else(|_| {
        Manifest::load("/dev/null").unwrap_or_else(|_| panic!("failed to load manifest"))
    }),
));

Why it's wrong

Manifest::load (see manifest.rs:42-54) reads the file with read_to_string then passes the string to serde_json::from_str. The intended fallback — Manifest::load("/dev/null") — does not short-circuit: /dev/null exists, so load reads it (yielding ""), then calls serde_json::from_str(""), which fails with "EOF while parsing a value". The outer unwrap_or_else then panics, taking the whole router down.

Net effect: any corruption, truncation, or schema drift in manual_sources.json crashes the router at startup, with a useless message ("failed to load manifest", no context about which path or what parse error).

Reproduction

mkdir -p ~/.hero/router
echo 'not-json' > ~/.hero/router/manual_sources.json
hero_router start
# thread 'main' panicked at 'failed to load manifest', ...

Suggested fix

Fall back to an in-memory empty manifest pointing at the original path, so the next save() writes a valid file:

let manifest = Arc::new(Mutex::new(
    Manifest::load(&cfg.manifest_path).unwrap_or_else(|e| {
        tracing::warn!(
            path = %cfg.manifest_path.display(),
            error = %e,
            "manifest unreadable; starting with empty manifest"
        );
        Manifest { path: cfg.manifest_path.clone(), sources: Vec::new() }
    }),
));

This requires making Manifest { path, sources } constructable — either expose an empty(path: PathBuf) -> Self constructor, or make the fields pub (they already are for sources; add path too, or add the constructor).

Additional: log the load error

At minimum, even if the fallback is kept, log the original error instead of discarding it — today the |_| arm swallows the parse error entirely, so a user can't see why their manifest didn't load.

## Category: error-handling / **high** ## What If `~/.hero/router/manual_sources.json` exists but fails to deserialize, `hero_router` panics and refuses to start. ## Where [`crates/hero_router/src/main.rs:297-301`](src/main.rs#L297) ```rust let manifest = Arc::new(Mutex::new( Manifest::load(&cfg.manifest_path).unwrap_or_else(|_| { Manifest::load("/dev/null").unwrap_or_else(|_| panic!("failed to load manifest")) }), )); ``` ## Why it's wrong `Manifest::load` (see [`manifest.rs:42-54`](src/manifest.rs#L42)) reads the file with `read_to_string` then passes the string to `serde_json::from_str`. The intended fallback — `Manifest::load("/dev/null")` — does **not** short-circuit: `/dev/null` exists, so `load` reads it (yielding `""`), then calls `serde_json::from_str("")`, which fails with "EOF while parsing a value". The outer `unwrap_or_else` then **panics**, taking the whole router down. Net effect: any corruption, truncation, or schema drift in `manual_sources.json` crashes the router at startup, with a useless message (`"failed to load manifest"`, no context about which path or what parse error). ## Reproduction ```bash mkdir -p ~/.hero/router echo 'not-json' > ~/.hero/router/manual_sources.json hero_router start # thread 'main' panicked at 'failed to load manifest', ... ``` ## Suggested fix Fall back to an **in-memory empty manifest** pointing at the original path, so the next `save()` writes a valid file: ```rust let manifest = Arc::new(Mutex::new( Manifest::load(&cfg.manifest_path).unwrap_or_else(|e| { tracing::warn!( path = %cfg.manifest_path.display(), error = %e, "manifest unreadable; starting with empty manifest" ); Manifest { path: cfg.manifest_path.clone(), sources: Vec::new() } }), )); ``` This requires making `Manifest { path, sources }` constructable — either expose an `empty(path: PathBuf) -> Self` constructor, or make the fields `pub` (they already are for `sources`; add `path` too, or add the constructor). ## Additional: log the load error At minimum, even if the fallback is kept, log the original error instead of discarding it — today the `|_|` arm swallows the parse error entirely, so a user can't see why their manifest didn't load.
rawdaGastan added this to the ACTIVE project 2026-04-22 16:12:43 +00:00
Author
Member

Implementation Spec for Issue #48

Objective

Make hero_router tolerate a malformed manual_sources.json at startup by logging the parse error and falling back to an empty in-memory manifest that still points at the original path.

Requirements

  • A malformed, truncated, or schema-drifted manual_sources.json must not crash the router on startup.
  • The parse failure must be logged via tracing::warn! with structured path and error fields.
  • After the fallback, the in-memory Manifest must retain the original cfg.manifest_path so that the next save() overwrites the bad file with valid JSON.
  • No change to the success path: a valid (or missing) file must still load exactly as today.

Files to Modify/Create

  • crates/hero_router/src/manifest.rs — add a new Manifest::empty(path: PathBuf) -> Self associated constructor, plus unit tests.
  • crates/hero_router/src/main.rs — replace the broken /dev/null fallback at lines 297-301 with a match that logs and falls back to Manifest::empty.

Implementation Plan

Step 1: Add Manifest::empty constructor

Files: crates/hero_router/src/manifest.rs

  • Inside impl Manifest, add:
    /// Construct an empty manifest bound to `path`, without touching disk.
    /// Used as a fallback when `load` fails so the next `save()` overwrites
    /// the offending file at the same location.
    pub fn empty(path: PathBuf) -> Self {
        Self { path, sources: Vec::new() }
    }
    
  • Do not change visibility of the path field; keep encapsulation.
  • Leave all existing methods untouched.

Dependencies: none

Step 2: Replace the panicking fallback at the call site

Files: crates/hero_router/src/main.rs

  • Replace the block at lines 297-301:
    let manifest = Arc::new(Mutex::new(
        Manifest::load(&cfg.manifest_path).unwrap_or_else(|_| {
            Manifest::load("/dev/null").unwrap_or_else(|_| panic!("failed to load manifest"))
        }),
    ));
    
    with:
    let manifest = Arc::new(Mutex::new(match Manifest::load(&cfg.manifest_path) {
        Ok(m) => m,
        Err(err) => {
            tracing::warn!(
                path = %cfg.manifest_path.display(),
                error = %err,
                "failed to load manual_sources manifest; starting with empty manifest (next write will overwrite the bad file)"
            );
            Manifest::empty(cfg.manifest_path.clone())
        }
    }));
    
  • anyhow::Error returned by Manifest::load already carries the full context chain, so %err is sufficient.
  • tracing is already a workspace dependency and already in scope in main.rs — no new imports needed.

Dependencies: Step 1

Step 3: Tests

Files: crates/hero_router/src/manifest.rs (append inside the existing #[cfg(test)] mod tests)

  • Add a test confirming Manifest::empty produces a usable, empty manifest whose save() round-trips via load.
  • Add a test documenting that load fails on malformed JSON, so a future refactor cannot silently swallow parse errors.

Dependencies: Step 1

Acceptance Criteria

  • hero_router starts successfully even if manual_sources.json contains invalid JSON.
  • Parse error is logged via tracing::warn! with path and error fields.
  • After fallback, the in-memory manifest points at the original cfg.manifest_path (so the next add/remove persists correctly).
  • The success path (valid JSON, or missing file) behaves exactly as before.
  • cargo test --workspace passes, including the two new tests.
  • cargo clippy --workspace --all-targets -- -D warnings passes.

Notes

  • Per the Hero branching model, branch off development.
  • The path field of Manifest stays private; the empty(path) constructor keeps encapsulation.
  • No Cargo.toml changes needed — tracing and anyhow already declared.
  • Fix is scoped to startup only; the CLI add/remove subcommands keep their current error-propagating behavior — a user running hero_router add ... against a corrupt file should see the parse error, not silently overwrite it.
## Implementation Spec for Issue #48 ### Objective Make `hero_router` tolerate a malformed `manual_sources.json` at startup by logging the parse error and falling back to an empty in-memory manifest that still points at the original path. ### Requirements - A malformed, truncated, or schema-drifted `manual_sources.json` must not crash the router on startup. - The parse failure must be logged via `tracing::warn!` with structured `path` and `error` fields. - After the fallback, the in-memory `Manifest` must retain the original `cfg.manifest_path` so that the next `save()` overwrites the bad file with valid JSON. - No change to the success path: a valid (or missing) file must still load exactly as today. ### Files to Modify/Create - `crates/hero_router/src/manifest.rs` — add a new `Manifest::empty(path: PathBuf) -> Self` associated constructor, plus unit tests. - `crates/hero_router/src/main.rs` — replace the broken `/dev/null` fallback at lines 297-301 with a `match` that logs and falls back to `Manifest::empty`. ### Implementation Plan #### Step 1: Add `Manifest::empty` constructor Files: `crates/hero_router/src/manifest.rs` - Inside `impl Manifest`, add: ```rust /// Construct an empty manifest bound to `path`, without touching disk. /// Used as a fallback when `load` fails so the next `save()` overwrites /// the offending file at the same location. pub fn empty(path: PathBuf) -> Self { Self { path, sources: Vec::new() } } ``` - Do not change visibility of the `path` field; keep encapsulation. - Leave all existing methods untouched. Dependencies: none #### Step 2: Replace the panicking fallback at the call site Files: `crates/hero_router/src/main.rs` - Replace the block at lines 297-301: ```rust let manifest = Arc::new(Mutex::new( Manifest::load(&cfg.manifest_path).unwrap_or_else(|_| { Manifest::load("/dev/null").unwrap_or_else(|_| panic!("failed to load manifest")) }), )); ``` with: ```rust let manifest = Arc::new(Mutex::new(match Manifest::load(&cfg.manifest_path) { Ok(m) => m, Err(err) => { tracing::warn!( path = %cfg.manifest_path.display(), error = %err, "failed to load manual_sources manifest; starting with empty manifest (next write will overwrite the bad file)" ); Manifest::empty(cfg.manifest_path.clone()) } })); ``` - `anyhow::Error` returned by `Manifest::load` already carries the full context chain, so `%err` is sufficient. - `tracing` is already a workspace dependency and already in scope in `main.rs` — no new imports needed. Dependencies: Step 1 #### Step 3: Tests Files: `crates/hero_router/src/manifest.rs` (append inside the existing `#[cfg(test)] mod tests`) - Add a test confirming `Manifest::empty` produces a usable, empty manifest whose `save()` round-trips via `load`. - Add a test documenting that `load` fails on malformed JSON, so a future refactor cannot silently swallow parse errors. Dependencies: Step 1 ### Acceptance Criteria - [ ] `hero_router` starts successfully even if `manual_sources.json` contains invalid JSON. - [ ] Parse error is logged via `tracing::warn!` with `path` and `error` fields. - [ ] After fallback, the in-memory manifest points at the original `cfg.manifest_path` (so the next `add`/`remove` persists correctly). - [ ] The success path (valid JSON, or missing file) behaves exactly as before. - [ ] `cargo test --workspace` passes, including the two new tests. - [ ] `cargo clippy --workspace --all-targets -- -D warnings` passes. ### Notes - Per the Hero branching model, branch off `development`. - The `path` field of `Manifest` stays private; the `empty(path)` constructor keeps encapsulation. - No `Cargo.toml` changes needed — `tracing` and `anyhow` already declared. - Fix is scoped to startup only; the CLI `add`/`remove` subcommands keep their current error-propagating behavior — a user running `hero_router add ...` against a corrupt file should see the parse error, not silently overwrite it.
Author
Member

Scope update

While implementing the approved spec, the same class of broken fallback was found at two additional sites in main.rs:

  • cmd_list (was lines 552-553): Manifest::load(&cfg.manifest_path).unwrap_or_else(|_| Manifest::load_default().unwrap_or_else(|_| panic!("manifest error")))
  • cmd_scan (was lines 601-602): same pattern.

The inner Manifest::load_default() loads ~/.hero/router/manual_sources.json, which is the same file cfg.manifest_path points at for default installs — so a corrupt manifest makes hero_router list and hero_router scan panic too.

Since both are read-only commands, the fix is identical to the startup path. To avoid three near-identical match blocks, the implementation was refactored to a single module-level helper:

/// Load the manual-sources manifest, or fall back to an empty one at the same
/// path if the file is unreadable or malformed. A corrupt manifest must not
/// crash read-only commands (`start` / `list` / `scan`); the next persistence
/// call will overwrite the bad file.
fn load_manifest_or_empty(path: &std::path::Path) -> Manifest { ... }

All three call sites (cmd_start, cmd_list, cmd_scan) now go through load_manifest_or_empty.

The mutating CLI commands (cmd_add, cmd_remove) retain their current ?-propagating behavior — a user running those against a corrupt file should see the parse error, not silently overwrite it.

Updated acceptance criteria (added)

  • hero_router list succeeds on a corrupt manual_sources.json.
  • hero_router scan succeeds on a corrupt manual_sources.json.
  • hero_router add / hero_router remove still fail loudly on a corrupt file (no behavior change).
## Scope update While implementing the approved spec, the same class of broken fallback was found at two additional sites in `main.rs`: - `cmd_list` (was lines 552-553): `Manifest::load(&cfg.manifest_path).unwrap_or_else(|_| Manifest::load_default().unwrap_or_else(|_| panic!("manifest error")))` - `cmd_scan` (was lines 601-602): same pattern. The inner `Manifest::load_default()` loads `~/.hero/router/manual_sources.json`, which is the **same file** `cfg.manifest_path` points at for default installs — so a corrupt manifest makes `hero_router list` and `hero_router scan` panic too. Since both are read-only commands, the fix is identical to the startup path. To avoid three near-identical match blocks, the implementation was refactored to a single module-level helper: ```rust /// Load the manual-sources manifest, or fall back to an empty one at the same /// path if the file is unreadable or malformed. A corrupt manifest must not /// crash read-only commands (`start` / `list` / `scan`); the next persistence /// call will overwrite the bad file. fn load_manifest_or_empty(path: &std::path::Path) -> Manifest { ... } ``` All three call sites (`cmd_start`, `cmd_list`, `cmd_scan`) now go through `load_manifest_or_empty`. The mutating CLI commands (`cmd_add`, `cmd_remove`) retain their current `?`-propagating behavior — a user running those against a corrupt file should see the parse error, not silently overwrite it. ### Updated acceptance criteria (added) - [ ] `hero_router list` succeeds on a corrupt `manual_sources.json`. - [ ] `hero_router scan` succeeds on a corrupt `manual_sources.json`. - [ ] `hero_router add` / `hero_router remove` still fail loudly on a corrupt file (no behavior change).
Author
Member

Test Results

  • Total: 73
  • Passed: 73
  • Failed: 0
  • Ignored: 0

Lint

  • cargo clippy --workspace --all-targets -- -D warnings: pass
  • cargo fmt --all --check: pass

New tests added in this PR

  • manifest::tests::empty_then_save_writes_valid_file_at_given_path
  • manifest::tests::load_fails_on_malformed_json
## Test Results - Total: 73 - Passed: 73 - Failed: 0 - Ignored: 0 ### Lint - `cargo clippy --workspace --all-targets -- -D warnings`: pass - `cargo fmt --all --check`: pass ### New tests added in this PR - `manifest::tests::empty_then_save_writes_valid_file_at_given_path` - `manifest::tests::load_fails_on_malformed_json`
Author
Member

Implementation Summary

Closing #48 via PR (to be linked in a follow-up comment).

Changes

crates/hero_router/src/manifest.rs

  • Added Manifest::empty(path: PathBuf) -> Self — constructs an empty manifest in-memory, bound to the given path, without touching disk.
  • Added #[derive(Debug)] on Manifest (required by the new load_fails_on_malformed_json unit test's unwrap_err()).

crates/hero_router/src/main.rs

  • Added private helper load_manifest_or_empty(path: &Path) -> Manifest at module scope above cmd_list. On parse/read failure it logs via tracing::warn! with structured path and error fields, then returns Manifest::empty(path.to_path_buf()).
  • Replaced three identical unwrap_or_else(|_| panic!(...)) fallbacks with a single call to the helper:
    • build_and_start (startup path) — was lines 297-301 using a /dev/null fallback that itself panicked.
    • cmd_list — was lines 552-555 using a load_default() fallback that pointed at the same file.
    • cmd_scan — was lines 601-604 using the same pattern as cmd_list.

The mutating CLI commands (cmd_add, cmd_remove) retain their ?-propagating behavior — a user running those against a corrupt manifest should see the parse error, not silently overwrite it.

Why not /dev/null as the fallback?

  1. On Linux it exists, so Manifest::load passes path.exists(), reads "", and serde_json::from_str("") fails — the fallback itself fails, causing the outer panic.
  2. Even if loaded, the resulting Manifest.path would be /dev/null, so any later save() would silently discard user data.
  3. Does not exist on Windows.
  4. Manifest::empty(path) avoids disk I/O entirely and preserves the correct persistence target, so the next save() overwrites the bad file.

Tests added

  • manifest::tests::empty_then_save_writes_valid_file_at_given_path — confirms Manifest::empty roundtrips through add() + save() + Manifest::load.
  • manifest::tests::load_fails_on_malformed_json — documents the failure mode so a future refactor cannot silently swallow parse errors.

End-to-end reproduction

Ran the exact scenario from the issue body against the fixed binary:

$ echo 'not-json' > ~/.hero/router/manual_sources.json

$ hero_router list
WARN hero_router: failed to load manual_sources manifest; starting with empty manifest
  path=/home/rawda/.hero/router/manual_sources.json
  error=Failed to parse manifest: /home/rawda/.hero/router/manual_sources.json
(exit 0, table rendered normally)

$ hero_router remove some-id
Error: Failed to parse manifest: /home/rawda/.hero/router/manual_sources.json
Caused by:
    expected ident at line 1 column 2
(exit 1, fails loudly as designed)

Test results

  • cargo test --workspace — 73 passed / 0 failed / 0 ignored.
  • cargo clippy --workspace --all-targets -- -D warnings — clean.
  • cargo fmt --all --check — clean.
## Implementation Summary Closing #48 via PR (to be linked in a follow-up comment). ### Changes **`crates/hero_router/src/manifest.rs`** - Added `Manifest::empty(path: PathBuf) -> Self` — constructs an empty manifest in-memory, bound to the given path, without touching disk. - Added `#[derive(Debug)]` on `Manifest` (required by the new `load_fails_on_malformed_json` unit test's `unwrap_err()`). **`crates/hero_router/src/main.rs`** - Added private helper `load_manifest_or_empty(path: &Path) -> Manifest` at module scope above `cmd_list`. On parse/read failure it logs via `tracing::warn!` with structured `path` and `error` fields, then returns `Manifest::empty(path.to_path_buf())`. - Replaced three identical `unwrap_or_else(|_| panic!(...))` fallbacks with a single call to the helper: - `build_and_start` (startup path) — was lines 297-301 using a `/dev/null` fallback that itself panicked. - `cmd_list` — was lines 552-555 using a `load_default()` fallback that pointed at the same file. - `cmd_scan` — was lines 601-604 using the same pattern as `cmd_list`. The mutating CLI commands (`cmd_add`, `cmd_remove`) retain their `?`-propagating behavior — a user running those against a corrupt manifest should see the parse error, not silently overwrite it. ### Why not `/dev/null` as the fallback? 1. On Linux it exists, so `Manifest::load` passes `path.exists()`, reads `""`, and `serde_json::from_str("")` fails — the fallback itself fails, causing the outer panic. 2. Even if loaded, the resulting `Manifest.path` would be `/dev/null`, so any later `save()` would silently discard user data. 3. Does not exist on Windows. 4. `Manifest::empty(path)` avoids disk I/O entirely and preserves the correct persistence target, so the next `save()` overwrites the bad file. ### Tests added - `manifest::tests::empty_then_save_writes_valid_file_at_given_path` — confirms `Manifest::empty` roundtrips through `add()` + `save()` + `Manifest::load`. - `manifest::tests::load_fails_on_malformed_json` — documents the failure mode so a future refactor cannot silently swallow parse errors. ### End-to-end reproduction Ran the exact scenario from the issue body against the fixed binary: ``` $ echo 'not-json' > ~/.hero/router/manual_sources.json $ hero_router list WARN hero_router: failed to load manual_sources manifest; starting with empty manifest path=/home/rawda/.hero/router/manual_sources.json error=Failed to parse manifest: /home/rawda/.hero/router/manual_sources.json (exit 0, table rendered normally) $ hero_router remove some-id Error: Failed to parse manifest: /home/rawda/.hero/router/manual_sources.json Caused by: expected ident at line 1 column 2 (exit 1, fails loudly as designed) ``` ### Test results - `cargo test --workspace` — 73 passed / 0 failed / 0 ignored. - `cargo clippy --workspace --all-targets -- -D warnings` — clean. - `cargo fmt --all --check` — clean.
Author
Member

Pull request opened: #53

This PR implements the changes discussed in this issue.

Pull request opened: https://forge.ourworld.tf/lhumina_code/hero_router/pulls/53 This PR implements the changes discussed in this issue.
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
lhumina_code/hero_router#48
No description provided.