[bug] hero_voice Rustpotter wake-word detector hard-disabled by candle-core 0.2.2 conflict #23

Closed
opened 2026-05-01 04:09:30 +00:00 by mik-tf · 1 comment
Owner

Summary

The Rustpotter wake-word detector is a hard-disabled stub in hero_voice due to a candle-core 0.2.2 dependency conflict with the rest of the workspace. The only working wake path is a fragile fallback: WebSocket Listen mode, which VAD-segments microphone input and substring-matches "hey hero" on the Whisper STT output. This blocks the Ambient AI roadmap (hero_agent#16).

Source

  • hero_voice/.../wakeword.rs — Rustpotter integration is conditionally compiled out / always returns the stub.
  • hero_voice/.../ws.rs:389 — substring-match "hey hero" on Whisper transcription is the only live wake path.
  • None of STT, TTS, or wake is exposed via OpenRPC (the OpenRPC surface is purely Topic/Folder CRUD today).
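
For readers unfamiliar with the pattern, here is a minimal sketch of what a compiled-out detector that "always returns the stub" can look like. It is not the actual contents of wakeword.rs: the `WakeWordDetector` trait, the `Detection` struct, `build_detector`, and the `rustpotter` feature name are all assumptions made for illustration.

```rust
/// Hypothetical result type for a single wake-word hit.
#[derive(Debug, Clone, PartialEq)]
pub struct Detection {
    pub keyword: String,
    pub score: f32,
}

/// Hypothetical detector interface; the real hero_voice trait may differ.
pub trait WakeWordDetector {
    /// Feed one frame of mono samples; returns a hit if the wake word fired.
    fn process(&mut self, samples: &[f32]) -> Option<Detection>;
    /// Whether a real backend is compiled in.
    fn is_available(&self) -> bool;
}

/// Stand-in used while the real backend cannot be built.
pub struct StubDetector;

impl WakeWordDetector for StubDetector {
    fn process(&mut self, _samples: &[f32]) -> Option<Detection> {
        None // the stub never fires
    }

    fn is_available(&self) -> bool {
        false
    }
}

// Assumed feature name. A `#[cfg(feature = "rustpotter")]` counterpart would
// construct the real detector; it is omitted here because it cannot build
// while the candle-core conflict stands.
#[cfg(not(feature = "rustpotter"))]
pub fn build_detector() -> Box<dyn WakeWordDetector> {
    Box::new(StubDetector)
}
```

With this shape, callers keep compiling against the same interface, but `process` never reports a wake word, which is why the WebSocket path ends up being the only live one.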

Why the fallback is fragile

  • Whisper has to fully transcribe before the substring match runs → high latency, ~500-1500ms more than a dedicated detector.
  • False positives on any phrase that contains "hey" followed by a word that rhymes with "hero".
  • Needs full microphone audio + STT pipeline running constantly → CPU + power cost vs a tiny dedicated detector.
  • Not exposed as a tool the agent or other services can subscribe to.
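
To make the false-positive point concrete, here is a small sketch of a substring match on a transcript, next to a stricter whole-word variant. Both functions are hypothetical and are not the code at ws.rs:389; they only assume the transcript arrives as a plain string.

```rust
/// Naive fallback: case-insensitive substring search on the transcript.
pub fn naive_wake_match(transcript: &str) -> bool {
    transcript.to_lowercase().contains("hey hero")
}

/// Stricter variant: strip punctuation, then require "hey" and "hero"
/// to appear as two adjacent whole words.
pub fn token_wake_match(transcript: &str) -> bool {
    let words: Vec<String> = transcript
        .split_whitespace()
        .map(|w| {
            w.chars()
                .filter(|c| c.is_alphanumeric())
                .collect::<String>()
                .to_lowercase()
        })
        .filter(|w| !w.is_empty())
        .collect();
    words
        .windows(2)
        .any(|pair| pair[0] == "hey" && pair[1] == "hero")
}
```

The substring version fires on "They heroically saved it" because "hey hero" appears inside "they heroically", and it misses "Hey, hero!" because of the comma. The whole-word version handles both cases, but neither can help when Whisper itself mis-transcribes a rhyming word as "hero", and both still pay the full transcription latency.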

Proposed fix (pick one)

Option A: upgrade candle-core. Find the candle-core version range that's compatible with the rest of the workspace and unstub Rustpotter. Cleanest but may cascade into other dep upgrades.

Option B: alternative detector. Pick a different wake-word library that doesn't depend on candle-core. Candidates worth evaluating: porcupine (Picovoice; commercial license), openWakeWord (ONNX-based), simple keyword spotting with whisper-base on a short audio window.

Option C: live with the substring fallback but expose it cleanly via OpenRPC + MCP so other services can subscribe. Doesn't fix the latency / false-positive cost.
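
As a rough illustration of the "other services can subscribe" part of Option C, here is an in-process fan-out sketch using only the standard library. It is not OpenRPC or MCP: `WakeEvent` and `WakeBus` are hypothetical names, and a real implementation would put an RPC layer in front of something like this.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical event emitted whenever the wake phrase is matched.
#[derive(Debug, Clone, PartialEq)]
pub struct WakeEvent {
    pub phrase: String,
    pub transcript: String,
}

/// Hypothetical fan-out bus: each subscriber gets its own channel.
#[derive(Default)]
pub struct WakeBus {
    subscribers: Vec<Sender<WakeEvent>>,
}

impl WakeBus {
    /// Register a new subscriber and hand back its receiving end.
    pub fn subscribe(&mut self) -> Receiver<WakeEvent> {
        let (tx, rx) = channel();
        self.subscribers.push(tx);
        rx
    }

    /// Send the event to every live subscriber, drop disconnected ones,
    /// and return how many subscribers received it.
    pub fn publish(&mut self, event: WakeEvent) -> usize {
        self.subscribers
            .retain(|tx| tx.send(event.clone()).is_ok());
        self.subscribers.len()
    }
}
```

The wake path would call `publish` once per match, and each consumer (the agent, a UI, a logger) would read from its own `Receiver`. This only changes who can observe a wake event; the latency and false-positive costs of the substring path stay the same.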

Severity

Medium. Not a deploy blocker (substring fallback works), but the Ambient AI vision in hero_agent#16 and hero_demo#52 leans on responsive wake — the substring path doesn't get there.

Cross-refs

  • hero_agent#16, Ambient AI (depends on this): https://forge.ourworld.tf/lhumina_code/hero_agent/issues/16
  • hero_demo#52, vision: https://forge.ourworld.tf/lhumina_code/hero_demo/issues/52
  • TTS expectation note: TTS is Kokoro-only; the "Groq fallback" applies to STT only (relevant background for Ambient AI scoping).

Spotted during docs_hero Phase 1 source-grounded read (session 52). Reconciliation memo: memory/investigation_roadmap_reconciliation.md.

Owner

We can use the Sherpa ONNX keyword detector for wake words. It runs in the browser via WASM and is already implemented in this repo.

scott closed this issue 2026-05-27 16:24:29 +00:00
Reference
lhumina_code/hero_voice#23