Hero Agent v0.7.x: fix read aloud, conversations, convo mode + cross-browser voice #80

Closed
opened 2026-03-23 16:25:18 +00:00 by mik-tf · 5 comments

Context

v0.7.0-dev is deployed on herodev.gent04.grid.tf. Core features work (SSE chat, STT, 62 MCP tools, system prompt, 5 aibroker models, integration tests 20/20). But voice UI and conversation persistence have browser-level issues that need fixing.

What Works (v0.7.0-dev)

  • SSE streaming chat via aibroker (claude-sonnet-4.5 default)
  • Voice input STT (Groq Whisper + ffmpeg)
  • MCP tools discovered (62 tools including 5 mcp_hero)
  • System prompt with Hero OS context
  • Skills tab with execution stats
  • OpenRPC spec at /hero_agent/openrpc.json
  • uv + python3 for MCP execute_code
  • Integration test suite (20/20)
  • Semver deploy pipeline with releases on forge
  • Voice button UI (Read, Wake, Convo) — visible in both themes
  • Voice selector dropdown (6 voices, persisted in localStorage)

What Needs Fixing

Level 1: Browser user gesture issues (BLOCKING)

Browsers require speechSynthesis.speak() and new AudioContext() to be called in direct response to a user click. Dioxus spawn(async { document::eval(...) }) runs OUTSIDE the click context — browsers silently block it.

Console error: The AudioContext was not allowed to start. It must be resumed (or created) after a user gesture on the page.

  • Fix read aloud: Can't use Dioxus async eval for speechSynthesis. Options:
    • (a) JS-side MutationObserver: set a global window._heroAutoRead = true flag on Read click, add a MutationObserver that watches for new AI message bubbles and auto-speaks them
    • (b) dangerous_inner_html with raw <button onclick="..."> for the Read toggle
    • (c) Pre-create speechSynthesis and AudioContext in a top-level JS <script> that listens for custom events
    • Recommended: option (a) — cleanest, no Dioxus workarounds needed
  • Fix per-message speaker icon: Should already work since it IS a direct click; test it, as it may only need a resp.ok check
  • Fix Convo AudioContext: Same issue — create AudioContext inside direct JS onclick, not Dioxus async

Level 2: Conversation persistence

  • POST /api/conversations returns 405: Route exists for GET (list) but POST (create) not registered. Add POST handler in hero_agent routes.rs
  • Conversation list loading: Fixed in v0.7.0-dev (unwrap {"conversations":[...]} wrapper) — verify it works
  • Conversation restore on page navigation: localStorage saves conversation ID, but use_effect restore may have race conditions
  • Long-term: Conversations should be stored in OSIS (issue #45), not just hero_agent SQLite
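
The 405 above comes down to method routing: the path matches, but only GET has a handler registered, so the router answers Method Not Allowed. A minimal sketch of that dispatch logic follows — the function and status codes are illustrative stand-ins, not the actual axum router in hero_agent routes.rs:

```rust
// Illustrative method dispatch for /api/conversations.
// Shows why an unregistered method yields 405 while the path itself is known.
fn route(method: &str, path: &str) -> u16 {
    match (method, path) {
        ("GET", "/api/conversations") => 200,  // list handler (already registered)
        ("POST", "/api/conversations") => 201, // create handler (the one to add)
        (_, "/api/conversations") => 405,      // path known, method not registered
        _ => 404,                              // unknown path
    }
}

fn main() {
    assert_eq!(route("POST", "/api/conversations"), 201);
    assert_eq!(route("PUT", "/api/conversations"), 405);
    assert_eq!(route("GET", "/api/unknown"), 404);
}
```

Adding the POST handler moves that route from the 405 arm into a registered one; the same applies to DELETE/PATCH if full CRUD is wanted.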

Level 3: Cross-browser voice (Phase 2 from issue #78)

  • Server-side wake word via rustpotter: Add to hero_voice, detect "Hero" keyword, send {"type":"wake_word"} via WebSocket. Works on ALL browsers.
  • Local Whisper STT via ONNX: Add ort crate to hero_voice, export Whisper tiny to ONNX, fallback chain: local → Groq cloud
  • AudioWorkletNode: Replace deprecated ScriptProcessorNode in Convo mode JS
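
The local-then-cloud STT fallback in the second bullet is an error-driven chain. A sketch of that control flow with stubbed backends — the real versions would run ONNX inference via the ort crate and call the Groq API; all names here are illustrative:

```rust
// Stubbed STT backends to show the fallback chain: local ONNX Whisper first,
// Groq cloud second. Real implementations would do actual inference / HTTP.
fn local_whisper(_audio: &[u8]) -> Result<String, String> {
    // Simulate a missing or failed local model.
    Err("onnx model not loaded".to_string())
}

fn groq_cloud(_audio: &[u8]) -> Result<String, String> {
    Ok("hello hero".to_string())
}

// Try local first; on any error, fall back to the cloud backend.
fn transcribe(audio: &[u8]) -> Result<String, String> {
    local_whisper(audio).or_else(|_| groq_cloud(audio))
}

fn main() {
    assert_eq!(transcribe(b"pcm16"), Ok("hello hero".to_string()));
}
```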

Level 4: Server-side TTS (nice to have)

  • Add TTS model to aibroker (OpenAI TTS via OpenRouter)
  • hero_agent /api/voice/tts returns audio instead of 404
  • Better voice quality than browser speechSynthesis

Key Technical Insight

Dioxus async eval cannot satisfy browser user gesture requirements.

The pattern onclick → spawn(async { document::eval("speechSynthesis.speak(...)") }) does NOT work because:

  1. Dioxus onclick triggers a Rust closure
  2. spawn() schedules an async task
  3. document::eval() calls JS via WASM bridge
  4. By this point, the browser no longer considers it a "user gesture"

The fix is to keep audio initialization in pure JS triggered by DOM events, not through the Dioxus→WASM→JS bridge.

Files to modify

| File | Changes |
|------|---------|
| `hero_archipelagos/.../ai/src/island.rs` | Read aloud (JS MutationObserver), Convo (AudioContext in JS onclick) |
| `hero_archipelagos/.../ai/src/views/message_bubble.rs` | Per-message speaker (verify works) |
| `hero_archipelagos/.../ai/src/services/ai_service.rs` | Conversation API fixes |
| `hero_agent/.../routes.rs` | POST /api/conversations handler |
| `hero_voice/.../audio.rs` | Rustpotter wake word (Level 3) |
| `hero_voice/.../ws.rs` | Wake word WebSocket message (Level 3) |

Repos involved

  • hero_archipelagos (voice UI fixes)
  • hero_agent (conversation API)
  • hero_voice (Phase 2: rustpotter + local Whisper)
  • hero_services (Dockerfile if needed)

Build & test

```bash
make dist-clean-wasm          # island UI changes
TAG=local make pack
# docker run -d -p 9090:6666 ...
make test-local               # integration tests (20/20)
# Test in browser: both dark + light mode, Brave + Chrome
# Then deploy:
TAG=0.7.x-dev make pack && deploy
make test-integration ENV=herodev
```

Priority order

  1. Read aloud (JS MutationObserver approach)
  2. Create conversation POST handler
  3. Per-message speaker verification
  4. Convo AudioContext fix
  5. Phase 2: rustpotter + local Whisper (issue #78)
  6. Server-side TTS (nice to have)

Status: Work in Progress

Technical Decision: Pure JS Event Delegation (not MutationObserver or pre-warm hacks)

After assessing all approaches against production standards (clean code, future-proof, industry standard, secure):

| Approach | Verdict | Why |
|----------|---------|-----|
| Pre-warm / silent utterance | Rejected | Hack — browsers actively close these loopholes. Chrome 117+ already tightened autoplay. Breaks silently on updates. |
| MutationObserver | Rejected | Over-engineered — event delegation handles dynamic elements without DOM observation overhead. |
| Pure JS event delegation | Selected | Industry standard (YouTube, Discord, Spotify Web all use this). Works with the browser security model, not against it. Spec-intended pattern, so it is unlikely to break. |

Architecture: Separation of Concerns

  • Dioxus/WASM → UI state, rendering, data attributes
  • JS event delegation → browser audio APIs (speechSynthesis, AudioContext, audio.play())
  • Communication → Dioxus sets data-* attributes on elements, JS reads them on click

Key pattern:

```js
// Delegated handler — works for dynamically added elements
document.addEventListener('click', (e) => {
    const btn = e.target.closest('[data-read-aloud]');
    if (btn) {
        // Gesture context preserved — browser allows audio APIs
        const ctx = new AudioContext();
        // Server TTS or browser speechSynthesis here
    }
});
```

For server TTS with slow responses: create AudioContext at click time (gesture valid), then fetch audio and decode — AudioContext stays valid after creation.

Deliverables

Level 1 — Browser gesture fixes (this PR):

  • Read aloud: delegated JS click on [data-read-aloud] buttons → server TTS with speechSynthesis fallback
  • Convo mode: create/resume AudioContext in JS onclick of convo toggle, store globally, WASM references but never creates
  • Per-message speaker icon: verify rendering + click chain end-to-end
  • Audio autoplay after TTS: use AudioContext created at gesture time — no new Audio() autoplay needed

Level 2 — Backend (this PR):

  • Add POST /api/conversations handler in hero_agent
  • Verify conversation list loading + persistence

Out of scope (issue #78):

  • Server-side wake word (rustpotter)
  • Local Whisper STT
  • AudioWorkletNode replacement

Repos touched

  • hero_archipelagos — AI island JS + message bubble + input area
  • hero_agent — conversation POST endpoint

Build plan

`make dist-clean-wasm` → `make test-local` (20/20) → squash merge → deploy v0.7.1-dev

Signed-off-by: mik-tf


Update: Rewrote JS delegation → pure web-sys (Rust)

After review, the JS event delegation approach didn't fit Hero's Rust-first architecture. Rewrote to use web-sys bindings directly from Dioxus onclick handlers.

What changed

| Component | Before (JS delegation) | After (web-sys) |
|-----------|------------------------|-----------------|
| Read aloud button | `data-tts-text` + JS delegated click | `onclick` → `voice::ensure_tts_context()` + `voice::speak()` |
| Auto-read | `window._heroTtsSpeak()` global | `voice::speak()` (AudioContext from toggle click) |
| Stop button | `window._heroTtsStop()` eval | `voice::stop_tts()` (pure Rust) |
| Convo AudioContext | JS delegated click on `#hero-convo-btn` | `voice::ensure_convo_context()` in Dioxus onclick |
| Convo WebSocket | JS delegation | `eval()` (callback-heavy API, impractical in pure web-sys) |

New file: voice.rs

Dedicated module with:

  • ensure_tts_context() — create/resume AudioContext (gesture-valid)
  • ensure_convo_context() — 16kHz AudioContext for conversation streaming
  • speak_browser() — browser speechSynthesis (synchronous)
  • speak_server() — fetch TTS from hero_agent, play via AudioContext
  • speak() — server TTS with browser fallback
  • stop_tts() — cancel all playback
  • Non-WASM stubs for cargo check on native
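
Assuming speak() composes the two backends as listed, its server-first fallback behavior can be sketched with native stubs. These stand-ins replace the real web-sys/fetch calls; the function names mirror the module but the bodies are hypothetical:

```rust
// Stand-in sketch of voice::speak()'s server-first, browser-fallback logic.
// speak_server/speak_browser are stubs, not the real web-sys implementations.
fn speak_server(_text: &str) -> Result<(), u16> {
    // /api/voice/tts currently returns 404 (server TTS is Level 4).
    Err(404)
}

fn speak_browser(_text: &str) -> &'static str {
    "spoken via browser speechSynthesis"
}

fn speak(text: &str) -> &'static str {
    match speak_server(text) {
        Ok(()) => "spoken via server TTS",
        Err(_) => speak_browser(text),
    }
}

fn main() {
    assert_eq!(speak("hello"), "spoken via browser speechSynthesis");
}
```

Once Level 4 lands and the endpoint stops returning 404, the same call sites switch to server audio with no change to callers.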

Why web-sys works for gesture chain

Dioxus onclick runs the Rust closure synchronously in the click event. web_sys::AudioContext::new() called from that closure is in gesture context — browser allows it. Only document::eval() breaks the chain (async bridge).

Remaining JS eval (acceptable)

Convo mode WebSocket + ScriptProcessor streaming: these APIs are deeply callback-based (onmessage, onaudioprocess). Pure web-sys would require leaked closures. Kept as eval but AudioContext is created in Rust first.

Rebuilding now. Will re-test 20/20 before deploy.

Signed-off-by: mik-tf


Deployed: v0.7.1-dev on herodev

Test results

  • Local smoke: 115/115 passed
  • Local integration: 20/20 passed
  • Remote verification: 48/48 passed (3 pre-existing failures: hero_cloud_ui, hero_foundry_ui — unrelated)
  • POST /api/conversations: verified working on herodev

Repos touched

  • hero_agent (975bfdd): POST/DELETE/PATCH conversation endpoints, list returns full info
  • hero_archipelagos (f45c6fe): voice.rs web-sys module, pure Rust TTS, gesture-valid AudioContext

What was fixed

  1. Read aloud: web-sys SpeechSynthesis + AudioContext in Dioxus onclick — gesture chain preserved
  2. POST /api/conversations 405: added POST/DELETE/PATCH/GET-messages endpoints
  3. Convo AudioContext blocked: ensure_convo_context() in onclick, WebSocket streaming via eval
  4. Auto-read: uses shared voice::speak() with AudioContext from toggle click
  5. Architecture: dedicated voice.rs module, no JS globals, pure web-sys bindings

Release

https://forge.ourworld.tf/lhumina_code/hero_services/releases/tag/v0.7.1-dev

Signed-off-by: mik-tf

mik-tf reopened this issue 2026-03-23 20:02:18 +00:00

Status update: v0.7.2-dev

What works in v0.7.2-dev

  • Conversation CRUD (POST/DELETE/PATCH) — was 405, now full REST API
  • Voice input: mic → transcribe → send to AI → SSE streaming response
  • MCP tools: 62 tools discovered and working
  • Auto-scroll to bottom on new messages (fixed)
  • voice.rs web-sys module — clean Rust foundation for audio APIs

Known limitation: TTS playback (read aloud / auto-read)

Browser TTS (speechSynthesis + AudioContext) requires user gesture context that expires unpredictably across browsers. The web-sys approach creates AudioContext correctly in onclick, but the actual audio playback call runs async and some browsers reject it.

Decision: defer TTS to issue #78 (server-side audio). Server-side TTS via WebSocket eliminates all browser gesture issues permanently.

Moving to #78 immediately

Instead of fighting browser audio policies with stepping stones, we're implementing the production solution:

  • Server-side wake word (Rustpotter) — works in ALL browsers
  • Local Whisper STT (ONNX) — zero latency, offline capable
  • Server TTS via WebSocket — no browser gesture needed
  • AudioWorkletNode — replace deprecated ScriptProcessorNode

This makes the #80 scope conversation CRUD + voice input + auto-scroll (all delivered); TTS playback moves to the #78 scope.

Signed-off-by: mik-tf


Closing — v0.7.2-dev deployed

Delivered:

  • Conversation CRUD (POST/DELETE/PATCH)
  • Voice input pipeline (mic → transcribe → send)
  • Auto-scroll to bottom on new messages
  • voice.rs web-sys module (Rust foundation)

TTS playback deferred to #78 (server-side audio — the production solution).

Release: https://forge.ourworld.tf/lhumina_code/hero_services/releases/tag/v0.7.2-dev

Signed-off-by: mik-tf
