Voice-to-text widgets backed by local STT daemon with Groq cloud fallback.

Hero Voice

Drop-in voice-to-text widgets for any host UI in the Hero ecosystem, backed by a local STT daemon with a cloud Groq fallback and an AI text-transform pipeline.

Features

  • Drop-in browser widgets - <hero-voice-input>, <hero-voice-floating>, <hero-voice-button>, and a data-hero-voice boost for any text input. See Browser widgets below.
  • Click-bounded one-shot capture - MediaRecorder posts an Opus blob to /hero_voice/rest/transcribe; the server transcodes it to 16 kHz mono WAV.
  • Local-first STT - hero_voiced runs sherpa-onnx Parakeet locally; Groq WhisperLargeV3Turbo takes over on failure.
  • Text transformations - 14 built-in AI transformation styles:
      • spellcheck - Grammar and spelling correction
      • specs - Technical specifications
      • code - Software architecture documentation
      • docs - User-friendly documentation
      • legal - Legal document formatting
      • story - Creative narrative
      • summary - Bullet-point summary
      • technical - Technical documentation
      • business - Business analysis
      • meeting - Meeting minutes
      • email - Professional email
      • Language translations: Dutch, French, Arabic
  • Topic organization - Hierarchical folder structure for transcriptions
  • Audio archival - Saves recordings as WAV and compressed OGG

Browser widgets

The widgets live at crates/hero_voice_admin/static/voice-widget/, embedded into hero_voice_admin and served under /hero_voice/admin/voice-widget/. Each one is a vanilla custom element with no framework dependency.

<!-- 1. Mic button bound to a target field -->
<hero-voice-input target="#desc"></hero-voice-input>
<textarea id="desc"></textarea>

<!-- 2. Boost any input — mic appears on hover/focus -->
<input data-hero-voice />

<!-- 3. Fixed corner mic — fills the last-focused input -->
<hero-voice-floating position="bottom-right"></hero-voice-floating>

<!-- 4. Event-only mic: emits `hero:voice-text` and calls the function named in on-text -->
<hero-voice-button on-text="window.onVoiceText"></hero-voice-button>
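A host page using the event-only mic only has to define the function named in on-text (or listen for hero:voice-text). The sketch below shows one hypothetical way a dotted on-text value such as "window.onVoiceText" could be resolved to a callable, and assumes the event carries the transcript in event.detail.text; neither the helper names nor that event shape come from the widget source.

```javascript
// Hypothetical sketch, not the widget's implementation: resolve an on-text
// attribute value like "window.onVoiceText" to a function.
function resolveCallback(path, root = globalThis) {
  if (typeof path !== "string" || path.trim() === "") return null;
  // Drop a leading "window." so the lookup also works outside a browser.
  const parts = path.trim().replace(/^window\./, "").split(".");
  let target = root;
  for (const key of parts) {
    if (target == null || !(key in Object(target))) return null;
    target = target[key];
  }
  return typeof target === "function" ? target : null;
}

// Assumption: the transcript is delivered as event.detail.text.
function handleVoiceText(event, onText) {
  const text =
    event && event.detail && typeof event.detail.text === "string" ? event.detail.text : "";
  if (text && onText) onText(text);
  return text;
}

// Browser usage:
// const btn = document.querySelector("hero-voice-button");
// btn.addEventListener("hero:voice-text", (e) =>
//   handleVoiceText(e, resolveCallback(btn.getAttribute("on-text"))));
```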

Pull in the stylesheet and scripts. The paths are root-relative, so the host page must be served from the same origin that mounts /hero_voice/admin/ (typically hero_router):

<link rel="stylesheet" href="/hero_voice/admin/voice-widget/voice-widget.css" />
<script src="/hero_voice/admin/voice-widget/components.js"></script>
<script src="/hero_voice/admin/voice-widget/floating.js"></script>
<script src="/hero_voice/admin/voice-widget/boost.js"></script>

A standalone demo lives at /hero_voice/admin/voice-widget/test.html.

Requirements

  • Rust 1.92+
  • A running hero_proc (provides the secret store; AI keys are read from it)
  • GROQ_API_KEY in hero_proc — required for the cloud STT/TTS fallback
  • OPENROUTER_API_KEY in hero_proc — required for the transform_content RPC
  • Modern browser with MediaRecorder + microphone support (see Browser Support)

Configuration

hero_voice reads AI provider keys directly from the hero_proc secret store via herolib_ai_direct. There is no AI broker daemon in this path.

hero_proc secret set GROQ_API_KEY gsk_...
hero_proc secret set OPENROUTER_API_KEY sk-or-...

Optional environment variables:

| Var | Default | Purpose |
|---|---|---|
| RUST_LOG | (unset) | Tracing filter, e.g. hero_voice=info |
| HERO_VOICED_PORT | 8094 | Local hero_voiced HTTP port; STT/TTS is tried here first |
| HERO_VOICE_LOCAL_DISABLE | (unset) | Set to 1 to skip the local fast path and go straight to cloud |
| HERO_VOICE_SHERPA_DIR | ~/hero/share/hero_voice/voice-widget/sherpa | Browser-side sherpa WASM/data dir (parked wake-word bundle) |
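To make the interplay of HERO_VOICED_PORT and HERO_VOICE_LOCAL_DISABLE concrete, here is a small sketch that derives the ordered list of OpenAI-compatible STT base URLs. It is not hero_voice's code (that lives in Rust); the function name is hypothetical, it assumes any value other than "1" leaves the local path enabled, and it uses Groq's public OpenAI-compatible base URL for the cloud fallback.

```javascript
// Sketch: ordered STT base URLs, local hero_voiced first, cloud Groq last.
const GROQ_BASE_URL = "https://api.groq.com/openai/v1";

function sttBaseUrls(env = process.env) {
  const urls = [];
  // Assumption: only the exact value "1" disables the local fast path.
  if (env.HERO_VOICE_LOCAL_DISABLE !== "1") {
    const port = Number.parseInt(env.HERO_VOICED_PORT ?? "8094", 10);
    const safePort = Number.isInteger(port) && port > 0 && port < 65536 ? port : 8094;
    urls.push(`http://127.0.0.1:${safePort}/v1`);
  }
  urls.push(GROQ_BASE_URL); // cloud fallback is always tried last
  return urls;
}
```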

Usage

lab service voice --start

Services listen on Unix sockets only (no TCP). Use hero_proxy for external access.

Sockets

| Socket | Mount via hero_router | Purpose |
|---|---|---|
| ~/hero/var/sockets/hero_voice/rpc.sock | /hero_voice/rpc/ | JSON-RPC 2.0 (domain methods) |
| ~/hero/var/sockets/hero_voice/admin.sock | /hero_voice/admin/ | Admin UI, widget bundle, file downloads, MCP |
| ~/hero/var/sockets/hero_voice/rest.sock | /hero_voice/rest/ | Transcribe (one-shot + streaming over WebSocket; optional topic-scoped archival), TTS |
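Because the services listen on Unix sockets only, a local Node.js script can skip hero_router and address a socket directly through the socketPath option of node:http. This is a sketch under two assumptions: the sockets sit at the default paths in the table above, and the JSON-RPC handler on rpc.sock is served at "/rpc" (inferred from the routed path /hero_voice/rpc/rpc, not confirmed). The helper names are hypothetical.

```javascript
// Sketch: call hero_voice's JSON-RPC server over its Unix socket.
const http = require("node:http");
const os = require("node:os");
const path = require("node:path");

function socketPath(name) {
  return path.join(os.homedir(), "hero", "var", "sockets", "hero_voice", `${name}.sock`);
}

function rpcRequestOptions(body) {
  return {
    socketPath: socketPath("rpc"),
    path: "/rpc", // assumption: handler path on rpc.sock
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Content-Length": Buffer.byteLength(body),
    },
  };
}

// Sends one JSON-RPC 2.0 call and resolves with the parsed response body.
function callOverSocket(method, params = {}) {
  const body = JSON.stringify({ jsonrpc: "2.0", id: 1, method, params });
  return new Promise((resolve, reject) => {
    const req = http.request(rpcRequestOptions(body), (res) => {
      let data = "";
      res.setEncoding("utf8");
      res.on("data", (chunk) => { data += chunk; });
      res.on("end", () => {
        try { resolve(JSON.parse(data)); } catch (err) { reject(err); }
      });
    });
    req.on("error", reject);
    req.end(body);
  });
}

// Usage (service must be running): callOverSocket("rpc.health").then(console.log);
```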

hero_voiced — local OpenAI-compatible STT/TTS daemon

hero_voiced is a stateless TCP daemon that loads sherpa-onnx Parakeet (STT) and Kokoro (TTS) once and exposes them over an OpenAI-compatible API. hero_voice_admin calls it directly via herolib_ai_direct — overriding Provider::Groq's base URL to http://127.0.0.1:${HERO_VOICED_PORT:-8094}/v1 — and falls back to cloud Groq Whisper / Orpheus on error. Set HERO_VOICE_LOCAL_DISABLE=1 to skip the local fast path entirely.

Endpoints:

  • POST /v1/audio/transcriptions — multipart form (file, model, language, prompt, response_format). Default response {"text": "..."}.
  • POST /v1/audio/speech — JSON {model, input, voice, response_format, speed}. Supports response_format of wav (default) and pcm.
  • GET /v1/models — local engine identifiers.
  • GET /health returns {status, service, version, models_ready}.
  • GET /.well-known/heroservice.json — discovery manifest.
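The endpoints follow the OpenAI audio API shape, so requests can be built with the Fetch API globals that ship with Node.js 18+ (FormData, Blob, fetch). This is a sketch: the helper names are hypothetical, and model and voice identifiers are left to the caller because the README does not list them (GET /v1/models reports the local engine identifiers).

```javascript
// Sketch only: request builders for hero_voiced's OpenAI-compatible API.
function buildTranscriptionForm(wavBytes, { model, language, prompt, responseFormat } = {}) {
  if (!model) throw new Error("model is required (see GET /v1/models)");
  const form = new FormData();
  form.append("file", new Blob([wavBytes], { type: "audio/wav" }), "clip.wav");
  form.append("model", model);
  // Optional fields are sent only when set, so server defaults apply otherwise.
  if (language) form.append("language", language);
  if (prompt) form.append("prompt", prompt);
  if (responseFormat) form.append("response_format", responseFormat);
  return form;
}

function buildSpeechBody({ model, input, voice, responseFormat = "wav", speed = 1.0 }) {
  // hero_voiced documents only wav (default) and pcm output.
  if (!["wav", "pcm"].includes(responseFormat)) {
    throw new Error(`hero_voiced supports wav or pcm, got ${responseFormat}`);
  }
  return JSON.stringify({ model, input, voice, response_format: responseFormat, speed });
}

async function transcribe(baseUrl, form) {
  const res = await fetch(`${baseUrl}/audio/transcriptions`, { method: "POST", body: form });
  if (!res.ok) throw new Error(`STT failed: HTTP ${res.status}`);
  const body = await res.json(); // default shape: {"text": "..."}
  return body.text;
}

// Usage (daemon must be running):
// transcribe("http://127.0.0.1:8094/v1", buildTranscriptionForm(wavBytes, { model: "<id>" }));
```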

Environment:

| Var | Default | Purpose |
|---|---|---|
| HERO_VOICED_PORT | 8094 | Loopback TCP port |
| HERO_VOICED_ADDRESS | (unset) | Optional second bind (e.g. mycelium IPv6) |
| HERO_VOICE_STT_SHERPA_DIR | ~/hero/share/hero_voice/stt/parakeet | Parakeet bundle dir |
| HERO_VOICE_TTS_KOKORO_DIR | ~/hero/share/hero_voice/kokoro-en-v0_19 | Kokoro bundle dir |

Both bundle dirs auto-populate on first hero_voiced start (~770 MB combined download from the sherpa-onnx GitHub releases).
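Since the first start has to download and load the model bundles, a client may want to poll GET /health before sending audio. A minimal sketch, assuming models_ready is a boolean (the README lists the field but not its type); the function names and retry defaults are illustrative.

```javascript
// Sketch: poll hero_voiced's /health until its models are loaded.
// Assumption: models_ready is a boolean.
function isReady(health) {
  return Boolean(health) && health.models_ready === true;
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function waitForHeroVoiced(baseUrl = "http://127.0.0.1:8094", { attempts = 60, delayMs = 2000 } = {}) {
  for (let i = 0; i < attempts; i++) {
    try {
      const res = await fetch(`${baseUrl}/health`);
      if (res.ok && isReady(await res.json())) return true;
    } catch {
      // Daemon not listening yet (or still starting); retry after the delay.
    }
    if (i < attempts - 1) await sleep(delayMs);
  }
  return false;
}
```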

Run standalone:

lab service voice --start

Architecture

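Hero Voice follows the standard Hero three-crate model (core library, JSON-RPC server, admin/UI crate), plus the local STT/TTS daemon, a generated SDK, and examples: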

hero_voice/
├── crates/
│ ├── hero_voice/ # Core library (types, domain logic, audio, transcription)
│ ├── hero_voice_server/ # JSON-RPC 2.0 server over Unix socket (rpc.sock)
│ ├── hero_voice_admin/ # Admin UI on admin.sock + REST (transcribe/tts/uploads) on rest.sock (Axum HTTP)
│ ├── hero_voiced/ # Local OpenAI-compatible STT/TTS daemon (TCP)
│ ├── hero_voice_sdk/ # Generated client SDK
│ └── hero_voice_examples/ # Example programs using the SDK
├── schemas/voice/voice.oschema # Domain schema (source of truth)
├── Cargo.toml
└── wasm/ # Browser-side WASM build for the parked wake-word bundle (KWS/VAD)

Data flow

Browser widget (MediaRecorder → Opus blob)
 │
 ▼ POST /hero_voice/rest/transcribe
hero_voice_admin → rest.sock
 ├── /transcribe[?topic_sid=...] → multipart Opus → 16 kHz WAV → STT
 │ ├── hero_voiced (local, priority 0)
 │ ├── Groq Whisper (cloud fallback)
 │ └── (optional) archive original under data/audio/{topic_sid}/
 │ + voiceservice.register_audio bookkeeping
 ├── /transcribe/ws/{sid} → WebSocket: stream Opus/PCM up, transcript segments down
 └── /tts, /tts/voices → speech synthesis

hero_voice_admin → admin.sock (UI)
 ├── /voice-widget/* → widget bundle (components.js, floating.js,
 │ boost.js, bar.js, test.html, parked wake-word/)
 ├── /files/audio/*, /files/transforms/* → data downloads
 ├── /mcp → MCP-to-OpenRPC translation
 └── /* → embedded admin UI assets

hero_voice_server → rpc.sock (reached at /hero_voice/rpc/ via hero_router)
 ├── rpc.health → {"status":"ok"}
 ├── rpc.discover → OpenRPC spec
 └── domain methods (folder.*, topic.*, voiceservice.*)
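The STT leg of the diagram is a priority-ordered fallback: local hero_voiced at priority 0, then Groq Whisper. The real logic runs server-side in Rust; the sketch below only illustrates the pattern with hypothetical provider objects ({id, priority, transcribe}) and collects the error from each provider that was skipped.

```javascript
// Sketch of the fallback pattern: try providers in priority order and
// return the first successful transcription.
async function transcribeWithFallback(providers, audio) {
  const errors = [];
  const ordered = [...providers].sort((a, b) => a.priority - b.priority);
  for (const provider of ordered) {
    try {
      const text = await provider.transcribe(audio);
      return { text, model_id: provider.id, errors };
    } catch (err) {
      errors.push(`${provider.id}: ${err.message}`);
    }
  }
  throw new Error(`all STT providers failed: ${errors.join("; ")}`);
}
```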

API

JSON-RPC Endpoint

All data operations use JSON-RPC 2.0 at /hero_voice/rpc/rpc — served by hero_router directly from rpc.sock (no admin-side proxy).

Auto-generated CRUD (Topic and Folder root objects):

  • topic.new, topic.get, topic.set, topic.delete, topic.list
  • folder.new, folder.get, folder.set, folder.delete, folder.list

Custom service methods (VoiceService):

  • voiceservice.create_topic / voiceservice.create_folder
  • voiceservice.rename_topic / voiceservice.rename_folder
  • voiceservice.move_topic / voiceservice.move_folder
  • voiceservice.delete_topic / voiceservice.delete_folder
  • voiceservice.save_content / voiceservice.transform_content
  • voiceservice.register_audio / voiceservice.delete_audio
  • voiceservice.reset_topic / voiceservice.get_audio_path
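Any of these methods can be called with a plain JSON-RPC 2.0 POST. The helpers below are a hypothetical sketch following the JSON-RPC 2.0 envelope (result vs. error with code and message); parameter shapes for the domain methods are not documented here, so only rpc.discover appears in the usage comment. The relative endpoint assumes a page served through hero_router.

```javascript
// Sketch: minimal JSON-RPC 2.0 helpers for /hero_voice/rpc/rpc.
let nextId = 1;

function rpcRequest(method, params) {
  const req = { jsonrpc: "2.0", id: nextId++, method };
  if (params !== undefined) req.params = params; // params may be omitted
  return req;
}

function unwrapRpcResponse(resp) {
  if (resp && resp.error) {
    const { code, message } = resp.error;
    throw new Error(`JSON-RPC error ${code}: ${message}`);
  }
  if (!resp || !("result" in resp)) throw new Error("malformed JSON-RPC response");
  return resp.result;
}

// Relative URL: intended for a browser page served through hero_router.
async function rpc(method, params, endpoint = "/hero_voice/rpc/rpc") {
  const res = await fetch(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(rpcRequest(method, params)),
  });
  return unwrapRpcResponse(await res.json());
}

// Usage: const spec = await rpc("rpc.discover"); // OpenRPC spec with real param shapes
```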

Transcribe

POST /hero_voice/rest/transcribe[?topic_sid=...] accepts multipart/form-data with an audio field (Opus in Ogg or WebM, or WAV). The server transcodes the upload to 16 kHz mono WAV before handing it to STT. When topic_sid is set, the original (un-transcoded) bytes are archived under data/audio/{topic_sid}/{timestamp}.{ext} and registered via voiceservice.register_audio as a best-effort side effect: archival errors log a warning but do not fail the transcription.

Response: {text, model_id, latency_ms, archived?: {filename, format, size}}.
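From a page, the one-shot flow is a single multipart POST. The sketch uses the field name audio, the topic_sid query parameter, and the response fields described above; the helper names and the extension-guessing rule are hypothetical.

```javascript
// Sketch: upload one recording to the REST transcribe endpoint.
function transcribeUrl(topicSid) {
  const base = "/hero_voice/rest/transcribe";
  return topicSid ? `${base}?${new URLSearchParams({ topic_sid: topicSid })}` : base;
}

// Hypothetical helper: pick a filename extension from the recorder's MIME type.
function extensionFor(mimeType) {
  if (mimeType.startsWith("audio/ogg")) return "ogg";
  if (mimeType.startsWith("audio/webm")) return "webm";
  if (mimeType.startsWith("audio/wav")) return "wav";
  return "bin";
}

function transcribeForm(blob) {
  const form = new FormData();
  form.append("audio", blob, `recording.${extensionFor(blob.type)}`);
  return form;
}

async function uploadRecording(blob, topicSid) {
  const res = await fetch(transcribeUrl(topicSid), { method: "POST", body: transcribeForm(blob) });
  if (!res.ok) throw new Error(`transcribe failed: HTTP ${res.status}`);
  // Response: {text, model_id, latency_ms, archived?: {filename, format, size}}
  const { text, model_id, latency_ms, archived } = await res.json();
  return { text, modelId: model_id, latencyMs: latency_ms, archived: archived ?? null };
}
```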

Static Files

  • GET /hero_voice/admin/files/audio/{filename} - Audio file downloads
  • GET /hero_voice/admin/files/transforms/{filename} - Transform file downloads

Audio Processing

  • Capture: Browser MediaRecorder, Opus in Ogg (Firefox) or WebM (Chromium / Safari 18.4+) container, click-bounded one-shot per recording.
  • Server-side transcode: Opus → 16 kHz mono WAV before STT.
  • Archival: Saved recordings stored as WAV plus compressed OGG Vorbis (~10% of WAV size).
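For reference, the 16 kHz mono target is ordinary 16-bit PCM in a 44-byte RIFF/WAVE header. This Node.js sketch only illustrates that container layout; it is not hero_voice's transcoder, which runs server-side, and the function names are hypothetical.

```javascript
// Sketch: encode 16 kHz, mono, 16-bit PCM samples as a WAV buffer.
const SAMPLE_RATE = 16000;
const CHANNELS = 1;
const BITS_PER_SAMPLE = 16;

function encodeWav16kMono(samples /* Int16Array */) {
  const blockAlign = CHANNELS * (BITS_PER_SAMPLE / 8);
  const dataSize = samples.length * blockAlign;
  const buf = Buffer.alloc(44 + dataSize);
  buf.write("RIFF", 0, 4, "ascii");
  buf.writeUInt32LE(36 + dataSize, 4); // RIFF chunk size
  buf.write("WAVE", 8, 4, "ascii");
  buf.write("fmt ", 12, 4, "ascii");
  buf.writeUInt32LE(16, 16); // fmt chunk size for PCM
  buf.writeUInt16LE(1, 20); // audio format 1 = PCM
  buf.writeUInt16LE(CHANNELS, 22);
  buf.writeUInt32LE(SAMPLE_RATE, 24);
  buf.writeUInt32LE(SAMPLE_RATE * blockAlign, 28); // byte rate
  buf.writeUInt16LE(blockAlign, 32);
  buf.writeUInt16LE(BITS_PER_SAMPLE, 34);
  buf.write("data", 36, 4, "ascii");
  buf.writeUInt32LE(dataSize, 40);
  for (let i = 0; i < samples.length; i++) {
    buf.writeInt16LE(samples[i], 44 + i * 2);
  }
  return buf;
}

// Converts floats in [-1, 1] (Web Audio style) to 16-bit PCM, clamping outliers.
function floatToInt16(floats) {
  const out = new Int16Array(floats.length);
  for (let i = 0; i < floats.length; i++) {
    const s = Math.max(-1, Math.min(1, floats[i]));
    out[i] = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
  }
  return out;
}
```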

Browser Support

  • Chrome 120+
  • Firefox 120+
  • Safari 18.4+ (Opus in MediaRecorder; older Safari falls back to MP4 which the server doesn't currently decode)
  • Edge 120+

Requires microphone permission.
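Since only Opus in WebM or Ogg (plus WAV) is decoded server-side, a recorder can check support up front instead of producing MP4 on older Safari. In a browser the predicate would be MediaRecorder.isTypeSupported; it is injected here so the logic runs anywhere. The preference order is an illustrative choice, not taken from the widget code.

```javascript
// Sketch: choose a MediaRecorder MIME type the server can decode.
const SUPPORTED_TYPES = ["audio/webm;codecs=opus", "audio/ogg;codecs=opus"];

function pickRecorderMimeType(isTypeSupported) {
  for (const type of SUPPORTED_TYPES) {
    if (isTypeSupported(type)) return type;
  }
  return null; // e.g. older Safari: MP4 only, which the server does not decode
}

// Browser usage:
// const mimeType = pickRecorderMimeType((t) => MediaRecorder.isTypeSupported(t));
// if (mimeType) recorder = new MediaRecorder(stream, { mimeType });
```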

Embedding & CORS

Hero Voice allows iframe embedding (no X-Frame-Options restrictions), cross-origin API calls, and WebSocket connections from any origin.

License

Apache-2.0