Hero Voice
Drop-in voice-to-text widgets for any host UI in the Hero ecosystem, backed by a local STT daemon with a cloud Groq fallback and an AI text-transform pipeline.
Features
- Drop-in browser widgets - <hero-voice-input>, <hero-voice-floating>, <hero-voice-button>, and a data-hero-voice boost for any text input. See Browser widgets below.
- Click-bounded one-shot capture - MediaRecorder posts an Opus blob to /hero_voice/rest/transcribe; the server transcodes to 16 kHz mono WAV.
- Local-first STT - hero_voiced runs sherpa-onnx Parakeet locally; Groq WhisperLargeV3Turbo takes over on failure.
- Text transformations - 14 built-in AI transformation styles:
  - spellcheck - Grammar and spelling correction
  - specs - Technical specifications
  - code - Software architecture documentation
  - docs - User-friendly documentation
  - legal - Legal document formatting
  - story - Creative narrative
  - summary - Bullet-point summary
  - technical - Technical documentation
  - business - Business analysis
  - meeting - Meeting minutes
  - email - Professional email
  - Language translations: Dutch, French, Arabic
- Topic organization - Hierarchical folder structure for transcriptions
- Audio archival - Saves recordings as WAV and compressed OGG
Browser widgets
The widgets live at crates/hero_voice_admin/static/voice-widget/,
embedded into hero_voice_admin and served under
/hero_voice/admin/voice-widget/. Each one is a vanilla custom element
with no framework dependency.
<!-- 1. Mic button bound to a target field -->
<hero-voice-input target="#desc"></hero-voice-input>
<textarea id="desc"></textarea>
<!-- 2. Boost any input — mic appears on hover/focus -->
<input data-hero-voice />
<!-- 3. Fixed corner mic — fills the last-focused input -->
<hero-voice-floating position="bottom-right"></hero-voice-floating>
<!-- 4. Event-only mic — emits `hero:voice-text`, calls window.fn -->
<hero-voice-button on-text="window.onVoiceText"></hero-voice-button>
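The event-only button leaves it to the page to decide what happens with the transcript. A minimal sketch of such a handler follows, assuming the function named in on-text is called with the transcript as a plain string (this README does not document the callback signature). The helper name appendTranscript is hypothetical.

```javascript
// Hypothetical helper: merge a new transcript into existing field text.
function appendTranscript(current, transcript) {
  const addition = String(transcript ?? "").trim();
  if (!addition) return current;        // ignore empty results
  if (!current) return addition;
  // Keep exactly one space between the old text and the new text.
  return current.replace(/\s+$/, "") + " " + addition;
}

// Browser-only wiring; skipped when no DOM is present (for example under Node).
if (typeof window !== "undefined" && typeof document !== "undefined") {
  // Assumption: the widget calls this function with the transcript text.
  window.onVoiceText = (text) => {
    const field = document.querySelector("#desc");
    if (field) field.value = appendTranscript(field.value, text);
  };
}
```

Keeping the merge logic in a pure function makes it easy to test without a browser or a microphone.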
Pull in the scripts (relative to the same socket the page is served from):
<link rel="stylesheet" href="/hero_voice/admin/voice-widget/voice-widget.css" />
<script src="/hero_voice/admin/voice-widget/components.js"></script>
<script src="/hero_voice/admin/voice-widget/floating.js"></script>
<script src="/hero_voice/admin/voice-widget/boost.js"></script>
A standalone demo lives at
/hero_voice/admin/voice-widget/test.html.
Requirements
- Rust 1.92+
- A running hero_proc (provides the secret store; AI keys are read from it)
- GROQ_API_KEY in hero_proc - required for the cloud STT/TTS fallback
- OPENROUTER_API_KEY in hero_proc - required for the transform_content RPC
- Modern browser with MediaRecorder + microphone support (see Browser Support)
Configuration
hero_voice reads AI provider keys directly from the hero_proc secret store
via herolib_ai_direct. There is no AI broker daemon in this path.
hero_proc secret set GROQ_API_KEY gsk_...
hero_proc secret set OPENROUTER_API_KEY sk-or-...
Optional environment variables:
| Var | Default | Purpose |
|---|---|---|
| RUST_LOG | (unset) | Tracing filter, e.g. hero_voice=info |
| HERO_VOICED_PORT | 8094 | Local hero_voiced HTTP port; STT/TTS is tried here first |
| HERO_VOICE_LOCAL_DISABLE | (unset) | Set to 1 to skip the local fast path and go straight to cloud |
| HERO_VOICE_SHERPA_DIR | ~/hero/share/hero_voice/voice-widget/sherpa | Browser-side sherpa WASM/data dir (parked wake-word bundle) |
Usage
lab service voice --start
The hero_voice services listen on Unix sockets only (no TCP); use hero_proxy for external access. The exception is hero_voiced, which binds a loopback TCP port.
Sockets
| Socket | Mount via hero_router | Purpose |
|---|---|---|
| ~/hero/var/sockets/hero_voice/rpc.sock | /hero_voice/rpc/ | JSON-RPC 2.0 (domain methods) |
| ~/hero/var/sockets/hero_voice/admin.sock | /hero_voice/admin/ | Admin UI, widget bundle, file downloads, MCP |
| ~/hero/var/sockets/hero_voice/rest.sock | /hero_voice/rest/ | Transcribe (one-shot + streaming SSE; optional topic-scoped archival), TTS |
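Because the services bind Unix sockets rather than TCP ports, a local client has to point its HTTP library at a socket path. The sketch below uses the socketPath option of Node's http.request for this. The helper names are hypothetical, and the URL path each service expects when addressed directly on its socket (as opposed to through the hero_router mount) is not documented here, so treat any path you pass as an assumption.

```javascript
const http = require("node:http");
const os = require("node:os");
const path = require("node:path");

// Socket layout taken from the table above; `name` is rpc, admin, or rest.
function socketPathFor(name, home = os.homedir()) {
  return path.posix.join(home, "hero", "var", "sockets", "hero_voice", `${name}.sock`);
}

// Build http.request options that target a Unix socket instead of host/port.
function unixRequestOptions(name, urlPath, method = "GET") {
  return { socketPath: socketPathFor(name), path: urlPath, method };
}

// Send the request and resolve with the status code and body text.
function requestOverSocket(options, body) {
  return new Promise((resolve, reject) => {
    const req = http.request(options, (res) => {
      const chunks = [];
      res.on("data", (chunk) => chunks.push(chunk));
      res.on("end", () =>
        resolve({ status: res.statusCode, body: Buffer.concat(chunks).toString("utf8") }));
    });
    req.on("error", reject);
    if (body) req.write(body);
    req.end();
  });
}
```

For example, `requestOverSocket(unixRequestOptions("admin", "/"))` would attempt a GET on admin.sock; whether "/" is a valid path there is untested.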
hero_voiced — local OpenAI-compatible STT/TTS daemon
hero_voiced is a stateless TCP daemon that loads sherpa-onnx Parakeet (STT) and
Kokoro (TTS) once and exposes them over an OpenAI-compatible API.
hero_voice_admin calls it directly via herolib_ai_direct — overriding
Provider::Groq's base URL to http://127.0.0.1:${HERO_VOICED_PORT:-8094}/v1
— and falls back to cloud Groq Whisper / Orpheus on error. Set
HERO_VOICE_LOCAL_DISABLE=1 to skip the local fast path entirely.
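The local-first behaviour described above can be sketched as an ordered list of base URLs that are tried in turn. This is an illustration of the logic only, not the project's code (which goes through herolib_ai_direct). The function names are hypothetical, and the cloud base URL is an assumption that should be checked against Groq's documentation.

```javascript
// Assumption: Groq's OpenAI-compatible base URL.
const CLOUD_BASE_URL = "https://api.groq.com/openai/v1";

// Ordered list of base URLs to try: local first unless disabled, then cloud.
function sttBaseUrls(env = {}) {
  const port = env.HERO_VOICED_PORT || "8094";
  const urls = [];
  if (env.HERO_VOICE_LOCAL_DISABLE !== "1") {
    urls.push(`http://127.0.0.1:${port}/v1`);
  }
  urls.push(CLOUD_BASE_URL);
  return urls;
}

// Try each base URL in order; return the first success, else rethrow the last error.
async function firstSuccessful(urls, attempt) {
  let lastError = new Error("no endpoints configured");
  for (const url of urls) {
    try {
      return await attempt(url);
    } catch (err) {
      lastError = err; // remember the failure and move on to the next endpoint
    }
  }
  throw lastError;
}
```

In Node, `sttBaseUrls(process.env)` yields the list for the current environment; `attempt` would be whatever function performs the actual transcription call against a given base URL.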
Endpoints:
- POST /v1/audio/transcriptions - multipart form (file, model, language, prompt, response_format). Default response {"text": "..."}.
- POST /v1/audio/speech - JSON {model, input, voice, response_format, speed}. Supports response_format of wav (default) and pcm.
- GET /v1/models - local engine identifiers.
- GET /health - {status, service, version, models_ready}.
- GET /.well-known/heroservice.json - discovery manifest.
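The two POST endpoints above take different body encodings: multipart for transcription, JSON for speech. The sketch below builds both requests without sending them, assuming the default loopback port. The builder names are hypothetical, and no model or voice identifiers are hard-coded because this README does not list them (GET /v1/models is the place to look).

```javascript
// Assumption: hero_voiced is on its default loopback port.
const BASE_URL = "http://127.0.0.1:8094/v1";

// Multipart request for POST /v1/audio/transcriptions.
function buildTranscriptionRequest(audioBlob, { model, language, prompt } = {}) {
  const form = new FormData();
  form.append("file", audioBlob, "clip.wav"); // filename is a placeholder
  if (model) form.append("model", model);
  if (language) form.append("language", language);
  if (prompt) form.append("prompt", prompt);
  // No Content-Type header: fetch sets the multipart boundary itself.
  return { url: `${BASE_URL}/audio/transcriptions`, init: { method: "POST", body: form } };
}

// JSON request for POST /v1/audio/speech; wav is the documented default format.
function buildSpeechRequest(input, { model, voice, responseFormat = "wav", speed } = {}) {
  const payload = { input, response_format: responseFormat };
  if (model) payload.model = model;
  if (voice) payload.voice = voice;
  if (speed !== undefined) payload.speed = speed;
  return {
    url: `${BASE_URL}/audio/speech`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(payload),
    },
  };
}
```

Either result can be passed straight to fetch, for example `const req = buildTranscriptionRequest(blob); const res = await fetch(req.url, req.init);`.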
Environment:
| Var | Default | Purpose |
|---|---|---|
| HERO_VOICED_PORT | 8094 | Loopback TCP port |
| HERO_VOICED_ADDRESS | (unset) | Optional second bind (e.g. mycelium IPv6) |
| HERO_VOICE_STT_SHERPA_DIR | ~/hero/share/hero_voice/stt/parakeet | Parakeet bundle dir |
| HERO_VOICE_TTS_KOKORO_DIR | ~/hero/share/hero_voice/kokoro-en-v0_19 | Kokoro bundle dir |
Both bundle dirs auto-populate on first hero_voiced start (~770 MB combined
download from the sherpa-onnx GitHub releases).
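Since the first start may spend a while downloading the bundles, a client can poll GET /health before sending audio. The sketch below assumes models_ready becomes truthy once both engines are loaded; the README lists the field but not its type, so that reading is an assumption. The function name and retry defaults are hypothetical.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Poll a health probe until it reports ready or the attempts run out.
// Assumption: `models_ready` is truthy once the bundles are loaded.
async function waitUntilReady(fetchHealth, { attempts = 30, delayMs = 2000 } = {}) {
  for (let i = 0; i < attempts; i++) {
    try {
      const health = await fetchHealth();
      if (health && health.models_ready) return health;
    } catch (_) {
      // Daemon not accepting connections yet; retry after the delay.
    }
    if (i < attempts - 1) await sleep(delayMs);
  }
  throw new Error(`hero_voiced not ready after ${attempts} attempts`);
}

// Example probe against the default loopback port.
const probeHealth = () =>
  fetch("http://127.0.0.1:8094/health").then((res) => res.json());
```

A caller would run `await waitUntilReady(probeHealth)` once at startup and only then begin issuing transcription requests.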
Run standalone:
lab service voice --start
Architecture
Hero Voice follows the standard Hero three-crate model:
hero_voice/
├── crates/
│ ├── hero_voice/ # Core library (types, domain logic, audio, transcription)
│ ├── hero_voice_server/ # JSON-RPC 2.0 server over Unix socket (rpc.sock)
│ ├── hero_voice_admin/ # Admin UI on admin.sock + REST (transcribe/tts/uploads) on rest.sock (Axum HTTP)
│ ├── hero_voiced/ # Local OpenAI-compatible STT/TTS daemon (TCP)
│ ├── hero_voice_sdk/ # Generated client SDK
│ └── hero_voice_examples/ # Example programs using the SDK
├── schemas/voice/voice.oschema # Domain schema (source of truth)
├── Cargo.toml
└── wasm/ # Browser-side WASM build for the parked wake-word bundle (KWS/VAD)
Data flow
Browser widget (MediaRecorder → Opus blob)
│
▼ POST /hero_voice/rest/transcribe
hero_voice_admin → rest.sock
├── /transcribe[?topic_sid=...] → multipart Opus → 16 kHz WAV → STT
│ ├── hero_voiced (local, priority 0)
│ ├── Groq Whisper (cloud fallback)
│ └── (optional) archive original under data/audio/{topic_sid}/
│ + voiceservice.register_audio bookkeeping
├── /transcribe/ws/{sid} → WebSocket: stream Opus/PCM up, transcript segments down
└── /tts, /tts/voices → speech synthesis
hero_voice_admin → admin.sock (UI)
├── /voice-widget/* → widget bundle (components.js, floating.js,
│ boost.js, bar.js, test.html, parked wake-word/)
├── /files/audio/*, /files/transforms/* → data downloads
├── /mcp → MCP-to-OpenRPC translation
└── /* → embedded admin UI assets
hero_voice_server → rpc.sock (reached at /hero_voice/rpc/ via hero_router)
├── rpc.health → {"status":"ok"}
├── rpc.discover → OpenRPC spec
└── domain methods (folder.*, topic.*, voiceservice.*)
API
JSON-RPC Endpoint
All data operations use JSON-RPC 2.0 at /hero_voice/rpc/rpc — served by
hero_router directly from rpc.sock (no admin-side proxy).
Auto-generated CRUD (Topic and Folder root objects):
- topic.new, topic.get, topic.set, topic.delete, topic.list
- folder.new, folder.get, folder.set, folder.delete, folder.list
Custom service methods (VoiceService):
- voiceservice.create_topic / voiceservice.create_folder
- voiceservice.rename_topic / voiceservice.rename_folder
- voiceservice.move_topic / voiceservice.move_folder
- voiceservice.delete_topic / voiceservice.delete_folder
- voiceservice.save_content / voiceservice.transform_content
- voiceservice.register_audio / voiceservice.delete_audio
- voiceservice.reset_topic / voiceservice.get_audio_path
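All of the methods above travel in the same JSON-RPC 2.0 envelope. The following sketch builds that envelope and unwraps the reply. The helper names are hypothetical, and parameter shapes for the domain methods are not given in this README, so the example sticks to rpc.health, which takes none.

```javascript
let nextRequestId = 1;

// Build a JSON-RPC 2.0 request object; `params` is omitted when not supplied.
function jsonRpcRequest(method, params) {
  const request = { jsonrpc: "2.0", id: nextRequestId++, method };
  if (params !== undefined) request.params = params;
  return request;
}

// Return the result of a JSON-RPC response, or throw on an error member.
function unwrapJsonRpc(response) {
  if (response.error) {
    const { code, message } = response.error;
    throw new Error(`JSON-RPC error ${code}: ${message}`);
  }
  return response.result;
}

// Browser usage through the hero_router mount named above.
async function callRpc(method, params) {
  const res = await fetch("/hero_voice/rpc/rpc", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(jsonRpcRequest(method, params)),
  });
  return unwrapJsonRpc(await res.json());
}
```

From a page served behind hero_router, `await callRpc("rpc.health")` would be expected to resolve to an object like {"status":"ok"}, per the data-flow diagram.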
Transcribe
POST /hero_voice/rest/transcribe[?topic_sid=...] — multipart/form-data
with an audio field (Opus in Ogg or WebM, or WAV). Server transcodes
to 16 kHz mono WAV before handing to STT. When topic_sid is set, the
original (un-transcoded) bytes are archived under
data/audio/{topic_sid}/{timestamp}.{ext} and registered via
voiceservice.register_audio as a best-effort side effect — archival
errors log a warning but don't fail the transcription.
Response: {text, model_id, latency_ms, archived?: {filename, format, size}}.
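A client for this endpoint needs little more than a multipart upload. The sketch below follows the field name and response shape given above; the function names and the upload filename are hypothetical, and error handling is reduced to a status check.

```javascript
// Build the endpoint URL, adding topic_sid only when archival is wanted.
function transcribeUrl(topicSid) {
  const base = "/hero_voice/rest/transcribe";
  return topicSid ? `${base}?topic_sid=${encodeURIComponent(topicSid)}` : base;
}

// Upload one recording and return the parsed JSON response.
async function transcribe(blob, topicSid) {
  const form = new FormData();
  form.append("audio", blob, "recording"); // filename is a placeholder
  const res = await fetch(transcribeUrl(topicSid), { method: "POST", body: form });
  if (!res.ok) throw new Error(`transcribe failed: HTTP ${res.status}`);
  return res.json();
}

// Format a response for logging; `archived` is present only with topic_sid.
function describeResult(result) {
  const archived = result.archived ? `, archived as ${result.archived.filename}` : "";
  return `${result.text} [${result.model_id}, ${result.latency_ms} ms${archived}]`;
}
```

In a page, `describeResult(await transcribe(blob))` gives a one-line summary of a transcription without archiving it.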
Static Files
GET /hero_voice/admin/files/audio/{filename}- Audio file downloadsGET /hero_voice/admin/files/transforms/{filename}- Transform file downloads
Audio Processing
- Capture: Browser MediaRecorder, Opus in Ogg (Firefox) or WebM (Chromium / Safari 18.4+) container, click-bounded one-shot per recording.
- Server-side transcode: Opus → 16 kHz mono WAV before STT.
- Archival: Saved recordings stored as WAV plus compressed OGG Vorbis (~10% of WAV size).
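To put the size figures in perspective, the arithmetic can be written out. The sketch assumes 16-bit PCM samples and a 44-byte canonical WAV header; the README fixes only the sample rate and channel count, so both are assumptions. The 10% ratio is the approximate figure quoted above. The function names are hypothetical.

```javascript
// Size in bytes of a PCM WAV file of the given duration.
// Assumption: 16-bit samples (2 bytes) and a 44-byte canonical header.
function wavBytes(seconds, { sampleRate = 16000, channels = 1, bytesPerSample = 2 } = {}) {
  const WAV_HEADER_BYTES = 44;
  return WAV_HEADER_BYTES + Math.round(seconds * sampleRate * channels * bytesPerSample);
}

// Rough OGG size using the approximate ratio quoted in this README.
function estimateOggBytes(seconds, ratio = 0.1) {
  return Math.round(wavBytes(seconds) * ratio);
}
```

Under these assumptions one minute of audio is 44 + 60 × 16000 × 2 = 1,920,044 bytes of WAV, and roughly 192 KB of OGG.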
Browser Support
- Chrome 120+
- Firefox 120+
- Safari 18.4+ (Opus in MediaRecorder; older Safari falls back to MP4, which the server doesn't currently decode)
- Edge 120+
Requires microphone permission.
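A page can check container support before it starts recording, so that an unsupported browser gets a clear message instead of an upload the server cannot decode. The sketch below takes the support check as a parameter so that it can run outside a browser; in a page it would be MediaRecorder.isTypeSupported. The function name and the preference order are assumptions.

```javascript
// Opus containers named in this README, in an assumed order of preference.
const PREFERRED_TYPES = ["audio/ogg;codecs=opus", "audio/webm;codecs=opus"];

// Return the first supported MIME type, or null when none is supported.
function pickRecorderMimeType(isSupported) {
  return PREFERRED_TYPES.find((type) => isSupported(type)) || null;
}

// Browser usage (not executed here):
//   const mimeType = pickRecorderMimeType((t) => MediaRecorder.isTypeSupported(t));
//   if (!mimeType) { /* show an unsupported-browser message */ }
//   else { const recorder = new MediaRecorder(stream, { mimeType }); }
```

Returning null rather than letting the browser choose its own default avoids silently recording in a format such as MP4.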
Embedding & CORS
Hero Voice allows iframe embedding (no X-Frame-Options restrictions), cross-origin API calls, and WebSocket connections from any origin.
License
Apache-2.0