Hero Voice
Voice-to-markdown transcription server with real-time speech recognition, AI-powered text transformation, and live preview.
Features
- Real-time voice transcription - Stream audio from browser to server via WebSocket
- Voice Activity Detection - Silero VAD V5 neural network detects speech/silence transitions
- Automatic segmentation - Transcribes on natural pauses (350ms silence threshold)
- AI transcription - Uses Groq WhisperLargeV3Turbo with automatic failover
- Live markdown preview - Split-screen editor with real-time rendered HTML
- Text transformations - 14 built-in AI transformation styles:
  - spellcheck - Grammar and spelling correction
  - specs - Technical specifications
  - code - Software architecture documentation
  - docs - User-friendly documentation
  - legal - Legal document formatting
  - story - Creative narrative
  - summary - Bullet-point summary
  - technical - Technical documentation
  - business - Business analysis
  - meeting - Meeting minutes
  - email - Professional email
  - Language translations: Dutch, French, Arabic
- Topic organization - Hierarchical folder structure for transcriptions
- Audio archival - Saves recordings as WAV and compressed OGG
Requirements
- Rust 1.92+
- Groq API key (required for transcription)
- Modern browser with Web Audio API and microphone support
Configuration
# Required
export GROQ_API_KEY=your-groq-api-key
# Optional fallback providers
export OPENROUTER_API_KEY=your-openrouter-key
export SAMBANOVA_API_KEY=your-sambanova-key
# Server configuration (optional)
export RUST_LOG=hero_voice=info # Log level
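As an illustration of how the variables above could be consumed, here is a minimal sketch in Rust. Only the variable names come from this README; `ProviderKeys`, `load_keys`, and `load_keys_from_env` are hypothetical names, and the actual server may read its configuration differently.

```rust
use std::env;

/// Hypothetical container for the API keys named in the README.
#[derive(Debug, PartialEq)]
pub struct ProviderKeys {
    /// Required primary key (GROQ_API_KEY).
    pub groq: String,
    /// Optional fallback keys as (variable name, value), in declaration order.
    pub fallbacks: Vec<(&'static str, String)>,
}

/// Build the key set from a lookup function, so the logic can be exercised
/// without touching the real process environment.
pub fn load_keys<F>(lookup: F) -> Result<ProviderKeys, String>
where
    F: Fn(&str) -> Option<String>,
{
    // Treat unset and blank values the same way (an assumption of this sketch).
    let non_empty = |name: &str| lookup(name).filter(|v| !v.trim().is_empty());

    let groq = non_empty("GROQ_API_KEY")
        .ok_or_else(|| "GROQ_API_KEY is required for transcription".to_string())?;

    let mut fallbacks = Vec::new();
    for name in ["OPENROUTER_API_KEY", "SAMBANOVA_API_KEY"] {
        if let Some(value) = non_empty(name) {
            fallbacks.push((name, value));
        }
    }
    Ok(ProviderKeys { groq, fallbacks })
}

/// Convenience wrapper that reads the real environment.
pub fn load_keys_from_env() -> Result<ProviderKeys, String> {
    load_keys(|name| env::var(name).ok())
}
```

Passing the lookup as a closure keeps the required/optional distinction easy to test in isolation.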
Usage
make run
Services listen on Unix sockets only (no TCP). Use hero_proxy for external access.
Sockets
| Service | Socket Path |
|---|---|
| Server (OpenRPC) | ~/hero/var/sockets/hero_voice_server.sock |
| UI (HTTP + /rpc proxy) | ~/hero/var/sockets/hero_voice_ui.sock |
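Since the services are reachable only through these sockets, a client first has to resolve the path and open a Unix stream. The following is a rough sketch using the Rust standard library; the `Service` enum and both function names are hypothetical, and the home directory is passed in explicitly rather than expanded from `~`.

```rust
use std::io;
use std::os::unix::net::UnixStream;
use std::path::{Path, PathBuf};

/// The two services listed in the socket table (hypothetical enum).
pub enum Service {
    Server,
    Ui,
}

/// Resolve a socket path under `<home>/hero/var/sockets/`.
pub fn socket_path(home: &Path, service: Service) -> PathBuf {
    let file = match service {
        Service::Server => "hero_voice_server.sock",
        Service::Ui => "hero_voice_ui.sock",
    };
    home.join("hero/var/sockets").join(file)
}

/// Open a stream to the chosen service; fails if the service is not running.
pub fn connect(home: &Path, service: Service) -> io::Result<UnixStream> {
    UnixStream::connect(socket_path(home, service))
}
```

What is written to the stream afterwards depends on the service: the UI socket speaks HTTP, while the framing used by the server socket is not described here.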
Architecture
Hero Voice follows the standard Hero three-crate model:
hero_voice/
├── crates/
│ ├── hero_voice/ # Core library (types, domain logic, audio, transcription)
│ ├── hero_voice_server/ # JSON-RPC 2.0 server over Unix socket
│ ├── hero_voice_sdk/ # Generated client SDK
│ ├── hero_voice_ui/ # Admin UI (Axum HTTP + /rpc proxy + WebSocket)
│ └── hero_voice_examples/ # Example programs using the SDK
├── schemas/voice/voice.oschema # Domain schema (source of truth)
├── data/ # Runtime data (OTOML storage, audio, transforms)
├── Cargo.toml
├── Makefile
└── buildenv.sh
Data flow
Browser (WebSocket)
│
▼
hero_voice_ui (Unix socket)
├── /rpc endpoint → proxies JSON-RPC to hero_voice_server.sock
├── /mcp endpoint → MCP-to-OpenRPC translation
├── /ws endpoint → WebSocket audio streaming
└── /* fallback → embedded static assets
hero_voice_server (Unix socket)
├── rpc.health → {"status":"ok"}
├── rpc.discover → OpenRPC spec
└── domain methods (folder.*, topic.*, voiceservice.*)
API
JSON-RPC Endpoint
All data operations use JSON-RPC 2.0 via the /rpc proxy on the UI socket.
Auto-generated CRUD (Topic and Folder root objects):
- topic.new, topic.get, topic.set, topic.delete, topic.list
- folder.new, folder.get, folder.set, folder.delete, folder.list
Custom service methods (VoiceService):
- voiceservice.create_topic / voiceservice.create_folder
- voiceservice.rename_topic / voiceservice.rename_folder
- voiceservice.move_topic / voiceservice.move_folder
- voiceservice.delete_topic / voiceservice.delete_folder
- voiceservice.save_content / voiceservice.transform_content
- voiceservice.register_audio / voiceservice.delete_audio
- voiceservice.reset_topic / voiceservice.get_audio_path
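To make the request shape concrete, the sketch below builds a JSON-RPC 2.0 envelope and wraps it in an HTTP request for the /rpc proxy. It assumes that /rpc accepts a POST with a JSON body, which this README does not state, and it passes parameters as a pre-serialized JSON string because the parameter schema of each method is not documented here. Both function names are hypothetical.

```rust
/// Build a JSON-RPC 2.0 request envelope.
/// `params_json` must already be valid JSON (for example "{}" or "[]").
/// The method name is inserted verbatim, which is fine for dotted names such
/// as "topic.list" but would need escaping for arbitrary strings.
pub fn jsonrpc_request(id: u64, method: &str, params_json: &str) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":{},"method":"{}","params":{}}}"#,
        id, method, params_json
    )
}

/// Wrap a JSON body in a minimal HTTP/1.1 POST to /rpc (assumed verb and path
/// usage), suitable for writing to the UI Unix socket.
pub fn http_post_rpc(body: &str) -> String {
    format!(
        "POST /rpc HTTP/1.1\r\n\
         Host: localhost\r\n\
         Content-Type: application/json\r\n\
         Content-Length: {}\r\n\
         Connection: close\r\n\
         \r\n\
         {}",
        body.len(),
        body
    )
}
```

For example, `http_post_rpc(&jsonrpc_request(1, "rpc.health", "{}"))` yields the bytes for a health check; a real client would more likely use an HTTP library and a JSON serializer than string formatting.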
WebSocket
GET /ws - Audio streaming endpoint
Client to Server:
{ "type": "start", "topic": "optional-topic-sid", "audio_dir": "optional-dir" }
{ "type": "stop" }
Plus binary audio data (16-bit PCM, 16kHz, mono, little-endian)
Server to Client:
{ "type": "transcription", "text": "...", "is_final": true }
{ "type": "status", "message": "..." }
{ "type": "error", "message": "..." }
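On the client side, the binary frames have to match the stated format: 16-bit PCM, mono, little-endian. The sketch below shows one way to convert floating-point samples (as the Web Audio API produces them) into that byte layout, plus a helper for the `start` control message. The function names are hypothetical, the scaling by `i16::MAX` is one common convention rather than something this README specifies, and the sketch assumes optional fields of `start` can simply be omitted.

```rust
/// Convert samples in the range [-1.0, 1.0] to 16-bit little-endian PCM bytes.
/// Out-of-range input is clamped before scaling.
pub fn f32_to_pcm16_le(samples: &[f32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(samples.len() * 2);
    for &s in samples {
        let clamped = s.clamp(-1.0, 1.0);
        let value = (clamped * i16::MAX as f32).round() as i16;
        out.extend_from_slice(&value.to_le_bytes());
    }
    out
}

/// Build the "start" control message. The topic SID is inserted verbatim,
/// so it must not contain characters that need JSON escaping.
pub fn start_message(topic: Option<&str>) -> String {
    match topic {
        Some(sid) => format!(r#"{{"type":"start","topic":"{}"}}"#, sid),
        None => r#"{"type":"start"}"#.to_string(),
    }
}
```

At 16 kHz mono with two bytes per sample, the stream carries 32,000 bytes of audio per second.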
Static Files
- GET /files/audio/{filename} - Audio file downloads
- GET /files/transforms/{filename} - Transform file downloads
Audio Processing
- Sample rate: 16kHz (required for Silero VAD)
- Chunk size: 512 samples for VAD analysis
- Silence threshold: 350ms triggers transcription
- Speech threshold: 0.20 probability
- Maximum buffer: 30 seconds before forced transcription
- Compression: OGG Vorbis at quality 0.4 (~10% of WAV size)
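The parameters above imply some useful arithmetic: a 512-sample chunk at 16 kHz lasts 32 ms, so 350 ms of silence corresponds to 11 consecutive silent chunks, and the 30-second cap is 480,000 samples. The sketch below turns those numbers into a small segmentation state machine. It is not the project's implementation: the `Segmenter` type is hypothetical, the per-chunk speech probability is taken as an input rather than computed by Silero VAD, and treating a probability equal to the threshold as speech is an assumption.

```rust
pub const SAMPLE_RATE: usize = 16_000;
pub const CHUNK_SAMPLES: usize = 512;
pub const SILENCE_MS: usize = 350;
pub const SPEECH_THRESHOLD: f32 = 0.20;
pub const MAX_BUFFER_SAMPLES: usize = 30 * SAMPLE_RATE;

/// Number of consecutive silent chunks needed to cover at least SILENCE_MS.
pub fn silence_chunks() -> usize {
    let silence_samples = SILENCE_MS * SAMPLE_RATE / 1000; // 5600 samples
    (silence_samples + CHUNK_SAMPLES - 1) / CHUNK_SAMPLES // rounded up
}

#[derive(Debug, PartialEq)]
pub enum Action {
    Continue,
    Transcribe,
}

/// Tracks buffered audio and the current run of silent chunks.
#[derive(Default)]
pub struct Segmenter {
    buffered_samples: usize,
    silent_run: usize,
    heard_speech: bool,
}

impl Segmenter {
    /// Feed the speech probability of one 512-sample chunk.
    pub fn push_chunk(&mut self, speech_prob: f32) -> Action {
        if speech_prob >= SPEECH_THRESHOLD {
            self.heard_speech = true;
            self.silent_run = 0;
        } else {
            self.silent_run += 1;
        }
        // Leading silence is not buffered; buffering starts with speech.
        if self.heard_speech {
            self.buffered_samples += CHUNK_SAMPLES;
        }
        let pause = self.heard_speech && self.silent_run >= silence_chunks();
        let full = self.buffered_samples >= MAX_BUFFER_SAMPLES;
        if pause || full {
            *self = Segmenter::default(); // start a fresh segment
            Action::Transcribe
        } else {
            Action::Continue
        }
    }
}
```

With these constants, a segment is flushed on the 11th silent chunk after speech, or on the 938th chunk of uninterrupted speech.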
Browser Support
- Chrome 120+
- Firefox 120+
- Safari 17+
- Edge 120+
Requires microphone permission and WebSocket support.
Embedding & CORS
Hero Voice allows iframe embedding (no X-Frame-Options restrictions), cross-origin API calls, and WebSocket connections from any origin.
License
Apache-2.0