Rust 52%
JavaScript 18.7%
Shell 17.4%
HTML 9.8%
Makefile 2.1%

Timur Gordon 80f0158ab4	feat: migrate server and UI to hero_rpc_server::ZinitLifecycle, drop local lifecycle.rs	2026-03-12 14:03:58 +01:00
Replace custom zinit lifecycle code in hero_voice_server and hero_voice_ui with the shared ZinitLifecycle from hero_rpc_server. Removes lifecycle.rs from the hero_voice lib crate and the zinit_sdk dependency. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
.cargo	build: update local Cargo config for monorepo dev	2026-02-24 12:19:00 +02:00
.forgejo/workflows	ci: add Cargo cache and SSH keepalive to fix git clone hangs	2026-02-24 13:25:41 +02:00
crates	feat: migrate server and UI to hero_rpc_server::ZinitLifecycle, drop local lifecycle.rs	2026-03-12 14:03:58 +01:00
docs/schemas	fix: resolve cargo check failures for CI	2026-02-24 13:07:17 +02:00
schemas/voice	feat: align hero_voice workspace structure with hero_services reference pattern	2026-02-24 11:16:17 +02:00
scripts	ci: Enhance build system with version sync and workflow improvements	2026-02-07 11:08:12 +04:00
.gitignore	fix: resolve cargo check failures for CI	2026-02-24 13:07:17 +02:00
buildenv.sh	Refactor hero_voice: consolidate HTTP and client crates, create server and SDK architecture	2026-03-01 18:43:34 +03:00
Cargo.lock	feat: migrate server and UI to hero_rpc_server::ZinitLifecycle, drop local lifecycle.rs	2026-03-12 14:03:58 +01:00
Cargo.toml	feat: add ZinitLifecycle to server and UI, update Makefile to use binary subcommands	2026-03-10 12:43:39 +01:00
LICENSE	feat: align hero_voice workspace structure with hero_services reference pattern	2026-02-24 11:16:17 +02:00
Makefile	feat: add ZinitLifecycle to server and UI, update Makefile to use binary subcommands	2026-03-10 12:43:39 +01:00
README.md	Refactor hero_voice: consolidate HTTP and client crates, create server and SDK architecture	2026-03-01 18:43:34 +03:00

Hero Voice

Voice-to-markdown transcription server with real-time speech recognition, AI-powered text transformation, and live preview.

Features

Real-time voice transcription - Stream audio from browser to server via WebSocket
Voice Activity Detection - Silero VAD V5 neural network detects speech/silence transitions
Automatic segmentation - Transcribes on natural pauses (350ms silence threshold)
AI transcription - Uses Groq WhisperLargeV3Turbo, with automatic failover to the optional fallback providers when they are configured
Live markdown preview - Split-screen editor with real-time rendered HTML
Text transformations - 14 built-in AI transformation styles:
- spellcheck - Grammar and spelling correction
- specs - Technical specifications
- code - Software architecture documentation
- docs - User-friendly documentation
- legal - Legal document formatting
- story - Creative narrative
- summary - Bullet-point summary
- technical - Technical documentation
- business - Business analysis
- meeting - Meeting minutes
- email - Professional email
- Language translations: Dutch, French, Arabic
Topic organization - Hierarchical folder structure for transcriptions
Audio archival - Saves recordings as WAV and compressed OGG

Requirements

Rust 1.92+
Groq API key (required for transcription)
Modern browser with Web Audio API and microphone support

Configuration

# Required
export GROQ_API_KEY=your-groq-api-key

# Optional fallback providers
export OPENROUTER_API_KEY=your-openrouter-key
export SAMBANOVA_API_KEY=your-sambanova-key

# Server configuration (optional)
export RUST_LOG=hero_voice=info  # Log level
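
The variables above imply a required primary provider plus optional fallbacks. A minimal sketch of how a startup check could turn them into an ordered failover list follows. The `Provider` enum, the `providers_from` helper, and the fallback order (OpenRouter before SambaNova) are assumptions made for illustration, not the project's actual code.

```rust
use std::env;

/// Hypothetical provider identifiers; the names mirror the env vars in this README.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Provider {
    Groq,
    OpenRouter,
    SambaNova,
}

/// Build the ordered provider list from a lookup function, so the logic can be
/// exercised without touching the real process environment.
fn providers_from<F>(lookup: F) -> Result<Vec<(Provider, String)>, String>
where
    F: Fn(&str) -> Option<String>,
{
    // Treat empty or whitespace-only values as unset.
    let get = |name: &str| lookup(name).filter(|v| !v.trim().is_empty());

    // GROQ_API_KEY is the only required variable.
    let groq = get("GROQ_API_KEY").ok_or_else(|| "GROQ_API_KEY is required".to_string())?;
    let mut providers = vec![(Provider::Groq, groq)];

    // Assumed fallback order: the README does not say which fallback is tried first.
    if let Some(key) = get("OPENROUTER_API_KEY") {
        providers.push((Provider::OpenRouter, key));
    }
    if let Some(key) = get("SAMBANOVA_API_KEY") {
        providers.push((Provider::SambaNova, key));
    }
    Ok(providers)
}

fn main() {
    match providers_from(|name| env::var(name).ok()) {
        Ok(list) => {
            for (provider, _key) in &list {
                println!("provider enabled: {:?}", provider);
            }
        }
        Err(e) => eprintln!("configuration error: {e}"),
    }
}
```

Passing the lookup as a closure keeps the check testable; `main` simply plugs in `std::env::var`.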

Usage

make run

Services listen on Unix sockets only (no TCP). Use hero_proxy for external access.

Sockets

Service	Socket Path
Server (OpenRPC)	`~/hero/var/sockets/hero_voice_server.sock`
UI (HTTP + /rpc proxy)	`~/hero/var/sockets/hero_voice_ui.sock`
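
For scripts that need to locate these sockets, the paths in the table can be resolved from the home directory. This is a small sketch; `socket_path` is a hypothetical helper, and it assumes `~` expands to `$HOME`.

```rust
use std::path::PathBuf;

/// Resolve `~/hero/var/sockets/<service>.sock` for a given home directory.
/// Hypothetical helper; the directory layout is taken from the table above.
fn socket_path(home: &str, service: &str) -> PathBuf {
    PathBuf::from(home)
        .join("hero/var/sockets")
        .join(format!("{service}.sock"))
}

fn main() {
    // Fall back to the current directory if HOME is not set.
    let home = std::env::var("HOME").unwrap_or_else(|_| ".".to_string());
    for service in ["hero_voice_server", "hero_voice_ui"] {
        let path = socket_path(&home, service);
        println!("{service}: {} (exists: {})", path.display(), path.exists());
    }
}
```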

Architecture

Hero Voice follows the standard Hero multi-crate workspace layout, with five crates:

hero_voice/
├── crates/
│   ├── hero_voice/              # Core library (types, domain logic, audio, transcription)
│   ├── hero_voice_server/       # JSON-RPC 2.0 server over Unix socket
│   ├── hero_voice_sdk/          # Generated client SDK
│   ├── hero_voice_ui/           # Admin UI (Axum HTTP + /rpc proxy + WebSocket)
│   └── hero_voice_examples/     # Example programs using the SDK
├── schemas/voice/voice.oschema  # Domain schema (source of truth)
├── data/                        # Runtime data (OTOML storage, audio, transforms)
├── Cargo.toml
├── Makefile
└── buildenv.sh

Data flow

Browser (WebSocket)
    │
    ▼
hero_voice_ui (Unix socket)
    ├── /rpc endpoint → proxies JSON-RPC to hero_voice_server.sock
    ├── /mcp endpoint → MCP-to-OpenRPC translation
    ├── /ws  endpoint → WebSocket audio streaming
    └── /*   fallback → embedded static assets

hero_voice_server (Unix socket)
    ├── rpc.health   → {"status":"ok"}
    ├── rpc.discover → OpenRPC spec
    └── domain methods (folder.*, topic.*, voiceservice.*)
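
The UI routing in the diagram can be read as a simple path dispatch. The sketch below models only that decision with a plain `match`; the real UI uses Axum, so the `Route` enum and `classify` function are hypothetical stand-ins rather than the project's router.

```rust
/// Hypothetical route categories, one per branch of the UI diagram above.
#[derive(Debug, PartialEq, Eq)]
enum Route {
    RpcProxy,       // /rpc: forwarded to hero_voice_server.sock
    Mcp,            // /mcp: MCP-to-OpenRPC translation
    WebSocketAudio, // /ws: audio streaming
    StaticAsset,    // anything else: embedded static assets
}

/// Classify a request path, ignoring any query string.
fn classify(path: &str) -> Route {
    let path = path.split('?').next().unwrap_or(path);
    match path {
        "/rpc" => Route::RpcProxy,
        "/mcp" => Route::Mcp,
        "/ws" => Route::WebSocketAudio,
        _ => Route::StaticAsset,
    }
}

fn main() {
    for path in ["/rpc", "/mcp", "/ws", "/index.html"] {
        println!("{path} -> {:?}", classify(path));
    }
}
```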

API

JSON-RPC Endpoint

All data operations use JSON-RPC 2.0 via the /rpc proxy on the UI socket.

Auto-generated CRUD (Topic and Folder root objects):

topic.new, topic.get, topic.set, topic.delete, topic.list
folder.new, folder.get, folder.set, folder.delete, folder.list

Custom service methods (VoiceService):

voiceservice.create_topic / voiceservice.create_folder
voiceservice.rename_topic / voiceservice.rename_folder
voiceservice.move_topic / voiceservice.move_folder
voiceservice.delete_topic / voiceservice.delete_folder
voiceservice.save_content / voiceservice.transform_content
voiceservice.register_audio / voiceservice.delete_audio
voiceservice.reset_topic / voiceservice.get_audio_path
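
As a rough illustration of calling these methods, the sketch below sends a JSON-RPC 2.0 request as an HTTP POST to `/rpc` over the UI Unix socket, using only the standard library. It assumes the proxy accepts a plain HTTP/1.1 POST with a JSON body and that `rpc.health` is reachable through it; the helper names (`jsonrpc_body`, `http_post`, `call`) are hypothetical. Parameter shapes for the domain methods are not documented here, so the example sticks to `rpc.health`.

```rust
use std::io::{Read, Write};
use std::os::unix::net::UnixStream;

/// Build a JSON-RPC 2.0 request body. `params_json` must already be valid JSON,
/// and `method` is assumed to contain no characters that need escaping.
fn jsonrpc_body(id: u64, method: &str, params_json: &str) -> String {
    format!(r#"{{"jsonrpc":"2.0","id":{id},"method":"{method}","params":{params_json}}}"#)
}

/// Wrap a body in a minimal HTTP/1.1 POST request.
fn http_post(path: &str, body: &str) -> String {
    format!(
        "POST {path} HTTP/1.1\r\nHost: localhost\r\nContent-Type: application/json\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{body}",
        body.len()
    )
}

/// Send one request to the /rpc proxy and return the raw HTTP response.
fn call(socket: &str, method: &str, params_json: &str) -> std::io::Result<String> {
    let mut stream = UnixStream::connect(socket)?;
    let request = http_post("/rpc", &jsonrpc_body(1, method, params_json));
    stream.write_all(request.as_bytes())?;
    let mut response = String::new();
    // `Connection: close` lets us read until the peer closes the stream.
    stream.read_to_string(&mut response)?;
    Ok(response)
}

fn main() {
    let home = std::env::var("HOME").unwrap_or_default();
    let socket = format!("{home}/hero/var/sockets/hero_voice_ui.sock");
    match call(&socket, "rpc.health", "[]") {
        Ok(response) => println!("{response}"),
        Err(e) => eprintln!("could not reach {socket}: {e}"),
    }
}
```

A real client would more likely use the generated `hero_voice_sdk` crate; this only shows what travels over the socket.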

WebSocket

GET /ws - Audio streaming endpoint

Client to Server:

{ "type": "start", "topic": "optional-topic-sid", "audio_dir": "optional-dir" }
{ "type": "stop" }

Binary frames carry the raw audio data (16-bit PCM, 16kHz, mono, little-endian)
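
Browsers typically produce floating-point samples, so a client has to convert them to the binary format described above before sending. Here is a minimal sketch of that conversion; `encode_pcm16le` is a hypothetical helper, and resampling to 16 kHz is assumed to happen elsewhere.

```rust
/// Bytes per second for 16 kHz, mono, 16-bit audio.
const BYTES_PER_SECOND: usize = 16_000 * 2;

/// Convert normalized f32 samples (-1.0..=1.0) to 16-bit little-endian PCM bytes.
fn encode_pcm16le(samples: &[f32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(samples.len() * 2);
    for &sample in samples {
        // Clamp first so out-of-range input saturates instead of wrapping.
        let clamped = sample.clamp(-1.0, 1.0);
        let value = (clamped * i16::MAX as f32).round() as i16;
        out.extend_from_slice(&value.to_le_bytes());
    }
    out
}

fn main() {
    let one_second = vec![0.0_f32; 16_000];
    let bytes = encode_pcm16le(&one_second);
    println!("{} bytes per second of audio", bytes.len());
    assert_eq!(bytes.len(), BYTES_PER_SECOND);
}
```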

Server to Client:

{ "type": "transcription", "text": "...", "is_final": true }
{ "type": "status", "message": "..." }
{ "type": "error", "message": "..." }

Static Files

GET /files/audio/{filename} - Audio file downloads
GET /files/transforms/{filename} - Transform file downloads

Audio Processing

Sample rate: 16kHz (required for Silero VAD)
Chunk size: 512 samples for VAD analysis
Silence threshold: 350ms triggers transcription
Speech threshold: 0.20 probability
Maximum buffer: 30 seconds before forced transcription
Compression: OGG Vorbis at quality 0.4 (~10% of WAV size)
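
The parameters above suggest a small state machine: feed 512-sample chunks to the VAD, buffer audio once speech is heard, and flush for transcription after 350 ms of silence or 30 seconds of audio. The sketch below models only that decision logic, taking the per-chunk speech probability as input. `Segmenter`, `Action`, the `>=` comparison, and the choice to skip leading silence are assumptions, not the project's implementation.

```rust
const SAMPLE_RATE: usize = 16_000;
const CHUNK_SAMPLES: usize = 512; // 32 ms per chunk at 16 kHz
const SILENCE_SAMPLES: usize = SAMPLE_RATE * 350 / 1000; // 350 ms = 5600 samples
const MAX_BUFFER_SAMPLES: usize = SAMPLE_RATE * 30; // 30 s = 480000 samples
const SPEECH_THRESHOLD: f32 = 0.20;

#[derive(Debug, PartialEq, Eq)]
enum Action {
    Continue, // keep buffering
    Flush,    // send the buffered segment for transcription
}

struct Segmenter {
    buffered: usize,    // samples in the current segment
    silence: usize,     // consecutive trailing silent samples
    heard_speech: bool, // whether the segment contains any speech yet
}

impl Segmenter {
    fn new() -> Self {
        Segmenter { buffered: 0, silence: 0, heard_speech: false }
    }

    /// Account for one 512-sample chunk given its VAD speech probability.
    fn push_chunk(&mut self, speech_prob: f32) -> Action {
        if speech_prob >= SPEECH_THRESHOLD {
            self.heard_speech = true;
            self.silence = 0;
        } else {
            self.silence += CHUNK_SAMPLES;
        }
        if !self.heard_speech {
            // Assumption: leading silence is not buffered.
            return Action::Continue;
        }
        self.buffered += CHUNK_SAMPLES;
        let paused = self.silence >= SILENCE_SAMPLES;
        let full = self.buffered >= MAX_BUFFER_SAMPLES;
        if paused || full {
            *self = Segmenter::new();
            Action::Flush
        } else {
            Action::Continue
        }
    }
}

fn main() {
    let mut segmenter = Segmenter::new();
    // One speech chunk followed by silence until the segment is flushed.
    let mut chunks = 1;
    segmenter.push_chunk(0.9);
    while segmenter.push_chunk(0.0) == Action::Continue {
        chunks += 1;
    }
    println!("flushed after {} chunks", chunks + 1);
}
```

Because chunks are 32 ms long, the 350 ms threshold is only reached on the 11th consecutive silent chunk (352 ms) in this sketch.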

Browser Support

Chrome 120+
Firefox 120+
Safari 17+
Edge 120+

Requires microphone permission and WebSocket support.

Embedding & CORS

Hero Voice allows iframe embedding (no X-Frame-Options restrictions), cross-origin API calls, and WebSocket connections from any origin.

License

Apache-2.0