Timur Gordon 80f0158ab4
feat: migrate server and UI to hero_rpc_server::ZinitLifecycle, drop local lifecycle.rs
Replace custom zinit lifecycle code in hero_voice_server and hero_voice_ui
with the shared ZinitLifecycle from hero_rpc_server. Removes lifecycle.rs
from the hero_voice lib crate and the zinit_sdk dependency.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-12 14:03:58 +01:00
.cargo build: update local Cargo config for monorepo dev 2026-02-24 12:19:00 +02:00
.forgejo/workflows ci: add Cargo cache and SSH keepalive to fix git clone hangs 2026-02-24 13:25:41 +02:00
crates feat: migrate server and UI to hero_rpc_server::ZinitLifecycle, drop local lifecycle.rs 2026-03-12 14:03:58 +01:00
docs/schemas fix: resolve cargo check failures for CI 2026-02-24 13:07:17 +02:00
schemas/voice feat: align hero_voice workspace structure with hero_services reference pattern 2026-02-24 11:16:17 +02:00
scripts ci: Enhance build system with version sync and workflow improvements 2026-02-07 11:08:12 +04:00
.gitignore fix: resolve cargo check failures for CI 2026-02-24 13:07:17 +02:00
buildenv.sh Refactor hero_voice: consolidate HTTP and client crates, create server and SDK architecture 2026-03-01 18:43:34 +03:00
Cargo.lock feat: migrate server and UI to hero_rpc_server::ZinitLifecycle, drop local lifecycle.rs 2026-03-12 14:03:58 +01:00
Cargo.toml feat: add ZinitLifecycle to server and UI, update Makefile to use binary subcommands 2026-03-10 12:43:39 +01:00
LICENSE feat: align hero_voice workspace structure with hero_services reference pattern 2026-02-24 11:16:17 +02:00
Makefile feat: add ZinitLifecycle to server and UI, update Makefile to use binary subcommands 2026-03-10 12:43:39 +01:00
README.md Refactor hero_voice: consolidate HTTP and client crates, create server and SDK architecture 2026-03-01 18:43:34 +03:00

Hero Voice

Voice-to-markdown transcription server with real-time speech recognition, AI-powered text transformation, and live preview.

Features

  • Real-time voice transcription - Stream audio from browser to server via WebSocket
  • Voice Activity Detection - Silero VAD V5 neural network detects speech/silence transitions
  • Automatic segmentation - Transcribes on natural pauses (350ms silence threshold)
  • AI transcription - Uses Groq WhisperLargeV3Turbo with automatic failover
  • Live markdown preview - Split-screen editor with real-time rendered HTML
  • Text transformations - 14 built-in AI transformation styles:
    • spellcheck - Grammar and spelling correction
    • specs - Technical specifications
    • code - Software architecture documentation
    • docs - User-friendly documentation
    • legal - Legal document formatting
    • story - Creative narrative
    • summary - Bullet-point summary
    • technical - Technical documentation
    • business - Business analysis
    • meeting - Meeting minutes
    • email - Professional email
    • Language translations: Dutch, French, Arabic
  • Topic organization - Hierarchical folder structure for transcriptions
  • Audio archival - Saves recordings as WAV and compressed OGG

Requirements

  • Rust 1.92+
  • Groq API key (required for transcription)
  • Modern browser with Web Audio API and microphone support

Configuration

# Required
export GROQ_API_KEY=your-groq-api-key

# Optional fallback providers
export OPENROUTER_API_KEY=your-openrouter-key
export SAMBANOVA_API_KEY=your-sambanova-key

# Server configuration (optional)
export RUST_LOG=hero_voice=info  # Log level
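The variables above imply a simple rule: Groq must be configured, and the other two keys only add fallbacks. A minimal sketch of that rule follows. The function name `configured_providers`, the provider labels, and the failover order (Groq, then OpenRouter, then SambaNova) are assumptions made for illustration, not taken from the hero_voice source.

```rust
/// Provider label and the environment variable holding its key.
/// The order is an assumed failover order; the README only states that
/// Groq is required and the other two are optional fallbacks.
const PROVIDER_KEYS: [(&str, &str); 3] = [
    ("groq", "GROQ_API_KEY"),
    ("openrouter", "OPENROUTER_API_KEY"),
    ("sambanova", "SAMBANOVA_API_KEY"),
];

/// Returns the providers whose key is set and non-empty, in failover order.
/// `lookup` abstracts the environment so the rule can be tested without
/// touching real variables; pass `|k| std::env::var(k).ok()` in a real program.
fn configured_providers<F>(lookup: F) -> Result<Vec<&'static str>, String>
where
    F: Fn(&str) -> Option<String>,
{
    let found: Vec<&'static str> = PROVIDER_KEYS
        .iter()
        .filter(|entry| lookup(entry.1).is_some_and(|v| !v.trim().is_empty()))
        .map(|entry| entry.0)
        .collect();

    // Groq is the only required provider, so it must be first in the list.
    if found.first() != Some(&"groq") {
        return Err("GROQ_API_KEY is required for transcription".to_string());
    }
    Ok(found)
}
```

Calling `configured_providers(|k| std::env::var(k).ok())` at startup would fail fast when the required key is missing instead of failing on the first transcription request.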

Usage

make run

Services listen on Unix sockets only (no TCP). Use hero_proxy for external access.

Sockets

| Service                | Socket Path                               |
|------------------------|-------------------------------------------|
| Server (OpenRPC)       | ~/hero/var/sockets/hero_voice_server.sock |
| UI (HTTP + /rpc proxy) | ~/hero/var/sockets/hero_voice_ui.sock     |
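The `~` in these paths is expanded by the shell, not by the operating system, so a program has to build the absolute path itself. The sketch below does that and probes whether a service is accepting connections. The helper names `socket_path` and `is_listening` are hypothetical; only the directory layout comes from the table above.

```rust
use std::os::unix::net::UnixStream;
use std::path::{Path, PathBuf};

/// Builds `<home>/hero/var/sockets/<service>.sock`.
/// `home` is passed in explicitly; a real program might read it from `$HOME`.
fn socket_path(home: &Path, service: &str) -> PathBuf {
    home.join("hero/var/sockets").join(format!("{service}.sock"))
}

/// Returns true when something accepts connections on the socket.
/// A missing file or a stale socket with no listener both yield false.
fn is_listening(path: &Path) -> bool {
    UnixStream::connect(path).is_ok()
}
```

For example, `is_listening(&socket_path(Path::new("/home/alice"), "hero_voice_server"))` is a cheap way to check that `make run` has brought the server up before sending requests.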

Architecture

Hero Voice follows the standard Hero three-crate model (server, SDK, UI), alongside a core library crate and an examples crate:

hero_voice/
├── crates/
│   ├── hero_voice/              # Core library (types, domain logic, audio, transcription)
│   ├── hero_voice_server/       # JSON-RPC 2.0 server over Unix socket
│   ├── hero_voice_sdk/          # Generated client SDK
│   ├── hero_voice_ui/           # Admin UI (Axum HTTP + /rpc proxy + WebSocket)
│   └── hero_voice_examples/     # Example programs using the SDK
├── schemas/voice/voice.oschema  # Domain schema (source of truth)
├── data/                        # Runtime data (OTOML storage, audio, transforms)
├── Cargo.toml
├── Makefile
└── buildenv.sh

Data flow

Browser (WebSocket)
    │
    ▼
hero_voice_ui (Unix socket)
    ├── /rpc endpoint → proxies JSON-RPC to hero_voice_server.sock
    ├── /mcp endpoint → MCP-to-OpenRPC translation
    ├── /ws  endpoint → WebSocket audio streaming
    └── /*   fallback → embedded static assets

hero_voice_server (Unix socket)
    ├── rpc.health   → {"status":"ok"}
    ├── rpc.discover → OpenRPC spec
    └── domain methods (folder.*, topic.*, voiceservice.*)
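The UI-side routing in the diagram can be read as a small dispatch table. The sketch below restates it as a pure function. It is not the Axum router used by hero_voice_ui: the `Route` enum and `classify` function are illustrative names, and the exact-match behavior for `/rpc`, `/mcp`, and `/ws` is an assumption. The `/files/` prefix comes from the Static Files section of this README.

```rust
/// Where the UI service would send a request, per the data-flow diagram.
#[derive(Debug, PartialEq)]
enum Route {
    RpcProxy,       // /rpc: JSON-RPC forwarded to hero_voice_server.sock
    Mcp,            // /mcp: MCP-to-OpenRPC translation
    AudioWebSocket, // /ws: WebSocket audio streaming
    Files,          // /files/...: audio and transform downloads
    StaticAsset,    // everything else: embedded static assets
}

/// Classifies a request path, ignoring any query string.
fn classify(path: &str) -> Route {
    let path = path.split('?').next().unwrap_or(path);
    match path {
        "/rpc" => Route::RpcProxy,
        "/mcp" => Route::Mcp,
        "/ws" => Route::AudioWebSocket,
        p if p.starts_with("/files/") => Route::Files,
        _ => Route::StaticAsset,
    }
}
```

Under these assumptions `classify("/rpc")` yields `Route::RpcProxy`, while an unknown path such as `/index.html` falls through to `Route::StaticAsset`.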

API

JSON-RPC Endpoint

All data operations use JSON-RPC 2.0 via the /rpc proxy on the UI socket.

Auto-generated CRUD (Topic and Folder root objects):

  • topic.new, topic.get, topic.set, topic.delete, topic.list
  • folder.new, folder.get, folder.set, folder.delete, folder.list

Custom service methods (VoiceService):

  • voiceservice.create_topic / voiceservice.create_folder
  • voiceservice.rename_topic / voiceservice.rename_folder
  • voiceservice.move_topic / voiceservice.move_folder
  • voiceservice.delete_topic / voiceservice.delete_folder
  • voiceservice.save_content / voiceservice.transform_content
  • voiceservice.register_audio / voiceservice.delete_audio
  • voiceservice.reset_topic / voiceservice.get_audio_path
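To make the transport concrete, here is a sketch of a raw JSON-RPC 2.0 call through the `/rpc` proxy using only the standard library. Several details are assumptions: that the proxy accepts an HTTP `POST`, that `rpc.health` is forwarded to the server, and that an empty `[]` is acceptable as `params`. The generated `hero_voice_sdk` crate is presumably the intended client; this only illustrates what travels over the socket.

```rust
use std::io::{self, Read, Write};
use std::os::unix::net::UnixStream;
use std::path::Path;

/// Builds a JSON-RPC 2.0 request object.
/// `params_json` must already be valid JSON and `method` must not contain
/// quotes; no escaping is performed in this sketch.
fn json_rpc_request(id: u64, method: &str, params_json: &str) -> String {
    format!(r#"{{"jsonrpc":"2.0","id":{id},"method":"{method}","params":{params_json}}}"#)
}

/// Wraps a JSON body in an HTTP/1.1 POST to /rpc (the method is assumed).
/// `Connection: close` lets the caller read until end of stream.
fn http_post_rpc(body: &str) -> String {
    format!(
        "POST /rpc HTTP/1.1\r\nHost: localhost\r\nContent-Type: application/json\r\n\
         Content-Length: {}\r\nConnection: close\r\n\r\n{}",
        body.len(),
        body
    )
}

/// Sends the request over a Unix socket and returns the raw HTTP response
/// (status line, headers, and body) as text.
fn send_over_socket(socket: &Path, request: &str) -> io::Result<String> {
    let mut stream = UnixStream::connect(socket)?;
    stream.write_all(request.as_bytes())?;
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    Ok(response)
}
```

A health probe would then be `send_over_socket(ui_socket, &http_post_rpc(&json_rpc_request(1, "rpc.health", "[]")))`, where `ui_socket` is the absolute path of hero_voice_ui.sock.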

WebSocket

GET /ws - Audio streaming endpoint

Client to Server:

{ "type": "start", "topic": "optional-topic-sid", "audio_dir": "optional-dir" }
{ "type": "stop" }

Audio is sent as binary frames between the start and stop messages (16-bit PCM, 16kHz, mono, little-endian).
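The Web Audio API delivers samples as 32-bit floats in the range [-1.0, 1.0], so a client has to convert them before sending. The sketch below shows that conversion for samples that are already mono and already at 16kHz; resampling and channel mixing are out of scope. The function name `encode_pcm16_le` and the symmetric scaling by `i16::MAX` are choices made for this example, not taken from the project's client code.

```rust
/// Converts float samples in [-1.0, 1.0] to signed 16-bit little-endian PCM.
/// Out-of-range input is clamped rather than allowed to wrap around.
fn encode_pcm16_le(samples: &[f32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(samples.len() * 2);
    for &sample in samples {
        let scaled = (sample.clamp(-1.0, 1.0) * i16::MAX as f32).round() as i16;
        // Low byte first, as required by the little-endian wire format.
        out.extend_from_slice(&scaled.to_le_bytes());
    }
    out
}
```

Each sample becomes two bytes, so a 512-sample chunk produces a 1024-byte frame, and one second of audio is 32,000 bytes.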

Server to Client:

{ "type": "transcription", "text": "...", "is_final": true }
{ "type": "status", "message": "..." }
{ "type": "error", "message": "..." }

Static Files

  • GET /files/audio/{filename} - Audio file downloads
  • GET /files/transforms/{filename} - Transform file downloads

Audio Processing

  • Sample rate: 16kHz (required for Silero VAD)
  • Chunk size: 512 samples for VAD analysis
  • Silence threshold: 350ms triggers transcription
  • Speech threshold: 0.20 probability
  • Maximum buffer: 30 seconds before forced transcription
  • Compression: OGG Vorbis at quality 0.4 (~10% of WAV size)
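These parameters combine into a small state machine: at 16kHz a 512-sample chunk covers 32ms, so 350ms of silence (5,600 samples) is reached after 11 consecutive silent chunks, and the 30-second cap (480,000 samples) after 938 chunks. The sketch below models only that bookkeeping, taking a per-chunk speech probability as input. The `Segmenter` type is hypothetical, and details such as skipping leading silence and treating a probability equal to the threshold as speech are assumptions rather than the behavior of the hero_voice crate.

```rust
const SAMPLE_RATE: usize = 16_000;
const CHUNK_SAMPLES: usize = 512;
const SPEECH_THRESHOLD: f32 = 0.20;
const SILENCE_SAMPLES: usize = SAMPLE_RATE * 350 / 1000; // 5,600 samples
const MAX_BUFFER_SAMPLES: usize = SAMPLE_RATE * 30; // 480,000 samples

/// Tracks how much audio is buffered and how long the current pause is.
#[derive(Default)]
struct Segmenter {
    buffered: usize,
    trailing_silence: usize,
    heard_speech: bool,
}

impl Segmenter {
    /// Feeds the VAD probability for one 512-sample chunk.
    /// Returns true when the buffered segment should be transcribed.
    fn push_chunk(&mut self, speech_prob: f32) -> bool {
        if speech_prob >= SPEECH_THRESHOLD {
            self.heard_speech = true;
            self.trailing_silence = 0;
        } else if self.heard_speech {
            self.trailing_silence += CHUNK_SAMPLES;
        }
        // Leading silence is not buffered (an assumption of this sketch).
        if self.heard_speech {
            self.buffered += CHUNK_SAMPLES;
        }
        let paused = self.heard_speech && self.trailing_silence >= SILENCE_SAMPLES;
        let full = self.buffered >= MAX_BUFFER_SAMPLES;
        if paused || full {
            *self = Segmenter::default(); // start a fresh segment
            true
        } else {
            false
        }
    }
}
```

A caller would create `Segmenter::default()`, run each chunk through the VAD, and hand the buffered audio to the transcription provider whenever `push_chunk` returns true.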

Browser Support

  • Chrome 120+
  • Firefox 120+
  • Safari 17+
  • Edge 120+

Requires microphone permission and WebSocket support.

Embedding & CORS

Hero Voice allows iframe embedding (no X-Frame-Options restrictions), cross-origin API calls, and WebSocket connections from any origin.

License

Apache-2.0