Hero Voice

Voice-to-markdown transcription server with real-time speech recognition, AI-powered text transformation, and live preview.

Features

  • Real-time voice transcription - Stream audio from browser to server via WebSocket
  • Voice Activity Detection - Silero VAD V5 neural network detects speech/silence transitions
  • Automatic segmentation - Transcribes on natural pauses (350ms silence threshold)
  • AI transcription - Uses Groq WhisperLargeV3Turbo with automatic failover
  • Live markdown preview - Split-screen editor with real-time rendered HTML
  • Text transformations - 14 built-in AI transformation styles:
    • spellcheck - Grammar and spelling correction
    • specs - Technical specifications
    • code - Software architecture documentation
    • docs - User-friendly documentation
    • legal - Legal document formatting
    • story - Creative narrative
    • summary - Bullet-point summary
    • technical - Technical documentation
    • business - Business analysis
    • meeting - Meeting minutes
    • email - Professional email
    • Language translations: Dutch, French, Arabic
  • Topic organization - Hierarchical folder structure for transcriptions
  • Audio archival - Saves recordings as WAV and compressed OGG

Requirements

  • Rust 1.92+
  • A running hero_proc (provides the secret store; AI keys are read from it)
  • GROQ_API_KEY in hero_proc — required for the cloud STT/TTS fallback
  • OPENROUTER_API_KEY in hero_proc — required for the transform_content RPC
  • Modern browser with Web Audio API and microphone support

Configuration

hero_voice reads AI provider keys directly from the hero_proc secret store via herolib_ai_direct. There is no AI broker daemon in this path.

hero_proc secret set GROQ_API_KEY        gsk_...
hero_proc secret set OPENROUTER_API_KEY  sk-or-...

Optional environment variables:

| Var | Default | Purpose |
|---|---|---|
| RUST_LOG | (unset) | Tracing filter, e.g. hero_voice=info |
| HERO_VOICED_PORT | 8094 | Local hero_voiced HTTP port; STT/TTS is tried here first |
| HERO_VOICE_LOCAL_DISABLE | (unset) | Set to 1 to skip the local fast path and go straight to cloud |
| HERO_VOICE_SHERPA_DIR | ~/hero/share/hero_voice/voice-control/sherpa | Browser-side sherpa WASM/data dir |
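
As an illustration of how these variables and their defaults could be resolved, here is a minimal sketch using only the standard library. The `VoiceEnv` struct and both function names are hypothetical and not taken from the hero_voice source. The sketch also assumes that `~` means `$HOME` and that only the literal value `1` disables the local path.

```rust
use std::env;
use std::path::PathBuf;

/// Hypothetical view of the optional settings listed above.
#[derive(Debug, PartialEq)]
pub struct VoiceEnv {
    pub voiced_port: u16,
    pub local_disabled: bool,
    pub sherpa_dir: PathBuf,
}

/// Resolve settings through a lookup function, so the defaulting logic
/// can be exercised without touching the real process environment.
pub fn resolve_env(get: impl Fn(&str) -> Option<String>) -> VoiceEnv {
    // An unset or unparsable port falls back to the documented default 8094.
    let voiced_port = get("HERO_VOICED_PORT")
        .and_then(|v| v.trim().parse::<u16>().ok())
        .unwrap_or(8094);

    // Assumption: only the documented value "1" switches the local path off.
    let local_disabled = get("HERO_VOICE_LOCAL_DISABLE").as_deref() == Some("1");

    // Assumption: "~" in the documented default refers to $HOME.
    let sherpa_dir = get("HERO_VOICE_SHERPA_DIR")
        .map(PathBuf::from)
        .unwrap_or_else(|| {
            let home = get("HOME").unwrap_or_else(|| ".".to_string());
            PathBuf::from(home).join("hero/share/hero_voice/voice-control/sherpa")
        });

    VoiceEnv { voiced_port, local_disabled, sherpa_dir }
}

/// Convenience wrapper over the real process environment.
pub fn from_process_env() -> VoiceEnv {
    resolve_env(|key| env::var(key).ok())
}
```

Calling `from_process_env()` at startup would yield port 8094 and the local path enabled when none of the variables are set.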

Usage

service voice start --update --reset

The hero_voice server and UI listen on Unix sockets only (no TCP). Use hero_proxy for external access.

Sockets

| Service | Socket Path |
|---|---|
| Server (OpenRPC) | ~/hero/var/sockets/hero_voice/rpc.sock |
| UI (HTTP + /rpc proxy + WebSocket) | ~/hero/var/sockets/hero_voice/web.sock |
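
Because the UI speaks HTTP over a Unix socket, a client can reach it without hero_proxy by writing a plain HTTP/1.1 request to the socket. The following is a rough sketch with std only. The helper names are hypothetical, `~` is assumed to be the user's home directory, and the response is returned raw, with no handling of chunked encoding.

```rust
use std::io::{Read, Write};
use std::os::unix::net::UnixStream;
use std::path::{Path, PathBuf};

/// Build the path of a hero_voice socket ("rpc.sock" or "web.sock")
/// under the given home directory.
pub fn socket_path(home: &Path, name: &str) -> PathBuf {
    home.join("hero/var/sockets/hero_voice").join(name)
}

/// Format a minimal HTTP/1.1 POST carrying a JSON body.
pub fn http_post(path: &str, body: &str) -> String {
    format!(
        "POST {path} HTTP/1.1\r\nHost: localhost\r\nContent-Type: application/json\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{body}",
        body.len()
    )
}

/// Send the request over the Unix socket and return the raw response
/// (status line, headers and body) as one string.
pub fn post_over_socket(socket: &Path, path: &str, body: &str) -> std::io::Result<String> {
    let mut stream = UnixStream::connect(socket)?;
    stream.write_all(http_post(path, body).as_bytes())?;
    let mut raw = String::new();
    stream.read_to_string(&mut raw)?;
    Ok(raw)
}
```

For example, `post_over_socket(&socket_path(home, "web.sock"), "/rpc", body)` would send a JSON-RPC body to the UI's /rpc proxy.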

hero_voiced — local OpenAI-compatible STT/TTS daemon

hero_voiced is a stateless TCP daemon that loads sherpa-onnx Parakeet (STT) and Kokoro (TTS) once and exposes them over an OpenAI-compatible API. hero_voice_admin calls it directly via herolib_ai_direct — overriding Provider::Groq's base URL to http://127.0.0.1:${HERO_VOICED_PORT:-8094}/v1 — and falls back to cloud Groq Whisper / Orpheus on error. Set HERO_VOICE_LOCAL_DISABLE=1 to skip the local fast path entirely.
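
The local-first behaviour described above can be sketched as a small generic helper. This is not the herolib_ai_direct API. `local_base_url` and `local_then_cloud` are hypothetical names, and the two closures stand in for the real local and cloud calls.

```rust
/// Base URL of the local daemon, using the documented default port 8094
/// when HERO_VOICED_PORT is not set.
pub fn local_base_url(port: Option<u16>) -> String {
    format!("http://127.0.0.1:{}/v1", port.unwrap_or(8094))
}

/// Try the local engine first and fall back to the cloud provider on error.
/// When `local_disabled` is true (HERO_VOICE_LOCAL_DISABLE=1), the local
/// attempt is skipped entirely.
pub fn local_then_cloud<T, E>(
    local_disabled: bool,
    local: impl FnOnce() -> Result<T, E>,
    cloud: impl FnOnce() -> Result<T, E>,
) -> Result<T, E> {
    if !local_disabled {
        if let Ok(value) = local() {
            return Ok(value);
        }
        // The local error is dropped here; a real implementation would
        // probably log it before falling back.
    }
    cloud()
}
```

Taking closures keeps the fallback policy separate from the transport, so the same helper can wrap both transcription and speech requests.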

Endpoints:

  • POST /v1/audio/transcriptions — multipart form (file, model, language, prompt, response_format). Default response {"text": "..."}.
  • POST /v1/audio/speech — JSON {model, input, voice, response_format, speed}. Supports response_format of wav (default) and pcm.
  • GET /v1/models — local engine identifiers.
  • GET /health returns {status, service, version, models_ready}.
  • GET /.well-known/heroservice.json — discovery manifest.

Environment:

| Var | Default | Purpose |
|---|---|---|
| HERO_VOICED_PORT | 8094 | Loopback TCP port |
| HERO_VOICED_ADDRESS | (unset) | Optional second bind (e.g. mycelium IPv6) |
| HERO_VOICE_STT_SHERPA_DIR | ~/hero/share/hero_voice/stt/parakeet | Parakeet bundle dir |
| HERO_VOICE_TTS_KOKORO_DIR | ~/hero/share/hero_voice/kokoro-en-v0_19 | Kokoro bundle dir |

Both bundle dirs auto-populate on first hero_voiced start (~770 MB combined download from the sherpa-onnx GitHub releases). make parakeet-deps / make tts-deps / make model-deps remain available for offline pre-bake on images and CI.

Run standalone:

make voiced
# or
cargo run -p hero_voiced

Architecture

Hero Voice follows the standard Hero three-crate model (core library, server, SDK), plus a UI crate and an examples crate:

hero_voice/
├── crates/
│   ├── hero_voice/              # Core library (types, domain logic, audio, transcription)
│   ├── hero_voice_server/       # JSON-RPC 2.0 server over Unix socket
│   ├── hero_voice_sdk/          # Generated client SDK
│   ├── hero_voice_ui/           # Admin UI (Axum HTTP + /rpc proxy + WebSocket)
│   └── hero_voice_examples/     # Example programs using the SDK
├── schemas/voice/voice.oschema  # Domain schema (source of truth)
├── data/                        # Runtime data (OTOML storage, audio, transforms)
├── Cargo.toml
├── Makefile
└── buildenv.sh

Data flow

Browser (WebSocket)
    │
    ▼
hero_voice_ui (Unix socket)
    ├── /rpc endpoint → proxies JSON-RPC to hero_voice_server.sock
    ├── /mcp endpoint → MCP-to-OpenRPC translation
    ├── /ws  endpoint → WebSocket audio streaming
    └── /*   fallback → embedded static assets

hero_voice_server (Unix socket)
    ├── rpc.health   → {"status":"ok"}
    ├── rpc.discover → OpenRPC spec
    └── domain methods (folder.*, topic.*, voiceservice.*)

API

JSON-RPC Endpoint

All data operations use JSON-RPC 2.0 via the /rpc proxy on the UI socket.

Auto-generated CRUD (Topic and Folder root objects):

  • topic.new, topic.get, topic.set, topic.delete, topic.list
  • folder.new, folder.get, folder.set, folder.delete, folder.list

Custom service methods (VoiceService):

  • voiceservice.create_topic / voiceservice.create_folder
  • voiceservice.rename_topic / voiceservice.rename_folder
  • voiceservice.move_topic / voiceservice.move_folder
  • voiceservice.delete_topic / voiceservice.delete_folder
  • voiceservice.save_content / voiceservice.transform_content
  • voiceservice.register_audio / voiceservice.delete_audio
  • voiceservice.reset_topic / voiceservice.get_audio_path
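
A request to any of these methods is a standard JSON-RPC 2.0 envelope. The sketch below builds one with the standard library only. The parameter shapes of the individual methods are not documented here, so `params_json` is passed through as-is, and both helper names are hypothetical. In a real client, the generated hero_voice_sdk or a JSON library such as serde_json would be the more natural choice.

```rust
/// Escape a string so it can be embedded inside a JSON string literal.
pub fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build a JSON-RPC 2.0 request envelope.
/// `params_json` must already be valid JSON (an object or an array).
pub fn jsonrpc_request(id: u64, method: &str, params_json: &str) -> String {
    format!(
        "{{\"jsonrpc\":\"2.0\",\"id\":{},\"method\":\"{}\",\"params\":{}}}",
        id,
        json_escape(method),
        params_json
    )
}
```

For instance, `jsonrpc_request(1, "topic.list", "{}")` produces a body that can be POSTed to /rpc on the UI socket. Whether `topic.list` accepts an empty params object is an assumption.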

WebSocket

GET /ws - Audio streaming endpoint

Client to Server:

{ "type": "start", "topic": "optional-topic-sid", "audio_dir": "optional-dir" }
{ "type": "stop" }

Audio is sent as binary WebSocket frames: 16-bit PCM, 16 kHz, mono, little-endian.
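
To make the binary format concrete, the sketch below converts floating-point samples (such as those produced by the Web Audio API) into 16-bit little-endian PCM bytes and back. The function names are illustrative, and the scaling by `i16::MAX` is one common convention rather than something this README specifies.

```rust
/// Convert f32 samples in [-1.0, 1.0] to 16-bit little-endian PCM bytes.
/// Out-of-range samples are clamped before scaling.
pub fn f32_to_pcm16le(samples: &[f32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(samples.len() * 2);
    for &s in samples {
        let clamped = s.clamp(-1.0, 1.0);
        let v = (clamped * i16::MAX as f32).round() as i16;
        out.extend_from_slice(&v.to_le_bytes());
    }
    out
}

/// Decode 16-bit little-endian PCM bytes back into samples.
/// A trailing odd byte, if any, is ignored.
pub fn pcm16le_to_i16(bytes: &[u8]) -> Vec<i16> {
    bytes
        .chunks_exact(2)
        .map(|b| i16::from_le_bytes([b[0], b[1]]))
        .collect()
}
```

At 16 kHz mono, one second of audio is 16,000 samples, or 32,000 bytes in this encoding.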

Server to Client:

{ "type": "transcription", "text": "...", "is_final": true }
{ "type": "status", "message": "..." }
{ "type": "error", "message": "..." }

Static Files

  • GET /files/audio/{filename} - Audio file downloads
  • GET /files/transforms/{filename} - Transform file downloads

Audio Processing

  • Sample rate: 16kHz (required for Silero VAD)
  • Chunk size: 512 samples for VAD analysis
  • Silence threshold: 350ms triggers transcription
  • Speech threshold: 0.20 probability
  • Maximum buffer: 30 seconds before forced transcription
  • Compression: OGG Vorbis at quality 0.4 (~10% of WAV size)
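
The parameters above imply a simple segmentation loop. Each 512-sample chunk covers 32 ms at 16 kHz, so 350 ms of silence is reached after 11 consecutive silent chunks, and the 30-second cap after 938 chunks. The sketch below models that logic on per-chunk VAD probabilities. It is an assumption-laden illustration and not the actual hero_voice implementation: the type names, the `>=` comparisons, and the choice not to buffer leading silence are all hypothetical.

```rust
pub const SAMPLE_RATE: usize = 16_000;
pub const CHUNK_SAMPLES: usize = 512;
pub const SPEECH_THRESHOLD: f32 = 0.20;
pub const SILENCE_SAMPLES: usize = SAMPLE_RATE * 350 / 1000; // 5_600 samples
pub const MAX_BUFFER_SAMPLES: usize = SAMPLE_RATE * 30; // 480_000 samples

/// Why the buffered audio should be handed to transcription.
#[derive(Debug, PartialEq)]
pub enum Flush {
    Pause,
    BufferFull,
}

/// Tracks how much audio is buffered and how long the trailing silence is.
#[derive(Default)]
pub struct Segmenter {
    buffered: usize,
    silence: usize,
    in_speech: bool,
}

impl Segmenter {
    /// Feed the VAD speech probability of one 512-sample chunk.
    /// Returns `Some` when the buffer should be transcribed.
    pub fn push_chunk(&mut self, speech_prob: f32) -> Option<Flush> {
        if speech_prob >= SPEECH_THRESHOLD {
            self.in_speech = true;
            self.silence = 0;
        } else if self.in_speech {
            self.silence += CHUNK_SAMPLES;
        } else {
            // Assumption: silence before any speech is not buffered.
            return None;
        }
        self.buffered += CHUNK_SAMPLES;

        let reason = if self.silence >= SILENCE_SAMPLES {
            Flush::Pause
        } else if self.buffered >= MAX_BUFFER_SAMPLES {
            Flush::BufferFull
        } else {
            return None;
        };
        // Start a fresh segment after every flush.
        *self = Segmenter::default();
        Some(reason)
    }
}
```

A caller would run the VAD on each chunk, pass its probability to `push_chunk`, and send the accumulated audio to transcription whenever a `Flush` is returned.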

Browser Support

  • Chrome 120+
  • Firefox 120+
  • Safari 17+
  • Edge 120+

Requires microphone permission and WebSocket support.

Embedding & CORS

Hero Voice allows iframe embedding (no X-Frame-Options restrictions), cross-origin API calls, and WebSocket connections from any origin.

License

Apache-2.0