Stateless OpenAI-compatible STT/TTS daemon via sherpa-onnx and Kokoro models.

hero_voice_provider

Stateless OpenAI-compatible STT and TTS daemon backed by sherpa-onnx Parakeet (speech-to-text) and Kokoro (text-to-speech).

What it does

Exposes an OpenAI-compatible HTTP API for local speech processing. No auth, no sessions, no database. Models are loaded once at startup and kept in memory.

Bind address

The daemon always listens on port 8094.

  • Linux: probes the local mycelium daemon at startup. If a mycelium overlay address is found, binds there so other nodes on the overlay can reach it. Falls back to 127.0.0.1 if mycelium is not running.
  • macOS / other: always binds on 127.0.0.1.
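
The selection rule above can be sketched as a small pure function. This is an illustration in Python, not the daemon's own code; `choose_bind_address` and its parameters are hypothetical names, and how the overlay address is obtained from mycelium is deliberately left out.

```python
from typing import Optional, Tuple

PORT = 8094  # fixed port, as stated above
LOOPBACK = "127.0.0.1"


def choose_bind_address(platform: str, overlay_addr: Optional[str]) -> Tuple[str, int]:
    """Return (host, port) following the documented rule.

    `platform` is a string such as "linux" or "macos"; `overlay_addr` is the
    mycelium overlay address if the startup probe found one, else None.
    Both parameters are hypothetical and exist only for this sketch.
    """
    if platform == "linux" and overlay_addr:
        return (overlay_addr, PORT)
    # macOS / other platforms, or Linux without a running mycelium daemon
    return (LOOPBACK, PORT)
```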

Connection details

| Property | Value |
| --- | --- |
| Base URL | http://127.0.0.1:8094 |
| Protocol | HTTP REST (OpenAI-compatible) |
| Auth | None |

Unix sockets (hero_router)

| Socket | Path | Protocol |
| --- | --- | --- |
| REST | $PATH_SOCKETS/hero_voice_provider/rest.sock | HTTP REST |
| RPC | $PATH_SOCKETS/hero_voice_provider/rpc.sock | JSON-RPC 2.0 (OpenRPC) |

The TCP interface is the easiest way to connect directly. Use the Unix sockets when routing through hero_router.

Endpoints

| Method | Path | Description |
| --- | --- | --- |
| POST | /v1/audio/transcriptions | STT: multipart form with file field (WAV, Ogg/Opus, WebM/Opus) |
| POST | /v1/audio/speech | TTS: JSON body {input, voice?, response_format?, speed?} |
| GET | /v1/models | List available local model IDs |
| GET | /health | Health check: {"status": "ok"}, or {"status": "starting"} while models are loading |

Transcription request

POST /v1/audio/transcriptions
Content-Type: multipart/form-data

file=<audio bytes> # WAV (16 kHz mono), Ogg/Opus, or WebM/Opus
response_format=json|text # optional, default json
prompt=<hotwords> # optional, one hotword per line (boosts recognition)
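
A minimal client-side sketch of the multipart body described above, using only the Python standard library. The function name `build_transcription_body` and its defaults (file name, boundary generation) are assumptions made for illustration; only the field names `file`, `response_format` and `prompt` come from this README.

```python
import uuid
from typing import Optional, Tuple


def build_transcription_body(
    audio: bytes,
    filename: str = "audio.wav",
    content_type: str = "audio/wav",
    response_format: str = "json",
    hotwords: Optional[list] = None,
    boundary: Optional[str] = None,
) -> Tuple[bytes, str]:
    """Return (body, Content-Type header value) for POST /v1/audio/transcriptions."""
    boundary = boundary or uuid.uuid4().hex
    parts = []

    def add_field(name: str, value: str) -> None:
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )

    add_field("response_format", response_format)
    if hotwords:
        add_field("prompt", "\n".join(hotwords))  # one hotword per line
    # The audio goes in the "file" part, sent as raw bytes.
    parts.append(
        (
            f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
            f'filename="{filename}"\r\nContent-Type: {content_type}\r\n\r\n'
        ).encode()
        + audio
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

The returned pair can be passed to `urllib.request.Request(url, data=body, headers={"Content-Type": header})` with the transcription URL on the TCP interface.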

Speech request

POST /v1/audio/speech
Content-Type: application/json

{
 "input": "Hello world",
 "voice": "af_heart",
 "response_format": "wav",
 "speed": 1.0
}

Supported response_format: wav (default), pcm (raw 16-bit signed little-endian mono).
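
Because pcm responses carry no header, a client has to wrap them itself before most players will open them. The sketch below builds the JSON body and converts raw 16-bit mono PCM to WAV with the standard library. Both function names are hypothetical, and the sample rate is a required argument because this README does not state the output rate.

```python
import io
import json
import wave


def build_speech_request(text, voice=None, response_format=None, speed=None) -> bytes:
    """Encode the JSON body for POST /v1/audio/speech, omitting unset optional fields."""
    body = {"input": text}
    for key, value in (("voice", voice), ("response_format", response_format), ("speed", speed)):
        if value is not None:
            body[key] = value
    return json.dumps(body).encode("utf-8")


def pcm_to_wav(pcm: bytes, sample_rate: int) -> bytes:
    """Wrap raw 16-bit signed little-endian mono PCM in a WAV container.

    `sample_rate` must match what the daemon produced; it is an input here
    because the rate is not documented in this README.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)   # mono
        w.setsampwidth(2)   # 16-bit samples
        w.setframerate(sample_rate)
        w.writeframes(pcm)
    return buf.getvalue()
```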

JSON-RPC (OpenRPC)

The rpc.sock socket serves a JSON-RPC 2.0 interface at POST /rpc, with the OpenRPC document at GET /openrpc.json. This is the interface used when routing through hero_router.

| Method | Params | Result |
| --- | --- | --- |
| health | none | {status, service, version, models_ready} |
| models.list | none | {object: "list", data: [{id, object, owned_by}]} |
| stt.transcribe | {audio_base64, content_type?} | {text} |
| tts.synthesize | {input, voice?, response_format?, speed?} | {audio_base64, content_type} |

content_type defaults to audio/wav; other types (Ogg/Opus, WebM/Opus) are decoded internally. response_format is wav (default) or pcm. Audio is base64-encoded in both directions.
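
The base64 handling in both directions can be sketched as follows. The helper names are hypothetical; the method names and the `audio_base64` / `content_type` fields are taken from the table above, and the error branch assumes a standard JSON-RPC 2.0 `error` member.

```python
import base64
import json
from typing import Optional, Tuple


def stt_transcribe_request(audio: bytes, content_type: Optional[str] = None, request_id: int = 1) -> str:
    """Build a JSON-RPC 2.0 request for stt.transcribe with base64-encoded audio."""
    params = {"audio_base64": base64.b64encode(audio).decode("ascii")}
    if content_type is not None:
        params["content_type"] = content_type  # omitted means the audio/wav default
    return json.dumps(
        {"jsonrpc": "2.0", "id": request_id, "method": "stt.transcribe", "params": params}
    )


def decode_tts_response(response_text: str) -> Tuple[bytes, str]:
    """Extract (audio bytes, content type) from a tts.synthesize reply."""
    reply = json.loads(response_text)
    if "error" in reply:
        raise RuntimeError(f"JSON-RPC error: {reply['error']}")
    result = reply["result"]
    return base64.b64decode(result["audio_base64"]), result["content_type"]
```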

curl -s --unix-socket "$PATH_SOCKETS/hero_voice_provider/rpc.sock" \
  -X POST http://localhost/rpc \
  -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","id":1,"method":"tts.synthesize","params":{"input":"Hello world"}}'
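
The same call can be made from Python without third-party packages by pointing `http.client` at the Unix socket. This is a sketch under assumptions: `UnixHTTPConnection` and `rpc_call` are hypothetical helpers, not part of this project, and the caller supplies the socket path (for example built from `$PATH_SOCKETS`).

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that connects to a Unix domain socket instead of TCP."""

    def __init__(self, socket_path: str, timeout: float = 30.0):
        # "localhost" is only used for the Host header, as in the curl example.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self) -> None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def rpc_call(socket_path: str, method: str, params: dict, request_id: int = 1) -> dict:
    """POST one JSON-RPC 2.0 request to /rpc and return the decoded reply."""
    payload = json.dumps(
        {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}
    )
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("POST", "/rpc", body=payload, headers={"Content-Type": "application/json"})
        return json.loads(conn.getresponse().read())
    finally:
        conn.close()
```

For example, `rpc_call(path, "tts.synthesize", {"input": "Hello world"})` mirrors the curl command above.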

Models

| ID | Engine |
| --- | --- |
| parakeet-v2 | sherpa-onnx Parakeet TDT 0.6B v2 (STT) |
| kokoro-en-v0_19 | Kokoro TTS |
| tts | alias for kokoro-en-v0_19 |

TTS voices

| Voice | Speaker | Description |
| --- | --- | --- |
| af | 0 | American female (default) |
| af_bella | 1 | American female, Bella |
| af_nicole | 2 | American female, Nicole |
| af_sarah | 3 | American female, Sarah |
| af_sky | 4 | American female, Sky |
| am_adam | 5 | American male, Adam |
| am_michael | 6 | American male, Michael |
| bf_emma | 7 | British female, Emma |
| bf_isabella | 8 | British female, Isabella |
| bm_george | 9 | British male, George |
| bm_lewis | 10 | British male, Lewis |

Aliases: af_heart → af_bella, default → af.
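
As an illustration of how a client might validate a voice name before sending a request, the lookup below mirrors the voice table and aliases listed above. `resolve_voice` is a hypothetical helper, and raising on an unknown name is a choice made for this sketch; how the daemon itself treats unknown voices is not documented here.

```python
from typing import Optional, Tuple

# Voice name -> speaker ID, copied from the voice table.
VOICES = {
    "af": 0, "af_bella": 1, "af_nicole": 2, "af_sarah": 3, "af_sky": 4,
    "am_adam": 5, "am_michael": 6,
    "bf_emma": 7, "bf_isabella": 8,
    "bm_george": 9, "bm_lewis": 10,
}

# Alias -> canonical voice name.
ALIASES = {"af_heart": "af_bella", "default": "af"}


def resolve_voice(name: Optional[str] = None) -> Tuple[str, int]:
    """Return (canonical voice name, speaker ID); None means the default voice."""
    requested = name or "default"
    canonical = ALIASES.get(requested, requested)
    if canonical not in VOICES:
        raise ValueError(f"unknown voice: {requested}")
    return canonical, VOICES[canonical]
```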

Model bundles are downloaded automatically on first startup. STT degrades gracefully: transcription requests return a 5xx response if the bundle is unavailable. TTS, by contrast, fails on the first synthesis call.

Health during startup

Model loading can take several minutes on a cold start. A minimal health endpoint is served immediately on startup and reports {"status": "starting"} until models are ready. This prevents hero_proc health probes from flapping during the load window.
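
A client that starts alongside the daemon can wait out the load window by polling the health endpoint. The sketch below assumes the TCP base URL given earlier in this README; `fetch_health` and `wait_until_ready` are hypothetical helpers, and the timeout and interval defaults are arbitrary.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_health(base_url: str = "http://127.0.0.1:8094") -> str:
    """Return the "status" field reported by GET /health."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
        return json.loads(resp.read())["status"]


def wait_until_ready(
    fetch: Callable[[], str] = fetch_health,
    timeout_s: float = 600.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the status is "ok"; return False if the timeout expires first."""
    deadline = clock() + timeout_s
    while True:
        try:
            if fetch() == "ok":
                return True
        except OSError:
            pass  # daemon not listening yet (urllib's URLError subclasses OSError)
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Passing `fetch`, `sleep` and `clock` as parameters keeps the loop testable without a running daemon.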

Build

cargo build -p hero_voice_provider

Cross-compiling to musl targets (Linux x86_64 / aarch64) is supported via a build.rs shim that resolves symbols missing from older musl sysroots.