Stateless OpenAI-compatible STT/TTS daemon via sherpa-onnx and Kokoro models.

hero_voice_provider

Stateless OpenAI-compatible STT and TTS daemon backed by sherpa-onnx Parakeet (speech-to-text) and Kokoro (text-to-speech).

What it does

Exposes an OpenAI-compatible HTTP API for local speech processing. No auth, no sessions, no database. Models are loaded once at startup and kept in memory.

Bind address

The daemon always listens on port 8094.

Linux: probes the local mycelium daemon at startup. If a mycelium overlay address is found, binds there so other nodes on the overlay can reach it. Falls back to 127.0.0.1 if mycelium is not running.
macOS / other: always binds on 127.0.0.1.
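The bind rule above can be sketched as a small pure function. This is a minimal illustration, not the daemon's code: `choose_bind_addr` is a hypothetical name, and the mycelium probe is assumed to have already run and produced an optional IPv6 overlay address.

```rust
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr, SocketAddr};

/// Port the daemon always listens on.
const PORT: u16 = 8094;

/// Hypothetical helper: pick the bind address from the (already run) probe result.
/// `mycelium_addr` stands in for whatever the real startup probe returns.
fn choose_bind_addr(is_linux: bool, mycelium_addr: Option<Ipv6Addr>) -> SocketAddr {
    let ip = match (is_linux, mycelium_addr) {
        // Linux with a reachable mycelium daemon: bind on the overlay address.
        (true, Some(addr)) => IpAddr::V6(addr),
        // Mycelium not running, or not Linux: loopback only.
        _ => IpAddr::V4(Ipv4Addr::LOCALHOST),
    };
    SocketAddr::new(ip, PORT)
}
```

In a real binary the first argument would typically be `cfg!(target_os = "linux")`. Keeping the probe outside the function makes the fallback path easy to test.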

Connection details

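| Property | Value |
| --- | --- |
| Base URL | `http://127.0.0.1:8094` (when the daemon is bound to a mycelium overlay address, use `http://[<overlay-address>]:8094` instead) |
| Protocol | HTTP REST (OpenAI-compatible) |
| Auth | None |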

Unix sockets (hero_router)

| Socket | Path | Protocol |
| --- | --- | --- |
| REST | `$PATH_SOCKETS/hero_voice_provider/rest.sock` | HTTP REST |
| RPC | `$PATH_SOCKETS/hero_voice_provider/rpc.sock` | JSON-RPC 2.0 (OpenRPC) |

The TCP interface is the easiest way to connect directly. Use the Unix sockets when routing through hero_router.
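A sketch of talking to the REST socket directly, using only the standard library. It assumes `rest.sock` serves the same `/health` endpoint as the TCP port, which this README implies but does not state; `socket_path` and `get_over_unix` are hypothetical helpers.

```rust
use std::io::{Read, Write};
use std::os::unix::net::UnixStream;
use std::path::{Path, PathBuf};

/// Build `$PATH_SOCKETS/hero_voice_provider/<name>.sock` from a base directory.
fn socket_path(base: &Path, name: &str) -> PathBuf {
    base.join("hero_voice_provider").join(format!("{name}.sock"))
}

/// Minimal HTTP/1.1 GET over a Unix socket; returns the raw response text.
/// `Connection: close` lets us read until EOF instead of parsing Content-Length.
/// `read_to_string` fails on non-UTF-8 bodies, which is fine for JSON health checks.
fn get_over_unix(sock: &Path, path: &str) -> std::io::Result<String> {
    let mut stream = UnixStream::connect(sock)?;
    // HTTP/1.1 requires a Host header even though a Unix socket has no host.
    write!(stream, "GET {path} HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n")?;
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    Ok(response)
}
```

Typical use: read `PATH_SOCKETS` with `std::env::var`, then call `get_over_unix(&socket_path(Path::new(&base), "rest"), "/health")`.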

Endpoints

| Method | Path | Description |
| --- | --- | --- |
| `POST` | `/v1/audio/transcriptions` | STT: multipart form with a `file` field (WAV, Ogg/Opus, WebM/Opus) |
| `POST` | `/v1/audio/speech` | TTS: JSON body `{input, voice?, response_format?, speed?}` |
| `GET` | `/v1/models` | List available local model IDs |
| `GET` | `/health` | `{"status": "ok"}` |

Transcription request

POST /v1/audio/transcriptions
Content-Type: multipart/form-data

file=<audio bytes> # WAV (16 kHz mono), Ogg/Opus, or WebM/Opus
response_format=json|text # optional, default json
prompt=<hotwords> # optional, one hotword per line (boosts recognition)
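Clients without a multipart library can assemble the transcription body by hand. This sketch assumes a fixed boundary string and an `application/octet-stream` part type; both are illustrative choices, and `transcription_body` is a hypothetical helper. Hotwords are joined with newlines to match the one-per-line `prompt` convention.

```rust
/// Append one text field to a multipart/form-data body.
fn push_text_field(body: &mut Vec<u8>, boundary: &str, name: &str, value: &str) {
    body.extend_from_slice(
        format!("--{boundary}\r\nContent-Disposition: form-data; name=\"{name}\"\r\n\r\n{value}\r\n")
            .as_bytes(),
    );
}

/// Build a body for POST /v1/audio/transcriptions.
/// Returns (Content-Type header value, body bytes).
fn transcription_body(
    audio: &[u8],
    filename: &str,
    response_format: Option<&str>,
    hotwords: &[&str],
) -> (String, Vec<u8>) {
    // Fixed boundary for the sketch; a real client should ensure it does not occur in `audio`.
    let boundary = "hero-voice-boundary-7d1f";
    let mut body = Vec::new();

    // Required `file` part carrying the raw audio bytes.
    body.extend_from_slice(
        format!(
            "--{boundary}\r\nContent-Disposition: form-data; name=\"file\"; filename=\"{filename}\"\r\nContent-Type: application/octet-stream\r\n\r\n"
        )
        .as_bytes(),
    );
    body.extend_from_slice(audio);
    body.extend_from_slice(b"\r\n");

    // Optional fields: omit them entirely to get the server defaults.
    if let Some(fmt) = response_format {
        push_text_field(&mut body, boundary, "response_format", fmt);
    }
    if !hotwords.is_empty() {
        push_text_field(&mut body, boundary, "prompt", &hotwords.join("\n"));
    }

    // Closing delimiter.
    body.extend_from_slice(format!("--{boundary}--\r\n").as_bytes());
    (format!("multipart/form-data; boundary={boundary}"), body)
}
```

The returned header value goes in `Content-Type`, and the bytes are sent as the request body.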

Speech request

POST /v1/audio/speech
{
 "input": "Hello world",
 "voice": "af_heart",
 "response_format": "wav",
 "speed": 1.0
}

Supported response_format: wav (default), pcm (raw 16-bit signed little-endian mono).
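Because `pcm` output carries no header, the client has to interpret the bytes itself. A minimal decoding sketch follows (`decode_pcm_s16le` and `to_f32` are hypothetical names). The README does not state the output sample rate, so a client that needs it should obtain it elsewhere or request `wav`, whose header records it.

```rust
/// Decode a `response_format: "pcm"` body (raw 16-bit signed little-endian mono)
/// into samples. Returns None if the byte count is odd (truncated stream).
fn decode_pcm_s16le(bytes: &[u8]) -> Option<Vec<i16>> {
    if bytes.len() % 2 != 0 {
        return None;
    }
    Some(
        bytes
            .chunks_exact(2)
            .map(|pair| i16::from_le_bytes([pair[0], pair[1]]))
            .collect(),
    )
}

/// Scale samples to f32 in [-1.0, 1.0) for DSP or playback libraries.
fn to_f32(samples: &[i16]) -> Vec<f32> {
    samples.iter().map(|&s| s as f32 / 32768.0).collect()
}
```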

JSON-RPC (OpenRPC)

The rpc.sock socket serves a JSON-RPC 2.0 interface at POST /rpc, with the OpenRPC document at GET /openrpc.json. This is the interface used when routing through hero_router.

| Method | Params | Result |
| --- | --- | --- |
| `health` | none | `{status, service, version, models_ready}` |
| `models.list` | none | `{object: "list", data: [{id, object, owned_by}]}` |
| `stt.transcribe` | `{audio_base64, content_type?}` | `{text}` |
| `tts.synthesize` | `{input, voice?, response_format?, speed?}` | `{audio_base64, content_type}` |

content_type defaults to audio/wav; other types (Ogg/Opus, WebM/Opus) are decoded internally. response_format is wav (default) or pcm. Audio is base64-encoded in both directions.
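When calling `stt.transcribe`, a client can pick `content_type` from the file's magic bytes instead of trusting a file extension. The signatures checked here (`RIFF`/`WAVE`, `OggS`, and the EBML header `1A 45 DF A3` used by WebM) are standard, but the MIME strings `audio/ogg` and `audio/webm` are assumptions: only `audio/wav` is documented above. `sniff_content_type` is a hypothetical helper.

```rust
/// Guess a `content_type` for `stt.transcribe` from the leading bytes of the file.
/// The Ogg and WebM MIME strings are assumed, not documented by the daemon.
fn sniff_content_type(audio: &[u8]) -> Option<&'static str> {
    if audio.len() >= 12 && audio.starts_with(b"RIFF") && &audio[8..12] == b"WAVE" {
        // RIFF container whose form type is WAVE.
        Some("audio/wav")
    } else if audio.starts_with(b"OggS") {
        // Ogg page capture pattern.
        Some("audio/ogg")
    } else if audio.starts_with(&[0x1A, 0x45, 0xDF, 0xA3]) {
        // EBML header: WebM (and Matroska) files start with this ID.
        Some("audio/webm")
    } else {
        None
    }
}
```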

curl -s --unix-socket "$PATH_SOCKETS/hero_voice_provider/rpc.sock" \
  -X POST http://localhost/rpc \
  -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","id":1,"method":"tts.synthesize","params":{"input":"Hello world"}}'
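The same RPC methods can be driven from Rust without external crates. This sketch hand-rolls standard RFC 4648 base64 and formats a `stt.transcribe` request; in practice the `base64` and `serde_json` crates would be the usual choice. The helper names are hypothetical, and the request shape follows the method table above.

```rust
const B64: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Standard (RFC 4648) base64 with `=` padding.
fn base64_encode(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        // Pad a short final chunk with zero bytes, then emit `=` for the missing ones.
        let b = [chunk[0], *chunk.get(1).unwrap_or(&0), *chunk.get(2).unwrap_or(&0)];
        let n = (b[0] as u32) << 16 | (b[1] as u32) << 8 | b[2] as u32;
        out.push(B64[(n >> 18) as usize & 63] as char);
        out.push(B64[(n >> 12) as usize & 63] as char);
        out.push(if chunk.len() > 1 { B64[(n >> 6) as usize & 63] as char } else { '=' });
        out.push(if chunk.len() > 2 { B64[n as usize & 63] as char } else { '=' });
    }
    out
}

/// Build a JSON-RPC 2.0 `stt.transcribe` request body to POST to /rpc.
/// Base64 output needs no JSON escaping; `content_type` is checked to need none either.
fn stt_transcribe_request(id: u64, audio: &[u8], content_type: &str) -> String {
    assert!(
        !content_type.contains(|c: char| c == '"' || c == '\\'),
        "content_type must not need JSON escaping in this sketch"
    );
    format!(
        r#"{{"jsonrpc":"2.0","id":{id},"method":"stt.transcribe","params":{{"audio_base64":"{}","content_type":"{content_type}"}}}}"#,
        base64_encode(audio)
    )
}
```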

Models

| ID | Engine |
| --- | --- |
| `parakeet-v2` | sherpa-onnx Parakeet TDT 0.6B v2 (STT) |
| `kokoro-en-v0_19` | Kokoro TTS |
| `tts` | Alias for `kokoro-en-v0_19` |

TTS voices

| Voice | Speaker | Description |
| --- | --- | --- |
| `af` | 0 | American female (default) |
| `af_bella` | 1 | American female, Bella |
| `af_nicole` | 2 | American female, Nicole |
| `af_sarah` | 3 | American female, Sarah |
| `af_sky` | 4 | American female, Sky |
| `am_adam` | 5 | American male, Adam |
| `am_michael` | 6 | American male, Michael |
| `bf_emma` | 7 | British female, Emma |
| `bf_isabella` | 8 | British female, Isabella |
| `bm_george` | 9 | British male, George |
| `bm_lewis` | 10 | British male, Lewis |

Aliases: af_heart → af_bella, default → af.
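The voice table and aliases amount to a lookup from name to speaker ID. The sketch below mirrors the table as documented; it is not the daemon's implementation, `speaker_id` is a hypothetical name, and treating an omitted voice as `af` follows the "(default)" note. How the daemon responds to an unknown voice is not documented here, so the sketch simply returns `None`.

```rust
/// Map a Kokoro voice name (or alias) to its speaker ID, per the voice table.
/// `None` for the voice means "not specified" and falls back to `af`.
fn speaker_id(voice: Option<&str>) -> Option<u32> {
    // Resolve the documented aliases first.
    let voice = match voice.unwrap_or("af") {
        "af_heart" => "af_bella",
        "default" => "af",
        other => other,
    };
    // Index in this array equals the speaker ID in the table.
    const VOICES: [&str; 11] = [
        "af", "af_bella", "af_nicole", "af_sarah", "af_sky", "am_adam",
        "am_michael", "bf_emma", "bf_isabella", "bm_george", "bm_lewis",
    ];
    VOICES.iter().position(|&v| v == voice).map(|i| i as u32)
}
```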

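Model bundles are downloaded automatically on first startup. If the STT bundle is unavailable, transcription requests fail with a 5xx error while the rest of the daemon keeps running; if the TTS bundle is unavailable, the failure surfaces on the first synthesis call.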

Health during startup

Model loading can take several minutes on a cold start. A minimal health endpoint is served immediately on startup and reports `{"status": "starting"}` until the models are ready, then `{"status": "ok"}`. This prevents hero_proc health probes from flapping during the load window.
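A client or supervisor that starts the daemon can wait for the `starting` phase to end before sending requests. A minimal polling sketch, assuming the probe closure performs `GET /health` and returns the `status` field (or `None` if the daemon is unreachable); `wait_until_ready` is a hypothetical name.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Poll a health probe until it reports "ok", giving up after `max_attempts`.
/// `probe` stands in for an HTTP GET /health that returns the `status` field.
fn wait_until_ready<F>(mut probe: F, max_attempts: u32, delay: Duration) -> bool
where
    F: FnMut() -> Option<String>,
{
    for attempt in 0..max_attempts {
        match probe().as_deref() {
            Some("ok") => return true,
            // "starting" (models still loading) or unreachable (None): wait and retry.
            _ => {
                if attempt + 1 < max_attempts {
                    sleep(delay);
                }
            }
        }
    }
    false
}
```

The attempt count multiplied by the delay should cover a multi-minute cold start.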

Build

cargo build -p hero_voice_provider

Cross-compiling to musl targets (Linux x86_64 / aarch64) is supported via the `build.rs` shim, which resolves symbols missing from older musl sysroots.