hero_voice_provider
Stateless OpenAI-compatible STT and TTS daemon backed by sherpa-onnx Parakeet (speech-to-text) and Kokoro (text-to-speech).
What it does
Exposes an OpenAI-compatible HTTP API for local speech processing. No auth, no sessions, no database. Models are loaded once at startup and kept in memory.
Bind address
The daemon always listens on port 8094.
- Linux: probes the local mycelium daemon at startup. If a mycelium overlay address is found, binds there so other nodes on the overlay can reach it. Falls back to `127.0.0.1` if mycelium is not running.
- macOS / other: always binds on `127.0.0.1`.
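The bind rule above can be sketched as a small decision function. This is an illustration in Python, not the daemon's own code: `choose_bind_host` and `base_url` are hypothetical names, and `mycelium_addr` stands in for whatever the mycelium probe returns (the probe itself is not shown). The bracket handling assumes the overlay address may be an IPv6 literal.

```python
from typing import Optional

PORT = 8094            # fixed port stated above
LOOPBACK = "127.0.0.1"


def choose_bind_host(platform: str, mycelium_addr: Optional[str]) -> str:
    """Pick the bind host from the platform and the (assumed) probe result.

    `mycelium_addr` is None when no overlay address was found.
    """
    if platform.startswith("linux") and mycelium_addr:
        return mycelium_addr
    # macOS / other platforms, or Linux without mycelium running.
    return LOOPBACK


def base_url(host: str, port: int = PORT) -> str:
    """Format a base URL; IPv6 literals must be bracketed in URLs."""
    if ":" in host:
        return f"http://[{host}]:{port}"
    return f"http://{host}:{port}"
```

A client on another overlay node would use the overlay address in its base URL, while a local client keeps using the loopback address.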
Connection details
| Property | Value |
|---|---|
| Base URL | http://127.0.0.1:8094 |
| Protocol | HTTP REST (OpenAI-compatible) |
| Auth | None |
Unix sockets (hero_router)
| Socket | Path | Protocol |
|---|---|---|
| REST | `$PATH_SOCKETS/hero_voice_provider/rest.sock` | HTTP REST |
| RPC | `$PATH_SOCKETS/hero_voice_provider/rpc.sock` | JSON-RPC 2.0 (OpenRPC) |
The TCP interface is the easiest way to connect directly. Use the Unix sockets when routing through `hero_router`.
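For clients that want to reach the sockets without `curl`, the following is a minimal sketch using only the Python standard library. `UnixHTTPConnection` and `socket_path` are hypothetical helpers written for this example; the only details taken from this README are the `$PATH_SOCKETS/hero_voice_provider/{rest,rpc}.sock` layout.

```python
import http.client
import os
import socket
from typing import Optional


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that speaks HTTP over a Unix domain socket."""

    def __init__(self, socket_path: str, timeout: float = 30.0):
        # "localhost" only fills the Host header; no TCP connection is made.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self) -> None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def socket_path(kind: str, sockets_dir: Optional[str] = None) -> str:
    """Build the path to rest.sock or rpc.sock under $PATH_SOCKETS."""
    if kind not in ("rest", "rpc"):
        raise ValueError(f"unknown socket kind: {kind!r}")
    base = sockets_dir if sockets_dir is not None else os.environ["PATH_SOCKETS"]
    return os.path.join(base, "hero_voice_provider", f"{kind}.sock")
```

With the daemon running, `UnixHTTPConnection(socket_path("rest"))` can be used like any `http.client.HTTPConnection`, for example `conn.request("GET", "/v1/models")` followed by `conn.getresponse().read()`, assuming the REST socket serves the same routes as the TCP interface.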
Endpoints
| Method | Path | Description |
|---|---|---|
| `POST` | `/v1/audio/transcriptions` | STT: multipart form with `file` field (WAV, Ogg/Opus, WebM/Opus) |
| `POST` | `/v1/audio/speech` | TTS: JSON body `{input, voice?, response_format?, speed?}` |
| `GET` | `/v1/models` | List available local model IDs |
| `GET` | `/health` | Health check: `status` is `"ok"` once models are ready and `"starting"` while they load |
Transcription request
```
POST /v1/audio/transcriptions
Content-Type: multipart/form-data

file=<audio bytes>          # WAV (16 kHz mono), Ogg/Opus, or WebM/Opus
response_format=json|text   # optional, default json
prompt=<hotwords>           # optional, one hotword per line (boosts recognition)
```
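As an illustration of the form fields above, here is a minimal sketch that assembles the multipart body with the Python standard library. `build_transcription_form` is a hypothetical helper, and the default filename and part content type are assumptions chosen for the example.

```python
import uuid
from typing import Optional, Tuple


def build_transcription_form(
    audio: bytes,
    filename: str = "audio.wav",        # assumed default, not from the README
    content_type: str = "audio/wav",
    response_format: Optional[str] = None,
    prompt: Optional[str] = None,
) -> Tuple[bytes, str]:
    """Return (body, Content-Type header) for POST /v1/audio/transcriptions."""
    boundary = uuid.uuid4().hex
    parts = []

    def text_field(name: str, value: str) -> None:
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode("utf-8")
        )

    # The file part carries the raw audio bytes unchanged.
    parts.append(
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
            f"Content-Type: {content_type}\r\n\r\n"
        ).encode("utf-8")
        + audio
        + b"\r\n"
    )
    if response_format is not None:
        text_field("response_format", response_format)  # json or text
    if prompt is not None:
        text_field("prompt", prompt)  # one hotword per line
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

The returned pair can be passed to `urllib.request.Request(url, data=body, headers={"Content-Type": header})` with `url` set to the base URL plus `/v1/audio/transcriptions`.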
Speech request
```
POST /v1/audio/speech

{
  "input": "Hello world",
  "voice": "af_heart",
  "response_format": "wav",
  "speed": 1.0
}
```
Supported response_format: wav (default), pcm (raw 16-bit signed little-endian mono).
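The request body and the `pcm` format can be handled with a few lines of standard-library Python. This is a sketch under stated assumptions: `build_speech_body` and `pcm_to_wav` are hypothetical helpers, and the sample rate of the `pcm` output is not stated in this README, so the caller has to supply it.

```python
import io
import json
import wave
from typing import Optional

ALLOWED_FORMATS = ("wav", "pcm")  # the two formats listed above


def build_speech_body(
    input_text: str,
    voice: Optional[str] = None,
    response_format: Optional[str] = None,
    speed: Optional[float] = None,
) -> bytes:
    """Serialize the JSON body for POST /v1/audio/speech, omitting unset fields."""
    if response_format is not None and response_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    payload = {"input": input_text}
    for key, value in (("voice", voice), ("response_format", response_format), ("speed", speed)):
        if value is not None:
            payload[key] = value
    return json.dumps(payload).encode("utf-8")


def pcm_to_wav(pcm: bytes, sample_rate: int) -> bytes:
    """Wrap raw 16-bit signed little-endian mono PCM in a WAV container.

    `sample_rate` is an assumption the caller must provide.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)   # mono
        w.setsampwidth(2)   # 16-bit samples
        w.setframerate(sample_rate)
        w.writeframes(pcm)
    return buf.getvalue()
```

Requesting `pcm` avoids parsing a WAV header when streaming samples into an audio pipeline; `pcm_to_wav` is only needed when the result should be saved or played as a file.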
JSON-RPC (OpenRPC)
The rpc.sock socket serves a JSON-RPC 2.0 interface at POST /rpc, with the OpenRPC document at GET /openrpc.json. This is the interface used when routing through hero_router.
| Method | Params | Result |
|---|---|---|
| `health` | none | `{status, service, version, models_ready}` |
| `models.list` | none | `{object: "list", data: [{id, object, owned_by}]}` |
| `stt.transcribe` | `{audio_base64, content_type?}` | `{text}` |
| `tts.synthesize` | `{input, voice?, response_format?, speed?}` | `{audio_base64, content_type}` |
content_type defaults to audio/wav; other types (Ogg/Opus, WebM/Opus) are decoded internally. response_format is wav (default) or pcm. Audio is base64-encoded in both directions.
```
curl -s --unix-socket "$PATH_SOCKETS/hero_voice_provider/rpc.sock" \
  -X POST http://localhost/rpc \
  -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","id":1,"method":"tts.synthesize","params":{"input":"Hello world"}}'
```
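The same envelope can be built and decoded programmatically. The sketch below uses only the Python standard library; `rpc_request`, `stt_params`, and `decode_tts_response` are hypothetical helper names, and standard (non-URL-safe) base64 is assumed for the audio payloads.

```python
import base64
import itertools
import json
from typing import Optional, Tuple

_ids = itertools.count(1)  # simple per-process request id counter


def rpc_request(method: str, params: Optional[dict] = None) -> bytes:
    """Serialize a JSON-RPC 2.0 request for POST /rpc."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return json.dumps(msg).encode("utf-8")


def stt_params(audio: bytes, content_type: Optional[str] = None) -> dict:
    """Params for stt.transcribe; content_type defaults to audio/wav server-side."""
    params = {"audio_base64": base64.b64encode(audio).decode("ascii")}
    if content_type is not None:
        params["content_type"] = content_type
    return params


def decode_tts_response(raw: bytes) -> Tuple[bytes, str]:
    """Return (audio bytes, content type) from a tts.synthesize response."""
    msg = json.loads(raw)
    if "error" in msg:
        err = msg["error"]
        raise RuntimeError(f"JSON-RPC error {err.get('code')}: {err.get('message')}")
    result = msg["result"]
    return base64.b64decode(result["audio_base64"]), result["content_type"]
```

For example, `rpc_request("tts.synthesize", {"input": "Hello world"})` produces the same body as the `curl` call above, apart from the `id` value.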
Models
| ID | Engine |
|---|---|
| `parakeet-v2` | sherpa-onnx Parakeet TDT 0.6B v2 (STT) |
| `kokoro-en-v0_19` | Kokoro TTS |
| `tts` | alias for `kokoro-en-v0_19` |
TTS voices
| Voice | Speaker | Description |
|---|---|---|
| `af` | 0 | American female (default) |
| `af_bella` | 1 | American female, Bella |
| `af_nicole` | 2 | American female, Nicole |
| `af_sarah` | 3 | American female, Sarah |
| `af_sky` | 4 | American female, Sky |
| `am_adam` | 5 | American male, Adam |
| `am_michael` | 6 | American male, Michael |
| `bf_emma` | 7 | British female, Emma |
| `bf_isabella` | 8 | British female, Isabella |
| `bm_george` | 9 | British male, George |
| `bm_lewis` | 10 | British male, Lewis |
Aliases: af_heart → af_bella, default → af.
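The voice table and aliases amount to a two-step lookup: resolve the alias, then map the canonical name to a speaker ID. The sketch below mirrors that in Python for client-side validation; `resolve_voice` is a hypothetical helper, and treating an omitted voice as `default` is an assumption based on `af` being marked as the default.

```python
from typing import Optional, Tuple

# Voice name -> speaker ID, copied from the table above.
VOICES = {
    "af": 0, "af_bella": 1, "af_nicole": 2, "af_sarah": 3, "af_sky": 4,
    "am_adam": 5, "am_michael": 6,
    "bf_emma": 7, "bf_isabella": 8,
    "bm_george": 9, "bm_lewis": 10,
}

# Alias -> canonical voice name, as listed above.
ALIASES = {"af_heart": "af_bella", "default": "af"}


def resolve_voice(name: Optional[str] = None) -> Tuple[str, int]:
    """Return (canonical voice name, speaker ID) for a voice or alias."""
    if name is None:
        name = "default"  # assumption: an omitted voice means the default
    canonical = ALIASES.get(name, name)
    if canonical not in VOICES:
        raise KeyError(f"unknown voice: {name!r}")
    return canonical, VOICES[canonical]
```

Under this mapping, the `af_heart` voice used in the speech request example resolves to speaker 1.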
Model bundles are downloaded automatically on first startup. If a bundle is unavailable, STT degrades gracefully by returning 5xx responses, while TTS fails on the first call.
Health during startup
Model loading can take several minutes on a cold start. A minimal health endpoint is served immediately on startup and reports {"status": "starting"} until models are ready. This prevents hero_proc health probes from flapping during the load window.
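A client that starts alongside the daemon can poll `/health` until the status changes from `"starting"` to `"ok"`. The following is a minimal sketch using the Python standard library; `fetch_health` and `wait_until_ready` are hypothetical helpers, and the timeout and interval defaults are arbitrary choices, not values used by hero_proc.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_health(base_url: str = "http://127.0.0.1:8094", timeout: float = 5.0) -> dict:
    """GET /health and parse the JSON response."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return json.load(resp)


def wait_until_ready(
    fetch: Callable[[], dict] = fetch_health,
    timeout_s: float = 600.0,   # arbitrary: cold starts can take minutes
    interval_s: float = 2.0,    # arbitrary polling interval
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until status is "ok"; return False if the deadline passes first."""
    deadline = clock() + timeout_s
    while True:
        try:
            if fetch().get("status") == "ok":
                return True
        except OSError:
            pass  # daemon not accepting connections yet (URLError is an OSError)
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Passing `fetch`, `sleep`, and `clock` as parameters keeps the loop testable without a running daemon.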
Build
```
cargo build -p hero_voice_provider
```
Cross-compiling to musl targets (Linux x86_64 / aarch64) is supported via the `build.rs` shim, which resolves symbols that are missing from older musl sysroots.