Add support for alternative transcription models #16
Context
Currently transcription is locked to a single model. Users working in different environments — with access to OpenRouter or a locally running Whisper instance — have no way to switch providers. Supporting multiple backends makes the tool usable in air-gapped or cost-sensitive setups.
Goals
Implementation Spec for Issue #16
Objective
Extend `hero_slides` so users can select, configure, and persist the transcription backend used by `voice.transcribe`. Currently the server always calls `AiClient::from_env()` and hardcodes `TranscriptionModel::WhisperLargeV3Turbo` via the Groq provider. This spec adds:
- `user.settings.get` / `user.settings.save` JSON-RPC pair backed by `~/.config/hero_slides/settings.json` on the server.

Requirements
- `~/.config/hero_slides/settings.json` (created on first save).
- `transcription` sub-object with:
  - `backend`: one of `"groq"` | `"openrouter"` | `"local_whisper"` (default `"groq"`).
  - `openrouter_api_key`: optional string.
  - `openrouter_model`: optional string (defaults to `"openai/whisper-1"`).
  - `local_whisper_url`: optional string (base URL, e.g. `"http://localhost:9000"`).
- `user.settings.get` (returns the settings object) and `user.settings.save` (partial update, returns `{saved: true}`).
- `voice.transcribe` reads persisted settings at call time and constructs the correct `AiClient`.
- `dirs` is already present).
- `openrpc.json` is updated with the two new methods.

Files to Modify
| File | Change |
|---|---|
| `crates/hero_slides_lib/src/voice.rs` | `TranscriptionBackend` enum; update `voice_transcribe` to accept optional backend config |
| `crates/hero_slides_server/src/rpc.rs` | `handle_voice_transcribe` |
| `crates/hero_slides_server/openrpc.json` | `user.settings.get` and `user.settings.save` entries |
| `crates/hero_slides_ui/templates/index.html` | |
| `crates/hero_slides_ui/static/js/dashboard.js` | `loadSettings()`, `saveTranscriptionSettings()`, `onTranscriptionBackendChange()` |

Implementation Plan
Step 1: Add `TranscriptionBackend` enum to `voice.rs`
Files: `crates/hero_slides_lib/src/voice.rs`, `crates/hero_slides_lib/src/lib.rs`
- `pub enum TranscriptionBackend { Groq, OpenRouter { api_key, model }, LocalWhisper { base_url } }`
- `voice_transcribe` to accept `backend: Option<TranscriptionBackend>`
- `Groq`/`None`: keep current behaviour
- `OpenRouter` and `LocalWhisper`: build the HTTP multipart POST directly (bypass `AiClient::transcribe_bytes`) to avoid `TranscriptionModel` enum limitations
- `TranscriptionBackend` from `lib.rs`
Dependencies: none
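Step 1 proposes building the multipart POST directly instead of going through `AiClient::transcribe_bytes`. A minimal std-only sketch of that body construction (RFC 7578 `multipart/form-data`) follows. `multipart_body` is a hypothetical helper, not code from the PR; in practice an HTTP client's own multipart support (for example reqwest's `multipart::Form`, if the crate already depends on reqwest) would do the same job. The `model` and `file` field names follow the OpenAI transcription API; the generic `application/octet-stream` part type is an assumption that a given server may or may not accept.

```rust
/// Builds a `multipart/form-data` body with the two fields an OpenAI-style
/// `/audio/transcriptions` endpoint expects: `model` and `file`.
/// Send it with `Content-Type: multipart/form-data; boundary=<boundary>`.
/// The boundary string must not occur anywhere inside `audio`.
fn multipart_body(boundary: &str, model: &str, file_name: &str, audio: &[u8]) -> Vec<u8> {
    let mut body = Vec::with_capacity(audio.len() + 256);

    // Text part: the model name (e.g. "whisper-1").
    body.extend_from_slice(
        format!("--{boundary}\r\nContent-Disposition: form-data; name=\"model\"\r\n\r\n{model}\r\n")
            .as_bytes(),
    );

    // File part: headers, then the raw audio bytes.
    // The generic content type is an assumption; some servers may want a specific audio MIME type.
    body.extend_from_slice(
        format!(
            "--{boundary}\r\nContent-Disposition: form-data; name=\"file\"; filename=\"{file_name}\"\r\nContent-Type: application/octet-stream\r\n\r\n"
        )
        .as_bytes(),
    );
    body.extend_from_slice(audio);

    // Closing delimiter ends the multipart body.
    body.extend_from_slice(format!("\r\n--{boundary}--\r\n").as_bytes());
    body
}

fn main() {
    let body = multipart_body("hero-slides-boundary", "whisper-1", "recording.webm", b"AUDIO");
    print!("{}", String::from_utf8_lossy(&body));
}
```

Writing the body by hand keeps the `model` value fully user-controlled, which is the point of bypassing the `TranscriptionModel` enum.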
Step 2: Add settings persistence and RPC handlers in `rpc.rs`
Files: `crates/hero_slides_server/src/rpc.rs`
- `UserSettings` and `TranscriptionSettings` structs (serde)
- `settings_path()`, `load_user_settings()`, `save_user_settings()` helpers
- `handle_user_settings_get()` and `handle_user_settings_save()` async handlers
- `handle_request` dispatch
Dependencies: none
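The persistence helpers listed in Step 2 could be sketched with the standard library alone. This is an illustration under two assumptions: the home directory is passed in explicitly (the real `settings_path()` takes no arguments and would presumably use the `dirs` crate), and settings are handled as raw JSON text because serde is left out to keep the example dependency-free. `load_settings_text` and `save_settings_text` are hypothetical names. A missing file is reported as `None` so callers can fall back to defaults, and the directory is created on first save, as the requirements state.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// `<home>/.config/hero_slides/settings.json`
fn settings_path(home: &Path) -> PathBuf {
    home.join(".config").join("hero_slides").join("settings.json")
}

/// Reads the settings file; `Ok(None)` means nothing has been saved yet.
fn load_settings_text(home: &Path) -> io::Result<Option<String>> {
    match fs::read_to_string(settings_path(home)) {
        Ok(text) => Ok(Some(text)),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(None),
        Err(e) => Err(e),
    }
}

/// Writes the settings file, creating the parent directory on first save.
fn save_settings_text(home: &Path, json: &str) -> io::Result<()> {
    let path = settings_path(home);
    if let Some(dir) = path.parent() {
        fs::create_dir_all(dir)?;
    }
    fs::write(&path, json)
}

fn main() -> io::Result<()> {
    // A scratch directory stands in for the user's home directory.
    let home = std::env::temp_dir().join("hero_slides_settings_demo");
    let _ = fs::remove_dir_all(&home);

    assert_eq!(load_settings_text(&home)?, None);
    save_settings_text(&home, r#"{"transcription":{"backend":"groq"}}"#)?;
    println!("{:?}", load_settings_text(&home)?);
    Ok(())
}
```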
Step 3: Update `handle_voice_transcribe` to use settings
Files: `crates/hero_slides_server/src/rpc.rs`
- `TranscriptionBackend` from settings
- `hero_slides_lib::voice_transcribe`
Dependencies: Steps 1 and 2
Step 4: Update `openrpc.json`
Files: `crates/hero_slides_server/openrpc.json`
- `user.settings.get` and `user.settings.save` method entries with full schemas
Dependencies: none
Step 5: Add Settings card to `index.html`
Files: `crates/hero_slides_ui/templates/index.html`
- `<div class="admin-section">` inside `#tab-admin`
- `<select>` with options: Groq, OpenRouter, Local Whisper
- `saveTranscriptionSettings()`
Dependencies: none
Step 6: Add JavaScript handlers in `dashboard.js`
Files: `crates/hero_slides_ui/static/js/dashboard.js`
- `loadSettings()`: calls the `user.settings.get` RPC, populates form fields
- `onTranscriptionBackendChange()`: shows/hides conditional fields
- `saveTranscriptionSettings()`: calls the `user.settings.save` RPC, shows a toast
- `loadSettings()` on `DOMContentLoaded`
Dependencies: Steps 2 and 5
Acceptance Criteria
- `user.settings.get` returns `{ transcription: { backend, openrouter_api_key, openrouter_model, local_whisper_url } }`
- `user.settings.save` persists to `~/.config/hero_slides/settings.json` and returns `{ saved: true }`
- `openrpc.json` includes both new method entries

Notes
- The `TranscriptionModel` enum in `herolib_ai` only has Groq-backed variants. For OpenRouter and local Whisper, build the multipart POST directly rather than going through `AiClient::transcribe_bytes`, which cannot express arbitrary model names.
- `AiClient::from_env()`.
- `settings.json`: acceptable for a single-user local server, same as env vars.
- The `dirs` crate is already in both `Cargo.toml` files; no new dependency is needed.

Implementation Spec for Issue #16 (revised)
Objective
Extend `hero_slides` so users can select, configure, and persist the transcription backend used by `voice.transcribe`. This spec adds:
- `user.settings.get` / `user.settings.save` JSON-RPC pair backed by `~/.config/hero_slides/settings.json` on the server.
- `/audio/transcriptions` endpoint (URL + model name). This covers Whisper, Voxtral, and any other compatible server.

Requirements
- `~/.config/hero_slides/settings.json` (created on first save).
- `transcription` sub-object with:
  - `backend`: one of `"groq"` | `"openrouter"` | `"local_model"` (default `"groq"`).
  - `openrouter_api_key`: string (API key for OpenRouter).
  - `openrouter_model`: string (model ID, e.g. `"openai/whisper-1"`; default `"openai/whisper-1"`).
  - `local_model_url`: string (base URL, e.g. `"http://localhost:9000"`).
  - `local_model_name`: string (model name sent in the multipart request, e.g. `"whisper-1"`, `"voxtral-1"`; default `"whisper-1"`).
- `user.settings.get` and `user.settings.save`.
- `voice.transcribe` reads persisted settings at call time and builds the correct HTTP request.
- `dirs` is already present).
- `openrpc.json` is updated with the two new methods.

Files to Modify
| File | Change |
|---|---|
| `crates/hero_slides_lib/src/voice.rs` | `TranscriptionBackend` enum; update `voice_transcribe` to accept optional backend config |
| `crates/hero_slides_server/src/rpc.rs` | `handle_voice_transcribe` |
| `crates/hero_slides_server/openrpc.json` | `user.settings.get` and `user.settings.save` entries |
| `crates/hero_slides_ui/templates/index.html` | |
| `crates/hero_slides_ui/static/js/dashboard.js` | `loadSettings()`, `saveTranscriptionSettings()`, `onTranscriptionBackendChange()` |

Implementation Plan
Step 1: Add `TranscriptionBackend` enum to `voice.rs`
Files: `crates/hero_slides_lib/src/voice.rs`, `crates/hero_slides_lib/src/lib.rs`
- `voice_transcribe` to accept `backend: Option<TranscriptionBackend>`
- `Groq`/`None`: keep current behaviour (use `AiClient::from_env()`)
- `OpenRouter` and `LocalModel`: build the multipart POST directly to avoid `TranscriptionModel` enum limitations; POST to `{base_url}/audio/transcriptions` with the `model` field set to the user-supplied model name
- `TranscriptionBackend` from `lib.rs`
Dependencies: none
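A sketch of the revised enum and the `{base_url}/audio/transcriptions` target, assuming the variant fields listed in the implementation summary (`OpenRouter { api_key, model }`, `LocalModel { base_url, model_name }`). `transcriptions_url` is a hypothetical helper that tolerates a trailing slash on the user-supplied URL. Reading the spec literally, any version prefix such as `/v1` would have to be part of the configured base URL; that interpretation is an assumption.

```rust
/// Transcription backend chosen in the user settings.
#[allow(dead_code)] // the demo below constructs only one variant
#[derive(Debug, Clone, PartialEq)]
pub enum TranscriptionBackend {
    /// Current behaviour: `AiClient::from_env()` with the Groq Whisper model.
    Groq,
    OpenRouter { api_key: String, model: String },
    /// Any OpenAI-compatible server; `model_name` is sent as the `model` field.
    LocalModel { base_url: String, model_name: String },
}

/// Appends the OpenAI-compatible path to a base URL, avoiding a double slash.
fn transcriptions_url(base_url: &str) -> String {
    format!("{}/audio/transcriptions", base_url.trim_end_matches('/'))
}

fn main() {
    let backend = TranscriptionBackend::LocalModel {
        base_url: "http://localhost:9000/".to_string(),
        model_name: "whisper-1".to_string(),
    };
    if let TranscriptionBackend::LocalModel { base_url, model_name } = &backend {
        // Prints: POST http://localhost:9000/audio/transcriptions (model=whisper-1)
        println!("POST {} (model={})", transcriptions_url(base_url), model_name);
    }
}
```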
Step 2: Add settings persistence and RPC handlers in `rpc.rs`
Files: `crates/hero_slides_server/src/rpc.rs`
- `TranscriptionSettings` struct:
- `UserSettings { transcription: TranscriptionSettings }` struct
- `settings_path()`, `load_user_settings()`, `save_user_settings()` helpers
- `handle_user_settings_get()` and `handle_user_settings_save()` handlers
- `handle_request` dispatch
Dependencies: none
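The settings structs in Step 2 might look like the sketch below, using the defaults from the revised requirements. The empty default for `local_model_url` and the `TranscriptionPatch` type used for the partial update are assumptions, and the serde derives the real structs would carry (`Serialize`, `Deserialize`, `#[serde(default)]`) are omitted so the example stays std-only.

```rust
/// Persisted transcription settings (revised field set).
#[derive(Debug, Clone, PartialEq)]
struct TranscriptionSettings {
    backend: String,
    openrouter_api_key: String,
    openrouter_model: String,
    local_model_url: String,
    local_model_name: String,
}

impl Default for TranscriptionSettings {
    fn default() -> Self {
        Self {
            backend: "groq".to_string(),
            openrouter_api_key: String::new(),
            openrouter_model: "openai/whisper-1".to_string(),
            // The spec gives no default URL; empty means "not configured".
            local_model_url: String::new(),
            local_model_name: "whisper-1".to_string(),
        }
    }
}

/// Top-level file shape: `{ "transcription": { ... } }`.
#[derive(Debug, Clone, Default, PartialEq)]
struct UserSettings {
    transcription: TranscriptionSettings,
}

/// Hypothetical partial update for `user.settings.save`:
/// `None` leaves the stored value unchanged.
#[derive(Debug, Default)]
struct TranscriptionPatch {
    backend: Option<String>,
    openrouter_api_key: Option<String>,
    openrouter_model: Option<String>,
    local_model_url: Option<String>,
    local_model_name: Option<String>,
}

impl UserSettings {
    /// Overwrites only the fields present in `patch`.
    fn apply(&mut self, patch: TranscriptionPatch) {
        let t = &mut self.transcription;
        if let Some(v) = patch.backend {
            t.backend = v;
        }
        if let Some(v) = patch.openrouter_api_key {
            t.openrouter_api_key = v;
        }
        if let Some(v) = patch.openrouter_model {
            t.openrouter_model = v;
        }
        if let Some(v) = patch.local_model_url {
            t.local_model_url = v;
        }
        if let Some(v) = patch.local_model_name {
            t.local_model_name = v;
        }
    }
}

fn main() {
    let mut settings = UserSettings::default();
    settings.apply(TranscriptionPatch {
        backend: Some("local_model".to_string()),
        local_model_url: Some("http://localhost:9000".to_string()),
        ..Default::default()
    });
    // Fields absent from the patch keep their defaults.
    assert_eq!(settings.transcription.local_model_name, "whisper-1");
    println!("{settings:#?}");
}
```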
Step 3: Update `handle_voice_transcribe` to use settings
Files: `crates/hero_slides_server/src/rpc.rs`
- `TranscriptionBackend` from settings (`backend` field)
- `hero_slides_lib::voice_transcribe`
Dependencies: Steps 1 and 2
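Constructing the backend from the stored `backend` field could be a single `match`, sketched below with local copies of the types so the block compiles on its own. `resolve_backend` is a hypothetical name. Returning `None` for an unknown value or an incomplete configuration (no OpenRouter key, no local URL) lets the caller fall through to the existing Groq path, which Step 1 treats the same as `Groq`; whether the server should instead surface a JSON-RPC error is a design choice the spec leaves open.

```rust
/// Mirrors the enum from Step 1.
#[derive(Debug, Clone, PartialEq)]
enum TranscriptionBackend {
    Groq,
    OpenRouter { api_key: String, model: String },
    LocalModel { base_url: String, model_name: String },
}

/// Mirrors the persisted transcription settings.
struct TranscriptionSettings {
    backend: String,
    openrouter_api_key: String,
    openrouter_model: String,
    local_model_url: String,
    local_model_name: String,
}

/// Maps the stored `backend` string to a backend value.
/// `None` means "unknown or not fully configured"; the caller then uses Groq.
fn resolve_backend(s: &TranscriptionSettings) -> Option<TranscriptionBackend> {
    match s.backend.as_str() {
        "groq" => Some(TranscriptionBackend::Groq),
        "openrouter" if !s.openrouter_api_key.is_empty() => Some(TranscriptionBackend::OpenRouter {
            api_key: s.openrouter_api_key.clone(),
            model: s.openrouter_model.clone(),
        }),
        "local_model" if !s.local_model_url.is_empty() => Some(TranscriptionBackend::LocalModel {
            base_url: s.local_model_url.clone(),
            model_name: s.local_model_name.clone(),
        }),
        _ => None,
    }
}

fn main() {
    let settings = TranscriptionSettings {
        backend: "local_model".to_string(),
        openrouter_api_key: String::new(),
        openrouter_model: "openai/whisper-1".to_string(),
        local_model_url: "http://localhost:9000".to_string(),
        local_model_name: "voxtral-1".to_string(),
    };
    println!("{:?}", resolve_backend(&settings));
}
```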
Step 4: Update `openrpc.json`
Files: `crates/hero_slides_server/openrpc.json`
- `user.settings.get` and `user.settings.save` method entries including `local_model_url` and `local_model_name` fields
Dependencies: none
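The two methods documented in `openrpc.json` also need routing in `handle_request`. A minimal sketch of that name-to-handler mapping follows; the `Method` enum and `route` function are hypothetical, and only the `-32601` "Method not found" code comes from the JSON-RPC 2.0 specification.

```rust
/// JSON-RPC 2.0 error code for "Method not found".
const METHOD_NOT_FOUND: i64 = -32601;

/// Methods touched by this issue; the real dispatcher handles many more.
#[derive(Debug, PartialEq)]
enum Method {
    UserSettingsGet,
    UserSettingsSave,
    VoiceTranscribe,
}

/// Maps a JSON-RPC method name to a handler, or to the standard error code.
fn route(method: &str) -> Result<Method, i64> {
    match method {
        "user.settings.get" => Ok(Method::UserSettingsGet),
        "user.settings.save" => Ok(Method::UserSettingsSave),
        "voice.transcribe" => Ok(Method::VoiceTranscribe),
        _ => Err(METHOD_NOT_FOUND),
    }
}

fn main() {
    // "user.settings.delete" is a made-up name used to show the error path.
    for name in ["user.settings.get", "user.settings.save", "user.settings.delete"] {
        println!("{name} -> {:?}", route(name));
    }
}
```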
Step 5: Add Settings card to `index.html`
Files: `crates/hero_slides_ui/templates/index.html`
- `<div class="admin-section">` inside `#tab-admin`
- `<select>`: Groq (default), OpenRouter, Local Model (OpenAI-compatible)
- `openai/whisper-1`)
- `http://localhost:9000`), model name input (placeholder: `whisper-1, voxtral-1, ...`)
- `saveTranscriptionSettings()`
Dependencies: none
Step 6: Add JavaScript handlers in `dashboard.js`
Files: `crates/hero_slides_ui/static/js/dashboard.js`
- `loadSettings()`: calls `user.settings.get`, populates all form fields
- `onTranscriptionBackendChange()`: shows/hides OpenRouter or Local Model fields
- `saveTranscriptionSettings()`: calls `user.settings.save` with all fields, including `local_model_url` and `local_model_name`
- `loadSettings()` on `DOMContentLoaded`
Dependencies: Steps 2 and 5
Acceptance Criteria
- `user.settings.get` returns `{ transcription: { backend, openrouter_api_key, openrouter_model, local_model_url, local_model_name } }`
- `user.settings.save` persists to `~/.config/hero_slides/settings.json` and returns `{ saved: true }`
- `openrpc.json` includes both new method entries with `local_model_url` and `local_model_name`

Notes
- The `local_model_name` field is sent as the `model` parameter in the multipart POST to `/audio/transcriptions`. Any OpenAI-compatible server (Whisper.cpp, Voxtral, Faster-Whisper, etc.) reads this field; setting it correctly is the user's responsibility.
- The `TranscriptionModel` enum in `herolib_ai` only covers Groq variants. OpenRouter and local model backends bypass `AiClient::transcribe_bytes` and build the multipart POST directly.

Test Results
Breakdown by crate
All tests passed. Build completed in 19.01s. Two warnings (unused imports/variables) noted but no errors.
Implementation Summary
All changes have been implemented across 5 files.
Changes Made
`crates/hero_slides_lib/src/voice.rs`
- `TranscriptionBackend` enum with three variants: `Groq`, `OpenRouter { api_key, model }`, and `LocalModel { base_url, model_name }`
- `voice_transcribe` to accept `backend: Option<TranscriptionBackend>` as a fourth parameter
- `None`/`Groq`: unchanged behaviour (uses `AiClient::from_env()` + `WhisperLargeV3Turbo`)
- `OpenRouter` and `LocalModel`: build the multipart POST directly to the `/audio/transcriptions` endpoint, bypassing `AiClient::transcribe_bytes` to allow arbitrary model names (e.g. `voxtral-1`)

`crates/hero_slides_lib/src/lib.rs`
- `TranscriptionBackend`

`crates/hero_slides_server/src/rpc.rs`
- `UserSettings` and `TranscriptionSettings` structs with serde derive and defaults
- `settings_path()`, `load_user_settings()`, `save_user_settings()` helpers (backed by `~/.config/hero_slides/settings.json`)
- `handle_user_settings_get()` and `handle_user_settings_save()` async handlers
- `handle_request` dispatch
- `handle_voice_transcribe` to load settings, resolve the correct backend, and pass it to `voice_transcribe`

`crates/hero_slides_server/openrpc.json`
- `user.settings.get` and `user.settings.save` method entries with full schemas, including `local_model_url` and `local_model_name` fields

`crates/hero_slides_ui/templates/index.html`

`crates/hero_slides_ui/static/js/dashboard.js`
- `loadSettings()`: populates the settings form from `user.settings.get` on page load
- `onTranscriptionBackendChange()`: shows/hides conditional fields based on the selected backend
- `saveTranscriptionSettings()`: calls `user.settings.save` and shows a success/error toast

Test Results
Pull request opened: #18