lhumina_code/hero_slides

Add support for alternative transcription models #16

Closed

opened 2026-04-14 10:01:35 +00:00 by casper-stevens · 5 comments

casper-stevens commented

2026-04-14 10:01:35 +00:00

Member

Context

Currently transcription is locked to a single model. Users working in different environments — with access to OpenRouter or a locally running Whisper instance — have no way to switch providers. Supporting multiple backends makes the tool usable in air-gapped or cost-sensitive setups.

Goals

Add a transcription model selector in the settings UI
Support OpenRouter as a transcription backend (configurable API key and model name)
Support a local Whisper model as a transcription backend (configurable endpoint URL)
Persist the selected backend and its configuration alongside other user settings
Fail gracefully with a clear error message when the selected backend is unreachable

casper-stevens commented

2026-04-14 10:11:08 +00:00

Author

Member

Implementation Spec for Issue #16

Objective

Extend hero_slides so users can select, configure, and persist the transcription backend used by voice.transcribe. Currently the server always calls AiClient::from_env() and hardcodes TranscriptionModel::WhisperLargeV3Turbo via the Groq provider. This spec adds:

A user.settings.get / user.settings.save JSON-RPC pair backed by ~/.config/hero_slides/settings.json on the server.
A transcription backend selector in the Settings UI (a new card in the Admin tab).
Support for OpenRouter as a transcription backend (API key + model name).
Support for a local Whisper backend (OpenAI-compatible endpoint URL).
Graceful failure with a clear error message when the selected backend is unreachable.

Requirements

The server persists user settings in ~/.config/hero_slides/settings.json (created on first save).
The settings object includes a transcription sub-object with:
- backend: one of "groq" | "openrouter" | "local_whisper" (default "groq").
- openrouter_api_key: optional string.
- openrouter_model: optional string (defaults to "openai/whisper-1").
- local_whisper_url: optional string (base URL, e.g. "http://localhost:9000").
Two new RPC methods: user.settings.get (returns settings object) and user.settings.save (partial update, returns {saved: true}).
voice.transcribe reads persisted settings at call time and selects the matching backend: AiClient for Groq, a direct HTTP request for OpenRouter and local Whisper.
If the selected backend is unreachable, the error message names the backend and gives actionable guidance.
The Admin tab gains a "Transcription Settings" card to read and write these settings.
No new external Rust crates required (dirs is already present).
openrpc.json is updated with the two new methods.
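
The "partial update" behaviour of user.settings.save could look roughly like the sketch below. This is an illustration only: the `TranscriptionPatch` type and `apply_patch` function are hypothetical names, not taken from the codebase, and the real handler presumably works on JSON via serde rather than on plain structs.

```rust
/// Persisted transcription settings, using the field names from the requirements.
#[derive(Debug, Clone, PartialEq)]
pub struct TranscriptionSettings {
    pub backend: String,
    pub openrouter_api_key: Option<String>,
    pub openrouter_model: Option<String>,
    pub local_whisper_url: Option<String>,
}

/// Hypothetical patch type: `None` means "not supplied, keep the stored value".
#[derive(Debug, Default)]
pub struct TranscriptionPatch {
    pub backend: Option<String>,
    pub openrouter_api_key: Option<String>,
    pub openrouter_model: Option<String>,
    pub local_whisper_url: Option<String>,
}

/// Overwrite only the fields that are present in the patch.
pub fn apply_patch(current: &mut TranscriptionSettings, patch: TranscriptionPatch) {
    if let Some(backend) = patch.backend {
        current.backend = backend;
    }
    if patch.openrouter_api_key.is_some() {
        current.openrouter_api_key = patch.openrouter_api_key;
    }
    if patch.openrouter_model.is_some() {
        current.openrouter_model = patch.openrouter_model;
    }
    if patch.local_whisper_url.is_some() {
        current.local_whisper_url = patch.local_whisper_url;
    }
}
```

With this shape, a save that only switches the backend leaves a previously stored OpenRouter key untouched.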

Files to Modify

File	Action	Description
`crates/hero_slides_lib/src/voice.rs`	Modify	Add `TranscriptionBackend` enum; update `voice_transcribe` to accept optional backend config
`crates/hero_slides_server/src/rpc.rs`	Modify	Add settings structs, persistence helpers, two RPC handlers, update `handle_voice_transcribe`
`crates/hero_slides_server/openrpc.json`	Modify	Add `user.settings.get` and `user.settings.save` entries
`crates/hero_slides_ui/templates/index.html`	Modify	Add Transcription Settings card in Admin tab
`crates/hero_slides_ui/static/js/dashboard.js`	Modify	Add `loadSettings()`, `saveTranscriptionSettings()`, `onTranscriptionBackendChange()`

Implementation Plan

Step 1: Add `TranscriptionBackend` enum to `voice.rs`

Files: crates/hero_slides_lib/src/voice.rs, crates/hero_slides_lib/src/lib.rs

Add pub enum TranscriptionBackend { Groq, OpenRouter { api_key, model }, LocalWhisper { base_url } }
Update voice_transcribe to accept backend: Option<TranscriptionBackend>
For Groq/None: keep current behaviour
For OpenRouter and LocalWhisper: build the HTTP multipart POST directly (bypass AiClient::transcribe_bytes) to avoid TranscriptionModel enum limitations
Re-export TranscriptionBackend from lib.rs
Dependencies: none

Step 2: Add settings persistence and RPC handlers in `rpc.rs`

Files: crates/hero_slides_server/src/rpc.rs

Add UserSettings and TranscriptionSettings structs (serde)
Add settings_path(), load_user_settings(), save_user_settings() helpers
Add handle_user_settings_get() and handle_user_settings_save() async handlers
Wire into handle_request dispatch
Dependencies: none

Step 3: Update `handle_voice_transcribe` to use settings

Files: crates/hero_slides_server/src/rpc.rs

Load settings at start of handler
Resolve TranscriptionBackend from settings
Pass backend into updated hero_slides_lib::voice_transcribe
Map errors to include backend label
Dependencies: Steps 1 and 2

Step 4: Update `openrpc.json`

Files: crates/hero_slides_server/openrpc.json

Append user.settings.get and user.settings.save method entries with full schemas
Dependencies: none

Step 5: Add Settings card to `index.html`

Files: crates/hero_slides_ui/templates/index.html

Add a "Transcription Settings" <div class="admin-section"> inside #tab-admin
Backend <select> with options: Groq, OpenRouter, Local Whisper
Conditional fields for OpenRouter (API key, model) and Local Whisper (URL)
Save button calling saveTranscriptionSettings()
Dependencies: none

Step 6: Add JavaScript handlers in `dashboard.js`

Files: crates/hero_slides_ui/static/js/dashboard.js

loadSettings(): calls user.settings.get RPC, populates form fields
onTranscriptionBackendChange(): shows/hides conditional fields
saveTranscriptionSettings(): calls user.settings.save RPC, shows toast
Call loadSettings() on DOMContentLoaded
Dependencies: Steps 2 and 5

Acceptance Criteria

user.settings.get returns { transcription: { backend, openrouter_api_key, openrouter_model, local_whisper_url } }
user.settings.save persists to ~/.config/hero_slides/settings.json and returns { saved: true }
Settings survive a server restart
Groq backend works identically to current implementation
OpenRouter backend successfully transcribes with a valid API key
Local Whisper backend works against a running local server
Unreachable backend returns a named, actionable error message
Admin tab shows the Transcription Settings card with conditional fields
Page load pre-populates form with persisted settings
openrpc.json includes both new method entries

Notes

TranscriptionModel enum in herolib_ai only has Groq-backed variants. For OpenRouter and local Whisper, build the multipart POST directly rather than going through AiClient::transcribe_bytes, which cannot express arbitrary model names.
The Mercury2 cleanup step (chat completion after transcription) is not affected — it continues to use AiClient::from_env().
The OpenRouter API key is stored in plaintext in settings.json — acceptable for a single-user local server, same as env vars.
dirs crate is already in both Cargo.toml files; no new dependency needed.

casper-stevens commented

2026-04-14 10:31:27 +00:00

Author

Member

Implementation Spec for Issue #16 (revised)

Objective

Extend hero_slides so users can select, configure, and persist the transcription backend used by voice.transcribe. This spec adds:

A user.settings.get / user.settings.save JSON-RPC pair backed by ~/.config/hero_slides/settings.json on the server.
A transcription backend selector in the Settings UI (a new card in the Admin tab).
Support for OpenRouter as a transcription backend (API key + model name).
Support for any local model with an OpenAI-compatible /audio/transcriptions endpoint (URL + model name). This covers Whisper, Voxtral, and any other compatible server.
Graceful failure with a clear error message when the selected backend is unreachable.

Requirements

The server persists user settings in ~/.config/hero_slides/settings.json (created on first save).
The settings object includes a transcription sub-object with:
- backend: one of "groq" | "openrouter" | "local_model" (default "groq").
- openrouter_api_key: string (API key for OpenRouter).
- openrouter_model: string (model ID, e.g. "openai/whisper-1"; default "openai/whisper-1").
- local_model_url: string (base URL, e.g. "http://localhost:9000").
- local_model_name: string (model name sent in the multipart request, e.g. "whisper-1", "voxtral-1"; default "whisper-1").
Two new RPC methods: user.settings.get and user.settings.save.
voice.transcribe reads persisted settings at call time and builds the correct HTTP request.
Unreachable backend returns an error naming the backend with actionable guidance.
The Admin tab gains a "Transcription Settings" card.
No new external Rust crates required (dirs is already present).
openrpc.json is updated with the two new methods.
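
A minimal sketch of the defaults listed above, assuming the struct is filled in with these values when no settings file exists yet. The empty-string defaults for `openrouter_api_key` and `local_model_url` are an assumption; the requirements only state defaults for `backend`, `openrouter_model`, and `local_model_name`.

```rust
/// Settings shape from the revised requirements (serde derives omitted here
/// so the sketch compiles without external crates).
#[derive(Debug, Clone, PartialEq)]
pub struct TranscriptionSettings {
    pub backend: String,
    pub openrouter_api_key: String,
    pub openrouter_model: String,
    pub local_model_url: String,
    pub local_model_name: String,
}

impl Default for TranscriptionSettings {
    fn default() -> Self {
        Self {
            backend: "groq".to_string(),
            // Assumption: no key or URL until the user saves one.
            openrouter_api_key: String::new(),
            openrouter_model: "openai/whisper-1".to_string(),
            local_model_url: String::new(),
            local_model_name: "whisper-1".to_string(),
        }
    }
}
```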

Files to Modify

File	Action	Description
`crates/hero_slides_lib/src/voice.rs`	Modify	Add `TranscriptionBackend` enum; update `voice_transcribe` to accept optional backend config
`crates/hero_slides_server/src/rpc.rs`	Modify	Add settings structs, persistence helpers, two RPC handlers, update `handle_voice_transcribe`
`crates/hero_slides_server/openrpc.json`	Modify	Add `user.settings.get` and `user.settings.save` entries
`crates/hero_slides_ui/templates/index.html`	Modify	Add Transcription Settings card in Admin tab
`crates/hero_slides_ui/static/js/dashboard.js`	Modify	Add `loadSettings()`, `saveTranscriptionSettings()`, `onTranscriptionBackendChange()`

Implementation Plan

Step 1: Add `TranscriptionBackend` enum to `voice.rs`

Files: crates/hero_slides_lib/src/voice.rs, crates/hero_slides_lib/src/lib.rs

Add:

pub enum TranscriptionBackend {
    Groq,
    OpenRouter { api_key: String, model: String },
    LocalModel { base_url: String, model_name: String },
}

Update voice_transcribe to accept backend: Option<TranscriptionBackend>
For Groq/None: keep current behaviour (use AiClient::from_env())
For OpenRouter and LocalModel: build the multipart POST directly to avoid TranscriptionModel enum limitations. POST to the backend's /audio/transcriptions endpoint ({base_url}/audio/transcriptions for LocalModel) with the model field set to the user-supplied model name
Re-export TranscriptionBackend from lib.rs
Dependencies: none
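
To make the endpoint and model-field handling in Step 1 concrete, here is a small sketch. The enum matches the spec; `transcription_endpoint` and `model_field` are hypothetical helper names, and trimming a trailing slash from the base URL is an assumption about how user input should be tolerated.

```rust
/// Backend variants as listed in Step 1.
#[derive(Debug, Clone, PartialEq)]
pub enum TranscriptionBackend {
    Groq,
    OpenRouter { api_key: String, model: String },
    LocalModel { base_url: String, model_name: String },
}

/// Hypothetical helper: join a base URL and the transcription path,
/// tolerating a trailing slash on the base URL.
pub fn transcription_endpoint(base_url: &str) -> String {
    format!("{}/audio/transcriptions", base_url.trim_end_matches('/'))
}

/// Hypothetical helper: the value for the multipart `model` field.
/// Returns `None` for Groq, which keeps going through `AiClient`.
pub fn model_field(backend: &TranscriptionBackend) -> Option<&str> {
    match backend {
        TranscriptionBackend::Groq => None,
        TranscriptionBackend::OpenRouter { model, .. } => Some(model.as_str()),
        TranscriptionBackend::LocalModel { model_name, .. } => Some(model_name.as_str()),
    }
}
```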

Step 2: Add settings persistence and RPC handlers in `rpc.rs`

Files: crates/hero_slides_server/src/rpc.rs

Add TranscriptionSettings struct:

pub struct TranscriptionSettings {
    pub backend: String,           // "groq" | "openrouter" | "local_model"
    pub openrouter_api_key: String,
    pub openrouter_model: String,  // default "openai/whisper-1"
    pub local_model_url: String,   // e.g. "http://localhost:9000"
    pub local_model_name: String,  // e.g. "whisper-1", "voxtral-1"
}

Add UserSettings { transcription: TranscriptionSettings } struct
Add settings_path(), load_user_settings(), save_user_settings() helpers
Add handle_user_settings_get() and handle_user_settings_save() handlers
Wire into handle_request dispatch
Dependencies: none
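
The persistence helpers might be shaped like this. It is a sketch under assumptions: the config directory is passed in by the caller (the real code presumably obtains it from the dirs crate), `write_settings_file` is a hypothetical name, and JSON serialization is left out so the example needs only the standard library.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Build `<config_dir>/hero_slides/settings.json`.
pub fn settings_path(config_dir: &Path) -> PathBuf {
    config_dir.join("hero_slides").join("settings.json")
}

/// Hypothetical helper: write already-serialized JSON, creating the
/// parent directory if needed ("created on first save").
pub fn write_settings_file(path: &Path, json: &str) -> io::Result<()> {
    if let Some(parent) = path.parent() {
        fs::create_dir_all(parent)?;
    }
    fs::write(path, json)
}
```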

Step 3: Update `handle_voice_transcribe` to use settings

Files: crates/hero_slides_server/src/rpc.rs

Load settings at start of handler
Resolve TranscriptionBackend from settings (backend field)
Pass into updated hero_slides_lib::voice_transcribe
Map errors to include backend label
Dependencies: Steps 1 and 2
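
Resolving the backend from the stored `backend` string could be sketched as follows. The function name `resolve_backend`, the empty-field checks, and the error wording are assumptions for illustration; the spec only says the handler resolves a TranscriptionBackend from settings.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum TranscriptionBackend {
    Groq,
    OpenRouter { api_key: String, model: String },
    LocalModel { base_url: String, model_name: String },
}

pub struct TranscriptionSettings {
    pub backend: String,
    pub openrouter_api_key: String,
    pub openrouter_model: String,
    pub local_model_url: String,
    pub local_model_name: String,
}

/// Map the persisted settings to a backend, rejecting obviously
/// incomplete configurations before any request is made.
pub fn resolve_backend(s: &TranscriptionSettings) -> Result<TranscriptionBackend, String> {
    match s.backend.as_str() {
        "groq" => Ok(TranscriptionBackend::Groq),
        "openrouter" => {
            if s.openrouter_api_key.trim().is_empty() {
                return Err("openrouter: no API key configured".to_string());
            }
            Ok(TranscriptionBackend::OpenRouter {
                api_key: s.openrouter_api_key.clone(),
                model: s.openrouter_model.clone(),
            })
        }
        "local_model" => {
            if s.local_model_url.trim().is_empty() {
                return Err("local_model: no endpoint URL configured".to_string());
            }
            Ok(TranscriptionBackend::LocalModel {
                base_url: s.local_model_url.clone(),
                model_name: s.local_model_name.clone(),
            })
        }
        other => Err(format!("unknown transcription backend \"{other}\"")),
    }
}
```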

Step 4: Update `openrpc.json`

Files: crates/hero_slides_server/openrpc.json

Append user.settings.get and user.settings.save method entries including local_model_url and local_model_name fields
Dependencies: none

Step 5: Add Settings card to `index.html`

Files: crates/hero_slides_ui/templates/index.html

Add "Transcription Settings" <div class="admin-section"> inside #tab-admin
Backend <select>: Groq (default), OpenRouter, Local Model (OpenAI-compatible)
OpenRouter fields: API key input, model input (placeholder: openai/whisper-1)
Local Model fields: URL input (placeholder: http://localhost:9000), model name input (placeholder: whisper-1, voxtral-1, ...)
Save button calling saveTranscriptionSettings()
Dependencies: none

Step 6: Add JavaScript handlers in `dashboard.js`

Files: crates/hero_slides_ui/static/js/dashboard.js

loadSettings(): calls user.settings.get, populates all form fields
onTranscriptionBackendChange(): shows/hides OpenRouter or Local Model fields
saveTranscriptionSettings(): calls user.settings.save with all fields including local_model_url and local_model_name
Call loadSettings() on DOMContentLoaded
Dependencies: Steps 2 and 5

Acceptance Criteria

user.settings.get returns { transcription: { backend, openrouter_api_key, openrouter_model, local_model_url, local_model_name } }
user.settings.save persists to ~/.config/hero_slides/settings.json and returns { saved: true }
Settings survive a server restart
Groq backend works identically to current implementation
OpenRouter backend transcribes successfully with a valid API key and model
Local Model backend works against any OpenAI-compatible transcription server (tested with Whisper and/or Voxtral)
Unreachable backend returns a named, actionable error
Admin tab shows the Transcription Settings card with correct conditional fields
Local Model fields show URL and model name inputs with descriptive placeholders
Page load pre-populates form with persisted settings
openrpc.json includes both new method entries with local_model_url and local_model_name
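
For the "named, actionable error" criterion, one possible shape is shown below. The helper names `backend_label` and `unreachable_error` and the message wording are hypothetical; the thread does not show the actual text returned by the server.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum TranscriptionBackend {
    Groq,
    OpenRouter { api_key: String, model: String },
    LocalModel { base_url: String, model_name: String },
}

/// Human-readable backend name used in error messages.
pub fn backend_label(backend: &TranscriptionBackend) -> &'static str {
    match backend {
        TranscriptionBackend::Groq => "Groq",
        TranscriptionBackend::OpenRouter { .. } => "OpenRouter",
        TranscriptionBackend::LocalModel { .. } => "Local Model",
    }
}

/// Combine the backend name, the underlying cause, and a hint on what to check.
pub fn unreachable_error(backend: &TranscriptionBackend, cause: &str) -> String {
    let hint = match backend {
        TranscriptionBackend::Groq => {
            "check the Groq credentials in the server environment".to_string()
        }
        TranscriptionBackend::OpenRouter { .. } => {
            "check the API key and model in Transcription Settings".to_string()
        }
        TranscriptionBackend::LocalModel { base_url, .. } => {
            format!("check that a server is running at {base_url}")
        }
    };
    format!(
        "{} transcription backend unreachable ({cause}); {hint}",
        backend_label(backend)
    )
}
```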

Notes

The local_model_name field is sent as the model parameter in the multipart POST to /audio/transcriptions. Any OpenAI-compatible server (Whisper.cpp, Voxtral, Faster-Whisper, etc.) reads this field — setting it correctly is the user's responsibility.
TranscriptionModel enum in herolib_ai only covers Groq variants. OpenRouter and local model backends bypass AiClient::transcribe_bytes and build the multipart POST directly.
Mercury2 cleanup (chat completion after transcription) is not affected by this change.
The OpenRouter API key is stored in plaintext — acceptable for a single-user local server.
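
To illustrate what "build the multipart POST directly" involves, here is a std-only sketch of a multipart/form-data body with a `model` field and a `file` part, the two fields an OpenAI-compatible transcription endpoint expects. The function name and the fixed `application/octet-stream` content type are assumptions; the actual implementation may well use an HTTP client's built-in multipart support instead of assembling bytes by hand.

```rust
/// Assemble a multipart/form-data body by hand.
/// The request must also carry the header
/// `Content-Type: multipart/form-data; boundary=<boundary>`.
pub fn build_multipart_body(
    boundary: &str,
    model: &str,
    filename: &str,
    audio: &[u8],
) -> Vec<u8> {
    let mut body = Vec::new();
    // Text part: the user-supplied model name.
    body.extend_from_slice(
        format!(
            "--{boundary}\r\nContent-Disposition: form-data; name=\"model\"\r\n\r\n{model}\r\n"
        )
        .as_bytes(),
    );
    // Binary part: headers first, then the raw audio bytes.
    body.extend_from_slice(
        format!(
            "--{boundary}\r\nContent-Disposition: form-data; name=\"file\"; filename=\"{filename}\"\r\nContent-Type: application/octet-stream\r\n\r\n"
        )
        .as_bytes(),
    );
    body.extend_from_slice(audio);
    // Closing delimiter.
    body.extend_from_slice(format!("\r\n--{boundary}--\r\n").as_bytes());
    body
}
```

The boundary string must not occur inside the audio payload, so a long random value is the usual choice.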

casper-stevens commented

2026-04-14 10:41:36 +00:00

Author

Member

Test Results

Result: PASS
Total: 75
Passed: 74
Failed: 0
Ignored: 1 (test_generate_single_slide_ai - skipped, requires AI service)

Breakdown by crate

Crate	Tests	Passed	Ignored
hero_slides (bin)	0	0	0
hero_slides_lib (unit)	47	47	0
hero_slides_lib (integration)	13	12	1
hero_slides_rhai	0	0	0
hero_slides_sdk	0	0	0
hero_slides_server	0	0	0
hero_slides_ui	0	0	0
doc-tests hero_slides_lib	2	2	0
doc-tests hero_slides_rhai	1	1	0
doc-tests hero_slides_sdk	0	0	0

All executed tests passed; one was ignored. Build completed in 19.01s. Two warnings (unused imports/variables) noted but no errors.

casper-stevens commented

2026-04-14 10:42:12 +00:00

Author

Member

Implementation Summary

All changes have been implemented across 6 files.

Changes Made

crates/hero_slides_lib/src/voice.rs

Added TranscriptionBackend enum with three variants: Groq, OpenRouter { api_key, model }, and LocalModel { base_url, model_name }
Updated voice_transcribe to accept backend: Option<TranscriptionBackend> as a fourth parameter
None / Groq: unchanged behaviour (uses AiClient::from_env() + WhisperLargeV3Turbo)
OpenRouter and LocalModel: build multipart POST directly to /audio/transcriptions endpoint, bypassing AiClient::transcribe_bytes to allow arbitrary model names (e.g. voxtral-1)
Error messages include the backend label for actionable diagnostics

crates/hero_slides_lib/src/lib.rs

Re-exported TranscriptionBackend

crates/hero_slides_server/src/rpc.rs

Added UserSettings and TranscriptionSettings structs with serde derive and defaults
Added settings_path(), load_user_settings(), save_user_settings() helpers (backed by ~/.config/hero_slides/settings.json)
Added handle_user_settings_get() and handle_user_settings_save() async handlers
Wired both handlers into the handle_request dispatch
Updated handle_voice_transcribe to load settings, resolve the correct backend, and pass it to voice_transcribe

crates/hero_slides_server/openrpc.json

Added user.settings.get and user.settings.save method entries with full schemas including local_model_url and local_model_name fields

crates/hero_slides_ui/templates/index.html

Added "Transcription Settings" admin section in the Admin tab
Backend selector: Groq (default), OpenRouter, Local Model (OpenAI-compatible)
Conditional fields: OpenRouter shows API key + model inputs; Local Model shows endpoint URL + model name inputs with descriptive placeholders

crates/hero_slides_ui/static/js/dashboard.js

Added loadSettings(): populates settings form from user.settings.get on page load
Added onTranscriptionBackendChange(): shows/hides conditional fields based on selected backend
Added saveTranscriptionSettings(): calls user.settings.save and shows success/error toast

Test Results

Total: 75
Passed: 74
Failed: 0
Ignored: 1 (requires live AI service)

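The backend resolution described in the implementation summary could look roughly like the sketch below. It is not the code from the pull request. Only the three enum variants and the `local_model_url` / `local_model_name` field names come from the summary. The other settings fields (`backend`, `openrouter_api_key`, `openrouter_model`), the string tags `"groq"`, `"openrouter"` and `"local"`, and the helpers `resolve_backend`, `label` and `transcription_endpoint` are assumptions made for illustration. The real structs derive serde traits, which are left out here so the example needs only the standard library.

```rust
/// Mirrors the three variants listed in the implementation summary.
#[derive(Debug, Clone, PartialEq)]
pub enum TranscriptionBackend {
    Groq,
    OpenRouter { api_key: String, model: String },
    LocalModel { base_url: String, model_name: String },
}

/// Hypothetical flat settings shape. Only `local_model_url` and
/// `local_model_name` are named in the summary; the rest is assumed.
#[derive(Debug, Clone, Default)]
pub struct TranscriptionSettings {
    pub backend: String,
    pub openrouter_api_key: String,
    pub openrouter_model: String,
    pub local_model_url: String,
    pub local_model_name: String,
}

impl TranscriptionBackend {
    /// Short label that can be prefixed to error messages.
    pub fn label(&self) -> &'static str {
        match self {
            TranscriptionBackend::Groq => "groq",
            TranscriptionBackend::OpenRouter { .. } => "openrouter",
            TranscriptionBackend::LocalModel { .. } => "local model",
        }
    }
}

/// Joins a base URL and the `/audio/transcriptions` path without
/// producing a double slash when the base ends with `/`.
pub fn transcription_endpoint(base_url: &str) -> String {
    format!("{}/audio/transcriptions", base_url.trim_end_matches('/'))
}

/// Turns persisted settings into a backend. An empty `backend` value
/// falls back to Groq, matching the "unchanged behaviour" default.
pub fn resolve_backend(s: &TranscriptionSettings) -> Result<TranscriptionBackend, String> {
    match s.backend.as_str() {
        "" | "groq" => Ok(TranscriptionBackend::Groq),
        "openrouter" => {
            if s.openrouter_api_key.is_empty() {
                return Err("openrouter: API key is not configured".to_string());
            }
            Ok(TranscriptionBackend::OpenRouter {
                api_key: s.openrouter_api_key.clone(),
                model: s.openrouter_model.clone(),
            })
        }
        "local" => {
            if s.local_model_url.is_empty() {
                return Err("local model: endpoint URL is not configured".to_string());
            }
            Ok(TranscriptionBackend::LocalModel {
                base_url: s.local_model_url.clone(),
                model_name: s.local_model_name.clone(),
            })
        }
        other => Err(format!("unknown transcription backend: {}", other)),
    }
}
```

Resolving to an enum before any network call means a missing API key or endpoint URL is reported as a configuration error with the backend label attached, rather than surfacing later as an opaque HTTP failure.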

casper-stevens referenced this issue from a commit

2026-04-14 14:24:53 +00:00

feat(transcription): add support for alternative transcription backends

casper-stevens referenced this issue

2026-04-14 14:25:09 +00:00

feat(transcription): add support for alternative transcription backends #18

casper-stevens commented

2026-04-14 14:25:16 +00:00

Author

Member

Pull request opened: #18 (https://forge.ourworld.tf/lhumina_code/hero_slides/pulls/18)

casper-stevens closed this issue

2026-04-14 14:49:30 +00:00

