HeroVoice

Voice-to-markdown transcription server with real-time speech recognition, AI-powered text transformation, and live preview.

Features

  • Real-time voice transcription - Stream audio from browser to server via WebSocket
  • Voice Activity Detection - Silero VAD V5 neural network detects speech/silence transitions
  • Automatic segmentation - Transcribes on natural pauses (350ms silence threshold)
  • AI transcription - Uses Groq WhisperLargeV3Turbo, with automatic failover to OpenRouter and SambaNova when their API keys are configured
  • Live markdown preview - Split-screen editor with real-time rendered HTML
  • Text transformations - 11 built-in AI transformation styles:
    • spellcheck - Grammar and spelling correction
    • specs - Technical specifications
    • code - Software architecture documentation
    • docs - User-friendly documentation
    • legal - Legal document formatting
    • story - Creative narrative
    • summary - Bullet-point summary
    • technical - Technical documentation
    • business - Business analysis
    • meeting - Meeting minutes
    • email - Professional email
    • Plus language translations: Dutch, French, Arabic
  • Topic organization - Hierarchical folder structure for transcriptions
  • Audio archival - Saves recordings as WAV and compressed OGG

Requirements

  • Rust 1.70+
  • Groq API key (required)
  • Modern browser with Web Audio API and microphone support

Installation

# Clone the repository
git clone <repository-url>
cd herovoice

# Build release binary
make build

# Install to ~/hero/bin (optional)
make install

Configuration

Set the required environment variables:

# Required
export GROQ_API_KEY=your-groq-api-key

# Optional fallback providers
export OPENROUTER_API_KEY=your-openrouter-key
export SAMBANOVA_API_KEY=your-sambanova-key

# Server configuration (optional)
export HOST=0.0.0.0          # Bind address (default: 0.0.0.0)
export PORT=2756             # Listen port (default: 2756)
export RUST_LOG=hero_voice=info  # Log level (target follows the hero_voice package name)
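
As an illustration of how the HOST and PORT defaults above could be resolved at startup, here is a minimal Rust sketch using only the standard library. The helper names `resolve_bind_addr` and `bind_addr_from_env` are hypothetical and are not taken from the project source; the sketch also assumes HOST is an IP address rather than a hostname.

```rust
use std::env;
use std::net::{IpAddr, SocketAddr};

/// Resolve the bind address from optional HOST and PORT values, falling back
/// to the documented defaults (0.0.0.0 and 2756).
/// Hypothetical helper: not part of the HeroVoice source.
fn resolve_bind_addr(host: Option<String>, port: Option<String>) -> Result<SocketAddr, String> {
    let host = host.unwrap_or_else(|| "0.0.0.0".to_string());
    let port = port.unwrap_or_else(|| "2756".to_string());

    // Assumption: HOST is an IP literal; a hostname such as "localhost" is rejected here.
    let ip: IpAddr = host
        .parse()
        .map_err(|e| format!("invalid HOST {host:?}: {e}"))?;
    let port: u16 = port
        .parse()
        .map_err(|e| format!("invalid PORT {port:?}: {e}"))?;

    Ok(SocketAddr::new(ip, port))
}

/// Read HOST and PORT from the process environment.
fn bind_addr_from_env() -> Result<SocketAddr, String> {
    resolve_bind_addr(env::var("HOST").ok(), env::var("PORT").ok())
}
```

Keeping the parsing in a function that takes plain `Option<String>` values makes the defaults easy to test without touching the real environment.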

Usage

# Start the server
make run

# Or with debug logging
make dev

Open http://localhost:2756 in your browser.

Recording

  1. Click the microphone button or press Ctrl+Shift+R to start recording
  2. Speak naturally - transcription happens automatically on pauses
  3. Click stop or press the shortcut again to end recording
  4. Use Ctrl+Shift+C to copy the markdown content

Topics

  • Create folders and topics in the sidebar tree
  • Each topic stores its content, audio recordings, and transforms
  • Right-click for rename, move, and delete options

Transformations

  1. Select a transformation style from the dropdown
  2. Click "Transform" to apply AI formatting
  3. Transforms are saved in the topic's transforms/ folder

Architecture

Browser (WebSocket)
    │
    ▼
Axum Server
    ├── WebSocket Handler → AudioRecorder (WAV)
    │                    → AudioProcessor (VAD)
    │
    ├── Transcriber (herolib-ai)
    │   └── Groq → OpenRouter → SambaNova (failover)
    │
    ├── Topic API (CRUD + file management)
    │
    └── TextTransformer (LLM transformations)
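
The failover chain in the diagram (Groq → OpenRouter → SambaNova) can be pictured as "try each provider in order, return the first success". The sketch below shows that pattern in plain Rust. The `Provider` struct, the `TranscribeFn` alias, and `transcribe_with_failover` are hypothetical names for illustration; the real logic lives in herolib-ai and may differ.

```rust
/// A transcription callback: takes PCM samples, returns text or an error message.
/// Placeholder type for illustration; herolib-ai's real API is not shown here.
type TranscribeFn = Box<dyn Fn(&[i16]) -> Result<String, String>>;

/// A named backend in the failover chain (hypothetical).
struct Provider {
    name: &'static str,
    transcribe: TranscribeFn,
}

/// Try each provider in order. Returns (provider name, text) for the first
/// success, or the list of all error messages if every provider fails.
fn transcribe_with_failover(
    providers: &[Provider],
    audio: &[i16],
) -> Result<(String, String), Vec<String>> {
    let mut errors = Vec::new();
    for p in providers {
        match (p.transcribe)(audio) {
            Ok(text) => return Ok((p.name.to_string(), text)),
            Err(e) => errors.push(format!("{}: {}", p.name, e)),
        }
    }
    Err(errors)
}
```

Collecting every error instead of only the last one makes it easier to see why the whole chain failed, for example a timeout on the first provider and a missing key on the second.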

API Endpoints

WebSocket

  • GET /ws - Audio streaming endpoint

Topics

Method  Endpoint               Description
GET     /api/topics            Get topic tree
POST    /api/topics            Create topic
POST    /api/topics/folder     Create folder
POST    /api/topics/rename     Rename topic/folder
POST    /api/topics/move       Move topic/folder
POST    /api/topics/delete     Delete to trash
GET     /api/topics/content    Read content
POST    /api/topics/content    Save content
POST    /api/topics/transform  Apply transformation
GET     /api/topics/file       Download file
POST    /api/topics/file       Save file
DELETE  /api/topics/file       Delete file
POST    /api/topics/reset      Clear topic

WebSocket Protocol

Client to Server:

{ "type": "start", "topic": "optional-topic-path" }
{ "type": "stop" }
{ "type": "config", "language": "en" }

Audio is sent alongside these messages as binary WebSocket frames (16-bit PCM, 16kHz, mono).

Server to Client:

{ "type": "transcription", "text": "...", "is_final": true }
{ "type": "status", "message": "..." }
{ "type": "error", "message": "..." }
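
To make the binary audio format concrete, the following sketch decodes a frame of 16-bit PCM into samples and normalizes them to floats. Little-endian byte order is an assumption (the protocol description does not state it), and the function names are hypothetical rather than taken from the server code.

```rust
/// Decode a binary frame of 16-bit PCM into samples.
/// Assumption: little-endian byte order. A trailing odd byte is ignored.
fn decode_pcm16_le(frame: &[u8]) -> Vec<i16> {
    frame
        .chunks_exact(2)
        .map(|pair| i16::from_le_bytes([pair[0], pair[1]]))
        .collect()
}

/// Convert samples to f32 in the range [-1.0, 1.0), a common input range
/// for audio models.
fn to_f32(samples: &[i16]) -> Vec<f32> {
    samples.iter().map(|&s| s as f32 / 32768.0).collect()
}

/// Duration in milliseconds of `n` mono samples at 16 kHz.
fn duration_ms(n: usize) -> usize {
    n * 1000 / 16_000
}
```

At 16kHz mono with 2 bytes per sample, one second of audio is 32,000 bytes, so a 1,024-byte frame carries 512 samples, or 32 ms.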

Development

make help       # Show all commands
make build      # Build release binary
make run        # Build and run
make dev        # Development mode with debug logging
make check      # Fast code check
make fmt        # Format code
make lint       # Run clippy linter
make test       # Run tests
make clean      # Remove build artifacts
make deps       # Show dependency tree

Project Structure

herovoice/
├── src/
│   ├── main.rs          # Axum server, WebSocket handler, API
│   ├── transcriber.rs   # AI transcription client
│   ├── audio.rs         # Voice Activity Detection
│   ├── convert.rs       # WAV to OGG conversion
│   └── topics.rs        # Topic/folder management
├── static/
│   ├── index.html       # Single-page application
│   └── app.js           # Frontend JavaScript
├── data/                # Runtime data (topics, audio, transforms)
├── Cargo.toml           # Dependencies
├── Makefile             # Build automation
└── SPECS.md             # Technical specifications

Audio Processing

  • Sample rate: 16kHz (required for Silero VAD)
  • Chunk size: 512 samples (32 ms at 16kHz) for VAD analysis
  • Silence threshold: 350ms triggers transcription
  • Speech threshold: 0.20 probability
  • Maximum buffer: 30 seconds before forced transcription
  • Compression: OGG Vorbis at quality 0.4 (~10% of WAV size)
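
The parameters above suggest a small state machine: buffer audio while speech is detected, and emit a segment once 350ms of silence has accumulated or the buffer reaches 30 seconds. The sketch below is one possible reading of that behavior, not the project's actual `audio.rs`. The `Segmenter` type is hypothetical, the speech probability is supplied by the caller (standing in for the Silero VAD output), and details such as dropping leading silence and keeping trailing silence in the segment are assumptions.

```rust
const SAMPLE_RATE: usize = 16_000;
const CHUNK_SAMPLES: usize = 512; // 32 ms per chunk at 16 kHz
const SILENCE_MS: usize = 350;
const SPEECH_THRESHOLD: f32 = 0.20;
const MAX_BUFFER_SAMPLES: usize = 30 * SAMPLE_RATE;

/// Buffers audio and decides when a segment is ready for transcription.
/// Hypothetical type for illustration.
struct Segmenter {
    buffer: Vec<i16>,
    silent_samples: usize,
    heard_speech: bool,
}

impl Segmenter {
    fn new() -> Self {
        Segmenter { buffer: Vec::new(), silent_samples: 0, heard_speech: false }
    }

    /// Feed one chunk together with its speech probability from a VAD model.
    /// Returns a finished segment when a pause or the buffer cap is reached.
    fn push(&mut self, chunk: &[i16], speech_prob: f32) -> Option<Vec<i16>> {
        // Assumption: a probability equal to the threshold counts as speech.
        if speech_prob >= SPEECH_THRESHOLD {
            self.heard_speech = true;
            self.silent_samples = 0;
        } else if self.heard_speech {
            self.silent_samples += chunk.len();
        } else {
            return None; // assumption: silence before any speech is dropped
        }
        self.buffer.extend_from_slice(chunk);

        let silence_limit = SILENCE_MS * SAMPLE_RATE / 1000; // 5,600 samples
        let paused = self.silent_samples >= silence_limit;
        let full = self.buffer.len() >= MAX_BUFFER_SAMPLES;
        if paused || full {
            self.heard_speech = false;
            self.silent_samples = 0;
            return Some(std::mem::take(&mut self.buffer));
        }
        None
    }
}
```

With 512-sample chunks, the 350ms limit (5,600 samples) is crossed on the 11th consecutive silent chunk, so in this sketch the effective pause is about 352 ms.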

Browser Support

  • Chrome 120+
  • Firefox 120+
  • Safari 17+
  • Edge 120+

Requires microphone permission and WebSocket support.

Embedding & CORS

HeroVoice is configured to allow:

  • Iframe embedding - No X-Frame-Options restrictions, can be embedded in any page
  • Cross-origin API calls - Full CORS support for all API endpoints
  • WebSocket connections - From any origin

This makes it suitable for embedding in other applications or dashboards.

License

MIT