HeroVoice
Voice-to-markdown transcription server with real-time speech recognition, AI-powered text transformation, and live preview.
Features
- Real-time voice transcription - Stream audio from browser to server via WebSocket
- Voice Activity Detection - Silero VAD V5 neural network detects speech/silence transitions
- Automatic segmentation - Transcribes on natural pauses (350ms silence threshold)
- AI transcription - Uses Groq WhisperLargeV3Turbo with automatic failover
- Live markdown preview - Split-screen editor with real-time rendered HTML
- Text transformations - 11 built-in AI transformation styles:
  - spellcheck - Grammar and spelling correction
  - specs - Technical specifications
  - code - Software architecture documentation
  - docs - User-friendly documentation
  - legal - Legal document formatting
  - story - Creative narrative
  - summary - Bullet-point summary
  - technical - Technical documentation
  - business - Business analysis
  - meeting - Meeting minutes
  - email - Professional email
  - Language translations: Dutch, French, Arabic
- Topic organization - Hierarchical folder structure for transcriptions
- Audio archival - Saves recordings as WAV and compressed OGG
Requirements
- Rust 1.70+
- Groq API key (required)
- Modern browser with Web Audio API and microphone support
Installation
# Clone the repository
git clone <repository-url>
cd herovoice
# Build release binary
make build
# Install to ~/hero/bin (optional)
make install
Configuration
Set the required environment variables:
# Required
export GROQ_API_KEY=your-groq-api-key
# Optional fallback providers
export OPENROUTER_API_KEY=your-openrouter-key
export SAMBANOVA_API_KEY=your-sambanova-key
# Server configuration (optional)
export HOST=0.0.0.0 # Bind address (default: 0.0.0.0)
export PORT=2756 # Listen port (default: 2756)
export RUST_LOG=herovoice=info # Log level
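As an illustration of how the optional HOST and PORT variables and their documented defaults could be resolved, here is a minimal sketch using only the standard library. The function names `resolve_bind_addr` and `bind_addr_from_env` are hypothetical and are not taken from the HeroVoice source; the actual server may handle configuration differently.

```rust
use std::net::SocketAddr;

/// Resolve the bind address from optional HOST and PORT values,
/// falling back to the documented defaults (0.0.0.0 and 2756).
/// Hypothetical helper for illustration, not HeroVoice's own code.
fn resolve_bind_addr(host: Option<&str>, port: Option<&str>) -> Result<SocketAddr, String> {
    let host = host.unwrap_or("0.0.0.0");
    let port: u16 = match port {
        Some(p) => p
            .parse::<u16>()
            .map_err(|e| format!("invalid PORT {p:?}: {e}"))?,
        None => 2756,
    };
    // SocketAddr parsing accepts IP literals only, not hostnames.
    format!("{host}:{port}")
        .parse::<SocketAddr>()
        .map_err(|e| format!("invalid HOST {host:?}: {e}"))
}

/// Read HOST and PORT from the process environment.
fn bind_addr_from_env() -> Result<SocketAddr, String> {
    let host = std::env::var("HOST").ok();
    let port = std::env::var("PORT").ok();
    resolve_bind_addr(host.as_deref(), port.as_deref())
}
```

Keeping the parsing separate from the environment lookup makes the default handling easy to test. Note that this sketch rejects hostnames such as `localhost` and unbracketed IPv6 addresses, because `SocketAddr` only parses IP literals.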
Usage
# Start the server
make run
# Or with debug logging
make dev
Open http://localhost:2756 in your browser.
Recording
- Click the microphone button or press Ctrl+Shift+R to start recording
- Speak naturally - transcription happens automatically on pauses
- Click stop or press the shortcut again to end recording
- Use Ctrl+Shift+C to copy the markdown content
Topics
- Create folders and topics in the sidebar tree
- Each topic stores its content, audio recordings, and transforms
- Right-click for rename, move, and delete options
Transformations
- Select a transformation style from the dropdown
- Click "Transform" to apply AI formatting
- Transforms are saved in the topic's transforms/ folder
Architecture
Browser (WebSocket)
│
▼
Axum Server
├── WebSocket Handler → AudioRecorder (WAV)
│ → AudioProcessor (VAD)
│
├── Transcriber (herolib-ai)
│ └── Groq → OpenRouter → SambaNova (failover)
│
├── Topic API (CRUD + file management)
│
└── TextTransformer (LLM transformations)
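The Groq → OpenRouter → SambaNova failover chain in the diagram can be pictured as "try each provider in order, return the first success". The sketch below shows that general pattern only. It does not use the herolib-ai API, and `ProviderFn` and `transcribe_with_failover` are hypothetical names; each provider is represented by a plain closure standing in for a real API call.

```rust
/// A provider call: takes PCM samples and returns text or an error message.
/// Stand-in for a real API request (assumption for illustration).
type ProviderFn<'a> = &'a dyn Fn(&[i16]) -> Result<String, String>;

/// Try each provider in order and return the first successful result
/// together with the name of the provider that produced it.
/// If every provider fails, return all collected error messages.
fn transcribe_with_failover(
    providers: &[(&str, ProviderFn<'_>)],
    samples: &[i16],
) -> Result<(String, String), Vec<String>> {
    let mut errors = Vec::new();
    for (name, call) in providers {
        match call(samples) {
            Ok(text) => return Ok((name.to_string(), text)),
            Err(e) => errors.push(format!("{name}: {e}")),
        }
    }
    Err(errors)
}
```

Collecting the per-provider errors, rather than keeping only the last one, makes it easier to see why the whole chain failed.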
API Endpoints
WebSocket
GET /ws - Audio streaming endpoint
Topics
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/topics | Get topic tree |
| POST | /api/topics | Create topic |
| POST | /api/topics/folder | Create folder |
| POST | /api/topics/rename | Rename topic/folder |
| POST | /api/topics/move | Move topic/folder |
| POST | /api/topics/delete | Delete to trash |
| GET | /api/topics/content | Read content |
| POST | /api/topics/content | Save content |
| POST | /api/topics/transform | Apply transformation |
| GET | /api/topics/file | Download file |
| POST | /api/topics/file | Save file |
| DELETE | /api/topics/file | Delete file |
| POST | /api/topics/reset | Clear topic |
WebSocket Protocol
Client to Server:
{ "type": "start", "topic": "optional-topic-path" }
{ "type": "stop" }
{ "type": "config", "language": "en" }
Plus binary audio data (16-bit PCM, 16kHz, mono)
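To make the binary frame format concrete, here is a minimal sketch of decoding a frame of 16-bit PCM into floating-point samples. The byte order is not stated above, so little-endian is an assumption here, as is the hypothetical function name `pcm16le_to_f32`.

```rust
/// Decode a binary frame of 16-bit PCM into f32 samples in [-1.0, 1.0).
/// Assumes little-endian byte order (not confirmed by the protocol notes).
/// A trailing odd byte, if any, is ignored.
fn pcm16le_to_f32(frame: &[u8]) -> Vec<f32> {
    frame
        .chunks_exact(2)
        .map(|b| i16::from_le_bytes([b[0], b[1]]) as f32 / 32768.0)
        .collect()
}
```

At 16kHz mono, each sample occupies 2 bytes, so one second of audio is 32,000 bytes on the wire.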
Server to Client:
{ "type": "transcription", "text": "...", "is_final": true }
{ "type": "status", "message": "..." }
{ "type": "error", "message": "..." }
Development
make help # Show all commands
make build # Build release binary
make run # Build and run
make dev # Development mode with debug logging
make check # Fast code check
make fmt # Format code
make lint # Run clippy linter
make test # Run tests
make clean # Remove build artifacts
make deps # Show dependency tree
Project Structure
herovoice/
├── src/
│ ├── main.rs # Axum server, WebSocket handler, API
│ ├── transcriber.rs # AI transcription client
│ ├── audio.rs # Voice Activity Detection
│ ├── convert.rs # WAV to OGG conversion
│ └── topics.rs # Topic/folder management
├── static/
│ ├── index.html # Single-page application
│ └── app.js # Frontend JavaScript
├── data/ # Runtime data (topics, audio, transforms)
├── Cargo.toml # Dependencies
├── Makefile # Build automation
└── SPECS.md # Technical specifications
Audio Processing
- Sample rate: 16kHz (required for Silero VAD)
- Chunk size: 512 samples for VAD analysis
- Silence threshold: 350ms triggers transcription
- Speech threshold: 0.20 probability
- Maximum buffer: 30 seconds before forced transcription
- Compression: OGG Vorbis at quality 0.4 (~10% of WAV size)
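The parameters above imply some simple arithmetic: a 512-sample chunk at 16kHz covers 32ms, so the 350ms silence threshold corresponds to 11 consecutive silent chunks when rounded up, and the 30-second buffer holds 480,000 samples. The sketch below encodes those numbers in a small segmentation state machine. The `Segmenter` type, the `Flush` enum, the use of `>=` for threshold comparisons, and the choice to skip leading silence are all assumptions for illustration, not HeroVoice's actual implementation.

```rust
const SAMPLE_RATE_HZ: usize = 16_000;
const CHUNK_SAMPLES: usize = 512;
const SPEECH_THRESHOLD: f32 = 0.20;
const SILENCE_MS: usize = 350;
const MAX_BUFFER_SECS: usize = 30;

/// Consecutive silent chunks needed to cover the silence threshold (rounded up).
const fn silence_chunks() -> usize {
    let silence_samples = SILENCE_MS * SAMPLE_RATE_HZ / 1000; // 5600 samples
    (silence_samples + CHUNK_SAMPLES - 1) / CHUNK_SAMPLES // 11 chunks
}

/// Buffer capacity in samples before a forced transcription.
const fn max_buffer_samples() -> usize {
    MAX_BUFFER_SECS * SAMPLE_RATE_HZ // 480000 samples
}

/// Why a buffered segment is being sent for transcription.
#[derive(Debug, PartialEq)]
enum Flush {
    Pause,
    BufferFull,
}

/// Hypothetical segmenter driven by one VAD probability per chunk.
#[derive(Default)]
struct Segmenter {
    buffered_samples: usize,
    silent_run: usize,
    heard_speech: bool,
}

impl Segmenter {
    /// Feed the speech probability for one 512-sample chunk.
    /// Returns Some(reason) when the buffered audio should be transcribed.
    fn push_chunk(&mut self, speech_prob: f32) -> Option<Flush> {
        if speech_prob >= SPEECH_THRESHOLD {
            self.heard_speech = true;
            self.silent_run = 0;
        } else {
            self.silent_run += 1;
        }
        if !self.heard_speech {
            return None; // leading silence is not buffered in this sketch
        }
        self.buffered_samples += CHUNK_SAMPLES;
        let reason = if self.silent_run >= silence_chunks() {
            Some(Flush::Pause)
        } else if self.buffered_samples >= max_buffer_samples() {
            Some(Flush::BufferFull)
        } else {
            None
        };
        if reason.is_some() {
            *self = Segmenter::default(); // start a fresh segment
        }
        reason
    }
}
```

Because 480,000 is not a multiple of 512, this sketch forces a flush on the first chunk that reaches or exceeds the limit, which slightly overshoots 30 seconds.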
Browser Support
- Chrome 120+
- Firefox 120+
- Safari 17+
- Edge 120+
Requires microphone permission and WebSocket support.
Embedding & CORS
HeroVoice is configured to allow:
- Iframe embedding - No X-Frame-Options restrictions, can be embedded in any page
- Cross-origin API calls - Full CORS support for all API endpoints
- WebSocket connections - From any origin
This makes it suitable for embedding in other applications or dashboards.
License
MIT