HeroEmbedder

A fast, local embedding server for RAG applications. Provides dense vector embeddings, similarity search, and reranking via a JSON-RPC 2.0 API with namespace support for isolated document collections.

Features

  • Embedding Generation: BGE models (small/base) with INT8/FP32 options
  • Semantic Search: Fast cosine similarity search
  • Reranking: Cross-encoder model for improved accuracy
  • Namespaces: Isolated document collections for multi-tenant use
  • Persistence: Documents stored in redb databases
  • Web UI: Bootstrap-based interface with live updates

Installation

Download and run the installer script:

curl -sSL https://forge.ourworld.tf/geomind_code/hero_embedder/raw/branch/main/scripts/install_download_run.sh | bash

This will:

  • Detect your platform (macOS arm64 or Linux amd64)
  • Download embedderd (server) and embedder (CLI client)
  • Install to ~/hero/bin
  • Configure your PATH

Manual Download

# macOS (Apple Silicon)
curl -OJ https://forge.ourworld.tf/api/packages/geomind_code/generic/hero_embedder/dev/embedderd-darwin-arm64
curl -OJ https://forge.ourworld.tf/api/packages/geomind_code/generic/hero_embedder/dev/embedder-darwin-arm64

# Linux (x86_64)
curl -OJ https://forge.ourworld.tf/api/packages/geomind_code/generic/hero_embedder/dev/embedderd-linux-amd64
curl -OJ https://forge.ourworld.tf/api/packages/geomind_code/generic/hero_embedder/dev/embedder-linux-amd64

chmod +x embedderd* embedder*

Build from Source

# 1. Install ONNX Runtime
brew install onnxruntime  # macOS
# or see https://onnxruntime.ai for Linux

# 2. Download models
./download_models.sh

# 3. Build and run
./run.sh

Quick Start

# Start the server
embedderd

# In another terminal, use the CLI
embedder embed "hello world"
embedder search "query"
embedder --help

The server listens on http://localhost:3752 by default.

Quality Levels

Quality is set per namespace at creation time. All four model variants in the table below are loaded at startup, so every level is available immediately.

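| Level | Name     | Model     | Weights | Embeddings | Dimensions | Use Case               |
|-------|----------|-----------|---------|------------|------------|------------------------|
| 1     | Fast     | bge-small | INT8    | INT8       | 384        | Real-time, low latency |
| 2     | Balanced | bge-small | FP32    | FP16       | 384        | Default, good balance  |
| 3     | Quality  | bge-base  | INT8    | INT8       | 768        | Better accuracy        |
| 4     | Best     | bge-base  | FP32    | FP16       | 768        | Maximum quality        |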

The reranker is available at every quality level via the use_reranker search parameter.
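The quality table can be read as a small lookup from level to model settings. The sketch below encodes it that way and estimates the raw size of one stored vector. The type and function names (`Precision`, `QualityProfile`, `from_level`, `vector_bytes`) are hypothetical and are not taken from the server's source; the size estimate assumes a vector is stored as dimensions times bytes per value, with no extra metadata.

```rust
/// Numeric precision used for model weights or stored embeddings.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Precision {
    Int8,
    Fp16,
    Fp32,
}

impl Precision {
    /// Bytes needed to store one value at this precision.
    pub fn bytes(self) -> usize {
        match self {
            Precision::Int8 => 1,
            Precision::Fp16 => 2,
            Precision::Fp32 => 4,
        }
    }
}

/// One row of the quality table (hypothetical type, not the server's own).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct QualityProfile {
    pub level: u8,
    pub name: &'static str,
    pub model: &'static str,
    pub weights: Precision,
    pub embeddings: Precision,
    pub dimensions: usize,
}

impl QualityProfile {
    /// Look up a profile by level; returns None outside 1..=4.
    pub fn from_level(level: u8) -> Option<Self> {
        let (name, model, weights, embeddings, dimensions) = match level {
            1 => ("Fast", "bge-small", Precision::Int8, Precision::Int8, 384),
            2 => ("Balanced", "bge-small", Precision::Fp32, Precision::Fp16, 384),
            3 => ("Quality", "bge-base", Precision::Int8, Precision::Int8, 768),
            4 => ("Best", "bge-base", Precision::Fp32, Precision::Fp16, 768),
            _ => return None,
        };
        Some(Self { level, name, model, weights, embeddings, dimensions })
    }

    /// Raw payload size of one stored vector, ignoring any per-vector metadata.
    pub fn vector_bytes(&self) -> usize {
        self.dimensions * self.embeddings.bytes()
    }
}
```

Under these assumptions a level 1 vector takes 384 bytes and a level 4 vector takes 1536 bytes, which is one way to weigh storage cost against accuracy when choosing a level for a namespace.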

Namespaces

Namespaces provide isolated document collections. Each namespace has its own index and storage.

// List namespaces
{"jsonrpc": "2.0", "id": 1, "method": "namespace.list", "params": []}

// Create namespace with quality level (1-4)
{"jsonrpc": "2.0", "id": 1, "method": "namespace.create", "params": ["my-docs", 2]}

// Delete namespace
{"jsonrpc": "2.0", "id": 1, "method": "namespace.delete", "params": ["my-docs"]}

All document operations accept an optional namespace parameter (defaults to "default"):

// Add to specific namespace
{"jsonrpc": "2.0", "id": 1, "method": "index.add", "params": [[{"id": "1", "text": "..."}], "my-docs"]}

// Search in namespace
{"jsonrpc": "2.0", "id": 1, "method": "search", "params": ["query", 10, "my-docs"]}
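The requests above all share the same JSON-RPC 2.0 envelope with positional params. As a rough, dependency-free sketch, they could be assembled like this. The helper names (`json_string`, `rpc_request`, `namespace_create`, `search`) are hypothetical, and leaving out the trailing namespace argument to get the "default" namespace is an assumption based on the parameter being described as optional.

```rust
/// Escape a string as a JSON string literal (surrounding quotes included).
pub fn json_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters must be written as \uXXXX.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Wrap a method name and an already-serialized params array in a JSON-RPC 2.0 envelope.
pub fn rpc_request(id: u64, method: &str, params_json: &str) -> String {
    format!(
        "{{\"jsonrpc\": \"2.0\", \"id\": {}, \"method\": {}, \"params\": {}}}",
        id,
        json_string(method),
        params_json
    )
}

/// namespace.create takes [name, quality]; the quality level must be 1-4.
pub fn namespace_create(id: u64, name: &str, quality: u8) -> Result<String, String> {
    if !(1..=4).contains(&quality) {
        return Err(format!("quality level must be 1-4, got {}", quality));
    }
    let params = format!("[{}, {}]", json_string(name), quality);
    Ok(rpc_request(id, "namespace.create", &params))
}

/// search takes [query, k] plus an optional namespace.
/// Assumption: omitting the namespace selects "default".
pub fn search(id: u64, query: &str, k: usize, namespace: Option<&str>) -> String {
    let params = match namespace {
        Some(ns) => format!("[{}, {}, {}]", json_string(query), k, json_string(ns)),
        None => format!("[{}, {}]", json_string(query), k),
    };
    rpc_request(id, "search", &params)
}
```

Calling `namespace_create(1, "my-docs", 2)` yields the same body as the create example above. Validating the level on the client side is a design choice of this sketch; the server presumably performs its own check.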

API

The JSON-RPC 2.0 endpoint is POST /rpc.
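For readers who want to call the endpoint without the SDK, the following is a minimal sketch that frames a JSON body as an HTTP/1.1 POST to /rpc and sends it over a plain TCP socket using only the standard library. The function names (`build_post`, `send_rpc`) are hypothetical, and the `Content-Type: application/json` header is an assumption about what the server expects. The response is returned raw, headers included, and is not parsed.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

/// Frame a JSON body as an HTTP/1.1 POST request to /rpc.
pub fn build_post(host: &str, port: u16, body: &str) -> String {
    format!(
        "POST /rpc HTTP/1.1\r\nHost: {}:{}\r\nContent-Type: application/json\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{}",
        host,
        port,
        body.len(), // Content-Length is counted in bytes, which is what str::len returns
        body
    )
}

/// Send one request and return the raw HTTP response (status line, headers, body).
pub fn send_rpc(host: &str, port: u16, body: &str) -> std::io::Result<String> {
    let mut stream = TcpStream::connect((host, port))?;
    stream.write_all(build_post(host, port, body).as_bytes())?;
    // "Connection: close" asks the server to close the socket, so reading to EOF terminates.
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    Ok(response)
}
```

With a server running locally, passing the health request from the Server Info section to `send_rpc("localhost", 3752, ...)` should return the response for that method. An HTTP client crate would be the more robust choice in real code.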

Server Info

{"jsonrpc": "2.0", "id": 1, "method": "info", "params": []}
{"jsonrpc": "2.0", "id": 1, "method": "health", "params": []}

Embedding

{"jsonrpc": "2.0", "id": 1, "method": "embed", "params": [["hello world", "another text"]]}

Index Management

// Add documents (with optional namespace)
{"jsonrpc": "2.0", "id": 1, "method": "index.add", "params": [[{"id": "doc1", "text": "hello"}], "namespace"]}

// Get document
{"jsonrpc": "2.0", "id": 1, "method": "index.get", "params": ["doc1", "namespace"]}

// Delete document
{"jsonrpc": "2.0", "id": 1, "method": "index.delete", "params": ["doc1", "namespace"]}

// Count documents
{"jsonrpc": "2.0", "id": 1, "method": "index.count", "params": ["namespace"]}

// Clear namespace
{"jsonrpc": "2.0", "id": 1, "method": "index.clear", "params": ["namespace"]}

Search

// Search with optional namespace and reranker
{"jsonrpc": "2.0", "id": 1, "method": "search", "params": ["query text", 10, "namespace", true]}
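The Features list describes search as cosine similarity over dense embeddings. As an illustration of what that ranking means, here is a brute-force sketch: score every document vector against the query and keep the k best. This is not the server's implementation, which may use quantized vectors or a specialized index; the function names `cosine` and `top_k` are hypothetical.

```rust
/// Cosine similarity of two equal-length vectors; returns 0.0 if either has zero norm.
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Brute-force top-k: score every document, sort by descending score, keep k.
pub fn top_k<'a>(
    query: &[f32],
    docs: &'a [(&'a str, Vec<f32>)],
    k: usize,
) -> Vec<(&'a str, f32)> {
    let mut scored: Vec<(&str, f32)> = docs
        .iter()
        .map(|(id, v)| (*id, cosine(query, v)))
        .collect();
    // partial_cmp can fail only on NaN; such pairs are treated as equal here.
    scored.sort_by(|x, y| y.1.partial_cmp(&x.1).unwrap_or(std::cmp::Ordering::Equal));
    scored.truncate(k);
    scored
}
```

When the reranker flag is set, a cross-encoder would then rescore a candidate list like the one returned here. That second stage is not shown.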

Rerank

{"jsonrpc": "2.0", "id": 1, "method": "rerank", "params": ["query", [{"id": "1", "text": "..."}], 5]}

Corpus (TriviaQA)

// Download TriviaQA dataset
{"jsonrpc": "2.0", "id": 1, "method": "corpus.download", "params": []}

// Load into namespace
{"jsonrpc": "2.0", "id": 1, "method": "corpus.load", "params": [1000, "namespace"]}

// Benchmark
{"jsonrpc": "2.0", "id": 1, "method": "corpus.benchmark", "params": [null, 10, "namespace"]}

CLI Client

# Build with client feature
cargo build --features client

# Commands
heroembedder health
heroembedder stats
heroembedder embed "hello world"
heroembedder search "query" -k 10
heroembedder add doc1 "document text"

SDK Usage (Rust)

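// Doc is assumed to be exported from the same sdk module; adjust the path if it lives elsewhere.
use hero_embedder::sdk::{Doc, HeroEmbedderClient};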

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let client = HeroEmbedderClient::new("http://localhost:3752/rpc");
    
    // List namespaces
    let namespaces = client.namespace_list().await?;
    
    // Add documents
    let docs = vec![
        Doc { id: "1".into(), text: "Hello world".into() },
    ];
    client.index_add(docs, Some("my-namespace")).await?;
    
    // Search
    let results = client.search("hello", 10, Some("my-namespace"), None).await?;
    
    Ok(())
}

Environment Variables

| Variable        | Default                    | Description      |
|-----------------|----------------------------|------------------|
| EMBEDDER_MODELS | ~/hero/var/embedder/models | Models directory |
| EMBEDDER_DATA   | ~/hero/var/embedder/data   | Data directory   |
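The table implies a simple rule: use the variable if it is set, otherwise fall back to a path under the home directory. A minimal sketch of that rule follows. The function names (`resolve_dir`, `dirs_from_env`) are hypothetical, and two details are assumptions rather than documented behavior: `~` is taken from `HOME`, and an empty value is treated the same as an unset one.

```rust
use std::path::{Path, PathBuf};

/// Use the override if it is set and non-empty, otherwise the default under `home`.
pub fn resolve_dir(override_value: Option<&str>, home: &Path, default_rel: &str) -> PathBuf {
    match override_value {
        Some(v) if !v.is_empty() => PathBuf::from(v),
        _ => home.join(default_rel),
    }
}

/// Read EMBEDDER_MODELS and EMBEDDER_DATA, falling back to the documented defaults.
/// Returns (models_dir, data_dir).
pub fn dirs_from_env() -> (PathBuf, PathBuf) {
    // Assumption: "~" in the defaults means $HOME; fall back to "." if HOME is unset.
    let home = PathBuf::from(std::env::var("HOME").unwrap_or_else(|_| ".".into()));
    let models = std::env::var("EMBEDDER_MODELS").ok();
    let data = std::env::var("EMBEDDER_DATA").ok();
    (
        resolve_dir(models.as_deref(), &home, "hero/var/embedder/models"),
        resolve_dir(data.as_deref(), &home, "hero/var/embedder/data"),
    )
}
```

Keeping the lookup logic in `resolve_dir`, separate from the environment reads, makes the fallback rule testable without mutating process state.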

Data Storage

~/hero/var/embedder/
├── models/
│   ├── bge-small/
│   ├── bge-base/
│   └── bge-reranker-base/
└── data/
    ├── default/             # Namespace: default
    │   └── q2/              # Quality level 2
    │       └── rag.redb
    ├── my-namespace/        # Namespace: my-namespace
    │   └── q3/              # Quality level 3
    │       └── rag.redb
    └── corpus.redb          # Shared TriviaQA corpus
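The layout above follows the pattern `<data>/<namespace>/q<level>/rag.redb`, with one shared `corpus.redb` at the top of the data directory. The sketch below derives those paths; the function names (`namespace_db_path`, `corpus_db_path`) are hypothetical and do no validation of the namespace name.

```rust
use std::path::{Path, PathBuf};

/// Path of the redb file for a namespace at a given quality level:
/// <data_dir>/<namespace>/q<level>/rag.redb
pub fn namespace_db_path(data_dir: &Path, namespace: &str, quality: u8) -> PathBuf {
    data_dir
        .join(namespace)
        .join(format!("q{}", quality))
        .join("rag.redb")
}

/// Path of the shared TriviaQA corpus database.
pub fn corpus_db_path(data_dir: &Path) -> PathBuf {
    data_dir.join("corpus.redb")
}
```

For example, with the data directory at `/home/alice/hero/var/embedder/data`, the default namespace at level 2 maps to `/home/alice/hero/var/embedder/data/default/q2/rag.redb`.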

Building

# Server only (default)
cargo build --release

# Server + CLI client
cargo build --release --features client

# All features
cargo build --release --features full

Project Structure

hero_embedder/
├── src/
│   ├── main.rs          # Server binary
│   ├── lib.rs           # Library root
│   ├── config.rs        # Quality profiles
│   ├── namespace.rs     # Namespace manager
│   ├── state.rs         # App state
│   ├── handlers.rs      # HTML handlers
│   ├── api/             # JSON-RPC handlers
│   ├── ml/              # Embedding & reranking
│   ├── retrieval/       # Vector search
│   ├── storage/         # Persistence
│   ├── sdk/             # Client SDK
│   └── cli/             # CLI commands
├── templates/           # Web UI templates
├── docs/                # Documentation
│   └── API.md           # Full API docs
├── download_models.sh   # Model downloader
├── build.sh             # Build script
└── run.sh               # Run server

Documentation

See docs/API.md for complete API documentation.

License

MIT