# HeroEmbedder

A fast, local embedding server for RAG applications. Provides dense vector embeddings, similarity search, and reranking via a JSON-RPC 2.0 API, with namespace support for isolated document collections.
## Architecture

```
hero_embedder/
├── crates/
│   ├── hero_embedder_lib/       # Library: server internals (ML, storage, retrieval)
│   ├── hero_embedder_server/    # Binary: JSON-RPC daemon (Unix socket)
│   ├── hero_embedder_sdk/       # Library: JSON-RPC client and types
│   ├── hero_embedder/           # Binary: CLI using the SDK
│   ├── hero_embedder_ui/        # Binary: Axum web dashboard using the SDK
│   └── hero_embedder_examples/  # Examples: SDK usage demonstrations
├── scripts/                     # Build and deployment scripts
├── Cargo.toml                   # Workspace root
├── Makefile                     # Build orchestration
└── buildenv.sh                  # Environment configuration
```
### Dependency Graph

```
hero_embedder_server
         ↑
hero_embedder_lib (server internals)

hero_embedder_sdk (JSON-RPC client)
      ↑                 ↑                    ↑
      │                 │                    │
hero_embedder    hero_embedder_ui    hero_embedder_examples
```
## Deployment Modes

hero_embedder can be deployed in three modes, controlled by flags on the
`service_embedder` Nu script. Choose the mode that fits your setup.
### All-in-one (default)

All three processes run under the same user in a single hero_proc service
(`hero_embedder`). Good for a standalone development machine or single-user
node where memory is not a concern.

```sh
service_embedder start     # register + start all three
service_embedder status    # query "hero_embedder" service
service_embedder stop      # stop + unregister
```
### `--embedderd` — ONNX daemon only

Starts only `hero_embedderd` (TCP, loads all ONNX models) under the service
`hero_embedderd`. Run it as root to share the loaded models across every
tenant process on the host and to minimise total RAM usage.

```sh
service_embedder start --embedderd --root    # daemon only, root's hero_proc
service_embedder status --embedderd --root
service_embedder stop --embedderd --root
```
After the daemon comes up, the script:

- Polls `http://127.0.0.1:<port>/health` (up to 60 s) to confirm the models finished loading.
- Reads the node's mycelium IPv6 address via `mycelium address --root` and prints the external URL other mycelium peers can use through `hero_router`.

> **Note:** `hero_embedderd` currently binds `127.0.0.1` only. For cross-machine mycelium access, configure `hero_router` to proxy the TCP port and use the printed URL as `--embedderd-url` on the client node.
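The polling step above can be sketched in Rust. This is an illustration, not the actual script: for brevity it only checks TCP reachability, whereas the real script performs an HTTP GET on `/health` and inspects the response.

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

// Retry until the daemon accepts a connection or the attempt budget runs out.
fn wait_for_daemon(addr: &str, attempts: u32, interval: Duration) -> bool {
    let addr: SocketAddr = match addr.parse() {
        Ok(a) => a,
        Err(_) => return false,
    };
    for _ in 0..attempts {
        if TcpStream::connect_timeout(&addr, interval).is_ok() {
            return true;
        }
        std::thread::sleep(interval);
    }
    false
}

fn main() {
    // 60 attempts x 1 s matches the script's 60 s budget.
    let up = wait_for_daemon("127.0.0.1:8092", 60, Duration::from_secs(1));
    println!("daemon reachable: {up}");
}
```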
Environment variable set by the action:

| Variable | Value |
|---|---|
| `HERO_EMBEDDERD_PORT` | TCP port (default 8092) |
### `--userspace` — server + UI only

Starts `hero_embedder_server` + `hero_embedder_ui` under the service
`hero_embedder_userspace`, delegating all embed/rerank work to an already-
running `hero_embedderd`. No ONNX models are loaded in this process, so the
memory footprint is a fraction of the full stack.

```sh
# Same machine — delegates to root's embedderd on 127.0.0.1:8092
service_embedder start --userspace

# Cross-machine — embedderd reachable via mycelium through hero_router
service_embedder start --userspace \
    --embedderd-url http://[<mycelium_ipv6>]:8092

service_embedder status --userspace
service_embedder stop --userspace
```
Environment variable set by the action:

| Variable | Value |
|---|---|
| `HERO_EMBEDDERD_URL` | URL of the running hero_embedderd (default `http://127.0.0.1:8092`) |
### Typical split-mode deployment on one host

```
Root layer (heavy, shared):
  service_embedder start --embedderd --root
  └─ hero_embedderd          binds 127.0.0.1:8092, loads all ONNX models

Userspace layer (lightweight, per-tenant):
  service_embedder start --userspace
  ├─ hero_embedder_server    Unix socket, delegates embed/rerank to root
  └─ hero_embedder_ui        Unix socket, admin dashboard
```

This pattern lets you run many tenant instances while paying the model-load cost only once.
## Sockets

| Service | Socket Path | Type |
|---|---|---|
| Server | `$HERO_SOCKET_DIR/hero_embedder/rpc.sock` | Unix socket (OpenRPC / JSON-RPC 2.0) |
| UI | `$HERO_SOCKET_DIR/hero_embedder/ui.sock` | Unix socket (HTTP admin dashboard) |
| Daemon | TCP `127.0.0.1:8092` (configurable) | HTTP JSON-RPC + `/health` |

All server/UI sockets are Unix sockets only. External access is provided by hero_proxy.
The daemon TCP port is intended for loopback use; cross-node access goes through hero_router.
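Resolving the server's RPC socket path from `HERO_SOCKET_DIR` could look like the following sketch. `rpc_socket` is a hypothetical helper (the real resolution lives in the server); the fallback default is the documented `~/hero/var/sockets`.

```rust
use std::path::PathBuf;

// Build $HERO_SOCKET_DIR/hero_embedder/rpc.sock, falling back to the
// documented default under $HOME when HERO_SOCKET_DIR is unset.
fn rpc_socket(socket_dir: Option<&str>, home: &str) -> PathBuf {
    let base = match socket_dir {
        Some(d) => d.to_string(),
        None => format!("{home}/hero/var/sockets"),
    };
    PathBuf::from(base).join("hero_embedder").join("rpc.sock")
}

fn main() {
    let home = std::env::var("HOME").unwrap_or_default();
    let dir = std::env::var("HERO_SOCKET_DIR").ok();
    println!("{}", rpc_socket(dir.as_deref(), &home).display());
}
```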
## Features

- **Embedding Generation**: BGE models (small/base) with INT8/FP32 options
- **Semantic Search**: Fast cosine similarity search
- **Reranking**: Cross-encoder model for improved accuracy
- **Namespaces**: Isolated document collections for multi-tenant use
- **Persistence**: Documents stored in redb databases
- **Web UI**: Bootstrap-based admin dashboard with live updates
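Semantic search scores each stored document by the cosine similarity between its embedding and the query embedding. A minimal, illustrative sketch of that scoring (the real implementation lives in hero_embedder_lib and operates on quantized vectors):

```rust
// Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1] for real vectors.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

fn main() {
    let query = [1.0, 0.0];
    // An identical vector scores 1.0; an orthogonal one scores 0.0.
    println!("{}", cosine_similarity(&query, &[1.0, 0.0]));
    println!("{}", cosine_similarity(&query, &[0.0, 1.0]));
}
```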
## Quick Start

```sh
# Full setup: install deps, download models, build, install
make setup

# Run server + UI
make run

# CLI health check
hero_embedder health
```
## Quality Levels

Quality is set per namespace at creation time. All four models are loaded at startup.
| Level | Name | Model | Weights | Embeddings | Dimensions | Use Case |
|---|---|---|---|---|---|---|
| 1 | Fast | bge-small | INT8 | INT8 | 384 | Real-time, low latency |
| 2 | Balanced | bge-small | FP32 | FP16 | 384 | Default, good balance |
| 3 | Quality | bge-base | INT8 | INT8 | 768 | Better accuracy |
| 4 | Best | bge-base | FP32 | FP16 | 768 | Maximum quality |
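As a rough guide to what the quantization choice means for storage, each stored embedding costs dimensions × bytes per element (1 byte for INT8, 2 for FP16). A back-of-the-envelope sketch; index and redb overhead are not included:

```rust
// Raw size of one stored embedding vector, per the Quality Levels table.
fn embedding_bytes(dims: usize, bytes_per_elem: usize) -> usize {
    dims * bytes_per_elem
}

fn main() {
    println!("L1 Fast:     {} B", embedding_bytes(384, 1)); // INT8
    println!("L2 Balanced: {} B", embedding_bytes(384, 2)); // FP16
    println!("L3 Quality:  {} B", embedding_bytes(768, 1)); // INT8
    println!("L4 Best:     {} B", embedding_bytes(768, 2)); // FP16
}
```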
## API

JSON-RPC 2.0 endpoint at `POST /rpc`.

### Server Info

```json
{"jsonrpc": "2.0", "id": 1, "method": "info", "params": []}
{"jsonrpc": "2.0", "id": 1, "method": "health", "params": []}
```
### Embedding

```json
{"jsonrpc": "2.0", "id": 1, "method": "embed", "params": [["hello world", "another text"]]}
```
### Index Management

```json
{"jsonrpc": "2.0", "id": 1, "method": "index.add", "params": [[{"id": "doc1", "text": "hello"}], "namespace"]}
{"jsonrpc": "2.0", "id": 1, "method": "index.get", "params": ["doc1", "namespace"]}
{"jsonrpc": "2.0", "id": 1, "method": "index.delete", "params": ["doc1", "namespace"]}
{"jsonrpc": "2.0", "id": 1, "method": "index.count", "params": ["namespace"]}
{"jsonrpc": "2.0", "id": 1, "method": "index.clear", "params": ["namespace"]}
```
### Search

```json
{"jsonrpc": "2.0", "id": 1, "method": "search", "params": ["query text", 10, "namespace", true]}
```
### Rerank

```json
{"jsonrpc": "2.0", "id": 1, "method": "rerank", "params": ["query", [{"id": "1", "text": "..."}], 5]}
```
### Namespaces

```json
{"jsonrpc": "2.0", "id": 1, "method": "namespace.list", "params": []}
{"jsonrpc": "2.0", "id": 1, "method": "namespace.create", "params": ["my-docs", 2]}
{"jsonrpc": "2.0", "id": 1, "method": "namespace.delete", "params": ["my-docs"]}
```
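All calls share the same JSON-RPC 2.0 envelope. A minimal sketch of building it by hand; `rpc_request` is a hypothetical helper, and in practice a client should use the SDK or a JSON library for proper parameter escaping:

```rust
// Assemble a JSON-RPC 2.0 request string. `params_json` must already be
// valid JSON (e.g. an array literal); no escaping is performed here.
fn rpc_request(id: u64, method: &str, params_json: &str) -> String {
    format!(
        r#"{{"jsonrpc": "2.0", "id": {id}, "method": "{method}", "params": {params_json}}}"#
    )
}

fn main() {
    println!("{}", rpc_request(1, "namespace.create", r#"["my-docs", 2]"#));
}
```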
## CLI Client

```sh
hero_embedder health
hero_embedder stats
hero_embedder embed "hello world"
hero_embedder search "query" -k 10
hero_embedder add doc1 "document text"
hero_embedder ns-list
hero_embedder ns-create my-docs
```
## SDK Usage (Rust)

```rust
use hero_embedder_sdk::HeroEmbedderClient;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let socket = format!(
        "{}/hero/var/sockets/hero_embedder/rpc.sock",
        std::env::var("HOME")?
    );
    let client = HeroEmbedderClient::new(format!("unix://{socket}"));
    let results = client.search("hello", 10, None, None).await?;
    Ok(())
}
```
## Environment Variables

| Variable | Default | Description |
|---|---|---|
| `EMBEDDER_MODELS` | `~/hero/var/embedder/models` | Models directory |
| `EMBEDDER_DATA` | `~/hero/var/embedder/data` | Data directory |
| `HERO_EMBEDDERD_PORT` | `8092` | TCP port hero_embedderd listens on |
| `HERO_EMBEDDERD_URL` | `http://127.0.0.1:8092` | URL hero_embedder_server uses to reach the daemon |
| `HERO_SOCKET_DIR` | `~/hero/var/sockets` | Base directory for Unix sockets |
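A consumer might resolve `HERO_EMBEDDERD_PORT` against its documented default along these lines. This is a sketch: `embedderd_port` is a hypothetical helper, and the real parsing lives in the server and scripts.

```rust
// Parse the port from an optional env-var value, falling back to 8092
// when the variable is unset or not a valid u16.
fn embedderd_port(raw: Option<&str>) -> u16 {
    raw.and_then(|s| s.parse().ok()).unwrap_or(8092)
}

fn main() {
    let port = embedderd_port(std::env::var("HERO_EMBEDDERD_PORT").ok().as_deref());
    println!("hero_embedderd port: {port}");
}
```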
## Data Storage

```
~/hero/var/embedder/
├── models/
│   ├── bge-small/
│   ├── bge-base/
│   └── bge-reranker-base/
└── data/
    ├── default/
    │   └── q2/
    │       └── rag.redb
    └── corpus.redb
```
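The per-namespace layout above (`data/<namespace>/q<level>/rag.redb`) suggests path construction like the following sketch. `namespace_db` is a hypothetical helper; the actual path logic lives in hero_embedder_lib.

```rust
use std::path::PathBuf;

// Build data/<namespace>/q<level>/rag.redb under the data directory,
// as suggested by the Data Storage layout.
fn namespace_db(data_dir: &str, namespace: &str, quality: u8) -> PathBuf {
    PathBuf::from(data_dir)
        .join(namespace)
        .join(format!("q{quality}"))
        .join("rag.redb")
}

fn main() {
    let p = namespace_db("/home/user/hero/var/embedder/data", "default", 2);
    println!("{}", p.display());
}
```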
## Building

```sh
make build       # Release build
make check       # Fast code check
make test        # Unit tests
make lint        # Clippy linter
make run         # Full stack (server + UI)
make run-server  # Server only
make run-ui      # UI only
make stop        # Stop all services
```
## License
MIT