HeroEmbedder

A fast, local embedding server for RAG applications. Provides dense vector embeddings, similarity search, and reranking via a JSON-RPC 2.0 API with namespace support for isolated document collections.

Architecture

hero_embedder/
├── crates/
│   ├── hero_embedder_lib/         # Library: server internals (ML, storage, retrieval)
│   ├── hero_embedder_server/      # Binary: JSON-RPC daemon (Unix socket)
│   ├── hero_embedder_sdk/         # Library: JSON-RPC client and types
│   ├── hero_embedder/             # Binary: CLI using the SDK
│   ├── hero_embedder_ui/          # Binary: Axum web dashboard using the SDK
│   └── hero_embedder_examples/    # Examples: SDK usage demonstrations
├── scripts/                       # Build and deployment scripts
├── Cargo.toml                     # Workspace root
├── Makefile                       # Build orchestration
└── buildenv.sh                    # Environment configuration

Dependency Graph

hero_embedder_server
        ↑
hero_embedder_lib (server internals)

hero_embedder_sdk (JSON-RPC client)
      ↑                ↑                      ↑
      │                │                      │
hero_embedder   hero_embedder_ui   hero_embedder_examples

Deployment Modes

hero_embedder can be deployed in three modes, controlled by flags passed to the service_embedder Nu script. Choose the mode that fits your setup.

All-in-one (default)

All three processes run under the same user in a single hero_proc service (hero_embedder). Good for a standalone development machine or single-user node where memory is not a concern.

service_embedder start           # register + start all three
service_embedder status          # query "hero_embedder" service
service_embedder stop            # stop + unregister

--embedderd — ONNX daemon only

Starts only hero_embedderd (TCP, loads all ONNX models) under service hero_embedderd. Run as root to share the loaded models across every tenant process on the host and to minimise total RAM usage.

service_embedder start --embedderd --root    # daemon only, root's hero_proc
service_embedder status --embedderd --root
service_embedder stop  --embedderd --root

After the daemon comes up the script:

  1. Polls http://127.0.0.1:<port>/health (up to 60 s) to confirm the models finished loading.
  2. Reads the node's mycelium IPv6 address via mycelium address --root and prints the external URL other mycelium peers can use through hero_router.

Note: hero_embedderd currently binds 127.0.0.1 only. For cross-machine mycelium access, configure hero_router to proxy the TCP port and use the printed URL as --embedderd-url on the client node.

Environment variable set by the action:

| Variable            | Value                   |
|---------------------|-------------------------|
| HERO_EMBEDDERD_PORT | TCP port (default 8092) |

--userspace — server + UI only

Starts hero_embedder_server + hero_embedder_ui under service hero_embedder_userspace, delegating all embed/rerank work to an already-running hero_embedderd. No ONNX models are loaded in this process, so the memory footprint is a fraction of the full stack.

# Same machine — delegates to root's embedderd on 127.0.0.1:8092
service_embedder start --userspace

# Cross-machine — embedderd reachable via mycelium through hero_router
service_embedder start --userspace \
    --embedderd-url http://[<mycelium_ipv6>]:8092

service_embedder status --userspace
service_embedder stop   --userspace

Environment variable set by the action:

| Variable           | Value                                                             |
|--------------------|-------------------------------------------------------------------|
| HERO_EMBEDDERD_URL | URL of the running hero_embedderd (default http://127.0.0.1:8092) |

Typical split-mode deployment on one host

Root layer (heavy, shared):
  service_embedder start --embedderd --root
    └─ hero_embedderd  binds 127.0.0.1:8092, loads all ONNX models

Userspace layer (lightweight, per-tenant):
  service_embedder start --userspace
    ├─ hero_embedder_server  Unix socket, delegates embed/rerank to root
    └─ hero_embedder_ui      Unix socket, admin dashboard

This pattern lets you run many tenant instances while paying the model-load cost only once.


Sockets

| Service | Socket Path                             | Type                                 |
|---------|-----------------------------------------|--------------------------------------|
| Server  | $HERO_SOCKET_DIR/hero_embedder/rpc.sock | Unix socket (OpenRPC / JSON-RPC 2.0) |
| UI      | $HERO_SOCKET_DIR/hero_embedder/ui.sock  | Unix socket (HTTP admin dashboard)   |
| Daemon  | TCP 127.0.0.1:8092 (configurable)       | HTTP JSON-RPC + /health              |

All server/UI sockets are Unix sockets only. External access is provided by hero_proxy. The daemon TCP port is intended for loopback use; cross-node access goes through hero_router.

Features

  • Embedding Generation: BGE models (small/base) with INT8/FP32 options
  • Semantic Search: Fast cosine similarity search
  • Reranking: Cross-encoder model for improved accuracy
  • Namespaces: Isolated document collections for multi-tenant use
  • Persistence: Documents stored in redb databases
  • Web UI: Bootstrap-based admin dashboard with live updates
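
Under the hood, semantic search scores candidates by cosine similarity between the query embedding and each stored embedding. A minimal sketch of the metric (illustrative only, not the server's actual implementation):

```rust
// Cosine similarity between two dense vectors: dot product divided by the
// product of the vector norms. Returns a value in [-1.0, 1.0].
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b.iter()).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}

fn main() {
    // Identical vectors score 1.0; orthogonal vectors score 0.0.
    let a = [0.6_f32, 0.8];
    assert!((cosine_similarity(&a, &[0.6, 0.8]) - 1.0).abs() < 1e-6);
    assert!(cosine_similarity(&a, &[0.8, -0.6]).abs() < 1e-6);
}
```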

Quick Start

# Full setup: install deps, download models, build, install
make setup

# Run server + UI
make run

# CLI health check
hero_embedder health

Quality Levels

Quality is set per namespace when the namespace is created; all four models are loaded at startup.

| Level | Name     | Model     | Weights | Embeddings | Dimensions | Use Case               |
|-------|----------|-----------|---------|------------|------------|------------------------|
| 1     | Fast     | bge-small | INT8    | INT8       | 384        | Real-time, low latency |
| 2     | Balanced | bge-small | FP32    | FP16       | 384        | Default, good balance  |
| 3     | Quality  | bge-base  | INT8    | INT8       | 768        | Better accuracy        |
| 4     | Best     | bge-base  | FP32    | FP16       | 768        | Maximum quality        |
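
As a back-of-envelope sketch of what each level costs per stored vector, assuming 1 byte per INT8 value and 2 bytes per FP16 value (illustrative; not the server's exact on-disk format):

```rust
// Approximate per-vector storage: dimensions times bytes per stored value.
fn embedding_bytes(dimensions: usize, bytes_per_value: usize) -> usize {
    dimensions * bytes_per_value
}

fn main() {
    assert_eq!(embedding_bytes(384, 1), 384);  // Level 1: Fast (INT8)
    assert_eq!(embedding_bytes(384, 2), 768);  // Level 2: Balanced (FP16)
    assert_eq!(embedding_bytes(768, 1), 768);  // Level 3: Quality (INT8)
    assert_eq!(embedding_bytes(768, 2), 1536); // Level 4: Best (FP16)
}
```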

API

JSON-RPC 2.0 endpoint at POST /rpc

Server Info

{"jsonrpc": "2.0", "id": 1, "method": "info", "params": []}
{"jsonrpc": "2.0", "id": 1, "method": "health", "params": []}

Embedding

{"jsonrpc": "2.0", "id": 1, "method": "embed", "params": [["hello world", "another text"]]}

Index Management

{"jsonrpc": "2.0", "id": 1, "method": "index.add", "params": [[{"id": "doc1", "text": "hello"}], "namespace"]}
{"jsonrpc": "2.0", "id": 1, "method": "index.get", "params": ["doc1", "namespace"]}
{"jsonrpc": "2.0", "id": 1, "method": "index.delete", "params": ["doc1", "namespace"]}
{"jsonrpc": "2.0", "id": 1, "method": "index.count", "params": ["namespace"]}
{"jsonrpc": "2.0", "id": 1, "method": "index.clear", "params": ["namespace"]}
{"jsonrpc": "2.0", "id": 1, "method": "search", "params": ["query text", 10, "namespace", true]}
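
All of these are plain JSON-RPC 2.0 requests with positional params, so a body can be assembled by hand. A minimal std-only sketch (the rpc_request helper is hypothetical, not part of the SDK):

```rust
// Build a JSON-RPC 2.0 request body. `params` is a pre-serialized JSON array
// of positional arguments, matching the examples above.
fn rpc_request(id: u64, method: &str, params: &str) -> String {
    format!(r#"{{"jsonrpc": "2.0", "id": {id}, "method": "{method}", "params": {params}}}"#)
}

fn main() {
    let body = rpc_request(1, "search", r#"["query text", 10, "namespace", true]"#);
    assert_eq!(
        body,
        r#"{"jsonrpc": "2.0", "id": 1, "method": "search", "params": ["query text", 10, "namespace", true]}"#
    );
}
```

In practice the SDK and CLI do this serialization for you; the sketch just shows the wire format.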

Rerank

{"jsonrpc": "2.0", "id": 1, "method": "rerank", "params": ["query", [{"id": "1", "text": "..."}], 5]}

Namespaces

{"jsonrpc": "2.0", "id": 1, "method": "namespace.list", "params": []}
{"jsonrpc": "2.0", "id": 1, "method": "namespace.create", "params": ["my-docs", 2]}
{"jsonrpc": "2.0", "id": 1, "method": "namespace.delete", "params": ["my-docs"]}

CLI Client

hero_embedder health
hero_embedder stats
hero_embedder embed "hello world"
hero_embedder search "query" -k 10
hero_embedder add doc1 "document text"
hero_embedder ns-list
hero_embedder ns-create my-docs

SDK Usage (Rust)

use hero_embedder_sdk::HeroEmbedderClient;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Default socket path ($HERO_SOCKET_DIR defaults to ~/hero/var/sockets).
    let socket = format!("{}/hero/var/sockets/hero_embedder/rpc.sock",
        std::env::var("HOME")?);
    let client = HeroEmbedderClient::new(format!("unix://{socket}"));

    // Search: query, top_k, plus two optional parameters left at their defaults.
    let _results = client.search("hello", 10, None, None).await?;
    Ok(())
}

Environment Variables

| Variable            | Default                    | Description                                     |
|---------------------|----------------------------|-------------------------------------------------|
| EMBEDDER_MODELS     | ~/hero/var/embedder/models | Models directory                                |
| EMBEDDER_DATA       | ~/hero/var/embedder/data   | Data directory                                  |
| HERO_EMBEDDERD_PORT | 8092                       | TCP port hero_embedderd listens on              |
| HERO_EMBEDDERD_URL  | http://127.0.0.1:8092      | URL hero_embedder_server uses to reach the daemon |
| HERO_SOCKET_DIR     | ~/hero/var/sockets         | Base directory for Unix sockets                 |
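
A client-side sketch of the usual resolution order, env override first, then the documented default (the helper name embedderd_url is hypothetical, not part of the codebase):

```rust
use std::env;

// Resolve the daemon URL: honor HERO_EMBEDDERD_URL if set, otherwise fall
// back to the documented default of http://127.0.0.1:8092.
fn embedderd_url() -> String {
    env::var("HERO_EMBEDDERD_URL")
        .unwrap_or_else(|_| "http://127.0.0.1:8092".to_string())
}

fn main() {
    println!("daemon at {}", embedderd_url());
}
```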

Data Storage

~/hero/var/embedder/
├── models/
│   ├── bge-small/
│   ├── bge-base/
│   └── bge-reranker-base/
└── data/
    ├── default/
    │   └── q2/
    │       └── rag.redb
    └── corpus.redb
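
Per the layout above, each namespace stores its index in a quality-level subdirectory. A sketch of the path mapping (illustrative; the actual server code may differ):

```rust
use std::path::PathBuf;

// Map a namespace at a given quality level to its redb file, following the
// data-directory layout shown above.
fn namespace_db_path(data_dir: &str, namespace: &str, quality: u8) -> PathBuf {
    PathBuf::from(data_dir)
        .join(namespace)
        .join(format!("q{quality}"))
        .join("rag.redb")
}

fn main() {
    let p = namespace_db_path("/home/user/hero/var/embedder/data", "default", 2);
    assert_eq!(
        p,
        PathBuf::from("/home/user/hero/var/embedder/data/default/q2/rag.redb")
    );
}
```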

Building

make build          # Release build
make check          # Fast code check
make test           # Unit tests
make lint           # Clippy linter
make run            # Full stack (server + UI)
make run-server     # Server only
make run-ui         # UI only
make stop           # Stop all services

License

MIT