AI Broker
A lightweight LLM request broker with an OpenAI-compatible REST API that intelligently routes requests to multiple LLM providers with cost-aware strategies. All communication is via Unix Domain Sockets — no TCP ports.
Features
- OpenAI-Compatible API — Drop-in replacement for OpenAI clients (via Unix socket)
- Multi-Provider Support — OpenAI, OpenRouter, Groq, SambaNova
- Smart Routing — Automatic model selection based on cost or quality
- Cost Tracking — Per-request cost calculation and tracking
- Request Tracking — Detailed per-IP request tracking with timestamps and durations
- Streaming Support — Real-time streaming responses via SSE
- MCP Broker — Aggregate tools from multiple MCP (Model Context Protocol) servers
- Rate Limiting — Per-IP rate limiting with configurable limits
- Audio APIs — Text-to-speech and speech-to-text support (Groq, SambaNova, OpenAI)
- Config-Based Audio Models — STT/TTS models defined in modelsconfig.yml with automatic fallback
- Embeddings — Vector embedding generation
- Many Chat Models — Latest Claude 4.x, Gemini 3, GPT-5.2, o3-mini, Grok 4.1, Kimi K2.5 and more
- Persistent Billing — SQLite-based request logging for billing and analytics
- API Key Support — Optional API key authentication system
- Unix Socket Architecture — All services communicate over Unix Domain Sockets; no open TCP ports
Project Structure
hero_aibroker/
├── crates/
│ ├── hero_aibroker/ # CLI + service manager (--start / --stop)
│ ├── hero_aibroker_lib/ # Core business logic (shared library)
│ ├── hero_aibroker_sdk/ # Generated OpenRPC client + types
│ ├── hero_aibroker_server/ # Server binary (two Unix sockets)
│ ├── hero_aibroker_ui/ # Admin dashboard binary (Unix socket)
│ ├── hero_aibroker_examples/ # SDK examples and integration tests
│ ├── hero_broker_server/ # Multi-MCP broker binary
│ └── mcp/
│ ├── mcp_common/ # Shared MCP utilities
│ ├── mcp_exa/ # Exa semantic search
│ ├── mcp_hero/ # Hero MCP integration
│ ├── mcp_ping/ # Ping test server
│ ├── mcp_scraperapi/ # ScraperAPI web scraping
│ ├── mcp_scrapfly/ # Scrapfly web scraping
│ ├── mcp_serpapi/ # SerpAPI web search
│ └── mcp_serper/ # Serper web search
├── modelsconfig.yml # Model definitions and pricing
└── mcp_servers.example.json # MCP server configuration template
Dependency Graph
hero_aibroker_lib (core logic)
↑
hero_aibroker_sdk (types, protocol, RPC client)
↑ ↑ ↑
| | |
server CLI UI
Architecture
All services bind Unix Domain Sockets under ~/hero/var/sockets/. There are no TCP listeners.
┌──────────────────────────────────────────────────────────────────┐
│ CLI (hero_aibroker chat/models/tools/health) │
│ connects via: ~/hero/var/sockets/hero_aibroker_server.sock │
└──────────────────────────────────┬───────────────────────────────┘
│ JSON-RPC
┌──────────────────────────────────▼───────────────────────────────┐
│ hero_aibroker_server │
│ ├── JSON-RPC admin API → ~/hero/var/sockets/ │
│ │ hero_aibroker_server.sock │
│ └── REST (OpenAI-compat) → ~/hero/var/sockets/ │
│ hero_aibroker_server_rest.sock │
│ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Service Layer │ │
│ │ (Routing logic, model selection, cost calculation) │ │
│ └──────────────────────────────────────────────────────────┘ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Provider Layer │ │
│ │ (OpenAI, Groq, SambaNova, OpenRouter adapters) │ │
│ └──────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────────┐
│ hero_aibroker_ui (admin dashboard) │
│ binds: ~/hero/var/sockets/hero_aibroker/ui.sock │
│ proxies JSON-RPC requests to hero_aibroker_server │
└──────────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────────┐
│ hero_broker_server (MCP broker) │
│ binds multiple sockets for aggregated MCP services │
└──────────────────────────────────────────────────────────────────┘
All services are registered with and managed by hero_proc. The hero_aibroker --start command handles all registration and startup.
Quick Start
Prerequisites
- Rust 1.70 or later
- hero_proc installed and running
- At least one LLM provider API key
Environment Variables
Source your env file before running:
source ~/.config/env.sh # or wherever you keep your secrets
LLM provider keys (at least one required):
| Variable | Description |
|---|---|
| GROQ_API_KEY / GROQ_API_KEYS | Groq API key(s) |
| OPENROUTER_API_KEY / OPENROUTER_API_KEYS | OpenRouter API key(s) |
| SAMBANOVA_API_KEY / SAMBANOVA_API_KEYS | SambaNova API key(s) |
| OPENAI_API_KEY / OPENAI_API_KEYS | OpenAI API key(s) |
Both singular and plural variants are accepted. Use comma-separated values for multiple keys per provider — the broker creates separate provider instances and distributes requests across them for higher throughput, load distribution, and automatic failover.
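The comma-separated format can be sanity-checked locally before starting the broker. A small bash sketch (the key values are placeholders, not real keys):

```shell
# Placeholder keys — the broker splits on commas into separate provider instances
export GROQ_API_KEYS="gsk_key_one,gsk_key_two"
IFS=',' read -ra KEYS <<< "$GROQ_API_KEYS"
echo "${#KEYS[@]} Groq keys configured"
```

This mirrors the splitting behavior described above; each element becomes its own provider instance for round-robin distribution and failover.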
Web/search tool keys (optional, used by MCP servers):
| Variable | Description |
|---|---|
| SERPAPI_API_KEYS | SerpAPI web search |
| SERPER_API_KEYS | Serper web search |
| EXA_API_KEYS | Exa semantic search |
| SCRAPERAPI_API_KEYS | ScraperAPI web scraping |
| SCRAPFLY_API_KEYS | Scrapfly web scraping |
Service configuration:
| Variable | Default | Description |
|---|---|---|
| ROUTING_STRATEGY | cheapest | cheapest or best |
| MCP_CONFIG_PATH | — | Path to MCP server config JSON |
| MODELS_CONFIG_PATH | — | Path to model config YAML |
| ADMIN_TOKEN | — | Simple admin auth token |
| HERO_SECRET | — | Hero Auth JWT secret |
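Putting these together, a minimal environment for a cheapest-routing setup might look like the following (the key value and config path are placeholders; substitute your own):

```shell
# Hypothetical values — replace with your real key and paths
export OPENROUTER_API_KEY="sk-or-placeholder"
export ROUTING_STRATEGY="cheapest"
export MODELS_CONFIG_PATH="$PWD/modelsconfig.yml"
echo "routing strategy: $ROUTING_STRATEGY"
```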
Run
source ~/.config/env.sh
make run # build, install, and start all services via hero_proc
Stop
make stop # stop all hero_proc-managed services
Development Mode
make rundev # run server directly in debug mode (no hero_proc, logs to stdout)
make cli # interactive CLI session (debug build)
API Reference
All REST endpoints are served on the Unix socket at ~/hero/var/sockets/hero_aibroker_server_rest.sock. Use curl --unix-socket to reach them.
List Models
curl --unix-socket ~/hero/var/sockets/hero_aibroker_server_rest.sock \
http://localhost/v1/models
Chat Completions
curl --unix-socket ~/hero/var/sockets/hero_aibroker_server_rest.sock \
http://localhost/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [{"role": "user", "content": "Hello!"}],
"stream": true
}'
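With "stream": true, the REST socket returns Server-Sent Events, where each chunk arrives on a data: line. A sketch of extracting the JSON payloads from a captured stream (the sample lines below are illustrative, not real server output):

```shell
# Sample lines in the OpenAI-compatible SSE streaming format;
# strip the "data: " prefix and drop the [DONE] terminator
printf 'data: {"choices":[{"delta":{"content":"Hel"}}]}\ndata: {"choices":[{"delta":{"content":"lo"}}]}\ndata: [DONE]\n' \
  | sed -n 's/^data: //p' \
  | grep -v '^\[DONE\]'
```

Each surviving line is one JSON chunk whose choices[0].delta.content holds the next text fragment.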
Text-to-Speech
curl --unix-socket ~/hero/var/sockets/hero_aibroker_server_rest.sock \
http://localhost/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{"model": "tts-1", "input": "Hello, world!", "voice": "alloy"}' \
--output speech.mp3
Available TTS models: tts-1, tts-1-hd (requires OPENAI_API_KEY)
Speech-to-Text
curl --unix-socket ~/hero/var/sockets/hero_aibroker_server_rest.sock \
http://localhost/v1/audio/transcriptions \
-F "file=@audio.mp3" \
-F "model=whisper-1"
Available STT models:
- whisper-1 — multi-provider (Groq → SambaNova → OpenAI fallback chain)
- whisper-large-v3 — direct Groq/SambaNova access
Embeddings
curl --unix-socket ~/hero/var/sockets/hero_aibroker_server_rest.sock \
http://localhost/v1/embeddings \
-H "Content-Type: application/json" \
-d '{"model": "text-embedding-3-small", "input": "Hello, world!"}'
JSON-RPC Admin API
The admin API is served on ~/hero/var/sockets/hero_aibroker_server.sock:
# Health check
curl --unix-socket ~/hero/var/sockets/hero_aibroker_server.sock \
http://localhost/rpc \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","method":"health","params":{},"id":1}'
# List models
curl --unix-socket ~/hero/var/sockets/hero_aibroker_server.sock \
http://localhost/rpc \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","method":"models.list","params":{},"id":2}'
# List MCP tools
curl --unix-socket ~/hero/var/sockets/hero_aibroker_server.sock \
http://localhost/rpc \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","method":"mcp.list_tools","params":{},"id":3}'
Billing & Usage
# View all IP usage and costs
curl --unix-socket ~/hero/var/sockets/hero_aibroker_server_rest.sock \
http://localhost/billing/usage
# View specific IP usage
curl --unix-socket ~/hero/var/sockets/hero_aibroker_server_rest.sock \
http://localhost/billing/usage/127.0.0.1
All requests are persisted to SQLite with IP address, model, token usage, cost in USD, timestamps, and success/error status.
# Export to CSV
sqlite3 -header -csv requests.db "SELECT * FROM request_logs;" > billing.csv
CLI Usage
The hero_aibroker binary is both the service manager and the interactive CLI. It connects via ~/hero/var/sockets/hero_aibroker_server.sock.
# Interactive chat
hero_aibroker chat --model gpt-4o
# Chat with the default auto-routing model
hero_aibroker chat
# List available models
hero_aibroker models
# List MCP tools
hero_aibroker tools
# Check server health
hero_aibroker health
CLI Options
Global options:
- -m, --model <MODEL> — model to use for chat (default: auto)
- --socket <PATH> — custom socket path (default: ~/hero/var/sockets/hero_aibroker_server.sock)
Chat sub-command options:
- -m, --model <MODEL> — model to use (overrides global --model)
Service Management
hero_aibroker --start # register all services with hero_proc and start them
hero_aibroker --stop # stop all services via hero_proc
Model Configuration
Models are defined in modelsconfig.yml. The file controls display names, tiers, capabilities, context windows, and per-provider backends with pricing:
models:
gpt-4o:
display_name: "GPT-4o"
tier: premium
capabilities:
- tool_calling
- vision
context_window: 128000
backends:
- provider: openrouter
model_id: openai/gpt-4o
priority: 1
input_cost: 2.5 # USD per million tokens
output_cost: 10.0
Set MODELS_CONFIG_PATH to point to your config file, or place modelsconfig.yml in the working directory.
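Given the per-million-token rates in the backend entry above, a request's cost can be reproduced by hand. An awk sketch using made-up token counts (the rates mirror the gpt-4o example; the token counts are hypothetical):

```shell
# cost = input_tokens/1e6 * input_cost + output_tokens/1e6 * output_cost
awk 'BEGIN {
  input_tokens = 1200; output_tokens = 300     # hypothetical usage
  input_cost = 2.5; output_cost = 10.0          # USD per million tokens, from modelsconfig.yml
  printf "cost: $%.6f\n", input_tokens/1e6*input_cost + output_tokens/1e6*output_cost
}'
# → cost: $0.006000
```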
Auto Model Selection
Use special model names for automatic selection:
| Model Name | Description |
|---|---|
| auto | Use the configured ROUTING_STRATEGY |
| autocheapest | Select the cheapest available model |
| autobest | Select the best premium model |
MCP Integration
The broker aggregates tools from multiple MCP (Model Context Protocol) servers managed by hero_broker_server. Configure servers in a JSON file pointed to by MCP_CONFIG_PATH (see mcp_servers.example.json):
{
"mcpServers": [
{
"name": "serper",
"command": "/path/to/mcp_serper",
"args": [],
"env": {}
},
{
"name": "exa",
"command": "/path/to/mcp_exa",
"args": [],
"env": {}
}
]
}
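Before pointing MCP_CONFIG_PATH at a config file, a quick syntax check can save a failed startup. A sketch using python3's built-in JSON parser on an inline fragment (substitute your real file path in practice):

```shell
# Validate an inline config fragment; for a file, use: python3 -m json.tool mcp_servers.json
echo '{"mcpServers":[{"name":"serper","command":"/path/to/mcp_serper","args":[],"env":{}}]}' \
  | python3 -m json.tool > /dev/null && echo "valid JSON"
```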
Included MCP Servers
All MCP binaries are built as part of the workspace and managed by hero_broker_server:
| Binary | Description | Required Key |
|---|---|---|
| mcp_serper | Web search via Serper | SERPER_API_KEYS |
| mcp_serpapi | Web search via SerpAPI | SERPAPI_API_KEYS |
| mcp_exa | Semantic search via Exa | EXA_API_KEYS |
| mcp_scraperapi | Web scraping via ScraperAPI | SCRAPERAPI_API_KEYS |
| mcp_scrapfly | Web scraping via Scrapfly | SCRAPFLY_API_KEYS |
| mcp_ping | Ping/test server | — |
| mcp_hero | Hero OS service discovery + LLM-driven Python code generation and execution | HERO_SECRET |
MCP REST Endpoints
| Endpoint | Description |
|---|---|
| GET /mcp/tools | List all aggregated tools |
| POST /mcp/tools/:name | Call a specific tool |
| GET /mcp/sse | SSE endpoint for MCP clients |
Development
Building
# Release build (all workspace crates)
cargo build --release
# Debug build
cargo build
# Build a specific crate
cargo build -p hero_aibroker_server
cargo build -p hero_aibroker
# Fast check (no codegen)
make check
Running Tests
# Run all tests
cargo test --all
# Run tests for a specific crate
cargo test -p hero_aibroker_lib
# Full build + test cycle
make all
Code Quality
make fmt # format code
make fmt-check # check formatting without modifying
make lint # run clippy (warnings as errors)
make lint-fix # run clippy and auto-fix
Logs
make logs # tail hero_aibroker_server logs via hero_proc
make logs-ui # tail hero_aibroker_ui logs via hero_proc
Status
make status # show service status and installed binaries
Deployment
Install Binaries
make install # build release and install to ~/hero/bin
Ship to Registry
make ship-binary # tag + push to trigger CI build
make ship-binary TAG=1.2.3 # override version tag
Docker
make docker-build # build Docker image
make docker-run # run Docker container (source env vars first)
License
MIT License