rust port of kimi agent, adjusted to hero env

Latest commit b9210c99b4 by omarz (2026-06-04 20:20:02 +02:00): build: make 'make start' bind hero_kimi_web to router by default. WEB_PORT now defaults to empty => UDS-only (router) launch; set WEB_PORT to also expose a direct TCP port. All checks were successful (lab publish / publish (push), 30m39s).
.forgejo/workflows	ci: publish musl-x86_64 binaries to rolling releases via lab	2026-06-02 14:46:46 +02:00
.github/workflows	ci: publish musl-x86_64 binaries to rolling releases via lab	2026-06-02 14:46:46 +02:00
crates	Merge development into main: deferred MCP tools + reliability suite	2026-06-04 20:07:00 +02:00
docs	feat: Dioxus single-file chat web component (fake data) + AI pipeline	2026-06-01 11:16:31 +02:00
schema	chore: fix service.toml field names and add wire protocol oschema	2026-05-31 11:21:49 +02:00
scripts	build: add cargo build/install make targets; rename Dioxus build -> bundle	2026-06-04 13:45:12 +02:00
.gitignore	feat: Dioxus single-file chat web component (fake data) + AI pipeline	2026-06-01 11:16:31 +02:00
.gitmodules	build: remove kimi-cli submodule, add Makefile and parity docs	2026-05-18 06:20:40 +02:00
.pre-commit-config.yaml	feat(rust): add rust version of kimi agent kernel - `kagent` (#717 )	2026-02-04 22:32:31 +08:00
AGENTS.md	fix(hero_kimi_web): serve the chat UI on the standard <service>/web.sock socket	2026-06-04 00:15:59 -04:00
Cargo.lock	fix(hero_kimi_agent): embed builtin agent specs, skills, and plugins in the binary	2026-06-04 09:35:16 -04:00
Cargo.toml	feat: Dioxus single-file chat web component (fake data) + AI pipeline	2026-06-01 11:16:31 +02:00
Cargo.toml.hero_builder_backup	feat: restructure workspace into crates/ layout and add hero_kimi_web service	2026-05-18 11:51:32 +02:00
instructions_porting.md	docs: normalize indentation in markdown docs + fix service.toml kind and env var name	2026-05-26 12:43:00 +02:00
LICENSE	feat: initialize kimi-agent repo	2026-02-06 22:26:36 +08:00
Makefile	build: make 'make start' bind hero_kimi_web to router by default	2026-06-04 20:20:02 +02:00
MCP_IMPROVEMENTS.md	Add MCP_IMPROVEMENTS.md doc	2026-06-04 14:28:34 +02:00
NOTICE	feat: restructure workspace into crates/ layout and add hero_kimi_web service	2026-05-18 11:51:32 +02:00
README.md	fix(mcp): make read-only result cache opt-in with a TTL (was stale-prone)	2026-06-04 13:20:30 +02:00
run_kimi_web_debug.sh	feat(hero_kimi_agent): add SessionStart/SessionEnd/StopFailure/Notification hooks	2026-05-25 12:30:48 +02:00
rust-toolchain.toml	chore: migrate service_base imports to hero_lifecycle and add dev profile	2026-06-01 07:43:34 +02:00

README.md

Kimi Agent (Rust)

Rust implementation of Kimi Code CLI. Wire-only JSON-RPC agent server over stdio.

What is hero_kimi_agent?

hero_kimi_agent is an AI agent runtime that powers code generation and analysis workflows. It:

Runs as a JSON-RPC 2.0 server over stdin/stdout (wire protocol)
Manages multi-turn agent conversations with LLM orchestration
Provides tool execution (shell, file I/O, code analysis, testing)
Supports Model Context Protocol (MCP) for extensible tool integrations
Abstracts LLM providers (OpenAI, Anthropic, Kimi, etc.) via the kosong crate
Handles agent state, configuration, and session management
Implements approval workflows and result sharing
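
As a rough illustration of the wire protocol, the sketch below builds one JSON-RPC 2.0 request as a single line of JSON, the form a client could write to the agent's stdin. The method name `example/ping` and the one-message-per-line framing are assumptions made for the example; the actual methods and framing are defined by the wire protocol schema in this repo, not here.

```rust
/// Escape a string so it can be embedded inside a JSON string literal.
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build a JSON-RPC 2.0 request as one line of JSON.
/// `params_json` must already be valid JSON (an object or an array).
/// The method names passed in are up to the caller; none are defined here.
fn jsonrpc_request(id: u64, method: &str, params_json: &str) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":{},"method":"{}","params":{}}}"#,
        id,
        escape_json(method),
        params_json
    )
}
```

Calling `jsonrpc_request(1, "example/ping", "{}")` yields a request object with the `jsonrpc`, `id`, `method`, and `params` members that JSON-RPC 2.0 requires; a response is matched back to its request by the same `id`.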

Quick Start

Build

cargo build -p hero_kimi_agent

Run immediately:

./target/debug/hero_kimi_agent

Install

Build and install the binary to $PATH_ROOT/bin/:

# Dev build
PATH_ROOT=~/.local make install

# Release build (optimized)
PATH_ROOT=~/.local make install-release

After installation, the hero_kimi_agent binary is in ~/.local/bin/. It is available on PATH only if ~/.local/bin is in your shell's $PATH.

Make targets

make build — build dev binary
make build-release — build optimized release binary
make install — build dev + install to $PATH_ROOT/bin
make install-release — build release + install to $PATH_ROOT/bin
make clean — remove build artifacts
make help — show all targets

Note: the make install* targets require the PATH_ROOT environment variable to be set.

CLI Reference

hero_kimi_agent [OPTIONS] [COMMAND]

Session & working directory

Flag	Short	Description
`--work-dir <PATH>`	`-w`	Working directory for the agent. Default: current directory.
`--session <SESSION_ID>`	`-S`	Resume a specific session by ID. Default: create new session.
`--continue`	`-C`	Continue the most recent session for the working directory.

Model & provider

Flag	Short	Description
`--model <NAME>`	`-m`	LLM model to use (must match a key in config `[models]`). Default: `default_model` from config.
`--thinking`		Enable extended thinking/reasoning mode.
`--no-thinking`		Disable thinking mode (overrides config default).

Configuration

Flag	Description
`--config-file <PATH>`	Load config from a TOML or JSON file instead of `~/.kimi/config.toml`.
`--config <TOML_OR_JSON>`	Inline config string (TOML or JSON). Useful for one-off overrides without a file.

--config and --config-file are mutually exclusive.

Agent specification

Flag	Description
`--agent <builtin>`	Use a named builtin agent (`default`, `okabe`).
`--agent-file <PATH>`	Load a custom agent YAML file.

--agent and --agent-file are mutually exclusive.

Skills

Flag	Description
`--skills-dir <PATH>`	Override the skills directory. Default: auto-discovered from `~/.kimi/skills/`.

MCP (Model Context Protocol)

Flag	Description
`--mcp-config-file <PATH>`	Load an MCP config file. Repeat to add multiple.
`--mcp-config <JSON>`	Inline MCP config JSON. Repeat to add multiple.

Deferred MCP tools — generic dispatch (context savings)

MCP servers often expose dozens or hundreds of tools, each with a bulky JSON schema. Sending all of them to the model on every turn wastes a large share of the context window — and because tool definitions are re-serialized on every turn, an "active" tool keeps costing tokens for the rest of the session.

Instead, the entire MCP surface is fronted by two fixed tools, so the per-turn tool list is O(1) no matter how many servers or tools are connected:

mcp_search(query) — finds tools by keyword and returns the top matches' name, description, and argument schema as a tool result (not as tool definitions). A tool's schema is therefore paid for once — on the turn it's searched — instead of every turn.
mcp_call(name, arguments) — invokes a tool by its mcp__<server>__<tool> name. Arguments are validated against the target tool's real schema inside the dispatcher, so the model still gets precise, schema-aware errors.
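
The two-tool front can be pictured with a minimal sketch like the one below. It is not the agent's implementation: `Registry`, `Tool`, and their methods are hypothetical names, the search is a naive word-overlap count used only as a stand-in, and argument validation against the schema is left out.

```rust
use std::collections::BTreeMap;

/// A registered MCP tool; `schema` stands in for its JSON argument schema.
struct Tool {
    description: String,
    schema: String,
}

/// Hypothetical registry keyed by the full `mcp__<server>__<tool>` name.
struct Registry {
    tools: BTreeMap<String, Tool>,
}

impl Registry {
    fn new() -> Self {
        Registry { tools: BTreeMap::new() }
    }

    fn register(&mut self, server: &str, tool: &str, description: &str, schema: &str) {
        let name = format!("mcp__{}__{}", server, tool);
        let entry = Tool { description: description.to_string(), schema: schema.to_string() };
        self.tools.insert(name, entry);
    }

    /// The tool list sent on every turn: the same two names, however many tools exist.
    fn per_turn_tools(&self) -> Vec<&'static str> {
        vec!["mcp_search", "mcp_call"]
    }

    /// Stand-in search: score = number of query words found in the name or description.
    /// Returns at most `cap` (name, schema) pairs, best match first.
    fn search(&self, query: &str, cap: usize) -> Vec<(String, String)> {
        let words: Vec<String> = query.split_whitespace().map(|w| w.to_lowercase()).collect();
        let mut scored: Vec<(usize, &String, &Tool)> = self
            .tools
            .iter()
            .map(|(name, tool)| {
                let hay = format!("{} {}", name, tool.description).to_lowercase();
                let score = words.iter().filter(|w| hay.contains(w.as_str())).count();
                (score, name, tool)
            })
            .filter(|(score, _, _)| *score > 0)
            .collect();
        scored.sort_by(|a, b| b.0.cmp(&a.0).then(a.1.cmp(b.1)));
        scored.into_iter().take(cap).map(|(_, n, t)| (n.clone(), t.schema.clone())).collect()
    }

    /// Look up a tool by its full name; the error text is what the model would see.
    fn call(&self, name: &str) -> Result<&Tool, String> {
        self.tools.get(name).ok_or_else(|| format!("unknown tool: {}", name))
    }
}
```

The point of the shape is that `per_turn_tools` never grows: schemas only travel to the model inside a `search` result, on the turn they are asked for.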

A compact, per-server index is appended to the system prompt so the model knows what's connected. The index is bounded: one line per server up to a cap, then a single summary line — so connecting 1000+ servers stays O(1) in prompt cost. mcp_search returns at most a handful of matches per call (default 5) so a broad query can't dump 100 schemas at once.
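
The bounded index described above could be built roughly as follows. This is a sketch under assumptions: the function name `build_index`, the line format, and the wording of the summary line are invented for illustration, and the real cap value is not stated in this README.

```rust
/// Build a per-server index with at most `cap` server lines plus one summary line.
/// `servers` holds (server name, number of tools) pairs.
fn build_index(servers: &[(&str, usize)], cap: usize) -> Vec<String> {
    // One line per server, up to the cap.
    let mut lines: Vec<String> = servers
        .iter()
        .take(cap)
        .map(|(name, tool_count)| format!("{}: {} tools", name, tool_count))
        .collect();

    // Everything past the cap collapses into a single summary line.
    if servers.len() > cap {
        let hidden = &servers[cap..];
        let hidden_tools: usize = hidden.iter().map(|(_, n)| n).sum();
        lines.push(format!(
            "... and {} more servers ({} tools); use mcp_search to find them",
            hidden.len(),
            hidden_tools
        ));
    }
    lines
}
```

With a cap of 10, the output has at most 11 lines whether 11 or 1000 servers are connected.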

Measured on three real servers (53 tools): the per-turn tool payload is flat — registering 1 MCP tool or all 53 yields the same ~1,170-byte tool list (2 tools). Versus sending every schema each turn (~34,500 bytes), that's ~19× smaller / ~95% less, and it does not grow as more tools or servers connect. See the proof tests below.

Output side. MCP tools also return large blobs (a full list, a log dump, an OpenRPC spec). An mcp_call result is head+tail truncated to ~12k characters with a notice telling the model how to fetch just the part it needs (filter, pagination, a smaller limit, a specific id). This bounds the output the same way dispatch bounds the input — e.g. a real 170 KB rpc_discover result is trimmed to ~12k chars while still showing the start and end.
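
A head+tail truncation of this kind can be sketched as below. The function name, the even head/tail split, and the notice wording are assumptions; only the overall behavior (keep the start and the end, replace the middle with a hint) comes from the description above.

```rust
/// Keep the first and last parts of `text` and replace the middle with a notice.
/// Lengths are counted in characters, so multi-byte UTF-8 text is never cut mid-character.
fn truncate_head_tail(text: &str, max_chars: usize) -> String {
    let total = text.chars().count();
    if total <= max_chars {
        return text.to_string();
    }
    let head_len = max_chars / 2;
    let tail_len = max_chars - head_len;
    let head: String = text.chars().take(head_len).collect();
    let tail: String = text.chars().skip(total - tail_len).collect();
    format!(
        "{}\n[... {} characters omitted; narrow the request with a filter, pagination, a smaller limit, or a specific id ...]\n{}",
        head,
        total - max_chars,
        tail
    )
}
```

Note that in this sketch the notice is added on top of the `max_chars` budget, so the returned string is slightly longer than the cap.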

Further trimming. Per-tool descriptions are capped in the index, mcp_search prints (no arguments) instead of an empty schema, and a schema already returned earlier in the session is not re-emitted on a repeat search (the tool is shown by name only). The caps are tunable per deployment via env vars (defaults in parentheses):

Env var	Default	Effect
`KIMI_MCP_RESULT_MAX_CHARS`	12000	Max characters of an `mcp_call` result before head+tail truncation
`KIMI_MCP_SEARCH_RESULT_CAP`	5	Max tools a single `mcp_search` returns
`KIMI_MCP_DESC_MAX_CHARS`	300	Max characters of a per-tool description kept in the index
`KIMI_MCP_CACHE_TTL_SECS`	0 (off)	TTL for the read-only result cache; `0`/unset disables caching
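
For reference, reading these variables with their documented defaults might look like the sketch below. The struct and function names are hypothetical, and falling back to the default on an unparsable value is an assumption; the README does not say how the agent treats invalid input.

```rust
use std::env;

/// Parse a numeric cap from an optional raw value.
/// Falls back to `default` when the value is unset or not a valid number (assumed behavior).
fn parse_cap(raw: Option<&str>, default: usize) -> usize {
    raw.and_then(|v| v.trim().parse::<usize>().ok()).unwrap_or(default)
}

/// Read one cap from the process environment.
fn env_cap(name: &str, default: usize) -> usize {
    parse_cap(env::var(name).ok().as_deref(), default)
}

/// Hypothetical container for the four tunables in the table above.
struct McpLimits {
    result_max_chars: usize,
    search_result_cap: usize,
    desc_max_chars: usize,
    cache_ttl_secs: usize, // 0 means the read-only cache is off
}

fn load_limits() -> McpLimits {
    McpLimits {
        result_max_chars: env_cap("KIMI_MCP_RESULT_MAX_CHARS", 12000),
        search_result_cap: env_cap("KIMI_MCP_SEARCH_RESULT_CAP", 5),
        desc_max_chars: env_cap("KIMI_MCP_DESC_MAX_CHARS", 300),
        cache_ttl_secs: env_cap("KIMI_MCP_CACHE_TTL_SECS", 0),
    }
}
```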

Smarter matching and dispatch.

mcp_search ranks with BM25 over the tokenized tool name + description, with light plural stemming, a small synonym map, and name-token boosting. As a result, "make a board", "rm file", or "enumerate services" find the create/delete/list tools even without an exact word overlap. It stays fully offline and deterministic (no embeddings).
mcp_call auto-resolves a dropped mcp__<server>__ prefix (the common case where the model passes a bare tool name) and, on a typo, returns "did you mean …" suggestions instead of a silent failure.
Results of tools the server marks readOnlyHint: true can be cached by (name, args) with a TTL, so an identical repeat call within the window returns instantly without re-dispatching or re-injecting the payload. Off by default — readOnlyHint means a tool doesn't modify state, not that its answer is stable (a status/list result changes over time), so caching is opt-in via KIMI_MCP_CACHE_TTL_SECS and the TTL bounds how stale a cached answer can be.
A large JSON-array result is summarized item-wise (whole leading items + an "N of M shown" note) instead of a blind byte cut.
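
The opt-in read-only cache described in this list can be sketched as a small TTL map. This is an illustration only: `ReadOnlyCache` is a hypothetical type, the arguments are keyed as a raw string (a real implementation would likely need a canonical form of the JSON), and the current time is passed in explicitly so the expiry logic is easy to test.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical TTL cache for results of tools marked `readOnlyHint: true`.
struct ReadOnlyCache {
    ttl: Duration,
    entries: HashMap<(String, String), (Instant, String)>,
}

impl ReadOnlyCache {
    /// `ttl_secs == 0` disables caching, mirroring the KIMI_MCP_CACHE_TTL_SECS default.
    fn new(ttl_secs: u64) -> Self {
        ReadOnlyCache { ttl: Duration::from_secs(ttl_secs), entries: HashMap::new() }
    }

    /// Return a cached result only if it is younger than the TTL.
    fn get(&self, name: &str, args: &str, now: Instant) -> Option<&str> {
        if self.ttl.is_zero() {
            return None;
        }
        let key = (name.to_string(), args.to_string());
        match self.entries.get(&key) {
            Some((stored_at, result)) if now.duration_since(*stored_at) < self.ttl => {
                Some(result.as_str())
            }
            _ => None,
        }
    }

    /// Store a result, but only for read-only tools and only when caching is enabled.
    fn put(&mut self, name: &str, args: &str, read_only_hint: bool, result: &str, now: Instant) {
        if self.ttl.is_zero() || !read_only_hint {
            return;
        }
        self.entries
            .insert((name.to_string(), args.to_string()), (now, result.to_string()));
    }
}
```

The TTL is the only staleness bound here: a status or list result that changes inside the window is still served from the cache until the entry expires, which is why the feature is off unless a TTL is set.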

Approval / safety

Flag	Short	Aliases	Description
`--yolo`	`-y`	`--yes`, `--auto-approve`	Approve all tool executions automatically. The user is still reachable for interactive questions.
`--afk`			Away-from-keyboard: auto-approve tool calls and auto-dismiss interactive questions. Use when no user is at the terminal.

Print mode (non-interactive)

Run a single turn, write the result to stdout, and exit; intended for scripting and automation. Print mode implies --afk, so it is fully non-interactive and --yolo is not required. This ports kimi-cli's --print flag.

Flag	Short	Aliases	Description
`--prompt <TEXT>`	`-p`	`-c`, `--command`	Prompt for a single non-interactive turn. Implies `--print`.
`--print`			Run one turn non-interactively (implies `--afk`) and exit. Reads the prompt from `-p`, or from stdin if omitted.
`--quiet`			Shortcut for `--print --output-format text --final-message-only`.
`--output-format <FORMAT>`			`text` (default) or `stream-json` (JSONL, one chat message per line). Print mode only.
`--final-message-only`			Output only the final assistant message. Print mode only.

# Single-turn, capture the assistant's answer
hero_kimi_agent -y -p "Summarize the architecture of this repo"

# Pipe the prompt from stdin
echo "List the .rs files" | hero_kimi_agent --print

# Only the final message, nothing else
hero_kimi_agent --quiet -p "How many tests are in this crate?"

# Machine-readable JSONL stream (assistant + tool messages)
hero_kimi_agent --output-format stream-json -p "Run the tests and report"
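
When driving print mode from another Rust program, the invocation above could be wrapped roughly as follows. The helper names are hypothetical; only the flags come from the table. The README does not specify the fields of each JSONL record, so the sketch returns the raw lines without parsing them.

```rust
use std::process::{Command, Stdio};

/// Build (but do not run) a print-mode invocation that streams JSONL.
/// Assumes `hero_kimi_agent` is on PATH.
fn print_mode_command(prompt: &str, work_dir: &str) -> Command {
    let mut cmd = Command::new("hero_kimi_agent");
    cmd.arg("--work-dir")
        .arg(work_dir)
        .arg("--output-format")
        .arg("stream-json")
        .arg("--prompt")
        .arg(prompt)
        .stdout(Stdio::piped());
    cmd
}

/// Run the command to completion and return stdout as non-empty lines,
/// one JSONL record per line.
fn collect_jsonl(mut cmd: Command) -> std::io::Result<Vec<String>> {
    let output = cmd.output()?;
    Ok(String::from_utf8_lossy(&output.stdout)
        .lines()
        .filter(|line| !line.trim().is_empty())
        .map(String::from)
        .collect())
}
```

Since `--prompt` implies `--print`, and print mode implies `--afk`, no approval flag is passed here.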

Loop control (advanced)

Flag	Description
`--max-steps-per-turn <N>`	Maximum LLM steps per user turn. Default: from config (100).
`--max-retries-per-step <N>`	Maximum retries on a failed step. Default: from config (3).
`--max-ralph-iterations <N>`	Extra iterations in Ralph (autonomous) mode. `-1` = unlimited. Default: 0.
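
How the first two limits might interact is sketched below. This is an assumed model, not the agent's loop: `run_turn`, `TurnEnd`, and the convention that a step gets one initial attempt plus up to `max_retries` retries are all made up for illustration.

```rust
/// How a simulated turn ended.
#[derive(Debug, PartialEq)]
enum TurnEnd {
    Finished { steps: u32 },
    StepLimit,
    RetriesExhausted { step: u32 },
}

/// Run one turn. `run_step(step, attempt)` returns Ok(true) when the turn is done,
/// Ok(false) to continue with the next step, and Err when the step failed.
fn run_turn<F>(max_steps: u32, max_retries: u32, mut run_step: F) -> TurnEnd
where
    F: FnMut(u32, u32) -> Result<bool, String>,
{
    for step in 1..=max_steps {
        let mut attempt = 0;
        loop {
            match run_step(step, attempt) {
                Ok(true) => return TurnEnd::Finished { steps: step },
                Ok(false) => break, // step succeeded, move on to the next one
                Err(_) if attempt < max_retries => attempt += 1, // retry the same step
                Err(_) => return TurnEnd::RetriesExhausted { step },
            }
        }
    }
    TurnEnd::StepLimit
}
```

Under this model the retry counter resets for every step, so the two limits bound different things: total progress per turn versus persistence on a single failing step.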

Diagnostics

Flag	Description
`--verbose`	Print verbose runtime information.
`--debug`	Enable debug-level logging.
`--version` / `-V`	Print version and exit.

Subcommands

Command	Description
`info`	Show version and wire protocol information.
`mcp`	Manage MCP server configurations.

Web UI integration (KIMI_WEB_WORKER_ARGS)

When kimi web (the Python web server) detects hero_kimi_agent on PATH, it spawns it as the worker subprocess for each session. The web server always passes --work-dir and --session; all other flags come from the KIMI_WEB_WORKER_ARGS environment variable.

Set KIMI_WEB_WORKER_ARGS before starting the web server to forward any combination of the flags above:

# Use a specific model
KIMI_WEB_WORKER_ARGS="--model groq/gpt-oss-120b" uv run kimi web --personality blue

# Disable thinking mode
KIMI_WEB_WORKER_ARGS="--no-thinking" uv run kimi web --personality blue

# Load a custom agent file and point to a specific skills directory
KIMI_WEB_WORKER_ARGS="--agent-file /path/to/agent.yaml --skills-dir /path/to/skills" \
 uv run kimi web --personality blue

# Use an alternate config file
KIMI_WEB_WORKER_ARGS="--config-file /path/to/config.toml" uv run kimi web --personality blue

# Add an MCP server
KIMI_WEB_WORKER_ARGS='--mcp-config-file /path/to/mcp.json' uv run kimi web --personality blue

# Auto-approve all tool calls (yolo mode)
KIMI_WEB_WORKER_ARGS="--yolo" uv run kimi web --personality blue

# Combine multiple flags (use $HOME rather than ~, which the shell does not expand inside quotes)
KIMI_WEB_WORKER_ARGS="--model groq/gpt-oss-120b --no-thinking --yolo --skills-dir $HOME/myskills" \
 uv run kimi web --personality blue

KIMI_WEB_WORKER_ARGS is parsed with shlex rules (same as a shell command line), so spaces inside quoted strings are handled correctly.
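
To make the quoting rules concrete, here is a simplified splitter in the spirit of shlex. It is an approximation written for illustration, not the parser the web server uses (that is Python's shlex): it handles single quotes, double quotes with backslash escapes for `"` and `\`, and backslash escapes outside quotes, and it performs no tilde or variable expansion.

```rust
/// Split a command line into arguments using simplified POSIX-style quoting rules.
fn split_args(input: &str) -> Result<Vec<String>, String> {
    let mut args = Vec::new();
    let mut current = String::new();
    let mut in_token = false;
    let mut chars = input.chars();

    while let Some(c) = chars.next() {
        match c {
            // Single quotes: everything up to the closing quote is literal.
            '\'' => {
                in_token = true;
                loop {
                    match chars.next() {
                        Some('\'') => break,
                        Some(ch) => current.push(ch),
                        None => return Err("unterminated single quote".to_string()),
                    }
                }
            }
            // Double quotes: a backslash escapes only `"` and `\`.
            '"' => {
                in_token = true;
                loop {
                    match chars.next() {
                        Some('"') => break,
                        Some('\\') => match chars.next() {
                            Some(ch @ ('"' | '\\')) => current.push(ch),
                            Some(ch) => {
                                current.push('\\');
                                current.push(ch);
                            }
                            None => return Err("unterminated double quote".to_string()),
                        },
                        Some(ch) => current.push(ch),
                        None => return Err("unterminated double quote".to_string()),
                    }
                }
            }
            // Outside quotes, a backslash makes the next character literal.
            '\\' => match chars.next() {
                Some(ch) => {
                    in_token = true;
                    current.push(ch);
                }
                None => return Err("trailing backslash".to_string()),
            },
            // Unquoted whitespace ends the current argument.
            c if c.is_whitespace() => {
                if in_token {
                    args.push(std::mem::take(&mut current));
                    in_token = false;
                }
            }
            c => {
                in_token = true;
                current.push(c);
            }
        }
    }
    if in_token {
        args.push(current);
    }
    Ok(args)
}
```

For example, `--skills-dir '/path/with space/skills'` splits into two arguments, with the quoted path kept whole.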

Note: KIMI_WEB_WORKER_ARGS only affects the Rust hero_kimi_agent worker. If hero_kimi_agent is not on PATH, the web server falls back to the Python worker, which ignores this variable and reads all settings from ~/.kimi/config.toml.

Test

cargo test # all
cargo test -p hero_kimi_agent # agent
cargo test -p kosong # LLM abstraction
cargo test -p kaos # OS abstraction

Deferred-MCP-tools proof tests

The context savings from generic MCP dispatch are backed by tests that run the real code path (KimiToolset::tools() and kosong's real wire encoder) and measure the exact bytes sent in the request tools[] array:

# Offline: matcher, bounded/O(1) index, search result cap, dispatch-not-listed, and a
# byte-for-byte flatness measurement over captured real schemas (also a regression guard).
cargo test -p hero_kimi_agent --lib deferred_tests
cargo test -p hero_kimi_agent --lib token_proof -- --nocapture

# Live end-to-end: connects real MCP servers over stdio (needs npx + network) and
# captures the literal tools[] + system prompt the agent transmits through kosong::step.
# Self-skips unless the env var is set, so the default suite stays offline.
KIMI_LIVE_MCP_PROOF=1 cargo test -p hero_kimi_agent --lib live_proof -- --nocapture

Workspace

Crate	Purpose
`hero_kimi_agent`	Main binary — wire server, tools, agent loop, MCP
`kosong`	LLM abstraction — messages, tool schemas, providers
`kaos`	OS abstraction — LocalKaos, path semantics

Relationship to Python

This repo is a Rust rewrite of the Python kimi-cli runtime. The two must stay compatible on wire protocol, message formats, ~/.kimi data layout, tool schemas, and all other externally observable behavior. Python is the source of truth.

The Python repo is pinned as a git submodule at kimi-cli/:

git submodule update --init

Version numbers must always match kimi-cli exactly.

License

Apache-2.0. See LICENSE and NOTICE.
