rust port of kimi agent, adjusted to hero env
  • Rust 90.6%
  • JavaScript 4.1%
  • CSS 2.6%
  • Shell 1.7%
  • HTML 0.7%
  • Other 0.2%
Find a file
omarz b9210c99b4
All checks were successful
lab publish / publish (push) Successful in 30m39s
build: make 'make start' bind hero_kimi_web to router by default
WEB_PORT now defaults to empty => UDS-only (router) launch; set
WEB_PORT to also expose a direct TCP port.
2026-06-04 20:20:02 +02:00
.forgejo/workflows ci: publish musl-x86_64 binaries to rolling releases via lab 2026-06-02 14:46:46 +02:00
.github/workflows ci: publish musl-x86_64 binaries to rolling releases via lab 2026-06-02 14:46:46 +02:00
crates Merge development into main: deferred MCP tools + reliability suite 2026-06-04 20:07:00 +02:00
docs feat: Dioxus single-file chat web component (fake data) + AI pipeline 2026-06-01 11:16:31 +02:00
schema chore: fix service.toml field names and add wire protocol oschema 2026-05-31 11:21:49 +02:00
scripts build: add cargo build/install make targets; rename Dioxus build -> bundle 2026-06-04 13:45:12 +02:00
.gitignore feat: Dioxus single-file chat web component (fake data) + AI pipeline 2026-06-01 11:16:31 +02:00
.gitmodules build: remove kimi-cli submodule, add Makefile and parity docs 2026-05-18 06:20:40 +02:00
.pre-commit-config.yaml feat(rust): add rust version of kimi agent kernel - kagent (#717) 2026-02-04 22:32:31 +08:00
AGENTS.md fix(hero_kimi_web): serve the chat UI on the standard <service>/web.sock socket 2026-06-04 00:15:59 -04:00
Cargo.lock fix(hero_kimi_agent): embed builtin agent specs, skills, and plugins in the binary 2026-06-04 09:35:16 -04:00
Cargo.toml feat: Dioxus single-file chat web component (fake data) + AI pipeline 2026-06-01 11:16:31 +02:00
Cargo.toml.hero_builder_backup feat: restructure workspace into crates/ layout and add hero_kimi_web service 2026-05-18 11:51:32 +02:00
instructions_porting.md docs: normalize indentation in markdown docs + fix service.toml kind and env var name 2026-05-26 12:43:00 +02:00
LICENSE feat: initialize kimi-agent repo 2026-02-06 22:26:36 +08:00
Makefile build: make 'make start' bind hero_kimi_web to router by default 2026-06-04 20:20:02 +02:00
MCP_IMPROVEMENTS.md Add MCP_IMPROVEMENTS.md doc 2026-06-04 14:28:34 +02:00
NOTICE feat: restructure workspace into crates/ layout and add hero_kimi_web service 2026-05-18 11:51:32 +02:00
README.md fix(mcp): make read-only result cache opt-in with a TTL (was stale-prone) 2026-06-04 13:20:30 +02:00
run_kimi_web_debug.sh feat(hero_kimi_agent): add SessionStart/SessionEnd/StopFailure/Notification hooks 2026-05-25 12:30:48 +02:00
rust-toolchain.toml chore: migrate service_base imports to hero_lifecycle and add dev profile 2026-06-01 07:43:34 +02:00

Kimi Agent (Rust)

Rust implementation of Kimi Code CLI. Wire-only JSON-RPC agent server over stdio.

What is hero_kimi_agent?

hero_kimi_agent is an AI agent runtime that powers code generation and analysis workflows. It:

  • Runs as a JSON-RPC 2.0 server over stdin/stdout (wire protocol)
  • Manages multi-turn agent conversations with LLM orchestration
  • Provides tool execution (shell, file I/O, code analysis, testing)
  • Supports Model Context Protocol (MCP) for extensible tool integrations
  • Abstracts LLM providers (OpenAI, Anthropic, Kimi, etc.) via the kosong crate
  • Handles agent state, configuration, and session management
  • Implements approval workflows and result sharing

Quick Start

Build

cargo build -p hero_kimi_agent

Run immediately:

./target/debug/hero_kimi_agent

Install

Build and install the binary to $PATH_ROOT/bin/:

# Dev build
PATH_ROOT=~/.local make install

# Release build (optimized)
PATH_ROOT=~/.local make install-release

After installation, the hero_kimi_agent binary will be in ~/.local/bin/ and available on PATH (if ~/.local/bin is in your shell's $PATH).

Make targets

  • make build — build dev binary
  • make build-release — build optimized release binary
  • make install — build dev + install to $PATH_ROOT/bin
  • make install-release — build release + install to $PATH_ROOT/bin
  • make clean — remove build artifacts
  • make help — show all targets

Note: make install* requires PATH_ROOT environment variable to be set.

CLI Reference

hero_kimi_agent [OPTIONS] [COMMAND]

Session & working directory

Flag Short Description
--work-dir <PATH> -w Working directory for the agent. Default: current directory.
--session <SESSION_ID> -S Resume a specific session by ID. Default: create new session.
--continue -C Continue the most recent session for the working directory.

Model & provider

Flag Short Description
--model <NAME> -m LLM model to use (must match a key in config [models]). Default: default_model from config.
--thinking Enable extended thinking/reasoning mode.
--no-thinking Disable thinking mode (overrides config default).

Configuration

Flag Description
--config-file <PATH> Load config from a TOML or JSON file instead of ~/.kimi/config.toml.
--config <TOML_OR_JSON> Inline config string (TOML or JSON). Useful for one-off overrides without a file.

--config and --config-file are mutually exclusive.

Agent specification

Flag Description
--agent <builtin> Use a named builtin agent (default, okabe).
--agent-file <PATH> Load a custom agent YAML file.

--agent and --agent-file are mutually exclusive.

Skills

Flag Description
--skills-dir <PATH> Override the skills directory. Default: auto-discovered from ~/.kimi/skills/.

MCP (Model Context Protocol)

Flag Description
--mcp-config-file <PATH> Load an MCP config file. Repeat to add multiple.
--mcp-config <JSON> Inline MCP config JSON. Repeat to add multiple.

Deferred MCP tools — generic dispatch (context savings)

MCP servers often expose dozens or hundreds of tools, each with a bulky JSON schema. Sending all of them to the model on every turn wastes a large share of the context window — and because tool definitions are re-serialized on every turn, an "active" tool keeps costing tokens for the rest of the session.

Instead, the entire MCP surface is fronted by two fixed tools, so the per-turn tool list is O(1) no matter how many servers or tools are connected:

  • mcp_search(query) — finds tools by keyword and returns the top matches' name, description, and argument schema as a tool result (not as tool definitions). A tool's schema is therefore paid for once — on the turn it's searched — instead of every turn.
  • mcp_call(name, arguments) — invokes a tool by its mcp__<server>__<tool> name. Arguments are validated against the target tool's real schema inside the dispatcher, so the model still gets precise, schema-aware errors.

A compact, per-server index is appended to the system prompt so the model knows what's connected. The index is bounded: one line per server up to a cap, then a single summary line — so connecting 1000+ servers stays O(1) in prompt cost. mcp_search returns at most a handful of matches per call (default 5) so a broad query can't dump 100 schemas at once.

Measured on three real servers (53 tools): the per-turn tool payload is flat — registering 1 MCP tool or all 53 yields the same ~1,170-byte tool list (2 tools). Versus sending every schema each turn (~34,500 bytes), that's ~19× smaller / ~95% less, and it does not grow as more tools or servers connect. See the proof tests below.

Output side. MCP tools also return large blobs (a full list, a log dump, an OpenRPC spec). An mcp_call result is head+tail truncated to ~12k characters with a notice telling the model how to fetch just the part it needs (filter, pagination, a smaller limit, a specific id). This bounds the output the same way dispatch bounds the input — e.g. a real 170 KB rpc_discover result is trimmed to ~12k chars while still showing the start and end.

Further trimming. Per-tool descriptions are capped in the index, mcp_search prints (no arguments) instead of an empty schema, and a schema already returned earlier in the session is not re-emitted on a repeat search (the tool is shown by name only). The caps are tunable per deployment via env vars (defaults in parentheses):

Env var Default Effect
KIMI_MCP_RESULT_MAX_CHARS 12000 Max characters of an mcp_call result before head+tail truncation
KIMI_MCP_SEARCH_RESULT_CAP 5 Max tools a single mcp_search returns
KIMI_MCP_DESC_MAX_CHARS 300 Max characters of a per-tool description kept in the index
KIMI_MCP_CACHE_TTL_SECS 0 (off) TTL for the read-only result cache; 0/unset disables caching

Smarter matching and dispatch.

  • mcp_search ranks with BM25 over tokenized tool name + description, with light plural stemming and a small synonym map and name-token boosting — so "make a board", "rm file", or "enumerate services" find the create/delete/list tools even without an exact word overlap. It stays fully offline and deterministic (no embeddings).
  • mcp_call auto-resolves a dropped mcp__<server>__ prefix (the common case where the model passes a bare tool name) and, on a typo, returns "did you mean …" suggestions instead of a silent failure.
  • Results of tools the server marks readOnlyHint: true can be cached by (name, args) with a TTL, so an identical repeat call within the window returns instantly without re-dispatching or re-injecting the payload. Off by defaultreadOnlyHint means a tool doesn't modify state, not that its answer is stable (a status/list result changes over time), so caching is opt-in via KIMI_MCP_CACHE_TTL_SECS and the TTL bounds how stale a cached answer can be.
  • A large JSON-array result is summarized item-wise (whole leading items + an "N of M shown" note) instead of a blind byte cut.

Approval / safety

Flag Short Aliases Description
--yolo -y --yes, --auto-approve Approve all tool executions automatically. The user is still reachable for interactive questions.
--afk Away-from-keyboard: auto-approve tool calls and auto-dismiss interactive questions. Use when no user is at the terminal.

Print mode (non-interactive)

Run a single turn, write the result to stdout, and exit — for scripting and automation. Print mode implies --afk (fully non-interactive; --yolo is not required). Ports kimi-cli's --print.

Flag Short Aliases Description
--prompt <TEXT> -p -c, --command Prompt for a single non-interactive turn. Implies --print.
--print Run one turn non-interactively (implies --afk) and exit. Reads the prompt from -p, or from stdin if omitted.
--quiet Shortcut for --print --output-format text --final-message-only.
--output-format <FORMAT> text (default) or stream-json (JSONL, one chat message per line). Print mode only.
--final-message-only Output only the final assistant message. Print mode only.
# Single-turn, capture the assistant's answer
hero_kimi_agent -y -p "Summarize the architecture of this repo"

# Pipe the prompt from stdin
echo "List the .rs files" | hero_kimi_agent --print

# Only the final message, nothing else
hero_kimi_agent --quiet -p "How many tests are in this crate?"

# Machine-readable JSONL stream (assistant + tool messages)
hero_kimi_agent --output-format stream-json -p "Run the tests and report"

Loop control (advanced)

Flag Description
--max-steps-per-turn <N> Maximum LLM steps per user turn. Default: from config (100).
--max-retries-per-step <N> Maximum retries on a failed step. Default: from config (3).
--max-ralph-iterations <N> Extra iterations in Ralph (autonomous) mode. -1 = unlimited. Default: 0.

Diagnostics

Flag Description
--verbose Print verbose runtime information.
--debug Enable debug-level logging.
--version / -V Print version and exit.

Subcommands

Command Description
info Show version and wire protocol information.
mcp Manage MCP server configurations.

Web UI integration (KIMI_WEB_WORKER_ARGS)

When kimi web (the Python web server) detects hero_kimi_agent on PATH, it spawns it as the worker subprocess for each session. The web server always passes --work-dir and --session; all other flags come from the KIMI_WEB_WORKER_ARGS environment variable.

Set KIMI_WEB_WORKER_ARGS before starting the web server to forward any combination of the flags above:

# Use a specific model
KIMI_WEB_WORKER_ARGS="--model groq/gpt-oss-120b" uv run kimi web --personality blue

# Disable thinking mode
KIMI_WEB_WORKER_ARGS="--no-thinking" uv run kimi web --personality blue

# Load a custom agent file and point to a specific skills directory
KIMI_WEB_WORKER_ARGS="--agent-file /path/to/agent.yaml --skills-dir /path/to/skills" \
 uv run kimi web --personality blue

# Use an alternate config file
KIMI_WEB_WORKER_ARGS="--config-file /path/to/config.toml" uv run kimi web --personality blue

# Add an MCP server
KIMI_WEB_WORKER_ARGS='--mcp-config-file /path/to/mcp.json' uv run kimi web --personality blue

# Auto-approve all tool calls (yolo mode)
KIMI_WEB_WORKER_ARGS="--yolo" uv run kimi web --personality blue

# Combine multiple flags (use shell quoting as normal)
KIMI_WEB_WORKER_ARGS="--model groq/gpt-oss-120b --no-thinking --yolo --skills-dir ~/myskills" \
 uv run kimi web --personality blue

KIMI_WEB_WORKER_ARGS is parsed with shlex rules (same as a shell command line), so spaces inside quoted strings are handled correctly.

Note: KIMI_WEB_WORKER_ARGS only affects the Rust hero_kimi_agent worker. If hero_kimi_agent is not on PATH, the web server falls back to the Python worker, which ignores this variable and reads all settings from ~/.kimi/config.toml.

Test

cargo test # all
cargo test -p hero_kimi_agent # agent
cargo test -p kosong # LLM abstraction
cargo test -p kaos # OS abstraction

Deferred-MCP-tools proof tests

The context savings from generic MCP dispatch are backed by tests that run the real code path (KimiToolset::tools() and kosong's real wire encoder) and measure the exact bytes sent in the request tools[] array:

# Offline: matcher, bounded/O(1) index, search result cap, dispatch-not-listed, and a
# byte-for-byte flatness measurement over captured real schemas (also a regression guard).
cargo test -p hero_kimi_agent --lib deferred_tests
cargo test -p hero_kimi_agent --lib token_proof -- --nocapture

# Live end-to-end: connects real MCP servers over stdio (needs npx + network) and
# captures the literal tools[] + system prompt the agent transmits through kosong::step.
# Self-skips unless the env var is set, so the default suite stays offline.
KIMI_LIVE_MCP_PROOF=1 cargo test -p hero_kimi_agent --lib live_proof -- --nocapture

Workspace

Crate Purpose
hero_kimi_agent Main binary — wire server, tools, agent loop, MCP
kosong LLM abstraction — messages, tool schemas, providers
kaos OS abstraction — LocalKaos, path semantics

Relationship to Python

This repo is a Rust rewrite of the Python kimi-cli runtime. The two must stay compatible on wire protocol, message formats, ~/.kimi data layout, tool schemas, and all other externally observable behavior. Python is the source of truth.

The Python repo is pinned as a git submodule at kimi-cli/:

git submodule update --init

Version numbers must always match kimi-cli exactly.

License

Apache-2.0. See LICENSE and NOTICE.