Kimi Agent (Rust)
Rust implementation of Kimi Code CLI. Wire-only JSON-RPC agent server over stdio.
What is hero_kimi_agent?
hero_kimi_agent is an AI agent runtime that powers code generation and analysis workflows. It:
- Runs as a JSON-RPC 2.0 server over stdin/stdout (wire protocol)
- Manages multi-turn agent conversations with LLM orchestration
- Provides tool execution (shell, file I/O, code analysis, testing)
- Supports Model Context Protocol (MCP) for extensible tool integrations
- Abstracts LLM providers (OpenAI, Anthropic, Kimi, etc.) via the kosong crate
- Handles agent state, configuration, and session management
- Implements approval workflows and result sharing
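As a rough illustration of the wire protocol in the first bullet, the sketch below frames one JSON-RPC 2.0 request and writes it to a stream. It assumes one JSON object per line, which this README does not state, and the method name example/ping is purely hypothetical; the real method names are defined by the wire protocol, not here.

```rust
use std::io::{self, Write};

/// Build one JSON-RPC 2.0 request as a single line of text.
/// Assumptions: newline-delimited framing, `method` contains no characters
/// that need JSON escaping, and `params_json` is already valid JSON.
fn frame_request(id: u64, method: &str, params_json: &str) -> String {
    format!(
        "{{\"jsonrpc\":\"2.0\",\"id\":{},\"method\":\"{}\",\"params\":{}}}\n",
        id, method, params_json
    )
}

/// Write a framed request to any writer (for example a child process's stdin)
/// and flush so the peer sees it immediately.
fn send_request<W: Write>(out: &mut W, line: &str) -> io::Result<()> {
    out.write_all(line.as_bytes())?;
    out.flush()
}
```

In a real client the writer would be the stdin handle of a spawned hero_kimi_agent process, and responses would be read back from its stdout.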
Quick Start
Build
cargo build -p hero_kimi_agent
Run immediately:
./target/debug/hero_kimi_agent
Install
Build and install the binary to $PATH_ROOT/bin/:
# Dev build
PATH_ROOT=~/.local make install
# Release build (optimized)
PATH_ROOT=~/.local make install-release
After installation, the hero_kimi_agent binary will be in ~/.local/bin/ and available on PATH (if ~/.local/bin is in your shell's $PATH).
Make targets
- make build: build dev binary
- make build-release: build optimized release binary
- make install: build dev + install to $PATH_ROOT/bin
- make install-release: build release + install to $PATH_ROOT/bin
- make clean: remove build artifacts
- make help: show all targets

Note: make install* requires the PATH_ROOT environment variable to be set.
CLI Reference
hero_kimi_agent [OPTIONS] [COMMAND]
Session & working directory
| Flag | Short | Description |
|---|---|---|
| --work-dir <PATH> | -w | Working directory for the agent. Default: current directory. |
| --session <SESSION_ID> | -S | Resume a specific session by ID. Default: create new session. |
| --continue | -C | Continue the most recent session for the working directory. |
Model & provider
| Flag | Short | Description |
|---|---|---|
| --model <NAME> | -m | LLM model to use (must match a key in config [models]). Default: default_model from config. |
| --thinking | | Enable extended thinking/reasoning mode. |
| --no-thinking | | Disable thinking mode (overrides config default). |
Configuration
| Flag | Description |
|---|---|
| --config-file <PATH> | Load config from a TOML or JSON file instead of ~/.kimi/config.toml. |
| --config <TOML_OR_JSON> | Inline config string (TOML or JSON). Useful for one-off overrides without a file. |
--config and --config-file are mutually exclusive.
Agent specification
| Flag | Description |
|---|---|
| --agent <builtin> | Use a named builtin agent (default, okabe). |
| --agent-file <PATH> | Load a custom agent YAML file. |
--agent and --agent-file are mutually exclusive.
Skills
| Flag | Description |
|---|---|
| --skills-dir <PATH> | Override the skills directory. Default: auto-discovered from ~/.kimi/skills/. |
MCP (Model Context Protocol)
| Flag | Description |
|---|---|
| --mcp-config-file <PATH> | Load an MCP config file. Repeat to add multiple. |
| --mcp-config <JSON> | Inline MCP config JSON. Repeat to add multiple. |
Deferred MCP tools — generic dispatch (context savings)
MCP servers often expose dozens or hundreds of tools, each with a bulky JSON schema. Sending all of them to the model on every turn wastes a large share of the context window — and because tool definitions are re-serialized on every turn, an "active" tool keeps costing tokens for the rest of the session.
Instead, the entire MCP surface is fronted by two fixed tools, so the per-turn tool list is O(1) no matter how many servers or tools are connected:
- mcp_search(query): finds tools by keyword and returns the top matches' name, description, and argument schema as a tool result (not as tool definitions). A tool's schema is therefore paid for once, on the turn it's searched, instead of every turn.
- mcp_call(name, arguments): invokes a tool by its mcp__<server>__<tool> name. Arguments are validated against the target tool's real schema inside the dispatcher, so the model still gets precise, schema-aware errors.
A compact, per-server index is appended to the system prompt so the model knows what's connected. The index is bounded: one line per server up to a cap, then a single summary line — so connecting 1000+ servers stays O(1) in prompt cost. mcp_search returns at most a handful of matches per call (default 5) so a broad query can't dump 100 schemas at once.
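The bounded index described above can be sketched as follows. This is a minimal illustration, not the agent's actual code: the function name, the line format, and the wording of the summary line are all assumptions.

```rust
/// Render a per-server index: one line per server up to `cap`,
/// then a single summary line covering everything past the cap.
/// `servers` holds (server name, number of tools) pairs.
fn build_index(servers: &[(&str, usize)], cap: usize) -> Vec<String> {
    let mut lines: Vec<String> = servers
        .iter()
        .take(cap)
        .map(|(name, tools)| format!("- {name}: {tools} tools"))
        .collect();

    if servers.len() > cap {
        let rest = &servers[cap..];
        let hidden_tools: usize = rest.iter().map(|(_, n)| n).sum();
        // One line no matter how many servers are hidden, so prompt cost stays bounded.
        lines.push(format!(
            "- ... and {} more server(s) with {} tools; use mcp_search to find them",
            rest.len(),
            hidden_tools
        ));
    }
    lines
}
```

The output never exceeds cap + 1 lines, which is what keeps the prompt cost independent of the number of connected servers.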
Measured on three real servers (53 tools): the per-turn tool payload is flat — registering 1 MCP tool or all 53 yields the same ~1,170-byte tool list (2 tools). Versus sending every schema each turn (~34,500 bytes), that's ~19× smaller / ~95% less, and it does not grow as more tools or servers connect. See the proof tests below.
Output side. MCP tools also return large blobs (a full list, a log dump, an OpenRPC spec). An mcp_call result is head+tail truncated to ~12k characters with a notice telling the model how to fetch just the part it needs (filter, pagination, a smaller limit, a specific id). This bounds the output the same way dispatch bounds the input — e.g. a real 170 KB rpc_discover result is trimmed to ~12k chars while still showing the start and end.
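Head+tail truncation of this kind might look like the sketch below. The even head/tail split and the wording of the notice are assumptions made for illustration; only the general behaviour (keep the start and the end, tell the model how to narrow the request) comes from the description above.

```rust
/// Keep the first and last parts of `text` when it exceeds `max_chars`
/// characters, joined by a notice explaining what was dropped.
/// Counts Unicode scalar values, not bytes, so multi-byte text is never split mid-character.
fn truncate_head_tail(text: &str, max_chars: usize) -> String {
    let total = text.chars().count();
    if total <= max_chars {
        return text.to_string();
    }
    // Assumed split: half of the budget for the head, the rest for the tail.
    let head_len = max_chars / 2;
    let tail_len = max_chars - head_len;
    let head: String = text.chars().take(head_len).collect();
    let tail: String = text.chars().skip(total - tail_len).collect();
    let omitted = total - max_chars;
    format!(
        "{head}\n[... {omitted} characters omitted; narrow the request with a filter, pagination, a smaller limit, or a specific id ...]\n{tail}"
    )
}
```

Note that in this sketch the notice is added on top of the character budget, so the returned string is slightly longer than max_chars.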
Further trimming. Per-tool descriptions are capped in the index, mcp_search prints (no arguments) instead of an empty schema, and a schema already returned earlier in the session is not re-emitted on a repeat search (the tool is shown by name only). The caps are tunable per deployment via env vars (defaults in parentheses):
| Env var | Default | Effect |
|---|---|---|
| KIMI_MCP_RESULT_MAX_CHARS | 12000 | Max characters of an mcp_call result before head+tail truncation |
| KIMI_MCP_SEARCH_RESULT_CAP | 5 | Max tools a single mcp_search returns |
| KIMI_MCP_DESC_MAX_CHARS | 300 | Max characters of a per-tool description kept in the index |
| KIMI_MCP_CACHE_TTL_SECS | 0 (off) | TTL for the read-only result cache; 0/unset disables caching |
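One way to read these variables with their documented defaults is sketched below. The struct and function names are hypothetical, and falling back to the default on an unparsable value is an assumption; the README does not say how invalid values are handled.

```rust
use std::env;

/// Parse an optional raw value, falling back to `default` when it is
/// unset or not a non-negative integer (assumed behaviour).
fn parse_or_default(raw: Option<&str>, default: usize) -> usize {
    raw.and_then(|v| v.trim().parse::<usize>().ok())
        .unwrap_or(default)
}

/// Read one environment variable as a number, with a default.
fn env_usize(name: &str, default: usize) -> usize {
    parse_or_default(env::var(name).ok().as_deref(), default)
}

/// Hypothetical container for the four tunables in the table above.
#[derive(Debug)]
struct McpLimits {
    result_max_chars: usize,
    search_result_cap: usize,
    desc_max_chars: usize,
    cache_ttl_secs: usize, // 0 means caching is disabled
}

fn load_limits() -> McpLimits {
    McpLimits {
        result_max_chars: env_usize("KIMI_MCP_RESULT_MAX_CHARS", 12000),
        search_result_cap: env_usize("KIMI_MCP_SEARCH_RESULT_CAP", 5),
        desc_max_chars: env_usize("KIMI_MCP_DESC_MAX_CHARS", 300),
        cache_ttl_secs: env_usize("KIMI_MCP_CACHE_TTL_SECS", 0),
    }
}
```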
Smarter matching and dispatch.
- mcp_search ranks with BM25 over the tokenized tool name and description, with light plural stemming, a small synonym map, and name-token boosting, so "make a board", "rm file", or "enumerate services" find the create/delete/list tools even without an exact word overlap. It stays fully offline and deterministic (no embeddings).
- mcp_call auto-resolves a dropped mcp__<server>__ prefix (the common case where the model passes a bare tool name) and, on a typo, returns "did you mean …" suggestions instead of a silent failure.
- Results of tools the server marks readOnlyHint: true can be cached by (name, args) with a TTL, so an identical repeat call within the window returns instantly without re-dispatching or re-injecting the payload. Off by default: readOnlyHint means a tool doesn't modify state, not that its answer is stable (a status/list result changes over time), so caching is opt-in via KIMI_MCP_CACHE_TTL_SECS and the TTL bounds how stale a cached answer can be.
- A large JSON-array result is summarized item-wise (whole leading items + an "N of M shown" note) instead of a blind byte cut.
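The prefix auto-resolution and "did you mean" behaviour of mcp_call could be approximated as in the sketch below. Everything here is illustrative: the type and function names are invented, Levenshtein distance is a stand-in for whatever similarity measure the dispatcher really uses, and the threshold of 2 edits is an assumption.

```rust
/// Outcome of resolving the name passed to `mcp_call` (hypothetical type).
#[derive(Debug, PartialEq)]
enum Resolved {
    Exact(String),
    Suggestions(Vec<String>),
}

/// Tool part of an `mcp__<server>__<tool>` name, or the input unchanged.
fn bare_name(full: &str) -> &str {
    full.strip_prefix("mcp__")
        .and_then(|rest| rest.split_once("__"))
        .map(|(_server, tool)| tool)
        .unwrap_or(full)
}

/// Classic Levenshtein edit distance using two rolling rows.
fn edit_distance(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for i in 1..=a.len() {
        let mut cur = vec![i; b.len() + 1];
        for j in 1..=b.len() {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            cur[j] = (prev[j] + 1).min(cur[j - 1] + 1).min(prev[j - 1] + cost);
        }
        prev = cur;
    }
    prev[b.len()]
}

fn resolve(requested: &str, registered: &[&str]) -> Resolved {
    // 1. Fully qualified name that matches as-is.
    if registered.iter().any(|full| *full == requested) {
        return Resolved::Exact(requested.to_string());
    }
    // 2. Bare tool name: accept only when exactly one server provides it.
    let by_bare: Vec<&str> = registered
        .iter()
        .copied()
        .filter(|full| bare_name(full) == requested)
        .collect();
    if by_bare.len() == 1 {
        return Resolved::Exact(by_bare[0].to_string());
    }
    // 3. Otherwise suggest close names (assumed threshold: at most 2 edits).
    let wanted = bare_name(requested);
    let mut scored: Vec<(usize, &str)> = registered
        .iter()
        .copied()
        .map(|full| (edit_distance(wanted, bare_name(full)), full))
        .filter(|(distance, _)| *distance <= 2)
        .collect();
    scored.sort();
    Resolved::Suggestions(scored.into_iter().map(|(_, n)| n.to_string()).collect())
}
```

An ambiguous bare name (the same tool on two servers) falls through to step 3 and comes back as suggestions, so the caller has to pick a server explicitly.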
Approval / safety
| Flag | Short | Aliases | Description |
|---|---|---|---|
| --yolo | -y | --yes, --auto-approve | Approve all tool executions automatically. The user is still reachable for interactive questions. |
| --afk | | | Away-from-keyboard: auto-approve tool calls and auto-dismiss interactive questions. Use when no user is at the terminal. |
Print mode (non-interactive)
Run a single turn, write the result to stdout, and exit — for scripting and automation. Print mode implies --afk (fully non-interactive; --yolo is not required). Ports kimi-cli's --print.
| Flag | Short | Aliases | Description |
|---|---|---|---|
| --prompt <TEXT> | -p | -c, --command | Prompt for a single non-interactive turn. Implies --print. |
| --print | | | Run one turn non-interactively (implies --afk) and exit. Reads the prompt from -p, or from stdin if omitted. |
| --quiet | | | Shortcut for --print --output-format text --final-message-only. |
| --output-format <FORMAT> | | | text (default) or stream-json (JSONL, one chat message per line). Print mode only. |
| --final-message-only | | | Output only the final assistant message. Print mode only. |
# Single-turn, capture the assistant's answer
hero_kimi_agent -y -p "Summarize the architecture of this repo"
# Pipe the prompt from stdin
echo "List the .rs files" | hero_kimi_agent --print
# Only the final message, nothing else
hero_kimi_agent --quiet -p "How many tests are in this crate?"
# Machine-readable JSONL stream (assistant + tool messages)
hero_kimi_agent --output-format stream-json -p "Run the tests and report"
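For automation written in Rust, print mode can be driven through std::process::Command. The sketch below uses only flags documented above; the helper names are hypothetical, and it assumes hero_kimi_agent is on PATH. Each stdout line is returned as raw text, since the exact shape of the stream-json messages is not described here.

```rust
use std::io;
use std::process::Command;

/// Build a print-mode invocation that emits JSONL on stdout.
/// `--prompt` implies `--print`, which in turn implies `--afk`.
fn print_mode_command(prompt: &str, work_dir: &str) -> Command {
    let mut cmd = Command::new("hero_kimi_agent");
    cmd.arg("--work-dir")
        .arg(work_dir)
        .arg("--output-format")
        .arg("stream-json")
        .arg("--prompt")
        .arg(prompt);
    cmd
}

/// Run one turn and return stdout split into lines (one JSON message per line).
/// Fails with an io::Error if the binary cannot be started.
fn run_print_mode(prompt: &str, work_dir: &str) -> io::Result<Vec<String>> {
    let output = print_mode_command(prompt, work_dir).output()?;
    Ok(String::from_utf8_lossy(&output.stdout)
        .lines()
        .map(str::to_string)
        .collect())
}
```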
Loop control (advanced)
| Flag | Description |
|---|---|
| --max-steps-per-turn <N> | Maximum LLM steps per user turn. Default: from config (100). |
| --max-retries-per-step <N> | Maximum retries on a failed step. Default: from config (3). |
| --max-ralph-iterations <N> | Extra iterations in Ralph (autonomous) mode. -1 = unlimited. Default: 0. |
Diagnostics
| Flag | Description |
|---|---|
| --verbose | Print verbose runtime information. |
| --debug | Enable debug-level logging. |
| --version / -V | Print version and exit. |
Subcommands
| Command | Description |
|---|---|
| info | Show version and wire protocol information. |
| mcp | Manage MCP server configurations. |
Web UI integration (KIMI_WEB_WORKER_ARGS)
When kimi web (the Python web server) detects hero_kimi_agent on PATH, it spawns it as the worker subprocess for each session. The web server always passes --work-dir and --session; all other flags come from the KIMI_WEB_WORKER_ARGS environment variable.
Set KIMI_WEB_WORKER_ARGS before starting the web server to forward any combination of the flags above:
# Use a specific model
KIMI_WEB_WORKER_ARGS="--model groq/gpt-oss-120b" uv run kimi web --personality blue
# Disable thinking mode
KIMI_WEB_WORKER_ARGS="--no-thinking" uv run kimi web --personality blue
# Load a custom agent file and point to a specific skills directory
KIMI_WEB_WORKER_ARGS="--agent-file /path/to/agent.yaml --skills-dir /path/to/skills" \
uv run kimi web --personality blue
# Use an alternate config file
KIMI_WEB_WORKER_ARGS="--config-file /path/to/config.toml" uv run kimi web --personality blue
# Add an MCP server
KIMI_WEB_WORKER_ARGS='--mcp-config-file /path/to/mcp.json' uv run kimi web --personality blue
# Auto-approve all tool calls (yolo mode)
KIMI_WEB_WORKER_ARGS="--yolo" uv run kimi web --personality blue
# Combine multiple flags (use shell quoting as normal)
KIMI_WEB_WORKER_ARGS="--model groq/gpt-oss-120b --no-thinking --yolo --skills-dir ~/myskills" \
uv run kimi web --personality blue
KIMI_WEB_WORKER_ARGS is parsed with shlex rules (same as a shell command line), so spaces inside quoted strings are handled correctly.
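To make the quoting rules concrete, here is a simplified shell-style splitter. The real parsing happens in the Python web server via shlex; this Rust sketch only approximates it (no comments, no variable or tilde expansion) and is meant to show how a value is broken into arguments.

```rust
/// Split a string into arguments with simplified POSIX-style quoting:
/// single quotes are literal, double quotes allow \" and \\ escapes,
/// and a backslash outside quotes escapes the next character.
fn split_args(input: &str) -> Result<Vec<String>, String> {
    let mut args = Vec::new();
    let mut current = String::new();
    let mut in_token = false;
    let mut chars = input.chars();

    while let Some(c) = chars.next() {
        match c {
            '\'' => {
                in_token = true;
                loop {
                    match chars.next() {
                        Some('\'') => break,
                        Some(ch) => current.push(ch),
                        None => return Err("unterminated single quote".to_string()),
                    }
                }
            }
            '"' => {
                in_token = true;
                loop {
                    match chars.next() {
                        Some('"') => break,
                        Some('\\') => match chars.next() {
                            Some(ch @ ('"' | '\\')) => current.push(ch),
                            Some(ch) => {
                                // Other backslashes inside double quotes stay literal.
                                current.push('\\');
                                current.push(ch);
                            }
                            None => return Err("unterminated double quote".to_string()),
                        },
                        Some(ch) => current.push(ch),
                        None => return Err("unterminated double quote".to_string()),
                    }
                }
            }
            '\\' => {
                in_token = true;
                match chars.next() {
                    Some(ch) => current.push(ch),
                    None => return Err("trailing backslash".to_string()),
                }
            }
            c if c.is_whitespace() => {
                // Whitespace ends the current argument, if any.
                if in_token {
                    args.push(std::mem::take(&mut current));
                    in_token = false;
                }
            }
            other => {
                in_token = true;
                current.push(other);
            }
        }
    }
    if in_token {
        args.push(current);
    }
    Ok(args)
}
```

With this splitter, --agent-file '/path/with space/agent.yaml' yields two arguments, the second containing the space.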
Note:
KIMI_WEB_WORKER_ARGS only affects the Rust hero_kimi_agent worker. If hero_kimi_agent is not on PATH, the web server falls back to the Python worker, which ignores this variable and reads all settings from ~/.kimi/config.toml.
Test
cargo test # all
cargo test -p hero_kimi_agent # agent
cargo test -p kosong # LLM abstraction
cargo test -p kaos # OS abstraction
Deferred-MCP-tools proof tests
The context savings from generic MCP dispatch are backed by tests that run the real code path (KimiToolset::tools() and kosong's real wire encoder) and measure the exact bytes sent in the request tools[] array:
# Offline: matcher, bounded/O(1) index, search result cap, dispatch-not-listed, and a
# byte-for-byte flatness measurement over captured real schemas (also a regression guard).
cargo test -p hero_kimi_agent --lib deferred_tests
cargo test -p hero_kimi_agent --lib token_proof -- --nocapture
# Live end-to-end: connects real MCP servers over stdio (needs npx + network) and
# captures the literal tools[] + system prompt the agent transmits through kosong::step.
# Self-skips unless the env var is set, so the default suite stays offline.
KIMI_LIVE_MCP_PROOF=1 cargo test -p hero_kimi_agent --lib live_proof -- --nocapture
Workspace
| Crate | Purpose |
|---|---|
| hero_kimi_agent | Main binary: wire server, tools, agent loop, MCP |
| kosong | LLM abstraction: messages, tool schemas, providers |
| kaos | OS abstraction: LocalKaos, path semantics |
Relationship to Python
This repo is a Rust rewrite of the Python kimi-cli runtime. The two must stay compatible on wire protocol, message formats, ~/.kimi data layout, tool schemas, and all other externally observable behavior. Python is the source of truth.
The Python repo is pinned as a git submodule at kimi-cli/:
git submodule update --init
Version numbers must always match kimi-cli exactly.