Port Tier 1/2/3 learnings from xmoncode/shrimp #90

Open
thabeta wants to merge 1 commit from port-from-xmoncode-shrimp into integration

Summary

Ports a batch of ideas from xmonader's personal shrimp agent (~/xmoncode/shrimp) into hero_shrimp: all of Tier 1/2/3 from the comparison write-up. Each item is adapted to hero_shrimp's architecture (not copied, since upstream uses a different crate layout), wired into the engine/runtime/CLI, and unit-tested.

Full details: docs/ports-from-xmoncode-shrimp.md.

What's included

New tools (registered + routed):

  • repo_wiki — drift-tracked ARCHITECTURE.md from the repo map
  • find_clones — near-duplicate function bodies (token-bag cosine)
  • impacted_tests — tests depending on changed files (via blast_radius)
  • ast_edit — tree-sitter symbol replacement (Rust) with a post-parse rollback gate
  • expand_context — retrieve the full text of an elided tool output on demand
  • fork — best-of-N candidate race in isolated git worktrees
  • mcp_search — BM25 ranking + name-resolve over MCP tools
  • skill_evolve — deterministic skill minting from recurring success patterns
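
For reviewers unfamiliar with the "token-bag cosine" measure named for find_clones, here is a minimal sketch of the general technique. It is not the code in this PR: the function names `token_bag` and `cosine` and the tokenization rule (split on anything that is not alphanumeric or `_`) are assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Count identifier-like tokens in a source snippet (a "token bag").
/// Tokenization rule is an assumption: split on any char that is not
/// alphanumeric or '_'.
fn token_bag(src: &str) -> HashMap<String, u32> {
    let mut bag = HashMap::new();
    for tok in src
        .split(|c: char| !(c.is_alphanumeric() || c == '_'))
        .filter(|t| !t.is_empty())
    {
        *bag.entry(tok.to_string()).or_insert(0) += 1;
    }
    bag
}

/// Cosine similarity between two token bags; 0.0 if either bag is empty.
fn cosine(a: &HashMap<String, u32>, b: &HashMap<String, u32>) -> f64 {
    // Dot product over tokens present in both bags.
    let dot: f64 = a
        .iter()
        .filter_map(|(tok, &n)| b.get(tok).map(|&m| n as f64 * m as f64))
        .sum();
    let norm = |bag: &HashMap<String, u32>| {
        bag.values().map(|&n| (n as f64).powi(2)).sum::<f64>().sqrt()
    };
    let denom = norm(a) * norm(b);
    if denom == 0.0 { 0.0 } else { dot / denom }
}

fn main() {
    // Hypothetical function bodies, used only to exercise the measure.
    let f1 = "fn add(a: i32, b: i32) -> i32 { let sum = a + b; sum }";
    let f2 = "fn plus(a: i32, b: i32) -> i32 { let sum = a + b; sum }";
    let f3 = "fn greet(name: &str) { println!(\"hello {}\", name); }";
    let near = cosine(&token_bag(f1), &token_bag(f2));
    let far = cosine(&token_bag(f1), &token_bag(f3));
    println!("near-duplicate: {near:.2}, unrelated: {far:.2}");
}
```

Because a bag ignores token order, two bodies that differ only by a renamed function score close to 1.0, which is what makes the measure suitable for near-duplicate detection; a real tool would presumably apply a similarity threshold, which is not specified here.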

Behavior / hot paths:

  • loop-detection cold-start exploration grace
  • Anthropic prompt cache anchored on the last stable (assistant) message
  • per-server MCP circuit breaker
  • RRF + MMR diversity re-rank in memory recall
  • per-segment shell grant keys in the session approval cache
  • conversational approve-over-chat (Telegram) + reject-with-feedback
  • declarative file-defined crews (dependency-wave DAG + typed handoffs)
  • macOS Seatbelt (sandbox-exec) shell backend
  • typed llm:delta → MessagePartial at the client edge
  • council raised 3 → 4 members (MAX_COUNCILORS + tier clamp)
  • new tools wired into tool_routing groups
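
As background for the "RRF + MMR diversity re-rank" item, the sketch below shows the two techniques in their textbook form: Reciprocal Rank Fusion merges several ranked lists, then Maximal Marginal Relevance picks results that are relevant but not redundant. It is illustrative only and not taken from this PR. The memory ids, the sample texts, the Jaccard similarity, the constant k = 60, and the normalization of fused scores by their maximum are all assumptions.

```rust
use std::collections::{HashMap, HashSet};

/// Reciprocal Rank Fusion: every list adds 1 / (k + rank) per item (rank is 1-based).
fn rrf(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (i, id) in ranking.iter().enumerate() {
            *scores.entry(id.to_string()).or_insert(0.0) += 1.0 / (k + (i + 1) as f64);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Best score first; ties broken by id so the order is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}

/// Word-set Jaccard overlap (a stand-in similarity, assumed for this sketch).
fn jaccard(a: &str, b: &str) -> f64 {
    let sa: HashSet<&str> = a.split_whitespace().collect();
    let sb: HashSet<&str> = b.split_whitespace().collect();
    let union = sa.union(&sb).count();
    if union == 0 { 0.0 } else { sa.intersection(&sb).count() as f64 / union as f64 }
}

/// Maximal Marginal Relevance: greedily pick the candidate that maximizes
/// lambda * relevance - (1 - lambda) * (max similarity to already-picked items).
fn mmr(fused: &[(String, f64)], texts: &HashMap<&str, &str>, lambda: f64, n: usize) -> Vec<String> {
    // Normalize relevance to [0, 1] so it is comparable with the similarity term.
    let max_score = fused.iter().map(|(_, s)| *s).fold(0.0_f64, f64::max);
    let mut remaining: Vec<(&str, f64)> = fused
        .iter()
        .map(|(id, s)| (id.as_str(), if max_score > 0.0 { s / max_score } else { 0.0 }))
        .collect();
    let mut picked: Vec<String> = Vec::new();
    while picked.len() < n && !remaining.is_empty() {
        let mut best_idx = 0;
        let mut best_val = f64::NEG_INFINITY;
        for (i, &(id, rel)) in remaining.iter().enumerate() {
            let text = texts.get(id).copied().unwrap_or("");
            let redundancy = picked
                .iter()
                .map(|p| jaccard(text, texts.get(p.as_str()).copied().unwrap_or("")))
                .fold(0.0_f64, f64::max);
            let val = lambda * rel - (1.0 - lambda) * redundancy;
            if val > best_val {
                best_val = val;
                best_idx = i;
            }
        }
        picked.push(remaining.remove(best_idx).0.to_string());
    }
    picked
}

fn main() {
    // Two hypothetical retrievers ranking hypothetical memory ids.
    let lexical = vec!["m1", "m2", "m3"];
    let vector = vec!["m1", "m3", "m4"];
    let fused = rrf(&[lexical, vector], 60.0);
    let texts: HashMap<&str, &str> = HashMap::from([
        ("m1", "deploy uses blue green rollout"),
        ("m2", "tests run under tmpdir"),
        ("m3", "deploy uses blue green rollout strategy"),
        ("m4", "council has four members"),
    ]);
    let picked = mmr(&fused, &texts, 0.5, 2);
    println!("fused order: {fused:?}");
    println!("after MMR:   {picked:?}");
}
```

In this toy input, m3 ranks second after fusion but is nearly identical to m1, so the MMR pass prefers a less redundant memory for the second slot. That trade-off is controlled by `lambda`: 1.0 reduces to pure relevance ordering, lower values favor diversity.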

Harnesses:

  • 5 new behavioral eval scenarios that assert real on-disk effects
  • eval/fromscratch/ — held-out-oracle capability harness (ported)

Verification

  • cargo build --workspace clean, 0 warnings on changed crates.
  • Unit suite: 1717 passed (2 failures are a sandbox artifact — /tmp is itself a git repo in CI; they pass under a non-git TMPDIR).
  • Behavioral eval: 16/16 through the real agent loop (scripted LLM). The 5 new scenarios assert real file effects (ast_edit rewrites a file, repo_wiki writes the doc, etc.) — this is what caught a real routing bug where the new tools were registered but never offered to the model.
  • From-scratch harness, run live: deepseek-v4-flash built a complete bencode encoder/decoder from a spec; the held-out acceptance test passed 5/5.
  • Live multi-model verification: executor deepseek-v4-flash + a 4-model council (deepseek-v4-pro, z-ai/glm-5.1, minimax/minimax-m3, moonshotai/kimi-k2.6) — all confirmed responding (authoritative: cost ledger + council_positions table). ~$0.14 total.

Notes for the reviewer

  • Council cap 3 → 4 is included intentionally: one extra member on top of three means roughly 33% more cost per consult. Easy to revert to config-only if undesired.
  • Not yet exercised end-to-end (unit-tested + isolated, low blast radius): fork, declarative crews, watch, expand_context, conversational approvals. Prompt-cache anchoring and MMR recall are hot-path changes validated by structure/unit tests but not against live external behavior.
  • Adds 3 dependencies (tree-sitter, tree-sitter-rust, streaming-iterator) for ast_edit.

Test plan

  • cargo build --workspace
  • cargo test --workspace (engine 1717 pass; 2 env-only)
  • make eval → 16/16
  • eval/fromscratch/run.sh bencode (live) → 5/5 held-out
  • live executor + 4-model council run
Merge pull request 'update main' (#83) from development into main
All checks were successful
Build Linux / build-linux (push) Successful in 12m16s
Verify / verify (push) Successful in 38m10s
7da7d6f587
Reviewed-on: #83
chore: build on main — hero_lifecycle factor-out + herolib_openrpc, CI 1.96
All checks were successful
Build Linux / build-linux (push) Successful in 4m59s
Verify / verify (push) Successful in 32m12s
5644285cce
feat: port Tier 1/2/3 learnings from xmoncode/shrimp
Some checks failed
Verify / verify (push) Failing after 21s
bf6c279992
Adapts a batch of ideas from xmonader's personal `shrimp` agent into
hero_shrimp, each wired into the engine/runtime/CLI and unit-tested.
Workspace builds clean (0 warnings); behavioral eval 16/16; live-verified
against deepseek-v4-flash (executor) + a 4-model council (deepseek-v4-pro,
z-ai/glm-5.1, minimax/minimax-m3, moonshotai/kimi-k2.6).

New tools (registered + routed):
- repo_wiki        drift-tracked ARCHITECTURE.md from the repo map
- find_clones      near-dup function bodies (token-bag cosine)
- impacted_tests   tests depending on changed files (blast_radius)
- ast_edit         tree-sitter symbol replacement (Rust) + rollback gate
- expand_context   retrieve full elided tool output on demand
- fork             best-of-N candidate race in git worktrees
- mcp_search       BM25 ranking + name-resolve over MCP tools
- skill_evolve     deterministic skill minting from success patterns

Behavior / hot paths:
- loop-detection cold-start exploration grace
- Anthropic prompt cache anchored on the last stable (assistant) message
- per-server MCP circuit breaker
- RRF+MMR diversity re-rank in memory recall
- per-segment shell grant keys in the session approval cache
- conversational approve-over-chat (Telegram) + reject-with-feedback
- declarative file-defined crews (dependency-wave DAG + typed handoffs)
- macOS Seatbelt (sandbox-exec) shell backend
- typed llm:delta -> MessagePartial at the client edge
- council raised 3 -> 4 members (MAX_COUNCILORS + tier clamp)
- new tools wired into tool_routing groups

Harnesses:
- 5 new behavioral eval scenarios that assert real on-disk effects
- eval/fromscratch/ held-out-oracle capability harness (ported)

Docs: docs/ports-from-xmoncode-shrimp.md

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
thabeta changed title from port-from-xmoncode-shrimp to Port Tier 1/2/3 learnings from xmoncode/shrimp 2026-06-04 23:23:31 +00:00
This pull request can be merged automatically.
This branch is out-of-date with the base branch
Reference
lhumina_code/hero_shrimp!90