[nu-demo] hero_agent tools payload blocks every LLM: 165 tools > 128 limit, dots in tool names violate regex, duplicate agent_run #153
Symptom
When a message is triaged to the Tools path (e.g. asking hero_agent to "list files in this workspace") and hero_agent sends the full registered tool list to the backing LLM, every one of 7 configured LLM targets rejects the request:
| Target | Error |
|---|---|
| claude-haiku-4.5 (via aibroker) | tools.60.custom.name: String should match pattern ^[a-zA-Z0-9_-]{1,128}$ |
| gpt-4o-mini (via aibroker) | Invalid 'tools': array too long... got an array with length 165 |
| claude-3.5-sonnet | No endpoints found |
| gemini-2.5-flash | Duplicate function declaration found: agent_run |
| gpt-4o-mini (OpenRouter direct) | |
| llama-3.3-70b-versatile | |
Result: every turn that requires a tool fails. The user sees an internal error, not an answer.
Root cause
Once hero_agent loads /home/driver/hero/var/agent/mcp.json pointing at hero_router's /mcp/:service endpoints (see sibling #153), each MCP service contributes its full OpenRPC method set. That payload has three classes of violations:
A. Naming regex
The 58+ MCP tools have names like:
- hero_osis.contact.list
- rpc.discover
- books.search
Anthropic requires ^[a-zA-Z0-9_-]{1,128}$, so dots are invalid. OpenAI has the same constraint. Hero's JSON-RPC method naming convention (namespace.verb) is fundamentally incompatible with tool-use naming rules.
B. Tool count
OpenAI has a hard limit of 128 tools per request (documented). Anthropic is softer but still pushes back past ~100. 165 is too many regardless.
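The two constraints in sections A and B can be checked before a request is sent. The following is a minimal sketch, not hero_agent code: `is_valid_tool_name` and `preflight` are hypothetical helpers, and the limits are the figures quoted in this issue.

```rust
// Limits as quoted in this issue; both constants are assumptions for the sketch.
const MAX_NAME_LEN: usize = 128;
const MAX_TOOLS: usize = 128;

/// Hand-written equivalent of `^[a-zA-Z0-9_-]{1,128}$` (no regex crate needed).
fn is_valid_tool_name(name: &str) -> bool {
    !name.is_empty()
        && name.len() <= MAX_NAME_LEN
        && name
            .bytes()
            .all(|b| b.is_ascii_alphanumeric() || b == b'_' || b == b'-')
}

/// Hypothetical pre-flight check: returns one message per violated constraint.
fn preflight(names: &[&str]) -> Vec<String> {
    let mut problems = Vec::new();
    if names.len() > MAX_TOOLS {
        problems.push(format!(
            "{} tools exceeds the limit of {}",
            names.len(),
            MAX_TOOLS
        ));
    }
    for (i, name) in names.iter().enumerate() {
        if !is_valid_tool_name(name) {
            problems.push(format!("tools.{i}.name {name:?} does not match the pattern"));
        }
    }
    problems
}

fn main() {
    let names = ["hero_osis.contact.list", "rpc.discover", "agent_run"];
    for problem in preflight(&names) {
        println!("{problem}");
    }
}
```

Running a check like this on the agent side would surface the problem once, with all offending names, instead of as a different provider error per target.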
C. Duplicates
The Gemini adapter emits agent_run twice, likely because two different registrations produce the same function name after Gemini's name-mangling step. The MCP round-trip probably creates the duplicate entries.
Proposed fixes
Short term (demo-blocker)
1. Sanitize tool names before emission. Apply in tool_router.rs when building the OpenAI/Anthropic payload. Preserve the original name internally for dispatch.
2. Cap at 128 with a deterministic ordering: always_include first, then service-tagged ones that match the conversation, then the rest (trimmed). Honor the model's actual limit from config.aibroker_models metadata.
3. Dedup by emitted name after step 1 (naming collisions should be reported, not silently dropped).
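The three short-term steps can be sketched together as follows. This is an illustration under stated assumptions, not the hero_agent implementation: the `Tool` struct, its fields, `emitted_name`, and `build_payload` are all hypothetical names, and the sanitizer here is the simplest one that satisfies the pattern.

```rust
use std::collections::HashSet;

/// Hypothetical tool record; the field names are illustrative only.
#[derive(Debug, Clone)]
struct Tool {
    original_name: String, // kept for dispatch back to the MCP server
    always_include: bool,
    matches_conversation: bool,
}

fn tool(name: &str, always_include: bool, matches_conversation: bool) -> Tool {
    Tool { original_name: name.to_string(), always_include, matches_conversation }
}

/// Step 1: map every character outside [a-zA-Z0-9_-] to '_' and cap the length.
fn emitted_name(original: &str) -> String {
    original
        .chars()
        .map(|c| if c.is_ascii_alphanumeric() || c == '_' || c == '-' { c } else { '_' })
        .take(128)
        .collect()
}

/// Priority bucket: always_include, then conversation matches, then the rest.
fn rank(t: &Tool) -> u8 {
    match (t.always_include, t.matches_conversation) {
        (true, _) => 0,
        (false, true) => 1,
        (false, false) => 2,
    }
}

/// Steps 2 and 3: deterministic order, dedup by emitted name, cap at `limit`.
/// Returns (emitted name, original name) pairs plus the reported collisions.
fn build_payload(mut tools: Vec<Tool>, limit: usize) -> (Vec<(String, String)>, Vec<String>) {
    // Ties are broken by name so the result does not depend on input order.
    tools.sort_by(|a, b| {
        rank(a).cmp(&rank(b)).then_with(|| a.original_name.cmp(&b.original_name))
    });

    let mut seen = HashSet::new();
    let mut kept = Vec::new();
    let mut collisions = Vec::new();
    for t in tools {
        let name = emitted_name(&t.original_name);
        if !seen.insert(name.clone()) {
            // Report the collision instead of dropping it silently.
            collisions.push(format!("{} collides on emitted name {}", t.original_name, name));
        } else if kept.len() < limit {
            kept.push((name, t.original_name));
        }
    }
    (kept, collisions)
}

fn main() {
    let tools = vec![
        tool("books.search", false, true),
        tool("agent_run", true, false),
        tool("books_search", false, false),
    ];
    let (kept, collisions) = build_payload(tools, 128);
    println!("{kept:?}");
    println!("{collisions:?}");
}
```

The `limit` parameter stands in for the per-model limit the issue asks to read from config.aibroker_models; the returned pairs keep the original name next to the emitted one so dispatch can use the real MCP method name.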
Medium term
1. Per-call tool selection via a matcher (same idea as the existing groups system but actually used): only offer tools whose name prefix or keywords match the user's message. 10-20 relevant tools is plenty.
2. Adapter-specific payloads: currently the llm_client pushes one payload shape to all providers. Each should get its own sanitization + limit.
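A per-call matcher of the kind described in item 1 could look like the sketch below. It is deliberately naive and assumes nothing about the existing groups system: `select_tools` is a hypothetical function that keeps a tool when any segment of its name appears as a word in the user's message.

```rust
/// Hypothetical selector: keep tools whose name segments (split on '.' and '_')
/// appear as whole words in the message, up to `max` tools.
fn select_tools<'a>(message: &str, tool_names: &[&'a str], max: usize) -> Vec<&'a str> {
    // Lowercased words of the user's message.
    let words: Vec<String> = message
        .split(|c: char| !c.is_ascii_alphanumeric())
        .filter(|w| !w.is_empty())
        .map(|w| w.to_ascii_lowercase())
        .collect();

    tool_names
        .iter()
        .copied()
        .filter(|name| {
            name.split(|c: char| c == '.' || c == '_')
                .filter(|seg| seg.len() > 2) // ignore very short segments
                .any(|seg| words.contains(&seg.to_ascii_lowercase()))
        })
        .take(max)
        .collect()
}

fn main() {
    let tools = ["books.search", "hero_osis.contact.list", "rpc.discover"];
    let selected = select_tools("search the books for rust", &tools, 20);
    println!("{selected:?}");
}
```

Exact word matching is crude: generic verbs such as list match many tools, and plurals do not match singular segments. A real matcher would likely weight the service prefix more heavily than the verb, but even this form keeps the payload far below the provider limits.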
Long term (proper)
Rename methods to tool-safe names (books__search_query instead of hero_books.search.query). Update hero_router/src/server/mcp.rs so OpenRPC-to-MCP-tool conversion uses underscores.
Diagnostic log excerpt
Related
Verification
After fix:
- Every emitted tool name matches the pattern (^[a-zA-Z0-9_-]{1,128}$)
- No duplicate name entries
- agent.chat({"message": "list files in /tmp", "model": "claude-haiku-4.5"}) returns a successful tool call + result
Signed-off-by: mik-tf
make demo target - provision + install + seed + verify a fresh Hero OS demo VM in one command #163
Demo hotfix applied 2026-04-24
hero_agent::mcp_client.rs patched on development_mik_nu_demo:
- sanitize_tool_name(raw) helper: replaces . with __ (so hero_osis_business.contact.list becomes hero_osis_business__contact__list) and maps other illegal characters to _. Names now match ^[a-zA-Z0-9_-]+$.
- McpTool gained original_name: Option<String> so calls back to MCP servers use the real name.
- parse_tools_response populates both names; call_tool resolves either form.
- agent.rs::route_tools already handles the count cap.
Verified: hero_agent_server rebuilt and running on herodemo. Tool name regex no longer rejects per-domain OSIS tools.
Prod-level fix needed
This sanitizer is the demo-time mitigation. The proper fix:
- Land on development (not just demo branch).
- mcp_tools.json whitelist so operators can pin core tools regardless of the relevance score.
- Hyphen (-) as separator (contact-list, not contact.list) per MCP spec recommendations.
Tracking the upstream merge separately.
Merged to hero_agent/development 2026-04-25
Squash commit be302ed on hero_agent development.
PR #8 has been closed by the merge. The MCP tool-name sanitizer is now upstream, so the demo VM hotfix is no longer needed once a fresh deploy uses development everywhere.
Signed-off-by: mik-tf
Follow-up CI fix merged (PR #9)
The initial PR #8 merge (be302ed) introduced three regressions caught by CI:
- cargo fmt violation: inline if-else expression exceeded line length
- Missing original_name in 13 McpTool { ... } initializers across tool_router.rs, semantic_router.rs (tests + routing helpers)
- clippy::while_let_on_iterator lint on sanitize_tool_name's loop
PR #9 (commits f9add73, 5cc3d24) addressed all three. hero_agent/development is at 5bea19b and CI is green.
The lesson recorded in memory: when adding a required field to a struct used across the codebase, run cargo build --workspace (not just the target binary) before merging. CI catches this but it should never have shipped.
Signed-off-by: mik-tf
Fixed in hero_agent commit e876a16 on development. All three failure modes from the issue body addressed.
A. Naming regex: new tool_router::sanitize_tool_name(name).
Applied at every emission site:
- tool_router::ToolRouter::build_schemas
- agent::Agent::build_schemas_for_names
Guarantees output matches ^[a-zA-Z0-9_-]{1,128}$ (regression test asserts against a bad-input matrix).
Examples:
| Input | Output |
|---|---|
| hero_osis.contact.list | hero_osis_contact_list |
| rpc.discover | rpc_discover |
| foo..bar (runs collapse) | foo_bar |
| hero_osis. (trailing trim) | hero_osis |
| over-long run of a chars (truncate) | 128 a chars |
B. Tool count cap + duplicate dedup: new tool_router::finalize_schemas(schemas, cap).
Wired into both build_schemas paths so every tool list emitted to an LLM is at most 128 entries with no duplicates. Collisions warn-and-drop (the issue body's requirement: "naming collisions should be reported, not silently dropped").
C. Dispatch reverse lookup: new tool_router::mcp_tool_for_sanitized(name, mcp_tools).
In agent::Agent::agent_loop, when the LLM echoes back a sanitized tool name, agent.rs reverse-looks-up the original McpTool.name (with dots) before calling mcp.call_tool. Falls back to the literal name if no match (e.g. tool cache changed mid-conversation, or it was a built-in tool; sanitization is a no-op for snake_case names).
Verification:
10 new tests:
- sanitize_replaces_dots_with_underscores
- sanitize_passes_already_clean_names
- sanitize_collapses_runs_of_invalid_chars
- sanitize_trims_trailing_underscores
- sanitize_truncates_at_128_bytes
- sanitize_matches_anthropic_regex (regression: every output must match the regex against a representative bad-input set)
- mcp_tool_for_sanitized_finds_original
- mcp_tool_for_sanitized_returns_none_for_unknown
- finalize_dedupes_by_name
- finalize_caps_at_provider_limit
cargo fmt --check, cargo check -p hero_agent, cargo clippy --all-targets -- -D warnings all clean. All 107 hero_agent lib tests pass (97 pre-existing + 10 new).
Note on the issue body's medium-/long-term suggestions:
- mcp_groups system; this fix doesn't change that path.
- tool_choice + sanitization handles every supported provider's hard constraints; adapter-specific tuning (e.g. Gemini-only quirks beyond name sanitization) can land separately when a specific incompatibility surfaces.
Meta-tracker: home#193.
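The sanitization rules from the examples in this comment, and the reverse lookup used at dispatch, can be sketched as below. The function bodies were not included in the thread, so this is a reconstruction from the listed behavior rather than the merged code: the signatures, the one-field `McpTool` stand-in, the `dispatch_name` helper, and the "_" fallback for an empty result are all assumptions.

```rust
/// Sketch of the listed rules: invalid characters become '_', runs collapse
/// to one '_', the result is cut to 128 bytes, trailing '_' is trimmed.
fn sanitize_tool_name(name: &str) -> String {
    let mut out = String::with_capacity(name.len());
    for c in name.chars() {
        if c.is_ascii_alphanumeric() || c == '_' || c == '-' {
            out.push(c);
        } else if !out.ends_with('_') {
            // Invalid character: emit a single '_' per run.
            out.push('_');
        }
    }
    out.truncate(128); // safe: `out` contains only ASCII at this point
    let trimmed = out.trim_end_matches('_');
    if trimmed.is_empty() {
        // Assumption: fall back to "_" so the output still matches the pattern.
        "_".to_string()
    } else {
        trimmed.to_string()
    }
}

/// Minimal stand-in for the real McpTool, which has more fields.
struct McpTool {
    name: String,
}

/// Find the MCP tool whose sanitized name equals the name the LLM echoed back.
fn mcp_tool_for_sanitized<'a>(name: &str, mcp_tools: &'a [McpTool]) -> Option<&'a McpTool> {
    mcp_tools.iter().find(|t| sanitize_tool_name(&t.name) == name)
}

/// Hypothetical dispatch helper: the original dotted name if known, else the literal name.
fn dispatch_name<'a>(name: &'a str, mcp_tools: &'a [McpTool]) -> &'a str {
    mcp_tool_for_sanitized(name, mcp_tools)
        .map(|t| t.name.as_str())
        .unwrap_or(name)
}

fn main() {
    let tools = vec![McpTool { name: "hero_osis.contact.list".to_string() }];
    let emitted = sanitize_tool_name(&tools[0].name);
    println!("{emitted} dispatches as {}", dispatch_name(&emitted, &tools));
}
```

Recomputing the sanitized name on every lookup keeps the sketch short; with many tools, a map from emitted name to original name built once per tool-cache refresh would avoid the repeated work.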
Signed-off-by: mik-tf
make demo target - provision + install + seed + verify a fresh Hero OS demo VM in one command #31