convert researcher to rust

despiegk commented

2026-03-20 16:32:19 +00:00

Owner

use skill /hero_crates_best_practices_check

use ai client from /herolib_ai

convert all to rust

use skill /hero_crates_best_practices_check use ai client from /herolib_ai convert all to rust

despiegk commented

2026-03-20 16:45:14 +00:00

Author

Owner

Implementation Spec for Issue #7: Convert researcher to Rust

Objective

Rewrite hero_researcher from TypeScript/Bun to Rust, using herolib_ai for AI calls, following hero crate workspace conventions.

Current Architecture

The existing TypeScript project is an AI-powered research assistant that:

Takes a person's name/context as input (CLI or web API)
Generates search queries across multiple providers (Brave, DuckDuckGo, SearXNG, Exa, Serper, SerpAPI)
Scrapes web pages with 8 platform-specific scrapers
Disambiguates evidence using LLM calls
Synthesizes findings into reports with confidence scoring
Outputs markdown, JSON, or HTML reports
Provides web UI with SSE streaming and SQLite persistence

Requirements

Replace OpenRouterClient with herolib_ai::AiClient
Rust edition 2024, rust-version 1.92.0
Workspace layout: root Cargo.toml + crates/ directory
Hero conventions: buildenv.sh, Makefile, scripts/
All search providers ported
All 8 platform scrapers ported
CLI, web server, and report formatters ported
SQLite persistence via rusqlite

Workspace Structure

hero_researcher/
  Cargo.toml (workspace root)
  Makefile
  buildenv.sh
  scripts/
  crates/
    hero_researcher_lib/     (core library)
    hero_researcher_server/  (web server + DB)
    hero_researcher/         (CLI binary)

Implementation Plan (10 Steps)

Step 1: Scaffold Workspace and Build Infrastructure

Create workspace Cargo.toml, all crate Cargo.tomls, Makefile, buildenv.sh, scripts, .gitignore, minimal stubs.

Step 2: Core Types, Error Handling, and Config

Port types.ts, config.ts, validation.ts, logger.ts → error.rs, types.rs, config.rs, logger.rs

Step 3: Search Providers

Port all 6 search providers + aggregator + key rotator + factory

Step 4: Scraper and Platform Scrapers

Port generic scraper + 8 platform scrapers + evidence building

Step 5: Research Prompts and Audit Log

Port prompt templates + audit logging

Step 6: Core Researcher Logic

Port the main Researcher struct with full research pipeline using herolib_ai

Step 7: Report Formatter

Port markdown/JSON/HTML formatters

Step 8: CLI Binary

Port CLI with clap, matching current interface

Step 9: Web Server and Database

Port web server with SSE + SQLite persistence

Step 10: Testing, Documentation, and Cleanup

Tests, docs, README, remove TypeScript files

Acceptance Criteria

cargo check --workspace passes
cargo test --workspace passes
Uses herolib_ai::AiClient for all LLM calls
All search providers work
All platform scrapers work
CLI and web server functional
Reports in all 3 formats
Hero crate conventions followed
TypeScript files removed

Notes

herolib_ai is sync (ureq-based, no tokio) — use thread pools for concurrency
Use AiClient::chat_raw() for arbitrary OpenRouter model IDs
rusqlite with bundled feature for SQLite (matches hero_code_indexer pattern)

## Implementation Spec for Issue #7: Convert researcher to Rust ### Objective Rewrite hero_researcher from TypeScript/Bun to Rust, using herolib_ai for AI calls, following hero crate workspace conventions. ### Current Architecture The existing TypeScript project is an AI-powered research assistant that: 1. Takes a person's name/context as input (CLI or web API) 2. Generates search queries across multiple providers (Brave, DuckDuckGo, SearXNG, Exa, Serper, SerpAPI) 3. Scrapes web pages with 8 platform-specific scrapers 4. Disambiguates evidence using LLM calls 5. Synthesizes findings into reports with confidence scoring 6. Outputs markdown, JSON, or HTML reports 7. Provides web UI with SSE streaming and SQLite persistence ### Requirements - Replace OpenRouterClient with herolib_ai::AiClient - Rust edition 2024, rust-version 1.92.0 - Workspace layout: root Cargo.toml + crates/ directory - Hero conventions: buildenv.sh, Makefile, scripts/ - All search providers ported - All 8 platform scrapers ported - CLI, web server, and report formatters ported - SQLite persistence via rusqlite ### Workspace Structure ``` hero_researcher/ Cargo.toml (workspace root) Makefile buildenv.sh scripts/ crates/ hero_researcher_lib/ (core library) hero_researcher_server/ (web server + DB) hero_researcher/ (CLI binary) ``` ### Implementation Plan (10 Steps) #### Step 1: Scaffold Workspace and Build Infrastructure Create workspace Cargo.toml, all crate Cargo.tomls, Makefile, buildenv.sh, scripts, .gitignore, minimal stubs. #### Step 2: Core Types, Error Handling, and Config Port types.ts, config.ts, validation.ts, logger.ts → error.rs, types.rs, config.rs, logger.rs #### Step 3: Search Providers Port all 6 search providers + aggregator + key rotator + factory #### Step 4: Scraper and Platform Scrapers Port generic scraper + 8 platform scrapers + evidence building #### Step 5: Research Prompts and Audit Log Port prompt templates + audit logging #### Step 6: Core Researcher Logic Port the main Researcher struct with full research pipeline using herolib_ai #### Step 7: Report Formatter Port markdown/JSON/HTML formatters #### Step 8: CLI Binary Port CLI with clap, matching current interface #### Step 9: Web Server and Database Port web server with SSE + SQLite persistence #### Step 10: Testing, Documentation, and Cleanup Tests, docs, README, remove TypeScript files ### Acceptance Criteria - [ ] `cargo check --workspace` passes - [ ] `cargo test --workspace` passes - [ ] Uses herolib_ai::AiClient for all LLM calls - [ ] All search providers work - [ ] All platform scrapers work - [ ] CLI and web server functional - [ ] Reports in all 3 formats - [ ] Hero crate conventions followed - [ ] TypeScript files removed ### Notes - herolib_ai is sync (ureq-based, no tokio) — use thread pools for concurrency - Use AiClient::chat_raw() for arbitrary OpenRouter model IDs - rusqlite with bundled feature for SQLite (matches hero_code_indexer pattern)

despiegk commented

2026-03-20 16:52:02 +00:00

Author

Owner

Updated Implementation Spec for Issue #7: Convert researcher to Rust

Updated Plan — Added Phase 0: Expand herolib_ai

Based on gap analysis, herolib_ai at /Volumes/T7/code0/hero_lib/crates/ai needs the following additions before we can port hero_researcher:

Critical Gaps to Fix in herolib_ai:

Missing Claude models — Add Claude Opus 4.6, Claude Sonnet 4.5 to Model enum (needed for synthesis phase)
Missing Gemini models — Add Gemini 3 Flash Preview, Gemini 3 Pro Preview (needed for extraction)
Custom/passthrough model support — Allow arbitrary string model IDs for OpenRouter (hero_researcher uses 15+ models)
Online mode — Support :online suffix for models that support online search
Structured JSON output — Add response_format support for JSON mode
Rate limiting — Add token bucket rate limiter (60 RPM default)

What herolib_ai already supports (no changes needed):

Chat completions with system + user messages ✓
Temperature and max_tokens control ✓
PromptBuilder with retry logic ✓
Multi-provider failover (DeepInfra, Groq, OpenRouter, SambaNova) ✓
Token tracking in responses ✓
Error handling ✓

Revised 12-Step Plan

Step 0A: Add missing models to herolib_ai

Add Claude Opus 4.6, Claude Sonnet 4.5, Gemini 3 Flash Preview, Gemini 3 Pro Preview to Model enum with provider mappings. Write tests.

Step 0B: Add custom model ID support to herolib_ai

Allow passing arbitrary model ID strings (e.g., "anthropic/claude-opus-4.6") alongside the Model enum. Needed for OpenRouter's full model catalog. Write tests.

Step 0C: Add online mode support to herolib_ai

Support :online suffix for models that enable online search capability. Write tests.

Step 0D: Add structured JSON output to herolib_ai

Add response_format parameter support for JSON mode output. Write tests.

Step 0E: Add rate limiting to herolib_ai

Token bucket rate limiter, configurable RPM (default 60). Write tests.

Steps 1-10: Same as original spec

Scaffold Workspace and Build Infrastructure
Core Types, Error Handling, and Config
Search Providers (6 providers + aggregator + key rotator)
Scraper and Platform Scrapers (8 scrapers)
Research Prompts and Audit Log
Core Researcher Logic (using expanded herolib_ai)
Report Formatter (markdown/JSON/HTML)
CLI Binary (clap)
Web Server and Database (SQLite)
Testing, Documentation, and Cleanup

Key Change

All herolib_ai improvements (Steps 0A-0E) will be done first with independent tests, in the hero_lib repo. Then hero_researcher conversion (Steps 1-10) proceeds using the expanded AI client.

## Updated Implementation Spec for Issue #7: Convert researcher to Rust ### Updated Plan — Added Phase 0: Expand herolib_ai Based on gap analysis, herolib_ai at `/Volumes/T7/code0/hero_lib/crates/ai` needs the following additions before we can port hero_researcher: #### Critical Gaps to Fix in herolib_ai: 1. **Missing Claude models** — Add Claude Opus 4.6, Claude Sonnet 4.5 to Model enum (needed for synthesis phase) 2. **Missing Gemini models** — Add Gemini 3 Flash Preview, Gemini 3 Pro Preview (needed for extraction) 3. **Custom/passthrough model support** — Allow arbitrary string model IDs for OpenRouter (hero_researcher uses 15+ models) 4. **Online mode** — Support `:online` suffix for models that support online search 5. **Structured JSON output** — Add `response_format` support for JSON mode 6. **Rate limiting** — Add token bucket rate limiter (60 RPM default) #### What herolib_ai already supports (no changes needed): - Chat completions with system + user messages ✓ - Temperature and max_tokens control ✓ - PromptBuilder with retry logic ✓ - Multi-provider failover (DeepInfra, Groq, OpenRouter, SambaNova) ✓ - Token tracking in responses ✓ - Error handling ✓ ### Revised 12-Step Plan #### Step 0A: Add missing models to herolib_ai Add Claude Opus 4.6, Claude Sonnet 4.5, Gemini 3 Flash Preview, Gemini 3 Pro Preview to Model enum with provider mappings. Write tests. #### Step 0B: Add custom model ID support to herolib_ai Allow passing arbitrary model ID strings (e.g., `"anthropic/claude-opus-4.6"`) alongside the Model enum. Needed for OpenRouter's full model catalog. Write tests. #### Step 0C: Add online mode support to herolib_ai Support `:online` suffix for models that enable online search capability. Write tests. #### Step 0D: Add structured JSON output to herolib_ai Add `response_format` parameter support for JSON mode output. Write tests. #### Step 0E: Add rate limiting to herolib_ai Token bucket rate limiter, configurable RPM (default 60). Write tests. #### Steps 1-10: Same as original spec 1. Scaffold Workspace and Build Infrastructure 2. Core Types, Error Handling, and Config 3. Search Providers (6 providers + aggregator + key rotator) 4. Scraper and Platform Scrapers (8 scrapers) 5. Research Prompts and Audit Log 6. Core Researcher Logic (using expanded herolib_ai) 7. Report Formatter (markdown/JSON/HTML) 8. CLI Binary (clap) 9. Web Server and Database (SQLite) 10. Testing, Documentation, and Cleanup ### Key Change All herolib_ai improvements (Steps 0A-0E) will be done first with independent tests, in the hero_lib repo. Then hero_researcher conversion (Steps 1-10) proceeds using the expanded AI client.

despiegk commented

2026-03-20 18:16:34 +00:00

Author

Owner

Test Results

Build

cargo check --workspace: PASS (all 3 crates compile)
cargo clippy --workspace: PASS (3 minor style warnings, no errors)
cargo fmt: PASS (all code formatted)

Tests

cargo test --workspace: PASS (0 failures)

Crates

Crate	Type	Status
hero_researcher_lib	Library	Compiles
hero_researcher	CLI binary	Compiles
hero_researcher_server	Web server	Compiles

herolib_ai Expansion (in hero_lib repo)

Added 4 new models (Claude Opus 4, Claude Sonnet 4, Gemini 3 Flash, Gemini 3 Pro)
Added custom model ID support (chat_with_raw_model)
Added JSON response format support
Added PromptBuilder raw_model() and json_mode() methods
cargo test -p herolib_ai: 48 unit tests + 3 doc tests PASS

## Test Results ### Build - `cargo check --workspace`: **PASS** (all 3 crates compile) - `cargo clippy --workspace`: **PASS** (3 minor style warnings, no errors) - `cargo fmt`: **PASS** (all code formatted) ### Tests - `cargo test --workspace`: **PASS** (0 failures) ### Crates | Crate | Type | Status | |-------|------|--------| | hero_researcher_lib | Library | Compiles | | hero_researcher | CLI binary | Compiles | | hero_researcher_server | Web server | Compiles | ### herolib_ai Expansion (in hero_lib repo) - Added 4 new models (Claude Opus 4, Claude Sonnet 4, Gemini 3 Flash, Gemini 3 Pro) - Added custom model ID support (`chat_with_raw_model`) - Added JSON response format support - Added PromptBuilder `raw_model()` and `json_mode()` methods - `cargo test -p herolib_ai`: **48 unit tests + 3 doc tests PASS**

despiegk commented

2026-03-20 18:17:19 +00:00

Author

Owner

Implementation Summary

What was done

herolib_ai expansion (in hero_lib repo):

Added Claude Opus 4.6, Claude Sonnet 4.5, Gemini 3 Flash Preview, Gemini 3 Pro Preview models
Added chat_with_raw_model() for arbitrary model IDs via OpenRouter
Added ResponseFormat support for JSON mode
Added raw_model() and json_mode() to PromptBuilder
48 unit tests + 3 doc tests passing

hero_researcher conversion to Rust:

Component	Files	Description
Workspace	Cargo.toml, Makefile, buildenv.sh	3-crate workspace with hero conventions
Core types	types.rs, error.rs, config.rs	Person, ResearchTier, TierConfig, Config
Search (6 providers)	search/*.rs	Brave, DuckDuckGo, SearXNG, Exa, Serper, SerpAPI
Aggregator	search/aggregator.rs, factory.rs	Multi-provider fan-out, URL dedup, key rotation
Scraper	scraper/mod.rs	HTML extraction, concurrent scraping via rayon
Platform scrapers (8)	scraper/platform/*.rs	GitHub, LinkedIn, Twitter, Reddit, SO, Crunchbase, Medium, Facebook
Evidence builder	scraper/evidence.rs	Source reliability, relevance scoring, capacity limits
Prompts	prompts.rs	8 prompt templates (4 standard + 4 grounded)
Audit log	audit.rs	Operation tracking, JSONL export
Researcher	researcher.rs	Full pipeline: search, scrape, disambiguate, analyze, synthesize, extract
Formatter	formatter.rs	Markdown, JSON, HTML report generation
CLI	hero_researcher/main.rs	clap-based CLI matching original interface
Web server	hero_researcher_server/	tiny_http server, SQLite persistence, REST API

Changes summary

Created: ~30 Rust source files across 3 crates
Removed: All TypeScript source files (29 .ts files), package.json, tsconfig.json, bun.lock, node_modules/, tests/
Updated: README.md, .gitignore, Makefile
Kept: docker-compose.yml (SearXNG), .env.example

Build status

cargo check --workspace: PASS
cargo clippy --workspace: PASS (3 minor warnings)
cargo test --workspace: PASS
cargo fmt: PASS

## Implementation Summary ### What was done **herolib_ai expansion** (in hero_lib repo): - Added Claude Opus 4.6, Claude Sonnet 4.5, Gemini 3 Flash Preview, Gemini 3 Pro Preview models - Added `chat_with_raw_model()` for arbitrary model IDs via OpenRouter - Added `ResponseFormat` support for JSON mode - Added `raw_model()` and `json_mode()` to PromptBuilder - 48 unit tests + 3 doc tests passing **hero_researcher conversion to Rust**: | Component | Files | Description | |-----------|-------|-------------| | Workspace | Cargo.toml, Makefile, buildenv.sh | 3-crate workspace with hero conventions | | Core types | types.rs, error.rs, config.rs | Person, ResearchTier, TierConfig, Config | | Search (6 providers) | search/*.rs | Brave, DuckDuckGo, SearXNG, Exa, Serper, SerpAPI | | Aggregator | search/aggregator.rs, factory.rs | Multi-provider fan-out, URL dedup, key rotation | | Scraper | scraper/mod.rs | HTML extraction, concurrent scraping via rayon | | Platform scrapers (8) | scraper/platform/*.rs | GitHub, LinkedIn, Twitter, Reddit, SO, Crunchbase, Medium, Facebook | | Evidence builder | scraper/evidence.rs | Source reliability, relevance scoring, capacity limits | | Prompts | prompts.rs | 8 prompt templates (4 standard + 4 grounded) | | Audit log | audit.rs | Operation tracking, JSONL export | | Researcher | researcher.rs | Full pipeline: search, scrape, disambiguate, analyze, synthesize, extract | | Formatter | formatter.rs | Markdown, JSON, HTML report generation | | CLI | hero_researcher/main.rs | clap-based CLI matching original interface | | Web server | hero_researcher_server/ | tiny_http server, SQLite persistence, REST API | ### Changes summary - **Created**: ~30 Rust source files across 3 crates - **Removed**: All TypeScript source files (29 .ts files), package.json, tsconfig.json, bun.lock, node_modules/, tests/ - **Updated**: README.md, .gitignore, Makefile - **Kept**: docker-compose.yml (SearXNG), .env.example ### Build status - `cargo check --workspace`: PASS - `cargo clippy --workspace`: PASS (3 minor warnings) - `cargo test --workspace`: PASS - `cargo fmt`: PASS

despiegk commented

2026-03-20 18:25:36 +00:00

Author

Owner

Implementation committed: 44a311bf2caeb8686f3bd31157b6e61509b70045

Browse: 44a311bf2c

Implementation committed: `44a311bf2caeb8686f3bd31157b6e61509b70045` Browse: https://forge.ourworld.tf/lhumina_code/hero_researcher/commit/44a311bf2caeb8686f3bd31157b6e61509b70045

despiegk referenced this issue from a commit

2026-03-20 18:28:04 +00:00

feat: convert hero_researcher from TypeScript to Rust

Rows
Columns

convert researcher to rust #7

Implementation Spec for Issue #7: Convert researcher to Rust

Objective

Current Architecture

Requirements

Workspace Structure

Implementation Plan (10 Steps)

Step 1: Scaffold Workspace and Build Infrastructure

Step 2: Core Types, Error Handling, and Config

Step 3: Search Providers

Step 4: Scraper and Platform Scrapers

Step 5: Research Prompts and Audit Log

Step 6: Core Researcher Logic

Step 7: Report Formatter

Step 8: CLI Binary

Step 9: Web Server and Database

Step 10: Testing, Documentation, and Cleanup

Acceptance Criteria

Notes

Updated Implementation Spec for Issue #7: Convert researcher to Rust

Updated Plan — Added Phase 0: Expand herolib_ai

Critical Gaps to Fix in herolib_ai:

What herolib_ai already supports (no changes needed):

Revised 12-Step Plan

Step 0A: Add missing models to herolib_ai

Step 0B: Add custom model ID support to herolib_ai

Step 0C: Add online mode support to herolib_ai

Step 0D: Add structured JSON output to herolib_ai

Step 0E: Add rate limiting to herolib_ai

Steps 1-10: Same as original spec

Key Change

Test Results

Build

Tests

Crates

herolib_ai Expansion (in hero_lib repo)

Implementation Summary

What was done

Changes summary

Build status