
Hero Books - Document Management System

A Rust-based document collection management system with CLI, library, and web interfaces. It processes markdown-based documentation and supports cross-collection references, link validation, and export to self-contained directories.

Project Structure

This is a single Rust package (hero_books) with multiple entry points:

  • Library (src/lib.rs) - Core functionality for document collection management
  • CLI (src/bin/books.rs) - Command-line interface (books_client binary)
  • Web Server (src/main.rs) - HTTP API and web interface (books_server binary)
  • Modules:
    • doctree - Document tree management and validation
    • website - Website configuration and metadata
    • ebook - Ebook parsing and configuration
    • web - HTTP server handlers and routes
    • cli - CLI command definitions and handlers

Module Documentation

  • DocTree Module - Complete document collection management system

    • Collection scanning and indexing
    • Link parsing and validation
    • Include directive processing
    • Access control management
    • Export to self-contained directories
    • Read-only client API
  • Website Module - Metadata-driven website definitions

    • Website configuration (Docusaurus-style)
    • Navigation bar with dropdown menus
    • Sidebar navigation with multiple sidebars
    • Footer with link columns
    • Page-level metadata and SEO
    • Theme and styling configuration
    • Social links and custom fields
  • Ontology Module - AI-powered semantic extraction

    • Document classification against 10 topic ontologies
    • Semantic concept and relationship extraction
    • Relationship validation with self-correction
    • Embedded ontologies (no external files needed)
    • Chunking support for large documents

Quick Start

# Build the project
make build

# Run the web server
make run

# Run the CLI
make run-cli

# Run in development mode with debug logging
make dev

# See all available commands
make help

Features

  • Collection scanning: Automatically discover collections marked with .collection files
  • Cross-collection references: Link between pages in different collections using collection:page syntax
  • Include directives: Embed content from other pages with !!include collection:page
  • Link validation: Detect broken links to pages, images, and files
  • Export: Generate self-contained directories with all dependencies
  • Access control: Group-based ACL via .group files
  • Git integration: Automatically detect repository URLs

Installation

Build from source

make build

Binaries will be at:

  • target/release/books_client - CLI client
  • target/release/books_server - Web server

Install to PATH

make install

Installs both binaries to ~/hero/bin/:

  • ~/hero/bin/books_client - CLI client
  • ~/hero/bin/books_server - Web server

Development build install (fastest compile):

make installdev

Ensure ~/hero/bin is in your PATH. Add to ~/.bashrc or ~/.zshrc:

export PATH="$HOME/hero/bin:$PATH"

Architecture & Concepts

Separation of Concerns

Hero Books separates content from presentation:

DocTree (Content):

  • Manages markdown collections and pages
  • Validates links and references
  • Tracks files and images
  • Processes include directives
  • Enforces access control

Website (Presentation):

  • Defines navigation structure
  • Configures sidebars and menus
  • Manages theming and styling
  • Handles SEO metadata
  • Provides plugin architecture

This separation allows flexible website layouts without changing content.

Key Concepts

Collections: Directories of markdown pages marked with .collection file

  • Each collection is independently managed
  • Collections can reference each other
  • Access control per collection via ACL files

Pages: Individual markdown files with:

  • Extracted title (from H1 heading)
  • Description (from first paragraph)
  • Parsed internal links and includes
  • Optional front matter metadata

Links: References to pages, images, or files:

  • Same collection: [text](page_name)
  • Cross-collection: [text](collection:page)
  • External: Automatic detection of HTTP(S) URLs
  • Images: Identified by extension
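The rules above can be sketched as a small classifier. The enum and function names here are illustrative, not the crate's actual API:

```rust
/// Illustrative link categories, mirroring the rules above.
#[derive(Debug, PartialEq)]
enum LinkKind {
    External,
    Image,
    CrossCollection { collection: String, page: String },
    SameCollection,
}

const IMAGE_EXTS: &[&str] = &["png", "jpg", "jpeg", "gif", "svg", "webp", "bmp", "tiff", "ico"];

fn classify_link(target: &str) -> LinkKind {
    // External: HTTP(S) URLs are detected automatically.
    if target.starts_with("http://") || target.starts_with("https://") {
        return LinkKind::External;
    }
    // Images: identified by extension (last dot-separated segment).
    if let Some(ext) = target.rsplit('.').next() {
        if IMAGE_EXTS.contains(&ext.to_lowercase().as_str()) {
            return LinkKind::Image;
        }
    }
    // Cross-collection: `collection:page`.
    if let Some((collection, page)) = target.split_once(':') {
        return LinkKind::CrossCollection {
            collection: collection.to_string(),
            page: page.to_string(),
        };
    }
    LinkKind::SameCollection
}
```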

Groups: Access control lists defining user membership

  • Grant read/write access to collections
  • Support wildcards for email patterns
  • Support group inclusion (nested groups)

Export: Self-contained read-only directory:

  • Pages and files organized by collection
  • JSON metadata for each collection
  • Suitable for static hosting or archival

Data Flow

Directory Scan
    ↓
Find Collections (.collection files)
    ↓
Parse Pages (extract metadata, parse links)
    ↓
Validate Links (check references exist)
    ↓
Process Includes (expand !!include directives)
    ↓
Enforce ACL (check group membership)
    ↓
Export (write to structured directory)
    ↓
Read Client (query exported collections)

Refactoring Notes

This project was refactored from a multi-package workspace into a single unified package following Rust best practices:

Previous Structure (workspace with 3 crates):

  • lib/ - atlas-lib library
  • atlas/ - atlas CLI binary
  • web/ - atlas-web server

New Structure (single package with multiple binaries):

  • Single hero_books package in Cargo.toml
  • Two binaries via src/bin/:
    • cli.rs → books_cli binary (legacy)
    • books.rs → books_client binary
  • Main server binary: src/main.rs → books_server
  • Modular organization in src/:
    • cli/ - CLI command definitions
    • doctree/ - Core document management
    • ebook/ - Ebook parsing
    • web/ - HTTP server handlers
    • website/ - Website configuration

Benefits:

  • Simpler dependency management
  • Unified build system
  • Easier code sharing between CLI and web server
  • Cleaner project organization
  • Aligned with Rust conventions for monolithic applications

CLI Usage

The books_client CLI talks to a running books_server via OpenRPC.

Start the server first

books_server --port 9567 --books-dir /path/to/docs

Scan for collections

# Scan a local path (server must have access)
books_client scan --path /path/to/docs

# Scan from git repository
books_client scan --git-url https://github.com/user/docs.git

List and inspect collections

# List all collections
books_client list

# Get collection details
books_client get my-collection

# Get all pages in a collection
books_client get-pages my-collection

# Get a specific page
books_client get-page my-collection page-name

Process collections

# Process for Q&A extraction and embeddings
books_client process my-collection

# Force reprocessing
books_client process my-collection --force

Metadata management

# Get collection metadata
books_client get-metadata my-collection

# Set collection metadata
books_client set-metadata my-collection --json '{"key": "value"}'

Server health

# Check server health
books_client health

# View OpenRPC schema
books_client discover

Directory Structure

Source Structure

docs/
├── collection1/
│   ├── .collection           # Marks as collection (optional: name:custom_name)
│   ├── read.acl              # Optional: group names for read access
│   ├── write.acl             # Optional: group names for write access
│   ├── page1.md
│   ├── subdir/
│   │   └── page2.md
│   └── img/
│       └── logo.png
├── collection2/
│   ├── .collection
│   └── intro.md
└── groups/                   # Special collection for ACL groups
    ├── .collection
    ├── admins.group
    └── editors.group

Export Structure

/tmp/books/
├── content/
│   └── collection_name/
│       ├── page1.md          # Pages at root of collection dir
│       ├── page2.md
│       ├── img/              # All images in img/ subdirectory
│       │   └── logo.png
│       └── files/            # All other files in files/ subdirectory
│           └── document.pdf
└── meta/
    └── collection_name.json  # Collection metadata

File Formats

.collection

name:custom_collection_name

If the file is empty or no name is specified, the directory name is used.

.group

// Comments start with //
user@example.com
*@company.com
include:other_group

ACL files (read.acl, write.acl)

admins
editors

One group name per line.
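A membership check over these formats might look like the following sketch. `line_matches`, `is_member`, and the group-resolver signature are all illustrative, not the project's real ACL API:

```rust
/// Sketch: does a single `.group` line grant access to `email`?
fn line_matches(line: &str, email: &str) -> bool {
    let line = line.trim();
    if line.is_empty() || line.starts_with("//") {
        return false; // comments and blank lines never match
    }
    if let Some(domain) = line.strip_prefix("*@") {
        // Wildcard: matches any user at the given domain.
        return email.ends_with(&format!("@{domain}"));
    }
    line == email
}

/// Sketch: check membership, recursing into `include:` lines via a
/// caller-supplied resolver that loads another group's lines by name.
fn is_member(group_lines: &[&str], email: &str, resolve: &dyn Fn(&str) -> Vec<String>) -> bool {
    group_lines.iter().any(|line| {
        if let Some(other) = line.trim().strip_prefix("include:") {
            let nested = resolve(other);
            let nested_refs: Vec<&str> = nested.iter().map(String::as_str).collect();
            is_member(&nested_refs, email, resolve)
        } else {
            line_matches(line, email)
        }
    })
}
```

A real implementation would also guard against include cycles.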

Links

[text](page_name)                  # Same collection
[text](collection:page)            # Cross-collection
![alt](img/image.png)              # Same collection
![alt](collection:img/image.png)   # Cross-collection
Include directives

!!include page_name
!!include collection:page_name
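A flat (non-recursive) expansion of these directives could be sketched as follows. The `expand_includes` function and `lookup` callback are illustrative; the real processor may handle nesting and error reporting differently:

```rust
/// Sketch: replace each `!!include <target>` line with the included
/// page's content. `lookup` maps "page" or "collection:page" to content.
fn expand_includes(source: &str, lookup: &dyn Fn(&str) -> Option<String>) -> String {
    source
        .lines()
        .map(|line| {
            if let Some(target) = line.trim().strip_prefix("!!include ") {
                // Unresolved targets are left as-is for later validation.
                lookup(target.trim()).unwrap_or_else(|| line.to_string())
            } else {
                line.to_string()
            }
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```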

Name Normalization

Page and collection names are normalized:

  1. Convert to lowercase
  2. Replace - with _
  3. Replace / with _
  4. Remove .md extension
  5. Strip numeric prefix (e.g., 03_page → page)
  6. Remove special characters
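As a sketch, the steps above might be implemented like this; the function name is illustrative and the crate's actual rules may differ in edge cases:

```rust
/// Sketch of the normalization steps listed above.
fn normalize_name(name: &str) -> String {
    // Steps 1-4: lowercase, unify separators, drop the .md extension.
    let mut s = name.to_lowercase().replace('-', "_").replace('/', "_");
    if let Some(stripped) = s.strip_suffix(".md") {
        s = stripped.to_string();
    }
    // Step 5: strip a purely numeric prefix such as "03_".
    if let Some((prefix, rest)) = s.split_once('_') {
        if !prefix.is_empty() && prefix.chars().all(|c| c.is_ascii_digit()) {
            s = rest.to_string();
        }
    }
    // Step 6: keep only alphanumerics and underscores.
    s.chars()
        .filter(|c| c.is_ascii_alphanumeric() || *c == '_')
        .collect()
}
```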

Supported Image Extensions

  • .png, .jpg, .jpeg, .gif, .svg, .webp, .bmp, .tiff, .ico

Service Management with Zinit

Hero Books can run as a Zinit-managed service with automatic restart, health checks, and port management.

Starting as a Zinit Service

# Start web server as Zinit-managed service
books_server --port 9567 --start

# Start with custom books directory
books_server --port 9567 --books-dir ./books --start

# Multi-instance support
books_server --port 9567 --start --instance prod
books_server --port 9568 --start --instance dev

Service Management

Once started with --start, services are managed by Zinit:

# View service status
zinit status books_server

# View service logs
zinit logs books_server

# Stop service
zinit stop books_server

# Restart service
zinit restart books_server

# Multi-instance commands
zinit status books_server_prod
zinit logs books_server_dev

Service Features

  • Automatic Restart: Service restarts on failure with 5s delay
  • Health Checks: TCP port health checks every 10s
  • Max Restarts: Up to 5 restart attempts before stopping
  • Logging: Full log history available via zinit logs
  • Verification: Defensive self-test to verify successful startup

Error Handling

If service startup fails, Zinit will:

  1. Attempt TCP connection to verify port binding
  2. Check service state and PID
  3. Display recent logs on failure
  4. Clean up failed service registration

Detailed error messages provide diagnostic information:

  • Port already in use
  • Binary path incorrect
  • Zinit server not running
  • Permission denied
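The TCP verification in step 1 amounts to a connect attempt against the bound port, roughly like this sketch (address and timeout values are illustrative, not the project's actual check):

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Sketch: verify the server bound its port by attempting a TCP connect.
fn port_is_listening(port: u16) -> bool {
    let addr: SocketAddr = ([127, 0, 0, 1], port).into();
    // A completed handshake means something is listening on the port.
    TcpStream::connect_timeout(&addr, Duration::from_secs(2)).is_ok()
}
```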

Development

Code Quality

This project maintains high code quality standards:

  • Dead Code Cleanup: Unused code is either removed or marked with #[allow(dead_code)] with clear justification:

    • flatten_chapter_pages() - Utility function kept for testing
    • classify_topic() - Public API method reserved for future use
    • embeddings_from_cache - Field used for statistics reporting
  • Compiler Warnings: All compiler warnings in the hero_books crate are resolved; any remaining warnings come from external dependencies

Building

# Build release binaries
make build

# Build with debug info (dev mode)
cargo build

# Run tests
make test

# Run all tests including integration tests
make test-all

# Generate documentation
cargo doc --no-deps --open

# Check for compiler warnings
cargo check

Testing

# Run all tests
cargo test

# Run specific module tests
cargo test doctree
cargo test website

# Run with output
cargo test -- --nocapture

Code Organization

src/
├── lib.rs                 # Library exports
├── main.rs               # Web server entry point (books_server binary)
├── bin/
│   ├── books.rs         # CLI entry point (books_client binary)
│   └── cli.rs           # Legacy CLI entry point
├── cli/                  # CLI commands and handlers
├── doctree/              # Document management
├── ebook/                # Ebook parsing
├── ontology/             # AI-powered semantic extraction
├── vectorsdk/            # Vector search and embeddings
├── publishing/           # Publishing configuration
├── book/                 # Book and PDF processing
├── web/                  # HTTP API routes and handlers
└── website/              # Website configuration

Adding New Features

  1. New DocTree functionality: Add to src/doctree/
  2. New Website config: Add to src/website/
  3. New CLI commands: Add to src/cli/mod.rs
  4. New API endpoints: Add to src/web/mod.rs

Library Usage

Ontology Processing

The ontology processor uses AI to classify documents and extract semantic concepts/relationships.

use hero_books::ontology::{OntologyProcessor, ProcessorConfig, ONTOLOGIES};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create processor with default config
    let processor = OntologyProcessor::new();

    let document = "Our SaaS product integrates with Slack...";

    // Classification only (quick)
    let matches = processor.classify(document).await?;
    for m in &matches {
        println!("{}: {} (primary: {})", m.topic, m.score, m.is_primary);
    }

    // Full processing (classification + extraction)
    let result = processor.process(document).await?;

    for sem in &result.semantics {
        println!("{}: {} concepts, {} relationships",
            sem.category, sem.concepts.len(), sem.relationships.len());
    }

    // Direct extraction for specific topics
    let semantics = processor.extract(document, &["product", "technology"]).await?;

    Ok(())
}

Available Topics: business, technology, product, commercial, people, news, legal, financial, health, education

Configuration:

let config = ProcessorConfig {
    confidence_threshold: Some(8),    // Min score to consider (default: 7)
    max_topics: Some(3),              // Limit topics processed
    filter_topics: Some(vec!["product".into(), "technology".into()]),
    temperature: Some(0.0),           // LLM temperature
    max_input_tokens: Some(60_000),   // Chunk if larger
    ..Default::default()
};
let processor = OntologyProcessor::with_config(config);

Requirements: Set an API key environment variable:

  • GROQ_API_KEY (preferred)
  • SAMBANOVA_API_KEY
  • OPENROUTER_API_KEY
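The fallback order can be sketched as a small helper; `first_key` is illustrative and not part of the crate's API:

```rust
/// Sketch: return the first configured key in the preference order above.
/// `get` abstracts the environment lookup so the logic is testable.
fn first_key<F: Fn(&str) -> Option<String>>(get: F) -> Option<(&'static str, String)> {
    ["GROQ_API_KEY", "SAMBANOVA_API_KEY", "OPENROUTER_API_KEY"]
        .into_iter()
        .find_map(|name| get(name).map(|value| (name, value)))
}

// In practice: let key = first_key(|name| std::env::var(name).ok());
```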

See examples/src/ontology_processing.rs for a complete example.

DocTree

use std::path::{Path, PathBuf};

use doctree::{DocTree, ExportArgs};

fn main() -> doctree::Result<()> {
    // Create and scan
    let mut doctree = DocTree::new("mydocs");
    doctree.scan(Path::new("/path/to/docs"), &[])?;
    doctree.init_post()?;  // Validate links

    // Access pages
    let page = doctree.page_get("collection:page")?;
    let content = page.content()?;

    // Export
    doctree.export(ExportArgs {
        destination: PathBuf::from("/tmp/books"),
        reset: true,
        include: false,
    })?;

    Ok(())
}

DocTreeClient (for reading exports)

use std::path::Path;

use doctree::DocTreeClient;

fn main() -> doctree::Result<()> {
    let client = DocTreeClient::new(Path::new("/tmp/books"))?;

    // List collections
    let collections = client.list_collections()?;

    // Get page content
    let content = client.get_page_content("collection", "page")?;

    // Check existence
    if client.page_exists("collection", "page") {
        println!("Page exists!");
    }

    Ok(())
}

Directory Structure

atlasserver_rust/
├── Cargo.toml           # Package configuration
├── Makefile             # Build automation (make help to see all targets)
├── build.rs             # Build script
├── README.md            # This file
├── openrpc.json         # OpenRPC 1.3.2 API specification
├── src/
│   ├── lib.rs           # Library entry point and module declarations
│   ├── main.rs          # Web server binary entry point
│   ├── bin/
│   │   ├── books.rs     # CLI binary entry point (books_client)
│   │   └── cli.rs       # Legacy CLI entry point
│   ├── cli/             # CLI commands and handlers
│   ├── doctree/         # Document tree management
│   ├── ebook/           # Ebook parsing
│   ├── ontology/        # AI-powered semantic extraction
│   ├── vectorsdk/       # Vector search and embeddings
│   ├── publishing/      # Publishing configuration
│   ├── book/            # Book and PDF processing
│   ├── web/             # HTTP API routes and handlers
│   └── website/         # Website configuration
├── crates/
│   └── books-client/    # Rust client library for the API
├── examples/            # Example code for library usage
└── target/
    ├── debug/           # Debug builds
    └── release/
        ├── books_client     # CLI binary
        └── books_server     # Web server binary