2025-08-25 06:00:08 +02:00

HeroDB Examples

This directory contains examples demonstrating HeroDB's capabilities, including full-text search powered by Tantivy and vector database operations using Lance.

Available Examples

  1. Tantivy Search Demo - Full-text search capabilities
  2. Lance Vector Database Demo - Vector database and AI operations
  3. AGE Encryption Demo - Cryptographic operations
  4. Simple Demo - Basic Redis operations

Lance Vector Database Demo (Bash Script)

Overview

The lance_vector_demo.sh script demonstrates HeroDB's vector database capabilities using Lance: vector storage, similarity search, multimodal data handling, and embedding generation through an external embedding service.

Prerequisites

  1. HeroDB Server: The server must be running (default port 6379)
  2. Redis CLI: The redis-cli tool must be installed and available in your PATH
  3. Embedding Service (optional): For full functionality, set up an external embedding service (see "Setting Up Embedding Service" under Configuration)

Running the Demo

Step 1: Start HeroDB Server

# From the project root directory
cargo run -- --dir ./test_data --port 6379

Step 2: Run the Demo (in a new terminal)

# From the project root directory
./examples/lance_vector_demo.sh

What the Demo Covers

The script demonstrates comprehensive vector database operations:

  1. Dataset Management

    • Creating vector datasets with custom dimensions
    • Defining schemas with metadata fields
    • Listing and inspecting datasets
    • Dataset information and statistics
  2. Embedding Operations

    • Text embedding generation via external services
    • Multimodal embedding support (text + images)
    • Batch embedding operations
  3. Data Storage

    • Storing text documents with automatic embedding
    • Storing images with metadata
    • Multimodal content storage
    • Rich metadata support
  4. Vector Search

    • Similarity search with raw vectors
    • Text-based semantic search
    • Configurable search parameters (K, NPROBES, REFINE)
    • Cross-modal search capabilities
  5. Index Management

    • Creating IVF_PQ indexes for performance
    • Custom index parameters
    • Performance optimization
  6. Advanced Features

    • Error handling and recovery
    • Performance testing concepts
    • Monitoring and maintenance
    • Cleanup operations

Key Lance Commands Demonstrated

Dataset Management

# Create vector dataset
LANCE CREATE documents DIM 384

# Create dataset with schema
LANCE CREATE products DIM 768 SCHEMA category:string price:float available:bool

# List datasets
LANCE LIST

# Get dataset information
LANCE INFO documents

Data Operations

# Store text with metadata
LANCE STORE documents TEXT "Machine learning tutorial" category "education" author "John Doe"

# Store image with metadata
LANCE STORE images IMAGE "base64_encoded_image..." filename "photo.jpg" tags "nature,landscape"

# Store multimodal content
LANCE STORE content TEXT "Product description" IMAGE "base64_image..." type "product"

Search Operations

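# Search with a raw vector (shortened here for readability; a real query
# vector would presumably need as many components as the dataset's DIM)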
LANCE SEARCH documents VECTOR "0.1,0.2,0.3,0.4" K 5

# Semantic text search
LANCE SEARCH.TEXT documents "artificial intelligence" K 10 NPROBES 20

# Generate embeddings
LANCE EMBED.TEXT "Hello world" "Machine learning"
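
The raw vector in LANCE SEARCH is passed as one comma-separated string. The following is a minimal sketch of building such a string with a given number of components from the shell. The make_vector helper is hypothetical (it is not part of HeroDB or the demo script), and the constant-valued vector it produces is only a placeholder for a real embedding.

```shell
#!/bin/sh
# make_vector DIM VALUE: print VALUE repeated DIM times, joined by commas.
# Hypothetical helper for experiments; real queries would use an embedding.
make_vector() {
    awk -v n="$1" -v v="$2" 'BEGIN {
        for (i = 1; i <= n; i++) printf "%s%s", v, (i < n ? "," : "\n")
    }'
}

# Example (needs a running server and a 384-dimensional "documents" dataset):
#   redis-cli -p 6379 LANCE SEARCH documents VECTOR "$(make_vector 384 0.1)" K 5
```

Generating the string this way keeps the component count tied to the DIM value used when the dataset was created, which is easy to get wrong when typing vectors by hand.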

Index Management

# Create performance index
LANCE CREATE.INDEX documents IVF_PQ PARTITIONS 256 SUBVECTORS 16

# Drop dataset
LANCE DROP old_dataset

Configuration

Setting Up Embedding Service

# Configure embedding service URL
redis-cli HSET config:core:aiembed url "http://your-embedding-service:8080/embed"

# Optional: Set authentication token
redis-cli HSET config:core:aiembed token "your-api-token"
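
The two HSET commands above can be wrapped in a small function that also reads the hash back. This is a sketch under assumptions: REDIS_CLI, HERODB_PORT and DRY_RUN are hypothetical variable names chosen for this example, and it assumes HeroDB answers the standard HGETALL command for this hash.

```shell
#!/bin/sh
# Sketch: apply the embedding-service configuration and read it back.
# REDIS_CLI, HERODB_PORT and DRY_RUN are hypothetical names for this example.
REDIS_CLI="${REDIS_CLI:-redis-cli}"
HERODB_PORT="${HERODB_PORT:-6379}"

# Send a command to HeroDB, or only print it when DRY_RUN=1.
herodb() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$REDIS_CLI -p $HERODB_PORT $*"
    else
        "$REDIS_CLI" -p "$HERODB_PORT" "$@"
    fi
}

# configure_embedding URL [TOKEN]
configure_embedding() {
    [ -n "$1" ] || { echo "usage: configure_embedding URL [TOKEN]" >&2; return 1; }
    herodb HSET config:core:aiembed url "$1" || return 1
    if [ -n "$2" ]; then
        herodb HSET config:core:aiembed token "$2" || return 1
    fi
    # Assumption: HGETALL is available, as in standard Redis.
    herodb HGETALL config:core:aiembed
}
```

Running `DRY_RUN=1 configure_embedding "http://your-embedding-service:8080/embed"` prints the commands without contacting the server, which is a cheap way to review them first.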

Embedding Service API

Your embedding service should accept POST requests with a JSON body of this form:

{
  "texts": ["text1", "text2"],
  "images": ["base64_image1", "base64_image2"],
  "model": "your-model-name"
}

It should return a JSON response of this form:

{
  "embeddings": [[0.1, 0.2, ...], [0.3, 0.4, ...]],
  "model": "model-name",
  "usage": {"tokens": 100, "requests": 2}
}
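
To check that a service matches this contract before pointing HeroDB at it, a request can be sent by hand. The sketch below builds a text-only request body and posts it with curl. The helper names and the EMBED_URL variable are hypothetical, the Content-Type header and the empty "images" array are assumptions about what the service expects, and the escaping handles only quotes and backslashes (not newlines or other control characters).

```shell
#!/bin/sh
# Hypothetical variable; the default is the placeholder URL used in this README.
EMBED_URL="${EMBED_URL:-http://your-embedding-service:8080/embed}"

# Escape backslashes and double quotes so a string can sit inside JSON quotes.
json_escape() {
    printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# embed_payload MODEL TEXT...: print a request body in the shape shown above.
embed_payload() {
    model="$1"; shift
    texts=""
    for t in "$@"; do
        texts="${texts}${texts:+, }\"$(json_escape "$t")\""
    done
    printf '{"texts": [%s], "images": [], "model": "%s"}\n' \
        "$texts" "$(json_escape "$model")"
}

# request_embeddings MODEL TEXT...: POST the body and print the raw response.
request_embeddings() {
    curl -sS -X POST -H 'Content-Type: application/json' \
        -d "$(embed_payload "$@")" "$EMBED_URL"
}
```

Calling `embed_payload your-model-name "text1" "text2"` prints the body without sending anything, so the JSON can be inspected before `request_embeddings` is used against a live service.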

Interactive Features

The demo script includes:

  • Colored output for better readability
  • Step-by-step execution with explanations
  • Error handling demonstrations
  • Automatic cleanup options
  • Performance testing concepts
  • Real-world usage examples

Use Cases Demonstrated

  1. Document Search System

    • Semantic document retrieval
    • Metadata filtering
    • Relevance ranking
  2. Image Similarity Search

    • Visual content matching
    • Tag-based filtering
    • Multimodal queries
  3. Product Recommendations

    • Feature-based similarity
    • Category filtering
    • Price range queries
  4. Content Management

    • Mixed media storage
    • Cross-modal search
    • Rich metadata support

Tantivy Search Demo (Bash Script)

Overview

The tantivy_search_demo.sh script demonstrates HeroDB's Tantivy-backed full-text search through Redis-style FT.* commands. It covers basic text search, filtering, sorting, geographic queries, and more.

Prerequisites

  1. HeroDB Server: The server must be running on port 6381
  2. Redis CLI: The redis-cli tool must be installed and available in your PATH

Running the Demo

Step 1: Start HeroDB Server

# From the project root directory
cargo run -- --port 6381

Step 2: Run the Demo (in a new terminal)

# From the project root directory
./examples/tantivy_search_demo.sh

What the Demo Covers

The script walks through 15 steps, from index creation to cleanup:

  1. Index Creation - Creating a search index with various field types
  2. Data Insertion - Adding sample products to the index
  3. Basic Text Search - Simple keyword searches
  4. Filtered Search - Combining text search with category filters
  5. Numeric Range Search - Finding products within price ranges
  6. Sorting Results - Ordering results by different fields
  7. Limited Results - Pagination and result limiting
  8. Complex Queries - Multi-field searches with sorting
  9. Geographic Search - Location-based queries
  10. Index Information - Getting statistics about the search index
  11. Search Comparison - Tantivy vs simple pattern matching
  12. Fuzzy Search - Typo tolerance and approximate matching
  13. Phrase Search - Exact phrase matching
  14. Boolean Queries - AND, OR, NOT operators
  15. Cleanup - Removing test data

Sample Data

The demo uses a product catalog with the following fields:

  • title (TEXT) - Product name with higher search weight
  • description (TEXT) - Detailed product description
  • category (TAG) - Comma-separated categories
  • price (NUMERIC) - Product price for range queries
  • rating (NUMERIC) - Customer rating for sorting
  • location (GEO) - Geographic coordinates for location searches
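
A product with these fields can be stored as a hash under the product: prefix that the FT.CREATE command in this README indexes. The sketch below wraps that HSET call. The add_product helper, the REDIS_CLI and HERODB_PORT variables, and the sample values in the usage comment are all hypothetical; the location field is left out because its value format is not described here.

```shell
#!/bin/sh
# Hypothetical variables for this example; 6381 is the port used by this demo.
REDIS_CLI="${REDIS_CLI:-redis-cli}"
HERODB_PORT="${HERODB_PORT:-6381}"

# add_product ID TITLE DESCRIPTION CATEGORIES PRICE RATING
# CATEGORIES is a comma-separated list, matching the TAG field's separator.
add_product() {
    "$REDIS_CLI" -p "$HERODB_PORT" HSET "product:$1" \
        title "$2" description "$3" category "$4" price "$5" rating "$6"
}

# Example with made-up values (needs a running server):
#   add_product 1 "Wireless Headphones" "Over-ear, noise canceling" \
#       "electronics,audio" 99.99 4.5
```

Quoting every argument keeps multi-word titles and descriptions intact as single field values.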

Key Redis Commands Demonstrated

Index Management

# Create search index
FT.CREATE product_catalog ON HASH PREFIX 1 product: SCHEMA title TEXT WEIGHT 2.0 SORTABLE description TEXT category TAG SEPARATOR , price NUMERIC SORTABLE rating NUMERIC SORTABLE location GEO

# Get index information
FT.INFO product_catalog

# Drop index
FT.DROPINDEX product_catalog

Search Queries

# Basic text search
FT.SEARCH product_catalog wireless

# Filtered search
FT.SEARCH product_catalog 'organic @category:{food}'

# Numeric range
FT.SEARCH product_catalog '@price:[50 150]'

# Sorted results
FT.SEARCH product_catalog '@category:{electronics}' SORTBY price ASC

# Geographic search
FT.SEARCH product_catalog '@location:[37.7749 -122.4194 50 km]'

# Boolean queries
FT.SEARCH product_catalog 'wireless AND audio'
FT.SEARCH product_catalog 'coffee OR tea'

# Phrase search
FT.SEARCH product_catalog '"noise canceling"'
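
When these queries are run from a shell script rather than typed into redis-cli, the brackets, braces, and spaces in the query must reach the server as one argument. The following sketch shows one way to do that with small helpers that build the filter clauses used above. The helper names and the REDIS_CLI and HERODB_PORT variables are hypothetical.

```shell
#!/bin/sh
# Hypothetical variables for this example; 6381 is the port used by this demo.
REDIS_CLI="${REDIS_CLI:-redis-cli}"
HERODB_PORT="${HERODB_PORT:-6381}"

# range_clause FIELD MIN MAX: print "@FIELD:[MIN MAX]"
range_clause() { printf '@%s:[%s %s]' "$1" "$2" "$3"; }

# tag_clause FIELD VALUE: print "@FIELD:{VALUE}"
tag_clause() { printf '@%s:{%s}' "$1" "$2"; }

# search INDEX QUERY [OPTIONS...]: run FT.SEARCH with QUERY as one argument.
search() {
    index="$1"; shift
    "$REDIS_CLI" -p "$HERODB_PORT" FT.SEARCH "$index" "$@"
}

# Examples mirroring the queries above (need a running server):
#   search product_catalog "organic $(tag_clause category food)"
#   search product_catalog "$(range_clause price 50 150)"
#   search product_catalog "$(tag_clause category electronics)" SORTBY price ASC
```

Keeping the query inside double quotes at the call site is what prevents the shell from splitting it on spaces or expanding the brackets as a glob pattern.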

Interactive Features

The demo script includes:

  • Colored output for better readability
  • Pause between steps to review results
  • Error handling with clear error messages
  • Automatic cleanup of test data
  • Progress indicators showing what each step demonstrates

Troubleshooting

HeroDB Not Running

✗ HeroDB is not running on port 6381
 Please start HeroDB with: cargo run -- --port 6381

Solution: Start the HeroDB server in a separate terminal.

Redis CLI Not Found

redis-cli: command not found

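Solution: Install the Redis command-line tools (for example, the redis-tools package on Debian/Ubuntu or `brew install redis` on macOS), or use an alternative Redis client.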

Connection Refused

Could not connect to Redis at localhost:6381: Connection refused

Solution: Ensure HeroDB is running and listening on the correct port.
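
The cases above can be told apart with a short check before running the demo. This is a sketch rather than the demo script's own logic: the function names and the REDIS_CLI and HERODB_PORT variables are hypothetical, and it assumes HeroDB answers the standard PING command with PONG.

```shell
#!/bin/sh
# Hypothetical variables for this example; 6381 is the port used by this demo.
REDIS_CLI="${REDIS_CLI:-redis-cli}"
HERODB_PORT="${HERODB_PORT:-6381}"

# diagnose: return 0 if ready, 1 if the client is missing, 2 if no server answers.
diagnose() {
    if ! command -v "$REDIS_CLI" >/dev/null 2>&1; then
        echo "redis-cli not found in PATH"
        return 1
    fi
    if [ "$("$REDIS_CLI" -h localhost -p "$HERODB_PORT" PING 2>/dev/null)" != "PONG" ]; then
        echo "no answer from HeroDB on port $HERODB_PORT"
        return 2
    fi
    echo "ok"
}

# wait_for_server [TRIES]: poll once per second, e.g. while cargo is still building.
wait_for_server() {
    tries="${1:-10}"
    while [ "$tries" -gt 0 ]; do
        rc=0
        diagnose >/dev/null || rc=$?
        case "$rc" in
            0) return 0 ;;
            1) return 1 ;;   # no client installed: retrying cannot help
        esac
        tries=$((tries - 1))
        sleep 1
    done
    return 2
}
```

A typical use would be `wait_for_server 30 && ./examples/tantivy_search_demo.sh`, so the demo only starts once the server responds.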

Manual Testing

You can also run individual commands manually:

# Connect to HeroDB
redis-cli -h localhost -p 6381

# Create a simple index
FT.CREATE myindex ON HASH SCHEMA title TEXT description TEXT

# Add a document
HSET doc:1 title "Hello World" description "This is a test document"

# Search
FT.SEARCH myindex hello

Performance Notes

  • Indexing: Documents are indexed in real-time as they're added
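  • Search Speed: Full-text search uses an inverted index, so on large datasets it is much faster than pattern matching, which has to scan every key
  • Memory Usage: Tantivy indexes are memory-efficient and disk-backed
  • Scalability: Designed to handle millions of documents with sub-second search times; actual latency depends on hardware and query complexity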

Advanced Features

The demo showcases advanced Tantivy features:

  • Relevance Scoring - Results ranked by relevance
  • Fuzzy Matching - Handles typos and approximate matches
  • Field Weighting - Title field has higher search weight
  • Multi-field Search - Search across multiple fields simultaneously
  • Geographic Queries - Distance-based location searches
  • Numeric Ranges - Efficient range queries on numeric fields
  • Tag Filtering - Fast categorical filtering

Next Steps

After running the demo, explore:

  1. Custom Schemas - Define your own field types and configurations
  2. Large Datasets - Test with thousands or millions of documents
  3. Real Applications - Integrate search into your applications
  4. Performance Tuning - Optimize for your specific use case

For more information, see the search documentation.