# HeroDB Examples

This directory contains examples demonstrating HeroDB's capabilities including full-text search powered by Tantivy and vector database operations using Lance.

## Available Examples

1. **[Tantivy Search Demo](#tantivy-search-demo-bash-script)** - Full-text search capabilities
2. **[Lance Vector Database Demo](#lance-vector-database-demo-bash-script)** - Vector database and AI operations
3. **[AGE Encryption Demo](age_bash_demo.sh)** - Cryptographic operations
4. **[Simple Demo](simple_demo.sh)** - Basic Redis operations

---

## Lance Vector Database Demo (Bash Script)

### Overview

The `lance_vector_demo.sh` script provides a comprehensive demonstration of HeroDB's vector database capabilities using Lance. It showcases vector storage, similarity search, multimodal data handling, and AI-powered operations with external embedding services.

### Prerequisites

1. **HeroDB Server**: The server must be running (default port 6379)
2. **Redis CLI**: The `redis-cli` tool must be installed and available in your PATH
3. **Embedding Service** (optional): For full functionality, set up an external embedding service

### Running the Demo

#### Step 1: Start HeroDB Server

```bash
# From the project root directory
cargo run -- --dir ./test_data --port 6379
```

#### Step 2: Run the Demo (in a new terminal)

```bash
# From the project root directory
./examples/lance_vector_demo.sh
```

### What the Demo Covers

The script demonstrates comprehensive vector database operations:

1. **Dataset Management**
   - Creating vector datasets with custom dimensions
   - Defining schemas with metadata fields
   - Listing and inspecting datasets
   - Dataset information and statistics

2. **Embedding Operations**
   - Text embedding generation via external services
   - Multimodal embedding support (text + images)
   - Batch embedding operations

3. **Data Storage**
   - Storing text documents with automatic embedding
   - Storing images with metadata
   - Multimodal content storage
   - Rich metadata support

4. **Vector Search**
   - Similarity search with raw vectors
   - Text-based semantic search
   - Configurable search parameters (K, NPROBES, REFINE)
   - Cross-modal search capabilities

5. **Index Management**
   - Creating IVF_PQ indexes for performance
   - Custom index parameters
   - Performance optimization

6. **Advanced Features**
   - Error handling and recovery
   - Performance testing concepts
   - Monitoring and maintenance
   - Cleanup operations

### Key Lance Commands Demonstrated

#### Dataset Management

```bash
# Create vector dataset
LANCE CREATE documents DIM 384

# Create dataset with schema
LANCE CREATE products DIM 768 SCHEMA category:string price:float available:bool

# List datasets
LANCE LIST

# Get dataset information
LANCE INFO documents
```

#### Data Operations

```bash
# Store text with metadata
LANCE STORE documents TEXT "Machine learning tutorial" category "education" author "John Doe"

# Store image with metadata
LANCE STORE images IMAGE "base64_encoded_image..." filename "photo.jpg" tags "nature,landscape"

# Store multimodal content
LANCE STORE content TEXT "Product description" IMAGE "base64_image..." type "product"
```

#### Search Operations

```bash
# Search with raw vector
LANCE SEARCH documents VECTOR "0.1,0.2,0.3,0.4" K 5

# Semantic text search
LANCE SEARCH.TEXT documents "artificial intelligence" K 10 NPROBES 20

# Generate embeddings
LANCE EMBED.TEXT "Hello world" "Machine learning"
```

#### Index Management

```bash
# Create performance index
LANCE CREATE.INDEX documents IVF_PQ PARTITIONS 256 SUBVECTORS 16

# Drop dataset
LANCE DROP old_dataset
```

### Configuration

#### Setting Up Embedding Service

```bash
# Configure embedding service URL
redis-cli HSET config:core:aiembed url "http://your-embedding-service:8080/embed"

# Optional: Set authentication token
redis-cli HSET config:core:aiembed token "your-api-token"
```

#### Embedding Service API

Your embedding service should accept POST requests:

```json
{
  "texts": ["text1", "text2"],
  "images": ["base64_image1", "base64_image2"],
  "model": "your-model-name"
}
```

And return responses:

```json
{
  "embeddings": [[0.1, 0.2, ...], [0.3, 0.4, ...]],
  "model": "model-name",
  "usage": {"tokens": 100, "requests": 2}
}
```

### Interactive Features

The demo script includes:

- **Colored output** for better readability
- **Step-by-step execution** with explanations
- **Error handling** demonstrations
- **Automatic cleanup** options
- **Performance testing** concepts
- **Real-world usage** examples

### Use Cases Demonstrated

1. **Document Search System**
   - Semantic document retrieval
   - Metadata filtering
   - Relevance ranking

2. **Image Similarity Search**
   - Visual content matching
   - Tag-based filtering
   - Multimodal queries

3. **Product Recommendations**
   - Feature-based similarity
   - Category filtering
   - Price range queries

4. **Content Management**
   - Mixed media storage
   - Cross-modal search
   - Rich metadata support

---

## Tantivy Search Demo (Bash Script)

### Overview

The `tantivy_search_demo.sh` script provides a comprehensive demonstration of HeroDB's search functionality using Redis commands.
It showcases various search scenarios including basic text search, filtering, sorting, geographic queries, and more.

### Prerequisites

1. **HeroDB Server**: The server must be running on port 6381
2. **Redis CLI**: The `redis-cli` tool must be installed and available in your PATH

### Running the Demo

#### Step 1: Start HeroDB Server

```bash
# From the project root directory
cargo run -- --port 6381
```

#### Step 2: Run the Demo (in a new terminal)

```bash
# From the project root directory
./examples/tantivy_search_demo.sh
```

### What the Demo Covers

The script demonstrates 15 different search scenarios:

1. **Index Creation** - Creating a search index with various field types
2. **Data Insertion** - Adding sample products to the index
3. **Basic Text Search** - Simple keyword searches
4. **Filtered Search** - Combining text search with category filters
5. **Numeric Range Search** - Finding products within price ranges
6. **Sorting Results** - Ordering results by different fields
7. **Limited Results** - Pagination and result limiting
8. **Complex Queries** - Multi-field searches with sorting
9. **Geographic Search** - Location-based queries
10. **Index Information** - Getting statistics about the search index
11. **Search Comparison** - Tantivy vs simple pattern matching
12. **Fuzzy Search** - Typo tolerance and approximate matching
13. **Phrase Search** - Exact phrase matching
14. **Boolean Queries** - AND, OR, NOT operators
15. **Cleanup** - Removing test data

### Sample Data

The demo uses a product catalog with the following fields:

- **title** (TEXT) - Product name with higher search weight
- **description** (TEXT) - Detailed product description
- **category** (TAG) - Comma-separated categories
- **price** (NUMERIC) - Product price for range queries
- **rating** (NUMERIC) - Customer rating for sorting
- **location** (GEO) - Geographic coordinates for location searches

### Key Redis Commands Demonstrated

#### Index Management

```bash
# Create search index
FT.CREATE product_catalog ON HASH PREFIX 1 product: SCHEMA title TEXT WEIGHT 2.0 SORTABLE description TEXT category TAG SEPARATOR , price NUMERIC SORTABLE rating NUMERIC SORTABLE location GEO

# Get index information
FT.INFO product_catalog

# Drop index
FT.DROPINDEX product_catalog
```

#### Search Queries

```bash
# Basic text search
FT.SEARCH product_catalog wireless

# Filtered search
FT.SEARCH product_catalog 'organic @category:{food}'

# Numeric range
FT.SEARCH product_catalog '@price:[50 150]'

# Sorted results
FT.SEARCH product_catalog '@category:{electronics}' SORTBY price ASC

# Geographic search
FT.SEARCH product_catalog '@location:[37.7749 -122.4194 50 km]'

# Boolean queries
FT.SEARCH product_catalog 'wireless AND audio'
FT.SEARCH product_catalog 'coffee OR tea'

# Phrase search
FT.SEARCH product_catalog '"noise canceling"'
```

### Interactive Features

The demo script includes:

- **Colored output** for better readability
- **Pause between steps** to review results
- **Error handling** with clear error messages
- **Automatic cleanup** of test data
- **Progress indicators** showing what each step demonstrates

### Troubleshooting

#### HeroDB Not Running

```
✗ HeroDB is not running on port 6381
ℹ Please start HeroDB with: cargo run -- --port 6381
```

**Solution**: Start the HeroDB server in a separate terminal.

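
The check behind this message can be approximated by hand. The sketch below is not taken from `tantivy_search_demo.sh`; it assumes HeroDB answers the standard Redis `PING` command with `PONG`, and the names `is_pong`, `check_herodb`, and `HERODB_PORT` are made up for illustration.

```bash
#!/usr/bin/env bash
# Preflight check sketch: is anything answering on the HeroDB port?
# Assumption: HeroDB replies to the standard Redis PING command with PONG.

HERODB_PORT="${HERODB_PORT:-6381}"

# Succeeds only when the given reply is exactly PONG.
is_pong() {
    [ "$1" = "PONG" ]
}

# Sends PING through redis-cli. Returns 2 if redis-cli is missing,
# and non-zero if the server does not reply with PONG.
check_herodb() {
    local port="$1" reply
    command -v redis-cli >/dev/null 2>&1 || return 2
    reply="$(redis-cli -h localhost -p "$port" PING 2>/dev/null)"
    is_pong "$reply"
}

if check_herodb "$HERODB_PORT"; then
    echo "HeroDB is reachable on port $HERODB_PORT"
else
    echo "HeroDB is not reachable on port $HERODB_PORT"
fi
```

Setting `HERODB_PORT=6379` before running the sketch would point the same check at the port used by the Lance demo.
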
#### Redis CLI Not Found

```
redis-cli: command not found
```

**Solution**: Install Redis tools or use an alternative Redis client.

#### Connection Refused

```
Could not connect to Redis at localhost:6381: Connection refused
```

**Solution**: Ensure HeroDB is running and listening on the correct port.

### Manual Testing

You can also run individual commands manually:

```bash
# Connect to HeroDB
redis-cli -h localhost -p 6381

# Create a simple index
FT.CREATE myindex ON HASH SCHEMA title TEXT description TEXT

# Add a document
HSET doc:1 title "Hello World" description "This is a test document"

# Search
FT.SEARCH myindex hello
```

### Performance Notes

- **Indexing**: Documents are indexed in real-time as they're added
- **Search Speed**: Full-text search is much faster than pattern matching on large datasets
- **Memory Usage**: Tantivy indexes are memory-efficient and disk-backed
- **Scalability**: Supports millions of documents with sub-second search times

### Advanced Features

The demo showcases advanced Tantivy features:

- **Relevance Scoring** - Results ranked by relevance
- **Fuzzy Matching** - Handles typos and approximate matches
- **Field Weighting** - Title field has higher search weight
- **Multi-field Search** - Search across multiple fields simultaneously
- **Geographic Queries** - Distance-based location searches
- **Numeric Ranges** - Efficient range queries on numeric fields
- **Tag Filtering** - Fast categorical filtering

### Next Steps

After running the demo, explore:

1. **Custom Schemas** - Define your own field types and configurations
2. **Large Datasets** - Test with thousands or millions of documents
3. **Real Applications** - Integrate search into your applications
4. **Performance Tuning** - Optimize for your specific use case

For more information, see the [search documentation](../herodb/docs/search.md).
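
For the "Large Datasets" step, one possible starting point is to generate synthetic documents rather than typing them by hand. The sketch below prints `HSET` commands that reuse the `product:` key prefix and some field names from the sample schema. The `generate_products` function and all values are invented for illustration, and the `description` and `location` fields are left out for brevity.

```bash
#!/usr/bin/env bash
# Emit N synthetic HSET commands shaped like the demo's product catalog.
# Only the field names and the "product:" prefix come from the schema
# shown earlier; every value here is made up for load testing.

generate_products() {
    local count="$1" i
    for ((i = 1; i <= count; i++)); do
        # price cycles through 10..99, rating cycles through 1..5
        printf 'HSET product:%d title "Sample product %d" category "electronics" price %d rating %d\n' \
            "$i" "$i" "$((10 + i % 90))" "$((1 + i % 5))"
    done
}

# Dry run: print three commands without contacting the server.
generate_products 3
```

Piping the output into `redis-cli -p 6381` should load the documents, since `redis-cli` reads commands from standard input when it is not attached to a terminal.
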