# HeroDB Examples

This directory contains examples demonstrating HeroDB's capabilities, including full-text search powered by Tantivy and vector database operations using Lance.

## Available Examples

1. **[Tantivy Search Demo](#tantivy-search-demo-bash-script)** - Full-text search capabilities
2. **[Lance Vector Database Demo](#lance-vector-database-demo-bash-script)** - Vector database and AI operations
3. **[AGE Encryption Demo](age_bash_demo.sh)** - Cryptographic operations
4. **[Simple Demo](simple_demo.sh)** - Basic Redis operations

---

## Lance Vector Database Demo (Bash Script)

### Overview
The `lance_vector_demo.sh` script demonstrates HeroDB's vector database capabilities using Lance: vector storage, similarity search, multimodal data handling, and embedding generation through an external embedding service.

### Prerequisites
1. **HeroDB Server**: The server must be running (default port 6379)
2. **Redis CLI**: The `redis-cli` tool must be installed and available in your PATH
3. **Embedding Service** (optional): For full functionality, set up an external embedding service (see [Configuration](#configuration))
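
The first two prerequisites can be checked before running the demo. The following is a minimal sketch, not part of the demo scripts: the names `REDIS_CLI`, `HERODB_HOST`, and `check_herodb` are made up for illustration, and it assumes HeroDB answers `PING` with `PONG` the way a Redis server does.

```bash
# Preflight check (sketch). REDIS_CLI, HERODB_HOST and check_herodb are
# illustrative names, not part of HeroDB.
REDIS_CLI="${REDIS_CLI:-redis-cli}"
HERODB_HOST="${HERODB_HOST:-localhost}"

# check_herodb PORT
# Returns 0 if the server answers, 1 if it does not, 2 if the client is missing.
check_herodb() {
  local port="${1:-6379}" reply
  if ! command -v "$REDIS_CLI" >/dev/null 2>&1; then
    echo "$REDIS_CLI not found in PATH" >&2
    return 2
  fi
  # Assumption: HeroDB replies PONG to PING like a Redis server.
  reply="$("$REDIS_CLI" -h "$HERODB_HOST" -p "$port" PING 2>/dev/null || true)"
  if [ "$reply" = "PONG" ]; then
    echo "HeroDB is reachable on port $port"
    return 0
  fi
  echo "HeroDB is not running on port $port" >&2
  return 1
}

# Example: check_herodb 6379 && ./examples/lance_vector_demo.sh
```

Reading the client name from a variable keeps the check usable with an alternative Redis client.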

### Running the Demo

#### Step 1: Start HeroDB Server
```bash
# From the project root directory
cargo run -- --dir ./test_data --port 6379
```

#### Step 2: Run the Demo (in a new terminal)
```bash
# From the project root directory
./examples/lance_vector_demo.sh
```

### What the Demo Covers

The script demonstrates comprehensive vector database operations:

1. **Dataset Management**
   - Creating vector datasets with custom dimensions
   - Defining schemas with metadata fields
   - Listing and inspecting datasets
   - Dataset information and statistics

2. **Embedding Operations**
   - Text embedding generation via external services
   - Multimodal embedding support (text + images)
   - Batch embedding operations

3. **Data Storage**
   - Storing text documents with automatic embedding
   - Storing images with metadata
   - Multimodal content storage
   - Rich metadata support

4. **Vector Search**
   - Similarity search with raw vectors
   - Text-based semantic search
   - Configurable search parameters (K, NPROBES, REFINE)
   - Cross-modal search capabilities

5. **Index Management**
   - Creating IVF_PQ indexes for performance
   - Custom index parameters
   - Performance optimization

6. **Advanced Features**
   - Error handling and recovery
   - Performance testing concepts
   - Monitoring and maintenance
   - Cleanup operations

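### Key Lance Commands Demonstrated

The commands below are HeroDB commands rather than shell commands: run them inside a `redis-cli -p 6379` session, or prefix each one with `redis-cli -p 6379`.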

#### Dataset Management
```bash
# Create vector dataset
LANCE CREATE documents DIM 384

# Create dataset with schema
LANCE CREATE products DIM 768 SCHEMA category:string price:float available:bool

# List datasets
LANCE LIST

# Get dataset information
LANCE INFO documents
```

#### Data Operations
```bash
# Store text with metadata
LANCE STORE documents TEXT "Machine learning tutorial" category "education" author "John Doe"

# Store image with metadata
LANCE STORE images IMAGE "base64_encoded_image..." filename "photo.jpg" tags "nature,landscape"

# Store multimodal content
LANCE STORE content TEXT "Product description" IMAGE "base64_image..." type "product"
```

#### Search Operations
```bash
# Search with raw vector (shortened for readability; the query vector is expected to match the dataset's DIM)
LANCE SEARCH documents VECTOR "0.1,0.2,0.3,0.4" K 5

# Semantic text search
LANCE SEARCH.TEXT documents "artificial intelligence" K 10 NPROBES 20

# Generate embeddings
LANCE EMBED.TEXT "Hello world" "Machine learning"
```
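
Typing a full-length query vector by hand is impractical for a 384-dimensional dataset. The sketch below builds a comma-separated placeholder vector of a given length for experimenting with `LANCE SEARCH ... VECTOR`. `make_vector` is a hypothetical helper, and the assumption that the query vector needs as many components as the dataset's `DIM` should be checked against the server's behaviour.

```bash
# make_vector DIM [VALUE]
# Print DIM copies of VALUE (default 0.1) joined by commas.
# Illustrative helper, not part of HeroDB.
make_vector() {
  local dim="$1" value="${2:-0.1}" out="" i=0
  while [ "$i" -lt "$dim" ]; do
    out="${out:+$out,}$value"   # add a comma only between components
    i=$((i + 1))
  done
  printf '%s\n' "$out"
}

# Example (requires a running server):
# redis-cli -p 6379 LANCE SEARCH documents VECTOR "$(make_vector 384 0.1)" K 5
```

A constant vector is only useful for smoke-testing the command; real queries would use an embedding produced by the embedding service.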

#### Index Management
```bash
# Create performance index
LANCE CREATE.INDEX documents IVF_PQ PARTITIONS 256 SUBVECTORS 16

# Drop dataset
LANCE DROP old_dataset
```
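
Product quantization splits each vector into equally sized sub-vectors, so the `SUBVECTORS` value generally has to divide the dataset's `DIM` evenly (384 / 16 = 24 in the example above). The following sketch checks this before creating an index. `pq_subvector_dim` is a hypothetical helper, and whether HeroDB enforces the same constraint is an assumption.

```bash
# pq_subvector_dim DIM SUBVECTORS
# Print the size of each sub-vector, or fail if DIM cannot be split evenly.
# Illustrative helper, not part of HeroDB.
pq_subvector_dim() {
  local dim="$1" sub="$2"
  if [ "$sub" -le 0 ] || [ $((dim % sub)) -ne 0 ]; then
    echo "DIM $dim is not divisible by SUBVECTORS $sub" >&2
    return 1
  fi
  echo $((dim / sub))
}

# Example (requires a running server):
# pq_subvector_dim 384 16 >/dev/null &&
#   redis-cli -p 6379 LANCE CREATE.INDEX documents IVF_PQ PARTITIONS 256 SUBVECTORS 16
```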

### Configuration

#### Setting Up Embedding Service
```bash
# Configure embedding service URL
redis-cli HSET config:core:aiembed url "http://your-embedding-service:8080/embed"

# Optional: Set authentication token
redis-cli HSET config:core:aiembed token "your-api-token"
```

#### Embedding Service API
Your embedding service should accept POST requests with a JSON body of this shape:
```json
{
  "texts": ["text1", "text2"],
  "images": ["base64_image1", "base64_image2"],
  "model": "your-model-name"
}
```

It should return a JSON response of this shape (the `...` marks elided vector components):
```json
{
  "embeddings": [[0.1, 0.2, ...], [0.3, 0.4, ...]],
  "model": "model-name",
  "usage": {"tokens": 100, "requests": 2}
}
```
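
To try an embedding service by hand, it helps to generate a request body in the format above. This is a rough sketch with hypothetical helper names (`json_escape`, `embed_request_body`). It covers only the `texts` and `model` fields and escapes only backslashes and double quotes, so it is not a general JSON encoder.

```bash
# json_escape STRING
# Escape backslashes and double quotes for use inside a JSON string.
# Limitation of this sketch: control characters such as newlines are not handled.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# embed_request_body MODEL TEXT...
# Print a request body with the "texts" and "model" fields.
embed_request_body() {
  local model="$1" texts="" t
  shift
  for t in "$@"; do
    texts="${texts:+$texts,}\"$(json_escape "$t")\""
  done
  printf '{"texts":[%s],"model":"%s"}\n' "$texts" "$(json_escape "$model")"
}

# Example (requires a reachable embedding service):
# curl -sS -X POST -H 'Content-Type: application/json' \
#   -d "$(embed_request_body your-model-name "Hello world")" \
#   "http://your-embedding-service:8080/embed"
```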

### Interactive Features

The demo script includes:
- **Colored output** for better readability
- **Step-by-step execution** with explanations
- **Error handling** demonstrations
- **Automatic cleanup** options
- **Performance testing** concepts
- **Real-world usage** examples

### Use Cases Demonstrated

1. **Document Search System**
   - Semantic document retrieval
   - Metadata filtering
   - Relevance ranking

2. **Image Similarity Search**
   - Visual content matching
   - Tag-based filtering
   - Multimodal queries

3. **Product Recommendations**
   - Feature-based similarity
   - Category filtering
   - Price range queries

4. **Content Management**
   - Mixed media storage
   - Cross-modal search
   - Rich metadata support

---

## Tantivy Search Demo (Bash Script)

### Overview
The `tantivy_search_demo.sh` script provides a comprehensive demonstration of HeroDB's search functionality using Redis commands. It showcases various search scenarios including basic text search, filtering, sorting, geographic queries, and more.

### Prerequisites
1. **HeroDB Server**: The server must be running on port 6381
2. **Redis CLI**: The `redis-cli` tool must be installed and available in your PATH

### Running the Demo

#### Step 1: Start HeroDB Server
```bash
# From the project root directory
cargo run -- --port 6381
```

#### Step 2: Run the Demo (in a new terminal)
```bash
# From the project root directory
./examples/tantivy_search_demo.sh
```

### What the Demo Covers

The script demonstrates 15 different search scenarios:

1. **Index Creation** - Creating a search index with various field types
2. **Data Insertion** - Adding sample products to the index
3. **Basic Text Search** - Simple keyword searches
4. **Filtered Search** - Combining text search with category filters
5. **Numeric Range Search** - Finding products within price ranges
6. **Sorting Results** - Ordering results by different fields
7. **Limited Results** - Pagination and result limiting
8. **Complex Queries** - Multi-field searches with sorting
9. **Geographic Search** - Location-based queries
10. **Index Information** - Getting statistics about the search index
11. **Search Comparison** - Tantivy vs simple pattern matching
12. **Fuzzy Search** - Typo tolerance and approximate matching
13. **Phrase Search** - Exact phrase matching
14. **Boolean Queries** - AND, OR, NOT operators
15. **Cleanup** - Removing test data

### Sample Data

The demo uses a product catalog with the following fields:
- **title** (TEXT) - Product name with higher search weight
- **description** (TEXT) - Detailed product description
- **category** (TAG) - Comma-separated categories
- **price** (NUMERIC) - Product price for range queries
- **rating** (NUMERIC) - Customer rating for sorting
- **location** (GEO) - Geographic coordinates for location searches

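### Key Redis Commands Demonstrated

These are HeroDB commands rather than shell commands: run them inside a `redis-cli -p 6381` session, or prefix each one with `redis-cli -p 6381`.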

#### Index Management
```bash
# Create search index
FT.CREATE product_catalog ON HASH PREFIX 1 product: SCHEMA title TEXT WEIGHT 2.0 SORTABLE description TEXT category TAG SEPARATOR , price NUMERIC SORTABLE rating NUMERIC SORTABLE location GEO

# Get index information
FT.INFO product_catalog

# Drop index
FT.DROPINDEX product_catalog
```

#### Search Queries
```bash
# Basic text search
FT.SEARCH product_catalog wireless

# Filtered search
FT.SEARCH product_catalog 'organic @category:{food}'

# Numeric range
FT.SEARCH product_catalog '@price:[50 150]'

# Sorted results
FT.SEARCH product_catalog '@category:{electronics}' SORTBY price ASC

# Geographic search
FT.SEARCH product_catalog '@location:[37.7749 -122.4194 50 km]'

# Boolean queries
FT.SEARCH product_catalog 'wireless AND audio'
FT.SEARCH product_catalog 'coffee OR tea'

# Phrase search
FT.SEARCH product_catalog '"noise canceling"'
```
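
When these queries are issued from a shell script, the brackets and braces need careful quoting. One option is to build the query fragments with small helpers, as in the sketch below. The helper names are made up for illustration, and the output simply reproduces the query syntax shown above.

```bash
# Query-fragment helpers (illustrative names, not part of HeroDB).

# numeric_range FIELD MIN MAX  ->  @FIELD:[MIN MAX]
numeric_range() { printf '@%s:[%s %s]' "$1" "$2" "$3"; }

# tag_filter FIELD VALUE  ->  @FIELD:{VALUE}
tag_filter() { printf '@%s:{%s}' "$1" "$2"; }

# filtered_query TEXT FIELD VALUE  ->  TEXT @FIELD:{VALUE}
filtered_query() { printf '%s %s' "$1" "$(tag_filter "$2" "$3")"; }

# Examples (require a running server):
# redis-cli -p 6381 FT.SEARCH product_catalog "$(numeric_range price 50 150)"
# redis-cli -p 6381 FT.SEARCH product_catalog "$(filtered_query organic category food)"
```

Passing the result inside double quotes hands the whole query to `redis-cli` as a single argument.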

### Interactive Features

The demo script includes:
- **Colored output** for better readability
- **Pause between steps** to review results
- **Error handling** with clear error messages
- **Automatic cleanup** of test data
- **Progress indicators** showing what each step demonstrates

### Troubleshooting

#### HeroDB Not Running
```
✗ HeroDB is not running on port 6381
ℹ Please start HeroDB with: cargo run -- --port 6381
```
**Solution**: Start the HeroDB server in a separate terminal.

#### Redis CLI Not Found
```
redis-cli: command not found
```
**Solution**: Install Redis tools or use an alternative Redis client.

#### Connection Refused
```
Could not connect to Redis at localhost:6381: Connection refused
```
**Solution**: Ensure HeroDB is running and listening on the correct port.

### Manual Testing

You can also run individual commands manually:

```bash
# Connect to HeroDB
redis-cli -h localhost -p 6381

# Create a simple index
FT.CREATE myindex ON HASH SCHEMA title TEXT description TEXT

# Add a document
HSET doc:1 title "Hello World" description "This is a test document"

# Search
FT.SEARCH myindex hello
```
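
The same steps can be run non-interactively, for example from a script or a CI job. The sketch below wraps `redis-cli` in a small function and chains the three commands so that a failure stops the sequence. `REDIS_CLI`, `HERODB_PORT`, `hdb`, and `smoke_test` are illustrative names rather than anything provided by HeroDB.

```bash
# Non-interactive version of the manual test (sketch).
REDIS_CLI="${REDIS_CLI:-redis-cli}"
HERODB_PORT="${HERODB_PORT:-6381}"

# hdb ARGS...: send one command to HeroDB.
hdb() {
  "$REDIS_CLI" -h localhost -p "$HERODB_PORT" "$@"
}

# smoke_test: create an index, add a document, and search for it.
# Stops at the first command whose client call exits with a non-zero status.
smoke_test() {
  hdb FT.CREATE myindex ON HASH SCHEMA title TEXT description TEXT &&
    hdb HSET doc:1 title "Hello World" description "This is a test document" &&
    hdb FT.SEARCH myindex hello
}

# Example (requires a running server): smoke_test
```

Note that `redis-cli` may exit with status 0 even when the server returns an error reply, so the output is still worth inspecting.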

### Performance Notes

- **Indexing**: Documents are indexed in real-time as they're added
- **Search Speed**: Full-text search looks terms up in an index instead of scanning every key, so it is much faster than pattern matching on large datasets
- **Memory Usage**: Tantivy indexes are memory-efficient and disk-backed
- **Scalability**: Intended to support millions of documents with sub-second search times, though actual latency depends on hardware, schema, and query complexity

### Advanced Features

The demo showcases advanced Tantivy features:
- **Relevance Scoring** - Results ranked by relevance
- **Fuzzy Matching** - Handles typos and approximate matches
- **Field Weighting** - Title field has higher search weight
- **Multi-field Search** - Search across multiple fields simultaneously
- **Geographic Queries** - Distance-based location searches
- **Numeric Ranges** - Efficient range queries on numeric fields
- **Tag Filtering** - Fast categorical filtering

### Next Steps

After running the demo, explore:
1. **Custom Schemas** - Define your own field types and configurations
2. **Large Datasets** - Test with thousands or millions of documents
3. **Real Applications** - Integrate search into your applications
4. **Performance Tuning** - Optimize for your specific use case

For more information, see the [search documentation](../herodb/docs/search.md).