Files
herodb/examples/README.md
2025-08-25 06:00:08 +02:00

356 lines
10 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# HeroDB Examples
This directory contains examples demonstrating HeroDB's capabilities including full-text search powered by Tantivy and vector database operations using Lance.
## Available Examples
1. **[Tantivy Search Demo](#tantivy-search-demo-bash-script)** - Full-text search capabilities
2. **[Lance Vector Database Demo](#lance-vector-database-demo-bash-script)** - Vector database and AI operations
3. **[AGE Encryption Demo](age_bash_demo.sh)** - Cryptographic operations
4. **[Simple Demo](simple_demo.sh)** - Basic Redis operations
---
## Lance Vector Database Demo (Bash Script)
### Overview
The `lance_vector_demo.sh` script provides a comprehensive demonstration of HeroDB's vector database capabilities using Lance. It showcases vector storage, similarity search, multimodal data handling, and AI-powered operations with external embedding services.
### Prerequisites
1. **HeroDB Server**: The server must be running (default port 6379)
2. **Redis CLI**: The `redis-cli` tool must be installed and available in your PATH
3. **Embedding Service** (optional): For full functionality, set up an external embedding service
### Running the Demo
#### Step 1: Start HeroDB Server
```bash
# From the project root directory
cargo run -- --dir ./test_data --port 6379
```
#### Step 2: Run the Demo (in a new terminal)
```bash
# From the project root directory
./examples/lance_vector_demo.sh
```
### What the Demo Covers
The script demonstrates comprehensive vector database operations:
1. **Dataset Management**
- Creating vector datasets with custom dimensions
- Defining schemas with metadata fields
- Listing and inspecting datasets
- Dataset information and statistics
2. **Embedding Operations**
- Text embedding generation via external services
- Multimodal embedding support (text + images)
- Batch embedding operations
3. **Data Storage**
- Storing text documents with automatic embedding
- Storing images with metadata
- Multimodal content storage
- Rich metadata support
4. **Vector Search**
- Similarity search with raw vectors
- Text-based semantic search
- Configurable search parameters (K, NPROBES, REFINE)
- Cross-modal search capabilities
5. **Index Management**
- Creating IVF_PQ indexes for performance
- Custom index parameters
- Performance optimization
6. **Advanced Features**
- Error handling and recovery
- Performance testing concepts
- Monitoring and maintenance
- Cleanup operations
### Key Lance Commands Demonstrated
#### Dataset Management
```bash
# Create vector dataset
LANCE CREATE documents DIM 384
# Create dataset with schema
LANCE CREATE products DIM 768 SCHEMA category:string price:float available:bool
# List datasets
LANCE LIST
# Get dataset information
LANCE INFO documents
```
#### Data Operations
```bash
# Store text with metadata
LANCE STORE documents TEXT "Machine learning tutorial" category "education" author "John Doe"
# Store image with metadata
LANCE STORE images IMAGE "base64_encoded_image..." filename "photo.jpg" tags "nature,landscape"
# Store multimodal content
LANCE STORE content TEXT "Product description" IMAGE "base64_image..." type "product"
```
#### Search Operations
```bash
# Search with raw vector
LANCE SEARCH documents VECTOR "0.1,0.2,0.3,0.4" K 5
# Semantic text search
LANCE SEARCH.TEXT documents "artificial intelligence" K 10 NPROBES 20
# Generate embeddings
LANCE EMBED.TEXT "Hello world" "Machine learning"
```
#### Index Management
```bash
# Create performance index
LANCE CREATE.INDEX documents IVF_PQ PARTITIONS 256 SUBVECTORS 16
# Drop dataset
LANCE DROP old_dataset
```
### Configuration
#### Setting Up Embedding Service
```bash
# Configure embedding service URL
redis-cli HSET config:core:aiembed url "http://your-embedding-service:8080/embed"
# Optional: Set authentication token
redis-cli HSET config:core:aiembed token "your-api-token"
```
#### Embedding Service API
Your embedding service should accept POST requests:
```json
{
"texts": ["text1", "text2"],
"images": ["base64_image1", "base64_image2"],
"model": "your-model-name"
}
```
And return responses:
```json
{
"embeddings": [[0.1, 0.2, ...], [0.3, 0.4, ...]],
"model": "model-name",
"usage": {"tokens": 100, "requests": 2}
}
```
### Interactive Features
The demo script includes:
- **Colored output** for better readability
- **Step-by-step execution** with explanations
- **Error handling** demonstrations
- **Automatic cleanup** options
- **Performance testing** concepts
- **Real-world usage** examples
### Use Cases Demonstrated
1. **Document Search System**
- Semantic document retrieval
- Metadata filtering
- Relevance ranking
2. **Image Similarity Search**
- Visual content matching
- Tag-based filtering
- Multimodal queries
3. **Product Recommendations**
- Feature-based similarity
- Category filtering
- Price range queries
4. **Content Management**
- Mixed media storage
- Cross-modal search
- Rich metadata support
---
## Tantivy Search Demo (Bash Script)
### Overview
The `tantivy_search_demo.sh` script provides a comprehensive demonstration of HeroDB's search functionality using Redis commands. It showcases various search scenarios including basic text search, filtering, sorting, geographic queries, and more.
### Prerequisites
1. **HeroDB Server**: The server must be running on port 6381
2. **Redis CLI**: The `redis-cli` tool must be installed and available in your PATH
### Running the Demo
#### Step 1: Start HeroDB Server
```bash
# From the project root directory
cargo run -- --port 6381
```
#### Step 2: Run the Demo (in a new terminal)
```bash
# From the project root directory
./examples/tantivy_search_demo.sh
```
### What the Demo Covers
The script demonstrates 15 different search scenarios:
1. **Index Creation** - Creating a search index with various field types
2. **Data Insertion** - Adding sample products to the index
3. **Basic Text Search** - Simple keyword searches
4. **Filtered Search** - Combining text search with category filters
5. **Numeric Range Search** - Finding products within price ranges
6. **Sorting Results** - Ordering results by different fields
7. **Limited Results** - Pagination and result limiting
8. **Complex Queries** - Multi-field searches with sorting
9. **Geographic Search** - Location-based queries
10. **Index Information** - Getting statistics about the search index
11. **Search Comparison** - Tantivy vs simple pattern matching
12. **Fuzzy Search** - Typo tolerance and approximate matching
13. **Phrase Search** - Exact phrase matching
14. **Boolean Queries** - AND, OR, NOT operators
15. **Cleanup** - Removing test data
### Sample Data
The demo uses a product catalog with the following fields:
- **title** (TEXT) - Product name with higher search weight
- **description** (TEXT) - Detailed product description
- **category** (TAG) - Comma-separated categories
- **price** (NUMERIC) - Product price for range queries
- **rating** (NUMERIC) - Customer rating for sorting
- **location** (GEO) - Geographic coordinates for location searches
### Key Redis Commands Demonstrated
#### Index Management
```bash
# Create search index
FT.CREATE product_catalog ON HASH PREFIX 1 product: SCHEMA title TEXT WEIGHT 2.0 SORTABLE description TEXT category TAG SEPARATOR , price NUMERIC SORTABLE rating NUMERIC SORTABLE location GEO
# Get index information
FT.INFO product_catalog
# Drop index
FT.DROPINDEX product_catalog
```
#### Search Queries
```bash
# Basic text search
FT.SEARCH product_catalog wireless
# Filtered search
FT.SEARCH product_catalog 'organic @category:{food}'
# Numeric range
FT.SEARCH product_catalog '@price:[50 150]'
# Sorted results
FT.SEARCH product_catalog '@category:{electronics}' SORTBY price ASC
# Geographic search
FT.SEARCH product_catalog '@location:[37.7749 -122.4194 50 km]'
# Boolean queries
FT.SEARCH product_catalog 'wireless AND audio'
FT.SEARCH product_catalog 'coffee OR tea'
# Phrase search
FT.SEARCH product_catalog '"noise canceling"'
```
### Interactive Features
The demo script includes:
- **Colored output** for better readability
- **Pause between steps** to review results
- **Error handling** with clear error messages
- **Automatic cleanup** of test data
- **Progress indicators** showing what each step demonstrates
### Troubleshooting
#### HeroDB Not Running
```
✗ HeroDB is not running on port 6381
Please start HeroDB with: cargo run -- --port 6381
```
**Solution**: Start the HeroDB server in a separate terminal.
#### Redis CLI Not Found
```
redis-cli: command not found
```
**Solution**: Install Redis tools or use an alternative Redis client.
#### Connection Refused
```
Could not connect to Redis at localhost:6381: Connection refused
```
**Solution**: Ensure HeroDB is running and listening on the correct port.
### Manual Testing
You can also run individual commands manually:
```bash
# Connect to HeroDB
redis-cli -h localhost -p 6381
# Create a simple index
FT.CREATE myindex ON HASH SCHEMA title TEXT description TEXT
# Add a document
HSET doc:1 title "Hello World" description "This is a test document"
# Search
FT.SEARCH myindex hello
```
### Performance Notes
- **Indexing**: Documents are indexed in real-time as they're added
- **Search Speed**: Full-text search is much faster than pattern matching on large datasets
- **Memory Usage**: Tantivy indexes are memory-efficient and disk-backed
- **Scalability**: Supports millions of documents with sub-second search times
### Advanced Features
The demo showcases advanced Tantivy features:
- **Relevance Scoring** - Results ranked by relevance
- **Fuzzy Matching** - Handles typos and approximate matches
- **Field Weighting** - Title field has higher search weight
- **Multi-field Search** - Search across multiple fields simultaneously
- **Geographic Queries** - Distance-based location searches
- **Numeric Ranges** - Efficient range queries on numeric fields
- **Tag Filtering** - Fast categorical filtering
### Next Steps
After running the demo, explore:
1. **Custom Schemas** - Define your own field types and configurations
2. **Large Datasets** - Test with thousands or millions of documents
3. **Real Applications** - Integrate search into your applications
4. **Performance Tuning** - Optimize for your specific use case
For more information, see the [search documentation](../herodb/docs/search.md).