# HeroDB Examples

This directory contains examples demonstrating HeroDB's capabilities, including full-text search powered by Tantivy and vector database operations using Lance.

## Available Examples

1. **[Tantivy Search Demo](#tantivy-search-demo-bash-script)** - Full-text search capabilities
2. **[Lance Vector Database Demo](#lance-vector-database-demo-bash-script)** - Vector database and AI operations
3. **[AGE Encryption Demo](age_bash_demo.sh)** - Cryptographic operations
4. **[Simple Demo](simple_demo.sh)** - Basic Redis operations

---

## Lance Vector Database Demo (Bash Script)

### Overview
The `lance_vector_demo.sh` script provides a comprehensive demonstration of HeroDB's vector database capabilities using Lance. It showcases vector storage, similarity search, multimodal data handling, and AI-powered operations with external embedding services.

### Prerequisites
1. **HeroDB Server**: The server must be running (default port 6379)
2. **Redis CLI**: The `redis-cli` tool must be installed and available in your PATH
3. **Embedding Service** (optional): For full functionality, set up an external embedding service

### Running the Demo

#### Step 1: Start HeroDB Server
```bash
# From the project root directory
cargo run -- --dir ./test_data --port 6379
```

#### Step 2: Run the Demo (in a new terminal)
```bash
# From the project root directory
./examples/lance_vector_demo.sh
```

### What the Demo Covers

The script demonstrates comprehensive vector database operations:

1. **Dataset Management**
   - Creating vector datasets with custom dimensions
   - Defining schemas with metadata fields
   - Listing and inspecting datasets
   - Dataset information and statistics

2. **Embedding Operations**
   - Text embedding generation via external services
   - Multimodal embedding support (text + images)
   - Batch embedding operations

3. **Data Storage**
   - Storing text documents with automatic embedding
   - Storing images with metadata
   - Multimodal content storage
   - Rich metadata support

4. **Vector Search**
   - Similarity search with raw vectors
   - Text-based semantic search
   - Configurable search parameters (K, NPROBES, REFINE)
   - Cross-modal search capabilities

5. **Index Management**
   - Creating IVF_PQ indexes for performance
   - Custom index parameters
   - Performance optimization

6. **Advanced Features**
   - Error handling and recovery
   - Performance testing concepts
   - Monitoring and maintenance
   - Cleanup operations

### Key Lance Commands Demonstrated

#### Dataset Management
```bash
# Create vector dataset
LANCE CREATE documents DIM 384

# Create dataset with schema
LANCE CREATE products DIM 768 SCHEMA category:string price:float available:bool

# List datasets
LANCE LIST

# Get dataset information
LANCE INFO documents
```

#### Data Operations
```bash
# Store text with metadata
LANCE STORE documents TEXT "Machine learning tutorial" category "education" author "John Doe"

# Store image with metadata
LANCE STORE images IMAGE "base64_encoded_image..." filename "photo.jpg" tags "nature,landscape"

# Store multimodal content
LANCE STORE content TEXT "Product description" IMAGE "base64_image..." type "product"
```

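The `IMAGE` argument above is shown as a base64 string. Here is a minimal sketch of producing one from a file. It assumes the server expects standard base64 on a single line, which this README does not state. The helper name `encode_image` and the file name are illustrative and not part of HeroDB.

```bash
#!/usr/bin/env bash
# Hypothetical helper: base64-encode a file and remove any line wraps
# (GNU base64 wraps its output at 76 columns by default).
encode_image() {
  base64 < "$1" | tr -d '\n'
}

# Usage sketch (needs a running HeroDB and an existing `images` dataset):
#   redis-cli LANCE STORE images IMAGE "$(encode_image photo.jpg)" filename "photo.jpg"
```
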
#### Search Operations
```bash
# Search with raw vector
LANCE SEARCH documents VECTOR "0.1,0.2,0.3,0.4" K 5

# Semantic text search
LANCE SEARCH.TEXT documents "artificial intelligence" K 10 NPROBES 20

# Generate embeddings
LANCE EMBED.TEXT "Hello world" "Machine learning"
```

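The raw-vector search takes its components as one comma-separated string. The sketch below builds that string and checks the component count first. It assumes a query vector must have as many components as the dataset's `DIM`, which this README does not state explicitly. The helper name `vector_arg` and the dataset name `tiny` are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: join vector components with commas after checking
# that their count equals the expected dimension (first argument).
vector_arg() {
  local dim=$1; shift
  if [ "$#" -ne "$dim" ]; then
    echo "expected $dim components, got $#" >&2
    return 1
  fi
  local IFS=,
  printf '%s\n' "$*"   # "$*" joins the arguments with the first character of IFS
}

# Usage sketch against a 4-dimensional dataset (needs a running HeroDB):
#   redis-cli LANCE SEARCH tiny VECTOR "$(vector_arg 4 0.1 0.2 0.3 0.4)" K 5
```
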
#### Index Management
```bash
# Create performance index
LANCE CREATE.INDEX documents IVF_PQ PARTITIONS 256 SUBVECTORS 16

# Drop dataset
LANCE DROP old_dataset
```

### Configuration

#### Setting Up Embedding Service
```bash
# Configure embedding service URL
redis-cli HSET config:core:aiembed url "http://your-embedding-service:8080/embed"

# Optional: Set authentication token
redis-cli HSET config:core:aiembed token "your-api-token"
```

#### Embedding Service API
Your embedding service should accept POST requests:
```json
{
  "texts": ["text1", "text2"],
  "images": ["base64_image1", "base64_image2"],
  "model": "your-model-name"
}
```

And return responses:
```json
{
  "embeddings": [[0.1, 0.2, ...], [0.3, 0.4, ...]],
  "model": "model-name",
  "usage": {"tokens": 100, "requests": 2}
}
```

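When testing an embedding service by hand, it can help to build the request body in the shell. The sketch below assembles a text-only body in the shape shown above. The helper names `json_escape` and `build_embed_payload` and the model name `demo-model` are illustrative and not part of HeroDB. Whether a service accepts a request without an `images` field is an assumption.

```bash
#!/usr/bin/env bash
# Escape backslashes and double quotes so a string can sit inside JSON quotes.
# Control characters such as newlines are not handled in this sketch.
json_escape() {
  local s=$1
  s=${s//\\/\\\\}
  s=${s//\"/\\\"}
  printf '%s' "$s"
}

# Hypothetical helper: build {"texts":[...],"model":"..."} from a model
# name followed by one or more texts.
build_embed_payload() {
  local model=$1; shift
  local texts="" t
  for t in "$@"; do
    texts+="${texts:+,}\"$(json_escape "$t")\""
  done
  printf '{"texts":[%s],"model":"%s"}\n' "$texts" "$(json_escape "$model")"
}

# Usage sketch (the URL is the placeholder from the configuration step):
#   curl -s -X POST -H 'Content-Type: application/json' \
#     -d "$(build_embed_payload demo-model "Hello world" "Machine learning")" \
#     "http://your-embedding-service:8080/embed"
```
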
### Interactive Features

The demo script includes:
- **Colored output** for better readability
- **Step-by-step execution** with explanations
- **Error handling** demonstrations
- **Automatic cleanup** options
- **Performance testing** concepts
- **Real-world usage** examples

### Use Cases Demonstrated

1. **Document Search System**
   - Semantic document retrieval
   - Metadata filtering
   - Relevance ranking

2. **Image Similarity Search**
   - Visual content matching
   - Tag-based filtering
   - Multimodal queries

3. **Product Recommendations**
   - Feature-based similarity
   - Category filtering
   - Price range queries

4. **Content Management**
   - Mixed media storage
   - Cross-modal search
   - Rich metadata support

---

## Tantivy Search Demo (Bash Script)

### Overview
The `tantivy_search_demo.sh` script provides a comprehensive demonstration of HeroDB's search functionality using Redis commands. It showcases various search scenarios including basic text search, filtering, sorting, geographic queries, and more.

### Prerequisites
1. **HeroDB Server**: The server must be running on port 6381
2. **Redis CLI**: The `redis-cli` tool must be installed and available in your PATH

### Running the Demo

#### Step 1: Start HeroDB Server
```bash
# From the project root directory
cargo run -- --port 6381
```

#### Step 2: Run the Demo (in a new terminal)
```bash
# From the project root directory
./examples/tantivy_search_demo.sh
```

### What the Demo Covers

The script demonstrates 15 different search scenarios:

1. **Index Creation** - Creating a search index with various field types
2. **Data Insertion** - Adding sample products to the index
3. **Basic Text Search** - Simple keyword searches
4. **Filtered Search** - Combining text search with category filters
5. **Numeric Range Search** - Finding products within price ranges
6. **Sorting Results** - Ordering results by different fields
7. **Limited Results** - Pagination and result limiting
8. **Complex Queries** - Multi-field searches with sorting
9. **Geographic Search** - Location-based queries
10. **Index Information** - Getting statistics about the search index
11. **Search Comparison** - Tantivy vs simple pattern matching
12. **Fuzzy Search** - Typo tolerance and approximate matching
13. **Phrase Search** - Exact phrase matching
14. **Boolean Queries** - AND, OR, NOT operators
15. **Cleanup** - Removing test data

### Sample Data

The demo uses a product catalog with the following fields:
- **title** (TEXT) - Product name with higher search weight
- **description** (TEXT) - Detailed product description
- **category** (TAG) - Comma-separated categories
- **price** (NUMERIC) - Product price for range queries
- **rating** (NUMERIC) - Customer rating for sorting
- **location** (GEO) - Geographic coordinates for location searches

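A record with these fields can be written as a Redis hash under the `product:` key prefix that the demo's index is created with. The sketch below prints one `HSET` command per product, ready to pipe into `redis-cli`. The helper name `emit_product` and the sample values are hypothetical. `location` is left out because this README does not show the stored coordinate format.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print an HSET command for one product.
# Arguments: id, title, category list, price, rating.
# Values containing double quotes are not escaped in this sketch.
emit_product() {
  printf 'HSET product:%s title "%s" category "%s" price %s rating %s\n' \
    "$1" "$2" "$3" "$4" "$5"
}

# Usage sketch (needs a running HeroDB; redis-cli reads commands from stdin):
#   emit_product 1 "Wireless Headphones" "electronics,audio" 99.99 4.5 | redis-cli -p 6381
```
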
### Key Redis Commands Demonstrated

#### Index Management
```bash
# Create search index
FT.CREATE product_catalog ON HASH PREFIX 1 product: SCHEMA title TEXT WEIGHT 2.0 SORTABLE description TEXT category TAG SEPARATOR , price NUMERIC SORTABLE rating NUMERIC SORTABLE location GEO

# Get index information
FT.INFO product_catalog

# Drop index
FT.DROPINDEX product_catalog
```

#### Search Queries
```bash
# Basic text search
FT.SEARCH product_catalog wireless

# Filtered search
FT.SEARCH product_catalog 'organic @category:{food}'

# Numeric range
FT.SEARCH product_catalog '@price:[50 150]'

# Sorted results
FT.SEARCH product_catalog '@category:{electronics}' SORTBY price ASC

# Geographic search
FT.SEARCH product_catalog '@location:[37.7749 -122.4194 50 km]'

# Boolean queries
FT.SEARCH product_catalog 'wireless AND audio'
FT.SEARCH product_catalog 'coffee OR tea'

# Phrase search
FT.SEARCH product_catalog '"noise canceling"'
```

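The clauses above can be composed into one query string in a script. The sketch below concatenates an optional text term, tag filter, and price range with spaces, modeled on the `'organic @category:{food}'` example. Whether HeroDB accepts all three clauses in a single query is an assumption. The helper name `build_query` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: compose a query from optional parts.
# Arguments: text, category, min price, max price (any may be empty).
build_query() {
  local text=${1:-} category=${2:-} min=${3:-} max=${4:-}
  local q=$text
  if [ -n "$category" ]; then
    q+=" @category:{$category}"
  fi
  if [ -n "$min" ] && [ -n "$max" ]; then
    q+=" @price:[$min $max]"
  fi
  printf '%s\n' "${q# }"   # drop the leading space left when text is empty
}

# Usage sketch (needs a running HeroDB with the product_catalog index):
#   redis-cli -p 6381 FT.SEARCH product_catalog "$(build_query wireless electronics 50 150)"
```
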
### Interactive Features

The demo script includes:
- **Colored output** for better readability
- **Pause between steps** to review results
- **Error handling** with clear error messages
- **Automatic cleanup** of test data
- **Progress indicators** showing what each step demonstrates

### Troubleshooting

#### HeroDB Not Running
```
✗ HeroDB is not running on port 6381
ℹ Please start HeroDB with: cargo run -- --port 6381
```
**Solution**: Start the HeroDB server in a separate terminal.

#### Redis CLI Not Found
```
redis-cli: command not found
```
**Solution**: Install Redis tools or use an alternative Redis client.

#### Connection Refused
```
Could not connect to Redis at localhost:6381: Connection refused
```
**Solution**: Ensure HeroDB is running and listening on the correct port.

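The three failure modes above can be checked up front before running any commands. The sketch below assumes HeroDB answers `PING` with `PONG` as Redis does, which this README does not state. The function name `herodb_ready` and the `REDIS_CLI` override variable are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical preflight check. Returns 0 if the server answers PING,
# 1 if it does not respond, 2 if the client command is missing.
# REDIS_CLI may name an alternative client; it defaults to redis-cli.
herodb_ready() {
  local port=${1:-6381}
  local cli=${REDIS_CLI:-redis-cli}
  if ! command -v "$cli" >/dev/null 2>&1; then
    echo "$cli: command not found" >&2
    return 2
  fi
  if [ "$("$cli" -h localhost -p "$port" PING 2>/dev/null)" = "PONG" ]; then
    return 0
  fi
  echo "HeroDB is not responding on port $port" >&2
  return 1
}

# Usage sketch:
#   herodb_ready 6381 && ./examples/tantivy_search_demo.sh
```
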
### Manual Testing

You can also run individual commands manually:

```bash
# Connect to HeroDB (the commands below are typed at the redis-cli prompt)
redis-cli -h localhost -p 6381

# Create a simple index
FT.CREATE myindex ON HASH SCHEMA title TEXT description TEXT

# Add a document
HSET doc:1 title "Hello World" description "This is a test document"

# Search
FT.SEARCH myindex hello
```

### Performance Notes

- **Indexing**: Documents are indexed in real time as they are added
- **Search Speed**: Full-text search uses an inverted index, so on large datasets it is much faster than pattern matching, which has to scan the data
- **Memory Usage**: Tantivy indexes are disk-backed and memory-efficient
- **Scalability**: Intended to support millions of documents with sub-second search times, depending on hardware, schema, and query complexity

### Advanced Features

The demo showcases advanced Tantivy features:
- **Relevance Scoring** - Results ranked by relevance
- **Fuzzy Matching** - Handles typos and approximate matches
- **Field Weighting** - Title field has higher search weight
- **Multi-field Search** - Search across multiple fields simultaneously
- **Geographic Queries** - Distance-based location searches
- **Numeric Ranges** - Efficient range queries on numeric fields
- **Tag Filtering** - Fast categorical filtering

### Next Steps

After running the demo, explore:
1. **Custom Schemas** - Define your own field types and configurations
2. **Large Datasets** - Test with thousands or millions of documents
3. **Real Applications** - Integrate search into your applications
4. **Performance Tuning** - Optimize for your specific use case

For more information, see the [search documentation](../herodb/docs/search.md).