2025-08-25 06:00:08 +02:00
parent ab56fad635
commit 9410176684
7 changed files with 1507 additions and 534 deletions

@@ -1,6 +1,191 @@
# HeroDB Tantivy Search Examples
# HeroDB Examples
This directory contains examples demonstrating HeroDB's full-text search capabilities powered by Tantivy.
This directory contains examples demonstrating HeroDB's capabilities, including full-text search powered by Tantivy and vector database operations using Lance.
## Available Examples
1. **[Tantivy Search Demo](#tantivy-search-demo-bash-script)** - Full-text search capabilities
2. **[Lance Vector Database Demo](#lance-vector-database-demo-bash-script)** - Vector database and AI operations
3. **[AGE Encryption Demo](age_bash_demo.sh)** - Cryptographic operations
4. **[Simple Demo](simple_demo.sh)** - Basic Redis operations
---
## Lance Vector Database Demo (Bash Script)
### Overview
The `lance_vector_demo.sh` script demonstrates HeroDB's vector database capabilities using Lance. It covers vector storage, similarity search, multimodal data handling, and AI-powered operations that rely on external embedding services.
### Prerequisites
1. **HeroDB Server**: The server must be running (default port 6379)
2. **Redis CLI**: The `redis-cli` tool must be installed and available in your PATH
3. **Embedding Service** (optional): For full functionality, set up an external embedding service
### Running the Demo
#### Step 1: Start HeroDB Server
```bash
# From the project root directory
cargo run -- --dir ./test_data --port 6379
```
#### Step 2: Run the Demo (in a new terminal)
```bash
# From the project root directory
./examples/lance_vector_demo.sh
```
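Before launching the demo, it can help to confirm the two hard prerequisites listed above. The following is a minimal sketch, not part of the demo script: `HERODB_CLI`, `HERODB_PORT`, and `check_prereqs` are hypothetical names, and the check assumes HeroDB answers the standard Redis `PING` command with `PONG`.

```bash
# Preflight sketch. HERODB_CLI and HERODB_PORT are hypothetical variable names.
HERODB_CLI="${HERODB_CLI:-redis-cli}"
HERODB_PORT="${HERODB_PORT:-6379}"

check_prereqs() {
  # The client must be resolvable by the shell (normally via PATH).
  if ! command -v "$HERODB_CLI" >/dev/null 2>&1; then
    echo "error: $HERODB_CLI not found in PATH" >&2
    return 1
  fi
  # Assumption: the server replies PONG to PING, as Redis does.
  reply=$("$HERODB_CLI" -p "$HERODB_PORT" PING 2>/dev/null)
  if [ "$reply" != "PONG" ]; then
    echo "error: no PONG from port $HERODB_PORT" >&2
    return 1
  fi
  echo "ok: server reachable on port $HERODB_PORT"
}
```

A possible use is `check_prereqs && ./examples/lance_vector_demo.sh`, so the demo only starts when the server is reachable.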
### What the Demo Covers
The script walks through the following vector database operations:
1. **Dataset Management**
   - Creating vector datasets with custom dimensions
   - Defining schemas with metadata fields
   - Listing and inspecting datasets
   - Dataset information and statistics
2. **Embedding Operations**
   - Text embedding generation via external services
   - Multimodal embedding support (text + images)
   - Batch embedding operations
3. **Data Storage**
   - Storing text documents with automatic embedding
   - Storing images with metadata
   - Multimodal content storage
   - Rich metadata support
4. **Vector Search**
   - Similarity search with raw vectors
   - Text-based semantic search
   - Configurable search parameters (K, NPROBES, REFINE)
   - Cross-modal search capabilities
5. **Index Management**
   - Creating IVF_PQ indexes for performance
   - Custom index parameters
   - Performance optimization
6. **Advanced Features**
   - Error handling and recovery
   - Performance testing concepts
   - Monitoring and maintenance
   - Cleanup operations
### Key Lance Commands Demonstrated
#### Dataset Management
```bash
# Create vector dataset
LANCE CREATE documents DIM 384
# Create dataset with schema
LANCE CREATE products DIM 768 SCHEMA category:string price:float available:bool
# List datasets
LANCE LIST
# Get dataset information
LANCE INFO documents
```
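The commands in this section are shown without a client. Since `redis-cli` is listed as a prerequisite, one way to issue them from a shell is a thin wrapper like the sketch below. The names `lance`, `HERODB_CLI`, and `HERODB_PORT` are hypothetical and do not come from the demo script.

```bash
# Wrapper sketch. HERODB_CLI and HERODB_PORT are hypothetical variable names.
HERODB_CLI="${HERODB_CLI:-redis-cli}"
HERODB_PORT="${HERODB_PORT:-6379}"

# Forward every argument as one LANCE command.
# "$@" keeps each quoted argument intact, so text with spaces stays one token.
lance() {
  "$HERODB_CLI" -p "$HERODB_PORT" LANCE "$@"
}
```

With this in place, `lance CREATE documents DIM 384` runs `redis-cli -p 6379 LANCE CREATE documents DIM 384`, and dotted subcommands such as `SEARCH.TEXT` pass through as a single argument.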
#### Data Operations
```bash
# Store text with metadata
LANCE STORE documents TEXT "Machine learning tutorial" category "education" author "John Doe"
# Store image with metadata
LANCE STORE images IMAGE "base64_encoded_image..." filename "photo.jpg" tags "nature,landscape"
# Store multimodal content
LANCE STORE content TEXT "Product description" IMAGE "base64_image..." type "product"
```
#### Search Operations
```bash
# Search with raw vector
LANCE SEARCH documents VECTOR "0.1,0.2,0.3,0.4" K 5
# Semantic text search
LANCE SEARCH.TEXT documents "artificial intelligence" K 10 NPROBES 20
# Generate embeddings
LANCE EMBED.TEXT "Hello world" "Machine learning"
```
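The raw-vector search above takes the vector as one comma-separated string. When the components come from another tool as separate values, a small helper can build that string. This is a sketch only: `vector_arg` is a hypothetical name, and the four-component vector is purely illustrative, since a real query presumably needs as many components as the dataset's `DIM`.

```bash
# Join numeric arguments with commas: 0.1 0.2 0.3 -> 0.1,0.2,0.3
vector_arg() {
  if [ "$#" -eq 0 ]; then
    echo "error: no components given" >&2
    return 1
  fi
  # Reject empty values and values containing characters
  # that cannot appear in a decimal number.
  for x in "$@"; do
    case $x in
      ''|*[!0-9.eE+-]*) echo "error: not a number: $x" >&2; return 1 ;;
    esac
  done
  # In a subshell, "$*" joins the arguments with the first character of IFS.
  ( IFS=,; printf '%s\n' "$*" )
}
```

For example, `redis-cli LANCE SEARCH documents VECTOR "$(vector_arg 0.1 0.2 0.3 0.4)" K 5` sends the same command as the raw-vector example above.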
#### Index Management
```bash
# Create performance index
LANCE CREATE.INDEX documents IVF_PQ PARTITIONS 256 SUBVECTORS 16
# Drop dataset
LANCE DROP old_dataset
```
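Product quantization generally splits each vector into equally sized sub-vectors, so the dataset dimension is usually expected to be a multiple of `SUBVECTORS`. The sketch below checks this before creating an index. Whether HeroDB enforces the same rule is an assumption, and `check_subvectors` is a hypothetical helper name.

```bash
# Succeeds only if DIM splits evenly into SUBVECTORS chunks.
check_subvectors() {
  dim=$1
  subvectors=$2
  if [ "$subvectors" -le 0 ] || [ $((dim % subvectors)) -ne 0 ]; then
    echo "error: SUBVECTORS ($subvectors) must be a positive divisor of DIM ($dim)" >&2
    return 1
  fi
  echo "$((dim / subvectors)) dimensions per sub-vector"
}
```

For the `documents` dataset above (`DIM 384`, `SUBVECTORS 16`), `check_subvectors 384 16` reports 24 dimensions per sub-vector.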
### Configuration
#### Setting Up Embedding Service
```bash
# Configure embedding service URL
redis-cli HSET config:core:aiembed url "http://your-embedding-service:8080/embed"
# Optional: Set authentication token
redis-cli HSET config:core:aiembed token "your-api-token"
```
#### Embedding Service API
Your embedding service should accept POST requests with a JSON body of this form:
```json
{
  "texts": ["text1", "text2"],
  "images": ["base64_image1", "base64_image2"],
  "model": "your-model-name"
}
```
It should return a response of this form:
```json
{
  "embeddings": [[0.1, 0.2, ...], [0.3, 0.4, ...]],
  "model": "model-name",
  "usage": {"tokens": 100, "requests": 2}
}
```
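To try an embedding service against this contract by hand, a request body in the format above can be assembled and posted with `curl`. The sketch below uses hypothetical helper names (`json_escape`, `embed_payload`, `post_embed`), sends text only with an empty `images` array (whether a given service accepts that is an assumption), and escapes only backslashes and double quotes, so text containing newlines or other control characters is not handled. How the optional token is transmitted is not specified here, so no authentication header is sent.

```bash
# Escape backslashes and double quotes for use inside a JSON string literal.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build a request body: first argument is the model, the rest are texts.
embed_payload() {
  model=$1; shift
  texts=""
  for t in "$@"; do
    # Separate entries with ", " once the list is non-empty.
    texts="${texts:+$texts, }\"$(json_escape "$t")\""
  done
  printf '{"texts": [%s], "images": [], "model": "%s"}\n' \
    "$texts" "$(json_escape "$model")"
}

# POST the body as JSON: first argument is the URL, the rest go to embed_payload.
post_embed() {
  url=$1; shift
  curl -sS -X POST -H 'Content-Type: application/json' \
    -d "$(embed_payload "$@")" "$url"
}
```

For example, `post_embed "http://your-embedding-service:8080/embed" your-model-name "text1" "text2"` posts the two texts to the placeholder URL used in the configuration section.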
### Interactive Features
The demo script includes:
- **Colored output** for better readability
- **Step-by-step execution** with explanations
- **Error handling** demonstrations
- **Automatic cleanup** options
- **Performance testing** concepts
- **Real-world usage** examples
### Use Cases Demonstrated
1. **Document Search System**
   - Semantic document retrieval
   - Metadata filtering
   - Relevance ranking
2. **Image Similarity Search**
   - Visual content matching
   - Tag-based filtering
   - Multimodal queries
3. **Product Recommendations**
   - Feature-based similarity
   - Category filtering
   - Price range queries
4. **Content Management**
   - Mixed media storage
   - Cross-modal search
   - Rich metadata support
---
## Tantivy Search Demo (Bash Script)