# Full-Text Search with Tantivy

HeroDB includes full-text search powered by [Tantivy](https://github.com/quickwit-oss/tantivy), a fast full-text search engine library written in Rust. The search layer exposes Redis-compatible commands similar to those of RediSearch.

## Overview

The search functionality allows you to:
- Create search indexes with custom schemas
- Index documents with multiple field types
- Perform complex queries with filters
- Work with text, numeric, date, and geographic data
- Search in real time with high performance

## Search Commands

### FT.CREATE - Create Search Index

Create a new search index with a defined schema.

```bash
FT.CREATE index_name SCHEMA field_name field_type [options] [field_name field_type [options] ...]
```

**Field Types:**
- `TEXT` - Full-text searchable text fields
- `NUMERIC` - Numeric fields (integers, floats)
- `TAG` - Tag fields for exact matching
- `GEO` - Geographic coordinates (lat,lon)
- `DATE` - Date/timestamp fields

**Field Options:**
- `STORED` - Store field value for retrieval
- `INDEXED` - Make field searchable
- `TOKENIZED` - Enable tokenization for text fields
- `FAST` - Enable fast access for numeric fields

**Example:**
```bash
# Create a product search index
FT.CREATE products SCHEMA
  title TEXT STORED INDEXED TOKENIZED
  description TEXT STORED INDEXED TOKENIZED
  price NUMERIC STORED INDEXED FAST
  category TAG STORED
  location GEO STORED
  created_date DATE STORED INDEXED
```

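Because a schema is a flat sequence of name, type, and option tokens, client code may find it convenient to assemble the command from structured data. The following is a minimal Python sketch that assumes the argument order shown above; `build_ft_create` and its tuple layout are hypothetical and not part of HeroDB.

```python
def build_ft_create(index_name, fields):
    """Flatten a schema description into FT.CREATE arguments.

    `fields` is a list of (name, field_type, options) tuples. This helper
    is illustrative only; it is not provided by HeroDB.
    """
    args = ["FT.CREATE", index_name, "SCHEMA"]
    for name, field_type, options in fields:
        args.append(name)
        args.append(field_type)
        args.extend(options)
    return args


create_args = build_ft_create(
    "products",
    [
        ("title", "TEXT", ["STORED", "INDEXED", "TOKENIZED"]),
        ("price", "NUMERIC", ["STORED", "INDEXED", "FAST"]),
        ("category", "TAG", ["STORED"]),
    ],
)
print(" ".join(create_args))
```

The resulting list can be handed to any Redis client that accepts raw command arguments.
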
### FT.ADD - Add Document to Index

Add a document to a search index.

```bash
FT.ADD index_name doc_id [SCORE score] FIELDS field_name field_value [field_name field_value ...]
```

**Example:**
```bash
# Add a product document
FT.ADD products product:1 SCORE 1.0 FIELDS
  title "Wireless Headphones"
  description "High-quality wireless headphones with noise cancellation"
  price 199.99
  category "electronics"
  location "37.7749,-122.4194"
  created_date 1640995200000
```

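A document is a set of field/value pairs, so the `FT.ADD` argument list maps naturally onto a dictionary. The sketch below assumes the syntax shown above, with the optional `SCORE` placed before `FIELDS`; `build_ft_add` is a hypothetical helper name.

```python
def build_ft_add(index_name, doc_id, fields, score=None):
    """Build FT.ADD arguments from a dict of field values.

    Values are converted with str(); SCORE is emitted only when given.
    Illustrative helper, not part of HeroDB.
    """
    args = ["FT.ADD", index_name, doc_id]
    if score is not None:
        args += ["SCORE", str(score)]
    args.append("FIELDS")
    for name, value in fields.items():
        args += [name, str(value)]
    return args


add_args = build_ft_add(
    "products",
    "product:1",
    {"title": "Wireless Headphones", "price": 199.99},
    score=1.0,
)
print(add_args)
```
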
### FT.SEARCH - Search Documents

Search for documents in an index.

```bash
FT.SEARCH index_name query [LIMIT offset count] [FILTER field min max] [RETURN field [field ...]]
```

**Query Syntax:**
- Simple terms: `wireless headphones`
- Phrase queries: `"noise cancellation"`
- Field-specific: `title:wireless`
- Boolean operators: `wireless AND headphones`
- Wildcards: `head*`

**Examples:**
```bash
# Simple text search
FT.SEARCH products "wireless headphones"

# Search with filters
FT.SEARCH products "headphones" FILTER price 100 300 LIMIT 0 10

# Field-specific search
FT.SEARCH products "title:wireless AND category:electronics"

# Return specific fields only
FT.SEARCH products "*" RETURN title price
```

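The optional clauses can be composed programmatically. This sketch emits them in the order of the syntax line above (`LIMIT`, then `FILTER`, then `RETURN`); whether the server is sensitive to clause order is not stated here, so treat the ordering as an assumption. `build_ft_search` is a hypothetical helper.

```python
def build_ft_search(index_name, query, limit=None, filters=(), return_fields=()):
    """Build FT.SEARCH arguments.

    limit:         optional (offset, count) tuple
    filters:       iterable of (field, minimum, maximum) tuples
    return_fields: iterable of field names to return
    Illustrative helper, not part of HeroDB.
    """
    args = ["FT.SEARCH", index_name, query]
    if limit is not None:
        offset, count = limit
        args += ["LIMIT", str(offset), str(count)]
    for field, minimum, maximum in filters:
        args += ["FILTER", field, str(minimum), str(maximum)]
    if return_fields:
        args += ["RETURN", *return_fields]
    return args


search_args = build_ft_search(
    "products", "headphones", limit=(0, 10), filters=[("price", 100, 300)]
)
print(search_args)
```
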
### FT.DEL - Delete Document

Remove a document from the search index.

```bash
FT.DEL index_name doc_id
```

**Example:**
```bash
FT.DEL products product:1
```

### FT.INFO - Get Index Information

Get information about a search index.

```bash
FT.INFO index_name
```

**Returns:**
- Index name and document count
- Field definitions and types
- Index configuration

**Example:**
```bash
FT.INFO products
```

### FT.DROP - Drop Index

Delete an entire search index.

```bash
FT.DROP index_name
```

**Example:**
```bash
FT.DROP products
```

### FT.ALTER - Alter Index Schema

Add new fields to an existing index.

```bash
FT.ALTER index_name SCHEMA ADD field_name field_type [options]
```

**Example:**
```bash
FT.ALTER products SCHEMA ADD brand TAG STORED
```

### FT.AGGREGATE - Aggregate Search Results

Perform aggregations on search results.

```bash
FT.AGGREGATE index_name query [GROUPBY field] [REDUCE function field AS alias]
```

**Example:**
```bash
# Group products by category and count
FT.AGGREGATE products "*" GROUPBY category REDUCE COUNT 0 AS count
```

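Conceptually, `GROUPBY category REDUCE COUNT` buckets the matching documents by a field value and counts each bucket. The Python sketch below reproduces that computation on an in-memory list to show the intended result; the sample documents are made up, and the actual reply format of `FT.AGGREGATE` is not specified here.

```python
from collections import Counter

# Hypothetical documents standing in for the results of the "*" query.
documents = [
    {"id": "prod:1", "category": "phones"},
    {"id": "prod:2", "category": "phones"},
    {"id": "prod:3", "category": "audio"},
]


def group_count(docs, field):
    """Count documents per distinct value of `field`."""
    return dict(Counter(doc[field] for doc in docs))


counts = group_count(documents, "category")
print(counts)  # {'phones': 2, 'audio': 1}
```
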
## Field Types in Detail

### TEXT Fields
- **Purpose**: Full-text search on natural language content
- **Features**: Tokenization, stemming, stop-word removal
- **Options**: `STORED`, `INDEXED`, `TOKENIZED`
- **Example**: Product titles, descriptions, content

### NUMERIC Fields
- **Purpose**: Numeric data for range queries and sorting
- **Types**: I64, U64, F64
- **Options**: `STORED`, `INDEXED`, `FAST`
- **Example**: Prices, quantities, ratings

### TAG Fields
- **Purpose**: Exact-match categorical data
- **Features**: No tokenization, exact string matching
- **Options**: `STORED`, case sensitivity control
- **Example**: Categories, brands, status values

### GEO Fields
- **Purpose**: Geographic coordinates
- **Format**: "latitude,longitude" (e.g., "37.7749,-122.4194")
- **Features**: Geographic distance queries (planned; see Future Enhancements)
- **Options**: `STORED`

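Since GEO values are plain "latitude,longitude" strings, it can be worth validating them on the client before indexing. A minimal sketch, assuming latitude comes first as stated above; `parse_geo` is a hypothetical helper, and the server may apply its own validation.

```python
def parse_geo(value):
    """Parse a "latitude,longitude" string into a (lat, lon) float tuple.

    Raises ValueError for malformed input or out-of-range coordinates.
    Illustrative helper, not part of HeroDB.
    """
    lat_text, lon_text = value.split(",")  # ValueError if not exactly two parts
    lat, lon = float(lat_text), float(lon_text)
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    return lat, lon


print(parse_geo("37.7749,-122.4194"))
```
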
### DATE Fields
- **Purpose**: Timestamp and date data
- **Format**: Unix timestamp in milliseconds
- **Features**: Range queries, temporal filtering
- **Options**: `STORED`, `INDEXED`, `FAST`

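DATE values are millisecond Unix timestamps, so calendar dates need converting before they are indexed or used in a range. The sketch below uses integer `timedelta` arithmetic to avoid floating-point rounding; `to_epoch_millis` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(moment):
    """Convert a timezone-aware datetime to milliseconds since the Unix epoch."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return (moment - EPOCH) // timedelta(milliseconds=1)


start = to_epoch_millis(datetime(2022, 1, 1, tzinfo=timezone.utc))
end = to_epoch_millis(datetime(2023, 1, 1, tzinfo=timezone.utc))
print(start, end)  # 1640995200000 1672531200000
```

These two values are the bounds used in the date examples elsewhere in this document: the start of 2022 and the start of 2023, both in UTC.
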
## Search Query Syntax

### Basic Queries
```bash
# Single term
FT.SEARCH products "wireless"

# Multiple terms (AND by default)
FT.SEARCH products "wireless headphones"

# Phrase query
FT.SEARCH products "\"noise cancellation\""
```

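In the phrase example, the outer quotes delimit the command argument and the escaped inner quotes are part of the query itself. When the query is built in client code and passed as a single argument, only the inner quotes are needed. A small sketch; `phrase` is a hypothetical helper, and it rejects embedded quotes because this document does not specify how they would be escaped.

```python
def phrase(text):
    """Wrap text in double quotes so it is treated as a phrase query."""
    if '"' in text:
        raise ValueError("phrase text must not contain double quotes")
    return f'"{text}"'


# The query travels as one argument; no shell-style escaping is involved.
phrase_args = ["FT.SEARCH", "products", phrase("noise cancellation")]
print(phrase_args)
```
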
### Field-Specific Queries
```bash
# Search in specific field
FT.SEARCH products "title:wireless"

# Multiple field queries
FT.SEARCH products "title:wireless AND description:bluetooth"
```

### Boolean Operators
```bash
# AND operator
FT.SEARCH products "wireless AND headphones"

# OR operator
FT.SEARCH products "wireless OR bluetooth"

# NOT operator
FT.SEARCH products "headphones NOT wired"
```

### Wildcards and Fuzzy Search
```bash
# Wildcard search
FT.SEARCH products "head*"

# Fuzzy search (approximate matching)
FT.SEARCH products "%headphone%"
```

### Range Queries
```bash
# Numeric range in query
FT.SEARCH products "@price:[100 300]"

# Date range
FT.SEARCH products "@created_date:[1640995200000 1672531200000]"
```

## Filtering and Sorting

### FILTER Clause
```bash
# Numeric filter
FT.SEARCH products "headphones" FILTER price 100 300

# Multiple filters (assumes the index also defines a numeric rating field)
FT.SEARCH products "*" FILTER price 100 500 FILTER rating 4 5
```

### LIMIT Clause
```bash
# Pagination
FT.SEARCH products "wireless" LIMIT 0 10    # First 10 results
FT.SEARCH products "wireless" LIMIT 10 10   # Next 10 results
```

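`LIMIT` takes a zero-based offset and a count, so a 1-based page number has to be translated first. A minimal sketch of that arithmetic; `page_to_limit` is a hypothetical helper.

```python
def page_to_limit(page, page_size):
    """Translate a 1-based page number into (offset, count) for LIMIT."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return (page - 1) * page_size, page_size


print(page_to_limit(1, 10))  # (0, 10)
print(page_to_limit(2, 10))  # (10, 10)
```
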
### RETURN Clause
```bash
# Return specific fields
FT.SEARCH products "*" RETURN title price

# Return all stored fields (default)
FT.SEARCH products "*"
```

## Performance Considerations

### Indexing Strategy
- Only index fields you need to search on
- Use `FAST` option for frequently filtered numeric fields
- Consider storage vs. search performance trade-offs

### Query Optimization
- Use specific field queries when possible
- Combine filters with text queries for better performance
- Use pagination with LIMIT for large result sets

### Memory Usage
- Tantivy indexes are memory-mapped for performance
- Index size depends on document count and field configuration
- Monitor disk space for index storage

## Integration with Redis Commands

Search indexes work alongside regular Redis data:

```bash
# Store product data in Redis hash
HSET product:1 title "Wireless Headphones" price "199.99"

# Index the same data for search
FT.ADD products product:1 FIELDS title "Wireless Headphones" price 199.99

# Search returns document IDs that can be used with Redis commands
FT.SEARCH products "wireless"
# Returns: product:1

# Retrieve full data using Redis
HGETALL product:1
```

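The pattern above writes the same fields twice, once to the hash and once to the index. One way to keep the two in step is to derive both commands from a single dictionary. This is a sketch under the assumption that the hash key doubles as the document ID, as in the example; `commands_for` is a hypothetical helper.

```python
def commands_for(index_name, key, fields):
    """Return (HSET args, FT.ADD args) built from the same field dict.

    The hash key is reused as the search document ID. Illustrative only.
    """
    hset = ["HSET", key]
    ft_add = ["FT.ADD", index_name, key, "FIELDS"]
    for name, value in fields.items():
        hset += [name, str(value)]
        ft_add += [name, str(value)]
    return hset, ft_add


hset_args, ft_add_args = commands_for(
    "products", "product:1", {"title": "Wireless Headphones", "price": 199.99}
)
print(hset_args)
print(ft_add_args)
```
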
## Example Use Cases

### E-commerce Product Search
```bash
# Create product catalog index
FT.CREATE catalog SCHEMA
  name TEXT STORED INDEXED TOKENIZED
  description TEXT INDEXED TOKENIZED
  price NUMERIC STORED INDEXED FAST
  category TAG STORED
  brand TAG STORED
  rating NUMERIC STORED FAST

# Add products
FT.ADD catalog prod:1 FIELDS name "iPhone 14" price 999 category "phones" brand "apple" rating 4.5
FT.ADD catalog prod:2 FIELDS name "Samsung Galaxy" price 899 category "phones" brand "samsung" rating 4.3

# Search queries
FT.SEARCH catalog "iPhone"
FT.SEARCH catalog "phones" FILTER price 800 1000
FT.SEARCH catalog "@brand:apple"
```

### Content Management
```bash
# Create content index
FT.CREATE content SCHEMA
  title TEXT STORED INDEXED TOKENIZED
  body TEXT INDEXED TOKENIZED
  author TAG STORED
  published DATE STORED INDEXED
  tags TAG STORED

# Search content
FT.SEARCH content "machine learning"
FT.SEARCH content "@author:john AND @tags:ai"
FT.SEARCH content "*" FILTER published 1640995200000 1672531200000
```

### Geographic Search
```bash
# Create location-based index
FT.CREATE places SCHEMA
  name TEXT STORED INDEXED TOKENIZED
  location GEO STORED
  type TAG STORED

# Add locations
FT.ADD places place:1 FIELDS name "Golden Gate Bridge" location "37.8199,-122.4783" type "landmark"

# Geographic queries (future feature)
FT.SEARCH places "@location:[37.7749 -122.4194 10 km]"
```

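The geographic query above is marked as a future feature. To make its intended meaning concrete (documents within 10 km of a point), the sketch below computes great-circle distance with the haversine formula on the client. It illustrates the distance test only and says nothing about how HeroDB or Tantivy would implement it; the function name and the 6371 km mean Earth radius are choices made for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Query centre from the example vs. the indexed Golden Gate Bridge location.
distance = haversine_km(37.7749, -122.4194, 37.8199, -122.4783)
print(distance <= 10.0)  # True: the bridge lies within the 10 km radius
```
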
## Error Handling

Common error responses:
- `ERR index not found` - Index doesn't exist
- `ERR field not found` - Field not defined in schema
- `ERR invalid query syntax` - Malformed query
- `ERR document not found` - Document ID doesn't exist

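Client code can turn these messages into typed exceptions so callers can react to each case separately. The sketch below matches on the message prefixes listed above; the exception class names are hypothetical, and the exact wording returned by the server may differ, so unknown messages fall back to a generic error.

```python
class SearchError(Exception):
    """Base class for search-related errors (hypothetical)."""


class IndexNotFound(SearchError):
    pass


class FieldNotFound(SearchError):
    pass


class InvalidQuery(SearchError):
    pass


class DocumentNotFound(SearchError):
    pass


# Message prefixes taken from the list of common error responses.
_ERROR_TYPES = {
    "ERR index not found": IndexNotFound,
    "ERR field not found": FieldNotFound,
    "ERR invalid query syntax": InvalidQuery,
    "ERR document not found": DocumentNotFound,
}


def classify_error(message):
    """Map a server error message to an exception instance."""
    for prefix, error_type in _ERROR_TYPES.items():
        if message.startswith(prefix):
            return error_type(message)
    return SearchError(message)


print(type(classify_error("ERR index not found")).__name__)
```
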
## Best Practices

1. **Schema Design**: Plan your schema carefully - changes require reindexing
2. **Field Selection**: Only store and index fields you actually need
3. **Batch Operations**: Add multiple documents efficiently
4. **Query Testing**: Test queries for performance with realistic data
5. **Monitoring**: Monitor index size and query performance
6. **Backup**: Include search indexes in backup strategies

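One way to act on the batching advice is to group documents into fixed-size chunks and send each chunk's `FT.ADD` commands together, for example through a client-side pipeline if your Redis client offers one. Whether HeroDB gains from pipelining is an assumption here; the sketch only shows the chunking step, and `batches` is a hypothetical helper.

```python
def batches(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


doc_ids = ["prod:1", "prod:2", "prod:3", "prod:4", "prod:5"]
for chunk in batches(doc_ids, 2):
    # A real client would issue one FT.ADD per document in this chunk.
    print(chunk)
```
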
## Future Enhancements

Planned features:
- Geographic distance queries
- Advanced aggregations and faceting
- Highlighting of search results
- Synonyms and custom analyzers
- Real-time suggestions and autocomplete
- Index replication and sharding