db/tst/README.md
2025-04-20 07:28:59 +02:00

185 lines
4.7 KiB
Markdown

# Ternary Search Tree (TST)
A persistent ternary search tree implementation in Rust using OurDB for storage.
## Overview
TST is a space-optimized tree data structure that enables efficient string key operations with persistent storage. This implementation provides a persistent ternary search tree that can be used for efficient string key operations, such as auto-complete, routing tables, and more.
A ternary search tree is a type of trie where each node has three children: left, middle, and right. Unlike a radix tree which compresses common prefixes, a TST stores one character per node and uses a binary search tree-like structure for efficient traversal.
Key characteristics:
- Each node stores a single character
- Nodes have three children: left (for characters < current), middle (for next character in key), and right (for characters > current)
- Leaf nodes contain the actual values
- Balanced structure for consistent performance across operations
## Features
- Efficient string key operations
- Persistent storage using OurDB backend
- Balanced tree structure for consistent performance
- Support for binary values
- Thread-safe operations through OurDB
## Usage
Add the dependency to your `Cargo.toml`:
```toml
[dependencies]
tst = { path = "../tst" }
```
### Basic Example
```rust
use tst::TST;
fn main() -> Result<(), tst::Error> {
// Create a new ternary search tree
let mut tree = TST::new("/tmp/tst", false)?;
// Set key-value pairs
tree.set("hello", b"world".to_vec())?;
tree.set("help", b"me".to_vec())?;
// Get values by key
let value = tree.get("hello")?;
println!("hello: {}", String::from_utf8_lossy(&value)); // Prints: world
// List keys by prefix
let keys = tree.list("hel")?; // Returns ["hello", "help"]
println!("Keys with prefix 'hel': {:?}", keys);
// Get all values by prefix
let values = tree.getall("hel")?; // Returns [b"world", b"me"]
// Delete keys
tree.delete("help")?;
Ok(())
}
```
## API
### Creating a TST
```rust
// Create a new ternary search tree
let mut tree = TST::new("/tmp/tst", false)?;
// Create a new ternary search tree and reset if it exists
let mut tree = TST::new("/tmp/tst", true)?;
```
### Setting Values
```rust
// Set a key-value pair
tree.set("key", b"value".to_vec())?;
```
### Getting Values
```rust
// Get a value by key
let value = tree.get("key")?;
```
### Deleting Keys
```rust
// Delete a key
tree.delete("key")?;
```
### Listing Keys by Prefix
```rust
// List all keys with a given prefix
let keys = tree.list("prefix")?;
```
### Getting All Values by Prefix
```rust
// Get all values for keys with a given prefix
let values = tree.getall("prefix")?;
```
## Performance Characteristics
- Search: O(k) where k is the key length
- Insert: O(k) for new keys
- Delete: O(k) plus potential node cleanup
- Space: O(n) where n is the total number of nodes
## Use Cases
TST is particularly useful for:
- Prefix-based searching
- Auto-complete systems
- Dictionary implementations
- Spell checking
- Any application requiring efficient string key operations with persistence
## Implementation Details
The TST implementation uses OurDB for persistent storage:
- Each node is serialized and stored as a record in OurDB
- Node references use OurDB record IDs
- The tree maintains a root node ID for traversal
- Node serialization includes version tracking for format evolution
## Running Tests
The project includes a comprehensive test suite that verifies all functionality:
```bash
cd ~/code/git.ourworld.tf/herocode/db/tst
# Run all tests
cargo test
# Run specific test file
cargo test --test basic_test
cargo test --test prefix_test
```
## Running Examples
The project includes example applications that demonstrate how to use the TST:
```bash
# Run the basic usage example
cargo run --example basic_usage
# Run the prefix operations example
cargo run --example prefix_ops
# Run the performance test
cargo run --example performance
```
## Comparison with RadixTree
While both TST and RadixTree provide efficient string key operations, they have different characteristics:
- **TST**: Stores one character per node, with a balanced structure for consistent performance across operations.
- **RadixTree**: Compresses common prefixes, which can be more space-efficient for keys with long common prefixes.
Choose TST when:
- You need balanced performance across all operations
- Your keys don't share long common prefixes
- You want a simpler implementation with predictable performance
Choose RadixTree when:
- Space efficiency is a priority
- Your keys share long common prefixes
- You prioritize lookup performance over balanced performance
## License
This project is licensed under the same license as the HeroCode project.