db/tst/README.md

4.7 KiB

Ternary Search Tree (TST)

A persistent ternary search tree implementation in Rust using OurDB for storage.

Overview

TST is a space-optimized tree data structure that enables efficient string key operations with persistent storage. This implementation provides a persistent ternary search tree that can be used for efficient string key operations, such as auto-complete, routing tables, and more.

A ternary search tree is a type of trie where each node has three children: left, middle, and right. Unlike a radix tree which compresses common prefixes, a TST stores one character per node and uses a binary search tree-like structure for efficient traversal.

Key characteristics:

  • Each node stores a single character
  • Nodes have three children: left (for characters < current), middle (for next character in key), and right (for characters > current)
  • Leaf nodes contain the actual values
  • Balanced structure for consistent performance across operations

Features

  • Efficient string key operations
  • Persistent storage using OurDB backend
  • Balanced tree structure for consistent performance
  • Support for binary values
  • Thread-safe operations through OurDB

Usage

Add the dependency to your Cargo.toml:

[dependencies]
tst = { path = "../tst" }

Basic Example

use tst::TST;

fn main() -> Result<(), tst::Error> {
    // Create a new ternary search tree
    let mut tree = TST::new("/tmp/tst", false)?;
    
    // Set key-value pairs
    tree.set("hello", b"world".to_vec())?;
    tree.set("help", b"me".to_vec())?;
    
    // Get values by key
    let value = tree.get("hello")?;
    println!("hello: {}", String::from_utf8_lossy(&value)); // Prints: world
    
    // List keys by prefix
    let keys = tree.list("hel")?; // Returns ["hello", "help"]
    println!("Keys with prefix 'hel': {:?}", keys);
    
    // Get all values by prefix
    let values = tree.getall("hel")?; // Returns [b"world", b"me"]
    
    // Delete keys
    tree.delete("help")?;
    
    Ok(())
}

API

Creating a TST

// Create a new ternary search tree
let mut tree = TST::new("/tmp/tst", false)?;

// Create a new ternary search tree and reset if it exists
let mut tree = TST::new("/tmp/tst", true)?;

Setting Values

// Set a key-value pair
tree.set("key", b"value".to_vec())?;

Getting Values

// Get a value by key
let value = tree.get("key")?;

Deleting Keys

// Delete a key
tree.delete("key")?;

Listing Keys by Prefix

// List all keys with a given prefix
let keys = tree.list("prefix")?;

Getting All Values by Prefix

// Get all values for keys with a given prefix
let values = tree.getall("prefix")?;

Performance Characteristics

  • Search: O(k) where k is the key length
  • Insert: O(k) for new keys
  • Delete: O(k) plus potential node cleanup
  • Space: O(n) where n is the total number of nodes

Use Cases

TST is particularly useful for:

  • Prefix-based searching
  • Auto-complete systems
  • Dictionary implementations
  • Spell checking
  • Any application requiring efficient string key operations with persistence

Implementation Details

The TST implementation uses OurDB for persistent storage:

  • Each node is serialized and stored as a record in OurDB
  • Node references use OurDB record IDs
  • The tree maintains a root node ID for traversal
  • Node serialization includes version tracking for format evolution

Running Tests

The project includes a comprehensive test suite that verifies all functionality:

cd ~/code/git.threefold.info/herocode/db/tst
# Run all tests
cargo test

# Run specific test file
cargo test --test basic_test
cargo test --test prefix_test

Running Examples

The project includes example applications that demonstrate how to use the TST:

# Run the basic usage example
cargo run --example basic_usage

# Run the prefix operations example
cargo run --example prefix_ops

# Run the performance test
cargo run --example performance

Comparison with RadixTree

While both TST and RadixTree provide efficient string key operations, they have different characteristics:

  • TST: Stores one character per node, with a balanced structure for consistent performance across operations.
  • RadixTree: Compresses common prefixes, which can be more space-efficient for keys with long common prefixes.

Choose TST when:

  • You need balanced performance across all operations
  • Your keys don't share long common prefixes
  • You want a simpler implementation with predictable performance

Choose RadixTree when:

  • Space efficiency is a priority
  • Your keys share long common prefixes
  • You prioritize lookup performance over balanced performance

License

This project is licensed under the same license as the HeroCode project.