db/radixtree/ARCHITECTURE.md

20 KiB

RadixTree: Architecture for V to Rust Port

1. Overview

RadixTree is a space-optimized tree data structure that enables efficient string key operations with persistent storage. This document outlines the architecture for porting the RadixTree module from its original V implementation to Rust, maintaining all existing functionality while leveraging Rust's memory safety, performance, and ecosystem.

The Rust implementation will integrate with the existing OurDB Rust implementation for persistent storage.

graph TD
    A[Client Code] --> B[RadixTree API]
    B --> C[Node Management]
    B --> D[Serialization]
    B --> E[Tree Operations]
    C --> F[OurDB]
    D --> F
    E --> C

2. Current Architecture (V Implementation)

The current V implementation of RadixTree consists of the following components:

2.1 Core Data Structures

Node

struct Node {
mut:
    key_segment string    // The segment of the key stored at this node
    value       []u8      // Value stored at this node (empty if not a leaf)
    children    []NodeRef // References to child nodes
    is_leaf     bool      // Whether this node is a leaf node
}

NodeRef

struct NodeRef {
mut:
    key_part string // The key segment for this child
    node_id  u32    // Database ID of the node
}

RadixTree

@[heap]
pub struct RadixTree {
mut:
    db      &ourdb.OurDB // Database for persistent storage
    root_id u32          // Database ID of the root node
}

2.2 Key Operations

  1. new(): Creates a new radix tree with a specified database path
  2. set(key, value): Sets a key-value pair in the tree
  3. get(key): Retrieves a value by key
  4. update(prefix, new_value): Updates the value at a given key prefix
  5. delete(key): Removes a key from the tree
  6. list(prefix): Lists all keys with a given prefix
  7. getall(prefix): Gets all values for keys with a given prefix

2.3 Serialization

The V implementation uses a custom binary serialization format for nodes:

  • Version byte (1 byte)
  • Key segment (string)
  • Value length (2 bytes) followed by value bytes
  • Children count (2 bytes) followed by children
  • Is leaf flag (1 byte)

Each child is serialized as:

  • Key part (string)
  • Node ID (4 bytes)

2.4 Integration with OurDB

The RadixTree uses OurDB for persistent storage:

  • Each node is serialized and stored as a record in OurDB
  • Node references use OurDB record IDs
  • The tree maintains a root node ID for traversal

3. Proposed Rust Architecture

The Rust implementation will maintain the same overall architecture while leveraging Rust's type system, ownership model, and error handling.

3.1 Core Data Structures

Node

pub struct Node {
    key_segment: String,
    value: Vec<u8>,
    children: Vec<NodeRef>,
    is_leaf: bool,
}

NodeRef

pub struct NodeRef {
    key_part: String,
    node_id: u32,
}

RadixTree

pub struct RadixTree {
    db: ourdb::OurDB,
    root_id: u32,
}

3.2 Public API

impl RadixTree {
    /// Creates a new radix tree with the specified database path
    pub fn new(path: &str, reset: bool) -> Result<Self, Error> {
        // Implementation
    }

    /// Sets a key-value pair in the tree
    pub fn set(&mut self, key: &str, value: Vec<u8>) -> Result<(), Error> {
        // Implementation
    }

    /// Gets a value by key from the tree
    pub fn get(&mut self, key: &str) -> Result<Vec<u8>, Error> {
        // Implementation
    }

    /// Updates the value at a given key prefix
    pub fn update(&mut self, prefix: &str, new_value: Vec<u8>) -> Result<(), Error> {
        // Implementation
    }

    /// Deletes a key from the tree
    pub fn delete(&mut self, key: &str) -> Result<(), Error> {
        // Implementation
    }

    /// Lists all keys with a given prefix
    pub fn list(&mut self, prefix: &str) -> Result<Vec<String>, Error> {
        // Implementation
    }

    /// Gets all values for keys with a given prefix
    pub fn getall(&mut self, prefix: &str) -> Result<Vec<Vec<u8>>, Error> {
        // Implementation
    }
}

3.3 Error Handling

#[derive(Debug, thiserror::Error)]
pub enum Error {
    #[error("OurDB error: {0}")]
    OurDB(#[from] ourdb::Error),
    
    #[error("Key not found: {0}")]
    KeyNotFound(String),
    
    #[error("Prefix not found: {0}")]
    PrefixNotFound(String),
    
    #[error("Serialization error: {0}")]
    Serialization(String),
    
    #[error("Deserialization error: {0}")]
    Deserialization(String),
    
    #[error("Invalid operation: {0}")]
    InvalidOperation(String),
}

3.4 Serialization

The Rust implementation will maintain the same binary serialization format for compatibility:

const VERSION: u8 = 1;

impl Node {
    /// Serializes a node to bytes for storage
    fn serialize(&self) -> Vec<u8> {
        // Implementation
    }

    /// Deserializes bytes to a node
    fn deserialize(data: &[u8]) -> Result<Self, Error> {
        // Implementation
    }
}

3.5 Integration with OurDB

The Rust implementation will use the existing OurDB Rust implementation:

impl RadixTree {
    fn get_node(&mut self, node_id: u32) -> Result<Node, Error> {
        let data = self.db.get(node_id)?;
        Node::deserialize(&data)
    }

    fn save_node(&mut self, node_id: Option<u32>, node: &Node) -> Result<u32, Error> {
        let data = node.serialize();
        let args = ourdb::OurDBSetArgs {
            id: node_id,
            data: &data,
        };
        Ok(self.db.set(args)?)
    }
}

4. Implementation Strategy

4.1 Phase 1: Core Data Structures and Serialization

  1. Implement the Node and NodeRef structs
  2. Implement serialization and deserialization functions
  3. Implement the Error enum for error handling

4.2 Phase 2: Basic Tree Operations

  1. Implement the RadixTree struct with OurDB integration
  2. Implement the new() function for creating a new tree
  3. Implement the get() and set() functions for basic operations

4.3 Phase 3: Advanced Tree Operations

  1. Implement the delete() function for removing keys
  2. Implement the update() function for updating values
  3. Implement the list() and getall() functions for prefix operations

4.4 Phase 4: Testing and Optimization

  1. Port existing tests from V to Rust
  2. Add new tests for Rust-specific functionality
  3. Benchmark and optimize performance
  4. Ensure compatibility with existing RadixTree data

5. Implementation Considerations

5.1 Memory Management

Leverage Rust's ownership model for safe and efficient memory management:

  • Use String and Vec<u8> for data buffers instead of raw pointers
  • Use references and borrows to avoid unnecessary copying
  • Implement proper RAII for resource management

5.2 Error Handling

Use Rust's Result type for comprehensive error handling:

  • Define custom error types for RadixTree-specific errors
  • Propagate errors using the ? operator
  • Provide detailed error messages
  • Implement proper error conversion using the From trait

5.3 Performance Optimizations

Identify opportunities for performance improvements:

  • Use efficient string operations for prefix matching
  • Minimize database operations by caching nodes when appropriate
  • Use iterators for efficient traversal
  • Consider using Cow<str> for string operations to avoid unnecessary cloning

5.4 Compatibility

Ensure compatibility with the V implementation:

  • Maintain the same serialization format
  • Ensure identical behavior for all operations
  • Support reading existing RadixTree data

6. Testing Strategy

6.1 Unit Tests

Write comprehensive unit tests for each component:

  • Test Node serialization/deserialization
  • Test string operations (common prefix, etc.)
  • Test error handling

6.2 Integration Tests

Write integration tests for the complete system:

  • Test basic CRUD operations
  • Test prefix operations
  • Test edge cases (empty keys, very long keys, etc.)
  • Test with large datasets

6.3 Compatibility Tests

Ensure compatibility with existing RadixTree data:

  • Test reading existing V-created RadixTree data
  • Test writing data that can be read by the V implementation

6.4 Performance Tests

Benchmark performance against the V implementation:

  • Measure throughput for set/get operations
  • Measure latency for different operations
  • Test with different tree sizes and key distributions

7. Project Structure

radixtree/
├── Cargo.toml
├── src/
│   ├── lib.rs           # Public API and re-exports
│   ├── node.rs          # Node and NodeRef implementations
│   ├── serialize.rs     # Serialization and deserialization
│   ├── error.rs         # Error types
│   └── operations.rs    # Tree operations implementation
├── tests/
│   ├── basic_test.rs    # Basic operations tests
│   ├── prefix_test.rs   # Prefix operations tests
│   └── edge_cases.rs    # Edge case tests
└── examples/
    ├── basic.rs         # Basic usage example
    ├── prefix.rs        # Prefix operations example
    └── performance.rs   # Performance benchmark

8. Dependencies

The Rust implementation will use the following dependencies:

  • ourdb for persistent storage
  • thiserror for error handling
  • log for logging
  • criterion for benchmarking (dev dependency)

9. Compatibility Considerations

To ensure compatibility with the V implementation:

  1. Maintain the same serialization format for nodes
  2. Ensure identical behavior for all operations
  3. Support reading existing RadixTree data
  4. Maintain the same performance characteristics

10. Future Extensions

Potential future extensions to consider:

  1. Async API for non-blocking operations
  2. Iterator interface for efficient traversal
  3. Batch operations for improved performance
  4. Custom serialization formats for specific use cases
  5. Compression support for values
  6. Concurrency support for parallel operations

11. Conclusion

This architecture provides a roadmap for porting RadixTree from V to Rust while maintaining compatibility and leveraging Rust's strengths. The implementation will follow a phased approach, starting with core data structures and gradually building up to the complete system.

The Rust implementation aims to be:

  • Safe: Leveraging Rust's ownership model for memory safety
  • Fast: Maintaining or improving performance compared to V
  • Compatible: Working with existing RadixTree data
  • Extensible: Providing a foundation for future enhancements
  • Well-tested: Including comprehensive test coverage

12. Implementation Files

12.1 Cargo.toml

[package]
name = "radixtree"
version = "0.1.0"
edition = "2021"
description = "A persistent radix tree implementation using OurDB for storage"
authors = ["OurWorld Team"]

[dependencies]
ourdb = { path = "../ourdb" }
thiserror = "1.0.40"
log = "0.4.17"

[dev-dependencies]
criterion = "0.5.1"

[[bench]]
name = "radixtree_benchmarks"
harness = false

[[example]]
name = "basic_usage"
path = "examples/basic_usage.rs"

[[example]]
name = "prefix_operations"
path = "examples/prefix_operations.rs"

12.2 src/lib.rs

//! RadixTree is a space-optimized tree data structure that enables efficient string key operations
//! with persistent storage using OurDB as a backend.
//!
//! This implementation provides a persistent radix tree that can be used for efficient
//! prefix-based key operations, such as auto-complete, routing tables, and more.

mod error;
mod node;
mod operations;
mod serialize;

pub use error::Error;
pub use node::{Node, NodeRef};

use ourdb::{OurDB, OurDBConfig, OurDBSetArgs};
use std::path::PathBuf;

/// RadixTree represents a radix tree data structure with persistent storage.
pub struct RadixTree {
    db: OurDB,
    root_id: u32,
}

impl RadixTree {
    /// Creates a new radix tree with the specified database path.
    ///
    /// # Arguments
    ///
    /// * `path` - The path to the database directory
    /// * `reset` - Whether to reset the database if it exists
    ///
    /// # Returns
    ///
    /// A new `RadixTree` instance
    ///
    /// # Errors
    ///
    /// Returns an error if the database cannot be created or opened
    pub fn new(path: &str, reset: bool) -> Result<Self, Error> {
        // Implementation will go here
        unimplemented!()
    }

    /// Sets a key-value pair in the tree.
    ///
    /// # Arguments
    ///
    /// * `key` - The key to set
    /// * `value` - The value to set
    ///
    /// # Errors
    ///
    /// Returns an error if the operation fails
    pub fn set(&mut self, key: &str, value: Vec<u8>) -> Result<(), Error> {
        // Implementation will go here
        unimplemented!()
    }

    /// Gets a value by key from the tree.
    ///
    /// # Arguments
    ///
    /// * `key` - The key to get
    ///
    /// # Returns
    ///
    /// The value associated with the key
    ///
    /// # Errors
    ///
    /// Returns an error if the key is not found or the operation fails
    pub fn get(&mut self, key: &str) -> Result<Vec<u8>, Error> {
        // Implementation will go here
        unimplemented!()
    }

    /// Updates the value at a given key prefix.
    ///
    /// # Arguments
    ///
    /// * `prefix` - The key prefix to update
    /// * `new_value` - The new value to set
    ///
    /// # Errors
    ///
    /// Returns an error if the prefix is not found or the operation fails
    pub fn update(&mut self, prefix: &str, new_value: Vec<u8>) -> Result<(), Error> {
        // Implementation will go here
        unimplemented!()
    }

    /// Deletes a key from the tree.
    ///
    /// # Arguments
    ///
    /// * `key` - The key to delete
    ///
    /// # Errors
    ///
    /// Returns an error if the key is not found or the operation fails
    pub fn delete(&mut self, key: &str) -> Result<(), Error> {
        // Implementation will go here
        unimplemented!()
    }

    /// Lists all keys with a given prefix.
    ///
    /// # Arguments
    ///
    /// * `prefix` - The prefix to search for
    ///
    /// # Returns
    ///
    /// A list of keys that start with the given prefix
    ///
    /// # Errors
    ///
    /// Returns an error if the operation fails
    pub fn list(&mut self, prefix: &str) -> Result<Vec<String>, Error> {
        // Implementation will go here
        unimplemented!()
    }

    /// Gets all values for keys with a given prefix.
    ///
    /// # Arguments
    ///
    /// * `prefix` - The prefix to search for
    ///
    /// # Returns
    ///
    /// A list of values for keys that start with the given prefix
    ///
    /// # Errors
    ///
    /// Returns an error if the operation fails
    pub fn getall(&mut self, prefix: &str) -> Result<Vec<Vec<u8>>, Error> {
        // Implementation will go here
        unimplemented!()
    }
}

12.3 src/error.rs

//! Error types for the RadixTree module.

use thiserror::Error;

/// Error type for RadixTree operations.
#[derive(Debug, Error)]
pub enum Error {
    /// Error from OurDB operations.
    #[error("OurDB error: {0}")]
    OurDB(#[from] ourdb::Error),
    
    /// Error when a key is not found.
    #[error("Key not found: {0}")]
    KeyNotFound(String),
    
    /// Error when a prefix is not found.
    #[error("Prefix not found: {0}")]
    PrefixNotFound(String),
    
    /// Error during serialization.
    #[error("Serialization error: {0}")]
    Serialization(String),
    
    /// Error during deserialization.
    #[error("Deserialization error: {0}")]
    Deserialization(String),
    
    /// Error for invalid operations.
    #[error("Invalid operation: {0}")]
    InvalidOperation(String),
}

12.4 src/node.rs

//! Node types for the RadixTree module.

/// Represents a node in the radix tree.
pub struct Node {
    /// The segment of the key stored at this node.
    pub key_segment: String,
    
    /// Value stored at this node (empty if not a leaf).
    pub value: Vec<u8>,
    
    /// References to child nodes.
    pub children: Vec<NodeRef>,
    
    /// Whether this node is a leaf node.
    pub is_leaf: bool,
}

/// Reference to a node in the database.
pub struct NodeRef {
    /// The key segment for this child.
    pub key_part: String,
    
    /// Database ID of the node.
    pub node_id: u32,
}

impl Node {
    /// Creates a new node.
    pub fn new(key_segment: String, value: Vec<u8>, is_leaf: bool) -> Self {
        Self {
            key_segment,
            value,
            children: Vec::new(),
            is_leaf,
        }
    }
    
    /// Creates a new root node.
    pub fn new_root() -> Self {
        Self {
            key_segment: String::new(),
            value: Vec::new(),
            children: Vec::new(),
            is_leaf: false,
        }
    }
}

impl NodeRef {
    /// Creates a new node reference.
    pub fn new(key_part: String, node_id: u32) -> Self {
        Self {
            key_part,
            node_id,
        }
    }
}

12.5 src/serialize.rs

//! Serialization and deserialization for RadixTree nodes.

use crate::error::Error;
use crate::node::{Node, NodeRef};

/// Current binary format version.
const VERSION: u8 = 1;

impl Node {
    /// Serializes a node to bytes for storage.
    pub fn serialize(&self) -> Vec<u8> {
        // Implementation will go here
        unimplemented!()
    }

    /// Deserializes bytes to a node.
    pub fn deserialize(data: &[u8]) -> Result<Self, Error> {
        // Implementation will go here
        unimplemented!()
    }
}

12.6 src/operations.rs

//! Implementation of RadixTree operations.

use crate::error::Error;
use crate::node::{Node, NodeRef};
use crate::RadixTree;

impl RadixTree {
    /// Helper function to get a node from the database.
    pub(crate) fn get_node(&mut self, node_id: u32) -> Result<Node, Error> {
        // Implementation will go here
        unimplemented!()
    }

    /// Helper function to save a node to the database.
    pub(crate) fn save_node(&mut self, node_id: Option<u32>, node: &Node) -> Result<u32, Error> {
        // Implementation will go here
        unimplemented!()
    }

    /// Helper function to find all keys with a given prefix.
    fn find_keys_with_prefix(
        &mut self,
        node_id: u32,
        current_path: &str,
        prefix: &str,
        result: &mut Vec<String>,
    ) -> Result<(), Error> {
        // Implementation will go here
        unimplemented!()
    }

    /// Helper function to recursively collect all keys under a node.
    fn collect_all_keys(
        &mut self,
        node_id: u32,
        current_path: &str,
        result: &mut Vec<String>,
    ) -> Result<(), Error> {
        // Implementation will go here
        unimplemented!()
    }

    /// Helper function to get the common prefix of two strings.
    fn get_common_prefix(a: &str, b: &str) -> String {
        // Implementation will go here
        unimplemented!()
    }
}

12.7 examples/basic_usage.rs

//! Basic usage example for RadixTree.

use radixtree::RadixTree;

fn main() -> Result<(), radixtree::Error> {
    // Create a temporary directory for the database
    let db_path = std::env::temp_dir().join("radixtree_example");
    std::fs::create_dir_all(&db_path)?;
    
    println!("Creating radix tree at: {}", db_path.display());
    
    // Create a new radix tree
    let mut tree = RadixTree::new(db_path.to_str().unwrap(), true)?;
    
    // Store some data
    tree.set("hello", b"world".to_vec())?;
    tree.set("help", b"me".to_vec())?;
    tree.set("helicopter", b"flying".to_vec())?;
    
    // Retrieve and print the data
    let value = tree.get("hello")?;
    println!("hello: {}", String::from_utf8_lossy(&value));
    
    // List keys with prefix
    let keys = tree.list("hel")?;
    println!("Keys with prefix 'hel': {:?}", keys);
    
    // Get all values with prefix
    let values = tree.getall("hel")?;
    println!("Values with prefix 'hel':");
    for (i, value) in values.iter().enumerate() {
        println!("  {}: {}", i, String::from_utf8_lossy(value));
    }
    
    // Delete a key
    tree.delete("help")?;
    println!("Deleted 'help'");
    
    // Verify deletion
    let keys_after = tree.list("hel")?;
    println!("Keys with prefix 'hel' after deletion: {:?}", keys_after);
    
    // Clean up (optional)
    if std::env::var("KEEP_DB").is_err() {
        std::fs::remove_dir_all(&db_path)?;
        println!("Cleaned up database directory");
    } else {
        println!("Database kept at: {}", db_path.display());
    }
    
    Ok(())
}