Port HeroLib RadixTree Package from V to Rust #2

New Issue

timur · 2025-04-09T09:41:02Z

timur commented

2025-04-09 09:41:02 +00:00

Port HeroLib RadixTree Package from V to Rust

Overview

This issue involves porting the radixtree package from the HeroLib V codebase to a Rust implementation. The radixtree package provides a persistent radix tree data structure that uses OurDB as a backend storage system. The port should maintain all existing functionality while leveraging Rust's memory safety, performance, and ecosystem.

Note: OurDB has already been ported to Rust and is available at git.ourworld.tf/herocode/db/ourdb. This implementation should be used as the storage backend for the RadixTree port.

What is a Radix Tree?

A radix tree (also known as a patricia trie or radix trie) is a space-optimized tree data structure that enables efficient string key operations. Unlike a standard trie where each node represents a single character, a radix tree compresses paths by allowing nodes to represent multiple characters (key segments).

Key characteristics:

Each node stores a segment of a key (not just a single character)
Nodes can have multiple children, each representing a different branch
Leaf nodes contain the actual values
Optimizes storage by compressing common prefixes

Current Implementation Details

Core Data Structures

The V implementation consists of these primary structures:

Node: Represents a node in the radix tree

struct Node {
mut:
    key_segment string    // The segment of the key stored at this node
    value       []u8      // Value stored at this node (empty if not a leaf)
    children    []NodeRef // References to child nodes
    is_leaf     bool      // Whether this node is a leaf node
}

NodeRef: Reference to a node in the database

struct NodeRef {
mut:
    key_part string // The key segment for this child
    node_id  u32    // Database ID of the node
}

RadixTree: The main structure representing the radix tree

@[heap]
pub struct RadixTree {
mut:
    db      &ourdb.OurDB // Database for persistent storage
    root_id u32          // Database ID of the root node
}

Key Operations

The current implementation provides these core operations:

new(): Creates a new radix tree with a specified database path
set(key, value): Sets a key-value pair in the tree
get(key): Retrieves a value by key
update(prefix, new_value): Updates the value at a given key prefix
delete(key): Removes a key from the tree
list(prefix): Lists all keys with a given prefix
getall(prefix): Gets all values for keys with a given prefix

Persistence Layer

The current implementation uses OurDB for persistence:

Each node is serialized and stored as a record in OurDB
Node references use OurDB record IDs
The tree maintains a root node ID for traversal
Node serialization includes version tracking for format evolution

The serialization format is:

[Version(1B)][KeySegment][ValueLength(2B)][Value][ChildrenCount(2B)][Children][IsLeaf(1B)]

Where each child is stored as:

[KeyPart][NodeID(4B)]

Implementation Details

Node Operations

Insertion (set)

Traverse the tree following matching prefixes
Split nodes when partial matches are found
Create new nodes for unmatched segments
Update node values and references in the database

Search (get)

Start from the root node
Follow child nodes whose key segments match the search key
Return the value if an exact match is found at a leaf node

Prefix Search (list)

Start from the root node
Find all keys that start with the given prefix
Return a list of matching keys

Deletion (delete)

Locate the node containing the key
Remove the value and leaf status
Clean up empty nodes if necessary
Update parent references

Performance Characteristics

Search: O(k) where k is the key length
Insert: O(k) for new keys, may require node splitting
Delete: O(k) plus potential node cleanup
Space: O(n) where n is the total length of all keys

Requirements for Rust Port

The Rust implementation should:

Maintain API Compatibility: Provide equivalent functionality to the V implementation
Persistence: Integrate with the existing Rust implementation of OurDB available at git.ourworld.tf/herocode/db/ourdb
Performance: Maintain or improve the performance characteristics of the V implementation
Memory Safety: Leverage Rust's ownership model for memory safety
Error Handling: Use Rust's Result type for proper error handling
Documentation: Include comprehensive documentation and examples
Testing: Port existing tests and add new ones as needed

Proposed Rust Structure

// Core data structures
pub struct Node {
    key_segment: String,
    value: Vec<u8>,
    children: Vec<NodeRef>,
    is_leaf: bool,
}

pub struct NodeRef {
    key_part: String,
    node_id: u32,
}

pub struct RadixTree {
    db: ourdb::OurDB,
    root_id: u32,
}

// Public API
impl RadixTree {
    pub fn new(path: &str, reset: bool) -> Result<Self, Error> {
        // Example implementation using OurDB
        let config = ourdb::OurDBConfig {
            path: std::path::PathBuf::from(path),
            incremental_mode: true,
            file_size: None,
            keysize: None,
        };
        
        let db = ourdb::OurDB::new(config)?;
        // Initialize root node and return RadixTree instance
        // ...
    }
    pub fn set(&mut self, key: &str, value: Vec<u8>) -> Result<(), Error> { ... }
    pub fn get(&mut self, key: &str) -> Result<Vec<u8>, Error> { ... }
    pub fn update(&mut self, prefix: &str, new_value: Vec<u8>) -> Result<(), Error> { ... }
    pub fn delete(&mut self, key: &str) -> Result<(), Error> { ... }
    pub fn list(&mut self, prefix: &str) -> Result<Vec<String>, Error> { ... }
    pub fn getall(&mut self, prefix: &str) -> Result<Vec<Vec<u8>>, Error> { ... }
}

Implementation Considerations

Storage Backend:
- Use the existing Rust implementation of OurDB from git.ourworld.tf/herocode/db/ourdb
- Add the OurDB crate as a dependency in the RadixTree package
- Leverage OurDB's API for all persistence operations
Serialization:
- Use a binary serialization format compatible with the V implementation
- Consider using serde for more flexible serialization options
Error Handling:
- Use proper Rust error types with detailed error messages
- Implement custom error types for radixtree-specific errors
Memory Management:
- Use Rust's ownership model to ensure memory safety
- Minimize cloning where possible for better performance
Concurrency:
- Consider thread safety for concurrent access
- Implement interior mutability patterns where appropriate

Use Cases

The radix tree implementation is particularly useful for:

Prefix-based searching
IP routing tables
Dictionary implementations
Auto-complete systems
File system paths
Any application requiring efficient string key operations with persistence

Acceptance Criteria

All existing functionality is properly ported
All tests pass with equivalent behavior to the V implementation
API is idiomatic Rust while maintaining functional equivalence
Performance is at least on par with the V implementation
Documentation is comprehensive and includes examples
Code follows Rust best practices and passes clippy lints

Resources

Original V implementation: github/freeflowuniverse/herolib/lib/data/radixtree
Rust documentation: https://doc.rust-lang.org/book/
Crates to consider:
- serde for serialization
- thiserror for error handling
- bytes for efficient byte manipulation

# Port HeroLib RadixTree Package from V to Rust ## Overview This issue involves porting the `radixtree` package from the HeroLib V codebase to a Rust implementation. The radixtree package provides a persistent radix tree data structure that uses OurDB as a backend storage system. The port should maintain all existing functionality while leveraging Rust's memory safety, performance, and ecosystem. **Note:** OurDB has already been ported to Rust and is available at [git.ourworld.tf/herocode/db/ourdb](https://git.ourworld.tf/herocode/db/ourdb). This implementation should be used as the storage backend for the RadixTree port. ## What is a Radix Tree? A radix tree (also known as a patricia trie or radix trie) is a space-optimized tree data structure that enables efficient string key operations. Unlike a standard trie where each node represents a single character, a radix tree compresses paths by allowing nodes to represent multiple characters (key segments). Key characteristics: - Each node stores a segment of a key (not just a single character) - Nodes can have multiple children, each representing a different branch - Leaf nodes contain the actual values - Optimizes storage by compressing common prefixes ## Current Implementation Details ### Core Data Structures The V implementation consists of these primary structures: 1. **Node**: Represents a node in the radix tree ```v struct Node { mut: key_segment string // The segment of the key stored at this node value []u8 // Value stored at this node (empty if not a leaf) children []NodeRef // References to child nodes is_leaf bool // Whether this node is a leaf node } ``` 2. **NodeRef**: Reference to a node in the database ```v struct NodeRef { mut: key_part string // The key segment for this child node_id u32 // Database ID of the node } ``` 3. **RadixTree**: The main structure representing the radix tree ```v @[heap] pub struct RadixTree { mut: db &ourdb.OurDB // Database for persistent storage root_id u32 // Database ID of the root node } ``` ### Key Operations The current implementation provides these core operations: 1. **new()**: Creates a new radix tree with a specified database path 2. **set(key, value)**: Sets a key-value pair in the tree 3. **get(key)**: Retrieves a value by key 4. **update(prefix, new_value)**: Updates the value at a given key prefix 5. **delete(key)**: Removes a key from the tree 6. **list(prefix)**: Lists all keys with a given prefix 7. **getall(prefix)**: Gets all values for keys with a given prefix ### Persistence Layer The current implementation uses OurDB for persistence: - Each node is serialized and stored as a record in OurDB - Node references use OurDB record IDs - The tree maintains a root node ID for traversal - Node serialization includes version tracking for format evolution The serialization format is: ``` [Version(1B)][KeySegment][ValueLength(2B)][Value][ChildrenCount(2B)][Children][IsLeaf(1B)] ``` Where each child is stored as: ``` [KeyPart][NodeID(4B)] ``` ## Implementation Details ### Node Operations #### Insertion (set) 1. Traverse the tree following matching prefixes 2. Split nodes when partial matches are found 3. Create new nodes for unmatched segments 4. Update node values and references in the database #### Search (get) 1. Start from the root node 2. Follow child nodes whose key segments match the search key 3. Return the value if an exact match is found at a leaf node #### Prefix Search (list) 1. Start from the root node 2. Find all keys that start with the given prefix 3. Return a list of matching keys #### Deletion (delete) 1. Locate the node containing the key 2. Remove the value and leaf status 3. Clean up empty nodes if necessary 4. Update parent references ### Performance Characteristics - Search: O(k) where k is the key length - Insert: O(k) for new keys, may require node splitting - Delete: O(k) plus potential node cleanup - Space: O(n) where n is the total length of all keys ## Requirements for Rust Port The Rust implementation should: 1. **Maintain API Compatibility**: Provide equivalent functionality to the V implementation 2. **Persistence**: Integrate with the existing Rust implementation of OurDB available at [git.ourworld.tf/herocode/db/ourdb](https://git.ourworld.tf/herocode/db/ourdb) 3. **Performance**: Maintain or improve the performance characteristics of the V implementation 4. **Memory Safety**: Leverage Rust's ownership model for memory safety 5. **Error Handling**: Use Rust's Result type for proper error handling 6. **Documentation**: Include comprehensive documentation and examples 7. **Testing**: Port existing tests and add new ones as needed ## Proposed Rust Structure ```rust // Core data structures pub struct Node { key_segment: String, value: Vec<u8>, children: Vec<NodeRef>, is_leaf: bool, } pub struct NodeRef { key_part: String, node_id: u32, } pub struct RadixTree { db: ourdb::OurDB, root_id: u32, } // Public API impl RadixTree { pub fn new(path: &str, reset: bool) -> Result<Self, Error> { // Example implementation using OurDB let config = ourdb::OurDBConfig { path: std::path::PathBuf::from(path), incremental_mode: true, file_size: None, keysize: None, }; let db = ourdb::OurDB::new(config)?; // Initialize root node and return RadixTree instance // ... } pub fn set(&mut self, key: &str, value: Vec<u8>) -> Result<(), Error> { ... } pub fn get(&mut self, key: &str) -> Result<Vec<u8>, Error> { ... } pub fn update(&mut self, prefix: &str, new_value: Vec<u8>) -> Result<(), Error> { ... } pub fn delete(&mut self, key: &str) -> Result<(), Error> { ... } pub fn list(&mut self, prefix: &str) -> Result<Vec<String>, Error> { ... } pub fn getall(&mut self, prefix: &str) -> Result<Vec<Vec<u8>>, Error> { ... } } ``` ## Implementation Considerations 1. **Storage Backend**: - Use the existing Rust implementation of OurDB from [git.ourworld.tf/herocode/db/ourdb](https://git.ourworld.tf/herocode/db/ourdb) - Add the OurDB crate as a dependency in the RadixTree package - Leverage OurDB's API for all persistence operations 2. **Serialization**: - Use a binary serialization format compatible with the V implementation - Consider using serde for more flexible serialization options 3. **Error Handling**: - Use proper Rust error types with detailed error messages - Implement custom error types for radixtree-specific errors 4. **Memory Management**: - Use Rust's ownership model to ensure memory safety - Minimize cloning where possible for better performance 5. **Concurrency**: - Consider thread safety for concurrent access - Implement interior mutability patterns where appropriate ## Use Cases The radix tree implementation is particularly useful for: - Prefix-based searching - IP routing tables - Dictionary implementations - Auto-complete systems - File system paths - Any application requiring efficient string key operations with persistence ## Acceptance Criteria 1. All existing functionality is properly ported 2. All tests pass with equivalent behavior to the V implementation 3. API is idiomatic Rust while maintaining functional equivalence 4. Performance is at least on par with the V implementation 5. Documentation is comprehensive and includes examples 6. Code follows Rust best practices and passes clippy lints ## Resources - Original V implementation: `github/freeflowuniverse/herolib/lib/data/radixtree` - Rust documentation: https://doc.rust-lang.org/book/ - Crates to consider: - serde for serialization - thiserror for error handling - bytes for efficient byte manipulation

timur self-assigned this 2025-04-09 10:02:27 +00:00

timur commented

2025-04-09 10:03:00 +00:00

architecture specified: ad29dbfd52

architecture specified: ad29dbfd521ebb991646ad817015456bf40c52ca

timur commented

2025-04-09 10:03:21 +00:00

tests and examples for test driven dev set: e62e0a125a

tests and examples for test driven dev set: https://git.ourworld.tf/herocode/db/commit/e62e0a125a23606463398268d21d4bc755557213

despiegk closed this issue

2025-04-22 08:39:04 +00:00

Sign in to join this conversation.

1 Participants

Notifications

Due Date

No due date set.

Dependencies

No dependencies set.

Reference: herocode/db#2