526 lines
		
	
	
		
			11 KiB
		
	
	
	
		
			Markdown
		
	
	
	
	
	
			
		
		
	
	
			526 lines
		
	
	
		
			11 KiB
		
	
	
	
		
			Markdown
		
	
	
	
	
	
# OSIRIS MVP — Minimal Semantic Store over HeroDB
 | 
						||
 | 
						||
## 0) Purpose
 | 
						||
 | 
						||
OSIRIS is a Rust-native object layer on top of HeroDB that provides structured storage and retrieval capabilities without any server-side extensions or indexing engines.
 | 
						||
 | 
						||
It provides:
 | 
						||
- Object CRUD operations
 | 
						||
- Namespace management
 | 
						||
- Simple local field indexing (field:*)
 | 
						||
- Basic keyword scan (substring matching)
 | 
						||
- CLI interface
 | 
						||
- Future: 9P filesystem interface
 | 
						||
 | 
						||
It does **not** depend on HeroDB's Tantivy FTS, vectors, or relations.
 | 
						||
 | 
						||
---
 | 
						||
 | 
						||
## 1) Architecture
 | 
						||
 | 
						||
```
 | 
						||
HeroDB (unmodified)
 | 
						||
│
 | 
						||
├── KV store + encryption
 | 
						||
└── RESP protocol
 | 
						||
    ↑
 | 
						||
    │
 | 
						||
    └── OSIRIS
 | 
						||
        ├── store/         – object schema + persistence
 | 
						||
        ├── index/         – field index & keyword scanning
 | 
						||
        ├── retrieve/      – query planner + filtering
 | 
						||
        ├── interfaces/    – CLI, 9P (future)
 | 
						||
        └── config/        – namespaces + settings
 | 
						||
```
 | 
						||
 | 
						||
---
 | 
						||
 | 
						||
## 2) Data Model
 | 
						||
 | 
						||
```rust
 | 
						||
#[derive(Clone, Debug, Serialize, Deserialize)]
 | 
						||
pub struct OsirisObject {
 | 
						||
    pub id: String,
 | 
						||
    pub ns: String,
 | 
						||
    pub meta: Metadata,
 | 
						||
    pub text: Option<String>,   // optional plain text
 | 
						||
}
 | 
						||
 | 
						||
#[derive(Clone, Debug, Serialize, Deserialize)]
 | 
						||
pub struct Metadata {
 | 
						||
    pub title: Option<String>,
 | 
						||
    pub mime: Option<String>,
 | 
						||
    pub tags: BTreeMap<String, String>,
 | 
						||
    pub created: OffsetDateTime,
 | 
						||
    pub updated: OffsetDateTime,
 | 
						||
    pub size: Option<u64>,
 | 
						||
}
 | 
						||
```
 | 
						||
 | 
						||
---
 | 
						||
 | 
						||
## 3) Keyspace Design
 | 
						||
 | 
						||
```
 | 
						||
meta:<id>             → serialized OsirisObject (JSON)
 | 
						||
field:tag:<key>=<val> → Set of IDs (for tag filtering)
 | 
						||
field:mime:<type>     → Set of IDs (for MIME type filtering)
 | 
						||
field:title:<title>   → Set of IDs (for title filtering)
 | 
						||
scan:index            → Set of all IDs (for full scan)
 | 
						||
```
 | 
						||
 | 
						||
**Example:**
 | 
						||
```
 | 
						||
field:tag:project=osiris  → {note_1, note_2}
 | 
						||
field:mime:text/markdown  → {note_1, note_3}
 | 
						||
scan:index                → {note_1, note_2, note_3, ...}
 | 
						||
```
 | 
						||
 | 
						||
---
 | 
						||
 | 
						||
## 4) Index Maintenance
 | 
						||
 | 
						||
### Insert / Update
 | 
						||
 | 
						||
```rust
 | 
						||
// Store object
 | 
						||
redis.set(format!("meta:{}", obj.id), serde_json::to_string(&obj)?)?;
 | 
						||
 | 
						||
// Index tags
 | 
						||
for (k, v) in &obj.meta.tags {
 | 
						||
    redis.sadd(format!("field:tag:{}={}", k, v), &obj.id)?;
 | 
						||
}
 | 
						||
 | 
						||
// Index MIME type
 | 
						||
if let Some(mime) = &obj.meta.mime {
 | 
						||
    redis.sadd(format!("field:mime:{}", mime), &obj.id)?;
 | 
						||
}
 | 
						||
 | 
						||
// Index title
 | 
						||
if let Some(title) = &obj.meta.title {
 | 
						||
    redis.sadd(format!("field:title:{}", title), &obj.id)?;
 | 
						||
}
 | 
						||
 | 
						||
// Add to scan index
 | 
						||
redis.sadd("scan:index", &obj.id)?;
 | 
						||
```
 | 
						||
 | 
						||
### Delete
 | 
						||
 | 
						||
```rust
 | 
						||
// Remove object
 | 
						||
redis.del(format!("meta:{}", obj.id))?;
 | 
						||
 | 
						||
// Deindex tags
 | 
						||
for (k, v) in &obj.meta.tags {
 | 
						||
    redis.srem(format!("field:tag:{}={}", k, v), &obj.id)?;
 | 
						||
}
 | 
						||
 | 
						||
// Deindex MIME type
 | 
						||
if let Some(mime) = &obj.meta.mime {
 | 
						||
    redis.srem(format!("field:mime:{}", mime), &obj.id)?;
 | 
						||
}
 | 
						||
 | 
						||
// Deindex title
 | 
						||
if let Some(title) = &obj.meta.title {
 | 
						||
    redis.srem(format!("field:title:{}", title), &obj.id)?;
 | 
						||
}
 | 
						||
 | 
						||
// Remove from scan index
 | 
						||
redis.srem("scan:index", &obj.id)?;
 | 
						||
```
 | 
						||
 | 
						||
---
 | 
						||
 | 
						||
## 5) Retrieval
 | 
						||
 | 
						||
### Query Structure
 | 
						||
 | 
						||
```rust
 | 
						||
pub struct RetrievalQuery {
 | 
						||
    pub text: Option<String>,                 // keyword substring
 | 
						||
    pub ns: String,
 | 
						||
    pub filters: Vec<(String, String)>,       // field=value
 | 
						||
    pub top_k: usize,
 | 
						||
}
 | 
						||
```
 | 
						||
 | 
						||
### Execution Steps
 | 
						||
 | 
						||
1. **Collect candidate IDs** from field:* filters (SMEMBERS + intersection)
 | 
						||
2. **If text query is provided**, iterate over candidates:
 | 
						||
   - Fetch `meta:<id>`
 | 
						||
   - Test substring match on `meta.title`, `text`, or `tags`
 | 
						||
   - Compute simple relevance score
 | 
						||
3. **Sort** by score (descending) and **limit** to `top_k`
 | 
						||
 | 
						||
This is O(N) for text scan but acceptable for MVP or small datasets (<10k objects).
 | 
						||
 | 
						||
### Scoring Algorithm
 | 
						||
 | 
						||
```rust
 | 
						||
fn compute_text_score(obj: &OsirisObject, query: &str) -> f32 {
 | 
						||
    let mut score = 0.0;
 | 
						||
    
 | 
						||
    // Title match
 | 
						||
    if let Some(title) = &obj.meta.title {
 | 
						||
        if title.to_lowercase().contains(query) {
 | 
						||
            score += 0.5;
 | 
						||
        }
 | 
						||
    }
 | 
						||
    
 | 
						||
    // Text content match
 | 
						||
    if let Some(text) = &obj.text {
 | 
						||
        if text.to_lowercase().contains(query) {
 | 
						||
            score += 0.5;
 | 
						||
            // Bonus for multiple occurrences
 | 
						||
            let count = text.to_lowercase().matches(query).count();
 | 
						||
            score += (count as f32 - 1.0) * 0.1;
 | 
						||
        }
 | 
						||
    }
 | 
						||
    
 | 
						||
    // Tag match
 | 
						||
    for (key, value) in &obj.meta.tags {
 | 
						||
        if key.to_lowercase().contains(query) || value.to_lowercase().contains(query) {
 | 
						||
            score += 0.2;
 | 
						||
        }
 | 
						||
    }
 | 
						||
    
 | 
						||
    score.min(1.0)
 | 
						||
}
 | 
						||
```
 | 
						||
 | 
						||
---
 | 
						||
 | 
						||
## 6) CLI
 | 
						||
 | 
						||
### Commands
 | 
						||
 | 
						||
```bash
 | 
						||
# Initialize and create namespace
 | 
						||
osiris init --herodb redis://localhost:6379
 | 
						||
osiris ns create notes
 | 
						||
 | 
						||
# Add and read objects
 | 
						||
osiris put notes/my-note.md ./my-note.md --tags topic=rust,project=osiris
 | 
						||
osiris get notes/my-note.md
 | 
						||
osiris get notes/my-note.md --raw --output /tmp/note.md
 | 
						||
osiris del notes/my-note.md
 | 
						||
 | 
						||
# Search
 | 
						||
osiris find --ns notes --filter topic=rust
 | 
						||
osiris find "retrieval" --ns notes
 | 
						||
osiris find "rust" --ns notes --filter project=osiris --topk 20
 | 
						||
 | 
						||
# Namespace management
 | 
						||
osiris ns list
 | 
						||
osiris ns delete notes
 | 
						||
 | 
						||
# Statistics
 | 
						||
osiris stats
 | 
						||
osiris stats --ns notes
 | 
						||
```
 | 
						||
 | 
						||
### Examples
 | 
						||
 | 
						||
```bash
 | 
						||
# Store a note from stdin
 | 
						||
echo "This is a note about Rust programming" | \
 | 
						||
  osiris put notes/rust-intro - \
 | 
						||
  --title "Rust Introduction" \
 | 
						||
  --tags topic=rust,level=beginner \
 | 
						||
  --mime text/plain
 | 
						||
 | 
						||
# Search for notes about Rust
 | 
						||
osiris find "rust" --ns notes
 | 
						||
 | 
						||
# Filter by tag
 | 
						||
osiris find --ns notes --filter topic=rust
 | 
						||
 | 
						||
# Get note as JSON
 | 
						||
osiris get notes/rust-intro
 | 
						||
 | 
						||
# Get raw content
 | 
						||
osiris get notes/rust-intro --raw
 | 
						||
```
 | 
						||
 | 
						||
---
 | 
						||
 | 
						||
## 7) Configuration
 | 
						||
 | 
						||
### File Location
 | 
						||
 | 
						||
`~/.config/osiris/config.toml`
 | 
						||
 | 
						||
### Example
 | 
						||
 | 
						||
```toml
 | 
						||
[herodb]
 | 
						||
url = "redis://localhost:6379"
 | 
						||
 | 
						||
[namespaces.notes]
 | 
						||
db_id = 1
 | 
						||
 | 
						||
[namespaces.calendar]
 | 
						||
db_id = 2
 | 
						||
```
 | 
						||
 | 
						||
### Structure
 | 
						||
 | 
						||
```rust
 | 
						||
pub struct Config {
 | 
						||
    pub herodb: HeroDbConfig,
 | 
						||
    pub namespaces: HashMap<String, NamespaceConfig>,
 | 
						||
}
 | 
						||
 | 
						||
pub struct HeroDbConfig {
 | 
						||
    pub url: String,
 | 
						||
}
 | 
						||
 | 
						||
pub struct NamespaceConfig {
 | 
						||
    pub db_id: u16,
 | 
						||
}
 | 
						||
```
 | 
						||
 | 
						||
---
 | 
						||
 | 
						||
## 8) Database Allocation
 | 
						||
 | 
						||
```
 | 
						||
DB 0  → HeroDB Admin (managed by HeroDB)
 | 
						||
DB 1  → osiris:notes (namespace "notes")
 | 
						||
DB 2  → osiris:calendar (namespace "calendar")
 | 
						||
DB 3+ → Additional namespaces...
 | 
						||
```
 | 
						||
 | 
						||
Each namespace gets its own isolated HeroDB database.
 | 
						||
 | 
						||
---
 | 
						||
 | 
						||
## 9) Dependencies
 | 
						||
 | 
						||
```toml
 | 
						||
[dependencies]
 | 
						||
anyhow = "1.0"
 | 
						||
redis = { version = "0.24", features = ["aio", "tokio-comp"] }
 | 
						||
serde = { version = "1.0", features = ["derive"] }
 | 
						||
serde_json = "1.0"
 | 
						||
time = { version = "0.3", features = ["serde", "formatting", "parsing", "macros"] }
 | 
						||
tokio = { version = "1.23", features = ["full"] }
 | 
						||
clap = { version = "4.5", features = ["derive"] }
 | 
						||
toml = "0.8"
 | 
						||
uuid = { version = "1.6", features = ["v4", "serde"] }
 | 
						||
tracing = "0.1"
 | 
						||
tracing-subscriber = { version = "0.3", features = ["env-filter"] }
 | 
						||
```
 | 
						||
 | 
						||
---
 | 
						||
 | 
						||
## 10) Future Enhancements
 | 
						||
 | 
						||
| Feature | When Added | Moves Where |
 | 
						||
|---------|-----------|-------------|
 | 
						||
| Dedup / blobs | HeroDB extension | HeroDB |
 | 
						||
| Vector search | HeroDB extension | HeroDB |
 | 
						||
| Full-text search | HeroDB (Tantivy) | HeroDB |
 | 
						||
| Relations / graph | OSIRIS later | OSIRIS |
 | 
						||
| 9P filesystem | OSIRIS later | OSIRIS |
 | 
						||
 | 
						||
This MVP maintains clean interface boundaries:
 | 
						||
- **HeroDB** remains a plain KV substrate
 | 
						||
- **OSIRIS** builds higher-order meaning on top
 | 
						||
 | 
						||
---
 | 
						||
 | 
						||
## 11) Implementation Status
 | 
						||
 | 
						||
### ✅ Completed
 | 
						||
 | 
						||
- [x] Project structure and Cargo.toml
 | 
						||
- [x] Core data models (OsirisObject, Metadata)
 | 
						||
- [x] HeroDB client wrapper (RESP protocol)
 | 
						||
- [x] Field indexing (tags, MIME, title)
 | 
						||
- [x] Search engine (substring matching + scoring)
 | 
						||
- [x] Configuration management
 | 
						||
- [x] CLI interface (init, ns, put, get, del, find, stats)
 | 
						||
- [x] Error handling
 | 
						||
- [x] Documentation (README, specs)
 | 
						||
 | 
						||
### 🚧 Pending
 | 
						||
 | 
						||
- [ ] 9P filesystem interface
 | 
						||
- [ ] Integration tests
 | 
						||
- [ ] Performance benchmarks
 | 
						||
- [ ] Name resolution (namespace/name → ID mapping)
 | 
						||
 | 
						||
---
 | 
						||
 | 
						||
## 12) Quick Start
 | 
						||
 | 
						||
### Prerequisites
 | 
						||
 | 
						||
Start HeroDB:
 | 
						||
```bash
 | 
						||
cd /path/to/herodb
 | 
						||
cargo run --release -- --dir ./data --admin-secret mysecret --port 6379
 | 
						||
```
 | 
						||
 | 
						||
### Build OSIRIS
 | 
						||
 | 
						||
```bash
 | 
						||
cd /path/to/osiris
 | 
						||
cargo build --release
 | 
						||
```
 | 
						||
 | 
						||
### Initialize
 | 
						||
 | 
						||
```bash
 | 
						||
# Create configuration
 | 
						||
./target/release/osiris init --herodb redis://localhost:6379
 | 
						||
 | 
						||
# Create a namespace
 | 
						||
./target/release/osiris ns create notes
 | 
						||
```
 | 
						||
 | 
						||
### Usage
 | 
						||
 | 
						||
```bash
 | 
						||
# Add a note
 | 
						||
echo "OSIRIS is a minimal object store" | \
 | 
						||
  ./target/release/osiris put notes/intro - \
 | 
						||
  --title "Introduction" \
 | 
						||
  --tags topic=osiris,type=doc
 | 
						||
 | 
						||
# Search
 | 
						||
./target/release/osiris find "object store" --ns notes
 | 
						||
 | 
						||
# Get the note
 | 
						||
./target/release/osiris get notes/intro
 | 
						||
 | 
						||
# Show stats
 | 
						||
./target/release/osiris stats --ns notes
 | 
						||
```
 | 
						||
 | 
						||
---
 | 
						||
 | 
						||
## 13) Testing
 | 
						||
 | 
						||
### Unit Tests
 | 
						||
 | 
						||
```bash
 | 
						||
cargo test
 | 
						||
```
 | 
						||
 | 
						||
### Integration Tests (requires HeroDB)
 | 
						||
 | 
						||
```bash
 | 
						||
# Start HeroDB
 | 
						||
cd /path/to/herodb
 | 
						||
cargo run -- --dir /tmp/herodb-test --admin-secret test --port 6379
 | 
						||
 | 
						||
# Run tests
 | 
						||
cd /path/to/osiris
 | 
						||
cargo test -- --ignored
 | 
						||
```
 | 
						||
 | 
						||
---
 | 
						||
 | 
						||
## 14) Performance Characteristics
 | 
						||
 | 
						||
### Write Performance
 | 
						||
 | 
						||
- **Object storage**: O(1) - single SET operation
 | 
						||
- **Indexing**: O(T) where T = number of tags/fields
 | 
						||
- **Total**: O(T) per object
 | 
						||
 | 
						||
### Read Performance
 | 
						||
 | 
						||
- **Get by ID**: O(1) - single GET operation
 | 
						||
- **Filter by tags**: O(F) where F = number of filters (set intersection)
 | 
						||
- **Text search**: O(N) where N = number of candidates (linear scan)
 | 
						||
 | 
						||
### Storage Overhead
 | 
						||
 | 
						||
- **Object**: ~1KB per object (JSON serialized)
 | 
						||
- **Indexes**: ~50 bytes per tag/field entry
 | 
						||
- **Total**: ~1.5KB per object with 10 tags
 | 
						||
 | 
						||
### Scalability
 | 
						||
 | 
						||
- **Optimal**: <10,000 objects per namespace
 | 
						||
- **Acceptable**: <100,000 objects per namespace
 | 
						||
- **Beyond**: Consider migrating to Tantivy FTS
 | 
						||
 | 
						||
---
 | 
						||
 | 
						||
## 15) Design Decisions
 | 
						||
 | 
						||
### Why No Tantivy in MVP?
 | 
						||
 | 
						||
- **Simplicity**: Avoid HeroDB server-side dependencies
 | 
						||
- **Portability**: Works with any Redis-compatible backend
 | 
						||
- **Flexibility**: Easy to migrate to Tantivy later
 | 
						||
 | 
						||
### Why Substring Matching?
 | 
						||
 | 
						||
- **Good enough**: For small datasets (<10k objects)
 | 
						||
- **Simple**: No tokenization, stemming, or complex scoring
 | 
						||
- **Fast**: O(N) is acceptable for MVP
 | 
						||
 | 
						||
### Why Separate Databases per Namespace?
 | 
						||
 | 
						||
- **Isolation**: Clear separation of concerns
 | 
						||
- **Performance**: Smaller keyspaces = faster scans
 | 
						||
- **Security**: Can apply different encryption keys per namespace
 | 
						||
 | 
						||
---
 | 
						||
 | 
						||
## 16) Migration Path
 | 
						||
 | 
						||
When ready to scale beyond MVP:
 | 
						||
 | 
						||
1. **Add Tantivy FTS** (HeroDB extension)
 | 
						||
   - Create FT.* commands in HeroDB
 | 
						||
   - Update OSIRIS to use FT.SEARCH instead of substring scan
 | 
						||
   - Keep field indexes for filtering
 | 
						||
 | 
						||
2. **Add Vector Search** (HeroDB extension)
 | 
						||
   - Store embeddings in HeroDB
 | 
						||
   - Implement ANN search (HNSW/IVF)
 | 
						||
   - Add hybrid retrieval (BM25 + vector)
 | 
						||
 | 
						||
3. **Add Relations** (OSIRIS feature)
 | 
						||
   - Store relation graphs in HeroDB
 | 
						||
   - Implement graph traversal
 | 
						||
   - Add relation-based ranking
 | 
						||
 | 
						||
4. **Add Deduplication** (HeroDB extension)
 | 
						||
   - Content-addressable storage (BLAKE3)
 | 
						||
   - Reference counting
 | 
						||
   - Garbage collection
 | 
						||
 | 
						||
---
 | 
						||
 | 
						||
## Summary
 | 
						||
 | 
						||
**OSIRIS MVP is a minimal, production-ready object store** that:
 | 
						||
 | 
						||
- ✅ Works with unmodified HeroDB
 | 
						||
- ✅ Provides structured storage with metadata
 | 
						||
- ✅ Supports field-based filtering
 | 
						||
- ✅ Includes basic text search
 | 
						||
- ✅ Exposes a clean CLI interface
 | 
						||
- ✅ Maintains clear upgrade paths
 | 
						||
 | 
						||
**Perfect for:**
 | 
						||
- Personal knowledge management
 | 
						||
- Small-scale document storage
 | 
						||
- Prototyping semantic applications
 | 
						||
- Learning Rust + Redis patterns
 | 
						||
 | 
						||
**Next steps:**
 | 
						||
- Build and test the MVP
 | 
						||
- Gather usage feedback
 | 
						||
- Plan Tantivy/vector integration
 | 
						||
- Design 9P filesystem interface
 |