lancedb_impl #15
Implemented a new vector database backend called `lance`. The embedding model is currently not multimodal, although features have been put in place to support image processing in the future. LanceDB is a search-only backend: it does not support traditional Redis-like KV commands. All LanceDB-related commands have the form `LANCE.<command>`.

A vector database requires a vector-embedding model, which can be configured separately for each dataset. The user has to specify which embedding provider to use, which is done with the `LANCE.EMBEDDING` command. To support this we have added the `Embedder` trait (text embedding) and the `ImageEmbedder` trait (image embedding), alongside `TestHashEmbedder` and `TestImageHashEmbedder`. The latter two are deterministic, offline embedders intended for testing. The embedding configuration is stored as a JSON sidecar file at `<base_dir>/lance/<db_id>/<dataset>.lance.embedding.json`. More information and a full end-to-end workflow can be found in the `lance.md` documentation file in the `docs` directory.

To search the `lance` database, each `LANCE.SEARCH` command runs a k-nearest-neighbor (KNN) search: it finds the K vectors closest to the query vector according to a distance metric. This powers semantic search for both text and image queries by finding the closest matching embeddings in the vector space. The search is an exhaustive scan with `O(n * d)` cost, where n is the number of stored vectors and d is the vector dimension. Simple equality-based filtering is supported on the fields `id`, `text`, `media_type`, `media_uri`, or any other metadata key. Filters are evaluated during the KNN scan, before the distance computation, so non-matching records are skipped and the search space shrinks. Possible future enhancements include integrating Lance's native ANN (Approximate Nearest Neighbor) indices (IVF_PQ, HNSW, etc.).

New commands added:
- `LANCE.CREATE` - Create a dataset with a given vector dimension
- `LANCE.STORE` - Store text, with the embedding computed server-side
- `LANCE.SEARCH` - Search using a text query
- `LANCE.STOREIMAGE` - Store an image (URI or base64)
- `LANCE.SEARCHIMAGE` - Search using an image query
- `LANCE.CREATEINDEX` - Create a vector index (placeholder)
- `LANCE.EMBEDDING CONFIG SET/GET` - Configure the embedding provider per dataset
- `LANCE.LIST` - List datasets
- `LANCE.INFO` - Get dataset information
- `LANCE.DEL` - Delete a record by ID
- `LANCE.DROP` - Drop an entire dataset

New RPC calls added:
- `lanceCreate`, `lanceList`, `lanceInfo`, `lanceDel`, `lanceDrop`
- `lanceSetEmbeddingConfig`, `lanceGetEmbeddingConfig`
- `lanceStoreText`, `lanceSearchText`
- `lanceStoreImage`, `lanceSearchImage`
- `lanceCreateIndex`
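The store-and-search path described above (a deterministic offline embedder feeding an exhaustive KNN scan with equality filters) can be sketched in Rust as follows. This is a minimal sketch under stated assumptions, not the code in this PR: the `Embedder` signature shown here, the `HashEmbedder`, `Record` and `knn` names, the FNV-1a token hashing, storing non-`id` fields in a `meta` map, and the use of L2 distance are all hypothetical illustrations.

```rust
use std::cmp::Ordering;
use std::collections::{BinaryHeap, HashMap};

/// Hypothetical text-embedding trait; the PR's real `Embedder` may differ.
pub trait Embedder {
    fn dim(&self) -> usize;
    fn embed(&self, text: &str) -> Vec<f32>;
}

/// Deterministic, offline embedder in the spirit of `TestHashEmbedder` (assumed design):
/// each lowercased whitespace token is hashed with FNV-1a into one of `dim` buckets.
/// `dim` must be non-zero.
pub struct HashEmbedder {
    pub dim: usize,
}

/// FNV-1a, 64-bit: stable across runs and Rust versions, unlike `DefaultHasher`.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for b in bytes {
        h ^= *b as u64;
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

impl Embedder for HashEmbedder {
    fn dim(&self) -> usize {
        self.dim
    }

    fn embed(&self, text: &str) -> Vec<f32> {
        let mut v = vec![0.0f32; self.dim];
        for tok in text.split_whitespace() {
            let idx = (fnv1a(tok.to_lowercase().as_bytes()) % self.dim as u64) as usize;
            v[idx] += 1.0;
        }
        // L2-normalise so the vector length does not depend on the text length.
        let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
        if norm > 0.0 {
            for x in &mut v {
                *x /= norm;
            }
        }
        v
    }
}

/// Hypothetical stored record; `text`, `media_type`, `media_uri`, ... live in `meta`.
pub struct Record {
    pub id: String,
    pub vector: Vec<f32>,
    pub meta: HashMap<String, String>,
}

/// Heap entry ordered by distance, so the max-heap keeps the farthest of the current top-k on top.
struct Hit<'a> {
    dist: f32,
    id: &'a str,
}
impl PartialEq for Hit<'_> {
    fn eq(&self, o: &Self) -> bool {
        self.cmp(o) == Ordering::Equal
    }
}
impl Eq for Hit<'_> {}
impl PartialOrd for Hit<'_> {
    fn partial_cmp(&self, o: &Self) -> Option<Ordering> {
        Some(self.cmp(o))
    }
}
impl Ord for Hit<'_> {
    fn cmp(&self, o: &Self) -> Ordering {
        self.dist.total_cmp(&o.dist)
    }
}

fn l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f32>().sqrt()
}

/// Exhaustive KNN: O(n * d) distance work plus O(n log k) heap upkeep.
/// Equality filters are checked before the distance is computed.
pub fn knn<'a>(records: &'a [Record], query: &[f32], k: usize, filter: &[(&str, &str)]) -> Vec<(&'a str, f32)> {
    let mut heap: BinaryHeap<Hit<'a>> = BinaryHeap::with_capacity(k + 1);
    for r in records {
        let keep = filter.iter().all(|(key, want)| {
            if *key == "id" {
                r.id == *want
            } else {
                r.meta.get(*key).map(String::as_str) == Some(*want)
            }
        });
        if !keep {
            continue; // skipped records never reach the distance computation
        }
        heap.push(Hit { dist: l2(&r.vector, query), id: &r.id });
        if heap.len() > k {
            heap.pop(); // drop the current farthest candidate
        }
    }
    let mut out: Vec<(&str, f32)> = heap.into_iter().map(|h| (h.id, h.dist)).collect();
    out.sort_by(|a, b| a.1.total_cmp(&b.1)); // nearest first
    out
}
```

Keeping a bounded max-heap of size k avoids sorting all n candidates; the dominant cost is still the `O(n * d)` distance work, which is what an ANN index such as IVF_PQ or HNSW would later reduce.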
Other fixes:

- Database `0`: a user could access it without the required `admin-secret` (passed as an argument at startup), because database `0` was selected automatically when connecting with `redis-cli`. This is now prohibited: the user must always supply `KEY ...` when selecting database instance `0`.
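The database `0` rule can be sketched as a check performed when a database is selected. Everything here is hypothetical (the `Session` type, `select_db`, the `SelectError` variants, and how the `KEY` argument is parsed); it assumes that a new connection starts with no database selected and only illustrates that selecting database `0` succeeds when the supplied key matches the admin secret.

```rust
/// Hypothetical per-connection state; a real server keeps much more than this.
#[derive(Debug, Default)]
pub struct Session {
    /// `None` until the client explicitly selects a database
    /// (assumption: database 0 is no longer selected implicitly on connect).
    pub selected_db: Option<u64>,
}

#[derive(Debug, PartialEq)]
pub enum SelectError {
    AdminKeyRequired,
    WrongKey,
}

/// Select `db` for this session. Database 0 requires `key` to equal the
/// admin secret passed at startup; other databases are not checked in this sketch.
pub fn select_db(session: &mut Session, db: u64, key: Option<&str>, admin_secret: &str) -> Result<(), SelectError> {
    if db == 0 {
        match key {
            None => return Err(SelectError::AdminKeyRequired),
            // Plain comparison for brevity; a real server should compare secrets in constant time.
            Some(k) if k != admin_secret => return Err(SelectError::WrongKey),
            Some(_) => {}
        }
    }
    session.selected_db = Some(db);
    Ok(())
}
```

A failed selection leaves the session's current database unchanged, so a rejected attempt on database `0` cannot fall through to it.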