OCI Cache: Garbage collection / Layer sharing #7

Open
opened 2026-02-11 19:21:08 +00:00 by thabeta · 2 comments
Owner

The current OCI implementation in chvm-lib stores layers in VM-specific directories. When multiple VMs use the same image, layers are duplicated on disk, increasing storage overhead.

Proposed Changes:

  1. Move layer storage to a global content-addressable store (CAS) indexed by digest.
  2. Implement reference counting for layers to track usage across VMs.
  3. Add a prune subcommand to remove unreferenced layers and images from the CAS.
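A minimal sketch of the digest-to-path mapping such a CAS could use. The `~/.chvm/blobs` root appears later in this thread; the `sha256-<hex>` directory naming is an assumption:

```rust
use std::path::PathBuf;

/// Map an OCI layer digest (e.g. "sha256:ab12...") to its directory in a
/// global content-addressable store. Replacing ':' with '-' keeps the
/// digest usable as a single path component (naming scheme is assumed).
fn blob_path(store_root: &str, digest: &str) -> PathBuf {
    PathBuf::from(store_root).join(digest.replace(':', "-"))
}

fn main() {
    let p = blob_path("/home/user/.chvm/blobs", "sha256:ab12cd34");
    // Every image referencing this digest resolves to the same directory,
    // so the layer is stored once no matter how many VMs use it.
    println!("{}", p.display());
}
```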
Member

After investigation:

  • layers are not duplicated in VM-specific directories; they are duplicated inside image manifest directories

Current behavior:

The current pull logic (pull.rs:57,72) stores layer blobs under per-manifest directories.

This means each image downloads all its layers independently, even if the same layer blob already exists from another image.

For images sharing a common base (for example node:20-bookworm and node:22-bookworm, both built on debian:bookworm), the shared base layers are downloaded and stored multiple times.

Additionally, ensure_rootfs() (cache.rs:189-194) flattens all layers into a single rootfs/ directory per image, so even the extracted filesystem is duplicated across images that share layers.

#### Proposed approach:

  1. Content-addressable blob storage indexed by layer digest

  2. Skip already-downloaded layers during pull: Before downloading a layer, check whether ~/.chvm/blobs/<layer-digest>/ already exists. If it does, skip the download entirely.

  3. Update index.json: Replace layer_paths with layer_digests — an ordered list of layer digest references into the CAS:
    {
      "layer_digests": ["sha256:xxx", "sha256:yyy"]
    }

  4. Overlay stacking instead of flattened rootfs: At VM boot, construct the overlay mount from the layer directories directly:
    lowerdir=blobs/sha256-yyy:blobs/sha256-xxx,upperdir=vm/upper,workdir=vm/work

  5. Prune: Since index.json tracks which digests each image uses, pruning is just: find blobs not referenced by any image's
    layer_digests and delete them.
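Step 2 above can be sketched as a plain existence check against the CAS before any network request (assuming the same `sha256-<hex>` directory naming under the blobs root):

```rust
use std::fs;
use std::path::Path;

/// Return true if the layer blob is already present in the CAS, in which
/// case the pull logic can skip downloading it (sketch; a real check
/// might also verify the blob's contents against the digest).
fn layer_is_cached(blobs_root: &Path, digest: &str) -> bool {
    blobs_root.join(digest.replace(':', "-")).is_dir()
}

fn main() {
    let root = std::env::temp_dir().join("chvm-blobs-demo");
    fs::create_dir_all(root.join("sha256-aa")).unwrap();
    assert!(layer_is_cached(&root, "sha256:aa"));  // present: skip download
    assert!(!layer_is_cached(&root, "sha256:bb")); // absent: fetch it
}
```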
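The index.json change in step 3 is just an ordered list of digest strings per image. A dependency-free sketch of emitting that shape (a real implementation would likely use a JSON library such as serde_json):

```rust
/// Render an image's ordered layer digests in the proposed index.json
/// shape: {"layer_digests": ["sha256:xxx", "sha256:yyy"]}.
fn layer_digests_json(digests: &[&str]) -> String {
    let quoted: Vec<String> = digests.iter().map(|d| format!("\"{}\"", d)).collect();
    format!("{{\"layer_digests\": [{}]}}", quoted.join(", "))
}

fn main() {
    println!("{}", layer_digests_json(&["sha256:xxx", "sha256:yyy"]));
}
```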
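Step 4's mount options can be assembled directly from the digest list. One detail worth noting: OCI manifests order layers base-first, while overlayfs expects lowerdir entries topmost-first, so the list must be reversed. A sketch, assuming the same digest-directory naming:

```rust
/// Build overlayfs mount options from an image's ordered layer digests.
/// Digests arrive base-first (manifest order); overlayfs wants the
/// topmost layer leftmost in lowerdir, hence the .rev().
fn overlay_options(blobs_root: &str, digests: &[&str], upper: &str, work: &str) -> String {
    let lower: Vec<String> = digests
        .iter()
        .rev()
        .map(|d| format!("{}/{}", blobs_root, d.replace(':', "-")))
        .collect();
    format!("lowerdir={},upperdir={},workdir={}", lower.join(":"), upper, work)
}

fn main() {
    let opts = overlay_options("blobs", &["sha256:xxx", "sha256:yyy"], "vm/upper", "vm/work");
    // Matches the example mount line above.
    println!("{}", opts);
}
```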
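Prune in step 5 reduces to a set difference between the blobs on disk and the digests referenced by any image's layer_digests list. A sketch:

```rust
use std::collections::HashSet;

/// Given every digest present in the CAS and each image's layer_digests
/// list from index.json, return the digests no image references; those
/// blob directories are safe to delete.
fn prune_candidates(all_blobs: &[String], images: &[Vec<String>]) -> Vec<String> {
    let referenced: HashSet<&str> = images.iter().flatten().map(String::as_str).collect();
    all_blobs
        .iter()
        .filter(|b| !referenced.contains(b.as_str()))
        .cloned()
        .collect()
}

fn main() {
    let blobs = vec!["sha256:aa".to_string(), "sha256:bb".to_string()];
    let images = vec![vec!["sha256:aa".to_string()]];
    // Only sha256:bb is unreferenced, so only it gets pruned.
    assert_eq!(prune_candidates(&blobs, &images), vec!["sha256:bb".to_string()]);
}
```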

Member

Work completed:

  • content-addressable storage for layers
  • initial implementation of overlay stacking (needs review and testing; block storage support also needs checking)
  • prune unreferenced layers
  • integration tests for image management

Work in progress:

  • testing the VM lifecycle with both storage backends
  • investigating virtiofs, block storage, and rootfs extraction
thabeta added this to the ACTIVE project 2026-03-12 10:53:26 +00:00
Reference
geomind_code/my_hypervisor#7