QSSS = Quantum Safe Storage Tutorial (Scott) #23

Open
opened 2024-03-20 13:03:12 +00:00 by sashaastiadi · 21 comments
Owner

Situation

  • Quantum safe storage is part of our product offering.
    We need a good tutorial.

Specs - Pulumi QSFS

  • [x] Deploy
  • [x] Monitor
  • [x] Maintain
  • [x] Recover

Requirements

More info

  • how to set up
  • how to connect to the created ZDBs using TFCMD
  • how to mount QSFS
  • vscript example to set up a filesystem using ZDBs, all in Ubuntu 23.10 using vlang vscript
  • how to see monitoring/health
  • check that we can repair, and how we can see that we need a repair
  • check that we can re-connect on another VM
  • make a test script where we destroy a VM which was writing to QSFS, then create a new VM which remounts QSFS; some data might be lost, but never more than e.g. the last 15 min of writes
  • check that all metadata needed to reconnect is there, and document where the metadata is
  • explain the architecture and how we use a local ZDB as cache
  • good documentation
Author
Owner

Does this mean we want something like the Terraform guides to QSFS on a micro VM (https://www.manual.grid.tf/terraform/resources/terraform_qsfs_on_microvm.html) and on a full VM (https://www.manual.grid.tf/terraform/resources/terraform_qsfs_on_full_vm.html), but with tfcmd instead?
Just to be sure.
Owner

No, this is a set of instructions for now: a tutorial in hero mdbook format, with links to all required info.
despiegk changed title from QSSS = Quantum Safe Storage MVP to QSSS = Quantum Safe Storage Tutorial 2024-03-21 07:39:21 +00:00
despiegk added the Story label 2024-03-21 07:39:25 +00:00
despiegk added this to the (deleted) project 2024-03-21 07:39:29 +00:00
mik-tf was assigned by thabeta 2024-05-07 11:17:28 +00:00
despiegk modified the project from (deleted) to tfgrid_3_14 2024-05-22 07:56:25 +00:00
Owner

What is the status? I really think we still need this in 3.14; it should only be documentation.
Member

@despiegk ramez will deliver this and mark the things that he knows for sure aren't working.
Cross link: https://github.com/threefoldtech/home/issues/1539
Owner

Hey guys!

What's the status on this now, @thabeta? Ramez (not on gitea?) says all the procedures are developed and only the docs are needed. If that is the case, where would we find these procedures, and who tested them? Any GitHub logs or PRs would be helpful.

Once it's clear, I can make an issue on info_grid+info_tfgrid and document it all quickly in hero mdbook format.
scott was assigned by despiegk 2024-07-28 07:23:33 +00:00
despiegk modified the project from tfgrid_3_14 to tfgrid_3_17 2024-07-28 07:23:38 +00:00
despiegk removed the Story label 2024-07-28 07:57:25 +00:00
Member
Issues created:

  • https://github.com/threefoldtech/tfgrid-sdk-go/issues/1103
  • https://github.com/threefoldtech/quantum-storage/issues/40
  • https://github.com/threefoldtech/0-db/issues/167
  • https://github.com/threefoldtech/tfgrid-sdk-go/issues/1128
  • https://github.com/threefoldtech/quantum-storage/issues/41

Updates:

Created a new issue for the cache: https://github.com/threefoldtech/quantum-storage/issues/41. None of the tests can be done until all of the issues are resolved (see https://github.com/threefoldtech/home/issues/1539#issuecomment-2242913928). I am still waiting for the fixes in the PR https://github.com/threefoldtech/info_grid/pull/588/files#diff-b0d52cfbe1929e80b61aa5d44f601ec5487d3ec6e5cf228cad7ed7af47aeea97 before I can check the docs boxes.

Moved the main story to blocked:

  • https://github.com/threefoldtech/home/issues/1539
thabeta added the Story label 2024-07-28 09:54:21 +00:00
despiegk changed title from QSSS = Quantum Safe Storage Tutorial to QSSS = Quantum Safe Storage Tutorial (Scott) 2024-08-01 14:46:53 +00:00
Owner

Will check the PR in info_grid. Thanks @saeedr for the reminder.
Owner

Progress on the documentation side is somewhat blocked on issues related to monitoring:

https://github.com/threefoldtech/0-stor_v2/issues/72
https://github.com/threefoldtech/0-stor_v2/issues/118
https://github.com/threefoldtech/0-stor_v2/issues/120

However I have completed testing of the following and will add to the documentation PR:

  • Establish procedure to recover QSFS if the frontend VM is lost (using only the zstor config file)
  • Monitoring using the exposed Prometheus metrics, with a Grafana dashboard template

Some aspects of operation and maintenance need further testing:

  • Replacing failed storage backends (it was not clear in my testing that automatic rebuilding of data from the lost backend was working correctly when new backends were added)
  • Increasing overall capacity by adding or replacing backends
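
As a rough illustration of the monitoring item above, a health check against the exposed Prometheus metrics might look like the following. This is a minimal sketch only: the metric name `zstor_backend_up` and its `address` label are hypothetical placeholders, not names that zstor is known to export, and the sample text stands in for a real scrape of the metrics endpoint.

```python
import re

# One sample line of the Prometheus text exposition format:
#   metric_name{label="value",...} 1.0
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_metrics(text):
    """Parse Prometheus text-format samples into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def unhealthy_backends(text, metric="zstor_backend_up"):
    """Return the address label of every backend whose gauge is not 1.

    The metric name is an assumption made for this sketch.
    """
    return sorted(
        labels.get("address", "?")
        for name, labels, value in parse_metrics(text)
        if name == metric and value != 1.0
    )


# Hypothetical scrape output, for illustration only.
SAMPLE = """\
# HELP zstor_backend_up Whether a backend is reachable (hypothetical metric)
# TYPE zstor_backend_up gauge
zstor_backend_up{address="[2a02::1]:9900"} 1
zstor_backend_up{address="[2a02::2]:9900"} 0
zstor_backend_up{address="[2a02::3]:9900"} 1
"""

if __name__ == "__main__":
    print(unhealthy_backends(SAMPLE))
```

The same check could feed a Grafana alert or a cron job; the real metric names would have to be taken from the zstor metrics endpoint once the monitoring issues listed above are resolved.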
thabeta modified the project from tfgrid_3_17 to tfgrid_3_15 2024-08-06 14:44:31 +00:00
  • [x] how to set up
  • [x] how to connect to the created ZDBs using TFCMD
  • [ ] how to mount QSFS
  • [ ] check that all metadata needed to reconnect is there, and document where the metadata is
  • [ ] explain the architecture and how we use a local ZDB as cache
  • [ ] how to see monitoring/health
  • [ ] check that we can repair, and how we can see that we need a repair
  • [ ] check that we can re-connect on another VM

These should be covered in this PR: https://github.com/threefoldtech/info_grid/pull/588/files#diff-b0d52cfbe1929e80b61aa5d44f601ec5487d3ec6e5cf228cad7ed7af47aeea97

  • [ ] make a test script where we destroy a VM which was writing to QSFS, then create a new VM which remounts QSFS; some data might be lost, but never more than e.g. the last 15 min of writes

This test and the test suite will be created after the issues are solved: https://github.com/threefoldtech/home/issues/1539#issuecomment-2242913928

  • [ ] vscript example to set up a filesystem using ZDBs, all in Ubuntu 23.10 using vlang vscript

There is no clear way to do that yet.
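
The destroy-and-remount test from the checklist above could assert its 15 minute bound along these lines. This is only a sketch of the acceptance check, not the planned test suite: the marker-file scheme (the writer VM records the write time of each file it creates) and all function and variable names are assumptions made for illustration, and the deploy, destroy, and remount steps are left out.

```python
from datetime import datetime, timedelta, timezone

MAX_LOSS = timedelta(minutes=15)  # bound taken from the requirement in this issue


def data_loss_window(written, recovered, destroyed_at):
    """Return how far back from the destroy time the first lost write lies.

    written:      {filename: write time} recorded while the old VM was running
    recovered:    set of filenames visible on the new VM after remounting
    destroyed_at: time at which the old VM was destroyed
    """
    lost = [ts for name, ts in written.items() if name not in recovered]
    if not lost:
        return timedelta(0)  # everything came back
    return destroyed_at - min(lost)


# Illustrative data: one marker file every 5 minutes for an hour.
start = datetime(2024, 8, 6, 12, 0, tzinfo=timezone.utc)
written = {f"marker_{m:02d}": start + timedelta(minutes=m) for m in range(0, 60, 5)}
destroyed_at = start + timedelta(minutes=60)

# Pretend that only files written in the first 45 minutes survived the remount.
recovered = {name for name, ts in written.items() if ts <= start + timedelta(minutes=45)}

loss = data_loss_window(written, recovered, destroyed_at)
print(loss, loss <= MAX_LOSS)
```

Measuring from the earliest missing write, rather than from the latest surviving one, also catches holes in the middle of the recovered history.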
Blocked on https://github.com/threefoldtech/home/issues/1539#issuecomment-2242913928. @scottyeager confirmed the issue https://github.com/threefoldtech/quantum-storage/issues/41 and also found another bug: https://github.com/threefoldtech/0-stor_v2/issues/122.
mik-tf was unassigned by despiegk 2024-08-19 14:25:11 +00:00
Owner

Erwan & Scott will look into it and, if there are issues, ask Lee for advice.
Owner

Update

  • QSFS
    • issue with zos as it relates to mycelium and zdb
      • can't access zdb reliably over mycelium
        • solution coming (see below)

Macvlan

  • situation: blocking for qsfs
    • removal of macvlan in zos
      • already done in zos 4
      • needs to be done in zos 3
        • macvlan was a good idea back then, but it did not turn out as we envisioned
          • the restrictions are hampering instead of securing

TODO

  • Track the fix for macvlan in zosv3: https://github.com/threefoldtech/zos/issues/2403
  • When done, we can continue
Member

@scott you should be able to test whether QSFS is fixed on grid3 devnet tonight or tomorrow morning; then we will continue on the migrations side of the ticket.
Member

That's on devnet now; you can see if your tests are working.
Owner

Update

  • As Thabet said, now on devnet
  • @scott can test on devnet
    • if conclusive, @thabeta can supervise so it gets moved to main/testnets (or can it be moved to mainnet already?)

Future Update

  • Scott if possible provide some update after testing. Thanks.
Owner

Update

  • No news on this lately
  • Scott is working on it

@scott if you have any blocking issues please let it be known and we can check how to move forward.

mik-tf modified the project from tfgrid_3_15 to tfgrid_3_16 2024-11-04 20:07:19 +00:00
mik-tf modified the project from tfgrid_3_16 to tfgrid_3_15 2024-11-04 22:56:25 +00:00
Owner

Update

  • Scott and I will try to get a working version of QSFS with pulumi by the end of the week/early next week so we can have the most we can do with the current QSFS state for 3.15
  • Will update this story
  • Then once this story is closed, we can create future issues/stories for 3.16
mik-tf added the due date 2024-11-11 2024-11-05 03:05:13 +00:00
Owner

Update

  • Deploying QSFS on the grid can be seen as 3 main parts:
    • How to Deploy
    • How to Monitor
    • How to Recover

Status

  • How to Deploy
    • Guide by Scott: https://github.com/threefoldtech/quantum-storage/blob/pulumi/pulumi/README.md
    • Basic script on micro vm on the grid: https://gist.github.com/Mik-TF/158f985848f0ec840bf57d4c20ffe025
    • Pulumi QSFS Quick Guide for Micro VM: https://github.com/threefoldtech/quantum-storage/blob/development_pulumi_scripts/pulumi/docs/qsfs_ubuntu_24.04.md
  • How to Monitor
    • four issues blocking with monitoring for zstor
      • [WIP] https://github.com/threefoldtech/0-stor_v2/issues/123
      • [FIXED] https://github.com/threefoldtech/0-stor_v2/issues/120
      • [WIP, probably fixed] https://github.com/threefoldtech/0-stor_v2/issues/118
      • [WIP] https://github.com/threefoldtech/0-stor_v2/issues/72
    • Once these issues are fixed, we could most likely have this part completed and documented
  • How to Recover
    • @scott is working on something with a pulumi script
    • This will be updated

@lee if you could check the 4 issues on zstor shown above that are blocking the monitoring part, it would be great.
mik-tf modified the project from tfgrid_3_15 to tfgrid_3_16 2024-11-12 15:27:05 +00:00
Owner

Update

  • Iwan and Scott are working on it
  • Issues are being closed quickly
  • Will update the issue
mik-tf modified the project from tfgrid_3_16 to tfgrid_3_15 2024-11-18 15:25:09 +00:00
Owner

Update

  • QSFS
    • Scott more or less finished how to deploy + recover + monitor

Notes

  • pulumi deployment, can deploy, can redeploy/recover
    • basic dashboard in grafana to monitor with prometheus
  • Working on script for end-to-end test

ETA

  • Trying end of next week
mik-tf modified the project from tfgrid_3_15 to tfgrid_3_15_patch 2024-11-26 15:11:59 +00:00
mik-tf modified the project from tfgrid_3_15_patch to tfgrid_3_15 2024-11-26 15:49:19 +00:00
Reference: tfgrid/circle_engineering#23