ComputeService.deploy_vm returns opaque 'vm deployment entered error state' on substrate failure #124
Reference
lhumina_code/hero_compute#124
When deploy_vm fails substrate-side, the RPC response is:

{"code":-32603,"message":"Internal error","data":"Redis operation error: Internal error: backend error: vm deployment entered error state"}

The error string carries no information about what actually failed: insufficient TFT balance on the deploying twin, the node refusing the deployment payload, an unreachable image flist, capacity contention, a zos node-local error, or a substrate timeout. The operator cannot diagnose the failure without daemon-side log access, which a deployer-mediated user flow does not have.

Suggested fix: propagate the underlying error chain into the RPC `data` field, including the substrate event description, the zos node response if any, and the IDs of the contracts that were created before the failure (the orphan list, which would also help the recovery flow tracked at #119). Today this opacity makes every failed deploy a black box for end users.

Signed-by: mik-tf mik-tf@noreply.invalid
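To make the suggested fix concrete, here is a minimal sketch of what a structured `data` payload could look like. It is not taken from the hero_compute code base: the `DeployFailure` type, its field names, the `rpc_error` helper, and the sample values are all hypothetical and only illustrate the information listed in this issue (error chain, substrate event, zos node response, orphan contract IDs). The only parts taken from the report are the error code -32603 and the "Internal error" message.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class DeployFailure:
    """Hypothetical structured payload for the JSON-RPC error `data` field."""
    stage: str                              # where the failure surfaced (assumed labels)
    causes: List[str]                       # error chain, outermost message first
    substrate_event: Optional[str] = None   # substrate event description, if any
    node_response: Optional[str] = None     # zos node response, if any
    orphan_contract_ids: List[int] = field(default_factory=list)


def rpc_error(failure: DeployFailure) -> dict:
    """Build a JSON-RPC 2.0 error object that keeps the full failure context."""
    return {
        "code": -32603,
        "message": "Internal error",
        "data": asdict(failure),  # structured value instead of a flat string
    }


# Illustrative values only; none of these come from a real deployment.
example = DeployFailure(
    stage="substrate",
    causes=[
        "vm deployment entered error state",
        "insufficient TFT balance on the deploying twin",
    ],
    orphan_contract_ids=[1001, 1002],
)
print(json.dumps(rpc_error(example), indent=2))
```

With a payload shaped like this, a deployer front end could show the innermost cause to the user and hand `orphan_contract_ids` to whatever cleanup the recovery flow in #119 ends up defining.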