ComputeService.deploy_vm returns opaque 'vm deployment entered error state' on substrate failure #124


Closed

opened 2026-05-24 16:58:43 +00:00 by mik-tf · 0 comments

mik-tf commented

2026-05-24 16:58:43 +00:00

Owner

When `deploy_vm` fails on the substrate side, the RPC response is `{"code":-32603,"message":"Internal error","data":"Redis operation error: Internal error: backend error: vm deployment entered error state"}`. The `data` string says nothing about what actually failed. It does not distinguish between:

- insufficient TFT balance on the deploying twin
- the node refusing the deployment payload
- an unreachable image flist
- capacity contention
- a zos node-local error
- a substrate timeout

The operator cannot diagnose the failure without daemon-side log access, which a deployer-mediated user flow does not have.

Suggested fix: propagate the underlying error chain into the RPC `data` field, including:

- the substrate event description
- the zos node response, if any
- the contract IDs that were created before the failure (the orphan list, which would also help the recovery flow tracked at https://forge.ourworld.tf/lhumina_code/hero_compute/issues/119)

Today this opacity makes every failed deploy a black box for end users.

Signed-by: mik-tf mik-tf@noreply.invalid
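To make the suggested fix concrete, here is a minimal sketch of what a structured `data` field could look like. It is written in Python purely for illustration and is not taken from the hero_compute code base. The `DeployFailure` type, its field names (`stage`, `error_chain`, `substrate_event`, `zos_node_response`, `orphan_contract_ids`), and all sample values are assumptions; only the `-32603` code and the `"Internal error"` message come from the response quoted above.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class DeployFailure:
    """Hypothetical structured payload for the JSON-RPC error `data` field."""
    stage: str                               # where the failure surfaced (assumed labels)
    error_chain: List[str]                   # outermost error first, root cause last
    substrate_event: Optional[str] = None    # substrate event description, if any
    zos_node_response: Optional[str] = None  # raw zos node response, if any
    orphan_contract_ids: List[int] = field(default_factory=list)  # contracts created before the failure


def to_rpc_error(failure: DeployFailure) -> dict:
    """Wrap the failure in a JSON-RPC 2.0 error object with a structured `data` member."""
    return {
        "code": -32603,               # same code the service returns today
        "message": "Internal error",
        "data": asdict(failure),      # structured object instead of one opaque string
    }


if __name__ == "__main__":
    # All values below are illustrative placeholders, not real log output.
    failure = DeployFailure(
        stage="substrate",
        error_chain=[
            "vm deployment entered error state",
            "insufficient TFT balance on deploying twin",
        ],
        orphan_contract_ids=[1001, 1002],
    )
    print(json.dumps(to_rpc_error(failure), indent=2))
```

With a shape like this, a client could read `data["error_chain"][-1]` for the root cause and hand `data["orphan_contract_ids"]` to whatever cleanup the recovery flow in #119 ends up defining, without needing daemon-side logs.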
