Description  Germano Veit Michel
2018-05-28 23:34:34 UTC
Description of problem:
Customers use the Backup API to back up their VMs. This requires live creation and deletion of snapshots. Sometimes these snapshots fail to delete because of bugs or outages (network/storage/hardware...).
The VMs are left with snapshot chains in a bad state. Such VMs cannot migrate, fail to power up if shut down, and cannot create new snapshots. This causes severe problems for customers, whereas sometimes the solution is simple: just retry.
Could you please discuss whether RHV could automatically retry those deletions, or engage the partners using the API to implement a retry mechanism in their algorithms?
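
To illustrate the kind of client-side retry a partner could add, here is a minimal sketch in Python. It is not taken from RHV or any partner product: remove_snapshot, SnapshotRemovalError, and the attempt and delay values are all hypothetical placeholders for whatever call and error type the backup application really uses. It only shows the retry-with-backoff idea, and it assumes the failed deletion is safe to repeat.

```python
import time


class SnapshotRemovalError(Exception):
    """Hypothetical error raised when a snapshot removal attempt fails."""


def remove_with_retry(remove_snapshot, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call remove_snapshot() until it succeeds or the attempts run out.

    remove_snapshot is a hypothetical zero-argument callable wrapping
    whatever API call the backup application uses to delete a snapshot.
    Returns the number of attempts that were needed.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            remove_snapshot()
            return attempt
        except SnapshotRemovalError:
            if attempt == attempts:
                raise  # out of retries: surface the failure to the caller
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

A caller would pass its own deletion routine, for example remove_with_retry(lambda: delete_snapshot(vm_id, snapshot_id)), where delete_snapshot is again a placeholder. Whether a retry is appropriate for a given failure (a transient outage versus a genuinely broken chain) is a separate question this sketch does not answer.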
Hi Yaniv, we do have automation for that (a test suite that runs an endless series of random atomic snapshot operations). We'll work on mitigating this with scale environments.