Description of problem: Customers use the Backup API to back up their VMs. This requires live creation and deletion of snapshots. Sometimes these snapshots fail to delete because of bugs or outages (network/storage/hardware...). The VMs are left with chains in a bad state, which makes them unable to migrate, unable to power up after a shutdown, and unable to create new snapshots. This causes severe problems for customers, even though the solution is sometimes simple: just retry. Could you please discuss whether RHV can retry these deletions automatically, or whether we should engage the partners who use the API to implement a retry mechanism in their own algorithms?
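For illustration only, a client-side retry of the kind described above might look roughly like the sketch below. It is not RHV or partner code. `retry_delete`, `SnapshotDeleteError`, and the `delete_snapshot` callable are hypothetical names standing in for whatever call the backup application actually uses, and the exponential backoff policy is an assumption rather than something agreed in this bug.

```python
import time


class SnapshotDeleteError(Exception):
    """Hypothetical error raised when a snapshot deletion attempt fails."""


def retry_delete(delete_snapshot, snapshot_id, attempts=5, base_delay=1.0,
                 sleep=time.sleep):
    """Call delete_snapshot(snapshot_id), retrying on SnapshotDeleteError.

    delete_snapshot is a caller-supplied callable (an assumption here, not
    a real SDK function). The wait doubles after every failed attempt:
    base_delay, 2 * base_delay, 4 * base_delay, and so on.
    """
    for attempt in range(1, attempts + 1):
        try:
            return delete_snapshot(snapshot_id)
        except SnapshotDeleteError:
            if attempt == attempts:
                # Out of retries: surface the failure to the caller.
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

A caller would wrap its existing deletion call, for example `retry_delete(my_delete_call, snapshot_id, attempts=3)`. Whether a blind retry is safe depends on the state the failed deletion left behind, so this only covers the transient-outage case mentioned in the description.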
We need testing for this in automation to make sure we handle this level of scale. Please research the cases and work on automation for this.
Hi Yaniv, we do have automation for that (a test suite that executes atomic snapshot operations randomly, without end). We'll work on mitigating this with scale environments.
For QE - covered by random testing and Commvault integration
This bug has not been marked as blocker for oVirt 4.3.0. Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1.
Marina, did the behavior change after the many improvements we made in 4.2.7 in the field of snapshots?