
Bug 1583424

Summary: Extensive use of Backup API without retry merge yields broken VMs and outages.
Product: Red Hat Enterprise Virtualization Manager
Reporter: Germano Veit Michel <gveitmic>
Component: ovirt-engine
Assignee: Nobody <nobody>
Status: CLOSED CURRENTRELEASE
QA Contact: Elad <ebenahar>
Severity: high
Docs Contact:
Priority: high
Version: 4.2.5
CC: bcholler, dfediuck, ebenahar, gwatson, lsurette, mkalinin, Rhev-m-bugs, srevivo, tnisan
Target Milestone: ovirt-4.3.2
Target Release: 4.3.0
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-02-27 15:49:52 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1554369, 1601212, 1607130, 1612454, 1637976
Bug Blocks: 902971, 1520566

Description Germano Veit Michel 2018-05-28 23:34:34 UTC
Description of problem:

Customers use the Backup API to back up their VMs. This requires live creation and deletion of snapshots. Sometimes these snapshots fail to delete due to bugs or outages (network, storage, hardware, ...).

The VMs are left with chains in a bad state: they cannot migrate, fail to power up after being shut down, and cannot create new snapshots. This causes severe problems for customers, whereas sometimes the solution is simple: just retry.

Could you please discuss whether RHV can retry these deletions automatically, or engage the partners using the API to implement a retry mechanism in their algorithms?
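
For illustration only, a minimal sketch of the kind of client-side retry a backup application could wrap around snapshot deletion. It is not RHV or oVirt SDK code: `remove_with_retry`, `SnapshotRemovalError`, and the `remove_snapshot` callable are hypothetical names, and the attempt count and backoff delays are assumptions rather than recommended values.

```python
import time


class SnapshotRemovalError(Exception):
    """Hypothetical error raised when a snapshot removal fails."""


def remove_with_retry(remove_snapshot, snapshot_id, attempts=3,
                      base_delay=1.0, sleep=time.sleep):
    """Call remove_snapshot(snapshot_id), retrying with exponential backoff.

    remove_snapshot is a caller-supplied callable (hypothetical, not an
    oVirt SDK function) that raises SnapshotRemovalError on failure.
    Returns the number of attempts used; re-raises the last error if
    every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            remove_snapshot(snapshot_id)
            return attempt
        except SnapshotRemovalError:
            if attempt == attempts:
                # Out of attempts: surface the failure to the caller.
                raise
            # Wait base_delay, 2*base_delay, 4*base_delay, ... between tries.
            sleep(base_delay * 2 ** (attempt - 1))
```

A caller would pass its own deletion routine as `remove_snapshot`. Whether a retry is safe after a partial failure depends on the state of the chain, so this only covers the simple case described above where retrying is enough.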

Comment 14 Yaniv Lavi 2018-08-19 11:02:19 UTC
We need testing for this in automation to make sure we handle this level of scale.
Please research the cases and work on automation for this.

Comment 15 Elad 2018-08-28 13:49:28 UTC
Hi Yaniv, we do have automation for that (a test suite that executes infinite atomic snapshot operations randomly). We'll work on mitigating this with scale environments.

Comment 17 Elad 2018-12-19 09:47:51 UTC
For QE - covered by random testing and Commvault integration

Comment 18 Sandro Bonazzola 2019-01-28 09:43:41 UTC
This bug has not been marked as blocker for oVirt 4.3.0.
Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1.

Comment 20 Tal Nisan 2019-02-26 13:26:36 UTC
Marina, did the behavior change after the many improvements we made in 4.2.7 in the field of snapshots?