Bug 1583424

Summary:	Extensive use of Backup API without retry merge yields broken VMs and outages.
Product:	Red Hat Enterprise Virtualization Manager	Reporter:	Germano Veit Michel <gveitmic>
Component:	ovirt-engine	Assignee:	Nobody <nobody>
Status:	CLOSED CURRENTRELEASE	QA Contact:	Elad <ebenahar>
Severity:	high	Docs Contact:
Priority:	high
Version:	4.2.5	CC:	bcholler, dfediuck, ebenahar, gwatson, lsurette, mkalinin, Rhev-m-bugs, srevivo, tnisan
Target Milestone:	ovirt-4.3.2
Target Release:	4.3.0
Hardware:	x86_64
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2019-02-27 15:49:52 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	Storage	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	1554369, 1601212, 1607130, 1612454, 1637976
Bug Blocks:	902971, 1520566

Description Germano Veit Michel 2018-05-28 23:34:34 UTC

Description of problem:

Customers use the Backup API to back up their VMs. This requires live creation and deletion of snapshots. Sometimes these snapshots fail to delete, due to bugs or outages (network/storage/hardware...).

The VMs are left with chains in a bad state, which makes them unable to migrate, unable to power up after a shutdown, and unable to create new snapshots. This causes severe problems for customers, whereas sometimes the solution is simple: just retry.

Could you please discuss whether it is possible for RHV to retry those automatically, or engage the partners using the API to implement a retry mechanism in their algorithms?
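
For illustration only, here is a minimal sketch of what a client-side retry around snapshot removal could look like. It does not use the real oVirt SDK: `remove_snapshot` is a hypothetical zero-argument callable that the backup application would supply, and `SnapshotRemovalError`, the attempt count and the delays are all assumptions chosen for the example.

```python
import time


class SnapshotRemovalError(Exception):
    """Hypothetical error type: one snapshot removal attempt failed."""


def remove_with_retry(remove_snapshot, attempts=5, initial_delay=10.0,
                      backoff=2.0, sleep=time.sleep):
    """Call remove_snapshot() until it succeeds or attempts run out.

    remove_snapshot is a hypothetical callable provided by the caller; it is
    assumed to raise SnapshotRemovalError when the removal fails.
    Returns the number of attempts that were needed.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            remove_snapshot()
            return attempt
        except SnapshotRemovalError:
            if attempt == attempts:
                # Out of retries: surface the failure so it can be escalated.
                raise
            # Wait before the next try, growing the delay each time so a
            # short outage has a chance to clear.
            sleep(delay)
            delay *= backoff
```

A backup application would wrap its own removal call in a function and pass it in, for example `remove_with_retry(lambda: my_remove(vm_id, snapshot_id))`, where `my_remove` is again a placeholder. Whether a retry is safe for a given failure depends on the state of the chain, so this only covers the "just retry" case described above.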

Comment 14 Yaniv Lavi 2018-08-19 11:02:19 UTC

We need testing for this in automation to make sure we handle this level of scale.
Please research the cases and work on automation for this.

Comment 15 Elad 2018-08-28 13:49:28 UTC

Hi Yaniv, we do have automation for that (a test suite that executes infinite atomic snapshot operations randomly). We'll work on mitigating this with scale environments.

Comment 17 Elad 2018-12-19 09:47:51 UTC

For QE - covered by random testing and Commvault integration

Comment 18 Sandro Bonazzola 2019-01-28 09:43:41 UTC

This bug has not been marked as blocker for oVirt 4.3.0.
Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1.

Comment 20 Tal Nisan 2019-02-26 13:26:36 UTC

Marina, did the behavior change after the many improvements we made in 4.2.7 in the field of snapshots?