Bug 1583424 - Extensive use of Backup API without retry merge yields broken VMs and outages.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.2.5
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ovirt-4.3.2
Target Release: 4.3.0
Assignee: Nobody
QA Contact: Elad
URL:
Whiteboard:
Depends On: 1554369 1601212 1607130 1612454 1637976
Blocks: 902971 1520566
Reported: 2018-05-28 23:34 UTC by Germano Veit Michel
Modified: 2021-09-09 14:16 UTC
CC List: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-02-27 15:49:52 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:



Description Germano Veit Michel 2018-05-28 23:34:34 UTC
Description of problem:

Customers use the Backup API to back up their VMs. This requires live creation and deletion of snapshots. Sometimes these snapshots fail to delete, due to bugs or outages (network, storage, hardware...).

The VMs are left with chains in a bad state, which makes them unable to migrate, unable to power up once shut down, and unable to create new snapshots. This causes severe problems for customers, even though the solution is sometimes simple: just retry.

Could you please discuss whether RHV could retry these failed snapshot deletions automatically, or whether we should engage the partners using the API to implement a retry mechanism in their backup algorithms?
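
To illustrate the kind of retry being asked for, here is a minimal Python sketch of a retry loop with exponential backoff around a snapshot removal. It is only an assumption of how a backup client might do it, not how RHV or any partner product actually works: `delete_snapshot`, `SnapshotDeleteError`, `delete_with_retry` and the default values are hypothetical names chosen for the example, and the real removal call (for instance through the REST API or an SDK) would have to be wrapped by the caller.

```python
import time
from typing import Callable


class SnapshotDeleteError(Exception):
    """Hypothetical error raised when a snapshot removal (live merge) fails."""


def delete_with_retry(
    delete_snapshot: Callable[[], None],
    attempts: int = 3,
    base_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call delete_snapshot until it succeeds; return the number of attempts used.

    After the n-th failure the function waits base_delay * 2**(n-1) seconds.
    Once `attempts` tries have been used up, the last error is re-raised so
    the caller can report the VM as needing manual attention.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            delete_snapshot()  # assumed to block until the removal finishes
            return attempt
        except SnapshotDeleteError:
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

The `sleep` parameter exists only so the delay can be replaced in tests. Whether a blind retry is safe depends on why the removal failed, so a real implementation would probably also want to check the state of the snapshot before trying again.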

Comment 14 Yaniv Lavi 2018-08-19 11:02:19 UTC
We need testing for this in automation to make sure we handle this level of scale.
Please research the cases and work on automation for this.

Comment 15 Elad 2018-08-28 13:49:28 UTC
Hi Yaniv, we do have automation for that (a test suite that endlessly executes random atomic snapshot operations). We'll work on mitigating this with scale environments.

Comment 17 Elad 2018-12-19 09:47:51 UTC
For QE - covered by random testing and Commvault integration

Comment 18 Sandro Bonazzola 2019-01-28 09:43:41 UTC
This bug has not been marked as blocker for oVirt 4.3.0.
Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1.

Comment 20 Tal Nisan 2019-02-26 13:26:36 UTC
Marina, did the behavior change after the many improvements we made in 4.2.7 in the field of snapshots?

