Bug 1342681 - Live Merge failed in the engine, but was successful on the host on RHEV 3.6.6
Summary: Live Merge failed in the engine, but was successful on the host on RHEV 3.6.6
Keywords:
Status: CLOSED DUPLICATE of bug 1314082
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.6.6
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ovirt-4.0.0-rc
Target Release: 4.0.0
Assignee: Ala Hino
QA Contact: meital avital
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-06-03 21:37 UTC by Gordon Watson
Modified: 2021-03-11 14:54 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-06-07 09:49:14 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:


Description Gordon Watson 2016-06-03 21:37:49 UTC
Description of problem:

After upgrading to 3.6.6, a live merge of a VM with 4 disks failed, but only on the engine.

Snapshots for two of the disks were successfully merged, and the other two failed due to a "network connectivity issue".

These merges involved the active layer.

Prior to the merges, all four disks consisted of two volumes.

On the host though, all four merges completed.

This left the following inconsistencies:

1) RHEV database:

- Two disks still had two images, with the base images being marked 'illegal' (imagestatus = 4).

- The original active images for these two disks were still marked as 'active'.


2) Storage:

- The two volumes above still physically existed in the storage domain, but were no longer 'open' (just 'active').

- The volume metadata for these volumes was marked 'ILLEGAL'.
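
The mismatch between the engine-side and host-side views listed above could be checked mechanically. The following is a minimal sketch in Python, not engine or VDSM code: the record fields (`disk_id`, `image_id`, `imagestatus`, `active`), the function names, and the `host_open_volumes` mapping are all hypothetical. Only the meaning of `imagestatus = 4` (illegal) is taken from this report.

```python
from dataclasses import dataclass

ILLEGAL = 4  # imagestatus value reported as 'illegal' in this bug


@dataclass(frozen=True)
class ImageRow:
    """Hypothetical engine-side record for one image of a disk."""
    disk_id: str
    image_id: str
    imagestatus: int
    active: bool


def find_inconsistent_disks(rows, host_open_volumes):
    """Return sorted disk IDs whose engine-side active image is not open on the host.

    host_open_volumes maps disk_id -> set of volume IDs the host reports as open
    (hypothetical input; how it is collected is outside this sketch).
    """
    inconsistent = set()
    for row in rows:
        open_on_host = host_open_volumes.get(row.disk_id, set())
        if row.active and row.image_id not in open_on_host:
            inconsistent.add(row.disk_id)
    return sorted(inconsistent)


def has_illegal_image(rows, disk_id):
    """True if any engine-side image of the disk is marked illegal."""
    return any(r.disk_id == disk_id and r.imagestatus == ILLEGAL for r in rows)
```

A disk flagged by both functions would match the state described here: the database still treats a volume as active that the host no longer has open, and one of the disk's images is marked illegal.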



Question: Rather than manually modifying the database and the metadata and removing volumes, would retrying the merges have been successful in this case? That is, since the active volumes in the database for these two disks were actually no longer 'open' on the host, would a retry have resulted in these just being removed and the above inconsistencies being reconciled?


Version-Release number of selected component (if applicable):

RHEV 3.6.6



Comment 4 Ala Hino 2016-06-05 09:44:54 UTC
This one seems similar to BZ 1314082.
In 3.6.6 we should be able to recover from live merge failures on the engine by retrying the live merge.

BZ 1323629 describes this behavior.

Can you please retry the live merge and update whether the volumes were deleted?

Comment 6 Ala Hino 2016-06-07 08:45:24 UTC
I understand your concerns, Gordon.

To your question regarding retrying the merge: yes, we can suggest, without any analysis, retrying the merge if it completed on the host side but failed on the engine side. In these cases, the retry will fix the engine side.
In the worst case, where 'real' storage issues exist, the merge will fail again.

I am not saying this is a perfect solution. However, it is better than what we had prior to 3.6.6, where a failed merge required manual intervention.

Regarding automatic retry, we considered this option but decided that we won't be able to make it for 3.6.6. 

Moving forward, we do want to enhance the recovery mechanism to automatically recover from use cases where automatic recovery is possible; VDS_NETWORK_ERROR is a good example.
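
The recovery policy discussed in this comment (retry when the failure is a recoverable one such as a network error, and give up when the merge fails for a real storage reason) can be sketched roughly as below. This is an illustration only, not engine code: `NetworkError`, `StorageError`, `merge_with_retry`, and the `merge` callable are hypothetical names chosen for the example.

```python
class NetworkError(Exception):
    """Hypothetical: the engine lost contact with the host during the merge."""


class StorageError(Exception):
    """Hypothetical: a real storage problem; retrying is not expected to help."""


def merge_with_retry(merge, max_attempts=2):
    """Call merge() and retry only after a NetworkError.

    Returns the number of attempts used. StorageError propagates immediately;
    the last NetworkError is re-raised once max_attempts is exhausted.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            merge()
            return attempt
        except NetworkError:
            if attempt == max_attempts:
                raise
```

Limiting retries to one error class keeps the worst case unchanged: a merge that fails because of storage still fails on the first attempt and is reported as before.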

Thank you for sharing your concerns!

Comment 7 Ala Hino 2016-06-07 09:49:14 UTC

*** This bug has been marked as a duplicate of bug 1323629 ***

Comment 8 Ala Hino 2016-06-07 09:50:19 UTC
Updating the duplicate BZ

*** This bug has been marked as a duplicate of bug 1314082 ***

