Bug 1770027

Summary: Live Merge completed on the host, but not on the engine, which just waited for it to complete until the operation was terminated.
Product: Red Hat Enterprise Virtualization Manager
Reporter: Gordon Watson <gwatson>
Component: ovirt-engine
Assignee: Benny Zlotnik <bzlotnik>
Status: CLOSED ERRATA
QA Contact: Evelina Shames <eshames>
Severity: high
Docs Contact:
Priority: high
Version: 4.2.8-4
CC: ahadas, amarirom, bcholler, bzlotnik, ddacosta, dfodor, dwhitley, eshenitz, giridhar.ramaraju, lrotenbe, michal.skrivanek, mkalinin, mperina, mtessun, pelauter, sfishbai, sshmulev, tnisan
Target Milestone: ovirt-4.4.8
Keywords: Reopened, ZStream
Target Release: ---
Flags: eshames: testing_plan_complete+
Hardware: Unspecified
OS: Linux
Whiteboard:
Fixed In Version: ovirt-engine-4.4.8.3
Doc Type: Bug Fix
Doc Text:
Previously, if the connection to PostgreSQL failed, for example during a database restart, the virtual machine monitoring thread would fail with an unrecoverable error and would not run again until ovirt-engine was restarted. The current release fixes this issue, allowing the monitoring thread to recover once the errors are resolved.
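The recovery behaviour described in the Doc Text can be illustrated with a minimal sketch. This is not the ovirt-engine code (which is written in Java), and the names `run_monitor_cycles` and `FlakyDatabase` are hypothetical, introduced only for illustration. The sketch assumes the general idea is that each monitoring cycle catches and logs its own failure, so a transient database error does not end the loop.

```python
import logging
from typing import Callable

log = logging.getLogger("monitor-sketch")


def run_monitor_cycles(poll_once: Callable[[], None], cycles: int) -> int:
    """Run poll_once for a fixed number of cycles; return how many succeeded.

    Each failure is logged and swallowed, so a transient error (for example,
    a dropped database connection) does not stop later cycles from running.
    """
    succeeded = 0
    for _ in range(cycles):
        try:
            poll_once()
            succeeded += 1
        except Exception:
            # Hypothetical policy: log the error and try again next cycle.
            log.exception("monitoring cycle failed; retrying on the next cycle")
    return succeeded


class FlakyDatabase:
    """Hypothetical stand-in for a database that is down for the first calls."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.calls = 0

    def query(self) -> None:
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("database unavailable")
```

With `FlakyDatabase(2)` and five cycles, the first two cycles fail and are logged, and the remaining three run normally once the simulated database is reachable again.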
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-09-08 14:12:04 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description Flags
byteman script none

Description Gordon Watson 2019-11-07 22:56:16 UTC
Description of problem:

A live merge completed on the host, but not on the engine. The engine was just waiting on the merge to complete, and after two days the operation was terminated, and "failed".

There were no obvious communication issues between the engine and the host. The host was not non-responsive and there were no connection or heartbeat timeouts, etc.


Version-Release number of selected component (if applicable):

RHV 4.2.8
RHVH-4.2-8.3 host:
    libvirt-4.5.0-10.el7_6.4.x86_64
    qemu-kvm-rhev-2.12.0-18.el7_6.3.x86_64
    vdsm-4.20.47-1.el7ev.x86_64


How reproducible:

Not.


Steps to Reproduce:
1.
2.
3.

Actual results:

Engine never realised the merge had completed on the host.


Expected results:

Engine should have realised the merge had completed on the host and continued with the next steps.


Additional info:

Comment 7 Benny Zlotnik 2019-11-27 11:14:39 UTC
Hi,

Any chance we have a `virsh dumpxml` type of output for the VM 'fbf06769-2808-4471-812c-8f04464a391b'?

Comment 8 Gordon Watson 2019-11-27 14:21:24 UTC
Benny,

No, unfortunately we don't. The VM was shut down before the customer contacted us.

Regards, GFW.

Comment 9 Benny Zlotnik 2019-12-01 14:58:47 UTC
That's a shame, as it is unclear why the engine thought the job was still running. Is there anything in the vm_jobs table?
I know there was a bug a while ago where a sync issue between the VM XML and the stored internal vdsm configuration affected live merge as well.
I'll try to find it, but I think it happened when using engine 4.2 with a 4.1 cluster (because DomainXML was introduced in 4.2). Is that the case here?

Comment 27 Ryan Barry 2020-05-28 11:48:00 UTC
Liran, will the async snapshot resolve this from the virt side?

Comment 29 Liran Rotenberg 2020-06-01 12:17:57 UTC
Unfortunately, it's irrelevant to async snapshot.

Comment 30 Avihai 2020-06-17 06:48:32 UTC
Benny, do we have a clear verification scenario on this one?

Comment 31 Benny Zlotnik 2020-06-17 08:10:03 UTC
(In reply to Avihai from comment #30)
> Benny, do we have a clear verification scenario on this one?

no

Comment 35 Benny Zlotnik 2020-07-08 13:59:31 UTC
Created attachment 1700308 [details]
byteman script

Comment 43 Marina Kalinin 2020-10-05 20:14:24 UTC
After discussing this bug again today with the storage team, we decided to close it as Insufficient Data.
If you would like to reopen it, please provide reproducer steps, or output from the script provided by engineering in comment #35, reproduced on RHV 4.4, the currently supported version.

Comment 61 errata-xmlrpc 2021-09-08 14:12:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (RHV Manager (ovirt-engine) [ovirt-4.4.8]), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3460