Bug 1770027 - Live Merge completed on the host, but not on the engine, which just waited for it to complete until the operation was terminated.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.2.8-4
Hardware: Unspecified
OS: Linux
Priority: high
Severity: high
Target Milestone: ovirt-4.4.8
Target Release: ---
Assignee: Benny Zlotnik
QA Contact: Evelina Shames
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-11-07 22:56 UTC by Gordon Watson
Modified: 2021-10-28 11:32 UTC (History)

Fixed In Version: ovirt-engine-4.4.8.3
Doc Type: Bug Fix
Doc Text:
Previously, when the connection to PostgreSQL failed, for example during a database restart or because of any other issue, the virtual machine monitoring thread would fail with an unrecoverable error and would not run again until ovirt-engine was restarted. The current release fixes this issue, allowing the monitoring thread to recover once the errors are resolved.
Clone Of:
Environment:
Last Closed: 2021-09-08 14:12:04 UTC
oVirt Team: Storage
Target Upstream Version:
eshames: testing_plan_complete+
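
The failure mode described in the Doc Text can be illustrated with a minimal sketch. This is not the ovirt-engine code, and the names `run_monitor`, `flaky_poll`, and `DatabaseUnavailable` are hypothetical, chosen only for illustration. It assumes a periodic polling job that stops for good on the first unhandled error, and shows how catching all exceptions per cycle lets polling resume once the error goes away.

```python
# Illustrative sketch only: not ovirt-engine code. All names are hypothetical.

class DatabaseUnavailable(Exception):
    """Stands in for a failed database connection."""


def run_monitor(poll, cycles, catch_all):
    """Run `poll` up to `cycles` times, the way a periodic monitoring job would.

    Returns the number of cycles that completed successfully.
    With catch_all=False the first exception ends the loop for good,
    which models a monitoring thread that never runs again.
    With catch_all=True the failed cycle is skipped and polling continues.
    """
    completed = 0
    for cycle in range(cycles):
        try:
            poll(cycle)
            completed += 1
        except Exception as exc:
            if not catch_all:
                # Unrecoverable: the monitor stops until the process restarts.
                break
            # Recoverable: note the error and try again on the next cycle.
            print(f"cycle {cycle}: poll failed ({exc}), will retry")
    return completed


def flaky_poll(cycle):
    """Fail on cycles 2 and 3 to simulate a short database outage."""
    if cycle in (2, 3):
        raise DatabaseUnavailable("connection refused")


if __name__ == "__main__":
    print(run_monitor(flaky_poll, cycles=6, catch_all=False))
    print(run_monitor(flaky_poll, cycles=6, catch_all=True))
```

In this sketch, the run without the catch-all stops at the first failed cycle and never sees the later ones, while the run with it skips only the cycles that fell inside the outage.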


Attachments
byteman script (1.06 KB, text/plain)
2020-07-08 13:59 UTC, Benny Zlotnik


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Knowledge Base (Solution) | 4565091 | 0 | None | None | None | 2019-11-08 15:32:23 UTC
Red Hat Product Errata | RHBA-2021:3460 | 0 | None | None | None | 2021-09-08 14:12:16 UTC
oVirt gerrit | 116044 | 0 | master | MERGED | core: catch all VmStats exceptions | 2021-08-04 08:06:30 UTC

Description Gordon Watson 2019-11-07 22:56:16 UTC
Description of problem:

A live merge completed on the host, but not on the engine. The engine kept waiting for the merge to complete, and after two days the operation was terminated and reported as "failed".

There were no obvious communication issues between the engine and the host. The host was not non-responsive and there were no connection or heartbeat timeouts, etc.


Version-Release number of selected component (if applicable):

RHV 4.2.8
RHVH-4.2-8.3 host:
    libvirt-4.5.0-10.el7_6.4.x86_64            
    qemu-kvm-rhev-2.12.0-18.el7_6.3.x86_64            
    vdsm-4.20.47-1.el7ev.x86_64      


How reproducible:

Not.


Steps to Reproduce:
1.
2.
3.

Actual results:

Engine never realised the merge had completed on the host.


Expected results:

Engine should have realised the merge had completed on the host and continued with the next steps.


Additional info:

Comment 7 Benny Zlotnik 2019-11-27 11:14:39 UTC
Hi,

Any chance we have a `virsh dumpxml` type of output for the VM 'fbf06769-2808-4471-812c-8f04464a391b'?

Comment 8 Gordon Watson 2019-11-27 14:21:24 UTC
Benny,

No, unfortunately we don't. The VM was shut down before the customer contacted us.

Regards, GFW.

Comment 9 Benny Zlotnik 2019-12-01 14:58:47 UTC
That's a shame, as it is unclear why the engine thought the job was still running. Is there anything in the vm_jobs table?
I recall a bug a while ago involving a sync issue between the VM XML and the configuration stored internally by vdsm, which affected live merge as well.
I'll try to find it, but I think it happened when using engine 4.2 with a 4.1 cluster (because DomainXML was introduced in 4.2). Is that the case here?

Comment 27 Ryan Barry 2020-05-28 11:48:00 UTC
Liran, will the async snapshot resolve this from the virt side?

Comment 29 Liran Rotenberg 2020-06-01 12:17:57 UTC
Unfortunately, it's irrelevant to async snapshot.

Comment 30 Avihai 2020-06-17 06:48:32 UTC
Benny, do we have a clear verification scenario on this one?

Comment 31 Benny Zlotnik 2020-06-17 08:10:03 UTC
(In reply to Avihai from comment #30)
> Benny, do we have a clear verification scenario on this one?

no

Comment 35 Benny Zlotnik 2020-07-08 13:59:31 UTC
Created attachment 1700308
byteman script

Comment 43 Marina Kalinin 2020-10-05 20:14:24 UTC
After discussing this bug again today with the storage team, we decided to close it as Insufficient Data.
If you would like to reopen it, please provide reproducer steps or output from the script engineering provided in comment #35, reproduced on RHV 4.4, the currently supported version.

Comment 61 errata-xmlrpc 2021-09-08 14:12:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (RHV Manager (ovirt-engine) [ovirt-4.4.8]), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3460

