Previously, if the connection to the PostgreSQL database failed, for example during a database restart, the virtual machine monitoring thread would fail with an unrecoverable error and would not run again until ovirt-engine was restarted. The current release fixes this issue, allowing the monitoring thread to recover once the errors are resolved.
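The recovery behavior described above can be illustrated with a minimal sketch. This is not the ovirt-engine code; `run_cycles`, `make_flaky_poll`, and `TransientDbError` are hypothetical names used only to show the general idea of a periodic monitoring loop that logs a failed cycle and keeps running instead of stopping permanently.

```python
import logging

log = logging.getLogger("vm_monitor_sketch")


class TransientDbError(Exception):
    """Hypothetical stand-in for a lost database connection."""


def run_cycles(poll, cycles):
    """Call `poll` once per cycle and return how many cycles succeeded.

    A failing cycle is logged and skipped, so later cycles still run.
    """
    completed = 0
    for _ in range(cycles):
        try:
            poll()
            completed += 1
        except Exception:
            # Do not let one failed cycle end the loop; the next
            # cycle gets a fresh chance once the error is resolved.
            log.exception("monitoring cycle failed; retrying on next cycle")
    return completed


def make_flaky_poll(fail_on):
    """Return a poll function that raises on the given 0-based call numbers."""
    state = {"calls": 0}

    def poll():
        n = state["calls"]
        state["calls"] += 1
        if n in fail_on:
            raise TransientDbError("database unavailable")

    return poll
```

For example, `run_cycles(make_flaky_poll({1, 2}), 5)` simulates a database outage during the second and third cycles; the loop keeps going and the remaining cycles complete normally.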
Description of problem:
A live merge completed on the host, but not on the engine. The engine kept waiting for the merge to complete, and after two days the operation was terminated and marked as "failed".
There were no obvious communication issues between the engine and the host. The host was not non-responsive and there were no connection or heartbeat timeouts, etc.
Version-Release number of selected component (if applicable):
RHV 4.2.8
RHVH-4.2-8.3 host;
libvirt-4.5.0-10.el7_6.4.x86_64
qemu-kvm-rhev-2.12.0-18.el7_6.3.x86_64
vdsm-4.20.47-1.el7ev.x86_64
How reproducible:
Not.
Steps to Reproduce:
1.
2.
3.
Actual results:
Engine never realised the merge had completed on the host.
Expected results:
Engine should have realised the merge had completed on the host and continued with the next steps.
Additional info:
That's a shame, as it is unclear why the engine thought the job was still running. Is there anything in the vm_jobs table?
I know there was a bug a while ago where a sync issue between the VM XML and the configuration stored internally by vdsm affected live merge as well.
I'll try to find it, but I think it happened when using engine 4.2 with a 4.1 cluster (because DomainXML was introduced in 4.2). Is that the case here?
After discussing this bug again today with the storage team, we decided to close it as Insufficient Data.
If you would like to reopen it, please provide reproducer steps or the output from the script provided by engineering earlier in comment#35, and reproduce the issue on RHV 4.4, the currently supported version.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory (RHV Manager (ovirt-engine) [ovirt-4.4.8]), and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2021:3460