Bug 1768174 - Engine may stop monitoring hosts
Summary: Engine may stop monitoring hosts
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Virt
Version: 4.3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.3.7
Target Release: 4.3.7.2
Assignee: Arik
QA Contact: Nisim Simsolo
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-11-03 09:06 UTC by Arik
Modified: 2019-11-21 12:44 UTC (History)

Fixed In Version: ovirt-engine-4.3.7.2
Clone Of:
Environment:
Last Closed: 2019-11-21 12:44:40 UTC
oVirt Team: Virt
Embargoed:
pm-rhel: ovirt-4.3+




Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 104359 0 'None' MERGED core: prevent NPE on VMs monitoring 2020-09-25 20:14:15 UTC
oVirt gerrit 104553 0 ovirt-engine-4.3 MERGED core: prevent NPE on VMs monitoring 2020-09-25 20:14:19 UTC

Description Arik 2019-11-03 09:06:10 UTC
Description of problem:
A relatively recent change made the engine cache only the UIDs of the running VMs instead of all their properties, in order to reduce memory allocation. The problem is that when a host stops reporting a VM that was running on it, an NPE is thrown and the engine stops monitoring that host.
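
To illustrate the failure mode described above, here is a minimal sketch. It is not the actual engine code: the class, method, and field names (VmMonitoringSketch, VmReport, statusesUnsafe, statusesSafe, the "UNREPORTED" marker) are hypothetical and exist only for this example. It assumes the monitoring pass walks the cached IDs of running VMs and looks each one up in the host's latest report.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class VmMonitoringSketch {

    // Hypothetical stand-in for the data a host reports about one VM.
    public static final class VmReport {
        final String vmId;
        final String status;

        public VmReport(String vmId, String status) {
            this.vmId = vmId;
            this.status = status;
        }
    }

    // Unsafe variant: assumes every cached running VM is still reported.
    // If the host no longer reports a VM, reported.get(id) returns null
    // and dereferencing it throws a NullPointerException.
    public static List<String> statusesUnsafe(List<String> cachedRunningVmIds,
                                              Map<String, VmReport> reported) {
        List<String> result = new ArrayList<>();
        for (String id : cachedRunningVmIds) {
            result.add(id + "=" + reported.get(id).status);
        }
        return result;
    }

    // Guarded variant: a VM missing from the report is handled explicitly,
    // so the rest of the monitoring pass for this host still completes.
    public static List<String> statusesSafe(List<String> cachedRunningVmIds,
                                            Map<String, VmReport> reported) {
        List<String> result = new ArrayList<>();
        for (String id : cachedRunningVmIds) {
            VmReport report = reported.get(id);
            if (report == null) {
                result.add(id + "=UNREPORTED");
                continue;
            }
            result.add(id + "=" + report.status);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> cached = List.of("vm-1", "vm-2");
        Map<String, VmReport> reported = new HashMap<>();
        reported.put("vm-1", new VmReport("vm-1", "Up")); // vm-2 is not reported

        System.out.println(statusesSafe(cached, reported));
        try {
            statusesUnsafe(cached, reported);
        } catch (NullPointerException e) {
            System.out.println("unsafe variant threw an NPE");
        }
    }
}
```

In this sketch the unguarded lookup aborts the whole pass for the host as soon as one VM is missing from the report, which is consistent with the symptom of the host no longer being monitored.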

I set the severity to high rather than urgent because the state in which a host stops reporting a VM is supposed to be really rare: typically the host reports the VM as Down and only later is the VM destroyed.

Version-Release number of selected component (if applicable):


How reproducible:
I ran into this when manually creating the state mentioned above, but I suppose it would also happen when restarting a host while a VM is running on it.

Steps to Reproduce:
1. Run a VM
2. Restart the host that the VM runs on
3. Run another VM on that host

Actual results:
The second VM will probably stay in the WaitForLaunch state because the host is no longer monitored.

Expected results:
The second VM should eventually switch to UP state.


Additional info:

Comment 1 Michal Skrivanek 2019-11-04 09:57:59 UTC
want to backport this to 4.3?

Comment 2 Arik 2019-11-04 09:59:28 UTC
(In reply to Michal Skrivanek from comment #1)
> want to backport this to 4.3?

Affirmative

Comment 3 Nisim Simsolo 2019-11-14 12:37:14 UTC
Verification build:
rhvm-4.3.7.1-0.1.el7
vdsm-4.30.35-1.el7ev.x86_64
libvirt-4.5.0-23.el7_7.1.x86_64
qemu-kvm-rhev-2.12.0-33.el7_7.4.x86_64

Verification scenario: 
Repeat the "Steps to Reproduce" from the bug description several times.

Comment 4 Sandro Bonazzola 2019-11-21 12:44:40 UTC
This bug fix is included in the oVirt 4.3.7 release, published on November 21st, 2019.

Since the problem described in this bug report should be resolved in the oVirt 4.3.7 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

