Bug 1768174

Summary: Engine may stop monitoring hosts
Product: [oVirt] ovirt-engine
Reporter: Arik <ahadas>
Component: BLL.Virt
Assignee: Arik <ahadas>
Status: CLOSED CURRENTRELEASE
QA Contact: Nisim Simsolo <nsimsolo>
Severity: high
Priority: unspecified
Version: 4.3.0
CC: bugs, michal.skrivanek, nsimsolo, Rhev-m-bugs
Target Milestone: ovirt-4.3.7
Keywords: CodeChange
Target Release: 4.3.7.2
Flags: pm-rhel: ovirt-4.3+
Hardware: Unspecified
OS: Unspecified
Fixed In Version: ovirt-engine-4.3.7.2
Doc Type: If docs needed, set a value
Last Closed: 2019-11-21 12:44:40 UTC
Type: Bug
oVirt Team: Virt

Description Arik 2019-11-03 09:06:10 UTC
Description of problem:
In a relatively recent change, the engine was modified to cache only the UIDs of running VMs, instead of all their properties, in order to reduce memory allocation. The problem is that when a running VM stops being reported by the host it ran on, a NullPointerException (NPE) is thrown and the engine stops monitoring that host.

I set the severity to high rather than urgent because the state in which a VM stops being reported is expected to be very rare: typically the host first reports the VM as Down and only later destroys it.
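The failure mode described above can be sketched in Java, the language ovirt-engine is written in. This is a hypothetical illustration, not the engine's actual code: the class `HostMonitoringSketch`, its methods and the `ReportedVm` type are invented for the example. It assumes the cache holds only VM IDs and that each monitoring cycle receives a map of the VMs the host currently reports. `statusesUnsafe` shows how dereferencing the lookup result for a VM the host no longer reports raises a NullPointerException; `removeUnreported` shows one defensive alternative that treats such a VM as gone.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Hypothetical illustration only: names and structure are NOT taken from ovirt-engine.
class HostMonitoringSketch {

    // Minimal stand-in for a VM entry in a host's periodic report.
    public static final class ReportedVm {
        final UUID id;
        final String status;

        public ReportedVm(UUID id, String status) {
            this.id = id;
            this.status = status;
        }
    }

    // The cache keeps only the IDs of VMs believed to run on the host, not their properties.
    private final Set<UUID> runningVmIds = new HashSet<>();

    public void markRunning(UUID vmId) {
        runningVmIds.add(vmId);
    }

    // Failure mode: assumes every cached VM still appears in the host's report.
    public Map<UUID, String> statusesUnsafe(Map<UUID, ReportedVm> report) {
        Map<UUID, String> statuses = new HashMap<>();
        for (UUID id : runningVmIds) {
            ReportedVm vm = report.get(id); // null if the host stopped reporting this VM
            statuses.put(id, vm.status);    // NullPointerException when vm is null
        }
        return statuses;
    }

    // Defensive variant: a cached VM missing from the report is treated as gone
    // and dropped from the cache instead of being dereferenced.
    public Set<UUID> removeUnreported(Map<UUID, ReportedVm> report) {
        Set<UUID> disappeared = new HashSet<>();
        for (UUID id : runningVmIds) {
            if (!report.containsKey(id)) {
                disappeared.add(id);
            }
        }
        runningVmIds.removeAll(disappeared);
        return disappeared;
    }
}
```

If an exception like this escapes a periodic monitoring task, later cycles may never run, which would match the symptom of the host no longer being monitored. In the JDK, for instance, a task scheduled with `ScheduledExecutorService.scheduleAtFixedRate` is not rescheduled once an execution throws; whether the engine's monitoring relies on that mechanism is not stated in this report.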

Version-Release number of selected component (if applicable):


How reproducible:
I hit this when I manually created the state described above, but I expect it would also happen when a host is restarted while a VM is running on it.

Steps to Reproduce:
1. Run a VM
2. Restart the host that the VM runs on
3. Run another VM on that host

Actual results:
The second VM will probably remain in the WaitForLaunch state because the host is no longer monitored.

Expected results:
The second VM should eventually switch to the Up state.


Additional info:

Comment 1 Michal Skrivanek 2019-11-04 09:57:59 UTC
want to backport this to 4.3?

Comment 2 Arik 2019-11-04 09:59:28 UTC
(In reply to Michal Skrivanek from comment #1)
> want to backport this to 4.3?

Affirmative

Comment 3 Nisim Simsolo 2019-11-14 12:37:14 UTC
Verification build:
rhvm-4.3.7.1-0.1.el7
vdsm-4.30.35-1.el7ev.x86_64
libvirt-4.5.0-23.el7_7.1.x86_64
qemu-kvm-rhev-2.12.0-33.el7_7.4.x86_64

Verification scenario:
Repeated the "Steps to Reproduce" from the bug description a few times.

Comment 4 Sandro Bonazzola 2019-11-21 12:44:40 UTC
This bugzilla is included in oVirt 4.3.7 release, published on November 21st 2019.

Since the problem described in this bug report should be resolved in oVirt 4.3.7 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.