Bug 1392903

Summary: VM is not monitored by engine, NPE @ org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer.isVdsNonResponsive
Product: [oVirt] ovirt-engine Reporter: Israel Pinto <ipinto>
Component: BLL.VirtAssignee: Arik <ahadas>
Status: CLOSED CURRENTRELEASE QA Contact: Israel Pinto <ipinto>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 4.1.0CC: bugs, ipinto, mgoldboi, michal.skrivanek, nsednev, sbonazzo
Target Milestone: ovirt-4.0.6Keywords: Triaged
Target Release: 4.0.6.1Flags: rule-engine: ovirt-4.0.z+
mgoldboi: planning_ack+
michal.skrivanek: devel_ack+
mavital: testing_ack+
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-01-18 07:25:48 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Virt RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1362618    
Attachments:
Description Flags
engine.log
none
vdsm log
none
repoen_engine.log none

Description Israel Pinto 2016-11-08 12:53:59 UTC
Description of problem:
VM is not manage by the engine, it stack in migrate state. 
After starting VM, perform migrate to VM, VM stauck in migrate state.
According to engine log, VM did not run on host although it looked UP from UI. 
Engine log:
2016-11-08 09:58:15,150 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (DefaultQuartzScheduler9) [4923421b] VM 'bbebf175-5b9b-46db-ae04-a0f5c32eeb6d'(cpu_hotplug_vm_with_load) is running in db and not running on VDS 'f18b5b1d-de10-4856-99da-cc86fc55ed96'(host_mixed_1)

2016-11-08 09:58:15,152 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring] (DefaultQuartzScheduler9) [4923421b] Failed during monitoring vm: bbebf175-5b9b-46db-ae04-a0f5c32eeb6d , error is: java.lang.NullPointerException
2016-11-08 09:58:15,153 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring] (DefaultQuartzScheduler9) [4923421b] Exception:: java.lang.NullPointerException
        at org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer.isVdsNonResponsive(VmAnalyzer.java:764) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer.handOverVm(VmAnalyzer.java:753) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer.proceedDisappearedVm(VmAnalyzer.java:713) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer.analyze(VmAnalyzer.java:134) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring.lambda$analyzeVms$1(VmsMonitoring.java:151) [vdsbroker.jar:]
        at java.util.ArrayList.forEach(ArrayList.java:1249) [rt.jar:1.8.0_102]
        at org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring.analyzeVms(VmsMonitoring.java:146) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring.perform(VmsMonitoring.java:109) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.monitoring.PollVmStatsRefresher.poll(PollVmStatsRefresher.java:38) [vdsbroker.jar:]
        at sun.reflect.GeneratedMethodAccessor231.invoke(Unknown Source) [:1.8.0_102]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_102]
        at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_102]
        at org.ovirt.engine.core.utils.timer.JobWrapper.invokeMethod(JobWrapper.java:77) [scheduler.jar:]
        at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:51) [scheduler.jar:]
        at org.quartz.core.JobRunShell.run(JobRunShell.java:213) [quartz.jar:]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_102]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_102]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_102]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_102]
        at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_102]


Version-Release number of selected component (if applicable):
Engine: oVirt Engine Version: 4.1.0-0.0.master.20161102211241.git5724b88.el7.centos
Host:
OS Version:RHEL - 7.3 - 7.el7
OS Description:Red Hat Enterprise Linux Server 7.3 (Maipo)
Kernel Version:3.10.0 - 514.el7.x86_64
KVM Version:2.6.0 - 27.el7
LIBVIRT Version:libvirt-2.0.0-10.el7
VDSM Version:vdsm-4.18.999-782.git97b30db.el7.centos

Comment 1 Israel Pinto 2016-11-08 12:54:54 UTC
Created attachment 1218518 [details]
engine.log

Comment 2 Israel Pinto 2016-11-08 12:55:31 UTC
Created attachment 1218519 [details]
vdsm log

Comment 3 Yaniv Kaul 2016-11-09 07:07:46 UTC
*** Bug 1392902 has been marked as a duplicate of this bug. ***

Comment 4 Israel Pinto 2016-11-23 08:09:34 UTC
Verify with:
Engine: 4.0.6-0.1.el7ev
Host:
OS Version:RHEL - 7.3 - 7.el7
OS Description:Red Hat Enterprise Linux Server 7.3 (Maipo)
Kernel Version:3.10.0 - 514.el7.x86_64
KVM Version:2.6.0 - 27.el7
LIBVIRT Version:libvirt-2.0.0-10.el7
VDSM Version:vdsm-4.18.16-1.el7ev

Steps:
Set host which HE VM running on to maintenance.
Results:
VM migrate to other host with success, no NPE found in log.

Comment 5 Israel Pinto 2016-11-23 08:49:46 UTC
Created attachment 1223061 [details]
repoen_engine.log

Comment 6 Israel Pinto 2016-11-23 08:51:52 UTC
rerun maintenance on host the manage the HE VM,
migrate pass but the NPE happened again, looks like rach condition at VM monitor
from engine log:
2016-11-23 03:35:04,912 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (DefaultQuartzScheduler9) [6c3b0c77] VM '2230569e-9357-4ab2-915a-767fa5532db1'(HostedEngine) moved from 'MigratingFrom' --> 'Down'
2016-11-23 03:35:04,912 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring] (DefaultQuartzScheduler9) [6c3b0c77] Failed during vms monitoring on host host_mixed_1 error is: java.lang.NullPointerException
2016-11-23 03:35:04,913 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring] (DefaultQuartzScheduler9) [6c3b0c77] Exception:: java.lang.NullPointerException
        at org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer.isVdsNonResponsive(VmAnalyzer.java:760) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer.handOverVm(VmAnalyzer.java:739) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer.proceedDownVm(VmAnalyzer.java:267) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer.analyze(VmAnalyzer.java:154) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring.lambda$analyzeVms$1(VmsMonitoring.java:161) [vdsbroker.jar:]
        at java.util.ArrayList.forEach(ArrayList.java:1249) [rt.jar:1.8.0_111]
        at org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring.analyzeVms(VmsMonitoring.java:156) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring.perform(VmsMonitoring.java:119) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.monitoring.PollVmStatsRefresher.poll(PollVmStatsRefresher.java:39) [vdsbroker.jar:]
        at sun.reflect.GeneratedMethodAccessor73.invoke(Unknown Source) [:1.8.0_111]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_111]
        at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_111]
        at org.ovirt.engine.core.utils.timer.JobWrapper.invokeMethod(JobWrapper.java:77) [scheduler.jar:]
        at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:51) [scheduler.jar:]
        at org.quartz.core.JobRunShell.run(JobRunShell.java:213) [quartz.jar:]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_111]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_111]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_111]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_111]
        at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_111]

Comment 7 Arik 2016-11-23 12:25:42 UTC
(In reply to Israel Pinto from comment #6)
After [1] was merged, line 760 in VmAnalyzer is an empty line. The fact that the NPE happened on this line increases the chances that either 1) you don't use the up-to-date version of 4.0.6 or 2) the up-to-date version of 4.0.6 doesn't include the fix (less likely since I see the fix on 4.0.6 branch.

Please verify that you're testing the right version.

[1] https://gerrit.ovirt.org/#/c/66971/

Comment 8 Israel Pinto 2016-11-23 12:56:46 UTC
It looks like the fix [1] is not in the version 4.0.6-0.1,
Can you check it?
[1] https://gerrit.ovirt.org/#/c/66971/

Comment 9 Arik 2016-11-23 17:32:13 UTC
The fix was merged to the 4.0.6 branch and the problem described in comment 6 happened on some version of the code that doesn't include the fix. So it should be either MODIFIED (if the release for 4.0.6 didn't include the fix somehow) or ON_QA (if the mentioned problem happened with incorrect version or improper installation).
Anyhow, it cannot be switched to NEW since the problem didn't happen with the fix. Thus, switching to MODIFIED.

Comment 10 Israel Pinto 2016-11-23 18:37:46 UTC
(In reply to Arik from comment #9)
> The fix was merged to the 4.0.6 branch and the problem described in comment
> 6 happened on some version of the code that doesn't include the fix. So it
> should be either MODIFIED (if the release for 4.0.6 didn't include the fix
> somehow) or ON_QA (if the mentioned problem happened with incorrect version
> or improper installation).
> Anyhow, it cannot be switched to NEW since the problem didn't happen with
> the fix. Thus, switching to MODIFIED.

You right i should move to assigned, but we need sandro 
to check it. looks like the problem did not solved completely.

Comment 11 Arik 2016-11-23 19:43:13 UTC
(In reply to Israel Pinto from comment #10)
> You right i should move to assigned, but we need sandro 
> to check it. looks like the problem did not solved completely.

Israel, it is not that the problem was not completely solved - the problem you described on comment 6 happened on engine that doesn't include the fix. Note that in your stacktrace the NPE happened on line 760 of VmAnalyzer#isVdsNonResponsive and take a look at [1] - this is the place that the exception was raised before the fix and after the fix line 760 is an empty line which is located in a completely different method. The problem that happened to you is unrelated to the fix of this bz for sure.

[1] https://gerrit.ovirt.org/#/c/66971/3/backend/manager/modules/vdsbroker/src/main/java/org/ovirt/engine/core/vdsbroker/monitoring/VmAnalyzer.java

Comment 12 Sandro Bonazzola 2016-11-24 07:32:28 UTC
(In reply to Israel Pinto from comment #8)
> It looks like the fix [1] is not in the version 4.0.6-0.1,
> Can you check it?
> [1] https://gerrit.ovirt.org/#/c/66971/

It will be in 4.0.6.1, it missed 4.0.6

Comment 13 Israel Pinto 2016-11-28 14:42:38 UTC
erify with:
Engine: 4.0.6.1-0.1.el7ev
Host:
OS Version:RHEL - 7.3 - 7.el7
Kernel Version:3.10.0 - 514.el7.x86_64
KVM Version:2.6.0 - 27.el7
LIBVIRT Version:libvirt-2.0.0-10.el7
VDSM Version:vdsm-4.18.17-1.el7ev

Steps:
Set host which HE VM running on to maintenance.
Results:
VM migrate to other host with success, no NPE found in log.