Bug 1392903 - VM is not monitored by engine, NPE @ org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer.isVdsNonResponsive
Summary: VM is not monitored by engine, NPE @ org.ovirt.engine.core.vdsbroker.monitori...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Virt
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
urgent
urgent
Target Milestone: ovirt-4.0.6
: 4.0.6.1
Assignee: Arik
QA Contact: Israel Pinto
URL:
Whiteboard:
: 1392902 (view as bug list)
Depends On:
Blocks: 1362618
TreeView+ depends on / blocked
 
Reported: 2016-11-08 12:53 UTC by Israel Pinto
Modified: 2017-01-18 07:25 UTC (History)
6 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-01-18 07:25:48 UTC
oVirt Team: Virt
Embargoed:
rule-engine: ovirt-4.0.z+
mgoldboi: planning_ack+
michal.skrivanek: devel_ack+
mavital: testing_ack+


Attachments (Terms of Use)
engine.log (842.92 KB, application/x-gzip)
2016-11-08 12:54 UTC, Israel Pinto
no flags Details
vdsm log (521.01 KB, application/x-gzip)
2016-11-08 12:55 UTC, Israel Pinto
no flags Details
repoen_engine.log (286.31 KB, application/x-gzip)
2016-11-23 08:49 UTC, Israel Pinto
no flags Details


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 66657 0 master MERGED core: prevent NPE on unmanaged migrations 2020-12-23 15:13:29 UTC
oVirt gerrit 66970 0 ovirt-engine-4.0 MERGED core: prevent NPE on unmanaged migrations 2020-12-23 15:13:31 UTC
oVirt gerrit 66971 0 ovirt-engine-4.0.6 MERGED core: prevent NPE on unmanaged migrations 2020-12-23 15:13:31 UTC

Description Israel Pinto 2016-11-08 12:53:59 UTC
Description of problem:
VM is not manage by the engine, it stack in migrate state. 
After starting VM, perform migrate to VM, VM stauck in migrate state.
According to engine log, VM did not run on host although it looked UP from UI. 
Engine log:
2016-11-08 09:58:15,150 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (DefaultQuartzScheduler9) [4923421b] VM 'bbebf175-5b9b-46db-ae04-a0f5c32eeb6d'(cpu_hotplug_vm_with_load) is running in db and not running on VDS 'f18b5b1d-de10-4856-99da-cc86fc55ed96'(host_mixed_1)

2016-11-08 09:58:15,152 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring] (DefaultQuartzScheduler9) [4923421b] Failed during monitoring vm: bbebf175-5b9b-46db-ae04-a0f5c32eeb6d , error is: java.lang.NullPointerException
2016-11-08 09:58:15,153 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring] (DefaultQuartzScheduler9) [4923421b] Exception:: java.lang.NullPointerException
        at org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer.isVdsNonResponsive(VmAnalyzer.java:764) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer.handOverVm(VmAnalyzer.java:753) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer.proceedDisappearedVm(VmAnalyzer.java:713) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer.analyze(VmAnalyzer.java:134) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring.lambda$analyzeVms$1(VmsMonitoring.java:151) [vdsbroker.jar:]
        at java.util.ArrayList.forEach(ArrayList.java:1249) [rt.jar:1.8.0_102]
        at org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring.analyzeVms(VmsMonitoring.java:146) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring.perform(VmsMonitoring.java:109) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.monitoring.PollVmStatsRefresher.poll(PollVmStatsRefresher.java:38) [vdsbroker.jar:]
        at sun.reflect.GeneratedMethodAccessor231.invoke(Unknown Source) [:1.8.0_102]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_102]
        at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_102]
        at org.ovirt.engine.core.utils.timer.JobWrapper.invokeMethod(JobWrapper.java:77) [scheduler.jar:]
        at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:51) [scheduler.jar:]
        at org.quartz.core.JobRunShell.run(JobRunShell.java:213) [quartz.jar:]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_102]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_102]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_102]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_102]
        at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_102]


Version-Release number of selected component (if applicable):
Engine: oVirt Engine Version: 4.1.0-0.0.master.20161102211241.git5724b88.el7.centos
Host:
OS Version:RHEL - 7.3 - 7.el7
OS Description:Red Hat Enterprise Linux Server 7.3 (Maipo)
Kernel Version:3.10.0 - 514.el7.x86_64
KVM Version:2.6.0 - 27.el7
LIBVIRT Version:libvirt-2.0.0-10.el7
VDSM Version:vdsm-4.18.999-782.git97b30db.el7.centos

Comment 1 Israel Pinto 2016-11-08 12:54:54 UTC
Created attachment 1218518 [details]
engine.log

Comment 2 Israel Pinto 2016-11-08 12:55:31 UTC
Created attachment 1218519 [details]
vdsm log

Comment 3 Yaniv Kaul 2016-11-09 07:07:46 UTC
*** Bug 1392902 has been marked as a duplicate of this bug. ***

Comment 4 Israel Pinto 2016-11-23 08:09:34 UTC
Verify with:
Engine: 4.0.6-0.1.el7ev
Host:
OS Version:RHEL - 7.3 - 7.el7
OS Description:Red Hat Enterprise Linux Server 7.3 (Maipo)
Kernel Version:3.10.0 - 514.el7.x86_64
KVM Version:2.6.0 - 27.el7
LIBVIRT Version:libvirt-2.0.0-10.el7
VDSM Version:vdsm-4.18.16-1.el7ev

Steps:
Set host which HE VM running on to maintenance.
Results:
VM migrate to other host with success, no NPE found in log.

Comment 5 Israel Pinto 2016-11-23 08:49:46 UTC
Created attachment 1223061 [details]
repoen_engine.log

Comment 6 Israel Pinto 2016-11-23 08:51:52 UTC
rerun maintenance on host the manage the HE VM,
migrate pass but the NPE happened again, looks like rach condition at VM monitor
from engine log:
2016-11-23 03:35:04,912 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (DefaultQuartzScheduler9) [6c3b0c77] VM '2230569e-9357-4ab2-915a-767fa5532db1'(HostedEngine) moved from 'MigratingFrom' --> 'Down'
2016-11-23 03:35:04,912 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring] (DefaultQuartzScheduler9) [6c3b0c77] Failed during vms monitoring on host host_mixed_1 error is: java.lang.NullPointerException
2016-11-23 03:35:04,913 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring] (DefaultQuartzScheduler9) [6c3b0c77] Exception:: java.lang.NullPointerException
        at org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer.isVdsNonResponsive(VmAnalyzer.java:760) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer.handOverVm(VmAnalyzer.java:739) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer.proceedDownVm(VmAnalyzer.java:267) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer.analyze(VmAnalyzer.java:154) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring.lambda$analyzeVms$1(VmsMonitoring.java:161) [vdsbroker.jar:]
        at java.util.ArrayList.forEach(ArrayList.java:1249) [rt.jar:1.8.0_111]
        at org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring.analyzeVms(VmsMonitoring.java:156) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring.perform(VmsMonitoring.java:119) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.monitoring.PollVmStatsRefresher.poll(PollVmStatsRefresher.java:39) [vdsbroker.jar:]
        at sun.reflect.GeneratedMethodAccessor73.invoke(Unknown Source) [:1.8.0_111]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_111]
        at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_111]
        at org.ovirt.engine.core.utils.timer.JobWrapper.invokeMethod(JobWrapper.java:77) [scheduler.jar:]
        at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:51) [scheduler.jar:]
        at org.quartz.core.JobRunShell.run(JobRunShell.java:213) [quartz.jar:]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_111]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_111]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_111]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_111]
        at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_111]

Comment 7 Arik 2016-11-23 12:25:42 UTC
(In reply to Israel Pinto from comment #6)
After [1] was merged, line 760 in VmAnalyzer is an empty line. The fact that the NPE happened on this line increases the chances that either 1) you don't use the up-to-date version of 4.0.6 or 2) the up-to-date version of 4.0.6 doesn't include the fix (less likely since I see the fix on 4.0.6 branch.

Please verify that you're testing the right version.

[1] https://gerrit.ovirt.org/#/c/66971/

Comment 8 Israel Pinto 2016-11-23 12:56:46 UTC
It looks like the fix [1] is not in the version 4.0.6-0.1,
Can you check it?
[1] https://gerrit.ovirt.org/#/c/66971/

Comment 9 Arik 2016-11-23 17:32:13 UTC
The fix was merged to the 4.0.6 branch and the problem described in comment 6 happened on some version of the code that doesn't include the fix. So it should be either MODIFIED (if the release for 4.0.6 didn't include the fix somehow) or ON_QA (if the mentioned problem happened with incorrect version or improper installation).
Anyhow, it cannot be switched to NEW since the problem didn't happen with the fix. Thus, switching to MODIFIED.

Comment 10 Israel Pinto 2016-11-23 18:37:46 UTC
(In reply to Arik from comment #9)
> The fix was merged to the 4.0.6 branch and the problem described in comment
> 6 happened on some version of the code that doesn't include the fix. So it
> should be either MODIFIED (if the release for 4.0.6 didn't include the fix
> somehow) or ON_QA (if the mentioned problem happened with incorrect version
> or improper installation).
> Anyhow, it cannot be switched to NEW since the problem didn't happen with
> the fix. Thus, switching to MODIFIED.

You right i should move to assigned, but we need sandro 
to check it. looks like the problem did not solved completely.

Comment 11 Arik 2016-11-23 19:43:13 UTC
(In reply to Israel Pinto from comment #10)
> You right i should move to assigned, but we need sandro 
> to check it. looks like the problem did not solved completely.

Israel, it is not that the problem was not completely solved - the problem you described on comment 6 happened on engine that doesn't include the fix. Note that in your stacktrace the NPE happened on line 760 of VmAnalyzer#isVdsNonResponsive and take a look at [1] - this is the place that the exception was raised before the fix and after the fix line 760 is an empty line which is located in a completely different method. The problem that happened to you is unrelated to the fix of this bz for sure.

[1] https://gerrit.ovirt.org/#/c/66971/3/backend/manager/modules/vdsbroker/src/main/java/org/ovirt/engine/core/vdsbroker/monitoring/VmAnalyzer.java

Comment 12 Sandro Bonazzola 2016-11-24 07:32:28 UTC
(In reply to Israel Pinto from comment #8)
> It looks like the fix [1] is not in the version 4.0.6-0.1,
> Can you check it?
> [1] https://gerrit.ovirt.org/#/c/66971/

It will be in 4.0.6.1, it missed 4.0.6

Comment 13 Israel Pinto 2016-11-28 14:42:38 UTC
erify with:
Engine: 4.0.6.1-0.1.el7ev
Host:
OS Version:RHEL - 7.3 - 7.el7
Kernel Version:3.10.0 - 514.el7.x86_64
KVM Version:2.6.0 - 27.el7
LIBVIRT Version:libvirt-2.0.0-10.el7
VDSM Version:vdsm-4.18.17-1.el7ev

Steps:
Set host which HE VM running on to maintenance.
Results:
VM migrate to other host with success, no NPE found in log.


Note You need to log in before you can comment on or make changes to this bug.