Description of problem: VM is not manage by the engine, it stack in migrate state. After starting VM, perform migrate to VM, VM stauck in migrate state. According to engine log, VM did not run on host although it looked UP from UI. Engine log: 2016-11-08 09:58:15,150 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (DefaultQuartzScheduler9) [4923421b] VM 'bbebf175-5b9b-46db-ae04-a0f5c32eeb6d'(cpu_hotplug_vm_with_load) is running in db and not running on VDS 'f18b5b1d-de10-4856-99da-cc86fc55ed96'(host_mixed_1) 2016-11-08 09:58:15,152 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring] (DefaultQuartzScheduler9) [4923421b] Failed during monitoring vm: bbebf175-5b9b-46db-ae04-a0f5c32eeb6d , error is: java.lang.NullPointerException 2016-11-08 09:58:15,153 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring] (DefaultQuartzScheduler9) [4923421b] Exception:: java.lang.NullPointerException at org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer.isVdsNonResponsive(VmAnalyzer.java:764) [vdsbroker.jar:] at org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer.handOverVm(VmAnalyzer.java:753) [vdsbroker.jar:] at org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer.proceedDisappearedVm(VmAnalyzer.java:713) [vdsbroker.jar:] at org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer.analyze(VmAnalyzer.java:134) [vdsbroker.jar:] at org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring.lambda$analyzeVms$1(VmsMonitoring.java:151) [vdsbroker.jar:] at java.util.ArrayList.forEach(ArrayList.java:1249) [rt.jar:1.8.0_102] at org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring.analyzeVms(VmsMonitoring.java:146) [vdsbroker.jar:] at org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring.perform(VmsMonitoring.java:109) [vdsbroker.jar:] at org.ovirt.engine.core.vdsbroker.monitoring.PollVmStatsRefresher.poll(PollVmStatsRefresher.java:38) [vdsbroker.jar:] at sun.reflect.GeneratedMethodAccessor231.invoke(Unknown Source) [:1.8.0_102] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_102] at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_102] at org.ovirt.engine.core.utils.timer.JobWrapper.invokeMethod(JobWrapper.java:77) [scheduler.jar:] at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:51) [scheduler.jar:] at org.quartz.core.JobRunShell.run(JobRunShell.java:213) [quartz.jar:] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_102] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_102] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_102] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_102] at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_102] Version-Release number of selected component (if applicable): Engine: oVirt Engine Version: 4.1.0-0.0.master.20161102211241.git5724b88.el7.centos Host: OS Version:RHEL - 7.3 - 7.el7 OS Description:Red Hat Enterprise Linux Server 7.3 (Maipo) Kernel Version:3.10.0 - 514.el7.x86_64 KVM Version:2.6.0 - 27.el7 LIBVIRT Version:libvirt-2.0.0-10.el7 VDSM Version:vdsm-4.18.999-782.git97b30db.el7.centos
Created attachment 1218518 [details] engine.log
Created attachment 1218519 [details] vdsm log
*** Bug 1392902 has been marked as a duplicate of this bug. ***
Verify with: Engine: 4.0.6-0.1.el7ev Host: OS Version:RHEL - 7.3 - 7.el7 OS Description:Red Hat Enterprise Linux Server 7.3 (Maipo) Kernel Version:3.10.0 - 514.el7.x86_64 KVM Version:2.6.0 - 27.el7 LIBVIRT Version:libvirt-2.0.0-10.el7 VDSM Version:vdsm-4.18.16-1.el7ev Steps: Set host which HE VM running on to maintenance. Results: VM migrate to other host with success, no NPE found in log.
Created attachment 1223061 [details] repoen_engine.log
rerun maintenance on host the manage the HE VM, migrate pass but the NPE happened again, looks like rach condition at VM monitor from engine log: 2016-11-23 03:35:04,912 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (DefaultQuartzScheduler9) [6c3b0c77] VM '2230569e-9357-4ab2-915a-767fa5532db1'(HostedEngine) moved from 'MigratingFrom' --> 'Down' 2016-11-23 03:35:04,912 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring] (DefaultQuartzScheduler9) [6c3b0c77] Failed during vms monitoring on host host_mixed_1 error is: java.lang.NullPointerException 2016-11-23 03:35:04,913 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring] (DefaultQuartzScheduler9) [6c3b0c77] Exception:: java.lang.NullPointerException at org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer.isVdsNonResponsive(VmAnalyzer.java:760) [vdsbroker.jar:] at org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer.handOverVm(VmAnalyzer.java:739) [vdsbroker.jar:] at org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer.proceedDownVm(VmAnalyzer.java:267) [vdsbroker.jar:] at org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer.analyze(VmAnalyzer.java:154) [vdsbroker.jar:] at org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring.lambda$analyzeVms$1(VmsMonitoring.java:161) [vdsbroker.jar:] at java.util.ArrayList.forEach(ArrayList.java:1249) [rt.jar:1.8.0_111] at org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring.analyzeVms(VmsMonitoring.java:156) [vdsbroker.jar:] at org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring.perform(VmsMonitoring.java:119) [vdsbroker.jar:] at org.ovirt.engine.core.vdsbroker.monitoring.PollVmStatsRefresher.poll(PollVmStatsRefresher.java:39) [vdsbroker.jar:] at sun.reflect.GeneratedMethodAccessor73.invoke(Unknown Source) [:1.8.0_111] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_111] at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_111] at org.ovirt.engine.core.utils.timer.JobWrapper.invokeMethod(JobWrapper.java:77) [scheduler.jar:] at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:51) [scheduler.jar:] at org.quartz.core.JobRunShell.run(JobRunShell.java:213) [quartz.jar:] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_111] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_111] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_111] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_111] at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_111]
(In reply to Israel Pinto from comment #6) After [1] was merged, line 760 in VmAnalyzer is an empty line. The fact that the NPE happened on this line increases the chances that either 1) you don't use the up-to-date version of 4.0.6 or 2) the up-to-date version of 4.0.6 doesn't include the fix (less likely since I see the fix on 4.0.6 branch. Please verify that you're testing the right version. [1] https://gerrit.ovirt.org/#/c/66971/
It looks like the fix [1] is not in the version 4.0.6-0.1, Can you check it? [1] https://gerrit.ovirt.org/#/c/66971/
The fix was merged to the 4.0.6 branch and the problem described in comment 6 happened on some version of the code that doesn't include the fix. So it should be either MODIFIED (if the release for 4.0.6 didn't include the fix somehow) or ON_QA (if the mentioned problem happened with incorrect version or improper installation). Anyhow, it cannot be switched to NEW since the problem didn't happen with the fix. Thus, switching to MODIFIED.
(In reply to Arik from comment #9) > The fix was merged to the 4.0.6 branch and the problem described in comment > 6 happened on some version of the code that doesn't include the fix. So it > should be either MODIFIED (if the release for 4.0.6 didn't include the fix > somehow) or ON_QA (if the mentioned problem happened with incorrect version > or improper installation). > Anyhow, it cannot be switched to NEW since the problem didn't happen with > the fix. Thus, switching to MODIFIED. You right i should move to assigned, but we need sandro to check it. looks like the problem did not solved completely.
(In reply to Israel Pinto from comment #10) > You right i should move to assigned, but we need sandro > to check it. looks like the problem did not solved completely. Israel, it is not that the problem was not completely solved - the problem you described on comment 6 happened on engine that doesn't include the fix. Note that in your stacktrace the NPE happened on line 760 of VmAnalyzer#isVdsNonResponsive and take a look at [1] - this is the place that the exception was raised before the fix and after the fix line 760 is an empty line which is located in a completely different method. The problem that happened to you is unrelated to the fix of this bz for sure. [1] https://gerrit.ovirt.org/#/c/66971/3/backend/manager/modules/vdsbroker/src/main/java/org/ovirt/engine/core/vdsbroker/monitoring/VmAnalyzer.java
(In reply to Israel Pinto from comment #8) > It looks like the fix [1] is not in the version 4.0.6-0.1, > Can you check it? > [1] https://gerrit.ovirt.org/#/c/66971/ It will be in 4.0.6.1, it missed 4.0.6
erify with: Engine: 4.0.6.1-0.1.el7ev Host: OS Version:RHEL - 7.3 - 7.el7 Kernel Version:3.10.0 - 514.el7.x86_64 KVM Version:2.6.0 - 27.el7 LIBVIRT Version:libvirt-2.0.0-10.el7 VDSM Version:vdsm-4.18.17-1.el7ev Steps: Set host which HE VM running on to maintenance. Results: VM migrate to other host with success, no NPE found in log.