Bug 1532884

Summary: NullPointerException after failure to refresh VDS
Product: [oVirt] ovirt-engine Reporter: Elad <ebenahar>
Component: BLL.InfraAssignee: Ravi Nori <rnori>
Status: CLOSED CURRENTRELEASE QA Contact: Elad <ebenahar>
Severity: low Docs Contact:
Priority: unspecified    
Version: 4.2.1CC: bugs, ebenahar, lveyde, mperina, oliel
Target Milestone: ovirt-4.2.2Flags: rule-engine: ovirt-4.2+
Target Release: 4.2.2   
Hardware: x86_64   
OS: Unspecified   
Whiteboard:
Fixed In Version: ovirt-engine-4.2.2 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-04-18 12:25:27 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Infra RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
engine and vdsm logs
none
engine.log-debug none

Description Elad 2018-01-09 23:59:44 UTC
Created attachment 1379290 [details]
engine and vdsm logs

Description of problem:
The following exception is thrown after a failure to refresh VDS:

2018-01-10 01:23:51,025+02 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (EE-ManagedThreadFactory-engineScheduled-Thread-2) [] Failure to refresh host 'host_mixed_2' runtime info: null
2018-01-10 01:23:51,043+02 WARN  [org.ovirt.engine.core.vdsbroker.VdsManager] (EE-ManagedThreadFactory-engineScheduled-Thread-2) [] Failed to refresh VDS , vds = 'host_mixed_2' : '60b8472d-3fc0-48a1-964e-a1364d1
b4607', error = 'null', continuing.
2018-01-10 01:23:51,043+02 ERROR [org.ovirt.engine.core.vdsbroker.VdsManager] (EE-ManagedThreadFactory-engineScheduled-Thread-2) [] Exception: java.lang.NullPointerException

Version-Release number of selected component (if applicable):
rhvm-4.2.1-0.2.el7.noarch
vdsm-4.20.11-1.el7ev.x86_64

How reproducible:
Happens after failure to refresh host. The NPE doesn't seem to affect anything, hence, severity is set to low.


Steps to Reproduce:
The exact steps are unclear but it happens almost every time in RHV automation exectuion, for example, during execution of https://polarion.engineering.redhat.com/polarion/redirect/project/RHEVM3/workitem?id=RHEVM3-5994


The only thing that might be related in vdsm.log is:

2018-01-10 01:23:42,315+0200 WARN  (vdsm.Scheduler) [Executor] executor state: count=5 workers=set([<Worker name=periodic/14 running <Task discardable <Operation action=<vdsm.virt.sampling.VMBulkstatsMonitor obj
ect at 0x7faf0bb8e950> at 0x7faf0bb294d0> timeout=7.5, duration=0 at 0x7faf0b7fb850> task#=66 at 0x7faf0b84c290>, <Worker name=periodic/4 running <Task discardable <Operation action=<VmDispatcher operation=<clas
s 'vdsm.virt.periodic.BlockjobMonitor'> at 0x7faf0bacaf10> at 0x7faf0bacaf50> timeout=7.5, duration=0 at 0x7faf0b8b74d0> task#=772 at 0x7faf0ba890d0>, <Worker name=periodic/13 running <Task discardable <Operatio
n action=<VmDispatcher operation=<class 'vdsm.virt.periodic.DriveWatermarkMonitor'> at 0x7faf0bacaf90> at 0x7faf0bacafd0> timeout=1.0, duration=1 at 0x7faf0b70aed0> discarded task#=232 at 0x7faf0bb0ccd0>, <Worke
r name=periodic/16 waiting task#=0 at 0x7faf0ba11b10>, <Worker name=periodic/15 running <Task discardable <Operation action=<VmDispatcher operation=<class 'vdsm.virt.periodic.UpdateVolumes'> at 0x7faf0baf9f90> a
t 0x7faf0bacad90> timeout=30.0, duration=1 at 0x7faf0b540f50> task#=60 at 0x7faf0b7d5690>]) (executor:213)



Expected results:
No NPE

Additional info:
engine and vdsm logs

Comment 2 Martin Perina 2018-01-10 08:55:24 UTC
I see 2 errors in the log:

1. Ravi, please focus on the NPE during monitoring, which this bug is talking about

2. Ori, any idea about below error?
2018-01-09 21:52:45,608+02 ERROR [org.ovirt.engine.core.bll.SearchQuery] (default task-12) [8004bfc1-76c5-43eb-bfb9-491c610f7dc7] Search object type not handled: Job

Comment 3 Ori Liel 2018-01-15 09:48:55 UTC
As for (2), the reason is a missing break statement, which results in the code executing properly but still issuing the misleading log error. This patch fixes this: 

  https://gerrit.ovirt.org/86337

(1) is a different problem, so perhaps this bugzilla should be split.

Comment 5 Ravi Nori 2018-01-31 17:57:08 UTC
Can you attach debug logs

Comment 6 Elad 2018-02-02 01:37:22 UTC
Created attachment 1389839 [details]
engine.log-debug

In DEBUG:


2018-02-02 02:46:02,850+02 DEBUG [org.ovirt.engine.core.dal.dbbroker.BatchProcedureExecutionConnectionCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-38) [] Executed batch
2018-02-02 02:46:02,850+02 WARN  [org.ovirt.engine.core.vdsbroker.VdsManager] (EE-ManagedThreadFactory-engineScheduled-Thread-38) [] Failed to refresh VDS , vds = 'host_mixed_1' : 'e00894e8-ac29-4b20-9d8b-b45e02
3b4be5', error = 'null', continuing.
2018-02-02 02:46:02,851+02 ERROR [org.ovirt.engine.core.vdsbroker.VdsManager] (EE-ManagedThreadFactory-engineScheduled-Thread-38) [] Exception: java.lang.NullPointerException
        at org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring.refreshCommitedMemory(HostMonitoring.java:687) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring.refreshVdsRunTimeInfo(HostMonitoring.java:128) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring.refresh(HostMonitoring.java:85) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.VdsManager.refresh(VdsManager.java:267) [vdsbroker.jar:]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_151]
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [rt.jar:1.8.0_151]
        at org.glassfish.enterprise.concurrent.internal.ManagedScheduledThreadPoolExecutor$ManagedScheduledFutureTask.access$201(ManagedScheduledThreadPoolExecutor.java:383) [javax.enterprise.concurrent.jar:1.0.
0.redhat-1]
        at org.glassfish.enterprise.concurrent.internal.ManagedScheduledThreadPoolExecutor$ManagedScheduledFutureTask.run(ManagedScheduledThreadPoolExecutor.java:534) [javax.enterprise.concurrent.jar:1.0.0.redha
t-1]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_151]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_151]
        at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_151]
        at org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:250) [javax.enterprise.concurrent.jar:1.0.0.redhat-1]
        at org.jboss.as.ee.concurrent.service.ElytronManagedThreadFactory$ElytronManagedThread.run(ElytronManagedThreadFactory.java:78)

2018-02-02 02:46:02,858+02 DEBUG [org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil] (EE-ManagedThreadFactory-engine-Thread-12382) [] Executing task: EE-ManagedThreadFactory-engine-Thread-12382


==============
ovirt-engine-4.2.1.4-0.1.el7.noarch

Comment 7 Elad 2018-04-12 14:15:16 UTC
No NPE was thrown during automation execution of TestCase5994 (were the bug was found on)

Used:
rhvm-4.2.2.6-0.1.el7.noarch
vdsm-4.20.23-1.el7ev.x86_64

Comment 8 Sandro Bonazzola 2018-04-18 12:25:27 UTC
This bugzilla is included in oVirt 4.2.2 release, published on March 28th 2018.

Since the problem described in this bug report should be
resolved in oVirt 4.2.2 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.