Bug 1293828

Summary: NPE when referring to channel without checking status
Product: [oVirt] vdsm-jsonrpc-java
Reporter: Oved Ourfali <oourfali>
Component: Core
Assignee: Piotr Kliczewski <pkliczew>
Status: CLOSED CURRENTRELEASE
QA Contact: Lukas Svaty <lsvaty>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 1.1.4
CC: bugs, mgoldboi, pkliczew, pstehlik, sbonazzo
Target Milestone: ovirt-3.6.2
Keywords: CodeChange
Target Release: 1.1.6
Flags: rule-engine: ovirt-3.6.z+, mgoldboi: planning_ack+, oourfali: devel_ack+, pstehlik: testing_ack+
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-02-18 11:20:02 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Oved Ourfali 2015-12-23 08:15:14 UTC
Description of problem:
When we invoke the postConnect method, we schedule a task that registers a
channel with the selector, but in between we may close the channel due to
issues. This causes an NPE, which can be mitigated by checking the actual
state of the channel.
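
The mitigation described above can be sketched roughly as follows. This is
not the actual SSLClient code: the class GuardedRegistration and the helper
registrationTask are hypothetical names, and the assumption is that the
deferred task holds a channel reference which may be null or already closed
by the time the reactor runs it. Only standard java.nio and
java.util.concurrent calls are used.

```java
import java.io.IOException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.util.concurrent.Callable;
import java.util.concurrent.FutureTask;

public class GuardedRegistration {

    // Hypothetical helper: builds the deferred task that a reactor thread
    // would run later to register the channel with its selector.
    static FutureTask<SelectionKey> registrationTask(final SocketChannel channel,
                                                     final Selector selector) {
        return new FutureTask<SelectionKey>(new Callable<SelectionKey>() {
            @Override
            public SelectionKey call() throws IOException {
                // Guard: the channel may have been dropped or closed between
                // scheduling and execution, so check its state first.
                if (channel == null || !channel.isOpen()) {
                    return null; // nothing to register
                }
                return channel.register(selector, SelectionKey.OP_READ);
            }
        });
    }

    public static void main(String[] args) throws Exception {
        Selector selector = Selector.open();
        SocketChannel channel = SocketChannel.open();
        channel.configureBlocking(false); // register() requires non-blocking mode

        FutureTask<SelectionKey> task = registrationTask(channel, selector);
        channel.close(); // simulate the channel being closed before the task runs
        task.run();      // what the reactor loop would do with a pending operation

        System.out.println("key = " + task.get());
        selector.close();
    }
}
```

With the guard in place the task completes with a null key instead of
failing. If the channel is closed by another thread between the isOpen()
check and register(), register() throws ClosedChannelException (an
IOException), so a caller would still need to handle that case.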


Version-Release number of selected component (if applicable):


How reproducible:
Occasionally.

Steps to Reproduce:
Hard to reproduce.
Might happen on a slow network with a short heartbeat interval.

Additional info:
2015-12-21 18:21:41,230 INFO  [org.ovirt.engine.core.vdsbroker.PollVmStatsRefresher] (DefaultQuartzScheduler_Worker-3) [] Failed to fetch vms info for host 'buri05' - skipping VMs monitoring.
2015-12-21 18:21:41,229 ERROR [org.ovirt.engine.core.vdsbroker.HostMonitoring] (DefaultQuartzScheduler_Worker-58) [] Exception: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: java.lang.NullPointerException
        at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.createNetworkException(VdsBrokerCommand.java:157) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.executeVDSCommand(VdsBrokerCommand.java:120) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.VDSCommandBase.executeCommand(VDSCommandBase.java:65) [vdsbroker.jar:]
        at org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:33) [dal.jar:]
        at org.ovirt.engine.core.vdsbroker.ResourceManager.runVdsCommand(ResourceManager.java:467) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.VdsManager.refreshCapabilities(VdsManager.java:634) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.HostMonitoring.refreshVdsRunTimeInfo(HostMonitoring.java:119) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.HostMonitoring.refresh(HostMonitoring.java:84) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.VdsManager.onTimer(VdsManager.java:226) [vdsbroker.jar:]
        at sun.reflect.GeneratedMethodAccessor22.invoke(Unknown Source) [:1.7.0_91]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.7.0_91]
        at java.lang.reflect.Method.invoke(Method.java:606) [rt.jar:1.7.0_91]
        at org.ovirt.engine.core.utils.timer.JobWrapper.invokeMethod(JobWrapper.java:81) [scheduler.jar:]
        at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:52) [scheduler.jar:]
        at org.quartz.core.JobRunShell.run(JobRunShell.java:213) [quartz.jar:]
        at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557) [quartz.jar:]
Caused by: java.lang.NullPointerException
        at org.ovirt.vdsm.jsonrpc.client.reactors.SSLClient$2.call(SSLClient.java:137) [vdsm-jsonrpc-java-client.jar:]
        at org.ovirt.vdsm.jsonrpc.client.reactors.SSLClient$2.call(SSLClient.java:133) [vdsm-jsonrpc-java-client.jar:]
        at org.ovirt.vdsm.jsonrpc.client.utils.retry.Retryable.call(Retryable.java:27) [vdsm-jsonrpc-java-client.jar:]
        at java.util.concurrent.FutureTask.run(FutureTask.java:262) [rt.jar:1.7.0_91]
        at org.ovirt.vdsm.jsonrpc.client.utils.ReactorScheduler.performPendingOperations(ReactorScheduler.java:28) [vdsm-jsonrpc-java-client.jar:]
        at org.ovirt.vdsm.jsonrpc.client.reactors.Reactor.run(Reactor.java:61) [vdsm-jsonrpc-java-client.jar:]

Comment 1 Lukas Svaty 2016-01-25 14:58:00 UTC
What flow do we need to cover for verification of this?

Comment 2 Red Hat Bugzilla Rules Engine 2016-01-25 14:58:01 UTC
Target release should be set once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.

Comment 3 Lukas Svaty 2016-01-25 14:59:39 UTC
moving back to ON_QA (fixing my mistake from comment#1)

Comment 4 Red Hat Bugzilla Rules Engine 2016-01-25 14:59:41 UTC
Bug tickets that are moved to testing must have target release set to make sure tester knows what to test. Please set the correct target release before moving to ON_QA.

Comment 5 Piotr Kliczewski 2016-01-25 15:40:42 UTC
To test this, check host connectivity over a slow network.

Comment 6 Lukas Svaty 2016-02-02 09:48:54 UTC
Anything special I should look for? (Specific VDSM/engine errors?)

Comment 7 Piotr Kliczewski 2016-02-02 10:34:44 UTC
There are no special errors. It is highly timing-dependent: if we are slow enough, it can throw an NPE.

Comment 8 Lukas Svaty 2016-02-02 11:32:53 UTC
As this bug does not have a specific test and has low reproducibility, it will be tested over multiple runs (host operations) on a slow network. I'll provide information once I either find the mentioned NPE or have enough runs to verify this functionality. For the slow connection, either the BRQ-TLV or the BRQ-BOSTON connection will be used.

Comment 9 Lukas Svaty 2016-02-04 11:58:26 UTC
Tested it on multiple scenarios (deploy, move to maintenance, activate, reinstall, PM actions) overnight. No NPE appeared in the logs. Moving to verified. If this issue reappears in the future, please reopen the bug.

Tested on rhevm-3.6.3-0.1.el6.noarch