Bug 1323923

Summary: No timeout on setup networks operation in case of ClosedChannelException
Product: [oVirt] ovirt-engine
Reporter: Michael Burman <mburman>
Component: BLL.Network
Assignee: Dan Kenigsberg <danken>
Status: CLOSED WONTFIX
QA Contact: Meni Yakove <myakove>
Severity: medium
Docs Contact:
Priority: medium
Version: 3.6.5
CC: bugs, danken, ylavi
Target Milestone: ---
Flags: sbonazzo: ovirt-4.1-
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-12-05 06:10:04 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Network
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Attachments:
engine log (flags: none)

Description Michael Burman 2016-04-05 05:36:26 UTC
Created attachment 1143669
engine log

Description of problem:
There is no timeout on the setup networks operation in case of a ClosedChannelException from vdsm.

The engine waits for a reply from vdsm, but if vdsm does not respond there is no timeout on the engine side. The setup networks operation hangs forever and the engine keeps pinging the server until the engine is restarted.

This situation can happen when moving the 'ovirtmgmt' network to another interface on the host via setup networks, as described in BZ 1323465.
 

Version-Release number of selected component (if applicable):
3.6.5-0.1.el6

Steps to Reproduce:
1. Move the ovirtmgmt network to another interface on the host via the Setup Networks dialog

Actual results:
Sometimes the Setup Networks operation hangs forever and can't be rolled back because of a ClosedChannelException from vdsm. The engine gets no response from vdsm and waits forever without any timeout.
Restarting the ovirt-engine service recovers the engine.

Expected results:
The engine should apply a timeout in situations like this.
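
To illustrate the kind of timeout meant here, below is a minimal generic Java sketch that bounds the wait for a reply. It is not the engine's code: the class SetupNetworksTimeoutSketch and the helper callWithTimeout are hypothetical names, and the Callable only stands in for the blocking call to vdsm. It uses nothing beyond java.util.concurrent.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class SetupNetworksTimeoutSketch {

    // Hypothetical helper, not part of ovirt-engine or vdsm-jsonrpc-java.
    // Runs the blocking call on a worker thread and waits at most `timeout`.
    public static <T> T callWithTimeout(Callable<T> call, long timeout, TimeUnit unit)
            throws TimeoutException, ExecutionException, InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "setup-networks-call");
            t.setDaemon(true); // a stuck call must not keep the JVM alive
            return t;
        });
        try {
            Future<T> future = executor.submit(call);
            try {
                return future.get(timeout, unit);
            } catch (TimeoutException e) {
                future.cancel(true); // interrupt the worker, then report the timeout
                throw e;
            }
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // A reply that arrives in time is returned as usual.
        String reply = callWithTimeout(() -> "done", 1, TimeUnit.SECONDS);
        System.out.println("reply: " + reply);

        // A reply that never arrives (simulated by a long sleep) ends in a
        // TimeoutException instead of an endless wait.
        try {
            callWithTimeout(() -> {
                Thread.sleep(10_000);
                return "never";
            }, 200, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            System.out.println("timed out, the caller can now fail or roll back the operation");
        }
    }
}
```

With a bound like this the caller gets control back after the timeout and can mark the operation as failed, instead of waiting until the engine is restarted.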

Additional info:
See also BZ 1323465

During setupNetworks we get:

2016-04-04 16:20:57,184 DEBUG [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) [] Unable to process messages: java.nio.channels.ClosedChannelException
        at sun.nio.ch.SocketChannelImpl.ensureReadOpen(SocketChannelImpl.java:257) [rt.jar:1.8.0_71]
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:300) [rt.jar:1.8.0_71]
        at org.ovirt.vdsm.jsonrpc.client.reactors.SSLEngineNioHelper.read(SSLEngineNioHelper.java:50) [vdsm-jsonrpc-java-client.jar:]
        at org.ovirt.vdsm.jsonrpc.client.reactors.SSLClient.read(SSLClient.java:91) [vdsm-jsonrpc-java-client.jar:]
        at org.ovirt.vdsm.jsonrpc.client.reactors.stomp.StompCommonClient.processIncoming(StompCommonClient.java:103) [vdsm-jsonrpc-java-client.jar:]
        at org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient.process(ReactorClient.java:204) [vdsm-jsonrpc-java-client.jar:]
        at org.ovirt.vdsm.jsonrpc.client.reactors.SSLClient.process(SSLClient.java:125) [vdsm-jsonrpc-java-client.jar:]
        at org.ovirt.vdsm.jsonrpc.client.reactors.Reactor.processChannels(Reactor.java:89) [vdsm-jsonrpc-java-client.jar:]
        at org.ovirt.vdsm.jsonrpc.client.reactors.Reactor.run(Reactor.java:65) [vdsm-jsonrpc-java-client.jar:]


which causes:

2016-04-04 16:20:57,249 DEBUG [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) [] Internal server error: java.lang.NullPointerException

Comment 1 Sandro Bonazzola 2016-05-02 09:52:09 UTC
Moving from 4.0 alpha to 4.0 beta since 4.0 alpha has already been released and the bug is not ON_QA.

Comment 2 Yaniv Lavi 2016-05-23 13:15:13 UTC
oVirt 4.0 beta has been released, moving to RC milestone.

Comment 3 Yaniv Lavi 2016-05-23 13:20:00 UTC
oVirt 4.0 beta has been released, moving to RC milestone.

Comment 4 Martin Mucha 2016-06-01 13:59:06 UTC
Comment on attachment 1143669
engine log

It seems that the provided attachment does not contain the relevant log entries mentioned in the comment.

Comment 5 Martin Mucha 2016-07-19 13:55:09 UTC
I don't know what to do with this bug. There's no information I can use to find where the failure occurred. I can only propose merging
https://gerrit.ovirt.org/#/c/56602/
which might fix it. It adds the timeout you requested to HostSetupNetworksCommand. It's up to Dan's decision.