Created attachment 1143669 [details]
engine log

Description of problem:
There is no timeout on the Setup Networks operation in case of a ClosedChannelException from vdsm. The engine waits for a reply from vdsm, but if vdsm does not respond there is no timeout on the engine side, so the Setup Networks operation hangs forever and the engine keeps pinging the server until the engine is restarted.
This can happen when moving the 'ovirtmgmt' network to another interface on the host via Setup Networks, as described in BZ 1323465.

Version-Release number of selected component (if applicable):
3.6.5-0.1.el6

Steps to Reproduce:
1. Move the ovirtmgmt network to another interface on the host via the Setup Networks dialog

Actual results:
Sometimes the Setup Networks operation hangs forever and cannot be rolled back because of a ClosedChannelException from vdsm. The engine gets no response from vdsm and waits forever, without any timeout. Restarting the ovirt-engine service recovers the engine.

Expected results:
We should add a timeout for situations like this.

Additional info:
See also BZ 1323465.

During setupNetworks we get:

2016-04-04 16:20:57,184 DEBUG [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) [] Unable to process messages: java.nio.channels.ClosedChannelException
    at sun.nio.ch.SocketChannelImpl.ensureReadOpen(SocketChannelImpl.java:257) [rt.jar:1.8.0_71]
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:300) [rt.jar:1.8.0_71]
    at org.ovirt.vdsm.jsonrpc.client.reactors.SSLEngineNioHelper.read(SSLEngineNioHelper.java:50) [vdsm-jsonrpc-java-client.jar:]
    at org.ovirt.vdsm.jsonrpc.client.reactors.SSLClient.read(SSLClient.java:91) [vdsm-jsonrpc-java-client.jar:]
    at org.ovirt.vdsm.jsonrpc.client.reactors.stomp.StompCommonClient.processIncoming(StompCommonClient.java:103) [vdsm-jsonrpc-java-client.jar:]
    at org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient.process(ReactorClient.java:204) [vdsm-jsonrpc-java-client.jar:]
    at org.ovirt.vdsm.jsonrpc.client.reactors.SSLClient.process(SSLClient.java:125) [vdsm-jsonrpc-java-client.jar:]
    at org.ovirt.vdsm.jsonrpc.client.reactors.Reactor.processChannels(Reactor.java:89) [vdsm-jsonrpc-java-client.jar:]
    at org.ovirt.vdsm.jsonrpc.client.reactors.Reactor.run(Reactor.java:65) [vdsm-jsonrpc-java-client.jar:]

which causes:

2016-04-04 16:20:57,249 DEBUG [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) [] Internal server error: java.lang.NullPointerException
Moving from 4.0 alpha to 4.0 beta, since 4.0 alpha has already been released and the bug is not ON_QA.
oVirt 4.0 beta has been released, moving to RC milestone.
Comment on attachment 1143669 [details]
engine log

It seems that the provided attachment does not contain the relevant log entries mentioned in the comment.
I don't know what to do with this bug. There is no information I can use to find where the failure occurred. I can only propose merging https://gerrit.ovirt.org/#/c/56602/, which might fix it. That patch adds the timeout you requested to HostSetupNetworksCommand. The decision is up to Dan.
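
For illustration only, here is a minimal sketch of the kind of engine-side timeout requested above. It is not the code from the gerrit patch and does not use the real engine or vdsm-jsonrpc-java classes; SetupNetworksTimeoutSketch, Outcome and awaitReply are hypothetical names, and the only API used is the standard java.util.concurrent Future. The idea is simply to bound the wait for the vdsm reply instead of blocking indefinitely:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch; none of these names exist in ovirt-engine.
public class SetupNetworksTimeoutSketch {

    /** Hypothetical outcome of waiting for the host's reply. */
    enum Outcome { SUCCEEDED, FAILED, TIMED_OUT }

    /**
     * Waits for the reply at most the given time. A reply that never
     * arrives (for example because the channel was closed underneath
     * us) ends up as TIMED_OUT instead of blocking the caller forever.
     */
    static Outcome awaitReply(Future<Boolean> reply, long timeout, TimeUnit unit)
            throws InterruptedException {
        try {
            return reply.get(timeout, unit) ? Outcome.SUCCEEDED : Outcome.FAILED;
        } catch (TimeoutException e) {
            // Give up on the pending call so the command can finish or roll back.
            reply.cancel(true);
            return Outcome.TIMED_OUT;
        } catch (ExecutionException e) {
            // The call itself failed; report that rather than waiting further.
            return Outcome.FAILED;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Stands in for a vdsm call that never gets an answer.
        CompletableFuture<Boolean> neverAnswered = new CompletableFuture<>();
        System.out.println(awaitReply(neverAnswered, 200, TimeUnit.MILLISECONDS));

        // Stands in for a call that was answered successfully.
        System.out.println(awaitReply(
                CompletableFuture.completedFuture(true), 200, TimeUnit.MILLISECONDS));
    }
}
```

With a bounded wait like this, the command would get a definite TIMED_OUT result it can act on (fail the operation, attempt a rollback) rather than hanging until ovirt-engine is restarted. What the timeout value should be is a separate question that this sketch does not answer.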