+++ This bug was initially created as a clone of Bug #1324155 +++

Description of problem:
Sometimes the engine gets into some wrong state in which it prints the following messages to the log every 5 seconds. The host (IP 192.168.122.201) becomes "Not responsive" in webadmin. However, the host can still be reached over ssh, `systemctl status vdsmd` reports "active (running)", and remote vdsClient calls work.

2016-04-05 17:50:32,124 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (Stomp Reactor) [] Unable to process messages
2016-04-05 17:50:34,123 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStatusVDSCommand] (DefaultQuartzScheduler_Worker-80) [] Command 'SpmStatusVDSCommand(HostName = fedora_host1, SpmStatusVDSCommandParameters:{runAsync='true', hostId='695c477c-81bb-4797-9c75-9dd7c7cca025', storagePoolId='00000001-0001-0001-0001-000000000028'})' execution failed: org.ovirt.vdsm.jsonrpc.client.ClientConnectionException: Connection timeout
2016-04-05 17:50:35,126 INFO [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (Stomp Reactor) [] Connecting to /192.168.122.201
2016-04-05 17:50:35,129 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (Stomp Reactor) [] Unable to process messages
2016-04-05 17:50:37,128 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler_Worker-25) [] Command 'GetCapabilitiesVDSCommand(HostName = fedora_host1, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', hostId='695c477c-81bb-4797-9c75-9dd7c7cca025', vds='Host[fedora_host1,695c477c-81bb-4797-9c75-9dd7c7cca025]'})' execution failed: org.ovirt.vdsm.jsonrpc.client.ClientConnectionException: Connection timeout
2016-04-05 17:50:37,128 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (DefaultQuartzScheduler_Worker-25) [] Failure to refresh Vds runtime info: org.ovirt.vdsm.jsonrpc.client.ClientConnectionException: Connection timeout
2016-04-05 17:50:37,128 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (DefaultQuartzScheduler_Worker-25) [] Exception: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: org.ovirt.vdsm.jsonrpc.client.ClientConnectionException: Connection timeout
    at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.createNetworkException(VdsBrokerCommand.java:157) [vdsbroker.jar:]
    at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.executeVDSCommand(VdsBrokerCommand.java:120) [vdsbroker.jar:]
    at org.ovirt.engine.core.vdsbroker.VDSCommandBase.executeCommand(VDSCommandBase.java:73) [vdsbroker.jar:]
    at org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:33) [dal.jar:]
    at org.ovirt.engine.core.vdsbroker.ResourceManager.runVdsCommand(ResourceManager.java:449) [vdsbroker.jar:]
    at org.ovirt.engine.core.vdsbroker.VdsManager.refreshCapabilities(VdsManager.java:649) [vdsbroker.jar:]
    at org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring.refreshVdsRunTimeInfo(HostMonitoring.java:121) [vdsbroker.jar:]
    at org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring.refresh(HostMonitoring.java:85) [vdsbroker.jar:]
    at org.ovirt.engine.core.vdsbroker.VdsManager.onTimer(VdsManager.java:234) [vdsbroker.jar:]
    at sun.reflect.GeneratedMethodAccessor82.invoke(Unknown Source) [:1.8.0_77]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_77]
    at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_77]
    at org.ovirt.engine.core.utils.timer.JobWrapper.invokeMethod(JobWrapper.java:77) [scheduler.jar:]
    at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:51) [scheduler.jar:]
    at org.quartz.core.JobRunShell.run(JobRunShell.java:213) [quartz.jar:]
    at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557) [quartz.jar:]
Caused by: org.ovirt.vdsm.jsonrpc.client.ClientConnectionException: Connection timeout
    at org.ovirt.vdsm.jsonrpc.client.reactors.stomp.StompClient$1.execute(StompClient.java:56) [vdsm-jsonrpc-java-client.jar:]
    at org.ovirt.vdsm.jsonrpc.client.reactors.stomp.StompClient.postConnect(StompClient.java:90) [vdsm-jsonrpc-java-client.jar:]
    at org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient.connect(ReactorClient.java:142) [vdsm-jsonrpc-java-client.jar:]
    at org.ovirt.vdsm.jsonrpc.client.JsonRpcClient.getClient(JsonRpcClient.java:114) [vdsm-jsonrpc-java-client.jar:]
    at org.ovirt.vdsm.jsonrpc.client.JsonRpcClient.call(JsonRpcClient.java:73) [vdsm-jsonrpc-java-client.jar:]
    at org.ovirt.engine.core.vdsbroker.jsonrpc.FutureMap.<init>(FutureMap.java:68) [vdsbroker.jar:]
    at org.ovirt.engine.core.vdsbroker.jsonrpc.JsonRpcVdsServer.getCapabilities(JsonRpcVdsServer.java:255) [vdsbroker.jar:]
    at org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand.executeVdsBrokerCommand(GetCapabilitiesVDSCommand.java:15) [vdsbroker.jar:]
    at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.executeVDSCommand(VdsBrokerCommand.java:110) [vdsbroker.jar:]
    ... 14 more

Version-Release number of selected component (if applicable):
engine 4.0 master, commit 4ffd5a4
vdsm Version : 4.17.999 Release : 837.gite31d74d.fc23

How reproducible:
?

Steps to Reproduce:
1. I don't know how to trigger it. It happens pretty randomly.

Additional info:
It can be worked around by restarting the engine. I've never noticed the engine recover from this state.

--- Additional comment from on 2016-04-05 13:04 EDT ---

--- Additional comment from Piotr Kliczewski on 2016-04-06 02:45:52 EDT ---

Please provide the engine log at debug level.

--- Additional comment from on 2016-04-06 09:04:27 EDT ---

I updated my log level; I will send the log when it happens again.

--- Additional comment from on 2016-04-07 11:42 EDT ---

--- Additional comment from Piotr Kliczewski on 2016-04-08 03:38:00 EDT ---

It looks like you lost connectivity with host x.x.122.154. Please provide the vdsm log from this host from around 2016-04-07 17:23:57,266.

--- Additional comment from on 2016-04-08 05:48 EDT ---
oVirt 4.0 Alpha has been released, moving to oVirt 4.0 Beta target.
This bug was fixed together with BZ #1320128; moving to ON_QA.
BZ #1320128 is now closed and fixed in 3.6.6. According to #7, the fix for this BZ is the same as the fix for BZ #1320128. Shouldn't this BZ be closed as well, or are there still some pieces missing before it can be closed?
*** This bug has been marked as a duplicate of bug 1323465 ***