Created attachment 1219244 [details]
engine logs

Description of problem:
Host state isn't stable; the hosts keep changing state to Non Responsive every few minutes. On the latest master the hosts move from Non Responsive to Up every few minutes.

2016-11-10 10:16:47,116 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) [] Internal server error: null
2016-11-10 10:16:47,116 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) [] Internal server error: null
2016-11-10 10:16:47,128 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand] (DefaultQuartzScheduler3) [31d4820c] Command 'GetAllVmStatsVDSCommand(HostName = orchid-vds1.qa.lab.tlv.redhat.com, VdsIdVDSCommandParametersBase:{runAsync='true', hostId='a8ac96b7-bce5-4039-b4ee-e608f34ceac7'})' execution failed: VDSGenericException: VDSNetworkException: Heartbeat exceeded
2016-11-10 10:16:47,128 INFO [org.ovirt.engine.core.vdsbroker.monitoring.PollVmStatsRefresher] (DefaultQuartzScheduler3) [31d4820c] Failed to fetch vms info for host 'orchid-vds1.qa.lab.tlv.redhat.com' - skipping VMs monitoring.
2016-11-10 10:16:47,154 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand] (DefaultQuartzScheduler2) [7e87bdde] Command 'GetAllVmStatsVDSCommand(HostName = navy-vds1.qa.lab.tlv.redhat.com, VdsIdVDSCommandParametersBase:{runAsync='true', hostId='cd60d5ed-3b0e-46e5-be1a-272ed6516c45'})' execution failed: VDSGenericException: VDSNetworkException: Heartbeat exceeded
2016-11-10 10:16:47,154 INFO [org.ovirt.engine.core.vdsbroker.monitoring.PollVmStatsRefresher] (DefaultQuartzScheduler2) [7e87bdde] Failed to fetch vms info for host 'navy-vds1.qa.lab.tlv.redhat.com' - skipping VMs monitoring.
2016-11-10 10:16:47,157 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler7) [7e920fc] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM orchid-vds1.qa.lab.tlv.redhat.com command failed: Heartbeat exceeded
2016-11-10 10:16:47,158 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetStatsVDSCommand] (DefaultQuartzScheduler7) [7e920fc] Command 'GetStatsVDSCommand(HostName = orchid-vds1.qa.lab.tlv.redhat.com, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', hostId='a8ac96b7-bce5-4039-b4ee-e608f34ceac7', vds='Host[orchid-vds1.qa.lab.tlv.redhat.com,a8ac96b7-bce5-4039-b4ee-e608f34ceac7]'})' execution failed: VDSGenericException: VDSNetworkException: Heartbeat exceeded
2016-11-10 10:16:47,158 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (DefaultQuartzScheduler7) [7e920fc] Failed getting vds stats, host='orchid-vds1.qa.lab.tlv.redhat.com'(a8ac96b7-bce5-4039-b4ee-e608f34ceac7): org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: VDSGenericException: VDSNetworkException: Heartbeat exceeded
2016-11-10 10:16:47,158 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (DefaultQuartzScheduler7) [7e920fc] Failure to refresh host 'orchid-vds1.qa.lab.tlv.redhat.com' runtime info: VDSGenericException: VDSNetworkException: Heartbeat exceeded
2016-11-10 10:16:47,158 WARN [org.ovirt.engine.core.vdsbroker.VdsManager] (DefaultQuartzScheduler7) [7e920fc] Failed to refresh VDS, network error, continuing, vds='orchid-vds1.qa.lab.tlv.redhat.com'(a8ac96b7-bce5-4039-b4ee-e608f34ceac7): VDSGenericException: VDSNetworkException: Heartbeat exceeded
2016-11-10 10:16:47,159 WARN [org.ovirt.engine.core.vdsbroker.VdsManager] (org.ovirt.thread.pool-6-thread-17) [31d4820c] Host 'orchid-vds1.qa.lab.tlv.redhat.com' is not responding.
2016-11-10 10:16:47,164 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler10) [33fad06d] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM orchid-vds1.qa.lab.tlv.redhat.com command failed: Heartbeat exceeded
2016-11-10 10:16:47,165 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStatusVDSCommand] (DefaultQuartzScheduler10) [33fad06d] Command 'SpmStatusVDSCommand(HostName = orchid-vds1.qa.lab.tlv.redhat.com, SpmStatusVDSCommandParameters:{runAsync='true', hostId='a8ac96b7-bce5-4039-b4ee-e608f34ceac7', storagePoolId='ea5798e5-47b6-4e81-8005-cceffdddcd5b'})' execution failed: VDSGenericException: VDSNetworkException: Heartbeat exceeded
2016-11-10 10:16:47,180 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler8) [db08d74] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM navy-vds1.qa.lab.tlv.redhat.com command failed: Heartbeat exceeded
2016-11-10 10:16:47,181 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetStatsVDSCommand] (DefaultQuartzScheduler8) [db08d74] Command 'GetStatsVDSCommand(HostName = navy-vds1.qa.lab.tlv.redhat.com, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', hostId='cd60d5ed-3b0e-46e5-be1a-272ed6516c45', vds='Host[navy-vds1.qa.lab.tlv.redhat.com,cd60d5ed-3b0e-46e5-be1a-272ed6516c45]'})' execution failed: VDSGenericException: VDSNetworkException: Heartbeat exceeded
2016-11-10 10:16:47,181 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (DefaultQuartzScheduler8) [db08d74] Failed getting vds stats, host='navy-vds1.qa.lab.tlv.redhat.com'(cd60d5ed-3b0e-46e5-be1a-272ed6516c45): org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: VDSGenericException: VDSNetworkException: Heartbeat exceeded
2016-11-10 10:16:47,181 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (DefaultQuartzScheduler8) [db08d74] Failure to refresh host 'navy-vds1.qa.lab.tlv.redhat.com' runtime info: VDSGenericException: VDSNetworkException: Heartbeat exceeded
2016-11-10 10:16:47,181 WARN [org.ovirt.engine.core.vdsbroker.VdsManager] (DefaultQuartzScheduler8) [db08d74] Failed to refresh VDS, network error, continuing, vds='navy-vds1.qa.lab.tlv.redhat.com'(cd60d5ed-3b0e-46e5-be1a-272ed6516c45): VDSGenericException: VDSNetworkException: Heartbeat exceeded
2016-11-10 10:21:11,124 INFO [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connecting to orchid-vds1.qa.lab.tlv.redhat.com/10.35.128.22
2016-11-10 10:21:11,579 INFO [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connecting to navy-vds1.qa.lab.tlv.redhat.com/10.35.128.14
2016-11-10 10:21:11,591 ERROR [org.ovirt.vdsm.jsonrpc.client.JsonRpcClient] (ResponseWorker) [] Not able to update response for "2d5664fa-cc8e-459d-b282-623f619b6445"
2016-11-10 10:21:11,591 ERROR [org.ovirt.vdsm.jsonrpc.client.JsonRpcClient] (ResponseWorker) [] Not able to update response for "3aecd3b6-43f3-4caf-8642-486156591a97"

Version-Release number of selected component (if applicable):
4.1.0-0.0.master.20161109091313.gitedb19fb.el7.centos

How reproducible:
100%
Can you attach vdsm.log as well?
Created attachment 1219247 [details]
vdsm logs
Looking at the logs I see that the issues started from:

2016-11-10 03:45:58,466 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) [] Unable to process messages: Connection reset by peer

This is just after the installation of camel-vdsa.qa.lab.tlv.redhat.com. Later we see plenty of failures for other hosts due to heartbeat exceptions. Please make sure that there are no networking issues between the engine, vdsm and storage.
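(A quick way to rule out basic TCP reachability from the engine to the hosts is a loop like the sketch below; 54321 is the default VDSM port, and the host list is just an example from this setup.)

import socket

HOSTS = ('orchid-vds1.qa.lab.tlv.redhat.com', 'navy-vds1.qa.lab.tlv.redhat.com')
VDSM_PORT = 54321  # default vdsm json-rpc port

for host in HOSTS:
    try:
        # Plain TCP connect; this only proves the port is reachable,
        # not that the json-rpc/STOMP layer above it is healthy.
        sock = socket.create_connection((host, VDSM_PORT), timeout=5)
        sock.close()
        print('%s: reachable' % host)
    except socket.error as e:
        print('%s: NOT reachable (%s)' % (host, e))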
Also, I see in supervdsm on navy-vds1 that the "n1" interface failed to get an IP address. Is that a required network?
I tried to connect to camel-vdsa.qa.lab.tlv.redhat.com but I was unable to. It seems that adding this host caused network instability, which resulted in heartbeat exceptions for other hosts; later the engine was not able to reconnect to them.

Do we know what happened to the camel host?
We don't have network issues. I think I understand what triggers this behavior, and it happens on 4.0.5 as well: once a host in an OVS cluster is non-responsive, this behavior starts and all the other hosts in the DC (legacy cluster) start changing state to Non Responsive and back to Up.

Attaching new engine logs. I initiated a reboot on an OVS host, this behavior started right away, and all hosts in the legacy cluster became non-responsive. I think it is somehow related to OVS, but I don't understand how exactly.
(In reply to Piotr Kliczewski from comment #5)
> I tried to connect to camel-vdsa.qa.lab.tlv.redhat.com but I was unable to.
> It seems that adding this host caused network instability, which resulted in
> heartbeat exceptions for other hosts; later the engine was not able to
> reconnect to them.
>
> Do we know what happened to the camel host?

Yes, it's an OVS host (in an OVS cluster) which I rebooted in order to reproduce this bug.
Created attachment 1219253 [details]
engine log
Michael,

It seems that it could be related. Please provide a tcpdump for the engine. In my opinion it seems to be a network team issue.
I checked the logs, and it was not the installation but the host upgrade manager.
It looks like OVS affects the engine-to-vdsm communication, which needs to be investigated by the network team. A tcpdump taken on the engine host would still be nice to have.
Created attachment 1219292 [details]
tcpdump

It's a tcpdump taken on the engine, filtered (src+dst) to one of the hosts that changes state once the host in the OVS cluster goes down.
Updating this report: this issue happens with legacy hosts as well. Once one host is non-responsive to the engine, all the other hosts in the DC are affected, both on master and on 4.0.5.
Can someone investigate this report? It is 100% reproducible on all setups and versions. And it's not network related: we tested whether there was a networking issue in the communication between the engine and the other hosts in the setup while one host was in the non-responsive state, and we saw no disconnections between the engine and the other hosts. It looks like the engine is mistaken/confused and reports the wrong state for its connections to the other hosts in the setup.

We ran the following script on the engine against the hosts in the setup:

from time import time, sleep
from vdsm import jsonrpcvdscli
import sys

host = sys.argv[1] if len(sys.argv) > 1 else 'localhost'
s = jsonrpcvdscli.connect(host=host)

while True:
    t0 = time()
    try:
        # Measure the round-trip time of a json-rpc Host.ping call.
        s.ping()
        print(time() - t0)
    except Exception:
        print("No response for JSON-RPC Host.ping request")
    sleep(0.5)

There were no failure prints and the ping times were consistent, but the engine reported that the host states were changing every few minutes. This could be caused by recent changes in vdsm-jsonrpc.
Michael,

Please do one test for me. Take any version you tested and configure a bunch of hosts. Block one of them and see whether you get the same results. Once you reproduce the issue without using OVS, please attach the logs.
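To make the flapping easy to capture during that test, something like the following polling loop against the engine REST API could run alongside it (a minimal sketch assuming the Python SDK, ovirtsdk4; the URL and credentials are placeholders):

import time
import ovirtsdk4 as sdk

# Placeholders - point these at the engine under test.
connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='password',
    insecure=True,  # fine for a lab; use ca_file in real deployments
)
hosts_service = connection.system_service().hosts_service()

try:
    while True:
        # Timestamped status of every host, so short flaps are visible.
        for host in hosts_service.list():
            print(time.strftime('%H:%M:%S'), host.name, host.status)
        time.sleep(5)
finally:
    connection.close()

If hosts other than the blocked one flip between Up and Non Responsive while their own vdsm keeps answering (as the ping script above shows), that points at the engine side rather than the network.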
Piotr,

I'm attaching the engine log in debug mode from our tests today.
Created attachment 1220866 [details]
engine in debug
Michael,

Can you check which versions are affected?
(In reply to Piotr Kliczewski from comment #18)
> Michael, can you check which versions are affected?

Latest master (4.1) and rhevm-4.0.5.5-0.1.el7ev.noarch.
During my investigation of the issue I saw that, while one host is down, every 3-4 minutes the other hosts go down for less than a second with 'Heartbeat exceeded' and then come back Up. This happens for the whole time the host is down. The fix was tested on Michael's environment.
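For context on why one dead host can make healthy hosts trip the heartbeat, here is a deliberately simplified sketch of such a failure mode (illustrative only, not the actual vdsm-jsonrpc-java code): if a single reactor loop spends a long time blocked on the dead connection, it cannot do heartbeat bookkeeping for the live ones, so they briefly look silent.

import time

HEARTBEAT = 10  # seconds of silence before a peer is declared non-responsive

# Two peers served by one reactor loop; 'camel' stands in for the rebooted host.
peers = {
    'orchid': {'last_seen': time.time(), 'stuck': False},
    'camel': {'last_seen': time.time(), 'stuck': True},
}

for _ in range(2):  # two reactor cycles are enough to show the effect
    for name, peer in peers.items():
        if peer['stuck']:
            time.sleep(15)  # blocking I/O against the dead host stalls the loop
        else:
            peer['last_seen'] = time.time()  # heartbeat refreshed for live peers
    for name, peer in peers.items():
        if time.time() - peer['last_seen'] > HEARTBEAT:
            # 'orchid' trips this too, even though it answered promptly.
            print('%s: Heartbeat exceeded' % name)

On the next cycle the healthy peer is refreshed again, matching the Up / Non Responsive flapping seen in this report.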
The fix for this issue should be included in oVirt 4.1.0 beta 1, released on December 1st. If it is not included, please move the bug back to MODIFIED.
Verified on - 4.1.0-0.2.master.20161210231201.git26a385e.el7.centos
I think I have the same problem on my oVirt 4.0.5 installation. How can I solve this?

[root@prd-rbs-ovirt01-poa ~]# grep -i error /var/log/ovirt-engine/engine.log | grep -v org.ovirt.engine.core.vdsbroker.HostDevListByCapsVDSCommand | tail -50
2017-01-06 16:20:06,897 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand] (DefaultQuartzScheduler6) [a6fc972] Command 'GetAllVmStatsVDSCommand(HostName = prd-rbs-ovirt-kvm10-poa.rbs.com.br, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', hostId='8b4bc7dc-af8c-4415-a2ef-bccc11ddf23a', vds='Host[prd-rbs-ovirt-kvm10-poa.rbs.com.br,8b4bc7dc-af8c-4415-a2ef-bccc11ddf23a]'})' execution failed: VDSGenericException: VDSNetworkException: Heartbeat exceeded
2017-01-06 16:20:06,899 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand] (DefaultQuartzScheduler10) [19b85a77] Command 'GetAllVmStatsVDSCommand(HostName = prd-rbs-ovirt-kvm17-poa.rbs.com.br, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', hostId='795b917b-ea5b-499d-80c5-aa3aad4f2537', vds='Host[prd-rbs-ovirt-kvm17-poa.rbs.com.br,795b917b-ea5b-499d-80c5-aa3aad4f2537]'})' execution failed: VDSGenericException: VDSNetworkException: Heartbeat exceeded
2017-01-06 16:20:06,901 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand] (DefaultQuartzScheduler8) [1bccfc8a] Command 'GetAllVmStatsVDSCommand(HostName = prd-rbs-ovirt-kvm19-poa.rbs.com.br, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', hostId='887b6e35-1fd1-4cd6-9e78-05bcab12a417', vds='Host[prd-rbs-ovirt-kvm19-poa.rbs.com.br,887b6e35-1fd1-4cd6-9e78-05bcab12a417]'})' execution failed: VDSGenericException: VDSNetworkException: Heartbeat exceeded
2017-01-06 16:23:26,020 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand] (DefaultQuartzScheduler1) [7771902b] Command 'GetAllVmStatsVDSCommand(HostName = prd-rbs-ovirt-kvm06-poa.rbs.com.br, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', hostId='23b750ed-4ba6-4782-a9bd-c018c0f36e44', vds='Host[prd-rbs-ovirt-kvm06-poa.rbs.com.br,23b750ed-4ba6-4782-a9bd-c018c0f36e44]'})' execution failed: VDSGenericException: VDSNetworkException: Heartbeat exceeded
2017-01-06 16:23:26,043 ERROR [org.ovirt.vdsm.jsonrpc.client.JsonRpcClient] (ResponseWorker) [] Not able to update response for "e5f7a478-841d-413c-baa1-d63632be7748"
2017-01-06 16:24:32,480 ERROR [org.ovirt.vdsm.jsonrpc.client.JsonRpcClient] (ResponseWorker) [] Not able to update response for "75bf7e9a-fb5f-4253-bdf6-ff0fcbf8c876"
2017-01-06 16:24:32,485 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler6) [35cc958c] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM prd-rbs-ovirt-kvm06-poa.rbs.com.br command failed: Heartbeat exceeded
2017-01-06 16:24:32,485 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStatusVDSCommand] (DefaultQuartzScheduler6) [35cc958c] Command 'SpmStatusVDSCommand(HostName = prd-rbs-ovirt-kvm06-poa.rbs.com.br, SpmStatusVDSCommandParameters:{runAsync='true', hostId='23b750ed-4ba6-4782-a9bd-c018c0f36e44', storagePoolId='00000001-0001-0001-0001-000000000198'})' execution failed: VDSGenericException: VDSNetworkException: Heartbeat exceeded
2017-01-06 16:24:32,524 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler6) [152cde35] Correlation ID: 152cde35, Call Stack: null, Custom Event ID: -1, Message: Invalid status on Data Center RBS. Setting Data Center status to Non Responsive (On host prd-rbs-ovirt-kvm06-poa.rbs.com.br, Error: Network error during communication with the Host.).
2017-01-06 16:25:37,493 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) [] Internal server error: null
2017-01-06 16:25:37,494 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) [] Internal server error: null
2017-01-06 16:25:37,494 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) [] Internal server error: null
2017-01-06 16:25:37,494 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) [] Internal server error: null
2017-01-06 16:25:37,494 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) [] Internal server error: null
2017-01-06 16:25:37,494 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) [] Internal server error: null
2017-01-06 16:25:37,495 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand] (DefaultQuartzScheduler4) [44e1d6ed] Command 'GetAllVmStatsVDSCommand(HostName = prd-rbs-ovirt-kvm03-poa.rbs.com.br, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', hostId='f7842244-646c-400a-9736-f8d4aa9b1cef', vds='Host[prd-rbs-ovirt-kvm03-poa.rbs.com.br,f7842244-646c-400a-9736-f8d4aa9b1cef]'})' execution failed: VDSGenericException: VDSNetworkException: Heartbeat exceeded
2017-01-06 16:25:37,499 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand] (DefaultQuartzScheduler5) [4fb9c0c9] Command 'GetAllVmStatsVDSCommand(HostName = prd-rbs-ovirt-kvm17-poa.rbs.com.br, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', hostId='795b917b-ea5b-499d-80c5-aa3aad4f2537', vds='Host[prd-rbs-ovirt-kvm17-poa.rbs.com.br,795b917b-ea5b-499d-80c5-aa3aad4f2537]'})' execution failed: VDSGenericException: VDSNetworkException: Heartbeat exceeded
2017-01-06 16:25:37,499 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand] (DefaultQuartzScheduler7) [7bc94bc6] Command 'GetAllVmStatsVDSCommand(HostName = prd-rbs-ovirt-kvm16-poa.rbs.com.br, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', hostId='e2e148b4-00b2-444a-b935-633882e840af', vds='Host[prd-rbs-ovirt-kvm16-poa.rbs.com.br,e2e148b4-00b2-444a-b935-633882e840af]'})' execution failed: VDSGenericException: VDSNetworkException: Heartbeat exceeded
2017-01-06 16:25:37,500 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand] (DefaultQuartzScheduler10) [19b85a77] Command 'GetAllVmStatsVDSCommand(HostName = prd-rbs-ovirt-kvm08-poa.rbs.com.br, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', hostId='91dde882-0c86-4330-a206-499275557534', vds='Host[prd-rbs-ovirt-kvm08-poa.rbs.com.br,91dde882-0c86-4330-a206-499275557534]'})' execution failed: VDSGenericException: VDSNetworkException: Heartbeat exceeded
2017-01-06 16:25:37,501 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand] (DefaultQuartzScheduler8) [1bccfc8a] Command 'GetAllVmStatsVDSCommand(HostName = prd-rbs-ovirt-kvm10-poa.rbs.com.br, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', hostId='8b4bc7dc-af8c-4415-a2ef-bccc11ddf23a', vds='Host[prd-rbs-ovirt-kvm10-poa.rbs.com.br,8b4bc7dc-af8c-4415-a2ef-bccc11ddf23a]'})' execution failed: VDSGenericException: VDSNetworkException: Heartbeat exceeded
2017-01-06 16:25:37,501 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand] (DefaultQuartzScheduler6) [152cde35] Command 'GetAllVmStatsVDSCommand(HostName = prd-rbs-ovirt-kvm01-poa.rbs.com.br, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', hostId='c4a21d0b-f003-4ee1-8b6b-2b26671d410f', vds='Host[prd-rbs-ovirt-kvm01-poa.rbs.com.br,c4a21d0b-f003-4ee1-8b6b-2b26671d410f]'})' execution failed: VDSGenericException: VDSNetworkException: Heartbeat exceeded
2017-01-06 16:25:37,504 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand] (DefaultQuartzScheduler9) [2c0caad] Command 'GetAllVmStatsVDSCommand(HostName = prd-rbs-ovirt-kvm12-poa.rbs.com.br, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', hostId='46a28187-ab69-4871-9182-24dd910d2784', vds='Host[prd-rbs-ovirt-kvm12-poa.rbs.com.br,46a28187-ab69-4871-9182-24dd910d2784]'})' execution failed: VDSGenericException: VDSNetworkException: Heartbeat exceeded
2017-01-06 16:25:37,558 ERROR [org.ovirt.vdsm.jsonrpc.client.JsonRpcClient] (ResponseWorker) [] Not able to update response for "c5556060-f18c-4021-a4b2-8300c9137133"
2017-01-06 16:26:43,588 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) [] Internal server error: null
2017-01-06 16:26:43,588 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) [] Internal server error: null
2017-01-06 16:26:43,589 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) [] Internal server error: null
2017-01-06 16:26:43,589 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) [] Internal server error: null
2017-01-06 16:26:43,589 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) [] Internal server error: null
2017-01-06 16:26:43,589 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) [] Internal server error: null
2017-01-06 16:26:43,590 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand] (DefaultQuartzScheduler1) [585c6b32] Command 'GetAllVmStatsVDSCommand(HostName = prd-rbs-ovirt-kvm09-poa.rbs.com.br, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', hostId='da563ca2-eb39-451d-8ee8-20853b87d341', vds='Host[prd-rbs-ovirt-kvm09-poa.rbs.com.br,da563ca2-eb39-451d-8ee8-20853b87d341]'})' execution failed: VDSGenericException: VDSNetworkException: Heartbeat exceeded
Rogerio,

This fix was backported to 4.0.6, so once it is released you will need to upgrade.