Bug 1393714 - Servers' state isn't stable and they change state to non-responsive every few minutes if one host in the DC is non-responsive with the engine
Summary: Servers' state isn't stable and they change state to non-responsive every fe...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm-jsonrpc-java
Classification: oVirt
Component: Core
Version: ---
Hardware: x86_64
OS: Linux
medium
urgent
Target Milestone: ovirt-4.1.0-alpha
: ---
Assignee: Piotr Kliczewski
QA Contact: Michael Burman
URL:
Whiteboard:
Depends On:
Blocks: 1395625
 
Reported: 2016-11-10 08:27 UTC by Michael Burman
Modified: 2017-02-15 15:07 UTC (History)
10 users

Fixed In Version:
Clone Of:
: 1395625
Environment:
Last Closed: 2017-02-15 15:07:54 UTC
oVirt Team: Infra
Embargoed:
rule-engine: ovirt-4.1+
rule-engine: blocker+
rule-engine: planning_ack+
oourfali: devel_ack+
lsvaty: testing_ack+


Attachments
engine logs (991.19 KB, application/x-gzip)
2016-11-10 08:27 UTC, Michael Burman
no flags Details
vdsm logs (391.92 KB, application/x-gzip)
2016-11-10 08:56 UTC, Michael Burman
no flags Details
engine log (955.60 KB, application/x-gzip)
2016-11-10 09:24 UTC, Michael Burman
no flags Details
tcpdump (50.47 KB, application/x-gzip)
2016-11-10 10:09 UTC, Michael Burman
no flags Details
engine in debug (2.15 MB, application/x-gzip)
2016-11-15 15:35 UTC, Michael Burman
no flags Details


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 66799 0 master MERGED Closing channel could take too much reactor's time 2021-01-26 14:26:29 UTC
oVirt gerrit 66838 0 ovirt-4.0 MERGED Closing channel could take too much reactor's time 2021-01-26 14:26:29 UTC
oVirt gerrit 67016 0 master MERGED jsonrpc: version bump 2021-01-26 14:25:46 UTC
oVirt gerrit 67017 0 ovirt-engine-4.0 MERGED jsonrpc: version bump 2021-01-26 14:25:46 UTC
oVirt gerrit 67018 0 ovirt-engine-4.0.6 MERGED jsonrpc: version bump 2021-01-26 14:25:46 UTC

Description Michael Burman 2016-11-10 08:27:19 UTC
Created attachment 1219244 [details]
engine logs

Description of problem:
Servers' state isn't stable: they change state to non-responsive every few minutes.

On the latest master, the servers keep moving between non-responsive and up every few minutes.

2016-11-10 10:16:47,116 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) [] Internal server error: null
2016-11-10 10:16:47,116 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) [] Internal server error: null
2016-11-10 10:16:47,128 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand] (DefaultQuartzScheduler3) [31d4820c] Command 'GetAllVmStatsVDSCommand(HostName = orchid-vds1.qa.lab.tlv.redhat.com, VdsIdVDSCommandParametersBase:{runAsync='true', hostId='a8ac96b7-bce5-4039-b4ee-e608f34ceac7'})' execution failed: VDSGenericException: VDSNetworkException: Heartbeat exceeded
2016-11-10 10:16:47,128 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.PollVmStatsRefresher] (DefaultQuartzScheduler3) [31d4820c] Failed to fetch vms info for host 'orchid-vds1.qa.lab.tlv.redhat.com' - skipping VMs monitoring.
2016-11-10 10:16:47,154 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand] (DefaultQuartzScheduler2) [7e87bdde] Command 'GetAllVmStatsVDSCommand(HostName = navy-vds1.qa.lab.tlv.redhat.com, VdsIdVDSCommandParametersBase:{runAsync='true', hostId='cd60d5ed-3b0e-46e5-be1a-272ed6516c45'})' execution failed: VDSGenericException: VDSNetworkException: Heartbeat exceeded
2016-11-10 10:16:47,154 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.PollVmStatsRefresher] (DefaultQuartzScheduler2) [7e87bdde] Failed to fetch vms info for host 'navy-vds1.qa.lab.tlv.redhat.com' - skipping VMs monitoring.
2016-11-10 10:16:47,157 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler7) [7e920fc] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM orchid-vds1.qa.lab.tlv.redhat.com command failed: Heartbeat exceeded
2016-11-10 10:16:47,158 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetStatsVDSCommand] (DefaultQuartzScheduler7) [7e920fc] Command 'GetStatsVDSCommand(HostName = orchid-vds1.qa.lab.tlv.redhat.com, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', hostId='a8ac96b7-bce5-4039-b4ee-e608f34ceac7', vds='Host[orchid-vds1.qa.lab.tlv.redhat.com,a8ac96b7-bce5-4039-b4ee-e608f34ceac7]'})' execution failed: VDSGenericException: VDSNetworkException: Heartbeat exceeded
2016-11-10 10:16:47,158 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (DefaultQuartzScheduler7) [7e920fc] Failed getting vds stats, host='orchid-vds1.qa.lab.tlv.redhat.com'(a8ac96b7-bce5-4039-b4ee-e608f34ceac7): org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: VDSGenericException: VDSNetworkException: Heartbeat exceeded
2016-11-10 10:16:47,158 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (DefaultQuartzScheduler7) [7e920fc] Failure to refresh host 'orchid-vds1.qa.lab.tlv.redhat.com' runtime info: VDSGenericException: VDSNetworkException: Heartbeat exceeded
2016-11-10 10:16:47,158 WARN  [org.ovirt.engine.core.vdsbroker.VdsManager] (DefaultQuartzScheduler7) [7e920fc] Failed to refresh VDS, network error, continuing, vds='orchid-vds1.qa.lab.tlv.redhat.com'(a8ac96b7-bce5-4039-b4ee-e608f34ceac7): VDSGenericException: VDSNetworkException: Heartbeat exceeded
2016-11-10 10:16:47,159 WARN  [org.ovirt.engine.core.vdsbroker.VdsManager] (org.ovirt.thread.pool-6-thread-17) [31d4820c] Host 'orchid-vds1.qa.lab.tlv.redhat.com' is not responding.
2016-11-10 10:16:47,164 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler10) [33fad06d] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM orchid-vds1.qa.lab.tlv.redhat.com command failed: Heartbeat exceeded
2016-11-10 10:16:47,165 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStatusVDSCommand] (DefaultQuartzScheduler10) [33fad06d] Command 'SpmStatusVDSCommand(HostName = orchid-vds1.qa.lab.tlv.redhat.com, SpmStatusVDSCommandParameters:{runAsync='true', hostId='a8ac96b7-bce5-4039-b4ee-e608f34ceac7', storagePoolId='ea5798e5-47b6-4e81-8005-cceffdddcd5b'})' execution failed: VDSGenericException: VDSNetworkException: Heartbeat exceeded
2016-11-10 10:16:47,180 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler8) [db08d74] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM navy-vds1.qa.lab.tlv.redhat.com command failed: Heartbeat exceeded
2016-11-10 10:16:47,181 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetStatsVDSCommand] (DefaultQuartzScheduler8) [db08d74] Command 'GetStatsVDSCommand(HostName = navy-vds1.qa.lab.tlv.redhat.com, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', hostId='cd60d5ed-3b0e-46e5-be1a-272ed6516c45', vds='Host[navy-vds1.qa.lab.tlv.redhat.com,cd60d5ed-3b0e-46e5-be1a-272ed6516c45]'})' execution failed: VDSGenericException: VDSNetworkException: Heartbeat exceeded
2016-11-10 10:16:47,181 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (DefaultQuartzScheduler8) [db08d74] Failed getting vds stats, host='navy-vds1.qa.lab.tlv.redhat.com'(cd60d5ed-3b0e-46e5-be1a-272ed6516c45): org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: VDSGenericException: VDSNetworkException: Heartbeat exceeded
2016-11-10 10:16:47,181 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (DefaultQuartzScheduler8) [db08d74] Failure to refresh host 'navy-vds1.qa.lab.tlv.redhat.com' runtime info: VDSGenericException: VDSNetworkException: Heartbeat exceeded
2016-11-10 10:16:47,181 WARN  [org.ovirt.engine.core.vdsbroker.VdsManager] (DefaultQuartzScheduler8) [db08d74] Failed to refresh VDS, network error, continuing, vds='navy-vds1.qa.lab.tlv.redhat.com'(cd60d5ed-3b0e-46e5-be1a-272ed6516c45): VDSGenericException: VDSNetworkException: Heartbeat exceeded

2016-11-10 10:21:11,124 INFO  [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connecting to orchid-vds1.qa.lab.tlv.redhat.com/10.35.128.22
2016-11-10 10:21:11,579 INFO  [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connecting to navy-vds1.qa.lab.tlv.redhat.com/10.35.128.14
2016-11-10 10:21:11,591 ERROR [org.ovirt.vdsm.jsonrpc.client.JsonRpcClient] (ResponseWorker) [] Not able to update response for "2d5664fa-cc8e-459d-b282-623f619b6445"
2016-11-10 10:21:11,591 ERROR [org.ovirt.vdsm.jsonrpc.client.JsonRpcClient] (ResponseWorker) [] Not able to update response for "3aecd3b6-43f3-4caf-8642-486156591a97"

Version-Release number of selected component (if applicable):
4.1.0-0.0.master.20161109091313.gitedb19fb.el7.centos

How reproducible:
100%

Comment 1 Oved Ourfali 2016-11-10 08:32:02 UTC
Can you attach vdsm.log as well?

Comment 2 Michael Burman 2016-11-10 08:56:02 UTC
Created attachment 1219247 [details]
vdsm logs

Comment 3 Piotr Kliczewski 2016-11-10 08:58:09 UTC
Looking at the logs I see that issues started from:

2016-11-10 03:45:58,466 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) [] Unable to process messages: Connection reset by peer

It is just after the installation of camel-vdsa.qa.lab.tlv.redhat.com. Later we see plenty of failures for other hosts due to heartbeat exceptions.

Please make sure that there are no networking issues between the engine, vdsm and storage.
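
As background on what "Heartbeat exceeded" means here: the engine-side client tracks when it last received anything from each host and marks the host non-responsive once that age passes the configured heartbeat interval. The Java snippet below is a minimal sketch of that idea only; it is not the actual vdsm-jsonrpc-java code, and the class and method names are invented for illustration.

// Illustrative sketch -- not the vdsm-jsonrpc-java implementation.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class HeartbeatWatchdog {
    private final long heartbeatMillis;   // configured heartbeat interval, in milliseconds
    private final Map<String, Long> lastSeen = new ConcurrentHashMap<>();

    HeartbeatWatchdog(long heartbeatMillis) {
        this.heartbeatMillis = heartbeatMillis;
    }

    // Record that a frame (response, event or heartbeat) arrived from a host.
    void onFrameReceived(String host) {
        lastSeen.put(host, System.currentTimeMillis());
    }

    // Polled periodically by monitoring; true corresponds to the engine
    // reporting "Heartbeat exceeded" and marking the host non-responsive.
    boolean heartbeatExceeded(String host) {
        Long seen = lastSeen.get(host);
        return seen == null || System.currentTimeMillis() - seen > heartbeatMillis;
    }
}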

Comment 4 Oved Ourfali 2016-11-10 09:02:47 UTC
Also, I see in the supervdsm log on navy-vds1 that the "n1" interface failed to get an IP address. Is that a required network?

Comment 5 Piotr Kliczewski 2016-11-10 09:20:16 UTC
I tried to connect to camel-vdsa.qa.lab.tlv.redhat.com but I was unable to. It seems that adding this host caused network instability which resulted in heartbeat exceptions for other hosts, and later the engine was not able to reconnect to them.

Do we know what happened to camel host?

Comment 6 Michael Burman 2016-11-10 09:21:13 UTC
We don't have network issues. I think I understand what triggers this behavior; it happens on 4.0.5 as well.
Once a host in an OVS cluster becomes non-responsive, this behavior starts and all the other servers in the DC (legacy cluster) start changing state to non-responsive and back to up.

Attaching new engine logs. I initiated a reboot on an OVS host, this behavior started right away, and all hosts in the legacy cluster became non-responsive.
I think it is somehow related to OVS, but I don't understand how exactly.

Comment 7 Michael Burman 2016-11-10 09:22:07 UTC
(In reply to Piotr Kliczewski from comment #5)
> I tried to connect to camel-vdsa.qa.lab.tlv.redhat.com but I was unable to.
> It seems that adding this host caused network instability which resulted in
> heartbeat exceptions for other hosts, and later the engine was not able to
> reconnect to them.
> 
> Do we know what happened to camel host?

Yes, it's an OVS host (in an OVS cluster) which I rebooted in order to reproduce this bug.

Comment 8 Michael Burman 2016-11-10 09:24:16 UTC
Created attachment 1219253 [details]
engine log

Comment 9 Piotr Kliczewski 2016-11-10 09:25:39 UTC
Michael,

It seems that it could be related. Please provide a tcpdump for the engine.

In my opinion it seems to be a network team issue.

Comment 10 Piotr Kliczewski 2016-11-10 09:38:55 UTC
I checked the logs and it was not the installation but the host upgrade manager.

Comment 11 Piotr Kliczewski 2016-11-10 09:43:50 UTC
It looks like OVS affects engine-to-vdsm communication, which needs to be investigated by the network team. A tcpdump taken on the engine host would still be nice to have.

Comment 12 Michael Burman 2016-11-10 10:09:30 UTC
Created attachment 1219292 [details]
tcpdump

It's a tcpdump on the engine with one of the servers that changes state once the host in the OVS cluster goes down (src+dst).

Comment 13 Michael Burman 2016-11-14 08:26:54 UTC
Updating this report: this issue happens with legacy servers as well. Once a server is non-responsive with the engine, all other servers in the DC are affected,
both on master and on 4.0.5.

Comment 14 Michael Burman 2016-11-15 10:07:19 UTC
Can someone investigate this report? It is 100% reproducible on all setups and versions. And it's not network related: we tested whether there was a network communication issue between the engine and the other hosts in the setup while one server was in the non-responsive state, and we saw no disconnections between the engine and the other hosts. It looks like the engine is mistaken/confused and reports the wrong state for its connections to the other servers in the setup.

We ran the following script on the engine against the servers in the setup:

from time import time, sleep
from vdsm import jsonrpcvdscli
import sys

# Host to ping is taken from the first argument; defaults to localhost.
host = sys.argv[1] if len(sys.argv) > 1 else 'localhost'

s = jsonrpcvdscli.connect(host=host)

# Ping the host over JSON-RPC twice a second and print the round-trip time,
# or a message if the ping gets no response.
while True:
    t0 = time()
    try:
        s.ping()
        print(time() - t0)
    except Exception:
        print("No response for JSON-RPC Host.ping request")
    sleep(0.5)

There were no failure messages printed and the ping round-trip time was consistent, but the engine reported that the servers' state was changing every few minutes. This may be caused by recent changes in vdsm-jsonrpc.

Comment 15 Piotr Kliczewski 2016-11-15 10:44:40 UTC
Michael,

Please do one test for me. Take any version you tested and configure a bunch of hosts. Block one of them and see whether you get the same results.
Once you reproduce the same issue without using OVS, please attach the logs.

Comment 16 Michael Burman 2016-11-15 15:33:51 UTC
Piotr, 

I'm attaching the engine log in debug mode from our tests today.

Comment 17 Michael Burman 2016-11-15 15:35:15 UTC
Created attachment 1220866 [details]
engine in debug

Comment 18 Piotr Kliczewski 2016-11-16 08:31:30 UTC
Michael, can you check which versions are affected?

Comment 19 Michael Burman 2016-11-16 08:43:10 UTC
(In reply to Piotr Kliczewski from comment #18)
> Michael, can you check which versions are affected?

Latest master 4.1 and rhevm-4.0.5.5-0.1.el7ev.noarch

Comment 20 Piotr Kliczewski 2016-11-16 09:06:17 UTC
During my investigation of the issue I saw that while one host is down, every 3-4 minutes the other hosts go down for less than a second with "Heartbeat exceeded" and then come back Up. This keeps happening for as long as that host is down. The fix was tested on Michael's environment.
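
The linked gerrit patches are titled "Closing channel could take too much reactor's time", which matches this picture: a single selector thread serves all host connections, so a slow close of the dead host's channel inside that loop can delay reading frames from the healthy hosts long enough to trip their heartbeat checks. The sketch below only illustrates that pattern and the mitigation of offloading the close to a worker thread; it is not the actual vdsm-jsonrpc-java reactor code, and every name in it is invented.

// Illustrative sketch of reactor starvation by a blocking channel close.
import java.io.IOException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ReactorLoopSketch implements Runnable {
    private final Selector selector;
    // Mitigation: hand the potentially slow close to a worker thread so the
    // selector loop keeps pumping reads for the healthy hosts.
    private final ExecutorService closer = Executors.newSingleThreadExecutor();

    ReactorLoopSketch(Selector selector) {
        this.selector = selector;
    }

    @Override
    public void run() {
        try {
            while (selector.isOpen()) {
                selector.select(1000);
                for (SelectionKey key : selector.selectedKeys()) {
                    if (!key.isValid()) {
                        // Problem pattern: calling channel.close() right here can
                        // stall the whole loop (e.g. an SSL close that has to time
                        // out against an unreachable host). Offload it instead.
                        SocketChannel channel = (SocketChannel) key.channel();
                        closer.submit(() -> closeQuietly(channel));
                        continue;
                    }
                    if (key.isReadable()) {
                        // read frames, update heartbeat timestamps, etc.
                    }
                }
                selector.selectedKeys().clear();
            }
        } catch (IOException e) {
            // reactor shutdown / error handling elided
        }
    }

    private static void closeQuietly(SocketChannel channel) {
        try {
            channel.close();
        } catch (IOException ignored) {
        }
    }
}

In the log excerpts above, this selector thread corresponds to the one named "SSL Stomp Reactor".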

Comment 22 Sandro Bonazzola 2016-12-12 14:03:33 UTC
The fix for this issue should be included in oVirt 4.1.0 beta 1, released on December 1st. If it is not included, please move the bug back to MODIFIED.

Comment 23 Michael Burman 2016-12-13 12:29:35 UTC
Verified on -  4.1.0-0.2.master.20161210231201.git26a385e.el7.centos

Comment 24 Rogerio Ceni Coelho 2017-01-06 18:39:54 UTC
I think I have the same problem on my oVirt 4.0.5 installation. How can I solve this?

[root@prd-rbs-ovirt01-poa ~]# grep -i error /var/log/ovirt-engine/engine.log | grep -v org.ovirt.engine.core.vdsbroker.HostDevListByCapsVDSCommand | tail -50 
2017-01-06 16:20:06,897 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand] (DefaultQuartzScheduler6) [a6fc972] Command 'GetAllVmStatsVDSCommand(HostName = prd-rbs-ovirt-kvm10-poa.rbs.com.br, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', hostId='8b4bc7dc-af8c-4415-a2ef-bccc11ddf23a', vds='Host[prd-rbs-ovirt-kvm10-poa.rbs.com.br,8b4bc7dc-af8c-4415-a2ef-bccc11ddf23a]'})' execution failed: VDSGenericException: VDSNetworkException: Heartbeat exceeded
2017-01-06 16:20:06,899 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand] (DefaultQuartzScheduler10) [19b85a77] Command 'GetAllVmStatsVDSCommand(HostName = prd-rbs-ovirt-kvm17-poa.rbs.com.br, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', hostId='795b917b-ea5b-499d-80c5-aa3aad4f2537', vds='Host[prd-rbs-ovirt-kvm17-poa.rbs.com.br,795b917b-ea5b-499d-80c5-aa3aad4f2537]'})' execution failed: VDSGenericException: VDSNetworkException: Heartbeat exceeded
2017-01-06 16:20:06,901 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand] (DefaultQuartzScheduler8) [1bccfc8a] Command 'GetAllVmStatsVDSCommand(HostName = prd-rbs-ovirt-kvm19-poa.rbs.com.br, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', hostId='887b6e35-1fd1-4cd6-9e78-05bcab12a417', vds='Host[prd-rbs-ovirt-kvm19-poa.rbs.com.br,887b6e35-1fd1-4cd6-9e78-05bcab12a417]'})' execution failed: VDSGenericException: VDSNetworkException: Heartbeat exceeded
2017-01-06 16:23:26,020 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand] (DefaultQuartzScheduler1) [7771902b] Command 'GetAllVmStatsVDSCommand(HostName = prd-rbs-ovirt-kvm06-poa.rbs.com.br, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', hostId='23b750ed-4ba6-4782-a9bd-c018c0f36e44', vds='Host[prd-rbs-ovirt-kvm06-poa.rbs.com.br,23b750ed-4ba6-4782-a9bd-c018c0f36e44]'})' execution failed: VDSGenericException: VDSNetworkException: Heartbeat exceeded
2017-01-06 16:23:26,043 ERROR [org.ovirt.vdsm.jsonrpc.client.JsonRpcClient] (ResponseWorker) [] Not able to update response for "e5f7a478-841d-413c-baa1-d63632be7748"
2017-01-06 16:24:32,480 ERROR [org.ovirt.vdsm.jsonrpc.client.JsonRpcClient] (ResponseWorker) [] Not able to update response for "75bf7e9a-fb5f-4253-bdf6-ff0fcbf8c876"
2017-01-06 16:24:32,485 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler6) [35cc958c] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM prd-rbs-ovirt-kvm06-poa.rbs.com.br command failed: Heartbeat exceeded
2017-01-06 16:24:32,485 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStatusVDSCommand] (DefaultQuartzScheduler6) [35cc958c] Command 'SpmStatusVDSCommand(HostName = prd-rbs-ovirt-kvm06-poa.rbs.com.br, SpmStatusVDSCommandParameters:{runAsync='true', hostId='23b750ed-4ba6-4782-a9bd-c018c0f36e44', storagePoolId='00000001-0001-0001-0001-000000000198'})' execution failed: VDSGenericException: VDSNetworkException: Heartbeat exceeded
2017-01-06 16:24:32,524 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler6) [152cde35] Correlation ID: 152cde35, Call Stack: null, Custom Event ID: -1, Message: Invalid status on Data Center RBS. Setting Data Center status to Non Responsive (On host prd-rbs-ovirt-kvm06-poa.rbs.com.br, Error: Network error during communication with the Host.).
2017-01-06 16:25:37,493 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) [] Internal server error: null
2017-01-06 16:25:37,494 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) [] Internal server error: null
2017-01-06 16:25:37,494 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) [] Internal server error: null
2017-01-06 16:25:37,494 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) [] Internal server error: null
2017-01-06 16:25:37,494 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) [] Internal server error: null
2017-01-06 16:25:37,494 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) [] Internal server error: null
2017-01-06 16:25:37,495 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand] (DefaultQuartzScheduler4) [44e1d6ed] Command 'GetAllVmStatsVDSCommand(HostName = prd-rbs-ovirt-kvm03-poa.rbs.com.br, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', hostId='f7842244-646c-400a-9736-f8d4aa9b1cef', vds='Host[prd-rbs-ovirt-kvm03-poa.rbs.com.br,f7842244-646c-400a-9736-f8d4aa9b1cef]'})' execution failed: VDSGenericException: VDSNetworkException: Heartbeat exceeded
2017-01-06 16:25:37,499 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand] (DefaultQuartzScheduler5) [4fb9c0c9] Command 'GetAllVmStatsVDSCommand(HostName = prd-rbs-ovirt-kvm17-poa.rbs.com.br, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', hostId='795b917b-ea5b-499d-80c5-aa3aad4f2537', vds='Host[prd-rbs-ovirt-kvm17-poa.rbs.com.br,795b917b-ea5b-499d-80c5-aa3aad4f2537]'})' execution failed: VDSGenericException: VDSNetworkException: Heartbeat exceeded
2017-01-06 16:25:37,499 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand] (DefaultQuartzScheduler7) [7bc94bc6] Command 'GetAllVmStatsVDSCommand(HostName = prd-rbs-ovirt-kvm16-poa.rbs.com.br, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', hostId='e2e148b4-00b2-444a-b935-633882e840af', vds='Host[prd-rbs-ovirt-kvm16-poa.rbs.com.br,e2e148b4-00b2-444a-b935-633882e840af]'})' execution failed: VDSGenericException: VDSNetworkException: Heartbeat exceeded
2017-01-06 16:25:37,500 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand] (DefaultQuartzScheduler10) [19b85a77] Command 'GetAllVmStatsVDSCommand(HostName = prd-rbs-ovirt-kvm08-poa.rbs.com.br, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', hostId='91dde882-0c86-4330-a206-499275557534', vds='Host[prd-rbs-ovirt-kvm08-poa.rbs.com.br,91dde882-0c86-4330-a206-499275557534]'})' execution failed: VDSGenericException: VDSNetworkException: Heartbeat exceeded
2017-01-06 16:25:37,501 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand] (DefaultQuartzScheduler8) [1bccfc8a] Command 'GetAllVmStatsVDSCommand(HostName = prd-rbs-ovirt-kvm10-poa.rbs.com.br, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', hostId='8b4bc7dc-af8c-4415-a2ef-bccc11ddf23a', vds='Host[prd-rbs-ovirt-kvm10-poa.rbs.com.br,8b4bc7dc-af8c-4415-a2ef-bccc11ddf23a]'})' execution failed: VDSGenericException: VDSNetworkException: Heartbeat exceeded
2017-01-06 16:25:37,501 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand] (DefaultQuartzScheduler6) [152cde35] Command 'GetAllVmStatsVDSCommand(HostName = prd-rbs-ovirt-kvm01-poa.rbs.com.br, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', hostId='c4a21d0b-f003-4ee1-8b6b-2b26671d410f', vds='Host[prd-rbs-ovirt-kvm01-poa.rbs.com.br,c4a21d0b-f003-4ee1-8b6b-2b26671d410f]'})' execution failed: VDSGenericException: VDSNetworkException: Heartbeat exceeded
2017-01-06 16:25:37,504 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand] (DefaultQuartzScheduler9) [2c0caad] Command 'GetAllVmStatsVDSCommand(HostName = prd-rbs-ovirt-kvm12-poa.rbs.com.br, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', hostId='46a28187-ab69-4871-9182-24dd910d2784', vds='Host[prd-rbs-ovirt-kvm12-poa.rbs.com.br,46a28187-ab69-4871-9182-24dd910d2784]'})' execution failed: VDSGenericException: VDSNetworkException: Heartbeat exceeded
2017-01-06 16:25:37,558 ERROR [org.ovirt.vdsm.jsonrpc.client.JsonRpcClient] (ResponseWorker) [] Not able to update response for "c5556060-f18c-4021-a4b2-8300c9137133"
2017-01-06 16:26:43,588 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) [] Internal server error: null
2017-01-06 16:26:43,588 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) [] Internal server error: null
2017-01-06 16:26:43,589 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) [] Internal server error: null
2017-01-06 16:26:43,589 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) [] Internal server error: null
2017-01-06 16:26:43,589 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) [] Internal server error: null
2017-01-06 16:26:43,589 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) [] Internal server error: null
2017-01-06 16:26:43,590 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand] (DefaultQuartzScheduler1) [585c6b32] Command 'GetAllVmStatsVDSCommand(HostName = prd-rbs-ovirt-kvm09-poa.rbs.com.br, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', hostId='da563ca2-eb39-451d-8ee8-20853b87d341', vds='Host[prd-rbs-ovirt-kvm09-poa.rbs.com.br,da563ca2-eb39-451d-8ee8-20853b87d341]'})' execution failed: VDSGenericException: VDSNetworkException: Heartbeat exceeded

Comment 25 Piotr Kliczewski 2017-01-09 08:17:49 UTC
Rogerio,

This fix was backported to 4.0.6, so once it is released you will need to upgrade.

