Bug 1384197 - Frequent heartbeat exceeded timeouts and hosts going non-responsive
Summary: Frequent heartbeat exceeded timeouts and hosts going non-responsive
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.0.3
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Piotr Kliczewski
QA Contact: meital avital
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-10-12 19:16 UTC by Gordon Watson
Modified: 2016-10-19 12:33 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-10-18 09:16:36 UTC
oVirt Team: Infra
Target Upstream Version:



Description Gordon Watson 2016-10-12 19:16:38 UTC
Description of problem:

The customer has three hosts in the only cluster in this RHEV 4.0 environment. From time to time, all three encounter "Heartbeat exeeded" timeouts and, during those periods, sometimes go non-responsive.

They have 'vdsHeartbeatInSeconds' currently set to 20.
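
For reference, the configured value can be read back on the RHEV-M machine with engine-config. This is only a minimal sketch, assuming a root shell on the machine where ovirt-engine is installed; the existence check and the fallback message are illustrative additions, not part of the customer's setup.

```shell
# Assumption: executed as root on the RHEV-M machine.
# Prints the currently configured engine-to-host heartbeat interval (seconds).
if command -v engine-config >/dev/null 2>&1; then
    engine-config -g vdsHeartbeatInSeconds
else
    # engine-config ships with ovirt-engine, so it is absent on the hypervisor hosts.
    echo "engine-config not found; run this on the RHEV-M machine"
fi
```

In this environment the command would be expected to report the value of 20 mentioned above.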

The customer has disabled Power Management to prevent hosts from getting fenced.

RHEV-M runs inside a VM within a VMware environment.


Version-Release number of selected component (if applicable):

RHEV-M 4.0.3
RHVH-4.0 with vdsm-4.18.11-1.el7


How reproducible:

Not reproducible on demand. The timeouts occur at random.


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Details to follow.

