Bug 605331 - RFE: TestOnly: Need to support cluster infrastructure running over network links with higher than LAN latency conditions
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: corosync
Version: 7.0
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: rc
Target Release: ---
Assigned To: Jan Friesse
QA Contact: Cluster QE
Keywords: FutureFeature, TestOnly
Depends On:
Blocks: 605332
Reported: 2010-06-17 12:28 EDT by Perry Myers
Modified: 2013-09-04 09:36 EDT (History)

See Also:
Fixed In Version:
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of:
Clones: 605332
Environment:
Last Closed: 2013-09-04 09:36:01 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Perry Myers 2010-06-17 12:28:44 EDT
Description of problem:
Right now RHEL HA (both RHCS and Pacemaker based stacks) requires running on networks with LAN-like latency, which we have defined to be <= 2ms.

The primary constraints on latency are in cluster membership, which is handled by Corosync. In addition, plocks via GFS are also latency sensitive.

For the context of this bug, we are not concerned about GFS over high latency links, just the core cluster infrastructure.

What we need to do is simulate high latency links and test the HA stacks to determine the highest latency that we can support without needing to make significant code or configuration (timeout) changes.

Then we can begin officially QE testing at this higher latency and support links with up to this delay.

For the time being this bug should be considered TestOnly, but it needs testing from a development perspective first, before QE can begin running more comprehensive tests.

The initial use case is to run stretch clusters with 2 sites and between 1 and 8 nodes at each site. The membership list should be configured so that the Totem token does not bounce back and forth across the high latency link, but crosses it as few times as possible (i.e. nodes 1-8 on SiteA and 9-16 on SiteB, meaning the token only crosses the high latency link between nodes 8 and 9 and between nodes 16 and 1).
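
To illustrate the ordering argument above, here is a minimal Python sketch that counts how many token hops per rotation cross between sites for a given ring order. The function name, the site mapping, and the interleaved ordering are hypothetical and chosen only for illustration; the sketch models the ring as a simple circular list and is not Corosync code.

```python
def count_wan_crossings(ring, site_of):
    """Count token hops per full rotation that go from one site to the other.

    ring    -- node ids in the order the token visits them (circular)
    site_of -- mapping of node id to site name (hypothetical layout)
    """
    crossings = 0
    for i, node in enumerate(ring):
        next_node = ring[(i + 1) % len(ring)]  # wrap around to close the ring
        if site_of[node] != site_of[next_node]:
            crossings += 1
    return crossings


# Layout from the description: nodes 1-8 on SiteA, nodes 9-16 on SiteB.
site_of = {n: ("SiteA" if n <= 8 else "SiteB") for n in range(1, 17)}

# Ring grouped by site: 1, 2, ..., 16.
grouped = list(range(1, 17))

# Worst case for comparison: sites alternate (1, 9, 2, 10, ...).
interleaved = [n for pair in zip(range(1, 9), range(9, 17)) for n in pair]

print(count_wan_crossings(grouped, site_of))      # 2
print(count_wan_crossings(interleaved, site_of))  # 16
```

With the grouped order the token crosses the high latency link only twice per rotation (8 to 9 and 16 to 1), whereas an interleaved order would cross it on every hop.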

If specific code changes are required to support this, the engineers testing this feature should file dependent bugs on their components (for example, a bug on Corosync).
Comment 8 Jan Friesse 2013-06-19 11:01:42 EDT
Corosync with properly set timeouts (especially the token timeout) should be able to handle non-LAN conditions quite well. It is really TestOnly.
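
As a rough illustration of why the token timeout is the setting that matters here, the Python sketch below estimates one token rotation as the sum of per-hop latencies and checks it against a timeout. This is a deliberately simplified model under stated assumptions: it ignores retransmits, processing time, and other Totem timers, and the 50 ms link latency, the 1000 ms timeout, and the safety factor are hypothetical numbers, not measured or recommended values. Only the 2 ms LAN figure comes from this bug's description.

```python
def estimate_rotation_ms(node_count, wan_crossings, lan_ms, wan_ms):
    """Estimate one token rotation as the sum of one-way hop latencies.

    A ring of node_count nodes has node_count hops per rotation;
    wan_crossings of them use the high latency link, the rest stay on the LAN.
    """
    lan_hops = node_count - wan_crossings
    return lan_hops * lan_ms + wan_crossings * wan_ms


def fits_token_timeout(rotation_ms, token_timeout_ms, safety_factor=2.0):
    """Return True if the rotation, padded by a safety factor, fits the timeout.

    safety_factor is an assumption of this sketch, not a Corosync parameter.
    """
    return rotation_ms * safety_factor <= token_timeout_ms


# 16 nodes, 2 ms LAN hops, hypothetical 50 ms one-way latency between sites.
grouped_ms = estimate_rotation_ms(16, wan_crossings=2, lan_ms=2, wan_ms=50)
interleaved_ms = estimate_rotation_ms(16, wan_crossings=16, lan_ms=2, wan_ms=50)

print(grouped_ms, fits_token_timeout(grouped_ms, token_timeout_ms=1000))
# 128 True
print(interleaved_ms, fits_token_timeout(interleaved_ms, token_timeout_ms=1000))
# 800 False
```

Under this model, either keeping the number of high latency hops low or raising the token timeout keeps the rotation within budget; actual limits would have to come from testing.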
Comment 9 RHEL Product and Program Management 2013-09-04 09:36:01 EDT
Development Management has reviewed and declined this request.
You may appeal this decision by reopening this request.
