Bug 605331 - RFE: TestOnly: Need to support cluster infrastructure running over network links with higher than LAN latency conditions
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: corosync
Version: 7.0
Hardware: All
OS: Linux
Target Milestone: rc
Assignee: Jan Friesse
QA Contact: Cluster QE
Keywords: FutureFeature, TestOnly
Depends On:
Blocks: 605332
Reported: 2010-06-17 16:28 UTC by Perry Myers
Modified: 2013-09-04 13:36 UTC

Clone Of:
: 605332 (view as bug list)
Last Closed: 2013-09-04 13:36:01 UTC

Description Perry Myers 2010-06-17 16:28:44 UTC
Description of problem:
Right now RHEL HA (both the RHCS and Pacemaker-based stacks) requires networks with LAN-like latency, which we have defined as <= 2ms.

The primary latency constraint is in membership, which is handled by Corosync. In addition, plocks via GFS are latency sensitive.

For the context of this bug, we are not concerned about GFS over high latency links, just the core cluster infrastructure.

What we need to do is simulate high-latency links and test the HA stacks to determine the highest latency we can support without making significant code or configuration (timeout) changes.

Then we can begin official QE testing at this higher latency and support links with up to this delay.

For the time being this bug should be considered TestOnly, but it needs testing from a development perspective before QE can begin running more comprehensive tests.

The initial use case is to run stretch clusters with 2 sites and between 1 and 8 nodes at each site. The membership list should be configured so that the Totem token does not bounce back and forth across the high-latency link, but crosses it as few times as possible (e.g. nodes 1-8 on SiteA and 9-16 on SiteB, so the token crosses the high-latency link only between nodes 8 and 9 and between nodes 16 and 1).
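
To illustrate the ordering argument, here is a minimal sketch that counts how many times per rotation the token would cross the inter-site link for a given ring order. It assumes the token visits nodes in the listed order and wraps from the last node back to the first; the function name `count_wan_crossings` and the `site_of` mapping are hypothetical helpers for this example, not part of Corosync.

```python
def count_wan_crossings(ring, site_of):
    """Count token hops per rotation whose endpoints are at different sites.

    Assumption: the token is passed in the order given by `ring` and then
    wraps from the last node back to the first.
    """
    crossings = 0
    for i, node in enumerate(ring):
        next_node = ring[(i + 1) % len(ring)]  # wrap around the ring
        if site_of[node] != site_of[next_node]:
            crossings += 1
    return crossings


# Hypothetical layout from the description: nodes 1-8 on SiteA, 9-16 on SiteB.
site_of = {n: ("SiteA" if n <= 8 else "SiteB") for n in range(1, 17)}

# Nodes grouped by site: the token crosses only at 8 -> 9 and 16 -> 1.
grouped = list(range(1, 17))

# Worst case for comparison: sites alternate (1, 9, 2, 10, ...).
interleaved = [n for pair in zip(range(1, 9), range(9, 17)) for n in pair]

print(count_wan_crossings(grouped, site_of))      # 2
print(count_wan_crossings(interleaved, site_of))  # 16
```

With the grouped order the link latency is paid twice per token rotation. With the alternating order it is paid on every hop.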

If specific code changes are required to support this, the engineers testing this feature should file dependent bugs against their components (for example, a bug against Corosync).

Comment 8 Jan Friesse 2013-06-19 15:01:42 UTC
Corosync with properly set timeouts (especially the token timeout) should be able to handle non-LAN conditions quite well. This really is TestOnly.

Comment 9 RHEL Product and Program Management 2013-09-04 13:36:01 UTC
Development Management has reviewed and declined this request.
You may appeal this decision by reopening this request.
