Bug 971368 - Docs: Limitations to cluster.conf setup for newHA
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: Messaging_Installation_and_Configuration_Guide
Version: 2.3
Hardware: All
OS: All
Priority: high
Severity: high
Target Milestone: 3.0
Assigned To: Jared MORGAN
QA Contact: Frantisek Reznicek
Reported: 2013-06-06 07:21 EDT by Pavel Moravec
Modified: 2015-08-09 21:23 EDT
Doc Type: Bug Fix
Last Closed: 2015-01-22 10:28:13 EST
Type: Bug

Attachments: None
Description Pavel Moravec 2013-06-06 07:21:55 EDT
Description of problem:
Based on https://bugzilla.redhat.com/show_bug.cgi?id=970657#c2, there are a few requirements on cluster.conf in order to make active-passive qpid clusters work properly.

In particular:
1) Manual relocation of the qpidd-primary service cannot be done to a node where the qpid broker is not in the ready state (i.e. it is stopped, or is in the catchup or joining state). Such a relocation will always fail.

2) When using ordered failover domains, use the nofailback option (nofailback="1"). That prevents the following situation from occurring:
- the highest-priority node joins the cluster and starts its qpidd service
- the qpidd broker on that node is in the catchup or joining state
- rgmanager tries to relocate qpidd-primary to this node (which restarts the qpidd broker on the 2nd node, the one currently running qpidd-primary)
- the relocation fails because qpidd on the 1st node isn't ready, so rgmanager tries to relocate back to the 2nd node
- the broker on the 2nd node is now in the joining state, so the qpidd-primary service fails to start there
- rgmanager tries to relocate to the 1st node again, closing the infinite loop

3) The primary service recovery policy has to be "relocate", not "restart", because stopping qpidd-primary currently means stopping/restarting the qpidd broker as well, and the newly started broker won't be in the ready state when rgmanager attempts to start the qpidd-primary service again. (A cluster.conf sketch illustrating points 2) and 3) follows below.)
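
The following is a minimal cluster.conf sketch of points 2) and 3); the cluster, domain, node, and service names are placeholder assumptions, not taken from any real deployment:

<?xml version="1.0"?>
<cluster name="qpid-ha" config_version="1">
  <clusternodes>
    <clusternode name="node1.example.com" nodeid="1"/>
    <clusternode name="node2.example.com" nodeid="2"/>
  </clusternodes>
  <rm>
    <failoverdomains>
      <!-- Point 2: an ordered domain must also set nofailback="1",
           otherwise rgmanager can enter the failover loop described above. -->
      <failoverdomain name="qpidd-primary-domain" ordered="1" nofailback="1">
        <failoverdomainnode name="node1.example.com" priority="1"/>
        <failoverdomainnode name="node2.example.com" priority="2"/>
      </failoverdomain>
    </failoverdomains>
    <!-- The per-node qpidd services (recovery="restart") are elided here. -->
    <!-- Point 3: the primary service must use recovery="relocate". -->
    <service name="qpidd-primary-service" domain="qpidd-primary-domain"
             autostart="1" exclusive="0" recovery="relocate">
      <script file="/etc/init.d/qpidd-primary" name="qpidd-primary"/>
    </service>
  </rm>
</cluster>

Regarding point 1), before relocating the primary manually (e.g. "clusvcadm -r qpidd-primary-service -m node2.example.com"), first confirm that the broker on the target node reports the ready state, for example with the qpid-ha status tool.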
Comment 1 Joshua Wulf 2013-09-23 02:55:29 EDT
Added notes about the first two here:

http://deathstar1.usersys.redhat.com:3000/builds/18173-Messaging_Installation_and_Configuration_Guide/#Limitations_in_HA_in_MRG_3

With the third one, about relocate vs. restart, currently

http://deathstar1.usersys.redhat.com:3000/builds/18173-Messaging_Installation_and_Configuration_Guide/#Configure_rgmanager

has "restart" in step 9 for the individual node services, and "relocate" in step 10 for the primary service.
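
For clarity, the two kinds of service definitions differ in their recovery policy; roughly like this (a sketch; the service, domain, and script names are placeholders):

<!-- Per-node broker service: restart in place on failure. -->
<service name="node1-qpidd-service" domain="node1-domain" recovery="restart">
  <script file="/etc/init.d/qpidd" name="qpidd"/>
</service>

<!-- Primary service: move to another node on failure, never restart. -->
<service name="qpidd-primary-service" domain="qpidd-primary-domain" recovery="relocate">
  <script file="/etc/init.d/qpidd-primary" name="qpidd-primary"/>
</service>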
Comment 2 Frantisek Reznicek 2013-12-05 09:56:48 EST
1) and 3) are OK.

2) The wording is not optimal; see the proposed change below:

Failback with ordered domains can cause an infinite failover loop under certain conditions. To avoid this, when using ordered domains use nofailback=1.

should be replaced with the following (when talking about a domain, it always has to be a [cluster] failover-domain):

Failback with cluster ordered failover-domains (cluster.conf 'ordered=1') can cause an infinite failover loop under certain conditions. To avoid this use cluster ordered failover-domains with nofailback=1 parameter.
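
For reference, such a domain is declared in cluster.conf roughly as follows (a sketch; the domain and node names are placeholders):

<failoverdomain name="qpidd-primary-domain" ordered="1" nofailback="1">
  <failoverdomainnode name="node1.example.com" priority="1"/>
  <failoverdomainnode name="node2.example.com" priority="2"/>
</failoverdomain>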

-> ASSIGNED
Comment 3 Joshua Wulf 2013-12-12 00:02:02 EST
Changed to: 

"Failback with cluster ordered failover-domains ('ordered=1' in cluster.conf) can cause an infinite failover loop under certain conditions. To avoid this, use cluster ordered failover-domains with nofailback=1 specified in cluster.conf."

http://deathstar1.usersys.redhat.com:3000/builds/18173-Messaging_Installation_and_Configuration_Guide/#Limitations_in_HA_in_MRG_3
Comment 4 Frantisek Reznicek 2013-12-17 07:26:10 EST
Thanks for your change, I'm satisfied now.

-> VERIFIED
