Bug 971368 - Docs: Limitations to cluster.conf setup for newHA
Summary: Docs: Limitations to cluster.conf setup for newHA
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: Messaging_Installation_and_Configuration_Guide
Version: 2.3
Hardware: All
OS: All
Priority: high
Severity: high
Target Milestone: 3.0
Assignee: Jared MORGAN
QA Contact: Frantisek Reznicek
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-06-06 11:21 UTC by Pavel Moravec
Modified: 2015-08-10 01:23 UTC
CC List: 3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-01-22 15:28:13 UTC
Target Upstream Version:
Embargoed:




Links:
Red Hat Bugzilla 970657 (NEW, medium): Failover and/or relocation of qpidd-primary service should be limited to ready brokers only (last updated 2024-01-19 19:11:17 UTC)

Internal Links: 970657

Description Pavel Moravec 2013-06-06 11:21:55 UTC
Description of problem:
Based on https://bugzilla.redhat.com/show_bug.cgi?id=970657#c2, there are a few requirements on cluster.conf in order to make active-passive qpid clusters work properly.

In particular:
1) Manual relocation of the qpidd-primary service cannot be done to a node where the qpid broker is not in the ready state (i.e. it is stopped, or is in the catchup or joining state). Such a relocation will always fail.

2) When using ordered failover domains, use the nofailback option (nofailback="1"). That prevents the following situation from occurring (see the cluster.conf sketch after this list):
- the highest-priority node is joining the cluster and starting the qpidd service
- the qpidd service is in the catchup or joining state
- rgmanager tries to relocate qpidd-primary to this node (which restarts the qpidd broker on the 2nd node, the one currently running qpidd-primary)
- the relocation fails because qpidd on node1 isn't ready, so rgmanager tries to relocate to the 2nd node
- the broker on the 2nd node is in the joining state (it was just restarted), so the qpidd-primary service fails to start
- rgmanager tries to relocate back to the 1st node, closing the infinite loop

3) The primary service recovery policy has to be "relocate", not "restart", because stopping qpidd-primary currently stops/restarts the qpidd broker as well. The newly started broker won't yet be in the ready state when rgmanager attempts to start the qpidd-primary service again.
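
For illustration, a minimal cluster.conf sketch covering points 2) and 3). The domain, node, and service names and the init-script paths are assumptions for the example, not taken from the guide:

  <rm>
    <failoverdomains>
      <!-- point 2: ordered domain with nofailback="1" so rgmanager never pulls
           qpidd-primary back to a preferred node whose broker is still joining -->
      <failoverdomain name="QpiddPrimaryDomain" ordered="1" restricted="1" nofailback="1">
        <failoverdomainnode name="node1.example.com" priority="1"/>
        <failoverdomainnode name="node2.example.com" priority="2"/>
        <failoverdomainnode name="node3.example.com" priority="3"/>
      </failoverdomain>
    </failoverdomains>
    <resources>
      <!-- assumed init-script paths for the broker and the promote-to-primary script -->
      <script file="/etc/init.d/qpidd" name="qpidd"/>
      <script file="/etc/init.d/qpidd-primary" name="qpidd-primary"/>
    </resources>
    <!-- point 3: recover the primary service by relocating it, never by restarting in place -->
    <service name="qpidd-primary-service" domain="QpiddPrimaryDomain" recovery="relocate">
      <script ref="qpidd-primary"/>
    </service>
  </rm>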

Comment 1 Joshua Wulf 2013-09-23 06:55:29 UTC
Added notes about the first two here:

http://deathstar1.usersys.redhat.com:3000/builds/18173-Messaging_Installation_and_Configuration_Guide/#Limitations_in_HA_in_MRG_3

For the third one, about relocate vs restart, the current build at

http://deathstar1.usersys.redhat.com:3000/builds/18173-Messaging_Installation_and_Configuration_Guide/#Configure_rgmanager

has restart in step 9 for the individual node services, and relocate in step 10 for the primary service.
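
In cluster.conf terms that distinction looks roughly like the following (service and domain names are hypothetical; only the recovery attributes matter here):

  <!-- per-node broker service (step 9 style): restart the broker in place on failure -->
  <service name="node1-qpidd-service" domain="node1-domain" recovery="restart">
    <script ref="qpidd"/>
  </service>

  <!-- single primary service (step 10 style): relocate to another node on failure -->
  <service name="qpidd-primary-service" domain="QpiddPrimaryDomain" recovery="relocate">
    <script ref="qpidd-primary"/>
  </service>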

Comment 2 Frantisek Reznicek 2013-12-05 14:56:48 UTC
1), 3) are ok.

2) The wording is not optimal; see the proposed change below:

Failback with ordered domains can cause an infinite failover loop under certain conditions. To avoid this, when using ordered domains use nofailback=1.

replace with (when talking about a domain, it always has to be the [cluster] failover-domain):

Failback with cluster ordered failover-domains (cluster.conf 'ordered=1') can cause an infinite failover loop under certain conditions. To avoid this use cluster ordered failover-domains with nofailback=1 parameter.

-> ASSIGNED

Comment 3 Joshua Wulf 2013-12-12 05:02:02 UTC
Changed to: 

"Failback with cluster ordered failover-domains ('ordered=1' in cluster.conf) can cause an infinite failover loop under certain conditions. To avoid this, use cluster ordered failover-domains with nofailback=1 specified in cluster.conf."

http://deathstar1.usersys.redhat.com:3000/builds/18173-Messaging_Installation_and_Configuration_Guide/#Limitations_in_HA_in_MRG_3

Comment 4 Frantisek Reznicek 2013-12-17 12:26:10 UTC
Thanks for your change, I'm satisfied now.

-> VERIFIED

