971368 – Docs: Limitations to cluster.conf setup for newHA

Summary: Docs: Limitations to cluster.conf setup for newHA

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Enterprise MRG
Classification:	Red Hat
Component:	Messaging_Installation_and_Configuration_Guide
Sub Component:
Version:	2.3
Hardware:	All
OS:	All
Priority:	high
Severity:	high
Target Milestone:	3.0
Target Release:	---
Assignee:	Jared MORGAN
QA Contact:	Frantisek Reznicek
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2013-06-06 11:21 UTC by Pavel Moravec
Modified:	2015-08-10 01:23 UTC
CC List:	3 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2015-01-22 15:28:13 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Bugzilla	970657	0	medium	CLOSED	Failover and/or relocation of qpidd-primary service should be limited to ready brokers only	2025-02-10 03:27:51 UTC

Internal Links: 970657

Description Pavel Moravec 2013-06-06 11:21:55 UTC

Description of problem:
Based on https://bugzilla.redhat.com/show_bug.cgi?id=970657#c2, there are a few requirements on cluster.conf for active-passive qpid clusters to work properly.

In particular:
1) The qpidd-primary service cannot be manually relocated to a node where the qpid broker is not in the ready state (that is, the broker is stopped, or is in the catchup or joining state). Such a relocation will always fail.

2) When using ordered failover domains, use the nofailback option (nofailback="1"). This prevents the following situation:
- the highest-priority node (node1) joins the cluster and starts the qpidd service
- the qpidd broker on node1 is still in the catchup or joining state
- rgmanager tries to relocate qpidd-primary to node1 (which restarts the qpidd broker on node2, the node currently running qpidd-primary)
- the relocation fails because qpidd on node1 isn't ready, so rgmanager tries to relocate qpidd-primary back to node2
- the broker on node2 is now in the joining state, so the qpidd-primary service fails to start
- rgmanager tries to relocate to node1 again, closing an infinite loop
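For illustration only, an ordered failover domain with failback disabled might look like the following cluster.conf fragment. It is a sketch assuming the standard rgmanager failoverdomain attributes (ordered, nofailback, priority); the domain name, node names, priority values, and the service definition are hypothetical placeholders, not taken from the guide.

```xml
<!-- Sketch only: "qpidd-primary-domain" and node1..node3 are placeholder names. -->
<rm>
  <failoverdomains>
    <!-- ordered="1": rgmanager prefers nodes by priority (1 = most preferred).
         nofailback="1": qpidd-primary is NOT moved back to a preferred node
         when that node rejoins, which avoids the failover loop described above. -->
    <failoverdomain name="qpidd-primary-domain" ordered="1" nofailback="1">
      <failoverdomainnode name="node1" priority="1"/>
      <failoverdomainnode name="node2" priority="2"/>
      <failoverdomainnode name="node3" priority="3"/>
    </failoverdomain>
  </failoverdomains>
  <!-- The primary service is bound to the domain via its "domain" attribute.
       The qpidd-primary script resource it references is defined in the
       resources section, which is omitted here. -->
  <service name="qpidd-primary-service" domain="qpidd-primary-domain" recovery="relocate">
    <script ref="qpidd-primary"/>
  </service>
</rm>
```

With nofailback="1", a node rejoining with a higher priority only becomes a candidate the next time qpidd-primary actually has to move, by which point its broker has had time to reach the ready state.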

3) The recovery policy of the primary service has to be "relocate", not "restart", because stopping qpidd-primary currently also means stopping/restarting the qpidd broker. The newly started broker won't be in the ready state yet when rgmanager attempts to start qpidd-primary on it again.
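A minimal sketch of how the two recovery policies might be expressed in cluster.conf, assuming rgmanager's service "recovery" attribute and script resources. All service, domain, and script names and paths below are hypothetical placeholders; the per-node failover domains are omitted.

```xml
<!-- Sketch only: names and script paths are placeholders;
     the per-node failover domains (e.g. "node1-domain") are omitted. -->
<rm>
  <resources>
    <script name="qpidd" file="/etc/init.d/qpidd"/>
    <script name="qpidd-primary" file="/etc/init.d/qpidd-primary"/>
  </resources>
  <!-- Per-node broker service: restarting a backup broker in place is fine. -->
  <service name="node1-qpidd-service" domain="node1-domain" recovery="restart">
    <script ref="qpidd"/>
  </service>
  <!-- Primary service: must relocate, not restart. A restart would also
       restart the local broker, which is not yet "ready" when rgmanager
       tries to start qpidd-primary on it again. -->
  <service name="qpidd-primary-service" autostart="1" exclusive="0" recovery="relocate">
    <script ref="qpidd-primary"/>
  </service>
</rm>
```

In this layout one such per-node service would exist for each node, each pinned to its own single-node domain, while only one qpidd-primary service exists cluster-wide.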

Comment 1 Joshua Wulf 2013-09-23 06:55:29 UTC

Added notes about the first two here:

http://deathstar1.usersys.redhat.com:3000/builds/18173-Messaging_Installation_and_Configuration_Guide/#Limitations_in_HA_in_MRG_3

Regarding the third one (relocate vs restart), the current build of

http://deathstar1.usersys.redhat.com:3000/builds/18173-Messaging_Installation_and_Configuration_Guide/#Configure_rgmanager

already uses "restart" for the individual node services in step 9, and "relocate" for the primary service in step 10.

Comment 2 Frantisek Reznicek 2013-12-05 14:56:48 UTC

1), 3) are ok.

2) The wording is not optimal; see the proposed change below:

Failback with ordered domains can cause an infinite failover loop under certain conditions. To avoid this, when using ordered domains use nofailback=1.

Replace with the following (when talking about domains, it always has to be "[cluster] failover-domain"):

Failback with cluster ordered failover-domains (cluster.conf 'ordered=1') can cause an infinite failover loop under certain conditions. To avoid this use cluster ordered failover-domains with nofailback=1 parameter.

-> ASSIGNED

Comment 3 Joshua Wulf 2013-12-12 05:02:02 UTC

Changed to: 

"Failback with cluster ordered failover-domains ('ordered=1' in cluster.conf) can cause an infinite failover loop under certain conditions. To avoid this, use cluster ordered failover-domains with nofailback=1 specified in cluster.conf."

http://deathstar1.usersys.redhat.com:3000/builds/18173-Messaging_Installation_and_Configuration_Guide/#Limitations_in_HA_in_MRG_3

Comment 4 Frantisek Reznicek 2013-12-17 12:26:10 UTC

Thanks for your change, I'm satisfied now.

-> VERIFIED
