Bug 955711

Summary: New HA regularly shutting down active node
Product: Red Hat Enterprise MRG
Component: qpid-cpp
Version: 2.3
Hardware: All
OS: Linux
Status: CLOSED WONTFIX
Severity: high
Priority: high
Reporter: Mike Cressman <mcressma>
Assignee: Mike Cressman <mcressma>
QA Contact: MRG Quality Engineering <mrgqe-bugs>
CC: jross, mcressma, pmoravec, seldridg
Keywords: OtherQA
Fixed In Version: qpid-cpp-0.18-15
Doc Type: Bug Fix
Type: Bug
Clone Of: 891689
Bug Depends On: 891689
Last Closed: 2020-03-20 16:51:29 UTC

Description Mike Cressman 2013-04-23 15:18:48 UTC

+++ This bug was initially created as a clone of Bug #891689 +++

Description of problem:
Using the cluster.conf copied from the upstream programming reference manual, I see the active node shut down regularly with this error:

"error Broker: Cluster already active, cannot be promoted"

The error appears every 40 seconds, always on the node that is currently active.


Version-Release number of selected component (if applicable):
qpid-cpp-server-ha-0.18-13.el6.x86_64


How reproducible:
100% on my hosts


Steps to Reproduce:
1. Use cluster.conf from http://qpid.apache.org/books/0.18/Programming-In-Apache-Qpid/pdf/Programming-In-Apache-Qpid.pdf
2. Remove the virtual IP address from it
3. qpidd.conf:

auth=no
log-to-file=/tmp/qpidd.log
ha-cluster=yes
ha-brokers-url=amqp:train1,train2,train3
ha-backup-timeout=60
log-enable=info+
trace=yes

4. Start cman & rgmanager

  
Actual results:
"error Broker: Cluster already active, cannot be promoted" on the active broker every 40 seconds, causing the active broker to shut down (and be restarted by rgmanager)


Expected results:
no broker shutdown


Additional info:
attached trace logs from all 3 nodes

--- Additional comment from Pavel Moravec on 2013-01-03 12:36:01 EST ---

Created attachment 672138
qpid traces

--- Additional comment from Pavel Moravec on 2013-01-04 03:14:08 EST ---

When testing with manually started qpidd / qpidd-primary services (i.e. rgmanager off, cman on), no issue appears; the brokers are stable.

But why would rgmanager affect this? If a process or service it manages is running, it should not intervene.

--- Additional comment from Justin Ross on 2013-02-14 13:44:44 EST ---

Alan, is this expected?

--- Additional comment from Alan Conway on 2013-02-14 16:09:41 EST ---

Background: if a broker is started when there is already an active primary, that broker cannot be promoted until it connects and becomes a READY backup, otherwise messages can be lost. If the primary is killed before that and rgmanager tries to promote the unready backup, it will die with that error message, so that rgmanager can hopefully promote a broker that is ready.

It shouldn't be happening so frequently however, so this probably bears investigation.
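
The promotion rule described above can be illustrated with a minimal sketch. This is a simplified model, not qpidd's actual implementation: the `Backup` class, its fields, and `pick_primary` are hypothetical names introduced only for illustration. It assumes that a broker which joined while a primary was active refuses promotion until it is ready, and that the resource manager then tries another broker.

```python
from dataclasses import dataclass


class PromotionRefused(Exception):
    """Raised when promoting the broker could lose messages."""


@dataclass
class Backup:
    # Hypothetical model for illustration; not qpidd's real data structure.
    name: str
    joined_active_cluster: bool  # a primary was already active when this broker started
    ready: bool = False          # set once the broker has connected and caught up

    def promote(self) -> str:
        if self.joined_active_cluster and not self.ready:
            # Per the comment above, the real broker dies with this error
            # so that rgmanager can try to promote a different broker.
            raise PromotionRefused("Cluster already active, cannot be promoted")
        return f"{self.name} promoted to primary"


def pick_primary(backups):
    """Try each backup in turn, skipping any that refuse promotion."""
    for backup in backups:
        try:
            return backup.promote()
        except PromotionRefused:
            continue
    return None
```

In this model, an unready backup is skipped and the first ready one is promoted; if no backup is ready, `pick_primary` returns `None`.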

--- Additional comment from Alan Conway on 2013-02-25 13:30:33 EST ---

Fixed http://mrg1.lab.bos.redhat.com/cgit/qpid.git/commit/?h=0.18-mrg-aconway-bz891689&id=d0262927d32bdd043125373a7f3a969e7600713d

commit d0262927d32bdd043125373a7f3a969e7600713d
Author: Alan Conway <aconway>
Commit: Alan Conway <aconway>

    Bug 891689 - New HA regularly shutting down active node
    
    qpid-primary script was incorrect and failing on status calls,
    causing the broker to be restarted by rgmanager.
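
As a rough illustration of the failure mode named in this commit message, the sketch below simulates a resource manager that polls a service's status action and restarts the service whenever the check reports failure. It is an assumption-laden model, not rgmanager's code: `monitor`, `broken_status`, and `fixed_status` are hypothetical names, and the report does not say what exactly was wrong with the script's status handling. The only convention assumed is that exit code 0 means "running".

```python
def monitor(status_check, restart, polls):
    """Poll status_check `polls` times; call restart whenever it reports failure."""
    restarts = 0
    for _ in range(polls):
        exit_code = status_check()
        if exit_code != 0:  # assumed convention: 0 means the service is running
            restart()
            restarts += 1
    return restarts


def broken_status():
    # Stand-in for a status action that fails even though the broker is healthy.
    return 1


def fixed_status():
    # Stand-in for a status action that correctly reports a running broker.
    return 0
```

With `broken_status`, every poll triggers a restart of an otherwise healthy broker, which matches the periodic restarts reported here; with `fixed_status`, the monitor never intervenes.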

--- Additional comment from Mike Cressman on 2013-04-23 11:15:17 EDT ---

Trunk checkin svn rev: 1449870