Bug 891689

Summary: New HA regularly shutting down active node
Product: Red Hat Enterprise MRG
Reporter: Pavel Moravec <pmoravec>
Component: qpid-cpp
Assignee: Alan Conway <aconway>
Status: CLOSED CURRENTRELEASE
QA Contact: mick <mgoulish>
Severity: high
Priority: high
Version: 2.3
CC: esammons, freznice, iboverma, jross, mcressma, mgoulish, mtoth
Target Milestone: 3.0
Keywords: OtherQA
Target Release: ---
Hardware: All
OS: Linux
Fixed In Version: qpid-cpp-0.22
Doc Type: Bug Fix
Last Closed: 2015-01-21 12:56:11 UTC
Type: Bug
Bug Blocks: 955711
Attachments: qpid traces (flags: none)

Description Pavel Moravec 2013-01-03 17:33:31 UTC
Description of problem:
After copying cluster.conf from the upstream Programming reference manual, I see the active node shut down regularly with this error:

"error Broker: Cluster already active, cannot be promoted"

The error appears every 40 seconds, every time on the node that is active.


Version-Release number of selected component (if applicable):
qpid-cpp-server-ha-0.18-13.el6.x86_64


How reproducible:
100% on my hosts


Steps to Reproduce:
1. Use cluster.conf from http://qpid.apache.org/books/0.18/Programming-In-Apache-Qpid/pdf/Programming-In-Apache-Qpid.pdf
2. Remove virtual IP address from it
3. qpidd.conf:

auth=no
log-to-file=/tmp/qpidd.log
ha-cluster=yes
ha-brokers-url=amqp:train1,train2,train3
ha-backup-timeout=60
log-enable=info+
trace=yes

4. Start cman & rgmanager

  
Actual results:
"error Broker: Cluster already active, cannot be promoted" appears on the active broker every 40 seconds, causing the active broker to shut down (and be restarted by rgmanager)


Expected results:
no broker shutdown


Additional info:
attached trace logs from all 3 nodes
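
To check the "every 40 seconds" observation against a log such as /tmp/qpidd.log, the gaps between occurrences of the error can be computed. The following is a minimal sketch, not a qpid tool: the function name error_intervals is made up here, and it assumes each log line starts with a "YYYY-MM-DD HH:MM:SS" timestamp, which should be confirmed against the real log before relying on it. The sample lines are invented for illustration and are not taken from the attached traces.

```python
from datetime import datetime

ERROR_TEXT = "Cluster already active, cannot be promoted"


def error_intervals(lines, marker=ERROR_TEXT):
    """Return the seconds between consecutive log lines that contain marker."""
    stamps = []
    for line in lines:
        if marker not in line:
            continue
        # Assumption: the first 19 characters are "YYYY-MM-DD HH:MM:SS".
        try:
            stamps.append(datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S"))
        except ValueError:
            continue  # skip lines that do not match the assumed layout
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


# Invented sample lines, for illustration only.
sample = [
    "2013-01-03 17:00:00 error Broker: Cluster already active, cannot be promoted",
    "2013-01-03 17:00:10 info some unrelated message",
    "2013-01-03 17:00:40 error Broker: Cluster already active, cannot be promoted",
    "2013-01-03 17:01:20 error Broker: Cluster already active, cannot be promoted",
]

print(error_intervals(sample))
```

An open file object can be passed in place of the sample list, since the function only iterates over lines.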

Comment 1 Pavel Moravec 2013-01-03 17:36:01 UTC
Created attachment 672138
qpid traces

Comment 2 Pavel Moravec 2013-01-04 08:14:08 UTC
When testing with manually started qpidd / qpidd-primary services (i.e. rgmanager off, cman on), no issue appears and the brokers are stable.

But why would rgmanager affect this? If a process or service it manages is running, it should not intervene.

Comment 3 Justin Ross 2013-02-14 18:44:44 UTC
Alan, is this expected?

Comment 4 Alan Conway 2013-02-14 21:09:41 UTC
Background: if a broker is started when there is already an active primary, that broker cannot be promoted until it connects and becomes a READY backup; otherwise messages can be lost. If the primary is killed before that and rgmanager tries to promote the unready backup, the backup will die with that error message, so that rgmanager can hopefully promote a broker that is ready.

It shouldn't be happening so frequently, however, so this probably bears investigation.
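
The promotion rule described in this comment can be pictured as a small guard on a state change. This is a toy model only, not the qpid broker code: the class ToyBroker, the exception CannotPromote, and the state names "joining", "ready" and "active" are illustrative assumptions, and the real broker exits with the error rather than raising a Python exception.

```python
class CannotPromote(Exception):
    """Raised when promotion is requested for a backup that is not ready."""


class ToyBroker:
    def __init__(self, name, primary_already_active):
        self.name = name
        # A broker started while a primary exists must first connect and
        # catch up; until then, promoting it could lose messages.
        self.state = "joining" if primary_already_active else "ready"

    def caught_up(self):
        """Mark the backup as ready once it has connected to the primary."""
        if self.state == "joining":
            self.state = "ready"

    def promote(self):
        """Promote to active only from the ready state; refuse otherwise."""
        if self.state != "ready":
            raise CannotPromote(
                f"{self.name}: cannot be promoted from state {self.state!r}"
            )
        self.state = "active"


backup = ToyBroker("train2", primary_already_active=True)
try:
    backup.promote()  # too early: the backup has not caught up yet
except CannotPromote as exc:
    print("refused:", exc)

backup.caught_up()
backup.promote()      # allowed now that the backup is ready
print(backup.state)
```

Failing loudly on an early promotion lets the cluster manager try another broker instead of silently promoting one that is missing messages.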

Comment 5 Alan Conway 2013-02-25 18:30:33 UTC
Fixed http://mrg1.lab.bos.redhat.com/cgit/qpid.git/commit/?h=0.18-mrg-aconway-bz891689&id=d0262927d32bdd043125373a7f3a969e7600713d

commit d0262927d32bdd043125373a7f3a969e7600713d
Author: Alan Conway <aconway>
Commit: Alan Conway <aconway>

    Bug 891689 - New HA regularly shutting down active node
    
    qpid-primary script was incorrect and failing on status calls,
    causing the broker to be restarted by rgmanager.
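
    Why a failing status call leads to restarts can be shown with a toy
    model of a resource manager polling a service script. This is a sketch
    under stated assumptions, not rgmanager's logic and not the actual
    qpid-primary defect: monitor, broken_status and fixed_status are
    hypothetical names, and the only outside fact used is the LSB
    convention that a status action exits 0 when the service is running
    and 3 when it is not.

```python
LSB_RUNNING = 0      # LSB status exit code: service is running
LSB_NOT_RUNNING = 3  # LSB status exit code: service is not running


def broken_status(broker_running):
    """Hypothetical faulty script: reports failure even for a healthy broker."""
    return LSB_NOT_RUNNING


def fixed_status(broker_running):
    """Hypothetical corrected script: reports the real broker state."""
    return LSB_RUNNING if broker_running else LSB_NOT_RUNNING


def monitor(status, polls):
    """Poll a healthy broker `polls` times; count restarts the checks trigger."""
    restarts = 0
    for _ in range(polls):
        if status(True) != LSB_RUNNING:
            restarts += 1  # the manager treats a failed check as a dead service
    return restarts


print(monitor(broken_status, 3))  # every poll triggers a restart
print(monitor(fixed_status, 3))   # a healthy broker is left alone
```

    In this model the broker itself never fails; the restarts come only
    from the status check, which is consistent with the brokers being
    stable when started manually without rgmanager.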

Comment 6 Mike Cressman 2013-04-23 15:15:17 UTC
Trunk checkin svn rev: 1449870

Comment 7 mick 2013-12-13 17:42:23 UTC
With these qpid pkgs:

qpid-qmf-0.22-24.el6.x86_64
qpid-cpp-client-devel-0.22-29.el6.x86_64
qpid-proton-c-0.5-9.el6.x86_64
qpid-cpp-server-0.22-29.el6.x86_64
python-qpid-0.22-8.el6.noarch
qpid-tools-0.22-7.el6.noarch
qpid-cpp-server-ha-0.22-29.el6.x86_64
qpid-cpp-client-0.22-29.el6.x86_64
python-qpid-qmf-0.22-24.el6.x86_64


With cman and rgmanager running -- 3 separate physical boxes (mrg3, mrg25, mrg28) -- no broker shutdown and no error messages after 30 minutes with the HA cluster running.

--> verified.