Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 891689

Summary:

New HA regularly shutting down active node

Product:

Red Hat Enterprise MRG

Reporter:

Pavel Moravec <pmoravec>

Component:

qpid-cpp

Assignee:

Alan Conway <aconway>

Status:

CLOSED CURRENTRELEASE

QA Contact:

mick <mgoulish>

Severity:

high

Docs Contact:

Priority:

high

Version:

2.3

CC:

esammons, freznice, iboverma, jross, mcressma, mgoulish, mtoth

Target Milestone:

3.0

Keywords:

OtherQA

Target Release:

---

Hardware:

All

OS:

Linux

Whiteboard:

Fixed In Version:

qpid-cpp-0.22

Doc Type:

Bug Fix

Doc Text:

Story Points:

---

Clone Of:

Clones:

955711

Environment:

Last Closed:

2015-01-21 12:56:11 UTC

Type:

Bug

Bug Depends On:

Bug Blocks:

955711

Attachments:

Description	Flags
qpid traces	none

Description Pavel Moravec 2013-01-03 17:33:31 UTC

Description of problem:
Using the cluster.conf from the upstream programming reference manual, I see the active node shut down regularly with this error:

"error Broker: Cluster already active, cannot be promoted"

The error appears every 40 seconds, each time on the node that is currently active.


Version-Release number of selected component (if applicable):
qpid-cpp-server-ha-0.18-13.el6.x86_64


How reproducible:
100% on my hosts


Steps to Reproduce:
1. Use cluster.conf from http://qpid.apache.org/books/0.18/Programming-In-Apache-Qpid/pdf/Programming-In-Apache-Qpid.pdf
2. Remove the virtual IP address from it
3. qpidd.conf:

auth=no
log-to-file=/tmp/qpidd.log
ha-cluster=yes
ha-brokers-url=amqp:train1,train2,train3
ha-backup-timeout=60
log-enable=info+
trace=yes

4. Start cman & rgmanager

  
Actual results:
"error Broker: Cluster already active, cannot be promoted" on the active broker every 40 seconds, causing the active broker to shut down (and be restarted by rgmanager)


Expected results:
no broker shutdown


Additional info:
attached trace logs from all 3 nodes
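
The 40-second period reported above can be checked from a broker log by extracting the timestamps of the error lines and computing the gaps between them. The sketch below assumes each log line starts with a `YYYY-MM-DD HH:MM:SS` timestamp; that format, the function name `error_intervals`, and the sample lines are assumptions for illustration, not excerpts from the attached traces.

```python
from datetime import datetime

ERROR_TEXT = "Cluster already active, cannot be promoted"


def error_intervals(lines):
    """Return the gaps in seconds between consecutive error lines."""
    stamps = [
        # Assumption: the first 19 characters are "YYYY-MM-DD HH:MM:SS".
        datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        for line in lines
        if ERROR_TEXT in line
    ]
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


# Made-up sample lines in the assumed format.
sample = [
    "2013-01-03 17:00:00 error Broker: Cluster already active, cannot be promoted",
    "2013-01-03 17:00:10 info unrelated message",
    "2013-01-03 17:00:40 error Broker: Cluster already active, cannot be promoted",
]
print(error_intervals(sample))  # [40.0]
```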

Comment 1 Pavel Moravec 2013-01-03 17:36:01 UTC

Created attachment 672138
qpid traces

Comment 2 Pavel Moravec 2013-01-04 08:14:08 UTC

When testing with manually started qpidd / qpidd-primary services (i.e. rgmanager off, cman on), no issue appears and the brokers are stable.

But why would rgmanager affect this? If a process or service it manages is running, it should not intervene.

Comment 3 Justin Ross 2013-02-14 18:44:44 UTC

Alan, is this expected?

Comment 4 Alan Conway 2013-02-14 21:09:41 UTC

Background: if a broker is started when there is already an active primary, that broker cannot be promoted until it connects and becomes a READY backup, otherwise messages can be lost. If the primary is killed before that and rgmanager tries to promote the unready backup, it will die with that error message, so that rgmanager can hopefully promote a broker that is ready.
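
The promotion rule described above can be sketched as a small state check. This is only an illustration, not qpidd's actual implementation: the `BrokerState` enum, the `promote` function, and the `PromotionError` exception are hypothetical names, and only the READY requirement and the error text come from this bug.

```python
from enum import Enum


class BrokerState(Enum):
    # Hypothetical states; only READY is named in this bug report.
    JOINING = "joining"
    READY = "ready"
    ACTIVE = "active"


class PromotionError(RuntimeError):
    """Raised when a backup that is not yet ready is asked to become primary."""


def promote(state, cluster_was_active):
    """Return the new state of a backup broker after a promotion request.

    cluster_was_active: True if a primary was already active when this
    broker started, so the broker may be missing messages until it is READY.
    """
    if cluster_was_active and state is not BrokerState.READY:
        # The broker gives up so the cluster manager can try a ready backup.
        raise PromotionError("Cluster already active, cannot be promoted")
    return BrokerState.ACTIVE
```

Under these assumptions, `promote(BrokerState.READY, True)` succeeds, while `promote(BrokerState.JOINING, True)` raises the error quoted in the description.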

It shouldn't be happening so frequently, however, so this probably bears investigation.

Comment 5 Alan Conway 2013-02-25 18:30:33 UTC

Fixed http://mrg1.lab.bos.redhat.com/cgit/qpid.git/commit/?h=0.18-mrg-aconway-bz891689&id=d0262927d32bdd043125373a7f3a969e7600713d

commit d0262927d32bdd043125373a7f3a969e7600713d
Author: Alan Conway <aconway>
Commit: Alan Conway <aconway>

    Bug 891689 - New HA regularly shutting down active node
    
    qpid-primary script was incorrect and failing on status calls,
    causing the broker to be restarted by rgmanager.
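
For context, a resource manager that polls a service script's status action treats a non-zero result as a failed service. A minimal sketch of that contract follows, using the LSB init-script convention for status exit codes (0 = running, 3 = not running). The function names and the `is_primary` flag are hypothetical; nothing here is taken from the actual qpid-primary script or from the fix.

```python
# LSB init-script exit codes for the "status" action.
STATUS_RUNNING = 0
STATUS_NOT_RUNNING = 3


def primary_status(process_alive, is_primary):
    """Exit code for a hypothetical 'status' action of a primary-broker service."""
    if process_alive and is_primary:
        return STATUS_RUNNING
    return STATUS_NOT_RUNNING


def monitor_action(status_code):
    """What a resource manager would do after one status poll (simplified)."""
    return "leave running" if status_code == STATUS_RUNNING else "restart"
```

In this simplified model, a status action that wrongly reports failure for a healthy primary makes `monitor_action` return "restart" on every poll, which is consistent with the periodic restarts reported in this bug.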

Comment 6 Mike Cressman 2013-04-23 15:15:17 UTC

Trunk checkin svn rev: 1449870

Comment 7 mick 2013-12-13 17:42:23 UTC

With these qpid pkgs:

qpid-qmf-0.22-24.el6.x86_64
qpid-cpp-client-devel-0.22-29.el6.x86_64
qpid-proton-c-0.5-9.el6.x86_64
qpid-cpp-server-0.22-29.el6.x86_64
python-qpid-0.22-8.el6.noarch
qpid-tools-0.22-7.el6.noarch
qpid-cpp-server-ha-0.22-29.el6.x86_64
qpid-cpp-client-0.22-29.el6.x86_64
python-qpid-qmf-0.22-24.el6.x86_64


With cman and rgmanager running on 3 separate physical boxes (mrg3, mrg25, mrg28): no broker shutdown and no error messages after 30 minutes with the HA cluster running.

--> verified.