Bug 501537 - qpidd should shut down immediately on loss of quorum.
Summary: qpidd should shut down immediately on loss of quorum.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: 1.1.1
Hardware: All
OS: Linux
Priority: urgent
Severity: high
Target Milestone: 1.2
Assignee: Alan Conway
QA Contact: Jan Sarenik
URL:
Whiteboard:
Duplicates: 471290 510880
Depends On:
Blocks: 527551
Reported: 2009-05-19 15:59 UTC by Alan Conway
Modified: 2018-10-27 16:01 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Messaging enhancement: Qpidd now shuts down immediately if the cluster quorum is lost.
Clone Of:
Environment:
Last Closed: 2009-12-03 09:16:34 UTC
Target Upstream Version:
Embargoed:


Links
System: Red Hat Product Errata
ID: RHEA-2009:1633
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Red Hat Enterprise MRG Messaging and Grid Version 1.2
Last Updated: 2009-12-03 09:15:33 UTC

Description Alan Conway 2009-05-19 15:59:05 UTC
Description of problem:

Qpidd checks for cluster quorum only when sending to a client.
It should register with cman for notification and shut down immediately
on loss of quorum. 

With the current behaviour an idle qpidd could fail to notice a short-lived loss of quorum, which could put it in an invalid state due to missed activity.
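The gap described above can be illustrated with a small simulation. This is only a sketch: FakeQuorumSource, PollOnSendBroker and NotifiedBroker are hypothetical names made up for illustration, and none of this is the real cman or qpidd code. It contrasts a broker that checks quorum only when sending with one that is notified of every quorum change.

```python
from typing import Callable, List


class FakeQuorumSource:
    """Hypothetical stand-in for cman; not the real libcman API."""

    def __init__(self) -> None:
        self.quorate = True
        self._listeners: List[Callable[[bool], None]] = []

    def subscribe(self, listener: Callable[[bool], None]) -> None:
        self._listeners.append(listener)

    def set_quorate(self, quorate: bool) -> None:
        # Record the new state and notify every subscriber immediately.
        self.quorate = quorate
        for listener in self._listeners:
            listener(quorate)


class PollOnSendBroker:
    """Behaviour reported in this bug: quorum is checked only on send."""

    def __init__(self, source: FakeQuorumSource) -> None:
        self.source = source
        self.running = True

    def send_to_client(self, message: str) -> None:
        if not self.source.quorate:
            self.running = False  # would shut down with "lost quorum"


class NotifiedBroker:
    """Requested behaviour: shut down as soon as quorum is lost."""

    def __init__(self, source: FakeQuorumSource) -> None:
        self.running = True
        source.subscribe(self._on_quorum_change)

    def _on_quorum_change(self, quorate: bool) -> None:
        if not quorate:
            self.running = False  # would shut down with "lost quorum"


def simulate_idle_blip():
    """Lose and regain quorum while both brokers are idle."""
    source = FakeQuorumSource()
    old = PollOnSendBroker(source)
    new = NotifiedBroker(source)
    source.set_quorate(False)    # cluster becomes inquorate
    source.set_quorate(True)     # cluster becomes quorate again
    old.send_to_client("hello")  # first send happens after quorum is back
    return old.running, new.running
```

In this sketch the polling broker is still running after the blip because it never sent anything while the cluster was inquorate, whereas the notified broker has already stopped.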

How reproducible: easy


Steps to Reproduce:
1. Configure a cluster, then start cman and qpidd --cluster-cman.
2. Stop cluster nodes until the cluster is inquorate.
3. Start nodes until the cluster is quorate again.
  
Actual results:

qpidd fails to notice the loss of quorum.

Expected results:

qpidd shuts down with a "lost quorum" message.

Comment 1 Alan Conway 2009-07-13 12:41:07 UTC
*** Bug 510880 has been marked as a duplicate of this bug. ***

Comment 2 Alan Conway 2009-08-06 17:42:14 UTC
Fixed in revision 801740

Comment 3 Alan Conway 2009-09-11 12:52:04 UTC
*** Bug 471290 has been marked as a duplicate of this bug. ***

Comment 4 Alan Conway 2009-09-11 17:18:42 UTC
Note: to stop a cluster node use: sudo cman_tool close force

Comment 5 Alan Conway 2009-09-11 17:20:44 UTC
Typo in previous comment, should be: sudo cman_tool leave force

Comment 7 Alan Conway 2009-09-18 13:17:08 UTC
Note that this also affects performance. Prior to this fix turning on cman degraded performance significantly (e.g. latencytest). With this fix, enabling cman support has no effect on performance.

Comment 9 Jan Sarenik 2009-10-13 14:34:51 UTC
Apologies, this is taking longer than I expected.
For the last two days I have been setting up a cman cluster
and reading the docs. Today I verified the fix on one
architecture; tomorrow I just need to reproduce the bug
and validate the fix on the other architecture.

Comment 10 Jan Sarenik 2009-10-13 23:03:04 UTC
Verified on qpidd-cluster-0.5.752581-28.el5 i386 and x86_64

Reproduced on qpidd-cluster-0.5.752581-26.el5

---------------------------------------------------------------------
root@mrg-qe-10:~# cat /etc/cluster/cluster.conf 
<?xml version="1.0"?>
<cluster name="jasanclust" config_version="6">
  <clusternodes>
    <clusternode name="mrg-qe-09.lab.eng.brq.redhat.com" votes="1" nodeid="1">
      <fence>
        <method name="single">
          <device name="manual" ipaddr="10.34.33.62"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="mrg-qe-10.lab.eng.brq.redhat.com" votes="1" nodeid="2">
      <fence>
        <method name="single">
          <device name="manual" ipaddr="10.34.33.63"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="mrg-qe-11.lab.eng.brq.redhat.com" votes="1" nodeid="3">
      <fence>
        <method name="single">
          <device name="manual" ipaddr="10.34.33.64"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="mrg-qe-12.lab.eng.brq.redhat.com" votes="1" nodeid="4">
      <fence>
        <method name="single">
          <device name="manual" ipaddr="10.34.33.65"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices><fencedevice name="manual" agent="manual"/></fencedevices>
  <rm>
    <failoverdomains/>
    <resources/>
  </rm>
</cluster>
root@mrg-qe-10:~# cat /etc/sysconfig/cman 
FENCED_START_TIMEOUT=1
FENCED_MEMBER_DELAY=1
FENCE_JOIN="no"
root@mrg-qe-10:~# cat /etc/ais/openais.conf 
totem {
        version: 2
        secauth: off
        threads: 0
        rrp_mode: none
        interface {
                ringnumber: 0
                bindnetaddr: 10.34.33.0
                mcastaddr: 226.94.11.1
                mcastport: 5405
        }
}

logging {
        debug: off
        timestamp: on
}

amf {
        mode: disabled
}
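
For anyone reusing the four-node configuration above, a rough way to work out how many nodes have to be stopped before the cluster becomes inquorate is sketched below. It assumes the usual majority rule (quorum needs more than half of the expected votes) and that no other quorum-related settings are in effect; the helper names are made up for illustration and the fence sections are left out of the embedded XML.

```python
import xml.etree.ElementTree as ET

# Node names and votes copied from the cluster.conf above; fence sections omitted.
CLUSTER_CONF = """<cluster name="jasanclust" config_version="6">
  <clusternodes>
    <clusternode name="mrg-qe-09.lab.eng.brq.redhat.com" votes="1" nodeid="1"/>
    <clusternode name="mrg-qe-10.lab.eng.brq.redhat.com" votes="1" nodeid="2"/>
    <clusternode name="mrg-qe-11.lab.eng.brq.redhat.com" votes="1" nodeid="3"/>
    <clusternode name="mrg-qe-12.lab.eng.brq.redhat.com" votes="1" nodeid="4"/>
  </clusternodes>
</cluster>"""


def votes_by_node(conf_xml: str) -> dict:
    """Map each clusternode name to its vote count (default 1 if unset)."""
    root = ET.fromstring(conf_xml)
    return {
        node.get("name"): int(node.get("votes", "1"))
        for node in root.iter("clusternode")
    }


def quorum_threshold(expected_votes: int) -> int:
    """Assumed majority rule: more than half of the expected votes."""
    return expected_votes // 2 + 1


def is_quorate(votes: dict, live_nodes) -> bool:
    """True if the votes of the live nodes reach the assumed threshold."""
    live_votes = sum(votes[name] for name in live_nodes)
    return live_votes >= quorum_threshold(sum(votes.values()))


if __name__ == "__main__":
    votes = votes_by_node(CLUSTER_CONF)
    names = sorted(votes)
    for live_count in range(len(names), 0, -1):
        state = "quorate" if is_quorate(votes, names[:live_count]) else "inquorate"
        print(live_count, "nodes up:", state)
```

Under these assumptions the threshold for four one-vote nodes is 3, so stopping two nodes should make the cluster inquorate and restarting one of them should restore quorum.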

Comment 11 Irina Boverman 2009-10-22 17:25:28 UTC
Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
Qpidd now shuts down immediately when cluster quorum is lost (501537)

Comment 12 Lana Brindley 2009-11-26 21:13:25 UTC
Release note updated. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1 +1,3 @@
-Qpidd now shuts down immediately when cluster quorum is lost (501537)
+Messaging enhancement
+
+Qpidd now shuts down immediately if the cluster quorum is lost.

Comment 14 errata-xmlrpc 2009-12-03 09:16:34 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-1633.html

