Bug 501537

Summary: qpidd should shut down immediately on loss of quorum.
Product: Red Hat Enterprise MRG Reporter: Alan Conway <aconway>
Component: qpid-cpp Assignee: Alan Conway <aconway>
Status: CLOSED ERRATA QA Contact: Jan Sarenik <jsarenik>
Severity: high Docs Contact:
Priority: urgent    
Version: 1.1.1 CC: cctrieloff, freznice, iboverma, jsarenik, lbrindle, mcressma, tao, tross
Target Milestone: 1.2   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Messaging enhancement Qpidd now shuts down immediately if the cluster quorum is lost.
Story Points: ---
Clone Of: Environment:
Last Closed: 2009-12-03 09:16:34 UTC Type: ---
Bug Depends On:    
Bug Blocks: 527551    

Description Alan Conway 2009-05-19 15:59:05 UTC
Description of problem:

Qpidd checks for cluster quorum only when sending to a client.
It should register with cman for notification and shut down immediately
on loss of quorum. 

With the current behaviour an idle qpidd could fail to notice a short-lived loss of quorum, which could put it in an invalid state due to missed activity.
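The difference between the two behaviours can be sketched with a small model. This is only an illustration, not qpidd's code: `FakeCluster`, `PollingBroker` and `NotifiedBroker` are hypothetical names, and `FakeCluster` merely stands in for cman's quorum state and notification mechanism.

```python
class FakeCluster:
    """Hypothetical stand-in for cman: holds quorum state, notifies listeners."""

    def __init__(self):
        self.quorate = True
        self._listeners = []

    def subscribe(self, callback):
        self._listeners.append(callback)

    def set_quorate(self, quorate):
        self.quorate = quorate
        for callback in self._listeners:
            callback(quorate)


class PollingBroker:
    """Checks quorum only when sending to a client (the reported behaviour)."""

    def __init__(self, cluster):
        self.cluster = cluster
        self.running = True

    def send(self, message):
        if not self.cluster.quorate:
            self.running = False
            raise RuntimeError("lost quorum")
        return message


class NotifiedBroker:
    """Registers for notification and stops as soon as quorum is lost."""

    def __init__(self, cluster):
        self.running = True
        cluster.subscribe(self._on_quorum_change)

    def _on_quorum_change(self, quorate):
        if not quorate:
            self.running = False


cluster = FakeCluster()
polling = PollingBroker(cluster)
notified = NotifiedBroker(cluster)

# A short-lived loss of quorum while both brokers are idle.
cluster.set_quorate(False)
cluster.set_quorate(True)

polling.send("hello")  # succeeds: the idle broker never saw the loss
print(polling.running, notified.running)
```

In this model the polling broker is still running after the quorum flap, while the notified broker has stopped.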

How reproducible: easy


Steps to Reproduce:
1. Configure a cluster, start cman and qpidd --cluster-cman
2. Stop cluster nodes until the cluster is inquorate.
3. Start nodes until the cluster is quorate again.
  
Actual results:

qpidd fails to notice the loss of quorum.

Expected results:

qpidd shuts down with a "lost quorum" message.

Comment 1 Alan Conway 2009-07-13 12:41:07 UTC
*** Bug 510880 has been marked as a duplicate of this bug. ***

Comment 2 Alan Conway 2009-08-06 17:42:14 UTC
Fixed in revision 801740

Comment 3 Alan Conway 2009-09-11 12:52:04 UTC
*** Bug 471290 has been marked as a duplicate of this bug. ***

Comment 4 Alan Conway 2009-09-11 17:18:42 UTC
Note: to stop a cluster node use: sudo cman_tool close force

Comment 5 Alan Conway 2009-09-11 17:20:44 UTC
Typo in previous comment, should be: sudo cman_tool leave force

Comment 7 Alan Conway 2009-09-18 13:17:08 UTC
Note that this also affects performance. Prior to this fix turning on cman degraded performance significantly (e.g. latencytest). With this fix, enabling cman support has no effect on performance.
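One plausible reading of the performance change, sketched below, is that a per-send quorum check costs one query per message, while a notification-driven flag costs none on the send path. This is an assumption: the comment does not say where the time went, and the model counts queries only, not latency. `CountingCluster` and both send functions are hypothetical names.

```python
class CountingCluster:
    """Hypothetical quorum source that counts how often it is queried."""

    def __init__(self):
        self.quorate = True
        self.queries = 0
        self._listeners = []

    def is_quorate(self):
        self.queries += 1
        return self.quorate

    def subscribe(self, callback):
        self._listeners.append(callback)

    def set_quorate(self, quorate):
        self.quorate = quorate
        for callback in self._listeners:
            callback(quorate)


def send_with_per_message_check(cluster, messages):
    """Query the cluster before every message (assumed pre-fix pattern)."""
    sent = 0
    for _ in messages:
        if not cluster.is_quorate():
            break
        sent += 1
    return sent


def send_with_notification(cluster, messages):
    """Read a local flag that a notification callback keeps up to date."""
    state = {"quorate": cluster.quorate}
    cluster.subscribe(lambda quorate: state.update(quorate=quorate))
    sent = 0
    for _ in messages:
        if not state["quorate"]:
            break
        sent += 1
    return sent


messages = range(1000)

per_message = CountingCluster()
sent_a = send_with_per_message_check(per_message, messages)

notified = CountingCluster()
sent_b = send_with_notification(notified, messages)

print(sent_a, per_message.queries)
print(sent_b, notified.queries)
```

Both variants send all 1000 messages; only the per-message variant queries the cluster on the send path.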

Comment 9 Jan Sarenik 2009-10-13 14:34:51 UTC
Apologies, this is taking longer than I expected.
For the last two days I have been setting up a cman cluster
and reading the docs. Today I verified the fix on one
architecture; tomorrow I just need to reproduce the bug
and validate on the other architecture.

Comment 10 Jan Sarenik 2009-10-13 23:03:04 UTC
Verified on qpidd-cluster-0.5.752581-28.el5 i386 and x86_64

Reproduced on qpidd-cluster-0.5.752581-26.el5

---------------------------------------------------------------------
root@mrg-qe-10:~# cat /etc/cluster/cluster.conf 
<?xml version="1.0"?>
<cluster name="jasanclust" config_version="6">
  <clusternodes>
    <clusternode name="mrg-qe-09.lab.eng.brq.redhat.com" votes="1" nodeid="1">
      <fence><method name="single"><device name="manual" ipaddr="10.34.33.62"/></method></fence>
    </clusternode>
    <clusternode name="mrg-qe-10.lab.eng.brq.redhat.com" votes="1" nodeid="2">
      <fence><method name="single"><device name="manual" ipaddr="10.34.33.63"/></method></fence>
    </clusternode>
    <clusternode name="mrg-qe-11.lab.eng.brq.redhat.com" votes="1" nodeid="3">
      <fence><method name="single"><device name="manual" ipaddr="10.34.33.64"/></method></fence>
    </clusternode>
    <clusternode name="mrg-qe-12.lab.eng.brq.redhat.com" votes="1" nodeid="4">
      <fence><method name="single"><device name="manual" ipaddr="10.34.33.65"/></method></fence>
    </clusternode>
  </clusternodes>
  <fencedevices><fencedevice name="manual" agent="manual"/></fencedevices>
  <rm>
    <failoverdomains/>
    <resources/>
  </rm>
</cluster>
root@mrg-qe-10:~# cat /etc/sysconfig/cman 
FENCED_START_TIMEOUT=1
FENCED_MEMBER_DELAY=1
FENCE_JOIN="no"
root@mrg-qe-10:~# cat /etc/ais/openais.conf 
totem {
        version: 2
        secauth: off
        threads: 0
        rrp_mode: none
        interface {
                ringnumber: 0
                bindnetaddr: 10.34.33.0
                mcastaddr: 226.94.11.1
                mcastport: 5405
        }
}

logging {
        debug: off
        timestamp: on
}

amf {
        mode: disabled
}
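For the four-node cluster.conf in this comment, the number of nodes that must be stopped to make the cluster inquorate can be worked out with a short sketch. It assumes the usual majority rule (quorum = total votes // 2 + 1) with default expected votes and no two-node special case. The embedded XML is trimmed to node names and votes (fence elements omitted), and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the cluster.conf above: names and votes only.
CLUSTER_CONF = """<?xml version="1.0"?>
<cluster name="jasanclust" config_version="6">
  <clusternodes>
    <clusternode name="mrg-qe-09.lab.eng.brq.redhat.com" votes="1" nodeid="1"/>
    <clusternode name="mrg-qe-10.lab.eng.brq.redhat.com" votes="1" nodeid="2"/>
    <clusternode name="mrg-qe-11.lab.eng.brq.redhat.com" votes="1" nodeid="3"/>
    <clusternode name="mrg-qe-12.lab.eng.brq.redhat.com" votes="1" nodeid="4"/>
  </clusternodes>
</cluster>
"""


def votes_per_node(conf_xml):
    """Map each clusternode name to its vote count (default 1 if absent)."""
    root = ET.fromstring(conf_xml)
    return {
        node.get("name"): int(node.get("votes", "1"))
        for node in root.iter("clusternode")
    }


def majority_quorum(total_votes):
    """Assumed majority rule: more than half of all votes."""
    return total_votes // 2 + 1


votes = votes_per_node(CLUSTER_CONF)
total = sum(votes.values())
quorum = majority_quorum(total)


def is_quorate(running_nodes):
    """True if the running nodes together hold at least `quorum` votes."""
    return sum(votes[name] for name in running_nodes) >= quorum


names = sorted(votes)
print(total, quorum)                                  # 4 3
print(is_quorate(names[:3]), is_quorate(names[:2]))   # True False
```

Under these assumptions, the cluster stays quorate with three nodes running and becomes inquorate once a second node is stopped.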

Comment 11 Irina Boverman 2009-10-22 17:25:28 UTC
Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
Qpidd now shuts down immediately when cluster quorum is lost (501537)

Comment 12 Lana Brindley 2009-11-26 21:13:25 UTC
Release note updated. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1 +1,3 @@
-Qpidd now shuts down immediately when cluster quorum is lost (501537)
+Messaging enhancement
+
+Qpidd now shuts down immediately if the cluster quorum is lost.

Comment 14 errata-xmlrpc 2009-12-03 09:16:34 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-1633.html