Bug 819787 - cmannotifyd does not issue initial quorum state
Summary: cmannotifyd does not issue initial quorum state
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: cluster
Version: 6.3
Hardware: Unspecified
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: 6.3
Assignee: Fabio Massimo Di Nitto
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks: 733221 820357
 
Reported: 2012-05-08 08:48 UTC by mick
Modified: 2012-06-20 13:58 UTC (History)
12 users

Fixed In Version: cluster-3.0.12.1-32.el6
Doc Type: Bug Fix
Doc Text:
No Documentation needed
Clone Of:
Environment:
Last Closed: 2012-06-20 13:58:52 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2012:0861 0 normal SHIPPED_LIVE cluster and gfs2-utils bug fix and enhancement update 2013-05-30 19:58:20 UTC

Description mick 2012-05-08 08:48:01 UTC
On RHEL 6.2, using a cluster of 4 boxes, we are not getting a notification of initial quorum state from cmannotifyd when we first bring everything up.  

We believe that the notify daemon is being started after cman, so cman has already achieved quorum (or not) when the daemon starts -- and our notification script does not get called.

Our script *does* get called if, after everything is up, we then take down box 'D'.  Then boxes A, B, and C correctly get a notification that they have quorum.  If we then bring D back up -- A, B, and C get another notification that they still have quorum.

Our script on box A is also correctly called on loss of quorum if we shut down C and D.  So all the state changes after the initial startup are working correctly.

But we need that initial quorum notification, or we are doomed.
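For context, cmannotifyd invokes the scripts in /etc/cluster/cman-notify.d/ whenever it dispatches a notification, passing the event through environment variables. The sketch below shows what such a handler might look like; the CMAN_NOTIFICATION* variable names follow the cmannotifyd(8) convention, and the specific messages are hypothetical placeholders.

```shell
#!/bin/sh
# Hypothetical handler for /etc/cluster/cman-notify.d/.
# cmannotifyd exports CMAN_NOTIFICATION (the reason for the event) and
# CMAN_NOTIFICATION_QUORUM (1 if the cluster is quorate) before running
# each script in the directory.

handle_notification() {
    case "$CMAN_NOTIFICATION" in
        CMAN_REASON_STATECHANGE)
            # Membership or quorum changed; check quorum state.
            if [ "$CMAN_NOTIFICATION_QUORUM" = "1" ]; then
                echo "we still have quorum"
            else
                echo "quorum lost"
            fi
            ;;
        CMAN_REASON_TRY_SHUTDOWN)
            echo "cman requested shutdown"
            ;;
        CMAN_REASON_CONFIG_UPDATE)
            echo "cluster configuration was updated"
            ;;
    esac
}

handle_notification
```

A fix for this bug means such a script is called once at cmannotifyd startup with the current quorum state, not only on later state changes.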

Comment 3 Fabio Massimo Di Nitto 2012-05-08 12:38:55 UTC
Unit test:

setup:

2 node cluster
enable <logging debug="on"/>

on both nodes:
cp /usr/share/doc/cman-$version/cman_notify_template.sh /etc/cluster/cman-notify.d/
chmod 755 /etc/cluster/cman-notify.d/cman_notify_template.sh

pre patch:

on both nodes:
cman_tool join

wait for cman to be quorate, verify with cman_tool status

start cmannotifyd

[root@rhel6-node2 cman-notify.d]# cat /var/log/cluster/file.log
cat: /var/log/cluster/file.log: No such file or directory

killall cmannotifyd


post patch:

on both nodes:
cman_tool join

wait for cman to be quorate, verify with cman_tool status

start cmannotifyd

[root@rhel6-node2 cman-notify.d]# cat /var/log/cluster/file.log
debugging is enabled
replace me with something to do
debugging is enabled
replace me with something to do
we still have quorum

The output above comes from the generic example script and can be tuned if necessary; the output file (file.log) can also be changed.

The test has been repeated to cover cman disappearing and reappearing:

cman_tool leave

cmannotifyd will wait for cman to come back (see /var/log/cluster/cmannotifyd.log)

cman_tool join
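The manual steps above can be collected into a small per-node script for repeated runs. This is only a sketch: the template path is taken from this comment, while the quorate wait (polling cman_tool status for a "Quorum:" line without "Activity blocked") and the use of the service wrapper are assumptions.

```shell
#!/bin/sh
# Scripted sketch of the unit test above; run on each cluster node.

run_unit_test() {
    # Install the example notification script (path from this comment).
    version=$(rpm -q --qf '%{VERSION}' cman)
    cp "/usr/share/doc/cman-$version/cman_notify_template.sh" \
        /etc/cluster/cman-notify.d/
    chmod 755 /etc/cluster/cman-notify.d/cman_notify_template.sh

    cman_tool join

    # Wait for cman to become quorate, as verified manually with
    # "cman_tool status" in the steps above (assumed check).
    until cman_tool status | grep -q '^Quorum:' \
        && ! cman_tool status | grep -q 'Activity blocked'; do
        sleep 1
    done

    service cmannotifyd start

    # Post patch, the initial quorum state should appear here:
    cat /var/log/cluster/file.log
}

# Only attempt the test on a real cluster node.
if command -v cman_tool >/dev/null 2>&1; then
    run_unit_test
fi
```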

Comment 6 Fabio Massimo Di Nitto 2012-05-08 16:41:32 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
No Documentation needed

Comment 9 Justin Payne 2012-05-08 23:41:56 UTC
Verified in cman-3.0.12.1-32.el6

[root@dash-03 ~]# rpm -q cman
cman-3.0.12.1-28.el6.x86_64
[root@dash-03 ~]# ls /etc/cluster/cman-notify.d/
cman_notify_template.sh

[root@dash-01 ~]# cp /usr/share/doc/cman-3.0.12.1/cman_notify_template.sh /etc/cluster/cman-notify.d/.
[root@dash-01 ~]# chmod 755 /etc/cluster/cman-notify.d/cman_notify_template.sh
[root@dash-01 ~]# rpm -q cman
cman-3.0.12.1-28.el6.x86_64

[root@dash-01 ~]# cman_tool status
<---------- cut out ------------------>
Version: 6.2.0
Config Version: 1
Cluster Name: dash
Cluster Id: 57228
Cluster Member: Yes
Cluster Generation: 64
Membership state: Cluster-Member
Nodes: 2
Expected votes: 3
Total votes: 2
Node votes: 1
Quorum: 2  
Active subsystems: 1
Flags: 
Ports Bound: 0  
Node name: dash-01
Node ID: 1

[root@dash-03 ~]# cman_tool status
<----------- cut out ----------------->
Version: 6.2.0
Config Version: 1
Cluster Name: dash
Cluster Id: 57228
Cluster Member: Yes
Cluster Generation: 64
Membership state: Cluster-Member
Nodes: 2
Expected votes: 3
Total votes: 2
Node votes: 1
Quorum: 2  
Active subsystems: 1
Flags: 
Ports Bound: 0  
Node name: dash-03
Node ID: 3

[root@dash-01 ~]# ps aux |grep cman
root     29395  0.2  0.0  37608  1316 ?        Ssl  18:18   0:00 /usr/sbin/cmannotifyd
root     29413  0.0  0.0 103240   808 pts/0    S+   18:18   0:00 grep cman
[root@dash-01 ~]# cat /var/log/cluster/file.log
cat: /var/log/cluster/file.log: No such file or directory

[root@dash-03 ~]# ps aux |grep cman
root     31032  0.1  0.0  37604  1404 ?        Ssl  18:18   0:00 /usr/sbin/cmannotifyd
root     31050  0.0  0.0 103240   804 pts/0    S+   18:19   0:00 grep cman
[root@dash-03 ~]# cat /var/log/cluster/file.log
cat: /var/log/cluster/file.log: No such file or directory


[POST UPDATE]

[root@dash-01 ~]# rpm -q cman
cman-3.0.12.1-32.el6.x86_64

[root@dash-03 ~]# rpm -q cman
cman-3.0.12.1-32.el6.x86_64

[root@dash-03 ~]# cman_tool status
<----------- cut out ----------------->
Version: 6.2.0
Config Version: 1
Cluster Name: dash
Cluster Id: 57228
Cluster Member: Yes
Cluster Generation: 76
Membership state: Cluster-Member
Nodes: 2
Expected votes: 3
Total votes: 2
Node votes: 1
Quorum: 2  
Active subsystems: 1
Flags: 
Ports Bound: 0  
Node name: dash-03
Node ID: 3

[root@dash-01 ~]# cman_tool status
<---------- cut out ------------------>
Version: 6.2.0
Config Version: 1
Cluster Name: dash
Cluster Id: 57228
Cluster Member: Yes
Cluster Generation: 76
Membership state: Cluster-Member
Nodes: 2
Expected votes: 3
Total votes: 2
Node votes: 1
Quorum: 2  
Active subsystems: 1
Flags: 
Ports Bound: 0  
Node name: dash-01
Node ID: 1


[root@dash-03 ~]# ps aux |grep cman
root     31217  2.5  0.0 103140  1452 ?        Ssl  18:27   0:00 /usr/sbin/cmannotifyd
root     31256  0.0  0.0 103240   808 pts/0    S+   18:27   0:00 grep cman
[root@dash-03 ~]# cat /var/log/cluster/file.log
debugging is enabled
replace me with something to do
debugging is enabled
replace me with something to do
we still have quorum

[root@dash-01 ~]# ps aux |grep cman; cat /var/log/cluster/file.log
root     29586  0.3  0.0 103140  1456 ?        Ssl  18:27   0:00 /usr/sbin/cmannotifyd
root     29625  0.0  0.0 103240   808 pts/0    S+   18:28   0:00 grep cman
debugging is enabled
replace me with something to do
debugging is enabled
replace me with something to do
we still have quorum

[root@dash-01 ~]# cman_tool leave; tail /var/log/cluster/cmannotifyd.log
May 08 18:29:01 corosync [CMAN  ] daemon: read 20 bytes from fd 18
May 08 18:29:01 corosync [CMAN  ] daemon: client command is 800000bb
May 08 18:29:01 corosync [CMAN  ] daemon: About to process command
May 08 18:29:01 corosync [CMAN  ] memb: command to process is 800000bb
May 08 18:29:01 corosync [CMAN  ] daemon: sending reply 102 to fd 17
May 08 18:29:01 corosync [CMAN  ] memb: command return code is -11
May 08 18:29:01 corosync [CMAN  ] daemon: read 20 bytes from fd 17
May 08 18:29:01 corosync [CMAN  ] daemon: client command is bc
May 08 18:29:01 corosync [CMAN  ] daemon: About to process command
May 08 18:29:01 corosync [CMAN  ] memb: command to process is bc
May 08 18:29:01 corosync [CMAN  ] memb: Shutdown reply is 1
May 08 18:29:01 corosync [CMAN  ] memb: Sending LEAVE, reason 0
May 08 18:29:01 corosync [CMAN  ] ais: comms send message 0x7fff2e4c93e0 len = 4
May 08 18:29:01 corosync [CMAN  ] memb: shutdown decision is: 0 (yes=1, no=0) flags=0
May 08 18:29:01 corosync [CMAN  ] memb: command return code is -11
May 08 18:29:01 corosync [TOTEM ] mcasted message added to pending queue
May 08 18:29:01 corosync [TOTEM ] Delivering 1a to 1b
May 08 18:29:01 corosync [TOTEM ] Delivering MCAST message with seq 1b to pending delivery queue
May 08 18:29:01 corosync [CMAN  ] ais: deliver_fn source nodeid = 1, len=20, endian_conv=0
May 08 18:29:01 corosync [CMAN  ] memb: Message on port 0 is 7
May 08 18:29:01 corosync [CMAN  ] memb: got LEAVE from node 1, reason = 0
May 08 18:29:01 corosync [CMAN  ] daemon: send status return: 0
May 08 18:29:01 corosync [CMAN  ] daemon: sending reply c00000bb to fd 18
May 08 18:29:01 corosync [TOTEM ] Received ringid(10.15.89.168:76) seq 1b
May 08 18:29:01 corosync [SERV  ] Unloading all Corosync service engines.
May 08 18:29:01 corosync [TOTEM ] releasing messages up to and including 1b
May 08 18:29:01 corosync [SERV  ] Service engine unloaded: corosync extended virtual synchrony service
May 08 18:29:01 corosync [SERV  ] Service engine unloaded: corosync configuration service
May 08 18:29:01 corosync [SERV  ] Service engine unloaded: corosync cluster closed process group service v1.01
May 08 18:29:01 corosync [SERV  ] Service engine unloaded: corosync cluster config database access v1.01
May 08 18:29:01 corosync [SERV  ] Service engine unloaded: corosync profile loading service
May 08 18:29:01 corosync [SERV  ] Service engine unloaded: openais checkpoint service B.01.01
May 08 18:29:01 corosync [SERV  ] Service engine unloaded: corosync CMAN membership service 2.90
May 08 18:29:01 corosync [SERV  ] Service engine unloaded: corosync cluster quorum service v0.1
May 08 18:29:01 corosync [TOTEM ] sending join/leave message
May 08 18:29:01 corosync [MAIN  ] Corosync Cluster Engine exiting with status 0 at main.c:1864.
May 08 18:20:19 cmannotifyd shutting down...
May 08 18:27:01 cmannotifyd Dispatching first cluster status
May 08 18:29:01 cmannotifyd Received a cman shutdown request

[root@dash-03 ~]# cat /var/log/cluster/cmannotifyd.log
May 08 18:20:35 cmannotifyd shutting down...
May 08 18:27:24 cmannotifyd Dispatching first cluster status
May 08 18:29:20 cmannotifyd Received a cman statechange notification

[root@dash-03 ~]# cman_tool status
<----------- cut out ----------------->
Version: 6.2.0
Config Version: 1
Cluster Name: dash
Cluster Id: 57228
Cluster Member: Yes
Cluster Generation: 80
Membership state: Cluster-Member
Nodes: 1
Expected votes: 3

[root@dash-03 ~]# cat /var/log/cluster/cmannotifyd.log
May 08 18:20:35 cmannotifyd shutting down...
May 08 18:27:24 cmannotifyd Dispatching first cluster status
May 08 18:29:20 cmannotifyd Received a cman statechange notification

[CMAN_TOOL JOIN ON NODE 1]

[root@dash-03 ~]# cman_tool status
<----------- cut out ----------------->
Version: 6.2.0
Config Version: 1
Cluster Name: dash
Cluster Id: 57228
Cluster Member: Yes
Cluster Generation: 84
Membership state: Cluster-Member
Nodes: 2
Expected votes: 3
Total votes: 2
Node votes: 1
Quorum: 2  
Active subsystems: 2
Flags: 
Ports Bound: 0  
Node name: dash-03
Node ID: 3

[root@dash-03 ~]# cat /var/log/cluster/cmannotifyd.log
May 08 18:20:35 cmannotifyd shutting down...
May 08 18:27:24 cmannotifyd Dispatching first cluster status
May 08 18:29:20 cmannotifyd Received a cman statechange notification
May 08 18:34:20 cmannotifyd Received a cman statechange notification
May 08 18:34:20 cmannotifyd Received a cman statechange notification

Comment 13 errata-xmlrpc 2012-06-20 13:58:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-0861.html

