Bug 819787

Summary: cmannotifyd does not issue initial quorum state
Product: Red Hat Enterprise Linux 6
Reporter: mick <mgoulish>
Component: cluster
Assignee: Fabio Massimo Di Nitto <fdinitto>
Status: CLOSED ERRATA
QA Contact: Cluster QE <mspqa-list>
Severity: urgent
Priority: urgent
Version: 6.3
CC: ccaulfie, cluster-maint, djansa, fdinitto, jpayne, jwest, lhh, mgoulish, rpeterso, syeghiay, teigland, tross
Target Milestone: rc
Keywords: ZStream
Target Release: 6.3
Hardware: Unspecified
OS: Linux
Fixed In Version: cluster-3.0.12.1-32.el6
Doc Type: Bug Fix
Doc Text: No Documentation needed
Last Closed: 2012-06-20 13:58:52 UTC
Type: Bug
Bug Blocks: 733221, 820357    

Description mick 2012-05-08 08:48:01 UTC
On RHEL 6.2, using a cluster of 4 boxes, we do not get a notification of the initial quorum state from cmannotifyd when we first bring everything up.

We believe the notification daemon is started after cman, so cman has already achieved quorum (or failed to) by the time the daemon starts, and our notification script never gets called.

Our script *does* get called if, after everything is up, we take down box D: boxes A, B, and C then correctly get a notification that they have quorum. If we then bring D back up, A, B, and C get another notification that they still have quorum.

Our script on box A is also called correctly on loss of quorum if we shut down C and D. So all state changes after the initial startup work correctly.

But we need that initial quorum notification, or we are doomed.
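
For context, a notification script in /etc/cluster/cman-notify.d/ is an executable that cmannotifyd runs when an event occurs. The sketch below is a hypothetical minimal handler, not the shipped cman_notify_template.sh. It assumes cmannotifyd passes the event type in CMAN_NOTIFICATION and the quorum state ("1" when quorate) in CMAN_NOTIFICATION_QUORUM, as described in cmannotifyd(8) for cluster 3.x; check the man page and template on your system before relying on these names.

```shell
#!/bin/bash
# Hypothetical minimal handler for /etc/cluster/cman-notify.d/.
# ASSUMPTION: cmannotifyd exports CMAN_NOTIFICATION (event type) and
# CMAN_NOTIFICATION_QUORUM ("1" when quorate) to each script it runs.

handle_notification() {
    case "${CMAN_NOTIFICATION:-}" in
        CMAN_REASON_STATECHANGE)
            # Membership/quorum changed: report the current quorum state.
            if [ "${CMAN_NOTIFICATION_QUORUM:-0}" = "1" ]; then
                echo "quorate"
            else
                echo "inquorate"
            fi
            ;;
        CMAN_REASON_CONFIG_UPDATE)
            echo "configuration updated"
            ;;
        CMAN_REASON_TRY_SHUTDOWN)
            echo "cman shutdown requested"
            ;;
        *)
            # Empty or unknown event type.
            echo "unhandled notification: ${CMAN_NOTIFICATION:-<unset>}"
            ;;
    esac
}

handle_notification
```

Without the fix described in this bug, a script like this only runs on later state changes, so it never sees the quorum state cman reached before cmannotifyd started.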

Comment 3 Fabio Massimo Di Nitto 2012-05-08 12:38:55 UTC
Unit test:

setup:

2 node cluster
enable <logging debug="on"/> in /etc/cluster/cluster.conf

on both nodes:
cp /usr/share/doc/cman-$version/cman_notify_template.sh /etc/cluster/cman-notify.d/
chmod 755 /etc/cluster/cman-notify.d/cman_notify_template.sh
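
The setup copies the example script out of the versioned cman documentation directory and makes it executable. A minimal sketch of that step as a reusable function follows; the function name is hypothetical, and the doc path pattern is taken from the commands in this report (/usr/share/doc/cman-3.0.12.1 on the test systems).

```shell
#!/bin/bash
# Hypothetical helper: install the example notify script shipped with cman.
#   $1 = cman doc directory (e.g. /usr/share/doc/cman-3.0.12.1)
#   $2 = notify directory (defaults to /etc/cluster/cman-notify.d)
install_notify_template() {
    local docdir="$1"
    local notifydir="${2:-/etc/cluster/cman-notify.d}"
    local src="$docdir/cman_notify_template.sh"

    if [ ! -f "$src" ]; then
        echo "template not found: $src" >&2
        return 1
    fi
    mkdir -p "$notifydir" || return 1
    cp "$src" "$notifydir/" || return 1
    # Same permissions as in the manual setup step (chmod 755).
    chmod 755 "$notifydir/cman_notify_template.sh"
}

# Example, deriving the doc directory from the installed cman version:
#   install_notify_template "/usr/share/doc/cman-$(rpm -q --qf '%{VERSION}' cman)"
```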

pre-patch:

on both nodes:
cman_tool join

wait for cman to be quorate, verify with cman_tool status

start cmannotifyd

[root@rhel6-node2 cman-notify.d]# cat /var/log/cluster/file.log
cat: /var/log/cluster/file.log: No such file or directory

killall cmannotifyd
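
The "wait for cman to be quorate, verify with cman_tool status" step can be scripted. The sketch below is a heuristic, not a cman API: it reads cman_tool status text on stdin and treats the node as quorate when "Total votes" is at least "Quorum", the two fields shown in the status outputs later in this report. The function name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: decide quorum from `cman_tool status` text on stdin.
# Heuristic: quorate when "Total votes" >= "Quorum" (both must be present).
# Exit status 0 = quorate, 1 = not quorate or fields missing.
is_quorate() {
    awk -F': *' '
        $1 == "Total votes" { total = $2 + 0; seen_total = 1 }
        $1 == "Quorum"      { quorum = $2 + 0; seen_quorum = 1 }
        END { exit !(seen_total && seen_quorum && quorum > 0 && total >= quorum) }
    '
}

# Example (on a cluster node), polling until quorate:
#   until cman_tool status | is_quorate; do sleep 2; done
```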


post-patch:

on both nodes:
cman_tool join

wait for cman to be quorate, verify with cman_tool status

start cmannotifyd

[root@rhel6-node2 cman-notify.d]# cat /var/log/cluster/file.log
debugging is enabled
replace me with something to do
debugging is enabled
replace me with something to do
we still have quorum

The output comes from the generic example script and can be tuned as needed; the output file (file.log) can also be changed.

The test was repeated to check cman disappearing and reappearing:

cman_tool leave

cmannotifyd will wait for cman to come back (see /var/log/cluster/cmannotifyd.log)

cman_tool join
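
To check whether a node has the fixed behaviour, one can look for the "Dispatching first cluster status" message that the patched cmannotifyd writes to /var/log/cluster/cmannotifyd.log (see the verification in Comment 9). The function below is a hypothetical helper; the log path is a parameter so it can be pointed at any file.

```shell
#!/bin/bash
# Hypothetical check: did cmannotifyd dispatch the initial cluster status?
# Relies on the "Dispatching first cluster status" message logged by the
# patched daemon (cluster-3.0.12.1-32.el6, per this report).
initial_status_dispatched() {
    local log="${1:-/var/log/cluster/cmannotifyd.log}"
    [ -r "$log" ] || return 1
    grep -q 'Dispatching first cluster status' "$log"
}

# Example:
#   initial_status_dispatched && echo "initial quorum state was dispatched"
```

The log accumulates across daemon restarts, so a match may come from an earlier run; compare its timestamp with the time cmannotifyd was started.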

Comment 6 Fabio Massimo Di Nitto 2012-05-08 16:41:32 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
No Documentation needed

Comment 9 Justin Payne 2012-05-08 23:41:56 UTC
Verified in cman-3.0.12.1-32.el6

[PRE UPDATE]

[root@dash-03 ~]# rpm -q cman
cman-3.0.12.1-28.el6.x86_64
[root@dash-03 ~]# ls /etc/cluster/cman-notify.d/
cman_notify_template.sh

[root@dash-01 ~]# cp /usr/share/doc/cman-3.0.12.1/cman_notify_template.sh /etc/cluster/cman-notify.d/.
[root@dash-01 ~]# chmod 755 /etc/cluster/cman-notify.d/cman_notify_template.sh
[root@dash-01 ~]# rpm -q cman
cman-3.0.12.1-28.el6.x86_64

[root@dash-01 ~]# cman_tool status
<---------- cut out ------------------>
Version: 6.2.0
Config Version: 1
Cluster Name: dash
Cluster Id: 57228
Cluster Member: Yes
Cluster Generation: 64
Membership state: Cluster-Member
Nodes: 2
Expected votes: 3
Total votes: 2
Node votes: 1
Quorum: 2  
Active subsystems: 1
Flags: 
Ports Bound: 0  
Node name: dash-01
Node ID: 1

[root@dash-03 ~]# cman_tool status
<----------- cut out ----------------->
Version: 6.2.0
Config Version: 1
Cluster Name: dash
Cluster Id: 57228
Cluster Member: Yes
Cluster Generation: 64
Membership state: Cluster-Member
Nodes: 2
Expected votes: 3
Total votes: 2
Node votes: 1
Quorum: 2  
Active subsystems: 1
Flags: 
Ports Bound: 0  
Node name: dash-03
Node ID: 3

[root@dash-01 ~]# ps aux |grep cman
root     29395  0.2  0.0  37608  1316 ?        Ssl  18:18   0:00 /usr/sbin/cmannotifyd
root     29413  0.0  0.0 103240   808 pts/0    S+   18:18   0:00 grep cman
[root@dash-01 ~]# cat /var/log/cluster/file.log
cat: /var/log/cluster/file.log: No such file or directory

[root@dash-03 ~]# ps aux |grep cman
root     31032  0.1  0.0  37604  1404 ?        Ssl  18:18   0:00 /usr/sbin/cmannotifyd
root     31050  0.0  0.0 103240   804 pts/0    S+   18:19   0:00 grep cman
[root@dash-03 ~]# cat /var/log/cluster/file.log
cat: /var/log/cluster/file.log: No such file or directory


[POST UPDATE]

[root@dash-01 ~]# rpm -q cman
cman-3.0.12.1-32.el6.x86_64

[root@dash-03 ~]# rpm -q cman
cman-3.0.12.1-32.el6.x86_64

[root@dash-03 ~]# cman_tool status
<----------- cut out ----------------->
Version: 6.2.0
Config Version: 1
Cluster Name: dash
Cluster Id: 57228
Cluster Member: Yes
Cluster Generation: 76
Membership state: Cluster-Member
Nodes: 2
Expected votes: 3
Total votes: 2
Node votes: 1
Quorum: 2  
Active subsystems: 1
Flags: 
Ports Bound: 0  
Node name: dash-03
Node ID: 3

[root@dash-01 ~]# cman_tool status
<---------- cut out ------------------>
Version: 6.2.0
Config Version: 1
Cluster Name: dash
Cluster Id: 57228
Cluster Member: Yes
Cluster Generation: 76
Membership state: Cluster-Member
Nodes: 2
Expected votes: 3
Total votes: 2
Node votes: 1
Quorum: 2  
Active subsystems: 1
Flags: 
Ports Bound: 0  
Node name: dash-01
Node ID: 1


[root@dash-03 ~]# ps aux |grep cman
root     31217  2.5  0.0 103140  1452 ?        Ssl  18:27   0:00 /usr/sbin/cmannotifyd
root     31256  0.0  0.0 103240   808 pts/0    S+   18:27   0:00 grep cman
[root@dash-03 ~]# cat /var/log/cluster/file.log
debugging is enabled
replace me with something to do
debugging is enabled
replace me with something to do
we still have quorum

[root@dash-01 ~]# ps aux |grep cman; cat /var/log/cluster/file.log
root     29586  0.3  0.0 103140  1456 ?        Ssl  18:27   0:00 /usr/sbin/cmannotifyd
root     29625  0.0  0.0 103240   808 pts/0    S+   18:28   0:00 grep cman
debugging is enabled
replace me with something to do
debugging is enabled
replace me with something to do
we still have quorum

[root@dash-01 ~]# cman_tool leave; tail /var/log/cluster/cmannotifyd.log
May 08 18:29:01 corosync [CMAN  ] daemon: read 20 bytes from fd 18
May 08 18:29:01 corosync [CMAN  ] daemon: client command is 800000bb
May 08 18:29:01 corosync [CMAN  ] daemon: About to process command
May 08 18:29:01 corosync [CMAN  ] memb: command to process is 800000bb
May 08 18:29:01 corosync [CMAN  ] daemon: sending reply 102 to fd 17
May 08 18:29:01 corosync [CMAN  ] memb: command return code is -11
May 08 18:29:01 corosync [CMAN  ] daemon: read 20 bytes from fd 17
May 08 18:29:01 corosync [CMAN  ] daemon: client command is bc
May 08 18:29:01 corosync [CMAN  ] daemon: About to process command
May 08 18:29:01 corosync [CMAN  ] memb: command to process is bc
May 08 18:29:01 corosync [CMAN  ] memb: Shutdown reply is 1
May 08 18:29:01 corosync [CMAN  ] memb: Sending LEAVE, reason 0
May 08 18:29:01 corosync [CMAN  ] ais: comms send message 0x7fff2e4c93e0 len = 4
May 08 18:29:01 corosync [CMAN  ] memb: shutdown decision is: 0 (yes=1, no=0) flags=0
May 08 18:29:01 corosync [CMAN  ] memb: command return code is -11
May 08 18:29:01 corosync [TOTEM ] mcasted message added to pending queue
May 08 18:29:01 corosync [TOTEM ] Delivering 1a to 1b
May 08 18:29:01 corosync [TOTEM ] Delivering MCAST message with seq 1b to pending delivery queue
May 08 18:29:01 corosync [CMAN  ] ais: deliver_fn source nodeid = 1, len=20, endian_conv=0
May 08 18:29:01 corosync [CMAN  ] memb: Message on port 0 is 7
May 08 18:29:01 corosync [CMAN  ] memb: got LEAVE from node 1, reason = 0
May 08 18:29:01 corosync [CMAN  ] daemon: send status return: 0
May 08 18:29:01 corosync [CMAN  ] daemon: sending reply c00000bb to fd 18
May 08 18:29:01 corosync [TOTEM ] Received ringid(10.15.89.168:76) seq 1b
May 08 18:29:01 corosync [SERV  ] Unloading all Corosync service engines.
May 08 18:29:01 corosync [TOTEM ] releasing messages up to and including 1b
May 08 18:29:01 corosync [SERV  ] Service engine unloaded: corosync extended virtual synchrony service
May 08 18:29:01 corosync [SERV  ] Service engine unloaded: corosync configuration service
May 08 18:29:01 corosync [SERV  ] Service engine unloaded: corosync cluster closed process group service v1.01
May 08 18:29:01 corosync [SERV  ] Service engine unloaded: corosync cluster config database access v1.01
May 08 18:29:01 corosync [SERV  ] Service engine unloaded: corosync profile loading service
May 08 18:29:01 corosync [SERV  ] Service engine unloaded: openais checkpoint service B.01.01
May 08 18:29:01 corosync [SERV  ] Service engine unloaded: corosync CMAN membership service 2.90
May 08 18:29:01 corosync [SERV  ] Service engine unloaded: corosync cluster quorum service v0.1
May 08 18:29:01 corosync [TOTEM ] sending join/leave message
May 08 18:29:01 corosync [MAIN  ] Corosync Cluster Engine exiting with status 0 at main.c:1864.
May 08 18:20:19 cmannotifyd shutting down...
May 08 18:27:01 cmannotifyd Dispatching first cluster status
May 08 18:29:01 cmannotifyd Received a cman shutdown request

[root@dash-03 ~]# cat /var/log/cluster/cmannotifyd.log
May 08 18:20:35 cmannotifyd shutting down...
May 08 18:27:24 cmannotifyd Dispatching first cluster status
May 08 18:29:20 cmannotifyd Received a cman statechange notification

[root@dash-03 ~]# cman_tool status
<----------- cut out ----------------->
Version: 6.2.0
Config Version: 1
Cluster Name: dash
Cluster Id: 57228
Cluster Member: Yes
Cluster Generation: 80
Membership state: Cluster-Member
Nodes: 1
Expected votes: 3

[root@dash-03 ~]# cat /var/log/cluster/cmannotifyd.log
May 08 18:20:35 cmannotifyd shutting down...
May 08 18:27:24 cmannotifyd Dispatching first cluster status
May 08 18:29:20 cmannotifyd Received a cman statechange notification

[CMAN_TOOL JOIN ON NODE 1]

[root@dash-03 ~]# cman_tool status
<----------- cut out ----------------->
Version: 6.2.0
Config Version: 1
Cluster Name: dash
Cluster Id: 57228
Cluster Member: Yes
Cluster Generation: 84
Membership state: Cluster-Member
Nodes: 2
Expected votes: 3
Total votes: 2
Node votes: 1
Quorum: 2  
Active subsystems: 2
Flags: 
Ports Bound: 0  
Node name: dash-03
Node ID: 3

[root@dash-03 ~]# cat /var/log/cluster/cmannotifyd.log
May 08 18:20:35 cmannotifyd shutting down...
May 08 18:27:24 cmannotifyd Dispatching first cluster status
May 08 18:29:20 cmannotifyd Received a cman statechange notification
May 08 18:34:20 cmannotifyd Received a cman statechange notification
May 08 18:34:20 cmannotifyd Received a cman statechange notification

Comment 13 errata-xmlrpc 2012-06-20 13:58:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-0861.html