Bug 494803

Summary: On a two node cluster, cman status shows different outputs, then clvmd hangs.
Product: [Retired] Red Hat Cluster Suite
Reporter: Alex Urbanowicz <aurbanowicz>
Component: cman
Assignee: Christine Caulfield <ccaulfie>
Status: CLOSED DUPLICATE
QA Contact: Cluster QE <mspqa-list>
Severity: high
Priority: low
Version: 4
CC: cluster-maint, edamato
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2009-04-08 15:02:56 UTC
Attachments:
/var/log/messages excerpt from the desynchronized node blade301 and 302 (concatenated). (Flags: none)

Description Alex Urbanowicz 2009-04-08 07:39:48 UTC
Created attachment 338659 [details]
/var/log/messages excerpt from the desynchronized node blade301 and 302 (concatenated).

Description of problem:

I have a two-node cluster with the following config:

<?xml version="1.0"?>
<cluster config_version="5" name="gfs-project-mysql">
        <fence_daemon post_fail_delay="0" post_join_delay="33"/>
        <clusternodes>
                <clusternode name="blade301-cluster" nodeid="1" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="rsysrq" nodename="blade301-cluster" password="x" port="9" operation="1bbbb"/>
                                </method>
                                <method name="2">
                                        <device name="manual" nodename="blade301-cluster"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="blade302-cluster" nodeid="2" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="rsysrq" nodename="blade302-cluster" password="x" port="9" operation="1bbbb"/>
                                </method>
                                <method name="2">
                                        <device name="manual" nodename="blade302-cluster"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <fencedevices>
                <fencedevice agent="fence_rsysrq" name="rsysrq"/>
                <fencedevice agent="fence_manual" name="manual"/>
        </fencedevices>
        <rm>
                <failoverdomains/>
                <resources/>
        </rm>
</cluster>
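
The two-node settings in the config above can be sanity-checked with a short script. This is only a sketch, not part of the original report: the function name check_two_node and the SAMPLE_CONF constant are made up for illustration, and the script checks only that two_node="1" is paired with expected_votes="1", exactly two nodes, and unique nodeid values.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the config above (fence sections omitted for brevity).
SAMPLE_CONF = """<?xml version="1.0"?>
<cluster config_version="5" name="gfs-project-mysql">
        <clusternodes>
                <clusternode name="blade301-cluster" nodeid="1" votes="1"/>
                <clusternode name="blade302-cluster" nodeid="2" votes="1"/>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
</cluster>
"""


def check_two_node(conf_xml):
    """Return a list of problems found in the two-node settings (empty if none)."""
    root = ET.fromstring(conf_xml)
    problems = []
    cman = root.find("cman")
    if cman is None or cman.get("two_node") != "1":
        return ["two_node mode is not enabled"]
    if cman.get("expected_votes") != "1":
        problems.append("two_node=1 needs expected_votes=1")
    nodes = root.findall("clusternodes/clusternode")
    if len(nodes) != 2:
        problems.append("two_node=1 needs exactly 2 nodes, found %d" % len(nodes))
    node_ids = [node.get("nodeid") for node in nodes]
    if len(set(node_ids)) != len(node_ids):
        problems.append("duplicate nodeid values: %s" % node_ids)
    return problems


print(check_two_node(SAMPLE_CONF) or "two-node settings look consistent")
```

For the config in this report the check finds nothing wrong, which is consistent with the problem being in cman itself rather than in the configuration.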

After being started by hand, the cluster synchronizes properly, with clvmd and gfs running on both nodes. After one of the nodes is restarted, cman_tool reports different information on the two nodes:

[root@blade301 alex]# date
Wed Apr  8 09:30:37 CEST 2009
[root@blade301 alex]# cman_tool nodes
Node  Sts   Inc   Joined               Name
   1   M    372   2009-04-07 15:48:23  blade301-cluster
   2   M    392   2009-04-07 16:29:33  blade302-cluster
[root@blade301 alex]# cman_tool status 
Version: 6.1.0
Config Version: 5
Cluster Name: gfs-orange-mysql
Cluster Id: 18
Cluster Member: Yes
Cluster Generation: 392
Membership state: Cluster-Member
Nodes: 2
Expected votes: 1
Total votes: 2
Quorum: 1  
Active subsystems: 7
Flags: 2node Dirty 
Ports Bound: 0  
Node name: blade301-cluster
Node ID: 1
Multicast addresses: 239.192.0.18 
Node addresses: 10.100.216.16 

[root@blade302 alex]# date
Wed Apr  8 09:30:42 CEST 2009
[root@blade302 alex]# cman_tool nodes
Node  Sts   Inc   Joined               Name
   1   X      0                        blade301-cluster
   2   M    388   2009-04-07 16:29:34  blade302-cluster
[root@blade302 alex]# cman_tool status
Version: 6.1.0
Config Version: 5
Cluster Name: gfs-orange-mysql
Cluster Id: 18
Cluster Member: Yes
Cluster Generation: 392
Membership state: Cluster-Member
Nodes: 1
Expected votes: 1
Total votes: 1
Quorum: 1  
Active subsystems: 8
Flags: 2node Dirty 
Ports Bound: 0 11  
Node name: blade302-cluster
Node ID: 2
Multicast addresses: 239.192.0.18 
Node addresses: 10.100.216.17 

When the cluster is in this state, any CLVM-related operation hangs on both nodes and the gfs volume cannot be used. The attached logs give no indication of this state. The logs and the outputs above were taken after the blade301 node was rebooted.
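
The disagreement between the two nodes can be spotted mechanically by comparing saved "cman_tool status" output from each node. The following is a minimal sketch, not a tool from the cluster suite: parse_status, diverging_fields and the PER_NODE set are hypothetical names, the sample text is an abbreviated copy of the outputs above, and the choice of which fields may legitimately differ per node is an assumption.

```python
# Abbreviated copies of the "cman_tool status" outputs shown above.
STATUS_BLADE301 = """\
Cluster Name: gfs-orange-mysql
Cluster Generation: 392
Membership state: Cluster-Member
Nodes: 2
Expected votes: 1
Total votes: 2
Quorum: 1
Active subsystems: 7
Ports Bound: 0
Node name: blade301-cluster
Node ID: 1
"""

STATUS_BLADE302 = """\
Cluster Name: gfs-orange-mysql
Cluster Generation: 392
Membership state: Cluster-Member
Nodes: 1
Expected votes: 1
Total votes: 1
Quorum: 1
Active subsystems: 8
Ports Bound: 0 11
Node name: blade302-cluster
Node ID: 2
"""

# Assumption: these fields describe the local node and may differ legitimately.
PER_NODE = {"Node name", "Node ID", "Node addresses",
            "Active subsystems", "Ports Bound"}


def parse_status(text):
    """Parse 'Key: value' lines into a dict; lines without a colon are skipped."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def diverging_fields(status_a, status_b):
    """Return {field: (value_a, value_b)} for cluster-wide fields that differ."""
    a, b = parse_status(status_a), parse_status(status_b)
    return {key: (a.get(key), b.get(key))
            for key in sorted(set(a) | set(b))
            if key not in PER_NODE and a.get(key) != b.get(key)}


for field, (on_301, on_302) in diverging_fields(STATUS_BLADE301, STATUS_BLADE302).items():
    print("%s: blade301=%s blade302=%s" % (field, on_301, on_302))
```

On the outputs in this report, only "Nodes" and "Total votes" differ: blade301 counts two members while blade302 counts one, even though both report the same cluster generation.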

Version-Release number of selected component (if applicable): cman-2.0.98-1.el5

How reproducible:


Steps to Reproduce:

1. Set up a cluster using the above config, with cman, clvmd and gfs running, rgmanager not running, and iSCSI storage as the backend.

2. Start the cluster.

3. If the cluster synchronizes properly, reboot one of the nodes.
  
Actual results:


Expected results:


Additional info:

Comment 1 Christine Caulfield 2009-04-08 07:54:54 UTC
Can you try the patch mentioned in bz#487397, please?

Comment 2 Alex Urbanowicz 2009-04-08 14:39:41 UTC
(In reply to comment #1)
> Can you try the patch mentioned in bz#487397, please?

The bug comes and goes periodically, but the patched cman package referenced in 487397 seems to make it go away permanently. Thank you very much!

Comment 3 Christine Caulfield 2009-04-08 15:02:56 UTC
I'm pleased that helped. I'll close this bug now.

*** This bug has been marked as a duplicate of bug 487397 ***