Bug 494803 - On a two node cluster, cman status shows different outputs, then clvmd hangs.
Summary: On a two node cluster, cman status shows different outputs, then clvmd hangs.
Keywords:
Status: CLOSED DUPLICATE of bug 487397
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: cman
Version: 4
Hardware: x86_64
OS: Linux
Priority: low
Severity: high
Target Milestone: ---
Assignee: Christine Caulfield
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2009-04-08 07:39 UTC by Alex Urbanowicz
Modified: 2009-04-16 20:32 UTC
CC List: 2 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2009-04-08 15:02:56 UTC
Embargoed:


Attachments
/var/log/messages excerpt from the desynchronized node blade301 and 302 (concatenated). (17.89 KB, text/plain)
2009-04-08 07:39 UTC, Alex Urbanowicz

Description Alex Urbanowicz 2009-04-08 07:39:48 UTC
Created attachment 338659
/var/log/messages excerpt from the desynchronized node blade301 and 302 (concatenated).

Description of problem:

I have a two-node cluster with the following configuration:

<?xml version="1.0"?>
<cluster config_version="5" name="gfs-project-mysql">
        <fence_daemon post_fail_delay="0" post_join_delay="33"/>
        <clusternodes>
                <clusternode name="blade301-cluster" nodeid="1" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="rsysrq" nodename="blade301-cluster" password="x" port="9" operation="1bbbb"/>
                                </method>
                                <method name="2">
                                        <device name="manual" nodename="blade301-cluster"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="blade302-cluster" nodeid="2" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="rsysrq" nodename="blade302-cluster" password="x" port="9" operation="1bbbb"/>
                                </method>
                                <method name="2">
                                        <device name="manual" nodename="blade302-cluster"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <fencedevices>
                <fencedevice agent="fence_rsysrq" name="rsysrq"/>
                <fencedevice agent="fence_manual" name="manual"/>
        </fencedevices>
        <rm>
                <failoverdomains/>
                <resources/>
        </rm>
</cluster>
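The two-node settings in this configuration can be sanity-checked with a short script. The sketch below uses Python's standard xml.etree.ElementTree; the helper names check_two_node and two_node_quorum are hypothetical, and the quorum rule is a simplified model (a strict majority of expected votes, forced to 1 when two_node="1"), not cman's exact algorithm. It assumes the cman(5) convention that two_node="1" is paired with expected_votes="1" in a cluster of exactly two nodes.

```python
import xml.etree.ElementTree as ET


def two_node_quorum(expected_votes: int, two_node: bool) -> int:
    """Simplified quorum model (assumption, not cman's exact code):
    a strict majority of expected votes, except that two_node mode
    lets a single vote be quorate."""
    return 1 if two_node else expected_votes // 2 + 1


def check_two_node(conf_xml: str) -> tuple[list[str], int]:
    """Return (problems, quorum) for the <cman> settings in a cluster.conf string."""
    root = ET.fromstring(conf_xml)
    cman = root.find("cman")
    nodes = root.findall("clusternodes/clusternode")
    two_node = cman is not None and cman.get("two_node") == "1"

    # Assumption: when expected_votes is absent, fall back to the sum of node votes.
    default_expected = sum(int(n.get("votes", "1")) for n in nodes)
    if cman is not None:
        expected = int(cman.get("expected_votes", default_expected))
    else:
        expected = default_expected

    problems = []
    if two_node and expected != 1:
        problems.append("two_node=1 requires expected_votes=1")
    if two_node and len(nodes) != 2:
        problems.append(f"two_node=1 expects exactly 2 nodes, found {len(nodes)}")
    return problems, two_node_quorum(expected, two_node)
```

Run against the configuration above, check_two_node should report no problems and a quorum of 1, consistent with the "Quorum: 1" line that cman_tool status prints on both nodes below. It also illustrates why blade302 can remain a cluster member on its single vote: in two-node mode, one vote already meets quorum.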

After being started by hand, the cluster synchronizes properly, with clvmd and GFS running on both nodes. After one of the nodes is restarted, cman_tool nodes and cman_tool status report different information on each node:

[root@blade301 alex]# date
Wed Apr  8 09:30:37 CEST 2009
[root@blade301 alex]# cman_tool nodes
Node  Sts   Inc   Joined               Name
   1   M    372   2009-04-07 15:48:23  blade301-cluster
   2   M    392   2009-04-07 16:29:33  blade302-cluster
[root@blade301 alex]# cman_tool status 
Version: 6.1.0
Config Version: 5
Cluster Name: gfs-orange-mysql
Cluster Id: 18
Cluster Member: Yes
Cluster Generation: 392
Membership state: Cluster-Member
Nodes: 2
Expected votes: 1
Total votes: 2
Quorum: 1  
Active subsystems: 7
Flags: 2node Dirty 
Ports Bound: 0  
Node name: blade301-cluster
Node ID: 1
Multicast addresses: 239.192.0.18 
Node addresses: 10.100.216.16 

[root@blade302 alex]# date
Wed Apr  8 09:30:42 CEST 2009
[root@blade302 alex]# cman_tool nodes
Node  Sts   Inc   Joined               Name
   1   X      0                        blade301-cluster
   2   M    388   2009-04-07 16:29:34  blade302-cluster
[root@blade302 alex]# cman_tool status
Version: 6.1.0
Config Version: 5
Cluster Name: gfs-orange-mysql
Cluster Id: 18
Cluster Member: Yes
Cluster Generation: 392
Membership state: Cluster-Member
Nodes: 1
Expected votes: 1
Total votes: 1
Quorum: 1  
Active subsystems: 8
Flags: 2node Dirty 
Ports Bound: 0 11  
Node name: blade302-cluster
Node ID: 2
Multicast addresses: 239.192.0.18 
Node addresses: 10.100.216.17 

In this state, any CLVM-related operation hangs on both nodes and the GFS volume cannot be used. The attached logs show no indication of this state. The logs and the output above were captured after node blade301 was rebooted.
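The desynchronization shows up in the Sts column of cman_tool nodes: blade301 lists both nodes as M (member), while blade302 lists blade301 as X (known, but not currently a member). A minimal sketch for detecting such a disagreement from the two outputs follows; the function names are hypothetical, and it assumes the column layout seen in this report (Node, Sts, Inc, Joined, Name, with Name last and Joined possibly empty).

```python
def parse_cman_nodes(output: str) -> dict[str, str]:
    """Map node name -> Sts column from `cman_tool nodes` output."""
    statuses = {}
    for line in output.splitlines():
        fields = line.split()
        # Skip the header and blank lines; data lines start with a numeric node ID.
        if not fields or not fields[0].isdigit():
            continue
        # Sts is the second column and Name the last. Joined may be empty for
        # a non-member, so avoid relying on a fixed number of columns.
        statuses[fields[-1]] = fields[1]
    return statuses


def membership_mismatch(views: dict[str, str]) -> dict[str, dict[str, str]]:
    """Given {reporting_host: cman_tool nodes output}, return the nodes whose
    status differs between hosts, with each host's view of that node."""
    parsed = {host: parse_cman_nodes(out) for host, out in views.items()}
    all_nodes = set().union(*(p.keys() for p in parsed.values()))
    mismatches = {}
    for node in sorted(all_nodes):
        # "?" marks a node that a host does not list at all.
        seen = {host: p.get(node, "?") for host, p in parsed.items()}
        if len(set(seen.values())) > 1:
            mismatches[node] = seen
    return mismatches
```

Fed the two cman_tool nodes outputs above, membership_mismatch flags only blade301-cluster, which blade301 sees as M and blade302 sees as X; blade302-cluster is M on both sides.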

Version-Release number of selected component (if applicable): cman-2.0.98-1.el5

How reproducible:


Steps to Reproduce:

1. Set up the cluster using the above configuration, with cman, clvmd and GFS running, rgmanager not running, and iSCSI storage as the backend.

2. Start the cluster.

3. If the cluster synchronizes properly, reboot one of the nodes.
  
Actual results:


Expected results:


Additional info:

Comment 1 Christine Caulfield 2009-04-08 07:54:54 UTC
Can you try the patch mentioned in bz#487397, please?

Comment 2 Alex Urbanowicz 2009-04-08 14:39:41 UTC
(In reply to comment #1)
> Can you try the patch mentioned in bz#487397, please?

The bug comes and goes periodically, but the patched cman package referenced in bug 487397 seems to make it go away permanently. Thank you very much!

Comment 3 Christine Caulfield 2009-04-08 15:02:56 UTC
I'm pleased that helped. I'll close this bug now.

*** This bug has been marked as a duplicate of bug 487397 ***

