Bug 249715

Summary: ccsd not picking up new cluster.conf when adding a cluster node
Product: Red Hat Enterprise Linux 5
Component: cman
Version: 5.0
Hardware: All
OS: Linux
Status: CLOSED WONTFIX
Severity: high
Priority: medium
Reporter: Ryan McCabe <rmccabe>
Assignee: Ryan O'Hara <rohara>
CC: cluster-maint, jparsons, kanderso, lhh, rudi123, teigland
Target Milestone: ---
Target Release: ---
Doc Type: Bug Fix
Last Closed: 2008-09-09 14:45:06 UTC
Bug Blocks: 249342

Description Ryan McCabe 2007-07-26 16:42:08 UTC
I ran into this when adding and deleting nodes from a cluster in Conga.

I had a 3-node cluster and deleted one of the nodes (successfully), then tried
to add it back. The cluster I'm trying to add it to is named "duck". Here's the
skeletal cluster.conf file in /etc/cluster/ on louey (the node I'm adding):

<?xml version="1.0"?>
<cluster config_version="1" name="duck">
   <fence_daemon post_fail_delay="0" post_join_delay="3"/>
   <clusternodes/>
   <cman/>
   <fencedevices/>
   <rm/>
</cluster>

When I run 'service cman start', ccsd starts, but doesn't pick up the current
configuration from one of the other two cluster nodes, causing cman to error
out because it can't find the local node name in cluster.conf.
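
A quick way to confirm the symptom (a rough sketch, not part of the original
report; the node name and paths are taken from above) is to check whether the
local node name appears in the on-disk cluster.conf before starting cman:

# On louey, the node being re-added. The skeletal file has an empty
# <clusternodes/> element, so the node's name (louey.lab.boston.redhat.com,
# per the config on the other members) is not listed and cman has nothing
# to match against.
grep -c louey /etc/cluster/cluster.conf \
    || echo "local node name not found in cluster.conf"
service cman start    # ccsd starts, but cman errors out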

Here's the current configuration from the other two nodes:

<?xml version="1.0" ?>
<cluster alias="duck" config_version="8" name="duck">
        <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="huey.lab.boston.redhat.com" nodeid="2" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="donald" port="1"/>
                                </method>
                                <method name="2">
                                        <device name="donald" port="5"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="dewey.lab.boston.redhat.com" nodeid="3"
votes="1">
                        <fence>
                                <method name="1">
                                        <device name="donald" port="2"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="louey.lab.boston.redhat.com" nodeid="4"
votes="1"/>
        </clusternodes>
        <cman/>
        <fencedevices>
                <fencedevice agent="fence_apc" ipaddr="donald.lab.boston.redhat.com" login="X" name="donald" passwd="X"/>
        </fencedevices>
        <rm>
                <failoverdomains/>
                <resources/>
        </rm>
</cluster>

I restarted the two nodes that remained in the cluster to rule out any possible
problems with lingering cman two_node="1" state.

[root@huey ~]# cman_tool status
Version: 6.0.1
Config Version: 8
Cluster Name: duck
Cluster Id: 1573
Cluster Member: Yes
Cluster Generation: 188
Membership state: Cluster-Member
Nodes: 2
Expected votes: 3
Total votes: 2
Quorum: 2  
Active subsystems: 6
Flags: 
Ports Bound: 0  
Node name: huey.lab.boston.redhat.com
Node ID: 2
Multicast addresses: 239.192.6.43 
Node addresses: 192.168.77.141 

[root@huey ~]# cman_tool nodes
Node  Sts   Inc   Joined               Name
   2   M    184   2007-07-26 12:38:39  huey.lab.boston.redhat.com
   3   M    188   2007-07-26 12:38:39  dewey.lab.boston.redhat.com
   4   X      0                        louey.lab.boston.redhat.com

Comment 1 Ryan McCabe 2007-07-26 16:46:03 UTC
I forgot to mention above: I'm running the 20070725.0 RHEL5.1-Server tree with
the cman package cman-2.0.70-1.el5.

Comment 3 Ryan O'Hara 2007-07-27 16:36:33 UTC
Did the cluster.conf file on "louey" get updated to match the config file on the
other two nodes? I'm just wondering if the file was correctly updated on the new
node.



Comment 4 Ryan McCabe 2007-07-30 02:37:52 UTC
Nope, the new node never received the updated file, and still has the
config_version="1" cluster.conf file.

Comment 5 Ryan O'Hara 2007-08-28 17:11:14 UTC
I'm not sure that this was ever intended to work. I think the best way to add a
node is to copy the cluster.conf from an existing member to the new node and
start from there. That said, it seems like this should be doable.
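
A minimal sketch of that manual workaround (command and host names are assumed
from this report and the standard RHEL 5 init scripts, not from the comment
itself):

# On the node being added (louey): copy the full config from a current
# member instead of relying on ccsd to fetch it, then start the stack.
scp root@huey.lab.boston.redhat.com:/etc/cluster/cluster.conf /etc/cluster/cluster.conf
service cman start
service rgmanager start   # only if resource management (rm) is in use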

I've added Dave to the CC list for this bug. He had some comments about this
method of adding a node.



Comment 6 Ryan O'Hara 2007-09-04 16:21:52 UTC
Note that this has been reported to work when running a 3-node cluster and
adding a fourth node. It's possible that this bug only exists when going from a
2-node cluster to a 3-node cluster, since the 2-node cluster is a special case.
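
For context, the 2-node case is special because such clusters usually set the
two_node flag in cluster.conf, which changes quorum handling. A generic check
(not taken from this cluster's config):

# <cman two_node="1" expected_votes="1"/> marks the two-node special case;
# see whether the surviving nodes still carry it after the third node is removed.
grep -o 'two_node="[01]"' /etc/cluster/cluster.conf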

Comment 7 Ryan O'Hara 2008-07-07 15:54:20 UTC
Is this still an issue for Conga? I'm assuming that Conga actually distributes
the full config file to all nodes rather than creating a skeleton config file
and then depending on ccsd to update it. That was the solution that was proposed.
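
For reference, on a running RHEL 5 cluster a configuration change is normally
propagated roughly like this (a sketch assuming the stock ccs/cman tools; the
version number is illustrative):

# On one member node, after editing /etc/cluster/cluster.conf and
# bumping config_version (e.g. 8 -> 9):
ccs_tool update /etc/cluster/cluster.conf   # push the new file to the other members via ccsd
cman_tool version -r 9                      # tell cman to switch to the new config version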

If this is still causing problems, let me know. Otherwise I'll close it.

Comment 8 Ryan O'Hara 2008-09-09 14:45:06 UTC
Since there is a workaround, I'm closing this WONTFIX. Also, ccsd won't be around much longer, and ricci has the ability to distribute the cluster.conf file across the cluster. All of these are better options.

Comment 9 Ruediger Plueckhahn 2008-10-09 12:20:26 UTC
Hello,
this is still causing problems and breaks cluster administration with luci as described in the manual. I'm using the current packages provided by RHN.