Description of problem:

When a node in a cluster is rebooted and the running version of cluster.conf
has changed, the newest version does not get transferred and the node cannot
join the cluster, or sometimes creates its own 1-node cluster.

Version-Release number of selected component (if applicable):

RHEL 6.1
cman-3.0.12-41.el6_1.1
clusterlib-3.0.12-41.el6_1.1

How reproducible:

Easily

Steps to Reproduce:
1. In a two-node cluster, switch off node 2.
2. Increase the cluster.conf config version on node 1 and propagate it with
   "cman_tool version -r" (a command-level sketch is included at the end of
   this report).
3. Boot node 2.

Actual results:

On node 2 we see the following messages:

Aug 26 15:38:28 jc_rhcs6_B corosync[1217]: [CMAN ] CMAN 3.0.12 (built Jul 11 2011 04:18:42) started
Aug 26 15:38:28 jc_rhcs6_B corosync[1217]: [SERV ] Service engine loaded: corosync CMAN membership service 2.90
Aug 26 15:38:28 jc_rhcs6_B corosync[1217]: [SERV ] Service engine loaded: openais checkpoint service B.01.01
Aug 26 15:38:28 jc_rhcs6_B corosync[1217]: [SERV ] Service engine loaded: corosync extended virtual synchrony service
Aug 26 15:38:28 jc_rhcs6_B corosync[1217]: [SERV ] Service engine loaded: corosync configuration service
Aug 26 15:38:28 jc_rhcs6_B corosync[1217]: [SERV ] Service engine loaded: corosync cluster closed process group service v1.01
Aug 26 15:38:28 jc_rhcs6_B corosync[1217]: [SERV ] Service engine loaded: corosync cluster config database access v1.01
Aug 26 15:38:28 jc_rhcs6_B corosync[1217]: [SERV ] Service engine loaded: corosync profile loading service
Aug 26 15:38:28 jc_rhcs6_B corosync[1217]: [QUORUM] Using quorum provider quorum_cman
Aug 26 15:38:28 jc_rhcs6_B corosync[1217]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1
Aug 26 15:38:28 jc_rhcs6_B corosync[1217]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
Aug 26 15:38:28 jc_rhcs6_B corosync[1217]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Aug 26 15:38:28 jc_rhcs6_B corosync[1217]: [CMAN ] quorum regained, resuming activity
Aug 26 15:38:28 jc_rhcs6_B corosync[1217]: [QUORUM] This node is within the primary component and will provide service.
Aug 26 15:38:28 jc_rhcs6_B corosync[1217]: [QUORUM] Members[1]: 1
Aug 26 15:38:28 jc_rhcs6_B corosync[1217]: [QUORUM] Members[1]: 1
Aug 26 15:38:28 jc_rhcs6_B corosync[1217]: [CPG ] downlist received left_list: 0
Aug 26 15:38:28 jc_rhcs6_B corosync[1217]: [CPG ] chosen downlist from node r(0) ip(192.168.122.120)
Aug 26 15:38:28 jc_rhcs6_B corosync[1217]: [MAIN ] Completed service synchronization, ready to provide service.
Aug 26 15:38:29 jc_rhcs6_B corosync[1217]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Aug 26 15:38:29 jc_rhcs6_B corosync[1217]: [CMAN ] Unable to load new config in corosync: New configuration version has to be newer than current running configuration
Aug 26 15:38:29 jc_rhcs6_B corosync[1217]: [CMAN ] Can't get updated config version 26: New configuration version has to be newer than current running configuration#012.
Aug 26 15:38:29 jc_rhcs6_B corosync[1217]: [CMAN ] Activity suspended on this node
Aug 26 15:38:29 jc_rhcs6_B corosync[1217]: [CMAN ] Error reloading the configuration, will retry every second
Aug 26 15:38:29 jc_rhcs6_B corosync[1217]: [CMAN ] Node 2 conflict, remote config version id=26, local=20
Aug 26 15:38:29 jc_rhcs6_B corosync[1217]: [CPG ] downlist received left_list: 0
Aug 26 15:38:29 jc_rhcs6_B corosync[1217]: [CPG ] downlist received left_list: 0
Aug 26 15:38:29 jc_rhcs6_B corosync[1217]: [CPG ] chosen downlist from node r(0) ip(192.168.122.120)
Aug 26 15:38:29 jc_rhcs6_B corosync[1217]: [MAIN ] Completed service synchronization, ready to provide service.
Aug 26 15:38:30 jc_rhcs6_B corosync[1217]: [CMAN ] Unable to load new config in corosync: New configuration version has to be newer than current running configuration
Aug 26 15:38:30 jc_rhcs6_B corosync[1217]: [CMAN ] Can't get updated config version 26: New configuration version has to be newer than current running configuration#012.
Aug 26 15:38:30 jc_rhcs6_B corosync[1217]: [CMAN ] Activity suspended on this node
Aug 26 15:38:30 jc_rhcs6_B corosync[1217]: [CMAN ] Error reloading the configuration, will retry every second
Aug 26 15:38:31 jc_rhcs6_B corosync[1217]: [CMAN ] Unable to load new config in corosync: New configuration version has to be newer than current running configuration
Aug 26 15:38:31 jc_rhcs6_B corosync[1217]: [CMAN ] Can't get updated config version 26: New configuration version has to be newer than current running configuration#012.
Aug 26 15:38:31 jc_rhcs6_B corosync[1217]: [CMAN ] Activity suspended on this node
Aug 26 15:38:31 jc_rhcs6_B corosync[1217]: [CMAN ] Error reloading the configuration, will retry every second
Aug 26 15:38:32 jc_rhcs6_B corosync[1217]: [CMAN ] Unable to load new config in corosync: New configuration version has to be newer than current running configuration
Aug 26 15:38:32 jc_rhcs6_B corosync[1217]: [CMAN ] Can't get updated config version 26: New configuration version has to be newer than current running configuration#012.
Aug 26 15:38:32 jc_rhcs6_B corosync[1217]: [CMAN ] Activity suspended on this node
Aug 26 15:38:32 jc_rhcs6_B corosync[1217]: [CMAN ] Error reloading the configuration, will retry every second
Aug 26 15:38:33 jc_rhcs6_B corosync[1217]: [CMAN ] Unable to load new config in corosync: New configuration version has to be newer than current running configuration
Aug 26 15:38:33 jc_rhcs6_B corosync[1217]: [CMAN ] Can't get updated config version 26: New configuration version has to be newer than current running configuration#012.
Aug 26 15:38:33 jc_rhcs6_B corosync[1217]: [CMAN ] Activity suspended on this node
Aug 26 15:38:33 jc_rhcs6_B corosync[1217]: [CMAN ] Error reloading the configuration, will retry every second

Node 2 creates a 1-node cluster:

# cman_tool status
Version: 6.2.0
Config Version: 20
Cluster Name: cluster
Cluster Id: 63628
Cluster Member: Yes
Cluster Generation: 376
Membership state: Cluster-Member
Nodes: 1
Expected votes: 1
Total votes: 1
Node votes: 1
Quorum: 1
Active subsystems: 7
Flags: 2node Error
Ports Bound: 0
Node name: jc_rhcs6_B
Node ID: 1
Multicast addresses: 239.192.248.133
Node addresses: 192.168.122.120

# cman_tool nodes
Node  Sts   Inc   Joined               Name
   1   M    372   2011-08-26 15:38:28  jc_rhcs6_B
   2   X      0                        jc_rhcs6_A

While node 1 became a 2-node cluster:

# cman_tool status
Version: 6.2.0
Config Version: 26
Cluster Name: cluster
Cluster Id: 63628
Cluster Member: Yes
Cluster Generation: 376
Membership state: Cluster-Member
Nodes: 2
Expected votes: 1
Total votes: 2
Node votes: 1
Quorum: 1
Active subsystems: 7
Flags: 2node
Ports Bound: 0
Node name: jc_rhcs6_A
Node ID: 2
Multicast addresses: 239.192.248.133
Node addresses: 192.168.122.117

# cman_tool nodes
Node  Sts   Inc   Joined               Name
   1   M    376   2011-08-26 15:38:28  jc_rhcs6_B
   2   M    356   2011-08-26 15:10:32  jc_rhcs6_A

Expected results:

I expected that if the version in use by the running cluster is newer than the
one the rebooted node has, it would be copied over, or at least that the node
would not form its own cluster. Are my expectations wrong?

Additional info:

A similar bug was already reported as bz 680155, but I don't know whether the
two are completely related. The resolution was in errata
http://rhn.redhat.com/errata/RHBA-2011-0537.html and involved upgrading to:

cman-3.0.12-41.el6
clusterlib-3.0.12-41.el6

But as you can see above, I am running newer versions in my test. Let me know
if you need other data.
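For reference, here is a command-level sketch of the reproduction steps above.
The hostnames and config versions are the ones from this report; how the
version is bumped is an assumption on my side (I simply edit config_version in
/etc/cluster/cluster.conf by hand).

On node 1, after node 2 has been switched off, raise config_version in
/etc/cluster/cluster.conf (e.g. from 20 to 26), then validate and push the new
configuration to the running cluster:

# ccs_config_validate
# cman_tool version -r

Then boot node 2 and check which configuration version it is actually running:

# service cman start
# cman_tool status | grep "Config Version"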
(In reply to comment #0)
> Description of problem:
> When a node in a cluster is rebooted and the running version of cluster.conf
> changes, the newest version does not get transferred and the node cannot join
> the cluster, or sometimes creates its own 1-node cluster.

In RHEL 6 this behaviour is by design. If a configuration is wrong, it is
wrong and needs to be fixed. This is no different from expecting any other
daemon on the system to fix its configuration automatically when it is wrong.

In short, after a lengthy discussion with Chrissie, synchronizing the
configuration at startup is a very complex operation with so many paths to
failure that it is not worth considering.
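One way to bring the stale node back in line by hand (a sketch only, not an
official procedure; it assumes the newer cluster.conf can simply be copied
across, and uses the hostnames from this report):

On node 1, which holds the newer configuration (config_version 26):

# scp /etc/cluster/cluster.conf jc_rhcs6_B:/etc/cluster/cluster.conf

On node 2, restart the cluster stack so it comes up with the matching version,
then verify that both nodes report the same version:

# service cman stop
# service cman start
# cman_tool status | grep "Config Version"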