+++ This bug was initially created as a clone of Bug #689128 +++

The attached patch fixes a typo in the code that causes <cman transport="..."/> to be ignored if a <totem/> XML node is present.

--- Additional comment from bubble on 2011-03-19 13:52:39 EDT ---

Created attachment 486397
Fix typo

--- Additional comment from fdinitto on 2011-03-22 05:01:22 EDT ---

Hi Vladislav,

in principle the patch is correct, but it cannot be applied as is and needs some more work. Consider:

<cluster>
<cman transport="...."/>
<totem transport="...."/>

In this case the patch should check for the conflict and either report an error that only one can be specified, or apply a bigger hammer and declare that the cman config has higher priority than totem and act accordingly. Basically, it needs a failsafe for bad configs.

Thanks
Fabio

--- Additional comment from bubble on 2011-03-22 05:09:14 EDT ---

Will <totem transport="...."/> pass validation?

--- Additional comment from fdinitto on 2011-03-22 05:25:13 EDT ---

(In reply to comment #3)
> Will <totem transport="...."/> pass validation?

Even if it doesn't, validation can always be turned off or set to warning. It's a matter of being resilient to user errors and making sure expectations are met. If you specify both, which one should win? etc.

--- Additional comment from bubble on 2011-03-23 09:54:42 EDT ---

Created attachment 487042
2nd version of patch

Hi Fabio,

attached should be close to what you've requested.

Best,
Vladislav

--- Additional comment from fdinitto on 2011-03-23 10:58:56 EDT ---

(In reply to comment #5)
> Created attachment 487042
> 2nd version of patch
>
> Hi Fabio,
>
> attached should be close to what you've requested.
>
> Best,
> Vladislav

Hi Vladislav,

at first glance the patch looks OK. I'll need to test it before I merge it upstream.

Thanks a lot for your work!
Fabio

--- Additional comment from fdinitto on 2011-03-28 08:12:28 EDT ---

Hi Vladislav,

there is a substantial error in the patch: transport is a key of totem, not an object underneath totem, so this will never work. Also, your patch triggers the error only when the transport is specified in both cman and totem, but it should always take an error path whenever the transport is specified in totem.
*** Bug 695794 has been marked as a duplicate of this bug. ***
http://git.fedorahosted.org/git/?p=cluster.git;a=commitdiff;h=da36fb6bc9e7a8908011e41ef78235f6c0160ca6

Unit test results:

pre patch:

1) normal startup

<cman/>
<totem/>

[root@rhel6-node2 ~]# /etc/init.d/cman start
[all good]
grep -i transport /var/log/cluster/corosync.log
Aug 08 11:01:29 corosync [TOTEM ] Initializing transport (UDP/IP Multicast).

OK

2) specify transport in <totem ...>

<cman/>
<totem transport="udpu"/>

[root@rhel6-node2 ~]# ccs_config_validate
Relax-NG validity error : Extra element totem in interleave
tempfile:5: element totem: Relax-NG validity error : Element cluster failed to validate content
Configuration fails to validate
[root@rhel6-node2 ~]# /etc/init.d/cman start
Starting cman... Relax-NG validity error : Extra element totem in interleave
tempfile:5: element totem: Relax-NG validity error : Element cluster failed to validate content
Configuration fails to validate
[fails to start after timeout; the corosync process is left hanging in the background]

3) specify transport in cman

<cman transport="udpu"/>
<totem/>

[root@rhel6-node2 ~]# /etc/init.d/cman start
Starting cluster:
Checking if cluster has been disabled at boot... [ OK ]
Checking Network Manager... [ OK ]
Global setup... [ OK ]
Loading kernel modules... [ OK ]
Mounting configfs... [ OK ]
Starting cman... Aug 08 11:06:47 corosync [MAIN ] Corosync Cluster Engine ('1.3.2'): started and ready to provide service.
Aug 08 11:06:47 corosync [MAIN ] Corosync built-in features: nss rdma
Aug 08 11:06:47 corosync [MAIN ] Successfully read config from /etc/cluster/cluster.conf
Aug 08 11:06:47 corosync [MAIN ] Successfully parsed cman config
Aug 08 11:06:47 corosync [MAIN ] Successfully configured openais services to load
Aug 08 11:06:47 corosync [MAIN ] parse error in config: No multicast address specified
Aug 08 11:06:47 corosync [MAIN ] Corosync Cluster Engine exiting with status 8 at main.c:1665.
corosync died: Could not read cluster configuration
Check cluster logs for details
[FAILED]

4) broadcast/udpb (not supported in RHEL6, but checked anyway for regressions):

<cman broadcast="yes"/>

[root@rhel6-node2 ~]# ccs_config_validate
tempfile:4: parser error : Input is not proper UTF-8, indicate encoding !
Bytes: 0xF0 0xB3 0xAB 0x22
<cman transport="udp" broadcast="-ð³«" nodename="rhel6-node2" cluster_id="25

This hits a memory corruptor.

post patch:

1) normal startup

<cman/>
<totem/>

[root@rhel6-node2 ~]# /etc/init.d/cman start
Starting cluster:
Checking if cluster has been disabled at boot... [ OK ]
Checking Network Manager... [ OK ]
Global setup... [ OK ]
Loading kernel modules... [ OK ]
Mounting configfs... [ OK ]
Starting cman... [ OK ]
Starting qdiskd... [ OK ]
Waiting for quorum... [ OK ]
Starting fenced... [ OK ]
Starting dlm_controld... [ OK ]
Starting gfs_controld: [ OK ]
Unfencing self... [ OK ]
Joining fence domain... [ OK ]
[root@rhel6-node2 daemon]# grep -i transport /var/log/cluster/corosync.log
Aug 08 11:09:02 corosync [TOTEM ] Initializing transport (UDP/IP Multicast).

2) specify transport in <totem ...>

<cman/>
<totem transport="udpu"/>

[root@rhel6-node2 ~]# ccs_config_validate
Transport should not be specified within <totem .../>, use <cman transport="..." /> instead
Unable to get the configuration
[root@rhel6-node2 ~]# /etc/init.d/cman start
Starting cluster:
Checking if cluster has been disabled at boot... [ OK ]
Checking Network Manager... [ OK ]
Global setup... [ OK ]
Loading kernel modules... [ OK ]
Mounting configfs... [ OK ]
Starting cman... Transport should not be specified within <totem .../>, use <cman transport="..." /> instead
Unable to get the configuration
corosync [MAIN ] Corosync Cluster Engine ('1.3.2'): started and ready to provide service.
corosync [MAIN ] Corosync built-in features: nss rdma
corosync [MAIN ] Successfully read config from /etc/cluster/cluster.conf
corosync [MAIN ] Transport should not be specified within <totem .../>, use <cman transport="..." /> instead
corosync [MAIN ] Corosync Cluster Engine exiting with status 8 at main.c:1616.
Transport should not be specified within <totem .../>, use <cman transport="..." /> instead
cman_tool: corosync daemon didn't start
Check cluster logs for details
[FAILED]

3) specify transport in both cman and totem:

<cman transport="udpu"/>
<totem transport="udpu"/>

[root@rhel6-node2 ~]# ccs_config_validate
Transport should not be specified within <totem .../>, use <cman transport="..." /> instead
Unable to get the configuration
[root@rhel6-node2 ~]# /etc/init.d/cman start
Starting cluster:
Checking if cluster has been disabled at boot... [ OK ]
Checking Network Manager... [ OK ]
Global setup... [ OK ]
Loading kernel modules... [ OK ]
Mounting configfs... [ OK ]
Starting cman... Transport should not be specified within <totem .../>, use <cman transport="..." /> instead
Unable to get the configuration
corosync [MAIN ] Corosync Cluster Engine ('1.3.2'): started and ready to provide service.
corosync [MAIN ] Corosync built-in features: nss rdma
corosync [MAIN ] Successfully read config from /etc/cluster/cluster.conf
corosync [MAIN ] Transport should not be specified within <totem .../>, use <cman transport="..." /> instead
corosync [MAIN ] Corosync Cluster Engine exiting with status 8 at main.c:1616.
Transport should not be specified within <totem .../>, use <cman transport="..." /> instead
cman_tool: corosync daemon didn't start
Check cluster logs for details
[FAILED]

4) specify transport only in cman:

<cman transport="udpu"/>
<totem/>

[root@rhel6-node2 ~]# ccs_config_validate
Configuration validates
[root@rhel6-node2 ~]# /etc/init.d/cman start
[all good]
[root@rhel6-node2 daemon]# grep -i transport /var/log/cluster/corosync.log
Aug 08 11:12:33 corosync [TOTEM ] Initializing transport (UDP/IP Unicast).

The value is applied correctly.

5) quick check for regressions

<cman transport="udp"/>
<totem/>

[root@rhel6-node2 daemon]# grep -i transport /var/log/cluster/corosync.log
Aug 08 11:13:29 corosync [TOTEM ] Initializing transport (UDP/IP Multicast).

<cman transport="udpu"/> (note: the totem config bits are dropped)

[root@rhel6-node2 daemon]# grep -i transport /var/log/cluster/corosync.log
Aug 08 11:14:23 corosync [TOTEM ] Initializing transport (UDP/IP Unicast).

<cman transport="udpb"/>

[root@rhel6-node2 daemon]# grep -i transport /var/log/cluster/corosync.log
Aug 08 11:19:59 corosync [TOTEM ] Initializing transport (UDP/IP Multicast).
Multicast addresses: 255.255.255.255

<cman broadcast="yes"/>

[root@rhel6-node2 daemon]# grep -i transport /var/log/cluster/corosync.log
Aug 08 11:21:12 corosync [TOTEM ] Initializing transport (UDP/IP Multicast).
Multicast addresses: 255.255.255.255
http://git.fedorahosted.org/git/?p=cluster.git;a=commitdiff;h=90182b28490ec8383e2d45dbab8c23a1d1420bdc

post patch:

All steps have been verified with cman_tool status and by checking corosync.log for the correct configuration. Traffic flow over altname has been checked with tcpdump and, in a couple of cases (though unnecessary for this test), by downing/upping interfaces.

1) normal startup: no <cman>, no <totem>, no <altname>

[root@clusternet-node2 ~]# /etc/init.d/cman start
Starting cluster:
Checking if cluster has been disabled at boot... [ OK ]
Checking Network Manager... [ OK ]
Global setup... [ OK ]
Loading kernel modules... [ OK ]
Mounting configfs... [ OK ]
Starting cman... [ OK ]
[snip]
[root@clusternet-node2 ~]# cman_tool status
Version: 6.2.0
Config Version: 2
Cluster Name: fabbione
Cluster Id: 25573
Cluster Member: Yes
Cluster Generation: 72
Membership state: Cluster-Member
Nodes: 2
Expected votes: 2
Quorum device votes: 1
Total votes: 3
Node votes: 1
Quorum: 2
Active subsystems: 7
Flags:
Ports Bound: 0 178
Node name: clusternet-node2-eth1
Node ID: 2
Multicast addresses: 239.192.99.73
Node addresses: 192.168.4.2
[root@clusternet-node2 ~]# grep -i transport /var/log/cluster/corosync.log
Aug 18 09:35:52 corosync [TOTEM ] Initializing transport (UDP/IP Multicast).

2) specify transport in <totem ...>

<cman/>
<totem transport="udpu"/>

[root@clusternet-node2 ~]# /etc/init.d/cman start
Starting cluster:
Checking if cluster has been disabled at boot... [ OK ]
Checking Network Manager... [ OK ]
Global setup... [ OK ]
Loading kernel modules... [ OK ]
Mounting configfs... [ OK ]
Starting cman... Transport should not be specified within <totem .../>, use <cman transport="..." /> instead
Unable to get the configuration

3) specify transport in both cman and totem:

<cman transport="udpu"/>
<totem transport="udpu"/>

[root@clusternet-node2 ~]# /etc/init.d/cman start
Starting cluster:
Checking if cluster has been disabled at boot... [ OK ]
Checking Network Manager... [ OK ]
Global setup... [ OK ]
Loading kernel modules... [ OK ]
Mounting configfs... [ OK ]
Starting cman... Transport should not be specified within <totem .../>, use <cman transport="..." /> instead
Unable to get the configuration

4) specify transport only in cman:

<cman transport="udpu"/>
<totem/>

[root@clusternet-node2 ~]# /etc/init.d/cman start
Starting cluster:
Checking if cluster has been disabled at boot... [ OK ]
Checking Network Manager... [ OK ]
Global setup... [ OK ]
Loading kernel modules... [ OK ]
Mounting configfs... [ OK ]
Starting cman... [ OK ]
[snip]

5) do not specify totem at all

<cman transport="udpu"/>

[root@clusternet-node2 ~]# /etc/init.d/cman start
Starting cluster:
Checking if cluster has been disabled at boot... [ OK ]
Checking Network Manager... [ OK ]
Global setup... [ OK ]
Loading kernel modules... [ OK ]
Mounting configfs... [ OK ]
Starting cman... [ OK ]
[root@clusternet-node2 ~]# grep -i transport /var/log/cluster/corosync.log
Aug 18 09:40:56 corosync [TOTEM ] Initializing transport (UDP/IP Unicast).

6) configure altname (no totem, no cman, no transport):

<clusternode name="clusternet-node1-eth1" votes="1" nodeid="1">
<altname name="clusternet-node1-eth2"/>

[root@clusternet-node2 ~]# /etc/init.d/cman start
Starting cluster:
Checking if cluster has been disabled at boot... [ OK ]
Checking Network Manager... [ OK ]
Global setup... [ OK ]
Loading kernel modules... [ OK ]
Mounting configfs... [ OK ]
Starting cman... [ OK ]
[root@clusternet-node2 ~]# ccs_config_validate
Configuration validates

7) add totem transport

<totem transport="udpu"/>

[root@clusternet-node2 ~]# ccs_config_validate
Transport should not be specified within <totem .../>, use <cman transport="..." /> instead
Unable to get the configuration
[root@clusternet-node2 ~]# /etc/init.d/cman start
Starting cman... Transport should not be specified within <totem .../>, use <cman transport="..." /> instead
Unable to get the configuration

8) switch to cman transport

<cman transport="udpu"/>

[root@clusternet-node2 ~]# ccs_config_validate
Configuration validates
[root@clusternet-node2 ~]# /etc/init.d/cman start
Starting cluster:
Checking if cluster has been disabled at boot... [ OK ]
Checking Network Manager... [ OK ]
Global setup... [ OK ]
Loading kernel modules... [ OK ]
Mounting configfs... [ OK ]
Starting cman... [ OK ]
Fabio, did you actually test using the <totem> section to change some of the timing settings, such as token, token_retransmits_before_loss_const, join and consensus, and make sure they were properly applied to the configuration when using UDPU?

That was the problem I had in 6.1: I had to take out the <totem> section, so I could no longer modify any of the defaults for the various values you can put into the totem section to modify quorum behavior.
(In reply to comment #9)
> Fabio, did you test actually using the <TOTEM> section to change a few of the
> timing settings such as token, token_retransmits_before_loss_const, join,
> consensus...and make sure they were properly applied to the configuration when
> using UDPU?
>
> That was the problem I had in 6.1, I had to take out the <totem> section so I
> could no longer modify any of the defaults for the various values you can put
> into the totem section to modify quorum behavior.

This patch addresses exactly the problem you had.

The token values here are arbitrary, chosen just to make them easier to spot in the config/objdb. The code that copies the values into totem is the same for every parameter; I am using token as a representative example.

Adding extra unit test cases per customer request:

9) specify totem token (no cman)

<totem token="6000"/>

[root@rhel6-node2 ~]# ccs_config_validate
Configuration validates
[root@rhel6-node2 ~]# /etc/init.d/cman start
Starting cluster:
Checking if cluster has been disabled at boot... [ OK ]
Checking Network Manager... [ OK ]
Global setup... [ OK ]
Loading kernel modules... [ OK ]
Mounting configfs... [ OK ]
Starting cman... [ OK ]
[snip]
[root@rhel6-node2 cluster]# corosync-objctl |grep totem |grep token
cluster.totem.token=6000
totem.token=6000

10) specify token and transport

<cman transport="udpu"/>
<totem token="16000"/>

[root@rhel6-node2 ~]# ccs_config_validate
Configuration validates
[root@rhel6-node2 ~]# /etc/init.d/cman start
Starting cluster:
Checking if cluster has been disabled at boot... [ OK ]
Checking Network Manager... [ OK ]
Global setup... [ OK ]
Loading kernel modules... [ OK ]
Mounting configfs... [ OK ]
Starting cman... [ OK ]
[snip]
[root@rhel6-node2 cluster]# corosync-objctl |grep totem |grep token
cluster.totem.token=16000
totem.token=16000
[root@rhel6-node2 cluster]# corosync-objctl |grep totem |grep transport
totem.transport=udpu
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Cause: cman implements a complex set of checks to configure totem. One of these checks, which copies the configuration data, was not correct.
Consequence: The transport protocol option was not handled correctly.
Fix: The cman copy code and checks were changed to handle the transport option correctly.
Result: cman now handles the transport option properly.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2011-1516.html