Bug 695795
Summary: | Do not ignore 'transport' if 'totem' node exists | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Fabio Massimo Di Nitto <fdinitto> |
Component: | cluster | Assignee: | Fabio Massimo Di Nitto <fdinitto> |
Status: | CLOSED ERRATA | QA Contact: | Cluster QE <mspqa-list> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 6.2 | CC: | agk, bubble, ccaulfie, cfeist, cluster-maint, djansa, donhoover, fdinitto, jkortus, lhh, rpeterso, sdake, swhiteho, teigland |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | cluster-3.0.12.1-11.el6 | Doc Type: | Bug Fix |
Doc Text: |
Cause: cman implements a complex set of checks to configure totem. One of the checks, which copies the configuration data, was incorrect.
Consequence: The transport protocol option was not handled correctly.
Fix: The cman copy and checks were changed to handle transport correctly.
Result: cman now handles the transport option properly.
|
Story Points: | --- |
Clone Of: | 689128 | Environment: | |
Last Closed: | 2011-12-06 14:51:48 UTC | Type: | --- |
Bug Depends On: | 689128 | ||
Bug Blocks: | 695794 |
Description
Fabio Massimo Di Nitto
2011-04-12 17:49:05 UTC
*** Bug 695794 has been marked as a duplicate of this bug. ***

http://git.fedorahosted.org/git/?p=cluster.git;a=commitdiff;h=da36fb6bc9e7a8908011e41ef78235f6c0160ca6

Unit test results:

pre patch:

1) normal startup

<cman/>
<totem/>

[root@rhel6-node2 ~]# /etc/init.d/cman start
[all good]

grep -i transport /var/log/cluster/corosync.log
Aug 08 11:01:29 corosync [TOTEM ] Initializing transport (UDP/IP Multicast).

OK

2) specify transport in <totem..

<cman/>
<totem transport="udpu"/>

[root@rhel6-node2 ~]# ccs_config_validate
Relax-NG validity error : Extra element totem in interleave
tempfile:5: element totem: Relax-NG validity error : Element cluster failed to validate content
Configuration fails to validate

[root@rhel6-node2 ~]# /etc/init.d/cman start
   Starting cman... Relax-NG validity error : Extra element totem in interleave
tempfile:5: element totem: Relax-NG validity error : Element cluster failed to validate content
Configuration fails to validate

[fail to start after timeout]

corosync process will be hanging in background

3) specify transport in cman

<cman transport="udpu"/>
<totem/>

[root@rhel6-node2 ~]# /etc/init.d/cman start
Starting cluster:
   Checking if cluster has been disabled at boot... [ OK ]
   Checking Network Manager... [ OK ]
   Global setup... [ OK ]
   Loading kernel modules... [ OK ]
   Mounting configfs... [ OK ]
   Starting cman... Aug 08 11:06:47 corosync [MAIN ] Corosync Cluster Engine ('1.3.2'): started and ready to provide service.
Aug 08 11:06:47 corosync [MAIN ] Corosync built-in features: nss rdma
Aug 08 11:06:47 corosync [MAIN ] Successfully read config from /etc/cluster/cluster.conf
Aug 08 11:06:47 corosync [MAIN ] Successfully parsed cman config
Aug 08 11:06:47 corosync [MAIN ] Successfully configured openais services to load
Aug 08 11:06:47 corosync [MAIN ] parse error in config: No multicast address specified
Aug 08 11:06:47 corosync [MAIN ] Corosync Cluster Engine exiting with status 8 at main.c:1665.
corosync died: Could not read cluster configuration Check cluster logs for details
[FAILED]

4) broadcast/udpb (not supported in rhel6, but check anyway for regression):

<cman broadcast="yes"/>

[root@rhel6-node2 ~]# ccs_config_validate
tempfile:4: parser error : Input is not proper UTF-8, indicate encoding !
Bytes: 0xF0 0xB3 0xAB 0x22
<cman transport="udp" broadcast="-ð³«" nodename="rhel6-node2" cluster_id="25

hit a memory corruptor

post patch:

1) normal startup

<cman/>
<totem/>

[root@rhel6-node2 ~]# /etc/init.d/cman start
Starting cluster:
   Checking if cluster has been disabled at boot... [ OK ]
   Checking Network Manager... [ OK ]
   Global setup... [ OK ]
   Loading kernel modules... [ OK ]
   Mounting configfs... [ OK ]
   Starting cman... [ OK ]
   Starting qdiskd... [ OK ]
   Waiting for quorum... [ OK ]
   Starting fenced... [ OK ]
   Starting dlm_controld... [ OK ]
   Starting gfs_controld: [ OK ]
   Unfencing self... [ OK ]
   Joining fence domain... [ OK ]

[root@rhel6-node2 daemon]# grep -i transport /var/log/cluster/corosync.log
Aug 08 11:09:02 corosync [TOTEM ] Initializing transport (UDP/IP Multicast).

2) specify transport in <totem..

<cman/>
<totem transport="udpu"/>

[root@rhel6-node2 ~]# ccs_config_validate
Transport should not be specified within <totem .../>, use <cman transport="..." /> instead
Unable to get the configuration

[root@rhel6-node2 ~]# /etc/init.d/cman start
Starting cluster:
   Checking if cluster has been disabled at boot... [ OK ]
   Checking Network Manager... [ OK ]
   Global setup... [ OK ]
   Loading kernel modules... [ OK ]
   Mounting configfs... [ OK ]
   Starting cman... Transport should not be specified within <totem .../>, use <cman transport="..." /> instead
Unable to get the configuration
corosync [MAIN ] Corosync Cluster Engine ('1.3.2'): started and ready to provide service.
corosync [MAIN ] Corosync built-in features: nss rdma
corosync [MAIN ] Successfully read config from /etc/cluster/cluster.conf
corosync [MAIN ] Transport should not be specified within <totem .../>, use <cman transport="..." /> instead
corosync [MAIN ] Corosync Cluster Engine exiting with status 8 at main.c:1616.
Transport should not be specified within <totem .../>, use <cman transport="..." /> instead
cman_tool: corosync daemon didn't start Check cluster logs for details
[FAILED]

3) specify transport in both cman and totem:

<cman transport="udpu"/>
<totem transport="udpu"/>

[root@rhel6-node2 ~]# ccs_config_validate
Transport should not be specified within <totem .../>, use <cman transport="..." /> instead
Unable to get the configuration

[root@rhel6-node2 ~]# /etc/init.d/cman start
Starting cluster:
   Checking if cluster has been disabled at boot... [ OK ]
   Checking Network Manager... [ OK ]
   Global setup... [ OK ]
   Loading kernel modules... [ OK ]
   Mounting configfs... [ OK ]
   Starting cman... Transport should not be specified within <totem .../>, use <cman transport="..." /> instead
Unable to get the configuration
corosync [MAIN ] Corosync Cluster Engine ('1.3.2'): started and ready to provide service.
corosync [MAIN ] Corosync built-in features: nss rdma
corosync [MAIN ] Successfully read config from /etc/cluster/cluster.conf
corosync [MAIN ] Transport should not be specified within <totem .../>, use <cman transport="..." /> instead
corosync [MAIN ] Corosync Cluster Engine exiting with status 8 at main.c:1616.
Transport should not be specified within <totem .../>, use <cman transport="..." /> instead
cman_tool: corosync daemon didn't start Check cluster logs for details
[FAILED]

4) specify transport only in cman:

<cman transport="udpu"/>
<totem/>

[root@rhel6-node2 ~]# ccs_config_validate
Configuration validates

[root@rhel6-node2 ~]# /etc/init.d/cman start
[all good]

[root@rhel6-node2 daemon]# grep -i transport /var/log/cluster/corosync.log
Aug 08 11:12:33 corosync [TOTEM ] Initializing transport (UDP/IP Unicast).

value is applied correctly

5) quick check for regressions

<cman transport="udp"/>
<totem/>

[root@rhel6-node2 daemon]# grep -i transport /var/log/cluster/corosync.log
Aug 08 11:13:29 corosync [TOTEM ] Initializing transport (UDP/IP Multicast).

<cman transport="udpu"/>
(note drop totem config bits)

[root@rhel6-node2 daemon]# grep -i transport /var/log/cluster/corosync.log
Aug 08 11:14:23 corosync [TOTEM ] Initializing transport (UDP/IP Unicast).

<cman transport="udpb"/>

[root@rhel6-node2 daemon]# grep -i transport /var/log/cluster/corosync.log
Aug 08 11:19:59 corosync [TOTEM ] Initializing transport (UDP/IP Multicast).
Multicast addresses: 255.255.255.255

<cman broadcast="yes"/>

[root@rhel6-node2 daemon]# grep -i transport /var/log/cluster/corosync.log
Aug 08 11:21:12 corosync [TOTEM ] Initializing transport (UDP/IP Multicast).
Multicast addresses: 255.255.255.255

http://git.fedorahosted.org/git/?p=cluster.git;a=commitdiff;h=90182b28490ec8383e2d45dbab8c23a1d1420bdc

post patch:

All steps have been verified with cman_tool status and by checking corosync.log for correct configs.

Traffic flow over altname has been checked with tcpdump and, in a couple of cases (but unnecessary to this test), by downing/up'ing interfaces.

1) normal startup no <cman> no <totem> no <altname>

[root@clusternet-node2 ~]# /etc/init.d/cman start
Starting cluster:
   Checking if cluster has been disabled at boot... [ OK ]
   Checking Network Manager... [ OK ]
   Global setup... [ OK ]
   Loading kernel modules... [ OK ]
   Mounting configfs... [ OK ]
   Starting cman... [ OK ]
[snip]

[root@clusternet-node2 ~]# cman_tool status
Version: 6.2.0
Config Version: 2
Cluster Name: fabbione
Cluster Id: 25573
Cluster Member: Yes
Cluster Generation: 72
Membership state: Cluster-Member
Nodes: 2
Expected votes: 2
Quorum device votes: 1
Total votes: 3
Node votes: 1
Quorum: 2
Active subsystems: 7
Flags:
Ports Bound: 0 178
Node name: clusternet-node2-eth1
Node ID: 2
Multicast addresses: 239.192.99.73
Node addresses: 192.168.4.2

[root@clusternet-node2 ~]# grep -i transport /var/log/cluster/corosync.log
Aug 18 09:35:52 corosync [TOTEM ] Initializing transport (UDP/IP Multicast).

2) specify transport in <totem..

<cman/>
<totem transport="udpu"/>

[root@clusternet-node2 ~]# /etc/init.d/cman start
Starting cluster:
   Checking if cluster has been disabled at boot... [ OK ]
   Checking Network Manager... [ OK ]
   Global setup... [ OK ]
   Loading kernel modules... [ OK ]
   Mounting configfs... [ OK ]
   Starting cman... Transport should not be specified within <totem .../>, use <cman transport="..." /> instead
Unable to get the configuration

3) specify transport in both cman and totem:

<cman transport="udpu"/>
<totem transport="udpu"/>

[root@clusternet-node2 ~]# /etc/init.d/cman start
Starting cluster:
   Checking if cluster has been disabled at boot... [ OK ]
   Checking Network Manager... [ OK ]
   Global setup... [ OK ]
   Loading kernel modules... [ OK ]
   Mounting configfs... [ OK ]
   Starting cman... Transport should not be specified within <totem .../>, use <cman transport="..." /> instead
Unable to get the configuration

4) specify transport only in cman:

<cman transport="udpu"/>
<totem/>

[root@clusternet-node2 ~]# /etc/init.d/cman start
Starting cluster:
   Checking if cluster has been disabled at boot... [ OK ]
   Checking Network Manager... [ OK ]
   Global setup... [ OK ]
   Loading kernel modules... [ OK ]
   Mounting configfs... [ OK ]
   Starting cman... [ OK ]
[snip]

5) do not specify totem at all

<cman transport="udpu"/>

[root@clusternet-node2 ~]# /etc/init.d/cman start
Starting cluster:
   Checking if cluster has been disabled at boot... [ OK ]
   Checking Network Manager... [ OK ]
   Global setup... [ OK ]
   Loading kernel modules... [ OK ]
   Mounting configfs... [ OK ]
   Starting cman... [ OK ]

[root@clusternet-node2 ~]# grep -i transport /var/log/cluster/corosync.log
Aug 18 09:40:56 corosync [TOTEM ] Initializing transport (UDP/IP Unicast).

6) configure altname no totem/no cman/no transport:

<clusternode name="clusternet-node1-eth1" votes="1" nodeid="1">
<altname name="clusternet-node1-eth2"/>

[root@clusternet-node2 ~]# /etc/init.d/cman start
Starting cluster:
   Checking if cluster has been disabled at boot... [ OK ]
   Checking Network Manager... [ OK ]
   Global setup... [ OK ]
   Loading kernel modules... [ OK ]
   Mounting configfs... [ OK ]
   Starting cman... [ OK ]

[root@clusternet-node2 ~]# ccs_config_validate
Configuration validates

7) add totem transport

<totem transport="udpu"/>

[root@clusternet-node2 ~]# ccs_config_validate
Transport should not be specified within <totem .../>, use <cman transport="..." /> instead
Unable to get the configuration

[root@clusternet-node2 ~]# /etc/init.d/cman start
   Starting cman... Transport should not be specified within <totem .../>, use <cman transport="..." /> instead
Unable to get the configuration

8) switch to cman transport

<cman transport="udpu"/>

[root@clusternet-node2 ~]# ccs_config_validate
Configuration validates

[root@clusternet-node2 ~]# /etc/init.d/cman start
Starting cluster:
   Checking if cluster has been disabled at boot... [ OK ]
   Checking Network Manager... [ OK ]
   Global setup... [ OK ]
   Loading kernel modules... [ OK ]
   Mounting configfs... [ OK ]
   Starting cman... [ OK ]

Fabio, did you test actually using the <TOTEM> section to change a few of the timing settings such as token, token_retransmits_before_loss_const, join, consensus...and make sure they were properly applied to the configuration when using UDPU?

That was the problem I had in 6.1: I had to take out the <totem> section, so I could no longer modify any of the defaults for the various values you can put into the totem section to modify quorum behavior.

(In reply to comment #9)
> Fabio, did you test actually using the <TOTEM> section to change a few of the
> timing settings such as token, token_retransmits_before_loss_const, join,
> consensus...and make sure they were properly applied to the configuration when
> using UDPU?
>
> That was the problem I had in 6.1, I had to take out the <totem> section so I
> could no longer modify any of the defaults for the various values you can put
> into the totem section to modify quorum behavior.

This patch addresses exactly the problem you had.

The token values here are random, just to make it easier to spot them in the config/objdb. The code that copies the values into totem is the same for every parameter; I am using token as one of them.

Adding extra unit test cases per customer request:

9) specify totem token (no cman)

<totem token="6000"/>

[root@rhel6-node2 ~]# ccs_config_validate
Configuration validates

[root@rhel6-node2 ~]# /etc/init.d/cman start
Starting cluster:
   Checking if cluster has been disabled at boot... [ OK ]
   Checking Network Manager... [ OK ]
   Global setup... [ OK ]
   Loading kernel modules... [ OK ]
   Mounting configfs... [ OK ]
   Starting cman... [ OK ]
[snip]

[root@rhel6-node2 cluster]# corosync-objctl |grep totem |grep token
cluster.totem.token=6000
totem.token=6000

10) specify token and transport

<cman transport="udpu"/>
<totem token="16000"/>

[root@rhel6-node2 ~]# ccs_config_validate
Configuration validates

[root@rhel6-node2 ~]# /etc/init.d/cman start
Starting cluster:
   Checking if cluster has been disabled at boot... [ OK ]
   Checking Network Manager... [ OK ]
   Global setup... [ OK ]
   Loading kernel modules... [ OK ]
   Mounting configfs... [ OK ]
   Starting cman... [ OK ]
[snip]

[root@rhel6-node2 cluster]# corosync-objctl |grep totem |grep token
cluster.totem.token=16000
totem.token=16000

[root@rhel6-node2 cluster]# corosync-objctl |grep totem |grep transport
totem.transport=udpu

Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Cause: cman implements a complex set of checks to configure totem. One of the checks, which copies the configuration data, was incorrect.
Consequence: The transport protocol option was not handled correctly.
Fix: The cman copy and checks were changed to handle transport correctly.
Result: cman now handles the transport option properly.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1516.html
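
For anyone who wants to pre-check a cluster.conf before starting cman, the placement rule seen in the post-patch results in this report (transport is rejected inside <totem>, accepted on <cman>) can be approximated with a short script. This is only an illustrative sketch: the function name check_transport_placement and the sample documents are hypothetical, the real check lives in cman itself, and only the error text is copied from the output quoted in this bug.

```python
import xml.etree.ElementTree as ET

# Message text copied from the post-patch output quoted in this bug.
TOTEM_TRANSPORT_ERROR = (
    'Transport should not be specified within <totem .../>, '
    'use <cman transport="..." /> instead'
)


def check_transport_placement(cluster_conf_xml):
    """Return (ok, detail) for a cluster.conf document passed as a string.

    Hypothetical helper: it mirrors the behaviour observed in the
    post-patch unit tests, not the actual cman implementation.
    """
    root = ET.fromstring(cluster_conf_xml)

    # Any transport attribute on <totem> is rejected, even when <cman>
    # also carries one (post-patch cases 2 and 3).
    totem = root.find("totem")
    if totem is not None and "transport" in totem.attrib:
        return False, TOTEM_TRANSPORT_ERROR

    # Otherwise report the transport requested on <cman>, if any.
    cman = root.find("cman")
    transport = cman.get("transport") if cman is not None else None
    # Assumption: "default" is a placeholder for "no transport requested".
    return True, transport or "default"


if __name__ == "__main__":
    sample = '<cluster><cman transport="udpu"/><totem token="16000"/></cluster>'
    print(check_transport_placement(sample))
```

Other <totem> attributes such as token are left alone by this sketch, which matches test cases 9 and 10, where token is accepted alongside <cman transport="udpu"/>. It does not replace ccs_config_validate, which also performs the Relax-NG schema validation.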