Description of problem:
Cluster names can be a maximum of 15 characters, but there seems to be no useful checking in cman in RHEL6. Starting a cluster with an invalid cluster name causes corosync/cman to crash with signal 6.

Version-Release number of selected component (if applicable):
RHEL 6.1

How reproducible:
Every time

Steps to Reproduce:
1. Create a cluster.conf with a long cluster name
2. Start cman

Actual results:
cman crashes

Expected results:
cman should not crash.

Additional info:
In RHEL5 an error message was printed if the cluster name was too long. That no longer appears to be the case in RHEL6.

# cman_tool join
corosync died with signal: 6

or:

# cman_tool join -d
Validating configuration
calling '/usr/sbin/ccs_config_validate '
Configuration validates
Starting /usr/sbin/corosync corosync -f CMAN_DEBUG=255 COROSYNC_DEFAULT_CONFIG_IFACE=xmlconfig:cmanpreconfig CMAN_PIPE=4
Aug 04 14:05:56 corosync [MAIN ] Corosync Cluster Engine ('1.2.3'): started and ready to provide service.
Aug 04 14:05:56 corosync [MAIN ] Corosync built-in features: nss rdma
Aug 04 14:05:56 corosync [MAIN ] Successfully read config from /etc/cluster/cluster.conf
Aug 04 14:05:56 corosync [MAIN ] Successfully parsed cman config
Aug 04 14:05:56 corosync [TOTEM ] Initializing transport (UDP/IP).
Aug 04 14:05:56 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
*** buffer overflow detected ***: corosync terminated
======= Backtrace: =========
/lib64/libc.so.6(__fortify_fail+0x37)[0x7fd47460a6a7]
/lib64/libc.so.6(+0xfe5a0)[0x7fd4746085a0]
/usr/libexec/lcrso/service_cman.lcrso(read_cman_config+0xae)[0x7fd46bbf78de]
/usr/libexec/lcrso/service_cman.lcrso(+0x4087)[0x7fd46bbf2087]
corosync(corosync_service_link_and_init+0xf7)[0x408177]
corosync(corosync_service_defaults_link_and_init+0xf1)[0x4084c1]
corosync[0x405e18]
/usr/lib64/libtotem_pg.so.4(main_iface_change_fn+0x10f)[0x7fd4752e2aff]
/usr/lib64/libtotem_pg.so.4(+0xa07a)[0x7fd4752dc07a]
/usr/lib64/libtotem_pg.so.4(poll_run+0x29d)[0x7fd4752d875d]
corosync(main+0x6cb)[0x4056ab]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x7fd474528c9d]
corosync[0x404219]
======= Memory map: ========
<snip>
Aug 04 14:05:56 corosync [TOTEM ] The network interface [192.168.1.201] is now up.
Aug 04 14:05:56 corosync [QUORUM] Using quorum provider quorum_cman
Aug 04 14:05:56 corosync [SERV ] Service engine loaded: corosync cluster quorum service v0.1
forked process ID is 1706
corosync died with signal: 6
Created attachment 516709
Patch to add a check on the cluster name length
I should add that a customer has seen this problem; it is not 'internal' or theoretical.
http://git.fedorahosted.org/git/?p=cluster.git;a=commitdiff;h=03e9af7db105bcfbb7a013974084d2ed171fb258 commit exists upstream, ACK for rhel6.
http://git.fedorahosted.org/git/?p=cluster.git;a=commitdiff;h=ac195524d4a520b7f5bbd25e01715f4e0aa1ab19

little amendment to the original patch.

Unit test results:

<cluster name="fabbionefabbionefabbionefabbionefabbionefabbionefabbionefabbione" config_version="1">

/etc/init.d/cman start
*** buffer overflow detected ***: corosync terminated
======= Backtrace: =========
/lib64/libc.so.6(__fortify_fail+0x37)[0x7fd7b1767127]
/lib64/libc.so.6(+0xf8100)[0x7fd7b1765100]
[yadayada]

apply patches

[root@rhel6-node2 ~]# /etc/init.d/cman start
Starting cluster:
Checking if cluster has been disabled at boot... [ OK ]
Checking Network Manager... [ OK ]
Global setup... [ OK ]
Loading kernel modules... [ OK ]
Mounting configfs... [ OK ]
Starting cman... Invalid cluster name. It must be 15 characters or fewer
Unable to get the configuration
Invalid cluster name. It must be 15 characters or fewer
cman_tool: corosync daemon didn't start Check cluster logs for details
[FAILED]
Further testing did show another problem related to the lack of cluster name. Missing cluster name will also cause a crash.

This is the final patch set and unit test results:

http://git.fedorahosted.org/git?p=cluster.git;a=commitdiff;h=1f345b45a5eeaedfcf5c48ac328c32d32d30ac26
http://git.fedorahosted.org/git?p=cluster.git;a=commitdiff;h=79aafcef1dafff42afcc085d55188f495ee3cc54
http://git.fedorahosted.org/git?p=cluster.git;a=commitdiff;h=eecdcabac84dd93abf026fbfdb6f1c850c98fa5b

old packages:

<cluster config_version="1" >

[root@rhel6-node2 ~]# /etc/init.d/cman start
Starting cluster:
Checking if cluster has been disabled at boot... [ OK ]
Checking Network Manager... [ OK ]
Global setup... [ OK ]
Loading kernel modules... [ OK ]
Mounting configfs... [ OK ]
Starting cman... /usr/sbin/ccs_config_validate: line 186: 2007 Segmentation fault (core dumped) ccs_config_dump > $tempfile
Unable to get the configuration

<cluster name="fabbionefabbionefabbionefabbionefabbionefabbione" config_version="1" >

[root@rhel6-node2 ~]# /etc/init.d/cman start
[snip]
*** buffer overflow detected ***: corosync terminated
======= Backtrace: =========
/lib64/libc.so.6(__fortify_fail+0x37)[0x7f1edb35b427]
/lib64/libc.so.6(+0xfd310)[0x7f1edb359310]
/usr/libexec/lcrso/service_cman.lcrso(read_cman_config+0xae)[0x7f1ed6a108fe]
/usr/libexec/lcrso/service_cman.lcrso(+0x4077)[0x7f1ed6a0b077]
corosync(corosync_service_link_and_init+0xf7)[0x408e97]
corosync(corosync_service_defaults_link_and_init+0xf1)[0x4091e1]
[snip]

new packages:

<cluster config_version="1" >

[root@rhel6-node2 ~]# /etc/init.d/cman start
Starting cluster:
Checking if cluster has been disabled at boot... [ OK ]
Checking Network Manager... [ OK ]
Global setup... [ OK ]
Loading kernel modules... [ OK ]
Mounting configfs... [ OK ]
Starting cman... Unable to determine cluster name.
Unable to get the configuration
Unable to determine cluster name.
cman_tool: corosync daemon didn't start Check cluster logs for details
[FAILED]

<cluster name="fabbionefabbionefabbionefabbionefabbionefabbione" config_version="1" >

[root@rhel6-node2 ~]# /etc/init.d/cman start
Starting cluster:
Checking if cluster has been disabled at boot... [ OK ]
Checking Network Manager... [ OK ]
Global setup... [ OK ]
Loading kernel modules... [ OK ]
Mounting configfs... [ OK ]
Starting cman... Invalid cluster name. It must be 15 characters or fewer
Unable to get the configuration
Invalid cluster name. It must be 15 characters or fewer
cman_tool: corosync daemon didn't start Check cluster logs for details
[FAILED]
Verified in version cman-3.0.12.1-9.el6, kernel 2.6.32-131.0.15.el6

1) Cluster name longer than 15 characters:

...
<cluster config_version="1" name="Z_ClusterZ_ClusterZ_Cluster">
...

[root@z2 /]# service cman start
Starting cluster:
Checking if cluster has been disabled at boot... [ OK ]
Checking Network Manager... [ OK ]
Global setup... [ OK ]
Loading kernel modules... [ OK ]
Mounting configfs... [ OK ]
Starting cman... Invalid cluster name. It must be 15 characters or fewer
Unable to get the configuration
Invalid cluster name. It must be 15 characters or fewer
cman_tool: corosync daemon didn't start Check cluster logs for details
[FAILED]

2) Cluster name not set:

...
<cluster config_version="1">
...

[root@z2 /]# service cman start
Starting cluster:
Checking if cluster has been disabled at boot... [ OK ]
Checking Network Manager... [ OK ]
Global setup... [ OK ]
Loading kernel modules... [ OK ]
Mounting configfs... [ OK ]
Starting cman... Unable to determine cluster name.
Unable to get the configuration
Unable to determine cluster name.
cman_tool: corosync daemon didn't start Check cluster logs for details
[FAILED]
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Cause: Two sanity checks on the cluster name were missing (one for a name that is too long, one for a name that is not set), which caused cman to crash at startup.
Consequence: cman would crash when starting up.
Fix: Implemented the missing sanity checks and report a proper error where necessary.
Result: cman no longer crashes and informs the user of the incorrect cluster name value.
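For readers who want to see the shape of the two checks discussed in this report (name missing, name longer than 15 characters), here is a minimal C sketch. It is not the actual cman patch: the names `check_cluster_name`, `copy_cluster_name`, `MAX_CLUSTER_NAME_LEN` and the `name_check` enum are hypothetical, the 15-character limit is taken from the error message quoted above, and treating an empty string like a missing name is an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Assumed limit, taken from the error message in this report:
 * "Invalid cluster name. It must be 15 characters or fewer". */
#define MAX_CLUSTER_NAME_LEN 15

/* Hypothetical result codes for the two failure cases in this report. */
enum name_check {
    NAME_OK = 0,
    NAME_MISSING,  /* no name given (assumption: empty string counts too) */
    NAME_TOO_LONG  /* more than MAX_CLUSTER_NAME_LEN characters */
};

/* Hypothetical helper: validate a cluster name before it is used. */
static enum name_check check_cluster_name(const char *name)
{
    if (name == NULL || name[0] == '\0')
        return NAME_MISSING;
    if (strlen(name) > MAX_CLUSTER_NAME_LEN)
        return NAME_TOO_LONG;
    return NAME_OK;
}

/* Hypothetical helper: copy the name into a fixed-size buffer only
 * after it has passed validation, so the copy cannot overflow. */
static enum name_check copy_cluster_name(char dst[MAX_CLUSTER_NAME_LEN + 1],
                                         const char *name)
{
    enum name_check rc = check_cluster_name(name);

    if (rc == NAME_OK)
        strcpy(dst, name); /* safe: length was checked above */
    return rc;
}
```

A caller would map `NAME_MISSING` and `NAME_TOO_LONG` to the two error messages shown in the verification output and refuse to start, rather than copying an unchecked string into a fixed-size buffer.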
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2011-1516.html