Bug 728230

Summary: cman crashes on startup if cluster name is too long or is not set at all
Product: Red Hat Enterprise Linux 6 Reporter: Christine Caulfield <ccaulfie>
Component: cluster  Assignee: Fabio Massimo Di Nitto <fdinitto>
Status: CLOSED ERRATA QA Contact: Cluster QE <mspqa-list>
Severity: medium Docs Contact:
Priority: medium    
Version: 6.1  CC: ccaulfie, cluster-maint, djansa, lhh, mjuricek, rdassen, rpeterso, teigland
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: cluster-3.0.12.1-9.el6 Doc Type: Bug Fix
Doc Text:
Cause: The lack of two sanity checks on the cluster name (one for its length, one for its presence) would cause cman to crash at startup.
Consequence: cman would crash when starting up.
Fix: Implemented the missing sanity checks, reporting a proper error where necessary.
Result: cman no longer crashes and informs the user that the cluster name value is incorrect.
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-12-06 14:52:43 UTC Type: ---
Bug Depends On:    
Bug Blocks: 658636    
Attachments:
Description                                        Flags
Patch to add a check on the cluster name length    none

Description Christine Caulfield 2011-08-04 13:07:02 UTC
Description of problem:
Cluster names can be at most 15 characters long, but cman in RHEL6 does not appear to check this. Starting a cluster whose name exceeds that limit causes corosync/cman to abort with signal 6 (SIGABRT).

Version-Release number of selected component (if applicable):
RHEL 6.1

How reproducible:
Every time

Steps to Reproduce:
1. Create a cluster.conf with a cluster name longer than 15 characters
2. Start cman
  
Actual results:
cman crashes

Expected results:
cman should not crash.

Additional info:
In RHEL5 an error message was printed if the cluster name was too long. This appears not to be the case in RHEL6.

# cman_tool join
corosync died with signal: 6

or:

# cman_tool join -d
Validating configuration
calling '/usr/sbin/ccs_config_validate  '
Configuration validates
Starting /usr/sbin/corosync corosync -f
CMAN_DEBUG=255
COROSYNC_DEFAULT_CONFIG_IFACE=xmlconfig:cmanpreconfig
CMAN_PIPE=4
Aug 04 14:05:56 corosync [MAIN  ] Corosync Cluster Engine ('1.2.3'): started and ready to provide service.
Aug 04 14:05:56 corosync [MAIN  ] Corosync built-in features: nss rdma
Aug 04 14:05:56 corosync [MAIN  ] Successfully read config from /etc/cluster/cluster.conf
Aug 04 14:05:56 corosync [MAIN  ] Successfully parsed cman config
Aug 04 14:05:56 corosync [TOTEM ] Initializing transport (UDP/IP).
Aug 04 14:05:56 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
*** buffer overflow detected ***: corosync terminated
======= Backtrace: =========
/lib64/libc.so.6(__fortify_fail+0x37)[0x7fd47460a6a7]
/lib64/libc.so.6(+0xfe5a0)[0x7fd4746085a0]
/usr/libexec/lcrso/service_cman.lcrso(read_cman_config+0xae)[0x7fd46bbf78de]
/usr/libexec/lcrso/service_cman.lcrso(+0x4087)[0x7fd46bbf2087]
corosync(corosync_service_link_and_init+0xf7)[0x408177]
corosync(corosync_service_defaults_link_and_init+0xf1)[0x4084c1]
corosync[0x405e18]
/usr/lib64/libtotem_pg.so.4(main_iface_change_fn+0x10f)[0x7fd4752e2aff]
/usr/lib64/libtotem_pg.so.4(+0xa07a)[0x7fd4752dc07a]
/usr/lib64/libtotem_pg.so.4(poll_run+0x29d)[0x7fd4752d875d]
corosync(main+0x6cb)[0x4056ab]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x7fd474528c9d]
corosync[0x404219]
======= Memory map: ========
<snip>

Aug 04 14:05:56 corosync [TOTEM ] The network interface [192.168.1.201] is now up.
Aug 04 14:05:56 corosync [QUORUM] Using quorum provider quorum_cman
Aug 04 14:05:56 corosync [SERV  ] Service engine loaded: corosync cluster quorum service v0.1
forked process ID is 1706
corosync died with signal: 6

Comment 2 Christine Caulfield 2011-08-04 13:58:58 UTC
Created attachment 516709 [details]
Patch to add a check on the cluster name length

Comment 3 Christine Caulfield 2011-08-04 14:00:13 UTC
I should add that a customer has seen this problem, it is not 'internal' or theoretical.

Comment 5 Fabio Massimo Di Nitto 2011-08-04 14:36:25 UTC
http://git.fedorahosted.org/git/?p=cluster.git;a=commitdiff;h=03e9af7db105bcfbb7a013974084d2ed171fb258

Commit exists upstream, ACK for RHEL6.

Comment 6 Fabio Massimo Di Nitto 2011-08-05 08:11:21 UTC
http://git.fedorahosted.org/git/?p=cluster.git;a=commitdiff;h=ac195524d4a520b7f5bbd25e01715f4e0aa1ab19

A small amendment to the original patch.

Unit test results:

<cluster name="fabbionefabbionefabbionefabbionefabbionefabbionefabbionefabbione" config_version="1">

/etc/init.d/cman start

*** buffer overflow detected ***: corosync terminated
======= Backtrace: =========
/lib64/libc.so.6(__fortify_fail+0x37)[0x7fd7b1767127]
/lib64/libc.so.6(+0xf8100)[0x7fd7b1765100]
[yadayada]

After applying the patches:

[root@rhel6-node2 ~]# /etc/init.d/cman start
Starting cluster:
   Checking if cluster has been disabled at boot...        [  OK  ]
   Checking Network Manager...                             [  OK  ]
   Global setup...                                         [  OK  ]
   Loading kernel modules...                               [  OK  ]
   Mounting configfs...                                    [  OK  ]
   Starting cman... Invalid cluster name. It must be 15 characters or fewer

Unable to get the configuration
Invalid cluster name. It must be 15 characters or fewer

cman_tool: corosync daemon didn't start Check cluster logs for details
                                                           [FAILED]

Comment 7 Fabio Massimo Di Nitto 2011-08-05 09:27:32 UTC
Further testing showed another problem: a missing cluster name also causes a crash.

This is the final patch set and unit test results:

http://git.fedorahosted.org/git?p=cluster.git;a=commitdiff;h=1f345b45a5eeaedfcf5c48ac328c32d32d30ac26
http://git.fedorahosted.org/git?p=cluster.git;a=commitdiff;h=79aafcef1dafff42afcc085d55188f495ee3cc54
http://git.fedorahosted.org/git?p=cluster.git;a=commitdiff;h=eecdcabac84dd93abf026fbfdb6f1c850c98fa5b

old packages:

<cluster config_version="1" >

[root@rhel6-node2 ~]# /etc/init.d/cman start
Starting cluster:
   Checking if cluster has been disabled at boot...        [  OK  ]
   Checking Network Manager...                             [  OK  ]
   Global setup...                                         [  OK  ]
   Loading kernel modules...                               [  OK  ]
   Mounting configfs...                                    [  OK  ]
   Starting cman... /usr/sbin/ccs_config_validate: line 186:  2007 Segmentation fault      (core dumped) ccs_config_dump > $tempfile

Unable to get the configuration

<cluster name="fabbionefabbionefabbionefabbionefabbionefabbione" config_version="1" >

[root@rhel6-node2 ~]# /etc/init.d/cman start
[snip]
*** buffer overflow detected ***: corosync terminated
======= Backtrace: =========
/lib64/libc.so.6(__fortify_fail+0x37)[0x7f1edb35b427]
/lib64/libc.so.6(+0xfd310)[0x7f1edb359310]
/usr/libexec/lcrso/service_cman.lcrso(read_cman_config+0xae)[0x7f1ed6a108fe]
/usr/libexec/lcrso/service_cman.lcrso(+0x4077)[0x7f1ed6a0b077]
corosync(corosync_service_link_and_init+0xf7)[0x408e97]
corosync(corosync_service_defaults_link_and_init+0xf1)[0x4091e1]
[snip]

new packages:

<cluster config_version="1" >

[root@rhel6-node2 ~]# /etc/init.d/cman start
Starting cluster:
   Checking if cluster has been disabled at boot...        [  OK  ]
   Checking Network Manager...                             [  OK  ]
   Global setup...                                         [  OK  ]
   Loading kernel modules...                               [  OK  ]
   Mounting configfs...                                    [  OK  ]
   Starting cman... Unable to determine cluster name.

Unable to get the configuration
Unable to determine cluster name.

cman_tool: corosync daemon didn't start Check cluster logs for details
                                                           [FAILED]

<cluster name="fabbionefabbionefabbionefabbionefabbionefabbione" config_version="1" >

[root@rhel6-node2 ~]# /etc/init.d/cman start
Starting cluster:
   Checking if cluster has been disabled at boot...        [  OK  ]
   Checking Network Manager...                             [  OK  ]
   Global setup...                                         [  OK  ]
   Loading kernel modules...                               [  OK  ]
   Mounting configfs...                                    [  OK  ]
   Starting cman... Invalid cluster name. It must be 15 characters or fewer

Unable to get the configuration
Invalid cluster name. It must be 15 characters or fewer

cman_tool: corosync daemon didn't start Check cluster logs for details
                                                           [FAILED]

Comment 9 Martin Juricek 2011-08-08 11:20:22 UTC
Verified in version cman-3.0.12.1-9.el6, kernel 2.6.32-131.0.15.el6


1) Cluster name longer than 15 characters:
...
<cluster config_version="1" name="Z_ClusterZ_ClusterZ_Cluster">
...

[root@z2 /]# service cman start
Starting cluster: 
   Checking if cluster has been disabled at boot...        [  OK  ]
   Checking Network Manager...                             [  OK  ]
   Global setup...                                         [  OK  ]
   Loading kernel modules...                               [  OK  ]
   Mounting configfs...                                    [  OK  ]
   Starting cman... Invalid cluster name. It must be 15 characters or fewer

Unable to get the configuration
Invalid cluster name. It must be 15 characters or fewer

cman_tool: corosync daemon didn't start Check cluster logs for details
                                                           [FAILED]



2) Cluster name not set:
...
<cluster config_version="1">
...

[root@z2 /]# service cman start
Starting cluster: 
   Checking if cluster has been disabled at boot...        [  OK  ]
   Checking Network Manager...                             [  OK  ]
   Global setup...                                         [  OK  ]
   Loading kernel modules...                               [  OK  ]
   Mounting configfs...                                    [  OK  ]
   Starting cman... Unable to determine cluster name.

Unable to get the configuration
Unable to determine cluster name.

cman_tool: corosync daemon didn't start Check cluster logs for details
                                                           [FAILED]

Comment 10 Fabio Massimo Di Nitto 2011-10-27 08:22:11 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause: The lack of two sanity checks on the cluster name (one for its length, one for its presence) would cause cman to crash at startup.
Consequence: cman would crash when starting up.
Fix: Implemented the missing sanity checks, reporting a proper error where necessary.
Result: cman no longer crashes and informs the user that the cluster name value is incorrect.

Comment 11 errata-xmlrpc 2011-12-06 14:52:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1516.html