Bug 728230 - cman crashes on startup if cluster name is too long or is not set at all
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: cluster
Version: 6.1
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Fabio Massimo Di Nitto
QA Contact: Cluster QE
Depends On:
Blocks: GSS_6_2_PROPOSED
Reported: 2011-08-04 09:07 EDT by Christine Caulfield
Modified: 2011-12-06 09:52 EST
CC: 8 users

See Also:
Fixed In Version: cluster-3.0.12.1-9.el6
Doc Type: Bug Fix
Doc Text:
Cause: cman lacked two sanity checks on the cluster name: one for its length and one for whether it is set at all.
Consequence: cman crashed at startup.
Fix: The missing sanity checks were implemented, and a proper error is reported when needed.
Result: cman no longer crashes and informs the user that the cluster name value is incorrect.
Last Closed: 2011-12-06 09:52:43 EST


Attachments
Patch to add a check on the cluster name length (665 bytes, patch)
2011-08-04 09:58 EDT, Christine Caulfield


External Trackers
Red Hat Knowledge Base (Legacy), Tracker ID 62056 (Priority: None; Status: None; Summary: None; Last Updated: Never)

Description Christine Caulfield 2011-08-04 09:07:02 EDT
Description of problem:
Cluster names can be at most 15 characters long, but cman in RHEL6 does not appear to check this. Starting a cluster with an over-long cluster name causes corosync/cman to crash with signal 6 (SIGABRT).
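The `*** buffer overflow detected ***` message and the `__fortify_fail` frame in the backtrace below are characteristic of glibc's FORTIFY_SOURCE checks, which abort the process (hence signal 6) when a fortified copy would write past the end of a fixed-size buffer. The C sketch below illustrates the kind of length check that avoids this. It is not the attached patch: it assumes the name is copied into a 16-byte buffer (15 characters plus the terminating NUL), and `CLUSTER_NAME_MAX` and `copy_cluster_name` are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical limit: 15 visible characters plus the terminating NUL. */
#define CLUSTER_NAME_MAX 16

/*
 * Copy a cluster name into a fixed-size buffer, refusing names that
 * would not fit instead of overflowing the buffer.
 * Returns 0 on success, -1 if the name is too long (dst is left untouched).
 */
static int copy_cluster_name(char dst[CLUSTER_NAME_MAX], const char *name)
{
    size_t len = strlen(name);

    if (len >= CLUSTER_NAME_MAX) {
        fprintf(stderr, "Invalid cluster name. It must be %d characters or fewer\n",
                CLUSTER_NAME_MAX - 1);
        return -1;
    }
    memcpy(dst, name, len + 1); /* copies the terminating NUL as well */
    return 0;
}
```

With this check, a 15-character name such as `Z_ClusterZ_Clus` is copied and the call returns 0, while the 27-character name used in the verification comment is rejected with -1 and an error message, before any byte is written to the buffer.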

Version-Release number of selected component (if applicable):
RHEL 6.1

How reproducible:
Every time

Steps to Reproduce:
1. Create a cluster.conf with a cluster name longer than 15 characters.
2. Start cman.

Actual results:
cman crashes

Expected results:
cman should not crash.

Additional info:
In RHEL5, an error message was printed if the cluster name was too long. In RHEL6 this no longer appears to be the case.

# cman_tool join
corosync died with signal: 6

or:

# cman_tool join -d
Validating configuration
calling '/usr/sbin/ccs_config_validate  '
Configuration validates
Starting /usr/sbin/corosync corosync -f
CMAN_DEBUG=255
COROSYNC_DEFAULT_CONFIG_IFACE=xmlconfig:cmanpreconfig
CMAN_PIPE=4
Aug 04 14:05:56 corosync [MAIN  ] Corosync Cluster Engine ('1.2.3'): started and ready to provide service.
Aug 04 14:05:56 corosync [MAIN  ] Corosync built-in features: nss rdma
Aug 04 14:05:56 corosync [MAIN  ] Successfully read config from /etc/cluster/cluster.conf
Aug 04 14:05:56 corosync [MAIN  ] Successfully parsed cman config
Aug 04 14:05:56 corosync [TOTEM ] Initializing transport (UDP/IP).
Aug 04 14:05:56 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
*** buffer overflow detected ***: corosync terminated
======= Backtrace: =========
/lib64/libc.so.6(__fortify_fail+0x37)[0x7fd47460a6a7]
/lib64/libc.so.6(+0xfe5a0)[0x7fd4746085a0]
/usr/libexec/lcrso/service_cman.lcrso(read_cman_config+0xae)[0x7fd46bbf78de]
/usr/libexec/lcrso/service_cman.lcrso(+0x4087)[0x7fd46bbf2087]
corosync(corosync_service_link_and_init+0xf7)[0x408177]
corosync(corosync_service_defaults_link_and_init+0xf1)[0x4084c1]
corosync[0x405e18]
/usr/lib64/libtotem_pg.so.4(main_iface_change_fn+0x10f)[0x7fd4752e2aff]
/usr/lib64/libtotem_pg.so.4(+0xa07a)[0x7fd4752dc07a]
/usr/lib64/libtotem_pg.so.4(poll_run+0x29d)[0x7fd4752d875d]
corosync(main+0x6cb)[0x4056ab]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x7fd474528c9d]
corosync[0x404219]
======= Memory map: ========
<snip>

Aug 04 14:05:56 corosync [TOTEM ] The network interface [192.168.1.201] is now up.
Aug 04 14:05:56 corosync [QUORUM] Using quorum provider quorum_cman
Aug 04 14:05:56 corosync [SERV  ] Service engine loaded: corosync cluster quorum service v0.1
forked process ID is 1706
corosync died with signal: 6
Comment 2 Christine Caulfield 2011-08-04 09:58:58 EDT
Created attachment 516709
Patch to add a check on the cluster name length
Comment 3 Christine Caulfield 2011-08-04 10:00:13 EDT
I should add that a customer has seen this problem; it is not 'internal' or theoretical.
Comment 5 Fabio Massimo Di Nitto 2011-08-04 10:36:25 EDT
http://git.fedorahosted.org/git/?p=cluster.git;a=commitdiff;h=03e9af7db105bcfbb7a013974084d2ed171fb258

The commit exists upstream; ACK for RHEL6.
Comment 6 Fabio Massimo Di Nitto 2011-08-05 04:11:21 EDT
http://git.fedorahosted.org/git/?p=cluster.git;a=commitdiff;h=ac195524d4a520b7f5bbd25e01715f4e0aa1ab19

A small amendment to the original patch.

Unit test results:

<cluster name="fabbionefabbionefabbionefabbionefabbionefabbionefabbionefabbione" config_version="1">

/etc/init.d/cman start

*** buffer overflow detected ***: corosync terminated
======= Backtrace: =========
/lib64/libc.so.6(__fortify_fail+0x37)[0x7fd7b1767127]
/lib64/libc.so.6(+0xf8100)[0x7fd7b1765100]
[yadayada]

After applying the patches:

[root@rhel6-node2 ~]# /etc/init.d/cman start
Starting cluster:
   Checking if cluster has been disabled at boot...        [  OK  ]
   Checking Network Manager...                             [  OK  ]
   Global setup...                                         [  OK  ]
   Loading kernel modules...                               [  OK  ]
   Mounting configfs...                                    [  OK  ]
   Starting cman... Invalid cluster name. It must be 15 characters or fewer

Unable to get the configuration
Invalid cluster name. It must be 15 characters or fewer

cman_tool: corosync daemon didn't start Check cluster logs for details
                                                           [FAILED]
Comment 7 Fabio Massimo Di Nitto 2011-08-05 05:27:32 EDT
Further testing showed another problem: a missing cluster name also causes a crash.
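A missing name needs its own check, and that check must run before any length check: if the configuration lookup returns NULL for an absent `name` attribute, calling `strlen()` on it is undefined behavior and typically crashes. The following C sketch orders the two checks that way and prints the same messages seen in the unit test results. It is a hypothetical illustration, not the upstream commits: `check_cluster_name`, the `name_check` codes and the 16-byte limit are assumptions, and treating an empty name as missing is this sketch's own choice.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical limit: 15 visible characters plus the terminating NUL. */
#define CLUSTER_NAME_MAX 16

/* Hypothetical result codes for the validation step. */
enum name_check {
    NAME_OK = 0,
    NAME_MISSING = -1,
    NAME_TOO_LONG = -2
};

/*
 * Validate a cluster name read from the configuration before using it.
 * `name` is assumed to be NULL when <cluster> has no name attribute.
 */
static enum name_check check_cluster_name(const char *name)
{
    /* Check for absence first: strlen(NULL) is undefined behavior. */
    if (name == NULL || name[0] == '\0') {
        fprintf(stderr, "Unable to determine cluster name.\n");
        return NAME_MISSING;
    }
    /* Only now is it safe to measure the string. */
    if (strlen(name) >= CLUSTER_NAME_MAX) {
        fprintf(stderr, "Invalid cluster name. It must be %d characters or fewer\n",
                CLUSTER_NAME_MAX - 1);
        return NAME_TOO_LONG;
    }
    return NAME_OK;
}
```

Returning distinct codes lets the caller abort startup with a clear message in both cases instead of continuing with an unusable name.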

These are the final patch set and unit test results:

http://git.fedorahosted.org/git?p=cluster.git;a=commitdiff;h=1f345b45a5eeaedfcf5c48ac328c32d32d30ac26
http://git.fedorahosted.org/git?p=cluster.git;a=commitdiff;h=79aafcef1dafff42afcc085d55188f495ee3cc54
http://git.fedorahosted.org/git?p=cluster.git;a=commitdiff;h=eecdcabac84dd93abf026fbfdb6f1c850c98fa5b

Old packages:

<cluster config_version="1" >

[root@rhel6-node2 ~]# /etc/init.d/cman start
Starting cluster:
   Checking if cluster has been disabled at boot...        [  OK  ]
   Checking Network Manager...                             [  OK  ]
   Global setup...                                         [  OK  ]
   Loading kernel modules...                               [  OK  ]
   Mounting configfs...                                    [  OK  ]
   Starting cman... /usr/sbin/ccs_config_validate: line 186:  2007 Segmentation fault      (core dumped) ccs_config_dump > $tempfile

Unable to get the configuration

<cluster name="fabbionefabbionefabbionefabbionefabbionefabbione" config_version="1" >

[root@rhel6-node2 ~]# /etc/init.d/cman start
[snip]
*** buffer overflow detected ***: corosync terminated
======= Backtrace: =========
/lib64/libc.so.6(__fortify_fail+0x37)[0x7f1edb35b427]
/lib64/libc.so.6(+0xfd310)[0x7f1edb359310]
/usr/libexec/lcrso/service_cman.lcrso(read_cman_config+0xae)[0x7f1ed6a108fe]
/usr/libexec/lcrso/service_cman.lcrso(+0x4077)[0x7f1ed6a0b077]
corosync(corosync_service_link_and_init+0xf7)[0x408e97]
corosync(corosync_service_defaults_link_and_init+0xf1)[0x4091e1]
[snip]

New packages:

<cluster config_version="1" >

[root@rhel6-node2 ~]# /etc/init.d/cman start
Starting cluster:
   Checking if cluster has been disabled at boot...        [  OK  ]
   Checking Network Manager...                             [  OK  ]
   Global setup...                                         [  OK  ]
   Loading kernel modules...                               [  OK  ]
   Mounting configfs...                                    [  OK  ]
   Starting cman... Unable to determine cluster name.

Unable to get the configuration
Unable to determine cluster name.

cman_tool: corosync daemon didn't start Check cluster logs for details
                                                           [FAILED]

<cluster name="fabbionefabbionefabbionefabbionefabbionefabbione" config_version="1" >

[root@rhel6-node2 ~]# /etc/init.d/cman start
Starting cluster:
   Checking if cluster has been disabled at boot...        [  OK  ]
   Checking Network Manager...                             [  OK  ]
   Global setup...                                         [  OK  ]
   Loading kernel modules...                               [  OK  ]
   Mounting configfs...                                    [  OK  ]
   Starting cman... Invalid cluster name. It must be 15 characters or fewer

Unable to get the configuration
Invalid cluster name. It must be 15 characters or fewer

cman_tool: corosync daemon didn't start Check cluster logs for details
                                                           [FAILED]
Comment 9 Martin Juricek 2011-08-08 07:20:22 EDT
Verified in version cman-3.0.12.1-9.el6, kernel 2.6.32-131.0.15.el6


1) Cluster name longer than 15 characters:
...
<cluster config_version="1" name="Z_ClusterZ_ClusterZ_Cluster">
...

[root@z2 /]# service cman start
Starting cluster: 
   Checking if cluster has been disabled at boot...        [  OK  ]
   Checking Network Manager...                             [  OK  ]
   Global setup...                                         [  OK  ]
   Loading kernel modules...                               [  OK  ]
   Mounting configfs...                                    [  OK  ]
   Starting cman... Invalid cluster name. It must be 15 characters or fewer

Unable to get the configuration
Invalid cluster name. It must be 15 characters or fewer

cman_tool: corosync daemon didn't start Check cluster logs for details
                                                           [FAILED]



2) Cluster name not set:
...
<cluster config_version="1">
...

[root@z2 /]# service cman start
Starting cluster: 
   Checking if cluster has been disabled at boot...        [  OK  ]
   Checking Network Manager...                             [  OK  ]
   Global setup...                                         [  OK  ]
   Loading kernel modules...                               [  OK  ]
   Mounting configfs...                                    [  OK  ]
   Starting cman... Unable to determine cluster name.

Unable to get the configuration
Unable to determine cluster name.

cman_tool: corosync daemon didn't start Check cluster logs for details
                                                           [FAILED]
Comment 10 Fabio Massimo Di Nitto 2011-10-27 04:22:11 EDT
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause: cman lacked two sanity checks on the cluster name: one for its length and one for whether it is set at all.
Consequence: cman crashed at startup.
Fix: The missing sanity checks were implemented, and a proper error is reported when needed.
Result: cman no longer crashes and informs the user that the cluster name value is incorrect.
Comment 11 errata-xmlrpc 2011-12-06 09:52:43 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1516.html
