Bug 728230
| Summary: | cman crashes on startup if cluster name is too long or is not set at all | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Christine Caulfield <ccaulfie> | ||||
| Component: | cluster | Assignee: | Fabio Massimo Di Nitto <fdinitto> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Cluster QE <mspqa-list> | ||||
| Severity: | medium | Docs Contact: | |||||
| Priority: | medium | ||||||
| Version: | 6.1 | CC: | ccaulfie, cluster-maint, djansa, lhh, mjuricek, rdassen, rpeterso, teigland | ||||
| Target Milestone: | rc | ||||||
| Target Release: | --- | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | cluster-3.0.12.1-9.el6 | Doc Type: | Bug Fix | ||||
| Doc Text: |
Cause: Two sanity checks on the cluster name (one for its length, one for its presence) were missing, which caused cman to crash at startup.
Consequence: cman crashed when starting up.
Fix: The missing sanity checks were added, and a proper error is reported when necessary.
Result: cman no longer crashes and informs the user that the cluster name value is incorrect.
|
Story Points: | --- | ||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2011-12-06 14:52:43 UTC | Type: | --- | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | |||||||
| Bug Blocks: | 658636 | ||||||
| Attachments: |
|
||||||
Created attachment 516709
Patch to add a check on the cluster name length
I should add that a customer has seen this problem; it is not 'internal' or theoretical.

Commit http://git.fedorahosted.org/git/?p=cluster.git;a=commitdiff;h=03e9af7db105bcfbb7a013974084d2ed171fb258 exists upstream. ACK for RHEL 6.

A small amendment to the original patch:
http://git.fedorahosted.org/git/?p=cluster.git;a=commitdiff;h=ac195524d4a520b7f5bbd25e01715f4e0aa1ab19

Unit test results:
<cluster name="fabbionefabbionefabbionefabbionefabbionefabbionefabbionefabbione" config_version="1">
/etc/init.d/cman start
*** buffer overflow detected ***: corosync terminated
======= Backtrace: =========
/lib64/libc.so.6(__fortify_fail+0x37)[0x7fd7b1767127]
/lib64/libc.so.6(+0xf8100)[0x7fd7b1765100]
[yadayada]
apply patches
[root@rhel6-node2 ~]# /etc/init.d/cman start
Starting cluster:
Checking if cluster has been disabled at boot... [ OK ]
Checking Network Manager... [ OK ]
Global setup... [ OK ]
Loading kernel modules... [ OK ]
Mounting configfs... [ OK ]
Starting cman... Invalid cluster name. It must be 15 characters or fewer
Unable to get the configuration
Invalid cluster name. It must be 15 characters or fewer
cman_tool: corosync daemon didn't start Check cluster logs for details
[FAILED]

Further testing showed another problem: a missing cluster name also causes a crash. This is the final patch set, followed by the unit test results:
http://git.fedorahosted.org/git?p=cluster.git;a=commitdiff;h=1f345b45a5eeaedfcf5c48ac328c32d32d30ac26
http://git.fedorahosted.org/git?p=cluster.git;a=commitdiff;h=79aafcef1dafff42afcc085d55188f495ee3cc54
http://git.fedorahosted.org/git?p=cluster.git;a=commitdiff;h=eecdcabac84dd93abf026fbfdb6f1c850c98fa5b

old packages:
<cluster config_version="1" >
[root@rhel6-node2 ~]# /etc/init.d/cman start
Starting cluster:
Checking if cluster has been disabled at boot... [ OK ]
Checking Network Manager... [ OK ]
Global setup... [ OK ]
Loading kernel modules... [ OK ]
Mounting configfs... [ OK ]
Starting cman...
/usr/sbin/ccs_config_validate: line 186: 2007 Segmentation fault (core dumped) ccs_config_dump > $tempfile
Unable to get the configuration

<cluster name="fabbionefabbionefabbionefabbionefabbionefabbione" config_version="1" >
[root@rhel6-node2 ~]# /etc/init.d/cman start
[snip]
*** buffer overflow detected ***: corosync terminated
======= Backtrace: =========
/lib64/libc.so.6(__fortify_fail+0x37)[0x7f1edb35b427]
/lib64/libc.so.6(+0xfd310)[0x7f1edb359310]
/usr/libexec/lcrso/service_cman.lcrso(read_cman_config+0xae)[0x7f1ed6a108fe]
/usr/libexec/lcrso/service_cman.lcrso(+0x4077)[0x7f1ed6a0b077]
corosync(corosync_service_link_and_init+0xf7)[0x408e97]
corosync(corosync_service_defaults_link_and_init+0xf1)[0x4091e1]
[snip]

new packages:
<cluster config_version="1" >
[root@rhel6-node2 ~]# /etc/init.d/cman start
Starting cluster:
Checking if cluster has been disabled at boot... [ OK ]
Checking Network Manager... [ OK ]
Global setup... [ OK ]
Loading kernel modules... [ OK ]
Mounting configfs... [ OK ]
Starting cman... Unable to determine cluster name.
Unable to get the configuration
Unable to determine cluster name.
cman_tool: corosync daemon didn't start Check cluster logs for details
[FAILED]

<cluster name="fabbionefabbionefabbionefabbionefabbionefabbione" config_version="1" >
[root@rhel6-node2 ~]# /etc/init.d/cman start
Starting cluster:
Checking if cluster has been disabled at boot... [ OK ]
Checking Network Manager... [ OK ]
Global setup... [ OK ]
Loading kernel modules... [ OK ]
Mounting configfs... [ OK ]
Starting cman... Invalid cluster name. It must be 15 characters or fewer
Unable to get the configuration
Invalid cluster name. It must be 15 characters or fewer
cman_tool: corosync daemon didn't start Check cluster logs for details
[FAILED]

Verified in version cman-3.0.12.1-9.el6, kernel 2.6.32-131.0.15.el6
1) Cluster name longer than 15 characters:
...
<cluster config_version="1" name="Z_ClusterZ_ClusterZ_Cluster">
...
[root@z2 /]# service cman start
Starting cluster:
Checking if cluster has been disabled at boot... [ OK ]
Checking Network Manager... [ OK ]
Global setup... [ OK ]
Loading kernel modules... [ OK ]
Mounting configfs... [ OK ]
Starting cman... Invalid cluster name. It must be 15 characters or fewer
Unable to get the configuration
Invalid cluster name. It must be 15 characters or fewer
cman_tool: corosync daemon didn't start Check cluster logs for details
[FAILED]
2) Cluster name not set:
...
<cluster config_version="1">
...
[root@z2 /]# service cman start
Starting cluster:
Checking if cluster has been disabled at boot... [ OK ]
Checking Network Manager... [ OK ]
Global setup... [ OK ]
Loading kernel modules... [ OK ]
Mounting configfs... [ OK ]
Starting cman... Unable to determine cluster name.
Unable to get the configuration
Unable to determine cluster name.
cman_tool: corosync daemon didn't start Check cluster logs for details
[FAILED]
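The two checks exercised above (name missing, name too long) can be illustrated with a minimal Python sketch of an equivalent pre-flight check on cluster.conf. This is not the cman fix: the real patches are the upstream commits linked earlier, and the crash in the backtraces occurs inside cman's read_cman_config. The function name check_cluster_name and the constant MAX_CLUSTER_NAME_LEN are hypothetical, the 15-character limit is taken from the error message, and measuring it in UTF-8 bytes is an assumption based on the name ending up in a fixed-size C buffer.

```python
import xml.etree.ElementTree as ET

# Assumed limit, taken from the message "It must be 15 characters or fewer".
# Measured in UTF-8 bytes, on the assumption that cman stores the name in a
# fixed-size C buffer (15 bytes plus the terminating NUL).
MAX_CLUSTER_NAME_LEN = 15


def check_cluster_name(conf_xml: str) -> str:
    """Hypothetical pre-flight check: return the cluster name from a
    cluster.conf document, or raise ValueError with a message modelled
    on the ones cman prints after the fix."""
    root = ET.fromstring(conf_xml)
    if root.tag != "cluster":
        raise ValueError("Top-level element is not <cluster>")
    name = root.get("name")
    # Check 1: the name attribute is missing or empty.
    if not name:
        raise ValueError("Unable to determine cluster name.")
    # Check 2: the name does not fit in the assumed buffer.
    if len(name.encode("utf-8")) > MAX_CLUSTER_NAME_LEN:
        raise ValueError(
            f"Invalid cluster name. It must be {MAX_CLUSTER_NAME_LEN} characters or fewer"
        )
    return name


if __name__ == "__main__":
    samples = [
        '<cluster config_version="1" name="Z_Cluster"/>',
        '<cluster config_version="1" name="Z_ClusterZ_ClusterZ_Cluster"/>',
        '<cluster config_version="1"/>',
    ]
    for conf in samples:
        try:
            print("ok:", check_cluster_name(conf))
        except ValueError as err:
            print("rejected:", err)
```

The missing-name check runs first because there is nothing to measure when the attribute is absent. Malformed XML makes ET.fromstring raise xml.etree.ElementTree.ParseError, which this sketch deliberately does not catch.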
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.
New Contents:
Cause: Two sanity checks on the cluster name (one for its length, one for its presence) were missing, which caused cman to crash at startup.
Consequence: cman crashed when starting up.
Fix: The missing sanity checks were added, and a proper error is reported when necessary.
Result: cman no longer crashes and informs the user that the cluster name value is incorrect.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
http://rhn.redhat.com/errata/RHBA-2011-1516.html |
Description of problem:
Cluster names can be a maximum of 15 characters, but cman in RHEL 6 does not appear to check this in any useful way. Starting a cluster with an invalid cluster name causes corosync/cman to crash with signal 6.
Version-Release number of selected component (if applicable):
RHEL 6.1
How reproducible:
Every time
Steps to Reproduce:
1. Create a cluster.conf with a long cluster name
2. Start cman
Actual results:
cman crashes
Expected results:
cman should not crash.
Additional info:
In RHEL 5 an error message was printed if the cluster name was too long; this no longer appears to be the case in RHEL 6.
# cman_tool join
corosync died with signal: 6
or:
# cman_tool join -d
Validating configuration
calling '/usr/sbin/ccs_config_validate '
Configuration validates
Starting /usr/sbin/corosync corosync -f
CMAN_DEBUG=255 COROSYNC_DEFAULT_CONFIG_IFACE=xmlconfig:cmanpreconfig CMAN_PIPE=4
Aug 04 14:05:56 corosync [MAIN ] Corosync Cluster Engine ('1.2.3'): started and ready to provide service.
Aug 04 14:05:56 corosync [MAIN ] Corosync built-in features: nss rdma
Aug 04 14:05:56 corosync [MAIN ] Successfully read config from /etc/cluster/cluster.conf
Aug 04 14:05:56 corosync [MAIN ] Successfully parsed cman config
Aug 04 14:05:56 corosync [TOTEM ] Initializing transport (UDP/IP).
Aug 04 14:05:56 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
*** buffer overflow detected ***: corosync terminated
======= Backtrace: =========
/lib64/libc.so.6(__fortify_fail+0x37)[0x7fd47460a6a7]
/lib64/libc.so.6(+0xfe5a0)[0x7fd4746085a0]
/usr/libexec/lcrso/service_cman.lcrso(read_cman_config+0xae)[0x7fd46bbf78de]
/usr/libexec/lcrso/service_cman.lcrso(+0x4087)[0x7fd46bbf2087]
corosync(corosync_service_link_and_init+0xf7)[0x408177]
corosync(corosync_service_defaults_link_and_init+0xf1)[0x4084c1]
corosync[0x405e18]
/usr/lib64/libtotem_pg.so.4(main_iface_change_fn+0x10f)[0x7fd4752e2aff]
/usr/lib64/libtotem_pg.so.4(+0xa07a)[0x7fd4752dc07a]
/usr/lib64/libtotem_pg.so.4(poll_run+0x29d)[0x7fd4752d875d]
corosync(main+0x6cb)[0x4056ab]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x7fd474528c9d]
corosync[0x404219]
======= Memory map: ========
<snip>
Aug 04 14:05:56 corosync [TOTEM ] The network interface [192.168.1.201] is now up.
Aug 04 14:05:56 corosync [QUORUM] Using quorum provider quorum_cman
Aug 04 14:05:56 corosync [SERV ] Service engine loaded: corosync cluster quorum service v0.1
forked process ID is 1706
corosync died with signal: 6