+++ This bug was initially created as a clone of Bug #741434 +++

Description of problem:
Every once in a while I'm seeing the cluster fail to start due to "parse errors"; however, if I run the actual cman_tool join cmd by itself, the cluster starts just fine. Also, I find it odd that there is a "parse error" right after seeing the "Successfully parsed cman config" message.

[root@taft-01 tmp]# service cman start
Starting cluster:
   Checking if cluster has been disabled at boot... [ OK ]
   Checking Network Manager... [ OK ]
   Global setup... [ OK ]
   Enable Xend bridge net workaround... action not required
   Loading kernel modules... [ OK ]
   Mounting configfs... [ OK ]
   Starting cman... corosync died: Could not read cluster configuration Check cluster logs for details
   [FAILED]

Sep 26 14:44:17 taft-01 corosync[4787]: parse error in config: parse error in config: .
Sep 26 14:44:17 taft-01 corosync[4787]: [MAIN ] Corosync Cluster Engine ('1.4.1'): started and ready to provide service.
Sep 26 14:44:17 taft-01 corosync[4787]: [MAIN ] Corosync built-in features: nss dbus rdma snmp
Sep 26 14:44:17 taft-01 corosync[4787]: [MAIN ] Successfully read config from /etc/cluster/cluster.conf
Sep 26 14:44:17 taft-01 corosync[4787]: [MAIN ] Successfully parsed cman config
Sep 26 14:44:17 taft-01 corosync[4787]: [MAIN ] parse error in config: parse error in config: .
Sep 26 14:44:17 taft-01 corosync[4787]: [MAIN ] Corosync Cluster Engine exiting with status 8 at main.c:1680.

However, if I run the actual cman_tool join cmd by itself, the cluster joins just fine.

# cman_tool -t 60 -w join -DWARN

Version-Release number of selected component (if applicable):
Linux taft-01 2.6.32-198.el6.x86_64 #1 SMP Thu Sep 15 23:40:38 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
corosync-1.4.1-3.el6.x86_64

How reproducible:
service cman start

--- Additional comment from cmarthal on 2011-09-26 16:09:19 EDT ---

Created attachment 524979
failing cluster.conf file

--- Additional comment from sdake on 2011-09-26 16:54:14 EDT ---

type=AVC msg=audit(1316212684.082:172): avc: denied { read } for pid=8091 comm="corosync" name="corosync.log" dev=dm-0 ino=131142 scontext=unconfined_u:system_r:corosync_t:s0 tcontext=system_u:object_r:var_log_t:s0 tclass=file

Not a corosync problem. Corosync doesn't modify selinux settings.

restorecon -R /var/log/messages fixed the problem.

-rw-r--r--. root root system_u:object_r:corosync_var_log_t:s0 corosync.log

The audit log doesn't show how the process got the wrong context. Reassigning to selinux-policy.

--- Additional comment from sdake on 2011-09-26 16:57:47 EDT ---

Created attachment 524991
audit log
An inability to read a config or operational file (because of selinux, for example) should produce a more useful error than a parse error. Instead, the filename should be printed along with an appropriate error.
Created attachment 559020
Proposed patch

The main problem wasn't in coroparse (the parser) but in the mainconfig module, which passed an incorrect string pointer to the function that opens the log file. The patch handles this by passing the correct string to that function, so the correct error is now printed.

Reproducer:
Set logfile: in corosync.conf to a path in a non-existing directory. Example:
logfile: /var/log/cluster2/corosync.log
Run corosync.

Old result:
As described in bz

New result:
Feb 02 10:47:24 corosync [MAIN ] parse error in config: Can't open logfile '/var/log/cluster2/corosync.log' for reason: No such file or directory (2).
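For readers who want to see the shape of the corrected behavior, here is a minimal sketch in C. It is not the actual corosync patch: the function name try_open_logfile and its parameters are hypothetical, and only the message format is taken from the "New result" line above. The idea is simply that the failing path and the errno reason are both carried into the error string.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not corosync's real code): try to open a log file
 * for appending. On failure, write a message naming the file and the errno
 * reason into err_buf. Returns 0 on success, -1 on failure.
 */
static int try_open_logfile(const char *path, char *err_buf, size_t err_len)
{
    FILE *fp = fopen(path, "a");

    if (fp == NULL) {
        /* Save errno before any later call has a chance to overwrite it. */
        int saved_errno = errno;

        snprintf(err_buf, err_len,
                 "Can't open logfile '%s' for reason: %s (%d).",
                 path, strerror(saved_errno), saved_errno);
        return -1;
    }

    fclose(fp);
    return 0;
}
```

A caller would pass the configured logfile path and a buffer for the message, then print the buffer when the function returns -1, rather than falling back to a generic "parse error" text.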
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Cause: Logging to a file is configured and the path to the file contains a nonexistent directory.
Consequence: An error message that doesn't make sense (configuration file error) is displayed.
Fix: Pass the proper string pointer to the function that opens the log file.
Result: The correct error message (can't create log file) is displayed.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2012-0777.html