Description of problem:

Every once in a while I'm seeing the cluster fail to start due to "parse
errors"; however, if I run the actual cman_tool join cmd by itself, the
cluster starts just fine. I also find it odd that there is a "parse error"
right after the "Successfully parsed cman config" message.

[root@taft-01 tmp]# service cman start
Starting cluster:
   Checking if cluster has been disabled at boot...        [  OK  ]
   Checking Network Manager...                             [  OK  ]
   Global setup...                                         [  OK  ]
   Enable Xend bridge net workaround...                    action not required
   Loading kernel modules...                               [  OK  ]
   Mounting configfs...                                    [  OK  ]
   Starting cman... corosync died: Could not read cluster configuration
Check cluster logs for details
                                                           [FAILED]

Sep 26 14:44:17 taft-01 corosync[4787]: parse error in config: parse error in config: .
Sep 26 14:44:17 taft-01 corosync[4787]: [MAIN ] Corosync Cluster Engine ('1.4.1'): started and ready to provide service.
Sep 26 14:44:17 taft-01 corosync[4787]: [MAIN ] Corosync built-in features: nss dbus rdma snmp
Sep 26 14:44:17 taft-01 corosync[4787]: [MAIN ] Successfully read config from /etc/cluster/cluster.conf
Sep 26 14:44:17 taft-01 corosync[4787]: [MAIN ] Successfully parsed cman config
Sep 26 14:44:17 taft-01 corosync[4787]: [MAIN ] parse error in config: parse error in config: .
Sep 26 14:44:17 taft-01 corosync[4787]: [MAIN ] Corosync Cluster Engine exiting with status 8 at main.c:1680.

However, if I run the actual cman_tool join cmd by itself, the cluster joins
just fine:

# cman_tool -t 60 -w join -DWARN

Version-Release number of selected component (if applicable):
Linux taft-01 2.6.32-198.el6.x86_64 #1 SMP Thu Sep 15 23:40:38 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
corosync-1.4.1-3.el6.x86_64

How reproducible:
service cman start
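Two quick diagnostics can surface the real error that the init script
truncates to "parse error in config: ." (a sketch, not specific to this
cluster; it assumes corosync's -f foreground flag is available):

# Run corosync in the foreground so the underlying error prints directly:
corosync -f
# Check whether SELinux denials are masquerading as config parse errors:
ausearch -m avc -ts recent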
Created attachment 524979
failing cluster.conf file
type=AVC msg=audit(1316212684.082:172): avc: denied { read } for pid=8091 comm="corosync" name="corosync.log" dev=dm-0 ino=131142 scontext=unconfined_u:system_r:corosync_t:s0 tcontext=system_u:object_r:var_log_t:s0 tclass=file

This is not a corosync problem. Corosync doesn't modify selinux settings.

restorecon -R /var/log/messages fixed the problem.

-rw-r--r--. root root system_u:object_r:corosync_var_log_t:s0 corosync.log

The audit log doesn't show how the process got the wrong context.

Reassigning to selinux-policy.
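Since the audit log doesn't show the creator, a watch rule could catch
whichever process creates the file with the wrong label on the next
reproduction (a sketch; the key name is arbitrary and the path assumes the
default cluster log layout):

# Audit writes and attribute changes under the cluster log directory:
auditctl -w /var/log/cluster -p wa -k cluster-log-label
# After reproducing, see which executable touched the files:
ausearch -k cluster-log-label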
Created attachment 524991
audit log
Corey, what does ausearch show if you execute

# setenforce 0
# service cman start
# ausearch -m avc -ts recent

and does it work in permissive mode?
[root@taft-02 ~]# setenforce 0
[root@taft-02 ~]# service cman start
Starting cluster:
   Checking if cluster has been disabled at boot...        [  OK  ]
   Checking Network Manager...                             [  OK  ]
   Global setup...                                         [  OK  ]
   Enable Xend bridge net workaround...                    action not required
   Loading kernel modules...                               [  OK  ]
   Mounting configfs...                                    [  OK  ]
   Starting cman...                                        [  OK  ]
[...]
   Joining fence domain...                                 [  OK  ]
[root@taft-02 ~]# ausearch -m avc -ts recent
----
time->Tue Sep 27 11:50:53 2011
type=SYSCALL msg=audit(1317142253.272:902): arch=c000003e syscall=2 success=no exit=-13 a0=245b2b0 a1=442 a2=1b6 a3=0 items=0 ppid=3516 pid=3568 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=1 comm="corosync" exe="/usr/sbin/corosync" subj=unconfined_u:system_r:corosync_t:s0 key=(null)
type=AVC msg=audit(1317142253.272:902): avc: denied { read } for pid=3568 comm="corosync" name="corosync.log" dev=dm-0 ino=131131 scontext=unconfined_u:system_r:corosync_t:s0 tcontext=system_u:object_r:var_log_t:s0 tclass=file
----
time->Tue Sep 27 11:51:30 2011
type=SYSCALL msg=audit(1317142290.050:904): arch=c000003e syscall=2 success=yes exit=5 a0=16db2b0 a1=442 a2=1b6 a3=0 items=0 ppid=3622 pid=3674 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=1 comm="corosync" exe="/usr/sbin/corosync" subj=unconfined_u:system_r:corosync_t:s0 key=(null)
type=AVC msg=audit(1317142290.050:904): avc: denied { read } for pid=3674 comm="corosync" name="corosync.log" dev=dm-0 ino=131131 scontext=unconfined_u:system_r:corosync_t:s0 tcontext=system_u:object_r:var_log_t:s0 tclass=file
----
time->Tue Sep 27 11:52:10 2011
type=SYSCALL msg=audit(1317142330.317:906): arch=c000003e syscall=2 success=yes exit=4 a0=31ca802dc0 a1=442 a2=1b6 a3=0 items=0 ppid=1 pid=3746 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=1 comm="dlm_controld" exe="/usr/sbin/dlm_controld" subj=unconfined_u:system_r:dlm_controld_t:s0 key=(null)
type=AVC msg=audit(1317142330.317:906): avc: denied { read } for pid=3746 comm="dlm_controld" name="dlm_controld.log" dev=dm-0 ino=131166 scontext=unconfined_u:system_r:dlm_controld_t:s0 tcontext=system_u:object_r:var_log_t:s0 tclass=file
----
time->Tue Sep 27 11:52:10 2011
type=SYSCALL msg=audit(1317142330.208:905): arch=c000003e syscall=2 success=yes exit=4 a0=31ca802dc0 a1=442 a2=1b6 a3=0 items=0 ppid=1 pid=3730 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=1 comm="fenced" exe="/usr/sbin/fenced" subj=unconfined_u:system_r:fenced_t:s0 key=(null)
type=AVC msg=audit(1317142330.208:905): avc: denied { read } for pid=3730 comm="fenced" name="fenced.log" dev=dm-0 ino=131164 scontext=unconfined_u:system_r:fenced_t:s0 tcontext=system_u:object_r:var_log_t:s0 tclass=file
----
time->Tue Sep 27 11:52:11 2011
type=SYSCALL msg=audit(1317142331.406:907): arch=c000003e syscall=2 success=yes exit=4 a0=31ca802dc0 a1=442 a2=1b6 a3=0 items=0 ppid=1 pid=3805 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=1 comm="gfs_controld" exe="/usr/sbin/gfs_controld" subj=unconfined_u:system_r:gfs_controld_t:s0 key=(null)
type=AVC msg=audit(1317142331.406:907): avc: denied { read } for pid=3805 comm="gfs_controld" name="gfs_controld.log" dev=dm-0 ino=131172 scontext=unconfined_u:system_r:gfs_controld_t:s0 tcontext=system_u:object_r:var_log_t:s0 tclass=file
Did you run the tools by hand, creating the log files with the incorrect
labels?

restorecon -R -v /var/log

will fix the labels.
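For what it's worth, a quick way to confirm a mislabel before (or instead of)
a blanket restorecon is to compare the actual label against what the policy
expects (a sketch, assuming the default log paths; matchpathcon ships in
libselinux-utils):

# The label the file currently carries:
ls -Z /var/log/cluster/corosync.log
# The label the loaded policy expects for that path:
matchpathcon /var/log/cluster/corosync.log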
Please execute restorecon -R -v /var/log and then re-test it. Milos, could you try to run your cluster tests?
After a few repeated executions of my automated test I see a failing luci
service, but no AVCs.
I believe this is an issue with testing. Corey, please try the steps in
comment #7.
I haven't been able to reproduce the AVC issues lately.
I'm seeing this again with the latest 6.3 rpms.

[root@taft-01 ~]# service cman start
Starting cluster:
   Checking if cluster has been disabled at boot...        [  OK  ]
   Checking Network Manager...                             [  OK  ]
   Global setup...                                         [  OK  ]
   Loading kernel modules...                               [  OK  ]
   Mounting configfs...                                    [  OK  ]
   Starting cman... corosync died: Could not read cluster configuration
Check cluster logs for details
                                                           [FAILED]
Stopping cluster:
   Leaving fence domain...                                 [  OK  ]
   Stopping gfs_controld...                                [  OK  ]
   Stopping dlm_controld...                                [  OK  ]
   Stopping fenced...                                      [  OK  ]
   Stopping cman...                                        [  OK  ]
   Unloading kernel modules...                             [  OK  ]
   Unmounting configfs...                                  [  OK  ]

Apr 26 15:40:35 taft-01 corosync[2102]: parse error in config: Can't open logfile '/var/log/cluster/corosync.log' for reason: Permission denied (13).#012.
Apr 26 15:40:35 taft-01 corosync[2102]: [MAIN ] Corosync Cluster Engine ('1.4.1'): started and ready to provide service.
Apr 26 15:40:35 taft-01 corosync[2102]: [MAIN ] Corosync built-in features: nss dbus rdma snmp
Apr 26 15:40:35 taft-01 corosync[2102]: [MAIN ] Successfully read config from /etc/cluster/cluster.conf
Apr 26 15:40:35 taft-01 corosync[2102]: [MAIN ] Successfully parsed cman config
Apr 26 15:40:35 taft-01 corosync[2102]: [MAIN ] parse error in config: Can't open logfile '/var/log/cluster/corosync.log' for reason: Permission denied (13).#012.
Apr 26 15:40:35 taft-01 corosync[2102]: [MAIN ] Corosync Cluster Engine exiting with status 8 at main.c:1686.

type=AVC msg=audit(1335472835.211:29): avc: denied { read } for pid=2102 comm="corosync" name="corosync.log" dev=dm-0 ino=661435 scontext=unconfined_u:system_r:corosync_t:s0 tcontext=system_u:object_r:var_log_t:s0 tclass=file
type=AVC msg=audit(1335382763.814:23): avc: denied { read } for pid=7607 comm="corosync" name="corosync.log" dev=dm-0 ino=661435 scontext=unconfined_u:system_r:corosync_t:s0 tcontext=system_u:object_r:var_log_t:s0 tclass=file
type=AVC msg=audit(1334853491.795:43): avc: denied { connectto } for pid=2923 comm="fence_tool" path=0066656E6365645F736F636B scontext=unconfined_u:system_r:fenced_t:s0 tcontext=system_u:system_r:inetd_child_t:s0-s0:c0.c1023 tclass=unix_stream_socket
Executing the cmd listed in comment #7 appears to fix the issue, so what does
that mean for automated test scripts?

[root@taft-01 ~]# restorecon -R -v /var/log
restorecon reset /var/log/yum.log context system_u:object_r:var_log_t:s0->system_u:object_r:rpm_log_t:s0
restorecon reset /var/log/cluster/gfs_controld.log context system_u:object_r:var_log_t:s0->system_u:object_r:gfs_controld_var_log_t:s0
restorecon reset /var/log/cluster/corosync.log context system_u:object_r:var_log_t:s0->system_u:object_r:corosync_var_log_t:s0
restorecon reset /var/log/cluster/dlm_controld.log context system_u:object_r:var_log_t:s0->system_u:object_r:dlm_controld_var_log_t:s0
restorecon reset /var/log/cluster/fenced.log context system_u:object_r:var_log_t:s0->system_u:object_r:fenced_var_log_t:s0
[root@taft-01 ~]# service cman start
Starting cluster:
   Checking if cluster has been disabled at boot...        [  OK  ]
   Checking Network Manager...                             [  OK  ]
   Global setup...                                         [  OK  ]
   Loading kernel modules...                               [  OK  ]
   Mounting configfs...                                    [  OK  ]
   Starting cman...                                        [  OK  ]
   Waiting for quorum...                                   [  OK  ]
   Starting fenced...                                      [  OK  ]
   Starting dlm_controld...                                [  OK  ]
   Starting gfs_controld...                                [  OK  ]
   Unfencing self...                                       [  OK  ]
   Joining fence domain...                                 [  OK  ]
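If the harness can't guarantee label hygiene, one defensive option is to
restore contexts in the test setup phase before each start (a sketch; the
exact placement and the path are assumptions about the harness):

# In the test setup, before bringing the cluster up:
restorecon -R /var/log/cluster
service cman start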
Is the automated test creating these directories? Who is creating them with the wrong label?
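If a setup script does pre-create them, re-applying the default contexts
right after creation would avoid the mislabel (a sketch, assuming the default
paths):

mkdir -p /var/log/cluster
touch /var/log/cluster/corosync.log
# Re-apply the policy's default contexts to whatever was just created:
restorecon -R -v /var/log/cluster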
This request was not resolved in time for the current release. Red Hat invites you to ask your support representative to propose this request, if still desired, for consideration in the next release of Red Hat Enterprise Linux.
Can we still reproduce this issue? If yes, please reopen the bug. Thank you.