Bug 693725

Summary: selinux policies do not allow cluster to run
Product: Red Hat Enterprise Linux 6
Version: 6.1
Component: cluster
Reporter: Miroslav Grepl <mgrepl>
Assignee: Fabio Massimo Di Nitto <fdinitto>
QA Contact: Cluster QE <mspqa-list>
Status: CLOSED NOTABUG
Severity: low
Priority: low
CC: ccaulfie, cluster-maint, cmarthal, dwalsh, lhh, nstraz, rpeterso, teigland
Target Milestone: rc
Keywords: TestBlocker
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Clone Of: 692571
Bug Depends On: 692571
Last Closed: 2011-04-05 16:35:12 UTC

Description Miroslav Grepl 2011-04-05 12:40:25 UTC
+++ This bug was initially created as a clone of Bug #692571 +++

Description of problem:

[root@grant-01 ~]# service cman start
Starting cluster:
   Checking if cluster has been disabled at boot...        [  OK  ]
   Checking Network Manager...                             [  OK  ]
   Global setup...                                         [  OK  ]
   Loading kernel modules...                               [  OK  ]
   Mounting configfs...                                    [  OK  ]
   Starting cman... corosync died: Could not read cluster configuration Check cluster logs for details
                                                           [FAILED]


Mar 31 10:19:03 grant-01 corosync[2034]: parse error in config: parse error in config: .
Mar 31 10:19:03 grant-01 corosync[2034]:   [MAIN  ] Corosync Cluster Engine ('1.2.3'): started and ready to provide service.
Mar 31 10:19:03 grant-01 corosync[2034]:   [MAIN  ] Corosync built-in features: nss dbus rdma snmp
Mar 31 10:19:03 grant-01 corosync[2034]:   [MAIN  ] Successfully read config from /etc/cluster/cluster.conf
Mar 31 10:19:03 grant-01 corosync[2034]:   [MAIN  ] Successfully parsed cman config
Mar 31 10:19:03 grant-01 corosync[2034]:   [MAIN  ] parse error in config: parse error in config: .
Mar 31 10:19:03 grant-01 corosync[2034]:   [MAIN  ] Corosync Cluster Engine exiting with status 8 at main.c:1651.



[root@grant-01 ~]# xmllint --relaxng /usr/share/cluster/cluster.rng /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster config_version="2" name="GRANT">
  <dlm log_debug="1"/>
  <cman>
                </cman>
  <fence_daemon clean_start="0" post_join_delay="30"/>
  <clusternodes>
    <clusternode name="grant-01" nodeid="1">
      <fence>
        <method name="IPMI">
          <device name="grant-01-ipmi"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="grant-02" nodeid="2">
      <fence>
        <method name="IPMI">
          <device name="grant-02-ipmi"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="grant-03" nodeid="3">
      <fence>
        <method name="APCEE">
          <device name="apc1" port="5" switch="1"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_ipmilan" ipaddr="grant-01-ipmi" login="root" name="grant-01-ipmi" passwd="password"/>
    <fencedevice agent="fence_ipmilan" ipaddr="grant-02-ipmi" login="root" name="grant-02-ipmi" passwd="password"/>
    <fencedevice agent="fence_apc" ipaddr="link-apc" login="apc" name="apc1" passwd="apc"/>
  </fencedevices>
</cluster>
/etc/cluster/cluster.conf validates
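
For reference, RHEL 6 also ships a wrapper around this schema check; assuming the cman package is installed, the same validation can be run with:

# ccs_config_validate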



Version-Release number of selected component (if applicable):
[root@grant-01 ~]# uname -ar
Linux grant-01 2.6.32-125.el6.x86_64 #1 SMP Mon Mar 21 10:06:08 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
[root@grant-01 ~]# rpm -q corosync
corosync-1.2.3-31.el6.x86_64



How reproducible:
Often

--- Additional comment from cmarthal on 2011-03-31 12:10:14 EDT ---

Looks like it's dying during the initial startup attempt, and the subsequent parse errors may just be symptoms of it already being dead.

Mar 31 11:04:56 grant-01 qarshd[2152]: Running cmdline: service cman start 2>&1
Mar 31 11:04:56 grant-01 corosync[2190]:   [MAIN  ] Corosync Cluster Engine ('1.2.3'): started and ready to provide service.
Mar 31 11:04:56 grant-01 corosync[2190]:   [MAIN  ] Corosync built-in features: nss dbus rdma snmp
Mar 31 11:04:56 grant-01 corosync[2190]:   [MAIN  ] Successfully read config from /etc/cluster/cluster.conf
Mar 31 11:04:56 grant-01 corosync[2190]:   [MAIN  ] Successfully parsed cman config
Mar 31 11:04:56 grant-01 corosync[2190]:   [TOTEM ] Initializing transport (UDP/IP Multicast).
Mar 31 11:04:56 grant-01 corosync[2190]:   [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Mar 31 11:04:56 grant-01 corosync[2190]:   [TOTEM ] The network interface [10.15.89.151] is now up.
Mar 31 11:04:57 grant-01 corosync[2190]:   [QUORUM] Using quorum provider quorum_cman
Mar 31 11:04:57 grant-01 corosync[2190]:   [SERV  ] Service engine loaded: corosync cluster quorum service v0.1
Mar 31 11:04:57 grant-01 corosync[2190]:   [CMAN  ] CMAN 3.0.12 (built Mar 22 2011 05:32:49) started
Mar 31 11:04:57 grant-01 corosync[2190]:   [SERV  ] Service engine loaded: corosync CMAN membership service 2.90
Mar 31 11:04:57 grant-01 corosync[2190]:   [SERV  ] Service engine loaded: openais checkpoint service B.01.01
Mar 31 11:04:57 grant-01 corosync[2190]:   [SERV  ] Service engine loaded: corosync extended virtual synchrony service
Mar 31 11:04:57 grant-01 corosync[2190]:   [SERV  ] Service engine loaded: corosync configuration service
Mar 31 11:04:57 grant-01 corosync[2190]:   [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01
Mar 31 11:04:57 grant-01 corosync[2190]:   [SERV  ] Service engine loaded: corosync cluster config database access v1.01
Mar 31 11:04:57 grant-01 corosync[2190]:   [SERV  ] Service engine loaded: corosync profile loading service
Mar 31 11:04:57 grant-01 corosync[2190]:   [QUORUM] Using quorum provider quorum_cman
Mar 31 11:04:57 grant-01 corosync[2190]:   [SERV  ] Service engine loaded: corosync cluster quorum service v0.1
Mar 31 11:04:57 grant-01 corosync[2190]:   [MAIN  ] Compatibility mode set to whitetank.  Using V1 and V2 of the synchronization engine.
Mar 31 11:04:57 grant-01 corosync[2190]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Mar 31 11:04:57 grant-01 corosync[2190]:   [QUORUM] Members[1]: 1
Mar 31 11:04:57 grant-01 corosync[2190]:   [QUORUM] Members[1]: 1
Mar 31 11:04:57 grant-01 corosync[2190]:   [CPG   ] downlist received left_list: 0
Mar 31 11:04:57 grant-01 corosync[2190]:   [CPG   ] chosen downlist from node r(0) ip(10.15.89.151)
Mar 31 11:04:57 grant-01 corosync[2190]:   [MAIN  ] Completed service synchronization, ready to provide service.
Mar 31 11:04:57 grant-01 corosync[2190]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Mar 31 11:04:57 grant-01 corosync[2190]:   [CMAN  ] quorum regained, resuming activity
Mar 31 11:04:57 grant-01 corosync[2190]:   [QUORUM] This node is within the primary component and will provide service.
Mar 31 11:04:57 grant-01 corosync[2190]:   [QUORUM] Members[2]: 1 2
Mar 31 11:04:57 grant-01 corosync[2190]:   [QUORUM] Members[2]: 1 2
Mar 31 11:04:57 grant-01 corosync[2190]:   [CPG   ] downlist received left_list: 0
Mar 31 11:04:57 grant-01 corosync[2190]:   [CPG   ] downlist received left_list: 0
Mar 31 11:04:57 grant-01 corosync[2190]:   [CPG   ] chosen downlist from node r(0) ip(10.15.89.151)
Mar 31 11:04:57 grant-01 corosync[2190]:   [MAIN  ] Completed service synchronization, ready to provide service.
Mar 31 11:04:59 grant-01 corosync[2190]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Mar 31 11:04:59 grant-01 corosync[2190]:   [QUORUM] Members[3]: 1 2 3
Mar 31 11:04:59 grant-01 corosync[2190]:   [QUORUM] Members[3]: 1 2 3
Mar 31 11:04:59 grant-01 corosync[2190]:   [TOTEM ] Retransmit List: 16
Mar 31 11:04:59 grant-01 corosync[2190]:   [CPG   ] downlist received left_list: 0
Mar 31 11:04:59 grant-01 corosync[2190]:   [CPG   ] downlist received left_list: 0
Mar 31 11:04:59 grant-01 corosync[2190]:   [CPG   ] downlist received left_list: 0
Mar 31 11:04:59 grant-01 corosync[2190]:   [CPG   ] chosen downlist from node r(0) ip(10.15.89.151)
Mar 31 11:04:59 grant-01 corosync[2190]:   [TOTEM ] Retransmit List: 1f
Mar 31 11:04:59 grant-01 corosync[2190]:   [MAIN  ] Completed service synchronization, ready to provide service.
Mar 31 11:05:00 grant-01 fenced[2246]: fenced 3.0.12 started
Mar 31 11:05:01 grant-01 dlm_controld[2265]: dlm_controld 3.0.12 started
Mar 31 11:05:03 grant-01 gfs_controld[2323]: gfs_controld 3.0.12 started
Mar 31 11:05:03 grant-01 abrt[2342]: saved core dump of pid 2190 (/usr/sbin/corosync) to /var/spool/abrt/ccpp-1301587503-2190.new/coredump (51826688 bytes)
Mar 31 11:05:03 grant-01 abrtd: Directory 'ccpp-1301587503-2190' creation detected
Mar 31 11:05:03 grant-01 fenced[2246]: cluster is down, exiting
Mar 31 11:05:03 grant-01 dlm_controld[2265]: cluster is down, exiting
Mar 31 11:05:03 grant-01 dlm_controld[2265]: daemon cpg_dispatch error 2
Mar 31 11:05:03 grant-01 abrtd: Registered Database plugin 'SQLite3'
Mar 31 11:05:03 grant-01 xinetd[1785]: EXIT: qarsh status=0 pid=2152 duration=7(sec)
Mar 31 11:05:04 grant-01 abrtd: New crash /var/spool/abrt/ccpp-1301587503-2190, processing
Mar 31 11:05:05 grant-01 gfs_controld[2323]: daemon cpg_initialize error 6
Mar 31 11:05:05 grant-01 kernel: dlm: closing connection to node 3
Mar 31 11:05:05 grant-01 kernel: dlm: closing connection to node 2
Mar 31 11:05:05 grant-01 kernel: dlm: closing connection to node 1

--- Additional comment from cmarthal on 2011-03-31 12:14:25 EDT ---

FYI - comment #1 happened after upgrading to corosync-1.2.3-33; comment #0 occurred with corosync-1.2.3-31.

--- Additional comment from fdinitto on 2011-03-31 13:45:08 EDT ---

There have been no changes on our side, and disabling SELinux makes the world work again.

Corey can provide a collection of audit.log files from the nodes.

It appears that the issue is similar to one filed recently in Fedora regarding /dev/dlm, but it needs to be double-checked because I am no SELinux expert.

Once dlm_controld starts, everything goes south (at least based on the cman init startup sequence).

--- Additional comment from dwalsh on 2011-04-01 13:13:22 EDT ---

Could you attach the AVC messages?

I think this is caused by the /dev/dlm labels being wrong.

If you change the context, I think this will work:

# semanage fcontext -a -t dlm_control_device_t '/dev/dlm.*'
# restorecon -R -v /dev/dlm*
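
With the rule above applied, the change can be verified afterwards; the devices should show dlm_control_device_t:

# ls -Z /dev/dlm*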

Miroslav, this is the default labeling we now have in Fedora; can you put this into RHEL 6.1?

--- Additional comment from dwalsh on 2011-04-01 13:15:07 EDT ---

Actually, this fix looks like it is in selinux-policy-3.7.19-80.el6.

A preview is available at http://people.redhat.com/dwalsh/SELinux/RHEL6

--- Additional comment from cmarthal on 2011-04-01 15:12:36 EDT ---

type=AVC msg=audit(1301684886.261:50088): avc:  denied  { write } for  pid=6538 comm="cman_tool" name="cman_client" dev=dm-0 ino=1178785 scontext=unconfined_u:system_r:corosync_t:s0 tcontext=system_u:object_r:var_run_t:s0 tclass=sock_file
type=SYSCALL msg=audit(1301684886.261:50088): arch=c000003e syscall=42 success=no exit=-13 a0=3 a1=7fff1eb64a40 a2=6e a3=7fff1eb647c0 items=0 ppid=6518 pid=6538 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0 ses=2 comm="cman_tool" exe="/usr/sbin/cman_tool" subj=unconfined_u:system_r:corosync_t:s0 key=(null)
type=AVC msg=audit(1301684886.357:50089): avc:  denied  { write } for  pid=6541 comm="cman_tool" name="cman_admin" dev=dm-0 ino=1178786 scontext=unconfined_u:system_r:corosync_t:s0 tcontext=system_u:object_r:var_run_t:s0 tclass=sock_file
type=SYSCALL msg=audit(1301684886.357:50089): arch=c000003e syscall=42 success=no exit=-13 a0=3 a1=7fffa310cb00 a2=6e a3=7fffa310c880 items=0 ppid=6518 pid=6541 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0 ses=2 comm="cman_tool" exe="/usr/sbin/cman_tool" subj=unconfined_u:system_r:corosync_t:s0 key=(null)
type=AVC msg=audit(1301684886.471:50090): avc:  denied  { read } for  pid=6552 comm="corosync" name="corosync.log" dev=dm-0 ino=1178782 scontext=unconfined_u:system_r:corosync_t:s0 tcontext=system_u:object_r:var_log_t:s0 tclass=file
type=SYSCALL msg=audit(1301684886.471:50090): arch=c000003e syscall=2 success=no exit=-13 a0=18b9210 a1=442 a2=1b6 a3=0 items=0 ppid=6541 pid=6552 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=2 comm="corosync" exe="/usr/sbin/corosync" subj=unconfined_u:system_r:corosync_t:s0 key=(null)
type=AVC msg=audit(1301684889.430:50091): avc:  denied  { write } for  pid=6570 comm="cman_tool" name="cman_client" dev=dm-0 ino=1178785 scontext=unconfined_u:system_r:corosync_t:s0 tcontext=system_u:object_r:var_run_t:s0 tclass=sock_file
type=SYSCALL msg=audit(1301684889.430:50091): arch=c000003e syscall=42 success=no exit=-13 a0=3 a1=7fff7e39a850 a2=6e a3=7fff7e39a5d0 items=0 ppid=6518 pid=6570 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0 ses=2 comm="cman_tool" exe="/usr/sbin/cman_tool" subj=unconfined_u:system_r:corosync_t:s0 key=(null)
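
As a general note, denials like these can be turned into a candidate local policy module for testing (the standard audit2allow workflow; the module name below is illustrative only):

# grep cman_tool /var/log/audit/audit.log | audit2allow -M cmanlocal
# semodule -i cmanlocal.pp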

--- Additional comment from cmarthal on 2011-04-01 15:14:48 EDT ---

I'm seeing this issue with the latest selinux-policy.

[root@taft-01 ~]# rpm -q selinux-policy
selinux-policy-3.7.19-80.el6.noarch

--- Additional comment from cmarthal on 2011-04-01 15:39:29 EDT ---

How sure are we that this is SELinux? I'm able to reproduce this even in permissive mode.

[root@taft-01 ~]# service cman start
Starting cluster:
   Checking if cluster has been disabled at boot...        [  OK  ]
   Checking Network Manager...                             [  OK  ]
   Global setup...                                         [  OK  ]
   Loading kernel modules...                               [  OK  ]
   Mounting configfs...                                    [  OK  ]
   Starting cman...                                        [  OK  ]
   Waiting for quorum... cman_tool: Cannot open connection to cman, is it
running ?
                                                           [FAILED]
[root@taft-01 ~]# getenforce
Permissive

--- Additional comment from mgrepl on 2011-04-04 03:27:40 EDT ---

Corey,
could you turn on full auditing using 

# echo "-w /etc/shadow -p w" >> /etc/audit/audit.rules
# service auditd restart

and then re-test and attach the AVC messages from permissive mode. I would like to know where the corosync.log, cman_admin, and cman_client objects are located.
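
With full auditing enabled, the denials can be pulled together with their PATH records, which include the object paths; one way to do this (assuming default auditd settings):

# ausearch -m AVC,SYSCALL,PATH -ts recent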

Also, what does the following show?

# ls -lZ /var/run/cman*

--- Additional comment from cmarthal on 2011-04-04 19:14:54 EDT ---

[root@taft-01 ~]# service cman start
Starting cluster:
   Checking if cluster has been disabled at boot...        [  OK  ]
   Checking Network Manager...                             [  OK  ]
   Global setup...                                         [  OK  ]
   Loading kernel modules...                               [  OK  ]
   Mounting configfs...                                    [  OK  ]
   Starting cman...                                        [  OK  ]
   Waiting for quorum...                                   [  OK  ]
   Starting fenced...                                      [  OK  ]
   Starting dlm_controld...                                [  OK  ]
   Starting gfs_controld...                                [  OK  ]
   Unfencing self...                                       [  OK  ]
   Joining fence domain...                                 [  OK  ]

[root@taft-01 ~]# ls -lZ /var/run/cman*
srw-------. root root unconfined_u:object_r:corosync_var_run_t:s0 /var/run/cman_admin
srw-rw----. root root unconfined_u:object_r:corosync_var_run_t:s0 /var/run/cman_client
-rw-r--r--. root root unconfined_u:object_r:initrc_var_run_t:s0 /var/run/cman.pid

type=AVC msg=audit(1301958789.590:45): avc:  denied  { read } for  pid=2119 comm="fenced" name="fenced.log" dev=dm-0 ino=921008 scontext=unconfined_u:system_r:fenced_t:s0 tcontext=system_u:object_r:var_log_t:s0 tclass=file
type=SYSCALL msg=audit(1301958789.590:45): arch=c000003e syscall=2 success=yes exit=4 a0=34fc002dc0 a1=442 a2=1b6 a3=0 items=1 ppid=1 pid=2119 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=2 comm="fenced" exe="/usr/sbin/fenced" subj=unconfined_u:system_r:fenced_t:s0 key=(null)
type=CWD msg=audit(1301958789.590:45):  cwd="/"
type=PATH msg=audit(1301958789.590:45): item=0 name="/var/log/cluster/fenced.log" inode=921008 dev=fd:00 mode=0100666 ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:var_log_t:s0
type=AVC msg=audit(1301958789.745:46): avc:  denied  { read } for  pid=2139 comm="dlm_controld" name="dlm_controld.log" dev=dm-0 ino=921010 scontext=unconfined_u:system_r:dlm_controld_t:s0 tcontext=system_u:object_r:var_log_t:s0 tclass=file
type=SYSCALL msg=audit(1301958789.745:46): arch=c000003e syscall=2 success=yes exit=4 a0=34fc002dc0 a1=442 a2=1b6 a3=0 items=1 ppid=1 pid=2139 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=2 comm="dlm_controld" exe="/usr/sbin/dlm_controld" subj=unconfined_u:system_r:dlm_controld_t:s0 key=(null)
type=CWD msg=audit(1301958789.745:46):  cwd="/"
type=PATH msg=audit(1301958789.745:46): item=0 name="/var/log/cluster/dlm_controld.log" inode=921010 dev=fd:00 mode=0100644 ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:var_log_t:s0
type=AVC msg=audit(1301958790.838:47): avc:  denied  { read } for  pid=2194 comm="gfs_controld" name="gfs_controld.log" dev=dm-0 ino=921011 scontext=unconfined_u:system_r:gfs_controld_t:s0 tcontext=system_u:object_r:var_log_t:s0 tclass=file
type=SYSCALL msg=audit(1301958790.838:47): arch=c000003e syscall=2 success=yes exit=4 a0=34fc002dc0 a1=442 a2=1b6 a3=0 items=1 ppid=1 pid=2194 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=2 comm="gfs_controld" exe="/usr/sbin/gfs_controld" subj=unconfined_u:system_r:gfs_controld_t:s0 key=(null)
type=CWD msg=audit(1301958790.838:47):  cwd="/"
type=PATH msg=audit(1301958790.838:47): item=0 name="/var/log/cluster/gfs_controld.log" inode=921011 dev=fd:00 mode=0100644 ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:var_log_t:s0

--- Additional comment from mgrepl on 2011-04-05 04:37:54 EDT ---

Created attachment 489940 [details]
cman init script patch

--- Additional comment from mgrepl on 2011-04-05 04:39:24 EDT ---

Corey,
your cluster log files are mislabeled. Not sure why. Did you run cluster services directly without using the cman service script?

Execute

# restorecon -R -v /var/log/cluster
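
The -v flag will print anything that gets relabeled; the result can also be confirmed with:

# ls -lZ /var/log/cluster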


The next issue is with the cman init script, which contains

pidof /usr/sbin/corosync > /var/run/cman.pid

This causes "cman.pid" to be labeled as initrc_var_run_t. So I need to add the label to the policy, and the attached patch is needed for the cman init script.

You can test it using

# semanage fcontext -a -t corosync_var_run_t "/var/run/cman.pid"

and apply the patch. Then try to start cluster services again.
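
A sketch of the kind of change the init script patch presumably makes (attachment 489940 itself is not shown here; this only illustrates restoring the correct label after the redirect creates the file):

pidof /usr/sbin/corosync > /var/run/cman.pid
restorecon /var/run/cman.pid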

Comment 2 RHEL Program Management 2011-04-05 12:43:29 UTC
Since RHEL 6.1 External Beta has begun and this bug remains unresolved, it has been rejected, as it is not proposed as an exception or blocker.

Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux.

Comment 3 Fabio Massimo Di Nitto 2011-04-05 13:08:39 UTC
What problem are we trying to solve exactly?

cman.pid has been around since before 6.0; it doesn't trigger any denial/AVC or generate any issues related to the original bug report.

I'd like to understand what problem it is causing before fixing it.

thanks

Comment 4 Corey Marthaler 2011-04-05 16:26:31 UTC
Why are there two bugs opened for this issue now?