Description of problem:
CTDB cluster doesn't come to a healthy state when multiple nodes are rebooted one after the other while I/O is running from a Windows client.

1st time:
Out of a 4-node CTDB cluster, when two nodes were rebooted one after the other, each node came back and remained in UNHEALTHY state, and the two other nodes went to BANNED state.

2nd time:
Out of a 4-node CTDB cluster, when two nodes were rebooted one after the other, each node came back and remained in UNHEALTHY state, and the two other nodes went to DISCONNECTED state.

It happens even without running I/O.

Version-Release number of selected component (if applicable):
ctdb2.5-2.5.5-2.el7rhgs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create a CTDB setup
2. Mount the volume using the VIP
3. Start I/O from a Windows client
4. Reboot node 1; check ctdb status
5. Reboot node 3; check ctdb status
6. Wait for both nodes to come up; check ctdb status
7. ctdb status shows the nodes in UNHEALTHY/DISCONNECTED state
8. In one scenario a node goes to BANNED state

Actual results:
CTDB cluster is UNHEALTHY. Nodes go to BANNED/DISCONNECTED state.

Expected results:
Once all the nodes come up, the cluster should be up and all nodes should be in OK state.

Additional info:
When the test was run in SELinux enforcing mode, there were AVCs related to ctdb and iptables:

type=AVC msg=audit(06/30/2015 01:25:33.897:367) : avc: denied { read } for pid=4431 comm=iptables path=/var/lib/ctdb/iptables-ctdb.flock dev="dm-0" ino=67681652 scontext=system_u:system_r:iptables_t:s0 tcontext=system_u:object_r:ctdbd_var_lib_t:s0 tclass=file

Switched SELinux to permissive mode; the cluster still does not come to a healthy state. Will provide the sosreports.
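To make the denial easier to read, here is a minimal sketch (plain POSIX shell, no SELinux tooling required) that pulls the source context, target context, and object class out of the AVC record above. The AVC text is copied verbatim from this report; the variable names are illustrative only:

```shell
# The AVC record from the audit log, verbatim.
avc='type=AVC msg=audit(06/30/2015 01:25:33.897:367) : avc: denied { read } for pid=4431 comm=iptables path=/var/lib/ctdb/iptables-ctdb.flock dev="dm-0" ino=67681652 scontext=system_u:system_r:iptables_t:s0 tcontext=system_u:object_r:ctdbd_var_lib_t:s0 tclass=file'

# Extract the fields that identify the denial: who (scontext),
# on what label (tcontext), and what kind of object (tclass).
src=$(printf '%s' "$avc" | grep -o 'scontext=[^ ]*')
tgt=$(printf '%s' "$avc" | grep -o 'tcontext=[^ ]*')
cls=$(printf '%s' "$avc" | grep -o 'tclass=[^ ]*')

# iptables (running in the iptables_t domain) was denied read on a
# file labeled ctdbd_var_lib_t -- the flock file under /var/lib/ctdb.
echo "$src"
echo "$tgt"
echo "$cls"
```

In other words, the policy shipped at the time had no rule allowing iptables_t to read files labeled ctdbd_var_lib_t, which is exactly the gap tracked in the SELinux BZ referenced below.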
Even with the new build, CTDB 2.5.5-3, the nodes are not coming to a healthy state after reboot. Seeing the following AVC when a system is rebooted and tries to fail back:

type=AVC msg=audit(07/03/2015 01:30:25.839:154) : avc: denied { block_suspend } for pid=31332 comm=smbd capability=block_suspend scontext=system_u:system_r:smbd_t:s0 tcontext=system_u:system_r:smbd_t:s0 tclass=capability2
Worked with smb-dev and the SELinux team to root-cause this; it looks like an SELinux issue. The fix has to come in the next SELinux build for RHEL 7.1. The SELinux BZ for RHEL 7.1 is https://bugzilla.redhat.com/show_bug.cgi?id=1224879
With the policy provided in #C9, all nodes come to the OK state even with multiple reboots of nodes. No AVCs related to iptables, winbind, or ctdb are seen. Please include these policies in the RHEL 7.1 SELinux policy build.
With #C25 in BZ https://bugzilla.redhat.com/show_bug.cgi?id=1224879, all the AVCs are fixed now. Need a RHEL 7 SELinux policy build to verify the bug.
With the SELinux policy build:

selinux-policy-targeted-3.13.1-32.el7.noarch
selinux-policy-3.13.1-32.el7.noarch

I am seeing AVCs that were not seen in the earlier build. Worked with Milos on this and found that the rule

allow ctdbd_t systemd_systemctl_exec_t : file { ioctl read getattr lock execute execute_no_trans open };

is present in the -31.el7 build but is missing from the -32.el7 build. Updated the RHEL policy BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1224879
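Until the rule is restored in the packaged policy, a local policy module carrying just that rule could be built and loaded as a stopgap. This is a sketch only; the module name ctdb_systemctl_local is hypothetical, while the allow rule itself is the one quoted above:

```
# ctdb_systemctl_local.te -- hypothetical local module restoring the
# rule that dropped out between -31.el7 and -32.el7.
policy_module(ctdb_systemctl_local, 1.0)

gen_require(`
	type ctdbd_t, systemd_systemctl_exec_t;
')

allow ctdbd_t systemd_systemctl_exec_t : file { ioctl read getattr lock execute execute_no_trans open };
```

Such a module would typically be compiled with `make -f /usr/share/selinux/devel/Makefile ctdb_systemctl_local.pp` and loaded with `semodule -i ctdb_systemctl_local.pp`; it can be removed with `semodule -r` once a fixed policy build lands.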
That is strange. Lukas, can you check it?
This is very strange. Actually, I'm working on this issue.
commit ce652d6c62c6d38d1dab05b862cecc863075d28c
Author: Lukas Vrabec <lvrabec>
Date:   Wed Jul 15 14:01:16 2015 +0200

    Allow ctdbd_t send signull to samba_unconfined_net_t.

commit 4aea5f1b161c8e711f593cf123de3b155ba71229
Author: Lukas Vrabec <lvrabec>
Date:   Wed Jul 15 14:00:39 2015 +0200

    Add samba_signull_unconfined_net()

commit 645b04ea4006f4f25f606662cdf9b526df7226e5
Author: Lukas Vrabec <lvrabec>
Date:   Wed Jul 15 13:44:41 2015 +0200

    Add samba_signull_winbind()
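The commit messages reference new refpolicy interfaces. As a rough sketch of what such an interface looks like in refpolicy style (the body below is an assumption inferred from the interface name, not the actual commit contents):

```
# Hypothetical sketch of samba_signull_winbind(): allow the caller's
# domain ($1) to send the null signal (signull, used for liveness
# checks) to winbind processes.
interface(`samba_signull_winbind',`
	gen_require(`
		type winbind_t;
	')

	allow $1 winbind_t:process signull;
')
```

ctdbd uses signull to check whether managed daemons are still alive, which is why ctdbd_t needs these interfaces called on its behalf.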
I made a new selinux-policy build with the fixes.
We need a RHEL 7.1.z build for the BZ to be moved to ON_QA. The fix is to be tested with the new selinux-policy-3.13.1-33.el7 build.
With the builds:

selinux-policy-3.13.1-33.el7.noarch
selinux-policy-targeted-3.13.1-33.el7.noarch

no AVCs are seen and all CTDB nodes come to the OK state after rebooting multiple nodes. We still need a 7.1.z build for this bug. Moving it to VERIFIED with this build, which is for 7.2.
Since the problem described in this bug report should be resolved by a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1495.html