Bug 1236980

Summary: [SELinux]: RHEL7.1CTDB node goes to DISCONNECTED/BANNED state when multiple nodes are rebooted
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: surabhi <sbhaloth>
Component: samba
Assignee: Jose A. Rivera <jarrpa>
Status: CLOSED ERRATA
QA Contact: surabhi <sbhaloth>
Severity: urgent
Docs Contact:
Priority: urgent
Version: rhgs-3.1
CC: gdeschner, jherrman, lvrabec, madam, mgrepl, mmalik, nlevinki, nsathyan, pprakash, rcyriac, sbhaloth, vagarwal
Target Milestone: ---
Keywords: Regression
Target Release: RHGS 3.1.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: glusterfs-3.7.1-10, selinux-policy-3.13.1-33.el7
Doc Type: Bug Fix
Doc Text:
After multiple CTDB cluster nodes were rebooted one after another while I/O from a Windows client was in progress, the status of the cluster was incorrectly displayed as UNHEALTHY and the status of the nodes as BANNED or DISCONNECTED. With this update, the related SELinux policy no longer prevents signal transmission between the CTDB cluster and certain Samba processes. As a result, the status of the cluster and the nodes displays properly in the above situation.
Story Points: ---
Clone Of:
: 1241095 (view as bug list)
Environment:
Last Closed: 2015-07-29 05:08:25 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1224879    
Bug Blocks: 1202842, 1212796, 1241095    

Description surabhi 2015-06-30 06:21:24 UTC
Description of problem:

The CTDB cluster does not come back to a healthy state when multiple nodes are rebooted one after the other while I/O is running from a Windows client.

1st time:
**************
Out of a 4-node CTDB cluster, when two nodes are rebooted one after the other, the rebooted nodes come back and remain in UNHEALTHY state, and the two other nodes go to BANNED state.

2nd time:
************
Out of a 4-node CTDB cluster, when two nodes are rebooted one after the other, the rebooted nodes come back and remain in UNHEALTHY state, and the two other nodes go to DISCONNECTED state.

This happens even without running I/O.

Version-Release number of selected component (if applicable):
ctdb2.5-2.5.5-2.el7rhgs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create a CTDB setup.
2. Mount the volume using the VIP.
3. Start I/O from the Windows client.
4. Reboot node 1 and check ctdb status.
5. Reboot node 3 and check ctdb status.
6. Wait for both nodes to come up and check ctdb status (see the command sketch after this list).
7. ctdb status shows the nodes in UNHEALTHY/DISCONNECTED state.
8. In one scenario a node goes to BANNED state.
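
A minimal command sketch of the checks in steps 4-6, run from a surviving cluster node; the node names node1 and node3 are placeholders for the actual cluster members:

  # Step 4: reboot the first node, then check the cluster state
  ssh node1 reboot
  ctdb status
  # Step 5: reboot a second node once node1 is back
  ssh node3 reboot
  ctdb status
  # Step 6: after both nodes are up again, every node should report OK;
  # with this bug the nodes instead stay UNHEALTHY/DISCONNECTED or become BANNED
  ctdb status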

Actual results:
The CTDB cluster is UNHEALTHY.
Nodes go to BANNED/DISCONNECTED state.

Expected results:

Once all the nodes come up, the cluster should be up and all nodes should be in OK state.

Additional info:

When the test was run in SELinux enforcing mode, there were AVC denials related to ctdb and iptables:
type=AVC msg=audit(06/30/2015 01:25:33.897:367) : avc:  denied  { read } for  pid=4431 comm=iptables path=/var/lib/ctdb/iptables-ctdb.flock dev="dm-0" ino=67681652 scontext=system_u:system_r:iptables_t:s0 tcontext=system_u:object_r:ctdbd_var_lib_t:s0 tclass=file 
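
A short sketch, as one way to confirm the file labeling and review the denials on an affected node; the path and types are taken from the AVC above:

  # check the SELinux label on the lock file that iptables was denied access to
  ls -Z /var/lib/ctdb/iptables-ctdb.flock
  # list recent AVC denials with a short explanation for each
  ausearch -m avc -ts recent | audit2why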

Switched SELinux to permissive mode.
The cluster still does not come to a healthy state.

Will provide the sosreports.

Comment 4 surabhi 2015-07-03 08:53:22 UTC
Even with the new build CTDB2.5.5-3, the nodes are not coming to a healthy state after reboot.

Seeing the following AVC denial when a system is rebooted and tries to fail back.
 type=AVC msg=audit(07/03/2015 01:30:25.839:154) : avc:  denied  { block_suspend } for  pid=31332 comm=smbd capability=block_suspend  scontext=system_u:system_r:smbd_t:s0 tcontext=system_u:system_r:smbd_t:s0 tclass=capability2
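
For test setups only, a hedged sketch of a temporary local policy workaround; the module name ctdb_local is arbitrary, and the proper fix is the selinux-policy update tracked below rather than a local module:

  # generate a local policy module from the recorded denials and load it
  ausearch -m avc -ts recent | audit2allow -M ctdb_local
  semodule -i ctdb_local.pp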

Comment 5 surabhi 2015-07-03 08:59:19 UTC
Worked with smb-dev and the SELinux team to root-cause this, and it seems to be an SELinux issue.
The fix has to come in the next SELinux policy build for RHEL 7.1.
The SELinux bz for RHEL7.1 is https://bugzilla.redhat.com/show_bug.cgi?id=1224879

Comment 11 surabhi 2015-07-08 12:33:11 UTC
With the policy provided in #C9, and with multiple reboots of nodes, all nodes come to OK state.

No AVCs related to iptables, winbind, or ctdb are seen.
Please include these policy changes in the RHEL 7.1 selinux-policy build.

Comment 12 surabhi 2015-07-09 07:19:29 UTC
With #C25 in BZ https://bugzilla.redhat.com/show_bug.cgi?id=1224879, all the AVCs are now fixed. A RHEL 7 SELinux policy build is needed to verify the bug.

Comment 13 surabhi 2015-07-15 09:37:00 UTC
With SELinux policy build :

selinux-policy-targeted-3.13.1-32.el7.noarch
selinux-policy-3.13.1-32.el7.noarch

I am seeing the following AVCs, which were not seen in the earlier build.
Worked with Milos on this and found that the rule

  allow ctdbd_t systemd_systemctl_exec_t : file { ioctl read getattr lock execute execute_no_trans open } ;

is present in the -31.el7 build but is missing from the -32.el7 build.
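
A sketch of how such a difference can be checked against the loaded policy on each build, assuming the setools utilities (sesearch) are installed; the rule should be reported on the -31.el7 policy and be absent on -32.el7:

  # query the loaded policy for the allow rule in question
  sesearch --allow -s ctdbd_t -t systemd_systemctl_exec_t -c file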

Updated RHEL policy BZ : https://bugzilla.redhat.com/show_bug.cgi?id=1224879

Comment 14 Miroslav Grepl 2015-07-15 11:02:04 UTC
It is strange.

Lukas,
can you check it?

Comment 15 Lukas Vrabec 2015-07-15 11:04:50 UTC
This is very strange. 
Actually, I'm working on this issue.

Comment 17 Lukas Vrabec 2015-07-15 13:17:42 UTC
commit ce652d6c62c6d38d1dab05b862cecc863075d28c
Author: Lukas Vrabec <lvrabec>
Date:   Wed Jul 15 14:01:16 2015 +0200

    Allow ctdbd_t send signull to samba_unconfined_net_t.

commit 4aea5f1b161c8e711f593cf123de3b155ba71229
Author: Lukas Vrabec <lvrabec>
Date:   Wed Jul 15 14:00:39 2015 +0200

    Add samba_signull_unconfined_net()

commit 645b04ea4006f4f25f606662cdf9b526df7226e5
Author: Lukas Vrabec <lvrabec>
Date:   Wed Jul 15 13:44:41 2015 +0200

    Add samba_signull_winbind()
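
A sketch of how the new rules can be checked once a build containing these commits is installed; samba_unconfined_net_t is named in the commit above, while winbind_t is assumed here to be the winbind domain type:

  # verify that ctdbd_t may now send signull to the Samba domains
  sesearch --allow -s ctdbd_t -t samba_unconfined_net_t -c process -p signull
  sesearch --allow -s ctdbd_t -t winbind_t -c process -p signull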

Comment 18 Lukas Vrabec 2015-07-15 14:48:15 UTC
I made a new selinux-policy build with the fixes.

Comment 19 Rejy M Cyriac 2015-07-15 15:18:07 UTC
We need a RHEL 7.1.z build for the BZ to be moved to ON_QA

The fix is to be tested with the new selinux-policy-3.13.1-33.el7 build

Comment 26 surabhi 2015-07-16 06:35:15 UTC
With the builds selinux-policy-3.13.1-33.el7.noarch and selinux-policy-targeted-3.13.1-33.el7.noarch, no AVCs are seen and all ctdb nodes come to OK state after rebooting multiple nodes.

A 7.1.z build is needed for this bug.
Moving it to verified with this build, which is for 7.2.
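
A minimal verification sketch matching the checks above, run on each node after installing the new policy build:

  # confirm the installed policy builds and the SELinux mode
  rpm -q selinux-policy selinux-policy-targeted
  getenforce
  # after the node reboots, there should be no new AVC denials...
  ausearch -m avc -ts today
  # ...and all nodes should report OK
  ctdb status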

Comment 27 errata-xmlrpc 2015-07-29 05:08:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1495.html