Bug 1619672

Summary: After upgrade to OCP 3.9, existing OpenShift nodes do not have the SELinux boolean container_manage_cgroup enabled, but new nodes added to the cluster have it enabled
Product: OpenShift Container Platform Reporter: Sylvain Chen <sychen>
Component: Cluster Version Operator Assignee: Michael Gugino <mgugino>
Status: CLOSED ERRATA QA Contact: Weihua Meng <wmeng>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 3.9.0 CC: ansverma, aos-bugs, cshereme, jokerman, mgugino, mmccomas, scuppett, sdodson, smunilla, sychen
Target Milestone: --- Keywords: Reopened
Target Release: 3.9.z   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-12-13 19:27:05 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Sylvain Chen 2018-08-21 12:46:08 UTC
Description of problem:
After upgrading from 3.7 to 3.9, the SELinux boolean container_manage_cgroup is not enabled on the existing nodes. However, when we scale up the cluster, the openshift node playbook enables this boolean by default on the new nodes.

Version-Release number of selected component (if applicable):
3.9.40

How reproducible:
On RHEL 7.4/7.5

Steps to Reproduce:
0. Check the SELinux boolean container_manage_cgroup; by default it should not be enabled on any of the nodes
1. Upgrade cluster from OCP 3.7 to 3.9
2. Scale up cluster
3. Check SELinux boolean container_manage_cgroup
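
Steps 0 and 3 can be checked across the whole cluster with a small script. The following is a minimal sketch, not part of openshift-ansible: the hostnames in NODES are placeholders, passwordless ssh to each node is assumed, and the helper names collect_states and states_consistent are invented for illustration.

```shell
#!/bin/sh
# NODES is a hypothetical, space-separated list of node hostnames.
NODES="${NODES:-node1.example.com node2.example.com}"

# Print "<node> <state>" for each node. getsebool prints a line such as
# "container_manage_cgroup --> off", so the state is the third field.
collect_states() {
    for node in $NODES; do
        state=$(ssh -n "$node" getsebool container_manage_cgroup | awk '{print $3}')
        printf '%s %s\n' "$node" "$state"
    done
}

# Read "<node> <state>" lines on stdin; succeed only if every node
# reports the same state.
states_consistent() {
    awk '{ seen[$2] = 1 } END { n = 0; for (s in seen) n++; exit (n > 1) }'
}
```

For example, `collect_states | states_consistent || echo "container_manage_cgroup differs between nodes"` would flag the discrepancy described in this report.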

Actual results:
Existing OpenShift nodes do not have the SELinux boolean container_manage_cgroup enabled.
New OpenShift nodes have this SELinux boolean enabled by default.

Expected results:
Remove discrepancies between the nodes.

NB: Users should be able to choose whether or not to enable this boolean in the ansible inventory host file, since not all users run systemd containers.

Comment 1 Scott Dodson 2018-08-21 14:52:44 UTC
Please manually set the boolean on existing nodes as a workaround.
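
The manual workaround could look roughly like the sketch below, run as root on each existing node. It is an illustration rather than the installer's own logic: the helper names sebool_state and ensure_sebool_on are invented here, and only the standard getsebool and setsebool commands are assumed to be available.

```shell
#!/bin/sh
BOOL=container_manage_cgroup

# Extract the state ("on" or "off") from a getsebool output line such as
# "container_manage_cgroup --> off".
sebool_state() {
    printf '%s\n' "$1" | awk '{print $3}'
}

# Enable the boolean only if it is currently off. The -P flag writes the
# change to the policy so that it persists across reboots.
ensure_sebool_on() {
    state=$(sebool_state "$(getsebool "$BOOL")")
    if [ "$state" = "off" ]; then
        setsebool -P "$BOOL" on
    fi
}
```

Calling `ensure_sebool_on` and then `getsebool container_manage_cgroup` should report the boolean as on, matching the state of freshly installed nodes.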

Comment 2 Michael Gugino 2018-08-22 15:02:06 UTC
(In reply to Scott Dodson from comment #1)
> Please manually set the boolean on existing nodes as a workaround.

Why wouldn't we fix this?

Comment 3 Anshul Verma 2018-08-23 10:44:46 UTC
Hello Team,

We need to change the upgrade playbook to enable the "container_manage_cgroup" boolean.

Also, we should add a note to the documentation stating that this bug currently exists.

Regards,
Anshul Verma

Comment 4 Scott Dodson 2018-08-23 12:20:48 UTC
(In reply to Michael Gugino from comment #2)
> (In reply to Scott Dodson from comment #1)
> > Please manually set the boolean on existing nodes as a workaround.
> 
> Why wouldn't we fix this?

Because we didn't break it and the problem can be introduced entirely outside of the installer. You `yum upgrade` your selinux policy and now your cluster is broken without any involvement of openshift-ansible.

If you have time to fix it in the upgrade go for it, please make sure it's addressed in 3.10 too.

Comment 6 Michael Gugino 2018-08-23 17:22:23 UTC
(In reply to Scott Dodson from comment #4)
> (In reply to Michael Gugino from comment #2)
> > (In reply to Scott Dodson from comment #1)
> > > Please manually set the boolean on existing nodes as a workaround.
> > 
> > Why wouldn't we fix this?
> 
> Because we didn't break it and the problem can be introduced entirely
> outside of the installer. You `yum upgrade` your selinux policy and now your
> cluster is broken without any involvement of openshift-ansible.
> 
> If you have time to fix it in the upgrade go for it, please make sure it's
> addressed in 3.10 too.

Yeah, we're in a tough spot.  This seems like one of those problems that we have to be quite reactive to as it's certainly nothing the users are doing to break themselves other than properly patching their hosts (which should be encouraged).

I will try to take this on.

Comment 7 Michael Gugino 2018-08-29 18:42:47 UTC
PR Created in master: https://github.com/openshift/openshift-ansible/pull/9824

Comment 8 Michael Gugino 2018-09-24 21:19:31 UTC
3.9 merged: https://github.com/openshift/openshift-ansible/pull/9832

Comment 9 Scott Dodson 2018-09-25 14:22:33 UTC
In openshift-ansible-3.9.42-1 and later

Comment 10 Weihua Meng 2018-10-16 08:40:07 UTC
fixed.
openshift-ansible-3.9.47-1.git.0.8180c87.el7.noarch

before upgrade
atomic-openshift version: v3.7.68
# getsebool -a | grep container_manage_cgroup
container_manage_cgroup --> off

after upgrade to 3.9
openshift v3.9.47
# getsebool container_manage_cgroup
container_manage_cgroup --> on
This value is consistent with v3.9 fresh install now.

Comment 16 errata-xmlrpc 2018-12-13 19:27:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3748