1619672 – After upgrade to OCP 3.9, existing Openshift nodes do not have the SELinux boolean container_manage_cgroup enabled, but new nodes added to the cluster have it enabled

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Cluster Version Operator
Sub Component:
Version:	3.9.0
Hardware:	All
OS:	Linux
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	3.9.z
Assignee:	Michael Gugino
QA Contact:	Weihua Meng
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2018-08-21 12:46 UTC by Sylvain Chen
Modified:	2022-03-13 15:26 UTC
CC List:	10 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2018-12-13 19:27:05 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2018:3748	0	None	None	None	2018-12-13 19:27:21 UTC

Description Sylvain Chen 2018-08-21 12:46:08 UTC

Description of problem:
After upgrading from OCP 3.7 to 3.9, the SELinux boolean container_manage_cgroup is not enabled on the existing nodes. However, when the cluster is scaled up, the openshift node playbook enables this boolean by default on the new nodes.

Version-Release number of selected component (if applicable):
3.9.40

How reproducible:
On RHEL 7.4/7.5

Steps to Reproduce:
0. Check the SELinux boolean container_manage_cgroup on all nodes; it should be off by default
1. Upgrade cluster from OCP 3.7 to 3.9
2. Scale up cluster
3. Check SELinux boolean container_manage_cgroup
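
The check in steps 0 and 3 could be scripted roughly as follows. This is a minimal sketch, not part of the original report: it assumes getsebool prints one line in the form "container_manage_cgroup --> off" (as shown in the verification comment), and the function names sebool_state and check_local are hypothetical.

```shell
# Extract the state ("on" or "off") from one line of getsebool output,
# for example "container_manage_cgroup --> off". The state is the last
# whitespace-separated field on the line.
sebool_state() {
    printf '%s\n' "$1" | awk '{ print $NF }'
}

# Hypothetical helper: print the boolean's state on the local node.
# Requires the getsebool command, so it only works on an SELinux host.
check_local() {
    sebool_state "$(getsebool container_manage_cgroup)"
}
```

Running check_local on each node before the upgrade and again after the scale-up should show which nodes differ.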

Actual results:
Existing OpenShift nodes do not have the SELinux boolean container_manage_cgroup enabled.
New OpenShift nodes have this SELinux boolean enabled by default.

Expected results:
The boolean should have the same value on existing nodes and on newly added nodes.

NB: Users should be able to choose whether or not to enable this boolean through the Ansible inventory hosts file, since not all users run systemd containers.

Comment 1 Scott Dodson 2018-08-21 14:52:44 UTC

Please manually set the boolean on existing nodes as a workaround.
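
One way to apply this workaround on an existing node is sketched below. It assumes root access and uses setsebool with -P so the value persists across reboots. The function name enable_bool and the SEBOOL_DRY_RUN variable are hypothetical, introduced here only so the command can be previewed before it is run.

```shell
# Enable an SELinux boolean persistently on the local node.
# With SEBOOL_DRY_RUN=1 (a hypothetical switch for this sketch) the
# command is printed instead of executed.
enable_bool() {
    if [ "${SEBOOL_DRY_RUN:-0}" = "1" ]; then
        echo "setsebool -P $1 on"
    else
        # -P writes the value to the policy store so it survives reboots.
        setsebool -P "$1" on
    fi
}

# Example call, to be run as root on each existing node:
#   enable_bool container_manage_cgroup
```

Afterwards, getsebool container_manage_cgroup should report the boolean as on.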

Comment 2 Michael Gugino 2018-08-22 15:02:06 UTC

(In reply to Scott Dodson from comment #1)
> Please manually set the boolean on existing nodes as a workaround.

Why wouldn't we fix this?

Comment 3 Anshul Verma 2018-08-23 10:44:46 UTC

Hello Team,

We need to change the upgrade playbook to enable the "container_manage_cgroup" boolean.

Also, we should add a note to the documentation describing this known issue.

Regards,
Anshul Verma

Comment 4 Scott Dodson 2018-08-23 12:20:48 UTC

(In reply to Michael Gugino from comment #2)
> (In reply to Scott Dodson from comment #1)
> > Please manually set the boolean on existing nodes as a workaround.
> 
> Why wouldn't we fix this?

Because we didn't break it and the problem can be introduced entirely outside of the installer. You `yum upgrade` your selinux policy and now your cluster is broken without any involvement of openshift-ansible.

If you have time to fix it in the upgrade go for it, please make sure it's addressed in 3.10 too.

Comment 6 Michael Gugino 2018-08-23 17:22:23 UTC

(In reply to Scott Dodson from comment #4)
> (In reply to Michael Gugino from comment #2)
> > (In reply to Scott Dodson from comment #1)
> > > Please manually set the boolean on existing nodes as a workaround.
> > 
> > Why wouldn't we fix this?
> 
> Because we didn't break it and the problem can be introduced entirely
> outside of the installer. You `yum upgrade` your selinux policy and now your
> cluster is broken without any involvement of openshift-ansible.
> 
> If you have time to fix it in the upgrade go for it, please make sure it's
> addressed in 3.10 too.

Yeah, we're in a tough spot.  This seems like one of those problems that we have to be quite reactive to as it's certainly nothing the users are doing to break themselves other than properly patching their hosts (which should be encouraged).

I will try to take this on.

Comment 7 Michael Gugino 2018-08-29 18:42:47 UTC

PR Created in master: https://github.com/openshift/openshift-ansible/pull/9824

Comment 8 Michael Gugino 2018-09-24 21:19:31 UTC

3.9 merged: https://github.com/openshift/openshift-ansible/pull/9832

Comment 9 Scott Dodson 2018-09-25 14:22:33 UTC

Fixed in openshift-ansible-3.9.42-1 and later.

Comment 10 Weihua Meng 2018-10-16 08:40:07 UTC

Fixed. Verified with:
openshift-ansible-3.9.47-1.git.0.8180c87.el7.noarch

Before upgrade:
atomic-openshift version: v3.7.68
# getsebool -a | grep container_manage_cgroup
container_manage_cgroup --> off

After upgrade to 3.9:
openshift v3.9.47
# getsebool container_manage_cgroup
container_manage_cgroup --> on
This value is now consistent with a fresh v3.9 install.
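
For clusters with many nodes, the same consistency check could be automated along these lines. This is a rough sketch under stated assumptions: the input is one "host state" pair per line, the function name report_off_nodes is hypothetical, and the host names in the commented example are placeholders.

```shell
# Read "host state" pairs on stdin and print every host whose
# container_manage_cgroup state is not "on". The exit status is 0 when
# all hosts report "on" and 1 otherwise.
report_off_nodes() {
    awk '$2 != "on" { print $1; bad = 1 } END { exit bad + 0 }'
}

# Example of feeding it from several nodes over ssh (placeholder hosts):
#   for h in node1.example.com node2.example.com; do
#       state=$(ssh "$h" getsebool container_manage_cgroup | awk '{ print $NF }')
#       printf '%s %s\n' "$h" "$state"
#   done | report_off_nodes
```

An empty result with exit status 0 would indicate that upgraded and freshly installed nodes agree.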

Comment 16 errata-xmlrpc 2018-12-13 19:27:05 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3748
