Bug 1942725 - [SCC] openshift-apiserver degraded when creating new pod after installing Stackrox which creates a less privileged SCC [4.8]
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: openshift-apiserver
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.8.0
Assignee: Standa Laznicka
QA Contact: Xingxing Xia
Duplicates: 1942552 1942744 (view as bug list)
Depends On:
Blocks: 1955502
Reported: 2021-03-24 19:30 UTC by oarribas
Modified: 2024-06-14 01:00 UTC (History)
11 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Custom SCCs may have higher priority than the ones in the default set, which may lead to these being matched to openshift-apiserver pods and break their ability to write in their root filesystem. Consequence: This might lead to an outage of some of the OpenShift APIs. Fix: Explicitly mention in the openshift-apiserver pods that the root filesystem should be writable. Result: Custom SCCs should not prevent openshift-apiserver pods from running.
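The fix boils down to declaring the writable root filesystem explicitly in the pod template. A minimal sketch of the relevant container securityContext (the surrounding manifest fields are illustrative, not the operator's actual template):

```yaml
# Illustrative fragment only -- the real change lives in the
# cluster-openshift-apiserver-operator's pod template (PR 437 above).
containers:
- name: openshift-apiserver
  securityContext:
    privileged: true
    # Stated explicitly so that restrictive custom SCCs that force a
    # read-only root filesystem can no longer match the apiserver pods.
    readOnlyRootFilesystem: false
```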
Clone Of:
Last Closed: 2021-07-27 22:55:28 UTC
Target Upstream Version:

Attachments

System ID Private Priority Status Summary Last Updated
Github openshift cluster-openshift-apiserver-operator pull 437 0 None open Bug 1942725: explicitly allow apiserver pods to write to their root FS 2021-04-14 13:01:59 UTC
Red Hat Bugzilla 1824800 1 high CLOSED openshift authentication operator is in a crashbackoffloop 2023-10-06 19:41:19 UTC
Red Hat Knowledge Base (Solution) 5911951 0 None None None 2021-04-08 06:56:55 UTC
Red Hat Product Errata RHSA-2021:2438 0 None None None 2021-07-27 22:55:47 UTC

Internal Links: 1960680

Description oarribas 2021-03-24 19:30:00 UTC
Description of problem:

An openshift-apiserver pod enters a CrashLoopBackOff state after being recreated. Stackrox was installed in the cluster one week ago.

openshift-apiserver logs:
2021-03-24T00:00:00.000000000Z Copying system trust bundle
2021-03-24T00:00:00.000000000Z cp: cannot remove '/etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem': Read-only file system

Found a similar Bugzilla bug for the authentication-operator [1].

I have seen that in OCP 4.6, there is no SCC annotation on the openshift-apiserver pods:
$ oc version
Client Version: 4.6.19
Server Version: 4.6.19
Kubernetes Version: v1.19.0+8d12420

$ oc get pods -n openshift-apiserver
NAME                         READY   STATUS    RESTARTS   AGE
apiserver-78f7976f7c-2mvjq   2/2     Running   0          27h
apiserver-78f7976f7c-8wznp   2/2     Running   0          27h
apiserver-78f7976f7c-mmwlg   2/2     Running   0          27h

$ oc get pod apiserver-78f7976f7c-8wznp -n openshift-apiserver -o yaml | grep scc | wc -l
0
But in OCP 4.7, the `node-exporter` SCC is applied to the openshift-apiserver pods:
$ oc version
Client Version: 4.7.3
Server Version: 4.7.3
Kubernetes Version: v1.20.0+bafe72f

$ oc get pods -n openshift-apiserver
NAME                         READY   STATUS    RESTARTS   AGE
apiserver-5d48bfb684-dhzbg   2/2     Running   0          3h15m
apiserver-5d48bfb684-lfd4k   2/2     Running   0          3h14m
apiserver-5d48bfb684-r4h5w   2/2     Running   0          3h12m

$ oc get pod apiserver-5d48bfb684-dhzbg -n openshift-apiserver -o yaml | grep scc
    openshift.io/scc: node-exporter

In the failing pod, the scc is `openshift.io/scc: collector`:
$ oc get pod apiserver-5d48bfb684-lnbnr -n openshift-apiserver -o yaml | grep scc
    openshift.io/scc: collector

Version-Release number of selected component (if applicable):

OCP 4.7

How reproducible:
Always, once the SCC is created and an openshift-apiserver pod is recreated.

Steps to Reproduce:
1. Create the collector scc.
2. Check the scc in the openshift-apiserver pods.
3. Delete one of the openshift-apiserver pods.
4. Check the status and the scc of the new pod.

Actual results:

Pod in CrashLoopBackOff due to the scc change.

Expected results:

No changes in the scc used by OpenShift internal pods.

Additional info:

Other pods may be affected by the same SCC.


Comment 2 Robert Bohne 2021-03-26 11:10:45 UTC
*** Bug 1942744 has been marked as a duplicate of this bug. ***

Comment 5 John Call 2021-04-09 18:02:28 UTC
The resolution described in https://access.redhat.com/solutions/5911951 is only a workaround, not a long-term fix. Deleting the SCC allows the `apiserver` pod to start, but recreating the SCC will prevent future `apiserver` pods from starting. Future pods would be started (and matched to the collector SCC) if a control-plane node failed, if the apiserver operator were upgraded, and so on.

$ oc get pods
NAME                         READY   STATUS             RESTARTS   AGE
apiserver-85d7bdf578-jms2d   2/2     Running            0          3d22h
apiserver-85d7bdf578-ndpf8   2/2     Running            0          3d22h
apiserver-fd5cf6b66-6bsp7    0/2     CrashLoopBackOff   874        36h

$ oc get pods/apiserver-fd5cf6b66-6bsp7 -o  yaml | grep -i scc
    openshift.io/scc: collector

$ oc get pods -o  yaml | grep -i scc
      openshift.io/scc: node-exporter
      openshift.io/scc: node-exporter
      openshift.io/scc: collector

$ oc get scc/collector -o yaml > stackrox_scc_collector.yaml

$ oc delete scc/collector
securitycontextconstraints.security.openshift.io "collector" deleted

$ oc delete pod/apiserver-fd5cf6b66-6bsp7
pod "apiserver-fd5cf6b66-6bsp7" deleted

$ oc get pods
NAME                        READY   STATUS    RESTARTS   AGE
apiserver-fd5cf6b66-42wfv   2/2     Running   0          3m15s
apiserver-fd5cf6b66-6kcc7   2/2     Running   0          104s
apiserver-fd5cf6b66-qdrxw   2/2     Running   0          2m57s

$ oc get pods -o yaml | grep scc        ???? CRAZY, WHY USE NVIDIA SCC ????
      openshift.io/scc: nvidia-dcgm-exporter
      openshift.io/scc: nvidia-dcgm-exporter
      openshift.io/scc: nvidia-dcgm-exporter

$ sed -i '/creationTimestamp:/d; /generation:/d; /resourceVersion:/d; /selfLink:/d; /uid:/d' stackrox_scc_collector.yaml

$ oc apply -f stackrox_scc_collector.yaml
securitycontextconstraints.security.openshift.io/collector created

$ oc delete pod/apiserver-fd5cf6b66-42wfv
pod "apiserver-fd5cf6b66-42wfv" deleted

$ oc get pods
NAME                        READY   STATUS             RESTARTS   AGE
apiserver-fd5cf6b66-6kcc7   2/2     Running            0          11m
apiserver-fd5cf6b66-7mj6w   0/2     CrashLoopBackOff   6          2m27s
apiserver-fd5cf6b66-qdrxw   2/2     Running            0          12m

$ oc get pods -o yaml | grep scc
      openshift.io/scc: nvidia-dcgm-exporter
      openshift.io/scc: collector
      openshift.io/scc: nvidia-dcgm-exporter

$ oc get scc/node-exporter scc/nvidia-dcgm-exporter scc/collector
NAME                   PRIV   CAPS         SELINUX    RUNASUSER   FSGROUP    SUPGROUP   PRIORITY     READONLYROOTFS   VOLUMES
node-exporter          true   <no value>   RunAsAny   RunAsAny    RunAsAny   RunAsAny   <no value>   false            ["*"]
nvidia-dcgm-exporter   true   ["*"]        RunAsAny   RunAsAny    RunAsAny   RunAsAny   <no value>   false            ["*"]
collector              true   []           RunAsAny   RunAsAny    RunAsAny   RunAsAny   0            true             ["configMap","downwardAPI","emptyDir","hostPath","secret"]

Comment 6 John Call 2021-04-09 18:06:22 UTC
It is very odd to me that the re-created apiserver pods decided to attach to the "nvidia-dcgm-exporter" SCC.

Comment 12 Standa Laznicka 2021-04-15 11:45:47 UTC
*** Bug 1942552 has been marked as a duplicate of this bug. ***

Comment 13 Neil Carpenter 2021-05-04 20:26:17 UTC
'It is very odd to me that the re-created apiserver pods decided to attach to the "nvidia-dcgm-exporter" SCC'

That's not odd at all -- SCC assignment is working as designed, although there's one caveat that I'm not sure is clearly documented: if a pod is deployed by a cluster-admin, the admission controller evaluates all SCCs for that pod (not just those to which the deploying user and the pod's service account are assigned).

When evaluating SCCs, the admission controller orders them by priority. A priority of nil is treated as a priority of 0, the default, so all of the SCCs involved here share the same priority. Within the same priority, SCCs are evaluated from most restrictive to least restrictive, and the admission controller applies the first one whose constraints the pod's SecurityContext satisfies.

The root cause here is that the API server's security context specifies that it needs to be privileged, but it does _not_ specify that it needs a read/write root filesystem. So, if the StackRox SCC is in place, it is the most restrictive SCC at that priority and it gets applied. Later, when the API server tries to write to its root filesystem, the write fails and the pod ends up in CrashLoopBackOff.

With the StackRox SCC absent, the next most restrictive SCC that matches the request (privileged: true) is the nvidia-dcgm-exporter.
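The selection logic described above can be sketched as follows. This is a simplified model, not the actual admission-controller code: the `restrictiveness` score is a stand-in assumption (real OpenShift computes point values per SCC field), and only the read-only-root-filesystem constraint is modeled.

```python
# Simplified model of SCC selection as described in this comment.
# NOT the real openshift-apiserver admission code; "restrictiveness"
# here is a hypothetical stand-in metric (lower = more restrictive).
from dataclasses import dataclass
from typing import Optional

@dataclass
class SCC:
    name: str
    priority: Optional[int]   # nil in the API is treated as 0
    restrictiveness: int      # stand-in: lower = more restrictive
    read_only_root_fs: bool

def select_scc(sccs, needs_writable_root):
    """Pick the first SCC that admits the pod: highest priority first,
    then most restrictive first within equal priority."""
    ordered = sorted(sccs, key=lambda s: (-(s.priority or 0), s.restrictiveness))
    for scc in ordered:
        # An SCC that forces a read-only root FS cannot admit a pod
        # that explicitly requires a writable root FS.
        if needs_writable_root and scc.read_only_root_fs:
            continue
        return scc.name
    return None

sccs = [
    SCC("collector", 0, 1, read_only_root_fs=True),            # most restrictive
    SCC("nvidia-dcgm-exporter", None, 2, read_only_root_fs=False),
    SCC("node-exporter", None, 3, read_only_root_fs=False),
]

# Before the fix: the pod spec does not say it needs a writable root
# FS, so the restrictive "collector" SCC matches and gets applied.
print(select_scc(sccs, needs_writable_root=False))  # -> collector

# After the fix (readOnlyRootFilesystem: false in the pod spec),
# "collector" no longer matches and the next most restrictive wins.
print(select_scc(sccs, needs_writable_root=True))   # -> nvidia-dcgm-exporter
```

Under this model, explicitly declaring the writable root filesystem is enough to disqualify the StackRox SCC, which is exactly the fix the linked PR takes.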

Comment 14 Xingxing Xia 2021-05-06 10:05:11 UTC
Sorry, I was not able to spend time on this earlier; I was busy with other tasks and with verifying and filing various KAS and OAS bugs.

Today I tested in 4.8.0-0.nightly-2021-05-06-003426.
In a fresh env, I checked the OAS pods:
$ oc get po apiserver-9998f75b9-bbngm -n openshift-apiserver -o yaml
    openshift.io/scc: node-exporter
  - args:
    name: openshift-apiserver
      privileged: true
      readOnlyRootFilesystem: false

The OAS container SecurityContext now sets readOnlyRootFilesystem explicitly. The pods are matched to the node-exporter SCC; "readOnlyRootFilesystem: false" can also be seen in `oc get scc node-exporter -o yaml`.

After creating the collector SCC from above and then deleting an OAS pod:
$ oc delete po apiserver-9998f75b9-bbngm -n openshift-apiserver
The new pod is Running, and its YAML shows it still uses the node-exporter SCC:
$ oc get po -n openshift-apiserver
apiserver-9998f75b9-md5r6   2/2     Running   0          92s

(In reply to Neil Carpenter from comment #13)
> I'm not sure is clearly documented
Yeah, it's not clearly documented. There was bug 1830392, whose doc PR is still open.

Comment 18 errata-xmlrpc 2021-07-27 22:55:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

