Bug 1942725 - [SCC] openshift-apiserver degraded when creating new pod after installing Stackrox which creates a less privileged SCC [4.8]
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: openshift-apiserver
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.8.0
Assignee: Standa Laznicka
QA Contact: Xingxing Xia
Duplicates: 1942552 1942744 (view as bug list)
Depends On:
Blocks: 1955502
Reported: 2021-03-24 19:30 UTC by oarribas
Modified: 2024-06-14 01:00 UTC (History)
11 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Custom SCCs may have higher priority than the ones in the default set, which may lead to these being matched to openshift-apiserver pods and break their ability to write in their root filesystem. Consequence: This might lead to an outage of some of the OpenShift APIs. Fix: Explicitly mention in the openshift-apiserver pods that the root filesystem should be writable. Result: Custom SCCs should not prevent openshift-apiserver pods from running.
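The fix boils down to declaring the writable root filesystem explicitly in the pod template. A minimal sketch of the relevant container securityContext (the surrounding manifest fields are illustrative, not the operator's actual template):

```yaml
# Illustrative fragment only -- the real change lives in the
# cluster-openshift-apiserver-operator's pod template (PR 437 above).
containers:
- name: openshift-apiserver
  securityContext:
    privileged: true
    # Stated explicitly so that restrictive custom SCCs that force a
    # read-only root filesystem can no longer match the apiserver pods.
    readOnlyRootFilesystem: false
```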
Clone Of:
Last Closed: 2021-07-27 22:55:28 UTC
Target Upstream Version:

Attachments

System ID Private Priority Status Summary Last Updated
Github openshift cluster-openshift-apiserver-operator pull 437 0 None open Bug 1942725: explicitly allow apiserver pods to write to their root FS 2021-04-14 13:01:59 UTC
Red Hat Bugzilla 1824800 1 high CLOSED openshift authentication operator is in a crashbackoffloop 2023-10-06 19:41:19 UTC
Red Hat Knowledge Base (Solution) 5911951 0 None None None 2021-04-08 06:56:55 UTC
Red Hat Product Errata RHSA-2021:2438 0 None None None 2021-07-27 22:55:47 UTC

Internal Links: 1960680

Description oarribas 2021-03-24 19:30:00 UTC
Description of problem:

An openshift-apiserver pod enters a CrashLoopBackOff state after being recreated. Stackrox was installed in the cluster one week ago.

openshift-apiserver logs:
2021-03-24T00:00:00.000000000Z Copying system trust bundle
2021-03-24T00:00:00.000000000Z cp: cannot remove '/etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem': Read-only file system

Found a similar Bugzilla bug for the authentication-operator [1].

I have seen that in OCP 4.6, there is no SCC annotation on the openshift-apiserver pods:
$ oc version
Client Version: 4.6.19
Server Version: 4.6.19
Kubernetes Version: v1.19.0+8d12420

$ oc get pods -n openshift-apiserver
NAME                         READY   STATUS    RESTARTS   AGE
apiserver-78f7976f7c-2mvjq   2/2     Running   0          27h
apiserver-78f7976f7c-8wznp   2/2     Running   0          27h
apiserver-78f7976f7c-mmwlg   2/2     Running   0          27h

$ oc get pod apiserver-78f7976f7c-8wznp -n openshift-apiserver -o yaml | grep scc | wc -l
0
But in OCP 4.7, the `node-exporter` SCC is applied to the openshift-apiserver pods:
$ oc version
Client Version: 4.7.3
Server Version: 4.7.3
Kubernetes Version: v1.20.0+bafe72f

$ oc get pods -n openshift-apiserver
NAME                         READY   STATUS    RESTARTS   AGE
apiserver-5d48bfb684-dhzbg   2/2     Running   0          3h15m
apiserver-5d48bfb684-lfd4k   2/2     Running   0          3h14m
apiserver-5d48bfb684-r4h5w   2/2     Running   0          3h12m

$ oc get pod apiserver-5d48bfb684-dhzbg -n openshift-apiserver -o yaml | grep scc
    openshift.io/scc: node-exporter

In the failing pod, the scc is `openshift.io/scc: collector`:
$ oc get pod apiserver-5d48bfb684-lnbnr -n openshift-apiserver -o yaml | grep scc
    openshift.io/scc: collector

Version-Release number of selected component (if applicable):

OCP 4.7

How reproducible:
Always, once the SCC is created and an openshift-apiserver pod is recreated.

Steps to Reproduce:
1. Create the collector scc.
2. Check the scc in the openshift-apiserver pods.
3. Delete one of the openshift-apiserver pods.
4. Check the status and the scc of the new pod.

Actual results:

Pod in CrashLoopBackOff due to the scc change.

Expected results:

No changes in the scc used by OpenShift internal pods.

Additional info:

Other pods may be affected by the same SCC.


Comment 2 Robert Bohne 2021-03-26 11:10:45 UTC
*** Bug 1942744 has been marked as a duplicate of this bug. ***

Comment 5 John Call 2021-04-09 18:02:28 UTC
The resolution described in https://access.redhat.com/solutions/5911951 is only a workaround, not a long-term fix. Deleting the SCC allows the `apiserver` pod to start, but recreating the SCC will prevent future `apiserver` pods from starting. Future pods would be started (and matched to the collector SCC) if a control-plane node failed, if the apiserver operator were upgraded, and so on.

$ oc get pods
NAME                         READY   STATUS             RESTARTS   AGE
apiserver-85d7bdf578-jms2d   2/2     Running            0          3d22h
apiserver-85d7bdf578-ndpf8   2/2     Running            0          3d22h
apiserver-fd5cf6b66-6bsp7    0/2     CrashLoopBackOff   874        36h

$ oc get pods/apiserver-fd5cf6b66-6bsp7 -o  yaml | grep -i scc
    openshift.io/scc: collector

$ oc get pods -o  yaml | grep -i scc
      openshift.io/scc: node-exporter
      openshift.io/scc: node-exporter
      openshift.io/scc: collector

$ oc get scc/collector -o yaml > stackrox_scc_collector.yaml

$ oc delete scc/collector
securitycontextconstraints.security.openshift.io "collector" deleted

$ oc delete pod/apiserver-fd5cf6b66-6bsp7
pod "apiserver-fd5cf6b66-6bsp7" deleted

$ oc get pods
NAME                        READY   STATUS    RESTARTS   AGE
apiserver-fd5cf6b66-42wfv   2/2     Running   0          3m15s
apiserver-fd5cf6b66-6kcc7   2/2     Running   0          104s
apiserver-fd5cf6b66-qdrxw   2/2     Running   0          2m57s

$ oc get pods -o yaml | grep scc        ???? CRAZY, WHY USE NVIDIA SCC ????
      openshift.io/scc: nvidia-dcgm-exporter
      openshift.io/scc: nvidia-dcgm-exporter
      openshift.io/scc: nvidia-dcgm-exporter

$ sed -i '/creationTimestamp:/d; /generation:/d; /resourceVersion:/d; /selfLink:/d; /uid:/d' stackrox_scc_collector.yaml

$ oc apply -f stackrox_scc_collector.yaml
securitycontextconstraints.security.openshift.io/collector created

$ oc delete pod/apiserver-fd5cf6b66-42wfv
pod "apiserver-fd5cf6b66-42wfv" deleted

$ oc get pods
NAME                        READY   STATUS             RESTARTS   AGE
apiserver-fd5cf6b66-6kcc7   2/2     Running            0          11m
apiserver-fd5cf6b66-7mj6w   0/2     CrashLoopBackOff   6          2m27s
apiserver-fd5cf6b66-qdrxw   2/2     Running            0          12m

$ oc get pods -o yaml | grep scc
      openshift.io/scc: nvidia-dcgm-exporter
      openshift.io/scc: collector
      openshift.io/scc: nvidia-dcgm-exporter

$ oc get scc/node-exporter scc/nvidia-dcgm-exporter scc/collector
NAME                   PRIV   CAPS         SELINUX    RUNASUSER   FSGROUP    SUPGROUP   PRIORITY     READONLYROOTFS   VOLUMES
node-exporter          true   <no value>   RunAsAny   RunAsAny    RunAsAny   RunAsAny   <no value>   false            ["*"]
nvidia-dcgm-exporter   true   ["*"]        RunAsAny   RunAsAny    RunAsAny   RunAsAny   <no value>   false            ["*"]
collector              true   []           RunAsAny   RunAsAny    RunAsAny   RunAsAny   0            true             ["configMap","downwardAPI","emptyDir","hostPath","secret"]

Comment 6 John Call 2021-04-09 18:06:22 UTC
It is very odd to me that the re-created apiserver pods decided to attach to the "nvidia-dcgm-exporter" SCC.

Comment 12 Standa Laznicka 2021-04-15 11:45:47 UTC
*** Bug 1942552 has been marked as a duplicate of this bug. ***

Comment 13 Neil Carpenter 2021-05-04 20:26:17 UTC
'It is very odd to me that the re-created apiserver pods decided to attach to the "nvidia-dcgm-exporter" SCC'

That's not odd at all -- SCC assignment is working as designed, although there's one caveat that I'm not sure is clearly documented: if a pod is deployed by a cluster-admin, the admission controller evaluates all SCCs for that pod (not just those to which the deploying user and the pod's service account are assigned).

When evaluating SCCs, the admission controller orders them by priority. A priority of nil is treated as a priority of 0, the default, so all of the SCCs involved here share the same priority. Within the same priority, SCCs are evaluated from most restrictive to least restrictive, and the admission controller applies the first one whose constraints the pod's SecurityContext satisfies.

The root cause here is that the API server's security context specifies that it needs to be privileged, but it does _not_ specify that it needs a read/write root filesystem. So, if the StackRox SCC is in place, it is the most restrictive SCC at that priority and it gets applied. Later, when the API server tries to write to its root filesystem, the write fails and the pod ends up in CrashLoopBackOff.

With the StackRox SCC absent, the next most restrictive SCC that matches the request (privileged: true) is the nvidia-dcgm-exporter.
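The selection logic described above can be sketched as follows. This is a simplified model, not the actual admission-controller code: the `restrictiveness` score is a stand-in assumption (real OpenShift computes point values per SCC field), and only the read-only-root-filesystem constraint is modeled.

```python
# Simplified model of SCC selection as described in this comment.
# NOT the real openshift-apiserver admission code; "restrictiveness"
# here is a hypothetical stand-in metric (lower = more restrictive).
from dataclasses import dataclass
from typing import Optional

@dataclass
class SCC:
    name: str
    priority: Optional[int]   # nil in the API is treated as 0
    restrictiveness: int      # stand-in: lower = more restrictive
    read_only_root_fs: bool

def select_scc(sccs, needs_writable_root):
    """Pick the first SCC that admits the pod: highest priority first,
    then most restrictive first within equal priority."""
    ordered = sorted(sccs, key=lambda s: (-(s.priority or 0), s.restrictiveness))
    for scc in ordered:
        # An SCC that forces a read-only root FS cannot admit a pod
        # that explicitly requires a writable root FS.
        if needs_writable_root and scc.read_only_root_fs:
            continue
        return scc.name
    return None

sccs = [
    SCC("collector", 0, 1, read_only_root_fs=True),            # most restrictive
    SCC("nvidia-dcgm-exporter", None, 2, read_only_root_fs=False),
    SCC("node-exporter", None, 3, read_only_root_fs=False),
]

# Before the fix: the pod spec does not say it needs a writable root
# FS, so the restrictive "collector" SCC matches and gets applied.
print(select_scc(sccs, needs_writable_root=False))  # -> collector

# After the fix (readOnlyRootFilesystem: false in the pod spec),
# "collector" no longer matches and the next most restrictive wins.
print(select_scc(sccs, needs_writable_root=True))   # -> nvidia-dcgm-exporter
```

Under this model, explicitly declaring the writable root filesystem is enough to disqualify the StackRox SCC, which is exactly the fix the linked PR takes.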

Comment 14 Xingxing Xia 2021-05-06 10:05:11 UTC
Sorry, I was not able to spend time on this earlier; I was busy with other tasks and with verifying and filing various KAS and OAS bugs.

Today I tested in 4.8.0-0.nightly-2021-05-06-003426.
In a fresh env, I checked the OAS pods:
$ oc get po apiserver-9998f75b9-bbngm -n openshift-apiserver -o yaml
    openshift.io/scc: node-exporter
  - args:
    name: openshift-apiserver
      privileged: true
      readOnlyRootFilesystem: false

The OAS container SecurityContext now sets readOnlyRootFilesystem explicitly. The pods are matched to the node-exporter SCC; "readOnlyRootFilesystem: false" can also be seen in `oc get scc node-exporter -o yaml`.

After creating the collector SCC from above and then deleting an OAS pod:
$ oc delete po apiserver-9998f75b9-bbngm -n openshift-apiserver
The new pod is Running, and its YAML shows it still uses the node-exporter SCC:
$ oc get po -n openshift-apiserver
apiserver-9998f75b9-md5r6   2/2     Running   0          92s

(In reply to Neil Carpenter from comment #13)
> I'm not sure is clearly documented
Yeah, it's not clearly documented. There was bug 1830392, whose doc PR is still open.

Comment 18 errata-xmlrpc 2021-07-27 22:55:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

