Bug 1960680 - [SCC] openshift-apiserver degraded when a SCC with high priority is created [NEEDINFO]
Summary: [SCC] openshift-apiserver degraded when a SCC with high priority is created
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: openshift-apiserver
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Target Release: 4.9.0
Assignee: Sergiusz Urbaniak
QA Contact: Xingxing Xia
URL:
Whiteboard: EmergencyRequest
Duplicates: 1968511
Depends On: 1955502
Blocks: 1996044
Reported: 2021-05-14 14:56 UTC by oarribas
Modified: 2021-10-18 17:31 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 1996051
Environment:
Last Closed: 2021-10-18 17:31:06 UTC
Target Upstream Version:
Embargoed:
mfojtik: needinfo?



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-openshift-apiserver-operator pull 465 | 0 | None | None | None | 2021-08-20 09:17:05 UTC
Red Hat Bugzilla | 1942725 | 1 | high | CLOSED | [SCC] openshift-apiserver degraded when creating new pod after installing Stackrox which creates a less privileged SCC [... | 2021-07-27 22:55:47 UTC
Red Hat Product Errata | RHSA-2021:3759 | 0 | None | None | None | 2021-10-18 17:31:31 UTC

Description oarribas 2021-05-14 14:56:44 UTC
Description of problem:

The openshift-apiserver pods end up in Pending or CrashLoopBackOff state after being recreated. An SCC with priority 0 was in place, and the pods were admitted under it:
~~~
$ oc get pods -n openshift-apiserver -o yaml | grep -i scc
      openshift.io/scc: k10-k10
      openshift.io/scc: k10-k10
      openshift.io/scc: k10-k10
~~~

The error shown:
~~~
      waiting:
        message: 'container has runAsNonRoot and image will run as root (pod: "apiserver-6599bb4956-4kz4s_openshift-apiserver(d8a8ba2f-3b71-4c7d-bcd0-cd8adc21f1f4)",
          container: fix-audit-permissions)'
        reason: CreateContainerConfigError
  phase: Pending
~~~

The SCC main content:
~~~
k10-k10.yaml


allowHostDirVolumePlugin: true
allowHostIPC: true
allowHostNetwork: true
allowHostPID: true
allowHostPorts: true
allowPrivilegeEscalation: true
allowPrivilegedContainer: true
allowedCapabilities: []
apiVersion: security.openshift.io/v1
defaultAddCapabilities: []
fsGroup:
  type: RunAsAny
groups: []
kind: SecurityContextConstraints
metadata:
[...]
[...]
priority: 0
readOnlyRootFilesystem: false
requiredDropCapabilities:
- CHOWN
- KILL
- MKNOD
- SETUID
- SETGID
runAsUser:
  type: MustRunAsNonRoot
seLinuxContext:
  type: RunAsAny
supplementalGroups:
  type: RunAsAny
users:
- system:serviceaccount:k10:k10-k10
- system:serviceaccount:k10:k10-k10
volumes:
- '*'
~~~
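
One way to read the error together with `runAsUser: MustRunAsNonRoot` above is sketched below in Python. This is a simplified model built on assumptions, not the actual admission or kubelet code: the function names `apply_must_run_as_non_root` and `kubelet_can_start` are hypothetical, and only the `runAsUser` and `runAsNonRoot` fields are modelled. The assumption is that a container with no explicit `runAsUser` is accepted by such an SCC and gets `runAsNonRoot: true`, after which the kubelet refuses to start an image that runs as root.

```python
from typing import Optional


def apply_must_run_as_non_root(container_sc: dict) -> Optional[dict]:
    """Hypothetical model of an SCC whose runAsUser type is MustRunAsNonRoot.

    Returns the resulting container securityContext, or None when the SCC
    cannot be used for this container (assumed behaviour).
    """
    # An explicit root UID can never satisfy MustRunAsNonRoot.
    if container_sc.get("runAsUser") == 0:
        return None
    result = dict(container_sc)
    # With no UID requested, assume the SCC defaults runAsNonRoot to true.
    if "runAsUser" not in result:
        result.setdefault("runAsNonRoot", True)
    return result


def kubelet_can_start(container_sc: dict, image_uid: int) -> bool:
    """Hypothetical model of the kubelet check behind the error above."""
    # The effective UID is the explicit runAsUser, else the image's user.
    uid = container_sc.get("runAsUser", image_uid)
    return not (container_sc.get("runAsNonRoot") is True and uid == 0)


# Container without an explicit UID, image running as root (UID 0):
admitted = apply_must_run_as_non_root({})
blocked = not kubelet_can_start(admitted, image_uid=0)

# Container that explicitly asks for UID 0: this SCC does not apply at all.
rejected = apply_must_run_as_non_root({"runAsUser": 0}) is None
```

Under these assumptions the first case matches the `CreateContainerConfigError` shown above, while a container that explicitly requests UID 0 would not be admitted under this SCC in the first place.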



Version-Release number of selected component (if applicable):

OCP 4.7


How reproducible:

Always


Steps to Reproduce:
1. Create the k10-k10 SCC
2. Restart an apiserver pod


Actual results:

apiserver pods in Pending or CrashLoopBackOff state.


Expected results:

apiserver pods are not affected by an SCC with high priority.


Additional info:


Related to BZ 1824800 and BZ 1942725

Comment 3 Michal Fojtik 2021-08-20 06:45:29 UTC
** A NOTE ABOUT USING URGENT **

This BZ has been set to urgent severity and priority. When a BZ is marked urgent priority, engineers are asked to stop whatever they are doing and put everything else on hold.
Please be prepared to have reasonable justification ready to discuss, and ensure your own and engineering management are aware and agree this BZ is urgent. Keep in mind, urgent bugs are very expensive and have maximal management visibility.

NOTE: This bug was automatically assigned to an engineering manager with the severity reset to *unspecified* until the emergency is vetted and confirmed. Please do not manually override the severity.

** INFORMATION REQUIRED **

Please answer these questions before escalation to engineering:

1. Has a link to must-gather output been provided in this BZ? We cannot work without it. If must-gather fails to run, attach all relevant logs and provide the must-gather error message.
2. Give the output of "oc get clusteroperators -o yaml".
3. In case of degraded/unavailable operators, have all their logs and the logs of the operands been analyzed [yes/no]
4. List the top 5 relevant errors from the logs of the operators and operands in (3).
5. Order the list of degraded/unavailable operators according to which is likely the cause of the failure of the other, root-cause at the top.
6. Explain why (5) is likely the right order and list the information used for that assessment.
7. Explain why Engineering is necessary to make progress.

Comment 9 Xingxing Xia 2021-08-23 04:13:57 UTC
Tested in 4.9.0-0.nightly-2021-08-22-070405:
Created the k10-k10 SCC above, then deleted one pod in the openshift-apiserver project. The newly created pod reaches Running.
The YAML of the new pod in the openshift-apiserver project shows that it uses a system SCC:
$ oc get po apiserver-576b474fb5-fx49h -n openshift-apiserver -o yaml | grep scc
    openshift.io/scc: node-exporter
The YAML of all pods in the openshift-apiserver project shows that they set 'runAsUser: 0' for the containers 'openshift-apiserver' and 'fix-audit-permissions'.
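
The two checks above could also be scripted. The Python sketch below is only an illustration: it assumes the pod list was saved as JSON (for example with `oc get pods -n openshift-apiserver -o json`), and the helper names `scc_by_pod` and `run_as_user_by_container` are hypothetical. It only looks at container-level `securityContext`, not the pod-level one.

```python
import json


def scc_by_pod(pod_list: dict) -> dict:
    """Map pod name -> value of the openshift.io/scc annotation (None if absent)."""
    result = {}
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        annotations = meta.get("annotations") or {}
        result[meta.get("name")] = annotations.get("openshift.io/scc")
    return result


def run_as_user_by_container(pod_list: dict) -> dict:
    """Map 'pod/container' -> container securityContext.runAsUser (None if unset)."""
    result = {}
    for pod in pod_list.get("items", []):
        pod_name = pod.get("metadata", {}).get("name")
        spec = pod.get("spec", {})
        # Look at init containers as well as regular containers.
        containers = (spec.get("initContainers") or []) + (spec.get("containers") or [])
        for container in containers:
            sc = container.get("securityContext") or {}
            result[f"{pod_name}/{container['name']}"] = sc.get("runAsUser")
    return result


def load_pod_list(path: str) -> dict:
    """Read a saved pod list; the file name is chosen by the caller."""
    with open(path) as handle:
        return json.load(handle)
```

For example, after saving the output to a file such as `pods.json`, `scc_by_pod(load_pod_list("pods.json"))` lists the SCC each pod was admitted under, and any entry of `run_as_user_by_container(...)` that is not `0` for the two containers named above would deserve a closer look.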

Comment 11 Sergiusz Urbaniak 2021-09-21 10:37:40 UTC
*** Bug 1968511 has been marked as a duplicate of this bug. ***

Comment 14 errata-xmlrpc 2021-10-18 17:31:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759

