Bug 1962687 - openshift-kube-storage-version-migrator pod failed due to Error: container has runAsNonRoot and image will run as root
Summary: openshift-kube-storage-version-migrator pod failed due to Error: container has runAsNonRoot and image will run as root
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-storage-version-migrator
Version: 4.6.z
Hardware: Unspecified
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.8.0
Assignee: Sergiusz Urbaniak
QA Contact: liyao
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-05-20 13:35 UTC by Matt
Modified: 2021-07-27 23:09 UTC
CC List: 6 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-27 23:09:30 UTC
Target Upstream Version:
Embargoed:


Attachments
update-history (39.21 KB, image/jpeg)
2021-05-20 13:35 UTC, Matt


Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-kube-storage-version-migrator-operator pull 58 0 None open Bug 1962687: images/ci/Dockerfile: specify non-root user 2021-05-21 14:56:33 UTC
Red Hat Product Errata RHSA-2021:2438 0 None None None 2021-07-27 23:09:55 UTC

Description Matt 2021-05-20 13:35:37 UTC
Created attachment 1785179 [details]
update-history

Description of problem:

The openshift-kube-storage-version-migrator container failed to start, so I performed `oc describe` on the pod, which shows:

 - Successfully assigned openshift-kube-storage-version-migrator/migrator-5f77bc7f9-nqknj to 10.112.78.51
 - Successfully pulled image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6c9edfc5c399afece4c206182974b0bb2b525b20a2886a36a91173355edaf954"
 - Error: container has runAsNonRoot and image will run as root

This is because when it was deployed, the deployment was admitted under an SCC someone in the team had created. Upon further digging, it appears the pod was unable to run under this SCC because it failed the runAsUser security context check:

"Pods which have specified neither runAsNonRoot nor runAsUser settings will be mutated to set runAsNonRoot=true, thus requiring a defined non-zero numeric USER directive in the container."

This would imply there is no "defined non-zero numeric USER directive in the container".

So either the deployment/pod specification needs to state that the container must run as root (uid=0), so that the correct SCC is selected, or the container image needs to provide a "non-zero numeric USER directive".
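One way to confirm whether the image defines a numeric USER is to inspect its configuration directly. A minimal sketch, assuming skopeo and jq are available locally and that you have pull credentials for the release image (the digest is the one from the event above):

```
# Print the USER the image is configured to run as; an empty result or "0"
# means the container will run as root.
$ skopeo inspect --config \
    docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6c9edfc5c399afece4c206182974b0bb2b525b20a2886a36a91173355edaf954 \
  | jq -r '.config.User'
```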

Version-Release number of selected component (if applicable):

Our cluster has been impacted since somewhere between 4.5.24 and 4.6.23.

How reproducible:

Create an SCC with priority >10 that permits "non-root" containers, then deploy the migrator.

Steps to Reproduce:
1.
2.
3.

Actual results:

Pod never launched.

Expected results:

Pod and upgrade should run without fault

Additional info:

The attachment shows that our upgrades have been impacted since 4.5.24 - however, this could be due to the creation date of the SCC.

Comment 1 Sergiusz Urbaniak 2021-05-20 15:25:34 UTC
generally: do you have a case number so we can get the must-gather? that would be helpful in debugging.

> Create an SCC priority >10 that permits "non-root" containers then deploy the migrator.

Did you create an scc overriding the default restricted SCC?

Comment 2 Matt 2021-05-20 15:28:27 UTC
Hi, I don't have a case number as this isn't a client project. The SCC was a copy of "nonroot" but with priority 11.

Matt

Comment 3 Matt 2021-05-20 15:33:55 UTC
I believe this was the SCC used:

apiVersion: security.openshift.io/v1
kind: SecurityContextConstraints
metadata:
  annotations:
    kubernetes.io/description: "This policy allows pods to run with any UID and GID except root and prevents access to the host."
    "helm.sh/hook": pre-install
  name: {{ .Release.Name }}-nonroot-scc
allowHostDirVolumePlugin: false
allowHostIPC: false
allowHostNetwork: false
allowHostPID: false
allowHostPorts: false
allowPrivilegedContainer: false
allowPrivilegeEscalation: false
allowedCapabilities: []
allowedFlexVolumes: []
allowedUnsafeSysctls: []
defaultAddCapabilities: []
defaultAllowPrivilegeEscalation: false
forbiddenSysctls:
  - "*"
fsGroup:
  type: RunAsAny
readOnlyRootFilesystem: false
requiredDropCapabilities:
- ALL
runAsUser:
  type: MustRunAsNonRoot
# This can be customized for your host machine
seLinuxContext:
  type: RunAsAny
# seLinuxOptions:
#   level:
#   user:
#   role:
#   type:
supplementalGroups:
  type: RunAsAny
# This can be customized for your host machine
volumes:
- configMap
- downwardAPI
- emptyDir
- persistentVolumeClaim
- projected
- secret
# If you want a priority on your SCC -- set for a value more than 0
priority: 11
users:
- system:serviceaccount:{{ .Release.Namespace }}:{{ $fullName }}
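Note the manifest above is a Helm template; to apply it standalone, the placeholders would need concrete values first. A rough sketch, assuming the manifest is saved as nonroot-scc.yaml and using illustrative substitutions:

```
# Replace the Helm placeholders with concrete (illustrative) values and apply.
$ sed -e 's/{{ .Release.Name }}-nonroot-scc/test-nonroot-scc/' \
      -e 's/{{ .Release.Namespace }}:{{ $fullName }}/default:default/' \
      nonroot-scc.yaml | oc apply -f -
```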

Comment 4 Sergiusz Urbaniak 2021-05-20 15:55:32 UTC
thank you, i can reproduce this on a fresh cluster by

1. creating a copy of the builtin nonroot scc with priority 11 set and
2. simply deleting the existing openshift-kube-storage-version-migrator/migrator pod to enforce redeployment (a rough sketch of these steps is below)
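
a rough sketch of those two steps (illustrative names, assumes cluster-admin access and jq):

```
# 1. copy the builtin nonroot SCC under a new name and give it priority 11
$ oc get scc nonroot -o json \
    | jq '.metadata = {"name": "nonroot-copy"} | .priority = 11' \
    | oc apply -f -
# 2. delete the running migrator pod so it is redeployed under the new SCC
$ oc -n openshift-kube-storage-version-migrator delete pod --all
```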

the new pod gets assigned to the nonroot-copy SCC:

```
$ kubectl -n openshift-kube-storage-version-migrator get pod migrator-788699c75c-tpgfx  -o yaml | grep scc
    openshift.io/scc: nonroot-copy
```

I need to find out why this deployment is affected specifically.

Comment 5 Matt 2021-05-20 16:04:14 UTC
Great. Did the pod fail to start with the same error message?

The thing I don't understand is why the serviceAccount of migrator is being picked up by this SCC, given that we specified `users` on it.

Comment 6 Sergiusz Urbaniak 2021-05-20 16:14:12 UTC
yes, it's the same error. i need to investigate; even using a clean copy with `users: []`, it picks that one up as well.

assigning to work on this one this sprint.

Comment 7 Sergiusz Urbaniak 2021-05-21 15:02:32 UTC
I still need to dive deeper here, but the migrator (and the operator too) are matched against the elevated nonroot SCC. This causes their pod spec to be mutated, with `runAsNonRoot: true` added to the security context. Since no numeric user ID is specified, the pods fail to launch.

Other workloads whose images already specify a nonroot user are not affected (a random example is the openshift-marketplace/certified-operators pod).
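
For reference, the injected setting can be seen on the failing pod itself; a quick check along these lines (pod name is illustrative):

```
# Show the securityContext that SCC admission injected into the migrator container.
$ oc -n openshift-kube-storage-version-migrator get pod <migrator-pod> \
    -o jsonpath='{.spec.containers[0].securityContext}{"\n"}'
```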

I still want to understand why the openshift workload is being matched against the elevated SCC at all. Ideally we should not permit user SCC changes to mutate the existing core payload.

Comment 9 liyao 2021-06-07 10:53:39 UTC
Tested in a fresh cluster, 4.8.0-0.nightly-2021-06-06-164529; the redeployed pod status is as expected.
1. create a nonroot SCC with priority 11 set using the YAML in comment 3 (replace the scc name value with test-nonroot-scc and the users value with serviceaccount:default)

2. delete existing openshift-kube-storage-version-migrator/migrator pod
$ oc delete pod migrator-55f7fdd8c8-5shks -n openshift-kube-storage-version-migrator
pod "migrator-55f7fdd8c8-5shks" deleted

3. check redeployment pod status
$ oc get pods -n openshift-kube-storage-version-migrator
NAME                        READY   STATUS    RESTARTS   AGE
migrator-55f7fdd8c8-zgrjr   1/1     Running   0          19s

4. check scc assigned to the new pod
$ oc get pods migrator-55f7fdd8c8-zgrjr -n openshift-kube-storage-version-migrator -o yaml | grep scc
    openshift.io/scc: test-nonroot-scc

Comment 12 errata-xmlrpc 2021-07-27 23:09:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

