Bug 2114602 - Upgrade failing because restrictive scc is injected into version pod
Summary: Upgrade failing because restrictive scc is injected into version pod
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cluster Version Operator
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.11.z
Assignee: Over the Air Updates
QA Contact: Yang Yang
URL:
Whiteboard:
Depends On: 2110590
Blocks:
 
Reported: 2022-08-02 22:17 UTC by OpenShift BugZilla Robot
Modified: 2022-08-23 15:11 UTC
CC: 10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-23 15:10:51 UTC
Target Upstream Version:
Embargoed:


Links
- Github: openshift/cluster-version-operator pull 811 (Merged) - Bug 2114602: pkg/cvo/updatepayload: Set 'readOnlyRootFilesystem: false' (2022-10-12 06:15:57 UTC)
- Red Hat Issue Tracker: OCPBUGS-233 (2022-08-18 15:19:06 UTC)
- Red Hat Product Errata: RHSA-2022:6103 (2022-08-23 15:11:07 UTC)
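
Per the linked pull request, the fix makes the CVO set 'readOnlyRootFilesystem: false' explicitly on the version pod's container, so SCC admission no longer matches the pod against a custom SCC that requires a read-only root filesystem. As a rough cross-check (a sketch, not part of the original verification), the explicit field can be read back from a version pod while it still exists, e.g. the one from comment 2 below:

# oc -n openshift-cluster-version get pod version--n4dqx-2vl6d -o jsonpath='{.spec.containers[*].securityContext.readOnlyRootFilesystem}{"\n"}'

On a build that carries the fix this should print false, matching the PR title.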

Comment 1 W. Trevor King 2022-08-02 22:20:42 UTC
We're really close to 4.11 GA, and this is a 4.10 issue, not a new-in-4.11 issue, so I'm punting it out to 4.11.z.

Comment 2 Yang Yang 2022-08-03 09:40:43 UTC
Verifying it before the PR is merged.

1. Install a cluster using cluster-bot
   # oc get clusterversion
NAME      VERSION                                                   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.ci.test-2022-08-03-082604-ci-ln-i9hxzi2-latest   True        False         24m     Cluster version is 4.11.0-0.ci.test-2022-08-03-082604-ci-ln-i9hxzi2-latest

2. Create a SCC
# cat << EOF | oc create -f -
> allowHostDirVolumePlugin: true
> allowHostIPC: false
> allowHostNetwork: false
> allowHostPID: false
> allowHostPorts: false
> allowPrivilegeEscalation: true
> allowPrivilegedContainer: true
> allowedCapabilities: []
> apiVersion: security.openshift.io/v1
> defaultAddCapabilities: []
> fsGroup:
>   type: MustRunAs
> groups: []
> kind: SecurityContextConstraints
> metadata:
>   annotations:
>     meta.helm.sh/release-name: azure-arc
>     meta.helm.sh/release-namespace: default
>   labels:
>     app.kubernetes.io/managed-by: Helm
>   name: kube-aad-proxy-scc
> priority: null
> readOnlyRootFilesystem: true
> requiredDropCapabilities: []
> runAsUser:
>   type: RunAsAny
> seLinuxContext:
>   type: MustRunAs
> supplementalGroups:
>   type: RunAsAny
> users:
> - system:serviceaccount:azure-arc:azure-arc-kube-aad-proxy-sa
> volumes:
> - configMap
> - hostPath
> - secret
> EOF
securitycontextconstraints.security.openshift.io/kube-aad-proxy-scc created
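
The readOnlyRootFilesystem: true setting in the manifest above is the field that later clashes with the version pod. As an optional sanity check (not part of the original steps), it can be read back from the created object:

# oc get scc kube-aad-proxy-scc -o jsonpath='{.readOnlyRootFilesystem}{"\n"}'

which should print true, matching the manifest.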

3. Upgrade the cluster
   # oc adm upgrade --to-image=registry.ci.openshift.org/ocp/release@sha256:2050173e8113faae2faadcff7b77346dab996705a68f2384fd5a2674c6e2a2ff --force --allow-explicit-upgrade
warning: The requested upgrade image is not one of the available updates.You have used --allow-explicit-upgrade for the update to proceed anyway
warning: --force overrides cluster verification of your supplied release image and waives any update precondition failures.
Requesting update to release image registry.ci.openshift.org/ocp/release@sha256:2050173e8113faae2faadcff7b77346dab996705a68f2384fd5a2674c6e2a2ff

# oc get all
NAME                                            READY   STATUS      RESTARTS   AGE
pod/cluster-version-operator-68d8868586-gxgl5   1/1     Running     0          6s
pod/version--n4dqx-2vl6d                        0/1     Completed   0          23s

NAME                               TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/cluster-version-operator   ClusterIP   172.30.205.94   <none>        9099/TCP   59m

NAME                                       READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/cluster-version-operator   1/1     1            1           58m

NAME                                                  DESIRED   CURRENT   READY   AGE
replicaset.apps/cluster-version-operator-5c8fd57fc8   0         0         0       58m
replicaset.apps/cluster-version-operator-68d8868586   1         1         1       6s

NAME                       COMPLETIONS   DURATION   AGE
job.batch/version--n4dqx   1/1           14s        23s

# oc get pod/version--n4dqx-2vl6d -oyaml | grep scc
    openshift.io/scc: node-exporter
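
For contrast, the node-exporter SCC that was admitted here can be queried the same way (illustrative only, not part of the original verification):

# oc get scc node-exporter -o jsonpath='{.readOnlyRootFilesystem}{"\n"}'

A false result here confirms the admitted SCC does not force a read-only root filesystem on the version pod.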

# oc adm upgrade 
info: An upgrade is in progress. Working towards 4.12.0-0.nightly-2022-08-01-151317: 105 of 802 done (13% complete)

warning: Cannot display available updates:
  Reason: NoChannel
  Message: The update channel has not been configured.

The upgrade is proceeding. Looks good to me.

Comment 3 Adrian 2022-08-15 10:34:49 UTC
Hello,

We have a new comment from this case (03272159):
---
This cluster has had this SCC installed since the initial install (4.8.13, back in November 2021) and has upgraded successfully without any issues through 4.8 and 4.9 and on to 4.10, until now. As the RCA has discovered, this is clearly a 4.10 bug, and a workaround or fix should be provided to let the customer upgrade, not just when 4.12 lands, but also so that they can keep the platform up to date through 4.10 and 4.11.

Adjusting the custom SCC is not a viable workaround. This SCC was deployed and configured by a third-party application (MS Azure Sentinel) that is used for monitoring of their Azure-bound clusters, so any change would need to be raised with Microsoft. As their product continues to work as expected (and uses the least-privilege pattern that we recommend), it will be difficult to convince the customer to adjust it to accommodate a bug on our part. It would also mean that for every update they would need to manually modify the SCC and roll the change back afterwards (sketched below for reference). This is not a good experience.

Am raising an ACE to get some more eyes on this issue.
---
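
For reference, the manual roll-forward/roll-back described in the quoted comment would look roughly like this (illustrative only; as the comment makes clear, modifying the third-party SCC is not considered acceptable, and the change would have to be repeated and reverted for every update):

# oc patch scc kube-aad-proxy-scc --type=merge -p '{"readOnlyRootFilesystem": false}'
(run the update)
# oc patch scc kube-aad-proxy-scc --type=merge -p '{"readOnlyRootFilesystem": true}'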

I see that the priority and severity are high, so is it possible to backport this to 4.10?

Many thanks in advance,

Adrián.

Comment 6 Yang Yang 2022-08-18 03:20:04 UTC
Based on comment #2, moving it to the verified state.

Comment 7 W. Trevor King 2022-08-18 03:35:46 UTC
We're considering a 4.10.z backport in [1].

[1]: https://issues.redhat.com//browse/OCPBUGS-233

Comment 9 errata-xmlrpc 2022-08-23 15:10:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.11.1 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:6103

