Bug 2114602
| Summary: | Upgrade failing because restrictive scc is injected into version pod | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | OpenShift BugZilla Robot <openshift-bugzilla-robot> |
| Component: | Cluster Version Operator | Assignee: | Over the Air Updates <aos-team-ota> |
| Status: | CLOSED ERRATA | QA Contact: | Yang Yang <yanyang> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 4.10 | CC: | acandelp, aos-team-ota, dhawker, gparente, lmohanty, oarribas, pkhaire, sdodson, wking, yanyang |
| Target Milestone: | --- | ||
| Target Release: | 4.11.z | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2022-08-23 15:10:51 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 2110590 | ||
| Bug Blocks: | |||
Comment 1
W. Trevor King
2022-08-02 22:20:42 UTC
Verifying the fix before the PR merges.
1. Install a cluster using cluster-bot
# oc get clusterversion
NAME      VERSION                                                   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.ci.test-2022-08-03-082604-ci-ln-i9hxzi2-latest   True        False         24m     Cluster version is 4.11.0-0.ci.test-2022-08-03-082604-ci-ln-i9hxzi2-latest
2. Create an SCC
# cat << EOF | oc create -f -
> allowHostDirVolumePlugin: true
> allowHostIPC: false
> allowHostNetwork: false
> allowHostPID: false
> allowHostPorts: false
> allowPrivilegeEscalation: true
> allowPrivilegedContainer: true
> allowedCapabilities: []
> apiVersion: security.openshift.io/v1
> defaultAddCapabilities: []
> fsGroup:
>   type: MustRunAs
> groups: []
> kind: SecurityContextConstraints
> metadata:
>   annotations:
>     meta.helm.sh/release-name: azure-arc
>     meta.helm.sh/release-namespace: default
>   labels:
>     app.kubernetes.io/managed-by: Helm
>   name: kube-aad-proxy-scc
> priority: null
> readOnlyRootFilesystem: true
> requiredDropCapabilities: []
> runAsUser:
>   type: RunAsAny
> seLinuxContext:
>   type: MustRunAs
> supplementalGroups:
>   type: RunAsAny
> users:
> - system:serviceaccount:azure-arc:azure-arc-kube-aad-proxy-sa
> volumes:
> - configMap
> - hostPath
> - secret
> EOF
securitycontextconstraints.security.openshift.io/kube-aad-proxy-scc created
3. Upgrade the cluster
# oc adm upgrade --to-image=registry.ci.openshift.org/ocp/release@sha256:2050173e8113faae2faadcff7b77346dab996705a68f2384fd5a2674c6e2a2ff --force --allow-explicit-upgrade
warning: The requested upgrade image is not one of the available updates.You have used --allow-explicit-upgrade for the update to proceed anyway
warning: --force overrides cluster verification of your supplied release image and waives any update precondition failures.
Requesting update to release image registry.ci.openshift.org/ocp/release@sha256:2050173e8113faae2faadcff7b77346dab996705a68f2384fd5a2674c6e2a2ff
# oc get all
NAME                                            READY   STATUS      RESTARTS   AGE
pod/cluster-version-operator-68d8868586-gxgl5   1/1     Running     0          6s
pod/version--n4dqx-2vl6d                        0/1     Completed   0          23s

NAME                               TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/cluster-version-operator   ClusterIP   172.30.205.94   <none>        9099/TCP   59m

NAME                                       READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/cluster-version-operator   1/1     1            1           58m

NAME                                                  DESIRED   CURRENT   READY   AGE
replicaset.apps/cluster-version-operator-5c8fd57fc8   0         0         0       58m
replicaset.apps/cluster-version-operator-68d8868586   1         1         1       6s

NAME                       COMPLETIONS   DURATION   AGE
job.batch/version--n4dqx   1/1           14s        23s
# oc get pod/version--n4dqx-2vl6d -oyaml | grep scc
openshift.io/scc: node-exporter
# oc adm upgrade
info: An upgrade is in progress. Working towards 4.12.0-0.nightly-2022-08-01-151317: 105 of 802 done (13% complete)
warning: Cannot display available updates:
Reason: NoChannel
Message: The update channel has not been configured.
The upgrade is proceeding. Looks good to me.
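
The `grep scc` check in step 3 can also be scripted. The following is a minimal sketch, not part of the original verification: `scc_of_pod_yaml` is a hypothetical helper name, and it assumes the pod YAML carries the annotation on a single line in the form `openshift.io/scc: <name>`, as in the output above.

```shell
# Hypothetical helper (not from the bug report): read pod YAML on stdin and
# print the value of the openshift.io/scc annotation, if present.
scc_of_pod_yaml() {
  # Strip the key and surrounding whitespace; print only the first match.
  sed -n 's/^[[:space:]]*openshift\.io\/scc:[[:space:]]*//p' | head -n 1
}

# Inline YAML stands in here for the output of `oc get pod <pod> -o yaml`.
printf '%s\n' \
  'metadata:' \
  '  annotations:' \
  '    openshift.io/scc: node-exporter' | scc_of_pod_yaml
```

Against a live cluster, the same function would be fed from `oc get pod/version--n4dqx-2vl6d -o yaml`, with the pod name taken from the `oc get all` output.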
Hello,

We have a new comment from this case (03272159):

---
This cluster has had this SCC installed since the initial install (4.8.13 back in November 2021), and has successfully upgraded without any issues through 4.8, 4.9 and to 4.10, until now. As the RCA has discovered, this is obviously a 4.10 bug, and a workaround or fix should be provided to enable the customer to upgrade. Not just when 4.12 lands, but also so that they can keep the platform up-to-date through 4.10 and 4.11.

Adjusting the custom SCC is not a viable workaround. This SCC has been deployed and configured by a 3rd party application (MS Azure Sentinel). This is used for monitoring of their Azure bound clusters, and hence a change will need to be raised with Microsoft. As their product is continuing to work as expected (and uses a pattern of least privilege that we recommend), it will be difficult to convince the customer to adjust this, to suit a bug on our part. It will also mean for any updates they will need to manually update the SCC and roll the change back afterwards. This is not a good experience.

Am raising an ACE to get some more eyes on this issue.
---

I see that the priority and severity are high, so is it possible to backport the fix to 4.10?

Many thanks in advance,
Adrián.

We're considering a 4.10.z backport in [1].

[1]: https://issues.redhat.com//browse/OCPBUGS-233

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.11.1 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:6103