Bug 2110590 - Upgrade failing because restrictive scc is injected into version pod
Summary: Upgrade failing because restrictive scc is injected into version pod
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cluster Version Operator
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.12.0
Assignee: W. Trevor King
QA Contact: Yang Yang
URL:
Whiteboard:
Duplicates: 2108631
Depends On:
Blocks: 2114602
 
Reported: 2022-07-25 16:54 UTC by Gabriel Meghnagi
Modified: 2023-09-18 04:42 UTC
CC List: 14 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-01-17 19:53:27 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-version-operator pull 807 0 None open Bug 2110590: pkg/cvo/updatepayload: Set 'readOnlyRootFilesystem: false' 2022-07-29 03:09:54 UTC
Red Hat Issue Tracker OTA-680 0 None None None 2022-07-28 10:18:42 UTC
Red Hat Knowledge Base (Solution) 6969777 0 None None None 2022-07-28 14:40:44 UTC
Red Hat Product Errata RHSA-2022:7399 0 None None None 2023-01-17 19:53:58 UTC

Description Gabriel Meghnagi 2022-07-25 16:54:51 UTC
Description of problem:

OCP Upgrade failing

Version-Release number of the following components:

oc version
Client Version: 4.8.0-202108312109.p0.git.0d10c3f.assembly.stream-0d10c3f
Server Version: 4.10.13
Kubernetes Version: v1.23.5+b463d71

How reproducible: Always

Steps to Reproduce:
1. Create the following SCC (which has `readOnlyRootFilesystem: true`):
~~~
cat << EOF | oc create -f -
allowHostDirVolumePlugin: true
allowHostIPC: false
allowHostNetwork: false
allowHostPID: false
allowHostPorts: false
allowPrivilegeEscalation: true
allowPrivilegedContainer: true
allowedCapabilities: []
apiVersion: security.openshift.io/v1
defaultAddCapabilities: []
fsGroup:
  type: MustRunAs
groups: []
kind: SecurityContextConstraints
metadata:
  annotations:
    meta.helm.sh/release-name: azure-arc
    meta.helm.sh/release-namespace: default
  labels:
    app.kubernetes.io/managed-by: Helm
  name: kube-aad-proxy-scc
priority: null
readOnlyRootFilesystem: true
requiredDropCapabilities: []
runAsUser:
  type: RunAsAny
seLinuxContext:
  type: MustRunAs
supplementalGroups:
  type: RunAsAny
users:
- system:serviceaccount:azure-arc:azure-arc-kube-aad-proxy-sa
volumes:
- configMap
- hostPath
- secret
EOF
~~~

2. oc adm upgrade --to=4.10.20


Actual results:

The SCC kube-aad-proxy-scc, which has `readOnlyRootFilesystem: true`, is injected into the pod version-4.10.20-smvt9-6vqwc, causing it to fail.
~~~
# oc get po -n openshift-cluster-version
NAME                                        READY   STATUS    RESTARTS   AGE
cluster-version-operator-6b5c8ff5c8-4bmxx   1/1     Running   0          33m
version-4.10.20-smvt9-6vqwc                 0/1     Error     0          10s
# oc logs version-4.10.20-smvt9-6vqwc -n openshift-cluster-version
mv: cannot remove '/manifests/0000_00_cluster-version-operator_00_namespace.yaml': Read-only file system
mv: cannot remove '/manifests/0000_00_cluster-version-operator_01_adminack_configmap.yaml': Read-only file system
mv: cannot remove '/manifests/0000_00_cluster-version-operator_01_admingate_configmap.yaml': Read-only file system
mv: cannot remove '/manifests/0000_00_cluster-version-operator_01_clusteroperator.crd.yaml': Read-only file system
mv: cannot remove '/manifests/0000_00_cluster-version-operator_01_clusterversion.crd.yaml': Read-only file system
mv: cannot remove '/manifests/0000_00_cluster-version-operator_02_roles.yaml': Read-only file system
mv: cannot remove '/manifests/0000_00_cluster-version-operator_03_deployment.yaml': Read-only file system
mv: cannot remove '/manifests/0000_90_cluster-version-operator_00_prometheusrole.yaml': Read-only file system
mv: cannot remove '/manifests/0000_90_cluster-version-operator_01_prometheusrolebinding.yaml': Read-only file system
mv: cannot remove '/manifests/0000_90_cluster-version-operator_02_servicemonitor.yaml': Read-only file system
mv: cannot remove '/manifests/0001_00_cluster-version-operator_03_service.yaml': Read-only file system
~~~
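
A quick way to confirm which SCC admission applied is to read the pod's openshift.io/scc annotation; a minimal sketch, using the failing pod name from this report:
~~~
# Print the SCC recorded by admission on the failing version pod
oc get pod version-4.10.20-smvt9-6vqwc -n openshift-cluster-version \
  -o jsonpath='{.metadata.annotations.openshift\.io/scc}{"\n"}'
~~~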

Expected results:

Pod version-4.10.20-smvt9-6vqwc should run fine

Additional info:

I don't know why, but the SCC kube-aad-proxy-scc is injected into pod version-4.10.20-smvt9-6vqwc:
~~~
apiVersion: v1
kind: Pod
metadata:
  annotations:
    k8s.v1.cni.cncf.io/network-status: |-
      [{
          "name": "openshift-sdn",
          "interface": "eth0",
          "ips": [
              "10.129.0.70"
          ],
          "default": true,
          "dns": {}
      }]
    k8s.v1.cni.cncf.io/networks-status: |-
      [{
          "name": "openshift-sdn",
          "interface": "eth0",
          "ips": [
              "10.129.0.70"
          ],
          "default": true,
          "dns": {}
      }]
    openshift.io/scc: kube-aad-proxy-scc             ### HERE
  creationTimestamp: "2022-07-25T16:47:39Z"
  generateName: version-4.10.20-5xqtv-
  labels:
    controller-uid: ba707bbe-1825-4f80-89ce-f6bf2301a812
    job-name: version-4.10.20-5xqtv
  name: version-4.10.20-5xqtv-9gcwk
  namespace: openshift-cluster-version
  ownerReferences:
  - apiVersion: batch/v1
    blockOwnerDeletion: true
    controller: true
    kind: Job
    name: version-4.10.20-5xqtv
    uid: ba707bbe-1825-4f80-89ce-f6bf2301a812
  resourceVersion: "40040"
  uid: 0d668d3d-7452-463f-a421-4dfee9c89c23
spec:
  containers:
  - args:
    - -c
    - mkdir -p /etc/cvo/updatepayloads/KsrCX7X9QbtoXkW3TkPcww && mv /manifests /etc/cvo/updatepayloads/KsrCX7X9QbtoXkW3TkPcww/manifests
      && mkdir -p /etc/cvo/updatepayloads/KsrCX7X9QbtoXkW3TkPcww && mv /release-manifests
      /etc/cvo/updatepayloads/KsrCX7X9QbtoXkW3TkPcww/release-manifests
    command:
    - /bin/sh
    image: quay.io/openshift-release-dev/ocp-release@sha256:b89ada9261a1b257012469e90d7d4839d0d2f99654f5ce76394fa3f06522b600
    imagePullPolicy: IfNotPresent
    name: payload
    resources:
      requests:
        cpu: 10m
        ephemeral-storage: 2Mi
        memory: 50Mi
    securityContext:
      privileged: true
      readOnlyRootFilesystem: true
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /etc/cvo/updatepayloads
      name: payloads
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-fwblb
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  imagePullSecrets:
  - name: default-dockercfg-smmf4
  nodeName: ip-10-0-215-206.eu-central-1.compute.internal
  nodeSelector:
    node-role.kubernetes.io/master: ""
  preemptionPolicy: PreemptLowerPriority
  priority: 1000000000
  priorityClassName: openshift-user-critical
  restartPolicy: OnFailure
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 1000030000
    seLinuxOptions:
      level: s0:c6,c0
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - key: node-role.kubernetes.io/master
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  volumes:
  - hostPath:
      path: /etc/cvo/updatepayloads
      type: ""
    name: payloads
  - name: kube-api-access-fwblb
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
      - configMap:
          items:
          - key: service-ca.crt
            path: service-ca.crt
          name: openshift-service-ca.crt
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2022-07-25T16:47:39Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2022-07-25T16:47:39Z"
    message: 'containers with unready status: [payload]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2022-07-25T16:47:39Z"
    message: 'containers with unready status: [payload]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2022-07-25T16:47:39Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: cri-o://ac6f6a5d8925620f1a2835a50fe26ea02d35e3a5c2d033015f38fde5206daf8c
    image: quay.io/openshift-release-dev/ocp-release@sha256:b89ada9261a1b257012469e90d7d4839d0d2f99654f5ce76394fa3f06522b600
    imageID: quay.io/openshift-release-dev/ocp-release@sha256:b89ada9261a1b257012469e90d7d4839d0d2f99654f5ce76394fa3f06522b600
    lastState:
      terminated:
        containerID: cri-o://fdac85e975eb00a3abd08e18061ae3673a857769ddfc87ca94a3527a8c7b83f3
        exitCode: 1
        finishedAt: "2022-07-25T16:47:42Z"
        reason: Error
        startedAt: "2022-07-25T16:47:42Z"
    name: payload
    ready: false
    restartCount: 2
    started: false
    state:
      terminated:
        containerID: cri-o://ac6f6a5d8925620f1a2835a50fe26ea02d35e3a5c2d033015f38fde5206daf8c
        exitCode: 1
        finishedAt: "2022-07-25T16:47:56Z"
        reason: Error
        startedAt: "2022-07-25T16:47:56Z"
  hostIP: 10.0.215.206
  phase: Running
  podIP: 10.129.0.70
  podIPs:
  - ip: 10.129.0.70
  qosClass: Burstable
  startTime: "2022-07-25T16:47:39Z"
~~~

Comment 1 W. Trevor King 2022-07-26 05:26:28 UTC
Associating a restrictive SCC with the version-... pods is not a supported operation. We have [1] tracking an RFE for clearer reporting when this happens, but ideally folks help impacted customers by fixing whatever component is making these SCC associations.

But from [2]:

> When the complete set of available SCCs are determined they are ordered by:
>
> 1. Highest priority first, nil is considered a 0 priority
> 2. If priorities are equal, the SCCs will be sorted from most restrictive to least restrictive
> 3. If both priorities and restrictions are equal the SCCs will be sorted by name

And from [3]:

> The set of SCCs that admission uses to authorize a pod are determined by the user identity and groups that the user belongs to. Additionally, if the pod specifies a service account, the set of allowable SCCs includes any constraints accessible to the service account.
>
> Admission uses the following approach to create the final security context for the pod:
>
> 1. Retrieve all SCCs available for use.
> 2. Generate field values for security context settings that were not specified on the request.
> 3. Validate the final settings against the available constraints.

Your kube-aad-proxy-scc has 'priority: null', which should put it at the bottom of the ranking of relevant-to-this-pod SCCs.  I'll poke around and see what the default SCCs look like...

[1]: https://issues.redhat.com/browse/OTA-680
[2]: https://docs.openshift.com/container-platform/4.10/authentication/managing-security-context-constraints.html#scc-prioritization_configuring-internal-oauth
[3]: https://docs.openshift.com/container-platform/4.10/authentication/managing-security-context-constraints.html#admission_configuring-internal-oauth
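
To see how the SCCs on a cluster would rank under those rules, listing each SCC's priority is a useful first step; a minimal sketch (custom-columns output, where <none> means nil, i.e. priority 0):
~~~
# List every SCC with its priority; ties fall through to restrictiveness, then name
oc get scc -o custom-columns=NAME:.metadata.name,PRIORITY:.priority
~~~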

Comment 2 W. Trevor King 2022-07-26 05:36:34 UTC
Also likely relevant, 4.10 both grew pod-security.kubernetes.io/* annotations [1] and cleared the openshift.io/run-level annotation [2].

$ git --no-pager log --oneline -3 origin/release-4.10 -- install/0000_00_cluster-version-operator_00_namespace.yaml
539e9449 (origin/pr/623) Fix run-level label to empty string.
f58dd1c5 (origin/pr/686) install: Add description annotations to manifests
6e5e23e3 (origin/pr/668) podsecurity: enforce privileged for openshift-cluster-version namespace

None of those were in 4.9:

$ git --no-pager log --oneline -1 origin/release-4.9 -- install/0000_00_cluster-version-operator_00_namespace.yaml
70097361 (origin/pr/543) Add management workload annotations

And all of them landed in 4.10 via master (so they're in 4.10 before it GAed, and in 4.11 and later too):

$ git --no-pager log --oneline -4 origin/master -- install/0000_00_cluster-version-operator_00_namespace.yaml
539e9449 (origin/pr/623) Fix run-level label to empty string.

[1]: https://github.com/openshift/cluster-version-operator/pull/668
[2]: https://github.com/openshift/cluster-version-operator/pull/623
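
To check what those changes actually look like on a given cluster, something like the following works (a sketch; in 4.10+ the pod-security.kubernetes.io/* and run-level markings live on the openshift-cluster-version namespace):
~~~
# Show the pod-security and run-level markings on the CVO namespace
oc get namespace openshift-cluster-version -o yaml | grep -E 'pod-security|run-level'
~~~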

Comment 3 Gabriel Meghnagi 2022-07-26 07:26:23 UTC
Hi Trevor,

"Associating a restrictive SCC with the version-... pods is not a supported operation."
This is something that happens automatically. I don't know how, but the version-... pod is created with this SCC associated; this is not something under our control. That is why I opened this bug: to investigate why this SCC is associated with the version-... pod.

I hope I have clarified the situation.

I remain available in case you have any other questions, thank you!

Comment 4 Scott Dodson 2022-07-27 13:04:15 UTC
*** Bug 2108631 has been marked as a duplicate of this bug. ***

Comment 5 Yang Yang 2022-07-28 08:13:19 UTC
Reproducing it from the QE side:

1. Install a 4.10 cluster
2. Create SCC as mentioned in the Description
3. Upgrade the cluster
# oc adm upgrade --to-latest

# oc adm upgrade
info: An upgrade is in progress. Working towards 4.10.13: 60 of 771 done (7% complete)

ReleaseAccepted=False

  Reason: RetrievePayload
  Message: Retrieving payload failed version="4.10.23" image="quay.io/openshift-release-dev/ocp-release@sha256:e40e49d722cb36a95fa1c03002942b967ccbd7d68de10e003f0baa69abad457b" failure=Unable to download and prepare the update: deadline exceeded, reason: "DeadlineExceeded", message: "Job was active longer than specified deadline"

Upstream is unset, so the cluster will use an appropriate default.
Channel: stable-4.10 (available channels: candidate-4.10, candidate-4.11, eus-4.10, fast-4.10, stable-4.10)
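
The same failure is also visible directly on the ClusterVersion object; a minimal sketch of pulling just that condition:
~~~
# Read the ReleaseAccepted condition message from the ClusterVersion resource
oc get clusterversion version \
  -o jsonpath='{.status.conditions[?(@.type=="ReleaseAccepted")].message}{"\n"}'
~~~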

# oc get pod/version-4.10.23-bqwql-mhcgc -n openshift-cluster-version -oyaml | grep scc
openshift.io/scc: kube-aad-proxy-scc

# oc logs pod/version-4.10.23-bqwql-mhcgc -n openshift-cluster-version
mv: cannot remove '/manifests/0000_00_cluster-version-operator_00_namespace.yaml': Read-only file system
mv: cannot remove '/manifests/0000_00_cluster-version-operator_01_adminack_configmap.yaml': Read-only file system
mv: cannot remove '/manifests/0000_00_cluster-version-operator_01_admingate_configmap.yaml': Read-only file system
mv: cannot remove '/manifests/0000_00_cluster-version-operator_01_clusteroperator.crd.yaml': Read-only file system
mv: cannot remove '/manifests/0000_00_cluster-version-operator_01_clusterversion.crd.yaml': Read-only file system
mv: cannot remove '/manifests/0000_00_cluster-version-operator_02_roles.yaml': Read-only file system
mv: cannot remove '/manifests/0000_00_cluster-version-operator_03_deployment.yaml': Read-only file system
mv: cannot remove '/manifests/0000_90_cluster-version-operator_00_prometheusrole.yaml': Read-only file system
mv: cannot remove '/manifests/0000_90_cluster-version-operator_01_prometheusrolebinding.yaml': Read-only file system
mv: cannot remove '/manifests/0000_90_cluster-version-operator_02_servicemonitor.yaml': Read-only file system
mv: cannot remove '/manifests/0001_00_cluster-version-operator_03_service.yaml': Read-only file system

Okay, it's reproduced.

Comment 9 Lalatendu Mohanty 2022-07-28 18:43:23 UTC
I am not sure how the version-4.10.20-smvt9-6vqwc pod got created in the openshift-cluster-version namespace. Also, why do customers need the SCC that caused the issue?

Comment 10 Lalatendu Mohanty 2022-07-28 20:01:38 UTC
> I am not sure how the version-4.10.20-smvt9-6vqwc pod got created in the openshift-cluster-version namespace.

The CVO creates this as part of the upgrade process. I was not aware of this. But I am still not clear why the SCC was attached to it.

Comment 11 W. Trevor King 2022-07-29 03:12:08 UTC
I'm still a bit fuzzy about the corners (which of the changes from comment 2, or both, made this an issue in 4.10+?), but Oscar's point at [1] was enough for me to open a pull request. If I'm still misunderstanding something about the motivation, wording suggestions for the commit message are welcome :)

[1]: https://github.com/openshift/cluster-openshift-apiserver-operator/pull/437/files
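
That pull request (openshift/cluster-version-operator#807 in the Links table above) explicitly sets 'readOnlyRootFilesystem: false' on the version pod's containers, so an injected SCC's default can no longer flip the root filesystem read-only. A minimal sketch for confirming the rendered field on a fixed cluster:
~~~
# Inspect the securityContext of a version-* pod (pods created by the version Job)
oc get pods -n openshift-cluster-version -l job-name -o yaml | grep -A2 'securityContext:'
# Expected on a fixed release (cf. the verification output below):
#       privileged: true
#       readOnlyRootFilesystem: false
~~~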

Comment 12 W. Trevor King 2022-07-29 03:12:49 UTC
And it's pretty clear that this is a new-in-4.10 issue, so blocker-, and we'll keep shipping releases until this fix goes out (hopefully soon).

Comment 14 Yang Yang 2022-08-01 04:14:01 UTC
Verifying on 4.12.0-0.nightly-2022-07-31-185642

1. Install a cluster with 4.12.0-0.nightly-2022-07-31-185642 payload
2. Create SCC
# cat << EOF | oc create -f -
> allowHostDirVolumePlugin: true
> allowHostIPC: false
> allowHostNetwork: false
> allowHostPID: false
> allowHostPorts: false
> allowPrivilegeEscalation: true
> allowPrivilegedContainer: true
> allowedCapabilities: []
> apiVersion: security.openshift.io/v1
> defaultAddCapabilities: []
> fsGroup:
>   type: MustRunAs
> groups: []
> kind: SecurityContextConstraints
> metadata:
>   annotations:
>     meta.helm.sh/release-name: azure-arc
>     meta.helm.sh/release-namespace: default
>   labels:
>     app.kubernetes.io/managed-by: Helm
>   name: kube-aad-proxy-scc
> priority: null
> readOnlyRootFilesystem: true
> requiredDropCapabilities: []
> runAsUser:
>   type: RunAsAny
> seLinuxContext:
>   type: MustRunAs
> supplementalGroups:
>   type: RunAsAny
> users:
> - system:serviceaccount:azure-arc:azure-arc-kube-aad-proxy-sa
> volumes:
> - configMap
> - hostPath
> - secret
> EOF
securitycontextconstraints.security.openshift.io/kube-aad-proxy-scc created

3. Check all the SCCs
# oc get scc
NAME                              PRIV    CAPS                   SELINUX     RUNASUSER          FSGROUP     SUPGROUP    PRIORITY     READONLYROOTFS   VOLUMES
anyuid                            false   <no value>             MustRunAs   RunAsAny           RunAsAny    RunAsAny    10           false            ["configMap","downwardAPI","emptyDir","persistentVolumeClaim","projected","secret"]
hostaccess                        false   <no value>             MustRunAs   MustRunAsRange     MustRunAs   RunAsAny    <no value>   false            ["configMap","downwardAPI","emptyDir","hostPath","persistentVolumeClaim","projected","secret"]
hostmount-anyuid                  false   <no value>             MustRunAs   RunAsAny           RunAsAny    RunAsAny    <no value>   false            ["configMap","downwardAPI","emptyDir","hostPath","nfs","persistentVolumeClaim","projected","secret"]
hostnetwork                       false   <no value>             MustRunAs   MustRunAsRange     MustRunAs   MustRunAs   <no value>   false            ["configMap","downwardAPI","emptyDir","persistentVolumeClaim","projected","secret"]
hostnetwork-v2                    false   ["NET_BIND_SERVICE"]   MustRunAs   MustRunAsRange     MustRunAs   MustRunAs   <no value>   false            ["configMap","downwardAPI","emptyDir","persistentVolumeClaim","projected","secret"]
kube-aad-proxy-scc                true    []                     MustRunAs   RunAsAny           MustRunAs   RunAsAny    <no value>   true             ["configMap","hostPath","secret"]
machine-api-termination-handler   false   <no value>             MustRunAs   RunAsAny           MustRunAs   MustRunAs   <no value>   false            ["downwardAPI","hostPath"]
node-exporter                     true    <no value>             RunAsAny    RunAsAny           RunAsAny    RunAsAny    <no value>   false            ["*"]
nonroot                           false   <no value>             MustRunAs   MustRunAsNonRoot   RunAsAny    RunAsAny    <no value>   false            ["configMap","downwardAPI","emptyDir","persistentVolumeClaim","projected","secret"]
nonroot-v2                        false   ["NET_BIND_SERVICE"]   MustRunAs   MustRunAsNonRoot   RunAsAny    RunAsAny    <no value>   false            ["configMap","downwardAPI","emptyDir","persistentVolumeClaim","projected","secret"]
privileged                        true    ["*"]                  RunAsAny    RunAsAny           RunAsAny    RunAsAny    <no value>   false            ["*"]
restricted                        false   <no value>             MustRunAs   MustRunAsRange     MustRunAs   RunAsAny    <no value>   false            ["configMap","downwardAPI","emptyDir","persistentVolumeClaim","projected","secret"]
restricted-v2                     false   ["NET_BIND_SERVICE"]   MustRunAs   MustRunAsRange     MustRunAs   RunAsAny    <no value>   false            ["configMap","downwardAPI","emptyDir","persistentVolumeClaim","projected","secret"]

4. Upgrade the cluster
# oc adm upgrade --to-image=registry.ci.openshift.org/ocp/release@sha256:36a78e1b1d004f1acc8ddc9ce5e5294a09719426ad977e7ec296779446380fd4 --allow-explicit-upgrade --force
warning: The requested upgrade image is not one of the available updates. You have used --allow-explicit-upgrade for the update to proceed anyway
warning: --force overrides cluster verification of your supplied release image and waives any update precondition failures.
Requesting update to release image registry.ci.openshift.org/ocp/release@sha256:36a78e1b1d004f1acc8ddc9ce5e5294a09719426ad977e7ec296779446380fd4

The version pod completed:

# oc get po -n openshift-cluster-version
NAME                                        READY   STATUS        RESTARTS   AGE
cluster-version-operator-5c4695fb4b-5mj7f   1/1     Terminating   0          35m
version--zfh5q-gp5kb                        0/1     Completed     0          17s

The node-exporter SCC is selected and readOnlyRootFilesystem is set to false:

# oc get pod/version--zfh5q-gp5kb -n openshift-cluster-version -oyaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    k8s.v1.cni.cncf.io/network-status: |-
      [{
          "name": "openshift-sdn",
          "interface": "eth0",
          "ips": [
              "10.129.0.49"
          ],
          "default": true,
          "dns": {}
      }]
    k8s.v1.cni.cncf.io/networks-status: |-
      [{
          "name": "openshift-sdn",
          "interface": "eth0",
          "ips": [
              "10.129.0.49"
          ],
          "default": true,
          "dns": {}
      }]
    openshift.io/scc: node-exporter        #### node-exporter is selected
  creationTimestamp: "2022-08-01T03:20:13Z"
  generateName: version--zfh5q-
  labels:
    controller-uid: 6300536b-1975-4716-9e81-7dfa32d0cbb8
    job-name: version--zfh5q
  name: version--zfh5q-gp5kb
  namespace: openshift-cluster-version
  ownerReferences:
  - apiVersion: batch/v1
    blockOwnerDeletion: true
    controller: true
    kind: Job
    name: version--zfh5q
    uid: 6300536b-1975-4716-9e81-7dfa32d0cbb8
  resourceVersion: "34008"
  uid: 0d6daee6-2b07-4cf1-915b-a2a664620d8a
spec:
  containers:
  - command:
    - mv
    - /etc/cvo/updatepayloads/jyGyVbpNDo9kcnOEanKBeg-cvtb6
    - /etc/cvo/updatepayloads/jyGyVbpNDo9kcnOEanKBeg
    image: registry.ci.openshift.org/ocp/release@sha256:36a78e1b1d004f1acc8ddc9ce5e5294a09719426ad977e7ec296779446380fd4
    imagePullPolicy: IfNotPresent
    name: rename-to-final-location
    resources:
      requests:
        cpu: 10m
        ephemeral-storage: 2Mi
        memory: 50Mi
    securityContext:
      privileged: true
      readOnlyRootFilesystem: false              #### It's explicitly false 
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /etc/cvo/updatepayloads
      name: payloads
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-zpfsh
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  imagePullSecrets:
  - name: default-dockercfg-nv6r4
  initContainers:
  - command:
    - sh
    - -c
    - rm -fR ./*
    image: registry.ci.openshift.org/ocp/release@sha256:36a78e1b1d004f1acc8ddc9ce5e5294a09719426ad977e7ec296779446380fd4
    imagePullPolicy: IfNotPresent
    name: cleanup
    resources:
      requests:
        cpu: 10m
        ephemeral-storage: 2Mi
        memory: 50Mi
    securityContext:
      privileged: true
      readOnlyRootFilesystem: false
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /etc/cvo/updatepayloads
      name: payloads
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-zpfsh
      readOnly: true
    workingDir: /etc/cvo/updatepayloads/
  - command:
    - mkdir
    - /etc/cvo/updatepayloads/jyGyVbpNDo9kcnOEanKBeg-cvtb6
    image: registry.ci.openshift.org/ocp/release@sha256:36a78e1b1d004f1acc8ddc9ce5e5294a09719426ad977e7ec296779446380fd4
    imagePullPolicy: IfNotPresent
    name: make-temporary-directory
    resources:
      requests:
        cpu: 10m
        ephemeral-storage: 2Mi
        memory: 50Mi
    securityContext:
      privileged: true
      readOnlyRootFilesystem: false
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /etc/cvo/updatepayloads
      name: payloads
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-zpfsh
      readOnly: true
  - command:
    - mv
    - /manifests
    - /etc/cvo/updatepayloads/jyGyVbpNDo9kcnOEanKBeg-cvtb6/manifests
    image: registry.ci.openshift.org/ocp/release@sha256:36a78e1b1d004f1acc8ddc9ce5e5294a09719426ad977e7ec296779446380fd4
    imagePullPolicy: IfNotPresent
    name: move-operator-manifests-to-temporary-directory
    resources:
      requests:
        cpu: 10m
        ephemeral-storage: 2Mi
        memory: 50Mi
    securityContext:
      privileged: true
      readOnlyRootFilesystem: false
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /etc/cvo/updatepayloads
      name: payloads
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-zpfsh
      readOnly: true
  - command:
    - mv
    - /release-manifests
    - /etc/cvo/updatepayloads/jyGyVbpNDo9kcnOEanKBeg-cvtb6/release-manifests
    image: registry.ci.openshift.org/ocp/release@sha256:36a78e1b1d004f1acc8ddc9ce5e5294a09719426ad977e7ec296779446380fd4
    imagePullPolicy: IfNotPresent
    name: move-release-manifests-to-temporary-directory
    resources:
      requests:
        cpu: 10m
        ephemeral-storage: 2Mi
        memory: 50Mi
    securityContext:
      privileged: true
      readOnlyRootFilesystem: false
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /etc/cvo/updatepayloads
      name: payloads
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-zpfsh
      readOnly: true
  nodeName: yanyang-0801a-49hnl-master-1.c.openshift-qe.internal
  nodeSelector:
    node-role.kubernetes.io/master: ""
  preemptionPolicy: PreemptLowerPriority
  priority: 1000000000
  priorityClassName: openshift-user-critical
  restartPolicy: OnFailure
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - key: node-role.kubernetes.io/master
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  volumes:
  - hostPath:
      path: /etc/cvo/updatepayloads
      type: ""
    name: payloads
  - name: kube-api-access-zpfsh
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
      - configMap:
          items:
          - key: service-ca.crt
            path: service-ca.crt
          name: openshift-service-ca.crt
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2022-08-01T03:20:21Z"
    reason: PodCompleted
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2022-08-01T03:20:13Z"
    reason: PodCompleted
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2022-08-01T03:20:13Z"
    reason: PodCompleted
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2022-08-01T03:20:13Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: cri-o://fa45c5aec28ab203d12e4103c7b9664e2bc2b097c67216f4a4b72c0acb2f1cf1
    image: registry.ci.openshift.org/ocp/release@sha256:36a78e1b1d004f1acc8ddc9ce5e5294a09719426ad977e7ec296779446380fd4
    imageID: registry.ci.openshift.org/ocp/release@sha256:36a78e1b1d004f1acc8ddc9ce5e5294a09719426ad977e7ec296779446380fd4
    lastState: {}
    name: rename-to-final-location
    ready: false
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: cri-o://fa45c5aec28ab203d12e4103c7b9664e2bc2b097c67216f4a4b72c0acb2f1cf1
        exitCode: 0
        finishedAt: "2022-08-01T03:20:21Z"
        reason: Completed
        startedAt: "2022-08-01T03:20:21Z"
  hostIP: 10.0.0.3
  initContainerStatuses:
  - containerID: cri-o://8dc3e78b1a1511ac00c487a5a3cca65acd11a4e7fc4e7bff64c885a1a47ba7fe
    image: registry.ci.openshift.org/ocp/release@sha256:36a78e1b1d004f1acc8ddc9ce5e5294a09719426ad977e7ec296779446380fd4
    imageID: registry.ci.openshift.org/ocp/release@sha256:36a78e1b1d004f1acc8ddc9ce5e5294a09719426ad977e7ec296779446380fd4
    lastState: {}
    name: cleanup
    ready: true
    restartCount: 0
    state:
      terminated:
        containerID: cri-o://8dc3e78b1a1511ac00c487a5a3cca65acd11a4e7fc4e7bff64c885a1a47ba7fe
        exitCode: 0
        finishedAt: "2022-08-01T03:20:18Z"
        reason: Completed
        startedAt: "2022-08-01T03:20:18Z"
  - containerID: cri-o://9824f8a3b742ad8c208c6efe257adae4808d3ce5bb6312e1b40cfccf2ad33209
    image: registry.ci.openshift.org/ocp/release@sha256:36a78e1b1d004f1acc8ddc9ce5e5294a09719426ad977e7ec296779446380fd4
    imageID: registry.ci.openshift.org/ocp/release@sha256:36a78e1b1d004f1acc8ddc9ce5e5294a09719426ad977e7ec296779446380fd4
    lastState: {}
    name: make-temporary-directory
    ready: true
    restartCount: 0
    state:
      terminated:
        containerID: cri-o://9824f8a3b742ad8c208c6efe257adae4808d3ce5bb6312e1b40cfccf2ad33209
        exitCode: 0
        finishedAt: "2022-08-01T03:20:18Z"
        reason: Completed
        startedAt: "2022-08-01T03:20:18Z"
  - containerID: cri-o://0019d96fc44b5b8fa0456ce88e1f278b2e06224158693246050e26ce77764d17
    image: registry.ci.openshift.org/ocp/release@sha256:36a78e1b1d004f1acc8ddc9ce5e5294a09719426ad977e7ec296779446380fd4
    imageID: registry.ci.openshift.org/ocp/release@sha256:36a78e1b1d004f1acc8ddc9ce5e5294a09719426ad977e7ec296779446380fd4
    lastState: {}
    name: move-operator-manifests-to-temporary-directory
    ready: true
    restartCount: 0
    state:
      terminated:
        containerID: cri-o://0019d96fc44b5b8fa0456ce88e1f278b2e06224158693246050e26ce77764d17
        exitCode: 0
        finishedAt: "2022-08-01T03:20:19Z"
        reason: Completed
        startedAt: "2022-08-01T03:20:19Z"
  - containerID: cri-o://1858651805c9fb7216622a0270ec3de38a1cd83810b83afcf50becc063ac1041
    image: registry.ci.openshift.org/ocp/release@sha256:36a78e1b1d004f1acc8ddc9ce5e5294a09719426ad977e7ec296779446380fd4
    imageID: registry.ci.openshift.org/ocp/release@sha256:36a78e1b1d004f1acc8ddc9ce5e5294a09719426ad977e7ec296779446380fd4
    lastState: {}
    name: move-release-manifests-to-temporary-directory
    ready: true
    restartCount: 0
    state:
      terminated:
        containerID: cri-o://1858651805c9fb7216622a0270ec3de38a1cd83810b83afcf50becc063ac1041
        exitCode: 0
        finishedAt: "2022-08-01T03:20:21Z"
        reason: Completed
        startedAt: "2022-08-01T03:20:20Z"
  phase: Succeeded
  podIP: 10.129.0.49
  podIPs:
  - ip: 10.129.0.49
  qosClass: Burstable
  startTime: "2022-08-01T03:20:13Z"

Upgrade is successful.

# oc adm upgrade 
warning: Cannot display available updates:
  Reason: VersionNotFound
  Message: Unable to retrieve available updates: currently reconciling cluster version 4.12.0-0.nightly-2022-07-31-235028 not found in the "stable-4.11" channel

Cluster version is 4.12.0-0.nightly-2022-07-31-235028

Upstream is unset, so the cluster will use an appropriate default.
Channel: stable-4.11

All cluster operators (COs) are happy:

# oc get co
NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.12.0-0.nightly-2022-07-31-235028   True        False         False      68m     
baremetal                                  4.12.0-0.nightly-2022-07-31-235028   True        False         False      82m     
cloud-controller-manager                   4.12.0-0.nightly-2022-07-31-235028   True        False         False      85m     
cloud-credential                           4.12.0-0.nightly-2022-07-31-235028   True        False         False      85m     
cluster-autoscaler                         4.12.0-0.nightly-2022-07-31-235028   True        False         False      82m     
config-operator                            4.12.0-0.nightly-2022-07-31-235028   True        False         False      83m     
console                                    4.12.0-0.nightly-2022-07-31-235028   True        False         False      73m     
csi-snapshot-controller                    4.12.0-0.nightly-2022-07-31-235028   True        False         False      83m     
dns                                        4.12.0-0.nightly-2022-07-31-235028   True        False         False      82m     
etcd                                       4.12.0-0.nightly-2022-07-31-235028   True        False         False      81m     
image-registry                             4.12.0-0.nightly-2022-07-31-235028   True        False         False      75m     
ingress                                    4.12.0-0.nightly-2022-07-31-235028   True        False         False      74m     
insights                                   4.12.0-0.nightly-2022-07-31-235028   True        False         False      76m     
kube-apiserver                             4.12.0-0.nightly-2022-07-31-235028   True        False         False      78m     
kube-controller-manager                    4.12.0-0.nightly-2022-07-31-235028   True        False         False      80m     
kube-scheduler                             4.12.0-0.nightly-2022-07-31-235028   True        False         False      79m     
kube-storage-version-migrator              4.12.0-0.nightly-2022-07-31-235028   True        False         False      83m     
machine-api                                4.12.0-0.nightly-2022-07-31-235028   True        False         False      76m     
machine-approver                           4.12.0-0.nightly-2022-07-31-235028   True        False         False      82m     
machine-config                             4.12.0-0.nightly-2022-07-31-235028   True        False         False      76m     
marketplace                                4.12.0-0.nightly-2022-07-31-235028   True        False         False      82m     
monitoring                                 4.12.0-0.nightly-2022-07-31-235028   True        False         False      73m     
network                                    4.12.0-0.nightly-2022-07-31-235028   True        False         False      84m     
node-tuning                                4.12.0-0.nightly-2022-07-31-235028   True        False         False      40m     
openshift-apiserver                        4.12.0-0.nightly-2022-07-31-235028   True        False         False      76m     
openshift-controller-manager               4.12.0-0.nightly-2022-07-31-235028   True        False         False      79m     
openshift-samples                          4.12.0-0.nightly-2022-07-31-235028   True        False         False      43m     
operator-lifecycle-manager                 4.12.0-0.nightly-2022-07-31-235028   True        False         False      83m     
operator-lifecycle-manager-catalog         4.12.0-0.nightly-2022-07-31-235028   True        False         False      83m     
operator-lifecycle-manager-packageserver   4.12.0-0.nightly-2022-07-31-235028   True        False         False      76m     
service-ca                                 4.12.0-0.nightly-2022-07-31-235028   True        False         False      83m     
storage                                    4.12.0-0.nightly-2022-07-31-235028   True        False         False      82m  

Looks good to me. Moving it to verified state.

Comment 16 Standa Laznicka 2022-08-23 11:51:20 UTC
As an easy workaround for the issue, remove the SCC for the period while the CVO version pods are being redeployed, until they are running; you can recreate the SCC afterwards.
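
A minimal sketch of that workaround, using the SCC name from this report:
~~~
# Save the SCC, remove it while the version-* pod runs, then restore it
oc get scc kube-aad-proxy-scc -o yaml > kube-aad-proxy-scc.yaml
oc delete scc kube-aad-proxy-scc
# ...wait for the version-* pod in openshift-cluster-version to complete, then:
oc create -f kube-aad-proxy-scc.yaml
~~~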

Comment 17 harprsin 2022-10-11 05:24:59 UTC
Are there any plans to get a fix into OCP 4.10?

Comment 18 W. Trevor King 2022-10-12 06:20:24 UTC
If you click "Show advanced fields", you can see:

  Blocks: 2114602

Heading back to the 4.11.z bug 2114602, you can see it shipped in 4.11.1 [1]. And there's also an exciting transition to Jira for 4.10.z [2]. Following along, [3] has a comment linking to [4]. And clicking through to [4] (no convenient inline version numbers in Jira errata link comments yet), we can see that the fix shipped in 4.10.30. Folks can use comment 16's mitigation to get themselves out to a release with 'readOnlyRootFilesystem: false', but after that, no further workarounds should be required.

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=2114602#c9
[2]: https://bugzilla.redhat.com/show_bug.cgi?id=2114602#c7
[3]: https://issues.redhat.com//browse/OCPBUGS-233
[4]: https://access.redhat.com/errata/RHSA-2022:6133

Comment 21 errata-xmlrpc 2023-01-17 19:53:27 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7399

Comment 22 Red Hat Bugzilla 2023-09-18 04:42:56 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days.

