I'm trying the pre-merge test by launching a cluster with cluster-bot:

launch openshift/cluster-version-operator#657 gcp

# oc get clusterversion
NAME      VERSION                                                  AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.ci.test-2021-09-16-032740-ci-ln-zrdr9p2-latest   True        False         4m37s   Cluster version is 4.9.0-0.ci.test-2021-09-16-032740-ci-ln-zrdr9p2-latest

The 4.9 deployment does not have the configmap cluster-autoscaler-operator-ca as a volume:

# oc get deploy -n openshift-machine-api cluster-autoscaler-operator -o json | jq .spec.template.spec.volumes[]
{
  "name": "cert",
  "secret": {
    "defaultMode": 420,
    "items": [
      {
        "key": "tls.crt",
        "path": "tls.crt"
      },
      {
        "key": "tls.key",
        "path": "tls.key"
      }
    ],
    "secretName": "cluster-autoscaler-operator-cert"
  }
}
{
  "configMap": {
    "defaultMode": 420,
    "name": "kube-rbac-proxy-cluster-autoscaler-operator"
  },
  "name": "auth-proxy-config"
}

The 4.9 cluster also does not have the configmap "cluster-autoscaler-operator-ca":

[root@preserve-yangyangmerrn-1 tmp]# oc get cm cluster-autoscaler-operator-ca -n openshift-machine-api
Error from server (NotFound): configmaps "cluster-autoscaler-operator-ca" not found
[root@preserve-yangyangmerrn-1 tmp]# oc project openshift-machine-api
Now using project "openshift-machine-api" on server "https://api.ci-ln-zrdr9p2-f76d1.origin-ci-int-gce.dev.openshift.com:6443".

Next, inject the volume and volumeMount into the cluster-autoscaler-operator deployment.
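As a side note, the volume and volumeMount injected below via `oc edit` could also be expressed as a JSON patch. A minimal sketch, with the patch document built from the same names used in this test (the patch itself is illustrative, not taken from the bug):

```python
import json

# Illustrative JSON patch mirroring the manual edit: append the ca-cert
# configMap volume and its mount on the first container. The "/-" paths
# assume the volumes/volumeMounts arrays already exist on the deployment.
patch = [
    {"op": "add", "path": "/spec/template/spec/volumes/-",
     "value": {"name": "ca-cert",
               "configMap": {"name": "cluster-autoscaler-operator-ca",
                             "items": [{"key": "service-ca.crt",
                                        "path": "ca-cert.pem"}]}}},
    {"op": "add", "path": "/spec/template/spec/containers/0/volumeMounts/-",
     "value": {"name": "ca-cert",
               "mountPath": "/etc/cluster-autoscaler-operator/tls/service-ca",
               "readOnly": True}},
]
print(json.dumps(patch, indent=2))
```

Such a document could be handed to `oc patch deployment/cluster-autoscaler-operator --type=json -p '…'` to get the same injected volume without an interactive edit.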
# oc edit deployment.apps/cluster-autoscaler-operator
deployment.apps/cluster-autoscaler-operator edited

The edited sections of the deployment (volumeMounts on the container, and the new volume):

        name: cluster-autoscaler-operator
        ports:
        - containerPort: 8443
          protocol: TCP
        resources:
          requests:
            cpu: 20m
            memory: 50Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: FallbackToLogsOnError
        volumeMounts:
        - name: ca-cert
          mountPath: /etc/cluster-autoscaler-operator/tls/service-ca
          readOnly: true
        - mountPath: /etc/cluster-autoscaler-operator/tls
          name: cert
          readOnly: true
      ...
      volumes:
      - name: ca-cert
        configMap:
          name: cluster-autoscaler-operator-ca
          items:
          - key: service-ca.crt
            path: ca-cert.pem

Now the deployment does have the configMap cluster-autoscaler-operator-ca as a volume:

# oc get deploy -n openshift-machine-api cluster-autoscaler-operator -o json | jq .spec.template.spec.volumes[]
{
  "configMap": {
    "defaultMode": 420,
    "items": [
      {
        "key": "service-ca.crt",
        "path": "ca-cert.pem"
      }
    ],
    "name": "cluster-autoscaler-operator-ca"
  },
  "name": "ca-cert"
}
{
  "name": "cert",
  "secret": {
    "defaultMode": 420,
    "items": [
      {
        "key": "tls.crt",
        "path": "tls.crt"
      },
      {
        "key": "tls.key",
        "path": "tls.key"
      }
    ],
    "secretName": "cluster-autoscaler-operator-cert"
  }
}
{
  "configMap": {
    "defaultMode": 420,
    "name": "kube-rbac-proxy-cluster-autoscaler-operator"
  },
  "name": "auth-proxy-config"
}

Watching the cluster-autoscaler-operator pods:

[root@preserve-yangyangmerrn-1 tmp]# oc get po --watch
NAME                                           READY   STATUS              RESTARTS      AGE
cluster-autoscaler-operator-584764d849-gx8x9   0/2     ContainerCreating   0             21s
cluster-autoscaler-operator-6448c6b7fd-t624p   2/2     Running             0             44m
cluster-baremetal-operator-9dbcfcff9-t4448     2/2     Running             1 (76m ago)   81m
machine-api-controllers-56f7897445-d9k8z       7/7     Running             0             76m
machine-api-operator-5fc7876cdf-25g75          2/2     Running             0             81m
cluster-autoscaler-operator-584764d849-gx8x9   0/2     Terminating         0             112s
cluster-autoscaler-operator-584764d849-gx8x9   0/2     Terminating         0             2m4s
cluster-autoscaler-operator-584764d849-gx8x9   0/2     Terminating         0             2m4s

# oc get po
NAME                                           READY   STATUS    RESTARTS      AGE
cluster-autoscaler-operator-6448c6b7fd-t624p   2/2     Running   0             52m
cluster-baremetal-operator-9dbcfcff9-t4448     2/2     Running   1 (83m ago)   88m
machine-api-controllers-56f7897445-d9k8z       7/7     Running   0             83m
machine-api-operator-5fc7876cdf-25g75          2/2     Running   0             88m

The new pod was terminated because the configmap "cluster-autoscaler-operator-ca" was not found:

# oc get event -n openshift-machine-api | grep cluster-autoscaler-operator-584764d849-gx8x9
6m57s   Normal    Scheduled          pod/cluster-autoscaler-operator-584764d849-gx8x9    Successfully assigned openshift-machine-api/cluster-autoscaler-operator-584764d849-gx8x9 to ci-ln-zrdr9p2-f76d1-cpg5p-master-0
5m53s   Warning   FailedMount        pod/cluster-autoscaler-operator-584764d849-gx8x9    MountVolume.SetUp failed for volume "ca-cert" : configmap "cluster-autoscaler-operator-ca" not found
4m54s   Warning   FailedMount        pod/cluster-autoscaler-operator-584764d849-gx8x9    Unable to attach or mount volumes: unmounted volumes=[ca-cert], unattached volumes=[kube-api-access-tzh9w ca-cert auth-proxy-config cert]: timed out waiting for the condition
6m57s   Normal    SuccessfulCreate   replicaset/cluster-autoscaler-operator-584764d849   Created pod: cluster-autoscaler-operator-584764d849-gx8x9
5m5s    Normal    SuccessfulDelete   replicaset/cluster-autoscaler-operator-584764d849   Deleted pod: cluster-autoscaler-operator-584764d849-gx8x9

The injected volume "ca-cert" (configmap "cluster-autoscaler-operator-ca") has been removed from the deployment:

# oc get deploy -n openshift-machine-api cluster-autoscaler-operator -o json | jq .spec.template.spec.volumes[]
{
  "name": "cert",
  "secret": {
    "defaultMode": 420,
    "items": [
      {
        "key": "tls.crt",
        "path": "tls.crt"
      },
      {
        "key": "tls.key",
        "path": "tls.key"
      }
    ],
    "secretName": "cluster-autoscaler-operator-cert"
  }
}
{
  "configMap": {
    "defaultMode": 420,
    "name": "kube-rbac-proxy-cluster-autoscaler-operator"
  },
  "name": "auth-proxy-config"
}
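The reconciliation outcome above (the injected volume being stripped back out) can be checked mechanically rather than by eyeballing the jq output. A small sketch, with the post-reconciliation volume list inlined as sample data; in practice it would come from `oc get deploy -o json`:

```python
import json

# Sample data standing in for the volumes left on the deployment after the
# operator reconciled it (the injected ca-cert volume is gone).
volumes = json.loads('''[
  {"name": "cert",
   "secret": {"secretName": "cluster-autoscaler-operator-cert"}},
  {"name": "auth-proxy-config",
   "configMap": {"name": "kube-rbac-proxy-cluster-autoscaler-operator"}}
]''')

# True when no volume references the cluster-autoscaler-operator-ca configMap.
injected_removed = not any(
    v.get("name") == "ca-cert"
    and v.get("configMap", {}).get("name") == "cluster-autoscaler-operator-ca"
    for v in volumes
)
print("injected volume removed:", injected_removed)
```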
To prove the procedure in comment#1 is suitable for testing this change, I performed the same test on a 4.8 cluster.

# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-09-15-162303   True        False         129m    Cluster version is 4.8.0-0.nightly-2021-09-15-162303

Inject the volume ca-cert (configMap: cluster-autoscaler-operator-ca) into the cluster-autoscaler-operator deployment.

# oc get pod
NAME                                           READY   STATUS              RESTARTS   AGE
cluster-autoscaler-operator-695cfd657-rfss9    2/2     Running             0          155m
cluster-autoscaler-operator-766f6648bd-sh2f7   0/2     ContainerCreating   0          10m
cluster-baremetal-operator-6468998c6b-tdwpt    2/2     Running             1          155m
machine-api-controllers-58cb4f598-hzm4t        7/7     Running             2          148m
machine-api-operator-b8cc66c9b-xj7gn           2/2     Running             1          155m

# oc get deploy -n openshift-machine-api cluster-autoscaler-operator -o json | jq .spec.template.spec.volumes[]
{
  "configMap": {
    "defaultMode": 420,
    "items": [
      {
        "key": "service-ca.crt",
        "path": "ca-cert.pem"
      }
    ],
    "name": "cluster-autoscaler-operator-ca"
  },
  "name": "ca-cert"
}
{
  "name": "cert",
  "secret": {
    "defaultMode": 420,
    "items": [
      {
        "key": "tls.crt",
        "path": "tls.crt"
      },
      {
        "key": "tls.key",
        "path": "tls.key"
      }
    ],
    "secretName": "cluster-autoscaler-operator-cert"
  }
}
{
  "configMap": {
    "defaultMode": 420,
    "name": "kube-rbac-proxy-cluster-autoscaler-operator"
  },
  "name": "auth-proxy-config"
}

On 4.8, the new cluster-autoscaler-operator pod stays stuck in ContainerCreating and the injected volume is never removed from the deployment.
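Putting the two runs side by side, the pass criterion this comparison relies on can be sketched as follows. The volume lists are abbreviated stand-ins for the jq output in the comments above, not live data, and the interpretation of the fix is my reading of the two runs:

```python
def injected_volume_present(volumes):
    """Return True if the hand-injected ca-cert volume is still on the deployment."""
    return any(v.get("name") == "ca-cert" for v in volumes)

# After reconciliation on the 4.9 cluster built with the PR (comment#1),
# the unknown injected volume is stripped back out:
volumes_49 = [{"name": "cert"}, {"name": "auth-proxy-config"}]
# After the same injection on 4.8 (this comment), the volume is kept and
# the new pod hangs in ContainerCreating:
volumes_48 = [{"name": "ca-cert"}, {"name": "cert"}, {"name": "auth-proxy-config"}]

assert not injected_volume_present(volumes_49)  # fixed cluster reverts it
assert injected_volume_present(volumes_48)      # 4.8 leaves it in place
print("behaviour matches the two runs described above")
```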
Verified with 4.9.0-0.nightly-2021-09-16-215330 following the procedure in comment#1; the test passed. Moving to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3759