Description of problem:

Upgrade to OCP 4.9.21 is halted because the "cluster-cloud-controller-manager" operator is unable to start and goes into CrashLoopBackOff.

Version-Release number of selected component (if applicable):
4.9.21

How reproducible:

Steps to Reproduce:
1. Try to upgrade the cluster from 4.8.24 to 4.9.21
2.
3.

Actual results:
cluster-cloud-controller-manager goes into CrashLoopBackOff and throws the error below:
~~~
2022-03-15T20:14:16.859746210Z I0315 20:14:16.858913 1 deleg.go:130] CCMOperator/controller-runtime/metrics "msg"="metrics server is starting to listen" "addr"=":8080"
2022-03-15T20:14:16.859746210Z E0315 20:14:16.859117 1 deleg.go:144] CCMOperator/controller-runtime/metrics "msg"="metrics server failed to listen. You may want to disable the metrics server or use another port if it is due to conflicts" "error"="error listening on :8080: listen tcp :8080: bind: address already in use"
2022-03-15T20:14:16.859746210Z E0315 20:14:16.859134 1 deleg.go:144] CCMOperator/setup "msg"="unable to start manager" "error"="error listening on :8080: listen tcp :8080: bind: address already in use"
~~~

Expected results:
The cluster-cloud-controller-manager pod should start without any error.

Additional info:
The OCP cluster is stuck in the middle of the upgrade from 4.8.24 to 4.9.21 because cluster-cloud-controller-manager is in CrashLoopBackOff. Upon checking, we noticed that the pod is scheduled on the master2 node, where port 8080 is already in use by the kube-apiserver.
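One way to confirm the port conflict on the affected node is shown below (a minimal sketch; the node name is hypothetical, and `ss` is assumed to be available on the host):

~~~
# Open a host shell on the affected master (hypothetical node name) and
# check which process is already bound to :8080.
oc debug node/<master2-node-name> -- chroot /host ss -tlnp | grep ':8080'
~~~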
Hi team,

Can we have an update on when this fix will be backported to 4.9?

Regards,
Nirupma
This is in the queue for QE to test; they should get to it soon.
Upgraded cluster from 4.8.39 to 4.9.0-0.nightly-2022-05-11-100812.

~~~
. .. . . .
05-12 12:40:35.091  clusteroperators:
05-12 12:40:35.091  NAME                      VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
05-12 12:40:35.091  authentication            4.9.0-0.nightly-2022-05-11-100812   True        False         False      39m
05-12 12:40:35.091  baremetal                 4.9.0-0.nightly-2022-05-11-100812   True        False         False      126m
05-12 12:40:35.091  cloud-controller-manager  4.9.0-0.nightly-2022-05-11-100812   True        False         False      60m
05-12 12:40:35.091  cloud-credential          4.9.0-0.nightly-2022-05-11-100812   True        False         False      135m
05-12 12:40:35.091  cluster-autoscaler        4.9.0-0.nightly-2022-05-11-100812   True        False         False      126m
05-12 12:40:35.091  config-operator           4.9.0-0.nightly-2022-05-11-100812   True        False         False      127m
05-12 12:40:35.091  console                   4.9.0-0.nightly-2022-05-11-100812   True        False         False      38m
05-12 12:40:35.091  csi-snapshot-controller   4.9.0-0.nightly-2022-05-11-100812   True        False         False      126m
05-12 12:40:35.091  dns                       4.9.0-0.nightly-2022-05-11-100812   True        False         False      126m
05-12 12:40:35.091  etcd                      4.9.0-0.nightly-2022-05-11-100812   True        False         False      126m
05-12 12:40:35.091  image-registry            4.9.0-0.nightly-2022-05-11-100812   True        False         False      120m
05-12 12:40:35.091  ingress                   4.9.0-0.nightly-2022-05-11-100812   True        False         False      119m
05-12 12:40:35.091  insights                  4.9.0-0.nightly-2022-05-11-100812   True        False         False      120m
05-12 12:40:35.091  kube-apiserver            4.9.0-0.nightly-2022-05-11-100812   True        False         False      124m
05-12 12:40:35.091  kube-controller-manager   4.9.0-0.nightly-2022-05-11-100812   True        False         False      124m
. . .
~~~

No backoff error:

~~~
oc get pod/cluster-cloud-controller-manager-operator-65b77dc777-nkqcs -n openshift-cloud-controller-manager-operator -o yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2022-05-12T07:27:26Z"
  generateName: cluster-cloud-controller-manager-operator-65b77dc777-
  labels:
    k8s-app: cloud-manager-operator
    pod-template-hash: 65b77dc777
  name: cluster-cloud-controller-manager-operator-65b77dc777-nkqcs
  namespace: openshift-cloud-controller-manager-operator
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: cluster-cloud-controller-manager-operator-65b77dc777
    uid: 0ff1a37f-f2b1-49c1-9632-2fbb5f88f23b
  resourceVersion: "105115"
  uid: 624ce15c-2d1a-4d4b-b7b6-915f73c7dd89
spec:
  containers:
  - command:
    - /bin/bash
    - -c
    - |
      #!/bin/bash
      set -o allexport
      if [[ -f /etc/kubernetes/apiserver-url.env ]]; then
        source /etc/kubernetes/apiserver-url.env
      else
        URL_ONLY_KUBECONFIG=/etc/kubernetes/kubeconfig
      fi
      exec /cluster-controller-manager-operator \
        --leader-elect=true \
        --leader-elect-lease-duration=137s \
        --leader-elect-renew-deadline=107s \
        --leader-elect-retry-period=26s \
        --leader-elect-resource-namespace=openshift-cloud-controller-manager-operator \
        "--images-json=/etc/cloud-controller-manager-config/images.json" \
        --metrics-bind-address=:9258 \
        --health-addr=127.0.0.1:9259
    env:
    - name: RELEASE_VERSION
      value: 4.9.0-0.nightly-2022-05-11-100812
    image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:375606eb429ffe7ef890295bf55c5122c300ad3879577629827dd9ddbdc191a9
    imagePullPolicy: IfNotPresent
    name: cluster-cloud-controller-manager
    ports:
    - containerPort: 9258
      hostPort: 9258
      name: metrics
      protocol: TCP
    - containerPort: 9259
      hostPort: 9259
      name: healthz
      protocol: TCP
    resources:
      requests:
        cpu: 10m
        memory: 50Mi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /etc/cloud-controller-manager-config/
      name: images
    - mountPath: /etc/kubernetes
      name: host-etc-kube
      readOnly: true
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-9j79z
      readOnly: true
  - command:
    - /bin/bash
    - -c
    - |
      #!/bin/bash
      set -o allexport
      if [[ -f /etc/kubernetes/apiserver-url.env ]]; then
        source /etc/kubernetes/apiserver-url.env
      else
        URL_ONLY_KUBECONFIG=/etc/kubernetes/kubeconfig
      fi
      exec /config-sync-controllers \
        --leader-elect=true \
        --leader-elect-lease-duration=137s \
        --leader-elect-renew-deadline=107s \
        --leader-elect-retry-period=26s \
        --leader-elect-resource-namespace=openshift-cloud-controller-manager-operator \
        --health-addr=127.0.0.1:9260
    image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:375606eb429ffe7ef890295bf55c5122c300ad3879577629827dd9ddbdc191a9
    imagePullPolicy: IfNotPresent
    name: config-sync-controllers
    ports:
    - containerPort: 9260
      hostPort: 9260
      name: healthz
      protocol: TCP
    resources:
      requests:
        cpu: 10m
        memory: 25Mi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /etc/kubernetes
      name: host-etc-kube
      readOnly: true
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-9j79z
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  hostNetwork: true
  imagePullSecrets:
  - name: cluster-cloud-controller-manager-dockercfg-ms79t
  nodeName: ip-10-0-52-239.us-east-2.compute.internal
  nodeSelector:
    node-role.kubernetes.io/master: ""
  preemptionPolicy: PreemptLowerPriority
  priority: 2000001000
  priorityClassName: system-node-critical
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: cluster-cloud-controller-manager
  serviceAccountName: cluster-cloud-controller-manager
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 120
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 120
  - effect: NoSchedule
    key: node.cloudprovider.kubernetes.io/uninitialized
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/not-ready
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  volumes:
  - configMap:
      defaultMode: 420
      name: cloud-controller-manager-images
    name: images
  - hostPath:
      path: /etc/kubernetes
      type: Directory
    name: host-etc-kube
  - name: kube-api-access-9j79z
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
      - configMap:
          items:
          - key: service-ca.crt
            path: service-ca.crt
          name: openshift-service-ca.crt
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2022-05-12T07:27:26Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2022-05-12T07:27:28Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2022-05-12T07:27:28Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2022-05-12T07:27:26Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: cri-o://8934c13b3c99a255ef7bef9bd3a1f91b1efc07bbdefcf030c899994d6575d307
    image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:375606eb429ffe7ef890295bf55c5122c300ad3879577629827dd9ddbdc191a9
    imageID: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:375606eb429ffe7ef890295bf55c5122c300ad3879577629827dd9ddbdc191a9
    lastState: {}
    name: cluster-cloud-controller-manager
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2022-05-12T07:27:27Z"
  - containerID: cri-o://9dfc15a88d27619ef47aba70ea06f15e4d16f288da09e873b1f36ce5eae8f845
    image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:375606eb429ffe7ef890295bf55c5122c300ad3879577629827dd9ddbdc191a9
    imageID: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:375606eb429ffe7ef890295bf55c5122c300ad3879577629827dd9ddbdc191a9
    lastState: {}
    name: config-sync-controllers
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2022-05-12T07:27:27Z"
  hostIP: 10.0.52.239
  phase: Running
  podIP: 10.0.52.239
  podIPs:
  - ip: 10.0.52.239
  qosClass: Burstable
  startTime: "2022-05-12T07:27:26Z"
~~~

Moving to verified based on these results.
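As an additional cross-check (a minimal sketch, assuming the default operator namespace and deployment name), the operator Deployment can be inspected to confirm that metrics are now bound to :9258 rather than the conflicting :8080:

~~~
# Print the metrics bind address configured in the operator command;
# on a fixed build this should show :9258, with no :8080 references.
oc -n openshift-cloud-controller-manager-operator get deployment \
  cluster-cloud-controller-manager-operator -o yaml | grep -E 'metrics-bind-address|:8080'
~~~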
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.9.33 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:2206