Description of problem:

During an OCP upgrade on a single-node cluster, the newer cluster-cloud-controller-manager-operator deployment doesn't start because the new deployment pods fail with this error message:

"message": "0/1 nodes are available: 1 node(s) didn't have free ports for the requested pod ports.",

Version-Release number of selected component (if applicable):

Since 4.9.0-0.ci-2021-08-04-140439

How reproducible:

Happens during every upgrade; see the full grid here:
https://testgrid.k8s.io/redhat-openshift-ocp-release-4.9-informing#periodic-ci-openshift-release-master-ci-4.9-e2e-aws-upgrade-single-node

Two example jobs:
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.9-e2e-aws-upgrade-single-node/1422999335032852480
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.9-e2e-aws-upgrade-single-node/1423724381233745920

Steps to Reproduce:
1. Run an OCP upgrade

Actual results:
The operator doesn't upgrade because the deployment can't start.

Expected results:
The operator should upgrade successfully.

Additional info:
Seems related to https://github.com/openshift/cluster-cloud-controller-manager-operator/pull/101
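For anyone triaging this: "didn't have free ports" is the scheduler message you get when a pod that binds host ports is rolled out with the default RollingUpdate strategy on a single node, since the old pod still holds the port while the new ReplicaSet's pod waits to schedule. A rough way to confirm on a live cluster (a sketch; the namespace/deployment names are taken from the failing operator, and whether it actually sets hostNetwork is an assumption to verify, not a given):

# The new ReplicaSet's pod should be stuck Pending next to the old Running pod.
oc -n openshift-cloud-controller-manager-operator get pods

# The Events section should repeat the scheduler message from the report:
# "0/1 nodes are available: 1 node(s) didn't have free ports ..."
oc -n openshift-cloud-controller-manager-operator describe pod <pending-pod-name>

# Check host networking and the declared rollout strategy; hostNetwork=true
# combined with RollingUpdate cannot make progress on a single node, because
# the host port is only freed once the old pod terminates.
oc -n openshift-cloud-controller-manager-operator get deployment \
  cluster-cloud-controller-manager-operator \
  -o jsonpath='{.spec.template.spec.hostNetwork}{"\n"}{.spec.strategy.type}{"\n"}'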
SNO upgrades are still basically permafailing. See:
https://sippy.ci.openshift.org/sippy-ng/jobs/4.9/analysis?filters=%7B%22items%22%3A%5B%7B%22columnField%22%3A%22name%22%2C%22operatorValue%22%3A%22contains%22%2C%22value%22%3A%22single-node%22%7D%2C%7B%22columnField%22%3A%22name%22%2C%22operatorValue%22%3A%22contains%22%2C%22value%22%3A%22upgrade%22%7D%5D%2C%22linkOperator%22%3A%22and%22%7D&testFilters=%7B%22items%22%3A%5B%7B%22id%22%3A99%2C%22columnField%22%3A%22name%22%2C%22operatorValue%22%3A%22contains%22%2C%22value%22%3A%22%22%7D%5D%2C%22linkOperator%22%3A%22and%22%7D&tests=%5Bsig-cluster-lifecycle%5D%20Cluster%20completes%20upgrade

Example job:
https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.9-e2e-azure-upgrade-single-node/1432619589488873472

The tests are all failing on:

s: "Cluster did not complete upgrade: timed out waiting for the condition: deployment openshift-cloud-controller-manager-operator/cluster-cloud-controller-manager-operator is Progressing=False: ProgressDeadlineExceeded: ReplicaSet \"cluster-cloud-controller-manager-operator-66bbccf8f9\" has timed out progressing.",

CCMO logs show:

2021-08-31T11:48:31.175290540Z I0831 11:48:31.175230 1 clusteroperator_controller.go:123] FeatureGate cluster is not specifying external cloud provider requirement. Skipping...
2021-08-31T11:49:44.634278816Z I0831 11:49:44.634138 1 clusteroperator_controller.go:123] FeatureGate cluster is not specifying external cloud provider requirement. Skipping...
2021-08-31T11:52:44.103475792Z I0831 11:52:44.103359 1 clusteroperator_controller.go:123] FeatureGate cluster is not specifying external cloud provider requirement. Skipping...
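For reference, the usual way to see why a rollout tripped ProgressDeadlineExceeded (a sketch using the namespace/deployment names from the error string above):

# Show the Progressing condition the test asserts on, with its reason.
oc -n openshift-cloud-controller-manager-operator get deployment \
  cluster-cloud-controller-manager-operator \
  -o jsonpath='{range .status.conditions[*]}{.type}{"="}{.status}{" ("}{.reason}{")"}{"\n"}{end}'

# Compare old and new ReplicaSets; a stuck rollout leaves the new one
# (cluster-cloud-controller-manager-operator-66bbccf8f9 here) with 0 ready.
oc -n openshift-cloud-controller-manager-operator get replicasets

# Or wait on the rollout directly; this exits non-zero with the same
# "ReplicaSet ... has timed out progressing" message.
oc -n openshift-cloud-controller-manager-operator rollout status \
  deployment/cluster-cloud-controller-manager-operator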
@stbenjam https://bugzilla.redhat.com/show_bug.cgi?id=1999018 was identified earlier and a fix has merged. Looking at the linked job, that run doesn't include the new fix yet. IIUC, the tests should start passing again once a new nightly is built that includes the fix.
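To check whether a given payload actually carries the merged fix (a sketch; substitute whichever nightly you're testing for the pull spec below):

# List the source commit each operator image in the release was built from,
# then look at the cluster-cloud-controller-manager-operator row.
oc adm release info registry.ci.openshift.org/ocp/release:4.9.0-0.nightly-2021-09-08-162532 --commits \
  | grep cluster-cloud-controller-manager-operator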
Upgraded cluster successfully:

[miyadav@miyadav ~]$ oc adm upgrade --to-image registry.ci.openshift.org/ocp/release:4.9.0-0.nightly-2021-09-08-162532 --allow-explicit-upgrade --force
warning: Using by-tag pull specs is dangerous, and while we still allow it in combination with --force for backward compatibility, it would be much safer to pass a by-digest pull spec instead
warning: The requested upgrade image is not one of the available updates. You have used --allow-explicit-upgrade to the update to proceed anyway
warning: --force overrides cluster verification of your supplied release image and waives any update precondition failures.
Updating to release image registry.ci.openshift.org/ocp/release:4.9.0-0.nightly-2021-09-08-162532

[miyadav@miyadav ~]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.nightly-2021-09-07-201519   True        True          10s     Working towards 4.9.0-0.nightly-2021-09-08-162532: 9 of 734 done (1% complete)
.
.
[miyadav@miyadav ~]$ oc project openshift-cloud-controller-manager-operator
Now using project "openshift-cloud-controller-manager-operator" on server "https://api.ci-ln-5p870kt-002ac.ci.azure.devcluster.openshift.com:6443".
.
.
[miyadav@miyadav ~]$ oc get pods -w
NAME                                                          READY   STATUS              RESTARTS   AGE
cluster-cloud-controller-manager-operator-7b78466947-92sg4    2/2     Running             0          34m
cluster-cloud-controller-manager-operator-7b78466947-92sg4    2/2     Terminating         0          34m
cluster-cloud-controller-manager-operator-7b78466947-92sg4    0/2     Terminating         0          34m
cluster-cloud-controller-manager-operator-7b78466947-92sg4    0/2     Terminating         0          34m
cluster-cloud-controller-manager-operator-7b78466947-92sg4    0/2     Terminating         0          34m
cluster-cloud-controller-manager-operator-845966f9df-6vn2s    0/2     Pending             0          0s
cluster-cloud-controller-manager-operator-845966f9df-6vn2s    0/2     Pending             0          0s
cluster-cloud-controller-manager-operator-845966f9df-6vn2s    0/2     ContainerCreating   0          0s
cluster-cloud-controller-manager-operator-845966f9df-6vn2s    2/2     Running             0          2s
.
.
Moving to VERIFIED
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3759