2001816 – SNO upgrade from 4.9.0-fc.0 to 4.9.0-fc.1 stuck due to openshift-cloud-controller-manager-operator pod going into pending state

Keywords:
Status:	CLOSED DUPLICATE of bug 1998466
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Cloud Compute
Sub Component:
Version:	4.9
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Target Release:	---
Assignee:	Joel Speed
QA Contact:	sunzhaohua
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2021-09-07 09:14 UTC by Abhijeet Sadawarte
Modified:	2022-04-11 08:33 UTC
CC List:	9 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-09-14 12:28:25 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Knowledge Base (Solution)	6062461	0	None	None	None	2021-09-08 17:18:45 UTC

Description Abhijeet Sadawarte 2021-09-07 09:14:19 UTC

Created attachment 1821094: must-gather

Description of problem:
I deployed a 4.9.0-fc.0 SNO cluster. During the upgrade to 4.9.0-fc.1, the upgrade gets stuck with:

~~~
# oc get clusterversion
NAME      VERSION      AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-fc.0   True        True          73m     Unable to apply 4.9.0-fc.1: the workload openshift-cloud-controller-manager-operator/cluster-cloud-controller-manager-operator cannot roll out
~~~

- The cluster-cloud-controller-manager-operator pod is in the Pending state, and the events show:

~~~
2m27s       Warning   FailedScheduling    pod/cluster-cloud-controller-manager-operator-7646c69d7c-kjmn7    0/1 nodes are available: 1 node(s) didn't have free ports for the requested pod ports.
~~~
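
The scheduler message above is what a node reports when a host port requested by the pod is already taken. The following is a toy model only, not the scheduler's actual code: the port number 9258 and the function names are hypothetical and are not taken from the operator's manifest. It sketches why a single-node cluster may be unable to place the replacement pod while the old one is still running:

```python
def node_has_free_ports(used_host_ports, requested_host_ports):
    """Return True if none of the requested host ports is already in use."""
    return not (set(used_host_ports) & set(requested_host_ports))


def schedulable_nodes(nodes, requested_host_ports):
    """Names of nodes whose host ports do not conflict with the request."""
    return [name for name, used in nodes.items()
            if node_has_free_ports(used, requested_host_ports)]


# Hypothetical port; the real value is whatever the operator pod declares.
OPERATOR_PORT = 9258

# Single node: the old operator pod still holds the host port.
sno = {"node-0": {OPERATOR_PORT}}
print(schedulable_nodes(sno, [OPERATOR_PORT]))  # []

# Once the old pod is gone, the port is released and the node fits again.
sno["node-0"].discard(OPERATOR_PORT)
print(schedulable_nodes(sno, [OPERATOR_PORT]))  # ['node-0']
```

With more than one node, the new pod could land on a node where the port is free, which may be why the problem shows up on SNO.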

Version-Release number of selected component (if applicable):
- 4.9.0-fc.0

How reproducible:
- Deploy the 4.9.0-fc.0 cluster and try upgrading it to 4.9.0-fc.1 using the candidate-4.9 channel.


Actual results:
- The upgrade gets stuck at 18%

Expected results:
- The cluster should upgrade successfully.

Additional info:
4.9.0-fc.1 to 4.9.0-rc.0 gets stuck too with the same issue.

Comment 5 Jean-Francois Saucier 2021-09-09 12:28:56 UTC

I hit the same issue on my end. Simply deleting the previous pod let the new one (which was in the Pending state) start running correctly, and the upgrade from fc.0 to rc.0 resumed.
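
A rough sketch of that workaround, assuming the pod list has been loaded from `oc get pods -n openshift-cloud-controller-manager-operator -o json`. The helper names are made up for illustration, and the old pod's name in the sample data is a placeholder. The sketch only builds the `oc delete pod` command and does not run it:

```python
NAMESPACE = "openshift-cloud-controller-manager-operator"


def pods_blocking_rollout(pod_list):
    """Return names of Running pods, but only if some sibling pod is Pending."""
    items = pod_list.get("items", [])
    phases = {p["metadata"]["name"]: p["status"]["phase"] for p in items}
    if "Pending" not in phases.values():
        return []
    return sorted(name for name, phase in phases.items() if phase == "Running")


def delete_command(pod_name, namespace=NAMESPACE):
    """Build (not run) the oc command that deletes one pod."""
    return ["oc", "delete", "pod", pod_name, "-n", namespace]


# Sample data shaped like the JSON pod list; the first name is a placeholder.
sample = {"items": [
    {"metadata": {"name": "old-operator-pod"},
     "status": {"phase": "Running"}},
    {"metadata": {"name": "cluster-cloud-controller-manager-operator-7646c69d7c-kjmn7"},
     "status": {"phase": "Pending"}},
]}

for name in pods_blocking_rollout(sample):
    print(" ".join(delete_command(name)))
```

Review the printed command before running it by hand; the selection rule here (every Running pod in the namespace while another is Pending) is an assumption that fits this report, not a general rule.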

Comment 6 Udi Kalifon 2021-09-09 16:05:55 UTC

The upgrade path from 4.9.0-fc.0 to 4.9.0-fc.1 is in the candidate channel, so it is not supported. I will close this bug; however, if you think this should be fixed, please re-open the bug and state why it is important. Please check that upgrading from 4.8 to the latest fc of 4.9 works.

Comment 8 daniel 2021-09-10 10:05:15 UTC

I can confirm that updating SNO from 4.8.9 to 4.9.0-fc.0 works.

Comment 9 Abhijeet Sadawarte 2021-09-10 18:19:54 UTC

I just tested SNO upgrade from 4.8.9 to 4.9.0-rc.0 successfully:

~~~
  status:
    availableUpdates: null
    conditions:
    - lastTransitionTime: "2021-09-10T16:24:23Z"
      message: Done applying 4.9.0-rc.0
      status: "True"
      type: Available
    - lastTransitionTime: "2021-09-10T17:00:19Z"
      status: "False"
      type: Failing
    - lastTransitionTime: "2021-09-10T17:50:51Z"
      message: Cluster version is 4.9.0-rc.0
      status: "False"
      type: Progressing
    - lastTransitionTime: "2021-09-10T16:05:45Z"
      status: "True"
      type: RetrievedUpdates
    desired:
      channels:
      - candidate-4.9
      image: quay.io/openshift-release-dev/ocp-release@sha256:d1c1401fdbfe0820036dd3f3cc5df1539b5a101fe9f21f1845e55d8655000f66
      version: 4.9.0-rc.0
    history:
    - completionTime: "2021-09-10T17:50:51Z"
      image: quay.io/openshift-release-dev/ocp-release@sha256:d1c1401fdbfe0820036dd3f3cc5df1539b5a101fe9f21f1845e55d8655000f66
      startedTime: "2021-09-10T16:42:55Z"
      state: Completed
      verified: true
      version: 4.9.0-rc.0
    - completionTime: "2021-09-10T16:24:23Z"
      image: quay.io/openshift-release-dev/ocp-release@sha256:5fb4b4225498912357294785b96cde6b185eaed20bbf7a4d008c462134a4edfd
      startedTime: "2021-09-10T15:49:42Z"
      state: Completed
      verified: false
      version: 4.8.9
    observedGeneration: 3
    versionHash: C_5ZobhRcD0=
~~~
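
For anyone who wants to check this kind of status programmatically, here is a minimal sketch. It assumes the `status` block has been loaded into a Python dict (for example from `oc get clusterversion version -o json`). The function names and the exact set of checks are my own reading of the output above, not an official definition of a completed update:

```python
def condition_status(status, cond_type):
    """Return the 'status' string of the named condition, or None if absent."""
    for cond in status.get("conditions", []):
        if cond.get("type") == cond_type:
            return cond.get("status")
    return None


def update_completed(status):
    """True when the newest history entry is Completed at the desired version
    and the conditions show an available, non-failing, settled cluster."""
    history = status.get("history", [])
    if not history:
        return False
    latest = history[0]  # newest entry comes first in the output above
    return (
        latest.get("state") == "Completed"
        and latest.get("version") == status.get("desired", {}).get("version")
        and condition_status(status, "Available") == "True"
        and condition_status(status, "Failing") == "False"
        and condition_status(status, "Progressing") == "False"
    )


# Trimmed copy of the status shown above.
status = {
    "conditions": [
        {"type": "Available", "status": "True"},
        {"type": "Failing", "status": "False"},
        {"type": "Progressing", "status": "False"},
        {"type": "RetrievedUpdates", "status": "True"},
    ],
    "desired": {"version": "4.9.0-rc.0"},
    "history": [
        {"state": "Completed", "version": "4.9.0-rc.0"},
        {"state": "Completed", "version": "4.8.9"},
    ],
}
print(update_completed(status))  # True
```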

Comment 12 Udi Kalifon 2021-09-14 05:56:46 UTC

I got a clarification from the developers that updates between fc versions are supposed to be supported. Thanks.

Comment 13 Joel Speed 2021-09-14 12:28:25 UTC

I mentioned this in a private comment, but for the sake of completeness, this bug was already reported and has been verified in https://bugzilla.redhat.com/show_bug.cgi?id=1998466. Closing now as we don't need to track a bug that has already been fixed.

*** This bug has been marked as a duplicate of bug 1998466 ***

Comment 14 W. Trevor King 2021-09-15 16:07:48 UTC

I'm removing UpgradeBlocker to drop this from the needs-an-impact-statement queue [1], because:

* This is about a candidate -> candidate update edge, and we don't block those because they aren't supported, and we want to hear about folks hitting different bugs while updating their candidate-level clusters.
* The bug was closed as a DUP, so any UpgradeBlocker discussion would be better suited for bug 1998466.

If folks want to push back, please follow up in bug 1998466.

[1]: https://github.com/openshift/enhancements/pull/475
