Summary: CVO creating cloud-controller-manager too early causing upgrade failures
Product: OpenShift Container Platform
Reporter: OpenShift BugZilla Robot <openshift-bugzilla-robot>
Component: Cluster Version Operator
Assignee: W. Trevor King <wking>
Status: CLOSED ERRATA
QA Contact: Yang Yang <yanyang>
Version: 4.8
CC: aos-bugs, jokerman, vrutkovs, wking, yanyang
Fixed In Version:
Doc Type: Bug Fix
Cause: The cluster-version operator had been pre-creating ClusterOperator resources at the beginning of the update phase. Consequence: For new second-level operators whose deployments came later in the update, the pre-created ClusterOperator could sit without status conditions for many minutes. After ten minutes without conditions, the ClusterOperatorDown and/or ClusterOperatorDegraded alerts would begin firing. Fix: The cluster-version operator now pre-creates each ClusterOperator resource much closer to the creation of the new second-level operator's deployment. Result: The new second-level operator has time to come up and set status.conditions on the ClusterOperator before the alerts start firing.
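The timing at the heart of this bug can be sketched with a small back-of-the-envelope model (the numbers below are hypothetical except for the ten-minute alert threshold named in this report):

```python
# ClusterOperatorDown / ClusterOperatorDegraded begin firing after roughly
# ten minutes of a ClusterOperator having no status conditions (per this bug).
ALERT_THRESHOLD_MIN = 10

def conditionless_window(precreate_min, deploy_min, startup_min=1):
    """Minutes a ClusterOperator sits without status.conditions.

    precreate_min: update minute at which the CVO creates the ClusterOperator
    deploy_min:    update minute at which the CVO creates the operator's Deployment
    startup_min:   time the new operator needs to come up and set conditions

    All values are illustrative; only the relative ordering matters.
    """
    return (deploy_min + startup_min) - precreate_min

# Old behavior: ClusterOperator pre-created at the start of the update, but the
# second-level operator's Deployment lands, say, 20 minutes in -> alerts fire.
assert conditionless_window(precreate_min=0, deploy_min=20) > ALERT_THRESHOLD_MIN

# Fixed behavior: pre-creation happens just before the Deployment, so the
# window without conditions stays well under the alert threshold.
assert conditionless_window(precreate_min=20, deploy_min=20) < ALERT_THRESHOLD_MIN
```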
Last Closed: 2021-06-01 04:50:27 UTC
Type: ---
Bug Depends On: 1957775
Comment 2 Yang Yang 2021-05-25 06:04:06 UTC
CCM is not included in the nightly build, so verifying with ci build 4.7.0-0.ci-2021-05-22-053635.

# oc get co | grep cloud-controller-manager
null

cloud-controller-manager is not included in the 4.7 ci build. Upgrade to the 4.8 ci build:

# oc get clusterversion
NAME      VERSION                        AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.ci-2021-05-22-053635   True        True          10m     Working towards 4.8.0-0.ci-2021-05-24-203349: 93 of 694 done (13% complete)

Watching the cloud-controller-manager:

# oc get co -w | grep cloud-controller-manager
cloud-controller-manager
cloud-controller-manager
cloud-controller-manager   4.8.0-0.ci-2021-05-24-203349   True   False   False   0s

CVO creates cloud-controller-manager, and its condition status becomes visible soon after.

# oc get clusterversion
NAME      VERSION                        AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.ci-2021-05-24-203349   True        False         60m     Cluster version is 4.8.0-0.ci-2021-05-24-203349

No ClusterOperatorDown or ClusterOperatorDegraded alerts fired for cloud-controller-manager, and the upgrade was successful. The PR works. Pending regression test against a nightly build once one is available.
Comment 4 Yang Yang 2021-05-26 09:33:18 UTC
Performing regression test against 4.7.0-0.nightly-2021-05-25-192356 by upgrading to 4.8.0-0.nightly-2021-05-25-223219.

# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2021-05-25-192356   True        True          44m     Working towards 4.8.0-0.nightly-2021-05-25-223219: 549 of 675 done (81% complete), waiting up to 40 minutes on dns

# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-05-25-223219   True        False         5m24s   Cluster version is 4.8.0-0.nightly-2021-05-25-223219

No ClusterOperatorDown or ClusterOperatorDegraded alerts fired, and the upgrade was successful. Moving it to verified state.
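The check performed in these verification steps can be sketched as a small script: given `oc get clusteroperator -o json` output, flag any operator with no status conditions (the state that made ClusterOperatorDown fire in this bug) or with Degraded=True. The sample document below is illustrative, not captured from the clusters above:

```python
import json

# Hypothetical sample of `oc get clusteroperator -o json` output, trimmed to
# the fields this check reads.
sample = json.loads("""
{"items": [
  {"metadata": {"name": "cloud-controller-manager"},
   "status": {"conditions": [
     {"type": "Available", "status": "True"},
     {"type": "Degraded",  "status": "False"}]}}
]}
""")

def problem_operators(doc):
    """Names of ClusterOperators with no conditions or Degraded=True."""
    bad = []
    for op in doc["items"]:
        conds = {c["type"]: c["status"]
                 for c in op.get("status", {}).get("conditions", [])}
        if not conds or conds.get("Degraded") == "True":
            bad.append(op["metadata"]["name"])
    return bad

print(problem_operators(sample))  # an empty list: neither alert condition holds
```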
Comment 6 errata-xmlrpc 2021-06-01 04:50:27 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.13 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2121