Description of problem:
Since around the time the patch https://github.com/openshift/cluster-cloud-controller-manager-operator/pull/152 went in, all baremetal jobs fail to finish cluster creation because the CVO fails to complete with this error:

Cluster operator cloud-controller-manager Degraded is True with SyncingFailed: Failed when progressing towards operator: 4.10.0-0.ci.test-2021-12-16-204517-ci-op-d7q95m29-latest because &{%!e(string=failed to apply resources because CloudConfigControllerDegraded condition is set to True)}

Looking at the cluster-version-operator operator resource, it contains an error message ("Cloud Config Controller failed to sync cloud config") that was first added in the above commit:

Last Transition Time: 2021-12-16T21:38:34Z
Message: Cloud Config Controller failed to sync cloud config
Reason: SyncingFailed
Status: False
Type: CloudConfigControllerAvailable

Last Transition Time: 2021-12-16T21:38:34Z
Message: Cloud Config Controller failed to sync cloud config
Reason: SyncingFailed
Status: True
Type: CloudConfigControllerDegraded

Last Transition Time: 2021-12-16T21:38:34Z
Message: Failed when progressing towards operator: 4.10.0-0.ci.test-2021-12-16-210713-ci-ln-k4gx5wb-latest because &{%!e(string=failed to apply resources because CloudConfigControllerDegraded condition is set to True)}
Reason: SyncingFailed
Status: True
Type: Degraded

Last Transition Time: 2021-12-16T21:38:34Z
Reason: AsExpected
Status: False
Type: Upgradeable

Last Transition Time: 2021-12-16T21:38:34Z
Message: Trusted CA Bundle Controller works as expected
Reason: AsExpected
Status: True
Type: TrustedCABundleControllerControllerAvailable

Last Transition Time: 2021-12-16T21:38:34Z
Message: Trusted CA Bundle Controller works as expected
Reason: AsExpected
Status: False
Type: TrustedCABundleControllerControllerDegraded

Version-Release number of selected component (if applicable):

How reproducible:
~100%
https://search.ci.openshift.org/chart?maxAge=12h&type=build-log&search=failed%20to%20apply%20resources%20because%20CloudConfigControllerDegraded

Steps to Reproduce:
1. Deploy a baremetal cluster.
2. Wait for the CVO to finish.

Actual results:
The CVO never reaches the intended version because cloud-controller-manager is Degraded.

Expected results:
No operators are degraded and the CVO reaches the desired version.

Additional info:
https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-baremetal-operator/208/pull-ci-openshift-cluster-baremetal-operator-master-e2e-metal-ipi-ovn-ipv6/1471581173510574080
https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-baremetal-operator/208/pull-ci-openshift-cluster-baremetal-operator-master-e2e-metal-ipi/1471581173460242432
https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-network-operator/1241/pull-ci-openshift-cluster-network-operator-master-e2e-metal-ipi-ovn-ipv6/1471597466745835520
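The "&{%!e(string=...)}" wrapper in the Degraded message is what Go's fmt package prints when an error created with errors.New is formatted with the %e verb (a floating-point verb) instead of %v or %w: fmt cannot call Error() for %e, so it prints the underlying struct and flags the bad verb. A minimal demonstration, assuming (this report does not confirm it) that the operator builds the message with fmt.Sprintf; progressingMessage and the "v" version string are hypothetical, the error text is copied from the log:

```go
package main

import (
	"errors"
	"fmt"
)

// Error text copied from the Degraded condition in the report.
var errCloudConfigDegraded = errors.New("failed to apply resources because CloudConfigControllerDegraded condition is set to True")

// progressingMessage is a hypothetical helper that mimics the message shape
// seen in the report. The verb is passed in only to contrast %e with %v.
func progressingMessage(version string, err error, verb string) string {
	return fmt.Sprintf("Failed when progressing towards operator: %s because "+verb, version, err)
}

func main() {
	// %e is not meaningful for an error, so fmt prints the *errors.errorString
	// struct ("&{...}") and marks the string field with "%!e(string=...)".
	fmt.Println(progressingMessage("v", errCloudConfigDegraded, "%e"))

	// %v calls Error() and yields the plain message text.
	fmt.Println(progressingMessage("v", errCloudConfigDegraded, "%v"))
}
```

With %v (or %w in fmt.Errorf) the condition message would read as plain text.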
Also https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/openshift-origin-26703-nightly-4.10-e2e-metal-ipi/1471598282605072384
I tested Mike's patch on metal and it worked. We're starting to see more oVirt failures with the same issue, so oVirt may need a similar fix. Its absence from both the supported and the unsupported platform lists is suspicious, given that the default is to enable: https://github.com/openshift/cluster-cloud-controller-manager-operator/blob/master/README.md#supported-platforms

Note that there also seems to be an issue with posting events to the correct namespace (either that, or with RBAC), judging by these log messages:

2021-12-16T21:16:04.957877011Z E1216 21:16:04.957646 1 event.go:264] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"cloud-controller-manager.16c158ca3c14e1a3", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"ClusterOperator", Namespace:"", Name:"cloud-controller-manager", UID:"10c9879a-aee8-44ac-8cef-3b58cc92f170", APIVersion:"config.openshift.io/v1", ResourceVersion:"3068", FieldPath:""}, Reason:"Status degraded", Message:"failed to apply resources because CloudConfigControllerDegraded condition is set to True", Source:v1.EventSource{Component:"cloud-controller-manager-operator", Host:""}, FirstTimestamp:time.Date(2021, time.December, 16, 21, 16, 4, 954210723, time.Local), LastTimestamp:time.Date(2021, time.December, 16, 21, 16, 4, 954210723, time.Local), Count:1, Type:"Warning", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'events is forbidden: User "system:serviceaccount:openshift-cloud-controller-manager-operator:cluster-cloud-controller-manager" cannot create resource "events" in API group "" in the namespace "default"' (will not retry!)
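Why an unlisted platform is a risk when the default is to enable can be shown with a minimal Go sketch of default-enabled platform gating. Everything here is hypothetical: the two platform sets and the cloudConfigSyncNeeded helper are illustrative stand-ins, not the operator's actual lists or code. The point is only that a platform missing from both lists, such as oVirt, takes the enabled path.

```go
package main

import "fmt"

// Hypothetical platform lists for illustration only; the real sets are
// defined by the operator's README and source, not here.
var supportedPlatforms = map[string]bool{
	"AWS":       true,
	"OpenStack": true,
}

var unsupportedPlatforms = map[string]bool{
	"BareMetal": true,
	"None":      true,
}

// cloudConfigSyncNeeded is a hypothetical helper with a default-enabled
// policy: only platforms explicitly marked unsupported skip the sync.
func cloudConfigSyncNeeded(platform string) bool {
	if unsupportedPlatforms[platform] {
		return false
	}
	// Supported platforms and any platform absent from both lists
	// (e.g. oVirt) fall through to the enabled path.
	return true
}

func main() {
	for _, p := range []string{"AWS", "BareMetal", "oVirt"} {
		listed := supportedPlatforms[p] || unsupportedPlatforms[p]
		fmt.Printf("%s: listed=%v sync=%v\n", p, listed, cloudConfigSyncNeeded(p))
	}
}
```

A default-disabled policy (sync only for explicitly supported platforms) would fail closed for new platforms instead, at the cost of having to list every platform that does need the sync.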
*** Bug 2033722 has been marked as a duplicate of this bug. ***
Verified on clusterversion 4.10.0-0.nightly-2021-12-20-231053: the BM cluster installed successfully, and cloud-config sync is skipped on BM.

$ oc get co [13:24:41]
NAME                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication             4.10.0-0.nightly-2021-12-20-231053   True        False         False      84m
baremetal                  4.10.0-0.nightly-2021-12-20-231053   True        False         False      101m
cloud-controller-manager   4.10.0-0.nightly-2021-12-20-231053   True        False         False      103m

$ oc logs -f cluster-cloud-controller-manager-operator-6ffd6d8d9d-cr7sm -n openshift-cloud-controller-manager-operator -c config-sync-controllers
I1221 05:18:48.864743 1 cloud_config_sync_controller.go:59] cloud-config sync is not needed, returning early
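The expected result ("no operators are degraded") can also be checked from the conditions rather than by reading the `oc get co` table. A minimal Go sketch, assuming the input is the JSON from `oc get clusteroperators -o json` (a standard Kubernetes list with items[].metadata.name and items[].status.conditions[]); the degradedOperators helper is hypothetical, and the sample is trimmed to the cloud-controller-manager conditions quoted in the description:

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// clusterOperatorList models only the fields this check needs.
type clusterOperatorList struct {
	Items []struct {
		Metadata struct {
			Name string `json:"name"`
		} `json:"metadata"`
		Status struct {
			Conditions []struct {
				Type   string `json:"type"`
				Status string `json:"status"`
				Reason string `json:"reason"`
			} `json:"conditions"`
		} `json:"status"`
	} `json:"items"`
}

// degradedOperators (hypothetical) returns "name (reason)" for every
// operator whose top-level Degraded condition is True.
func degradedOperators(data []byte) ([]string, error) {
	var list clusterOperatorList
	if err := json.Unmarshal(data, &list); err != nil {
		return nil, err
	}
	var out []string
	for _, co := range list.Items {
		for _, c := range co.Status.Conditions {
			if c.Type == "Degraded" && c.Status == "True" {
				out = append(out, co.Metadata.Name+" ("+c.Reason+")")
			}
		}
	}
	return out, nil
}

func main() {
	// Sample trimmed to conditions quoted in the bug description.
	sample := []byte(`{"items":[{"metadata":{"name":"cloud-controller-manager"},"status":{"conditions":[
	  {"type":"CloudConfigControllerDegraded","status":"True","reason":"SyncingFailed"},
	  {"type":"Degraded","status":"True","reason":"SyncingFailed"},
	  {"type":"Upgradeable","status":"False","reason":"AsExpected"}]}}]}`)

	degraded, err := degradedOperators(sample)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(degraded)
}
```

Only the top-level Degraded condition is counted, so per-controller conditions such as CloudConfigControllerDegraded do not report the same operator twice; on the verified nightly the list should be empty.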
*** Bug 2036571 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056