Description of problem:
The user should be notified when the garbage collector (GC) stops working because it cannot sync resources.
This was also observed in https://bugzilla.redhat.com/show_bug.cgi?id=2050912
Steps to Reproduce:
The GC sync can fail, for example:
- when resource discovery is not working properly or there are network disruptions
- when some CRDs or APIServices are broken
Today nothing reports this failure except the kube-controller-manager (KCM) logs.
Expected result: the kube-controller-manager cluster operator should become Degraded and the GarbageCollectorSyncFailed alert should fire.
I will add instructions on how to reproduce this next week.
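One of the failure conditions listed above is a broken APIService. As a minimal sketch for spotting that condition, the helper below lists APIServices whose `Available` condition is not `True`. It assumes the input is the JSON printed by `oc get apiservices -o json` (a List of apiregistration.k8s.io/v1 APIService objects); the function name `unavailable_apiservices` is hypothetical and not part of any OpenShift tooling.

```python
import json
from typing import Any


def unavailable_apiservices(apiservice_list_json: str) -> list[tuple[str, str]]:
    """Return (name, message) for every APIService whose Available condition is not "True".

    Assumes the input is the output of `oc get apiservices -o json`.
    """
    data: dict[str, Any] = json.loads(apiservice_list_json)
    broken = []
    for item in data.get("items", []):
        name = item.get("metadata", {}).get("name", "<unknown>")
        conditions = item.get("status", {}).get("conditions", [])
        # Pick the Available condition, if the APIService reports one at all.
        available = next((c for c in conditions if c.get("type") == "Available"), None)
        # A missing Available condition is treated as "not available" as well.
        if available is None or available.get("status") != "True":
            message = available.get("message", "") if available else "no Available condition"
            broken.append((name, message))
    return broken
```

An empty result only means no APIService reports itself as unavailable; discovery problems caused by network disruptions or broken CRDs would not necessarily show up here.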
Checked with the latest payload:
oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-06-25-132614   True        False         115m    Error while reconciling 4.11.0-0.nightly-2022-06-25-132614: the cluster operator kube-controller-manager is degraded
The KCM logs show:
E0628 09:51:37.910863 1 shared_informer.go:258] unable to sync caches for garbage collector
E0628 09:51:37.910875 1 garbagecollector.go:245] timed out waiting for dependency graph builder sync during GC sync (attempt 176)
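Before this fix, log lines like the two above were the only signal of a stuck GC sync. A minimal sketch of scanning KCM log lines for them follows; the match strings are copied from the quoted lines, so they are an assumption that may not hold across Kubernetes versions, and the function name `gc_sync_status` is hypothetical.

```python
import re
from typing import Iterable

# Message text copied from the KCM log lines quoted in this report.
CACHE_SYNC_FAILURE = "unable to sync caches for garbage collector"
GC_SYNC_TIMEOUT = re.compile(
    r"timed out waiting for dependency graph builder sync during GC sync \(attempt (\d+)\)"
)


def gc_sync_status(log_lines: Iterable[str]) -> tuple[int, int]:
    """Summarize GC sync failures found in KCM log lines.

    Returns (cache_sync_failures, highest_attempt): the number of cache sync
    failure lines, and the largest attempt counter seen in a GC sync timeout
    line (0 if none was seen).
    """
    cache_failures = 0
    highest_attempt = 0
    for line in log_lines:
        if CACHE_SYNC_FAILURE in line:
            cache_failures += 1
        match = GC_SYNC_TIMEOUT.search(line)
        if match:
            highest_attempt = max(highest_attempt, int(match.group(1)))
    return cache_failures, highest_attempt
```

Fed the two log lines quoted above, this returns `(1, 176)`.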
oc get co kube-controller-manager
NAME                      VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
kube-controller-manager   4.11.0-0.nightly-2022-06-25-132614   True        False         True       130m    GarbageCollectorDegraded: alerts firing: GarbageCollectorSyncFailed
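The DEGRADED column above comes from the ClusterOperator's `Degraded` status condition. As a minimal sketch for checking it from a script, the helper below returns the condition message when the operator is Degraded. It assumes the input is the JSON printed by `oc get co kube-controller-manager -o json`; the function name `degraded_message` is hypothetical.

```python
import json
from typing import Optional


def degraded_message(clusteroperator_json: str) -> Optional[str]:
    """Return the Degraded condition message if the ClusterOperator is Degraded, else None.

    Assumes the input is the output of `oc get co kube-controller-manager -o json`.
    """
    co = json.loads(clusteroperator_json)
    for condition in co.get("status", {}).get("conditions", []):
        # ClusterOperator conditions carry string statuses: "True", "False" or "Unknown".
        if condition.get("type") == "Degraded" and condition.get("status") == "True":
            return condition.get("message", "")
    return None
```

A caller could then test whether `"GarbageCollectorSyncFailed"` appears in the returned message, matching the output shown above.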
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update) and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.