While investigating a bug, we realized that etcdHighNumberOfLeaderChanges was not firing in the fleet because of a copy-paste error introduced when the query was refactored upstream. The alert fired only if the rate of leader changes exceeded 3 per second, when the intended threshold is 3 per 15 minutes. The alert should be fixed and backported, since leader changes are symptomatic of insufficient resources or other disruption.
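To illustrate why the alert could practically never fire, here is a minimal sketch of the arithmetic, assuming the buggy rule compared a per-second rate over a 15-minute window against 3, while the intended rule compares the number of changes in that window against 3. The function names are hypothetical and this is not the actual alert rule expression.

```python
WINDOW_SECONDS = 15 * 60  # 15-minute evaluation window
THRESHOLD = 3             # threshold value used by both variants


def per_second_rate(changes_in_window, window_seconds=WINDOW_SECONDS):
    """Average leader changes per second over the window."""
    return changes_in_window / window_seconds


def buggy_alert_fires(changes_in_window):
    # Assumed buggy behaviour: per-second rate compared against 3.
    return per_second_rate(changes_in_window) > THRESHOLD


def intended_alert_fires(changes_in_window):
    # Intended behaviour: changes within the 15-minute window compared against 3.
    return changes_in_window > THRESHOLD


def min_changes_for_buggy_alert():
    # The buggy variant needs strictly more than this many changes in 15 minutes.
    return THRESHOLD * WINDOW_SECONDS


if __name__ == "__main__":
    for changes in (2, 4, 21, 2701):
        print(changes, buggy_alert_fires(changes), intended_alert_fires(changes))
```

Under these assumptions, the buggy variant needs more than 2700 leader changes in 15 minutes before it fires, whereas the intended variant fires at 4.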
https://github.com/openshift/cluster-monitoring-operator/pull/591 is merged; moving to MODIFIED.
Hello @paulfantom, could you help take a look at this issue? Thanks in advance!
Verified on 4.4 0303. Alert: etcdHighNumberOfLeaderChanges. Message: etcd cluster "etcd": 21.428571428571427 leader changes within the last 15 minutes. Frequent elections may be a sign of insufficient resources, high network latency, or disruptions by other components and should be investigated.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0581