While investigating a bug, we realized that the etcdHighNumberOfLeaderChanges alert was not firing in the fleet because of a copy-paste error introduced when the query was refactored upstream. The alert fired only if the rate of leader changes exceeded 3 per second, when the intended threshold is 3 changes per 15 minutes.
The alert should be fixed and backported since leader changes are symptomatic of insufficient resources or other disruption.
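To illustrate the size of the threshold error, here is a minimal arithmetic sketch in Python. It is not the alert's actual PromQL expression, which is not reproduced in this report; the function names are hypothetical, and the only values taken from the report are the threshold of 3 and the 15-minute window.

```python
# Sketch of the threshold mix-up described above (hypothetical helper names).
WINDOW_SECONDS = 15 * 60  # the 15-minute window from the report
THRESHOLD = 3             # the threshold value from the report


def fires_buggy(changes_in_window: float) -> bool:
    """Buggy comparison: treats the threshold as changes per second."""
    per_second_rate = changes_in_window / WINDOW_SECONDS
    return per_second_rate > THRESHOLD


def fires_intended(changes_in_window: float) -> bool:
    """Intended comparison: threshold is changes per 15-minute window."""
    return changes_in_window > THRESHOLD


def changes_needed_for_buggy_alert() -> int:
    """Leader changes per window at which the buggy rate equals the threshold."""
    return THRESHOLD * WINDOW_SECONDS


if __name__ == "__main__":
    for changes in (2, 4, 20):
        print(changes, fires_buggy(changes), fires_intended(changes))
    print(changes_needed_for_buggy_alert())
```

Under these assumptions, the buggy comparison would need more than 2700 leader changes in 15 minutes before firing, which is why the alert stayed silent in practice.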
https://github.com/openshift/cluster-monitoring-operator/pull/591 has merged; moving to MODIFIED.
Hello @paulfantom, could you help take a look at this issue? Thanks in advance!
Verified on 4.4 0303
etcd cluster "etcd": 21.428571428571427 leader changes within the last 15 minutes. Frequent elections may be a sign of insufficient resources, high network latency, or disruptions by other components and should be investigated.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.