GCP has never had a problem with etcd leadership changes, so this needs triage to understand whether our alert threshold is too tight, whether we hit a bug, or whether a recent etcd change is now causing more leader election changes. Since this is a significant source of recent problems, marking it high severity and considering it a release blocker until it is triaged.

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-4.4/1927

[Feature:Prometheus][Conformance] Prometheus when installed on the cluster shouldn't report any alerts in firing state apart from Watchdog and AlertmanagerReceiversNotConfigured [Early] [Suite:openshift/conformance/parallel/minimal] (1m22s)

fail [github.com/openshift/origin/test/extended/prometheus/prometheus_builds.go:163]: Expected
    <map[string]error | len:1>: {
        "ALERTS{alertname!~\"Watchdog|AlertmanagerReceiversNotConfigured|PrometheusRemoteWriteDesiredShards\",alertstate=\"firing\"} >= 1": {
            s: "promQL query: ALERTS{alertname!~\"Watchdog|AlertmanagerReceiversNotConfigured|PrometheusRemoteWriteDesiredShards\",alertstate=\"firing\"} >= 1 had reported incorrect results:\n[{\"metric\":{\"__name__\":\"ALERTS\",\"alertname\":\"etcdHighNumberOfLeaderChanges\",\"alertstate\":\"firing\",\"job\":\"etcd\",\"severity\":\"warning\"},\"value\":[1583543075.94,\"1\"]}]",
        },
    }
to be empty
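For anyone wanting to reproduce the check by hand, below is a minimal sketch of the same query run outside the e2e suite using the Prometheus Go client. The route address and token are placeholders for illustration (not values from this job); the query string is taken verbatim from the test output above.

    package main

    import (
        "context"
        "fmt"
        "net/http"
        "time"

        "github.com/prometheus/client_golang/api"
        v1 "github.com/prometheus/client_golang/api/prometheus/v1"
    )

    // bearerTransport adds a bearer token (e.g. from `oc whoami -t`) to each request.
    type bearerTransport struct {
        token string
        next  http.RoundTripper
    }

    func (t *bearerTransport) RoundTrip(req *http.Request) (*http.Response, error) {
        req.Header.Set("Authorization", "Bearer "+t.token)
        return t.next.RoundTrip(req)
    }

    func main() {
        // Placeholder route and token; substitute the values for your cluster.
        client, err := api.NewClient(api.Config{
            Address:      "https://prometheus-k8s-openshift-monitoring.apps.example.com",
            RoundTripper: &bearerTransport{token: "REPLACE_ME", next: api.DefaultRoundTripper},
        })
        if err != nil {
            panic(err)
        }

        // The expression the conformance test evaluates: any firing alert outside
        // the allow-list produces the failure seen above.
        const query = `ALERTS{alertname!~"Watchdog|AlertmanagerReceiversNotConfigured|PrometheusRemoteWriteDesiredShards",alertstate="firing"} >= 1`

        ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
        defer cancel()

        result, warnings, err := v1.NewAPI(client).Query(ctx, query, time.Now())
        if err != nil {
            panic(err)
        }
        if len(warnings) > 0 {
            fmt.Println("warnings:", warnings)
        }
        // Non-empty output lists the unexpected firing alerts that fail the test.
        fmt.Println(result)
    }

Against a healthy cluster this should print an empty vector; a non-empty result shows exactly the alerts that would fail the conformance check.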
Since 4.4 code is frozen, moving this BZ to 4.5.
We saw this on a production cluster upgrading from 4.4.4 to 4.4.10. Getting a must-gather now; I will attach it to bug 1825000, which seems to be the generic ticket tracking the flapping etcdHighNumberOfLeaderChanges alert.
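For triage, it may help that etcdHighNumberOfLeaderChanges is driven by growth of the etcd_server_leader_changes_seen_total counter, so a query along the lines of the one below (the 15m window and job label are illustrative, not necessarily the rule's exact definition) shows whether leader elections are actually churning on the affected cluster:

    increase(etcd_server_leader_changes_seen_total{job="etcd"}[15m])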