Even on an idle OCP 4 cluster running on Azure, etcd goes through a lot of leader elections. Example:

2020-02-21 01:26:19.140279 I | raft: cf452c7e4ed8ffa9 is starting a new election at term 105
2020-02-21 01:26:20.440293 I | raft: cf452c7e4ed8ffa9 is starting a new election at term 106
2020-02-21 01:26:22.340240 I | raft: cf452c7e4ed8ffa9 is starting a new election at term 107

It seems that fdatasync on the Azure storage stack regularly takes longer than the etcd heartbeat timeout configured by OCP.

Please ensure that etcd on Azure OCP clusters is tuned appropriately for the characteristics of the underlying storage stack, in order to reduce leader elections.

I don't know whether anything besides the heartbeat timeout needs tuning. I also don't know what heartbeat timeout value is suitable for Azure, or what the tradeoff is between hard-coding an alternative value and making it tunable. Finally, I don't know whether any monitoring/alerting configuration changes are needed if the heartbeat timeout is changed.

Please ensure this work goes into 4.3.z.
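To quantify how often elections happen, the raft log excerpt above can be summarized with a small script. This is a minimal sketch, not part of etcd or OCP tooling: the names `parse_elections`, `election_gaps` and `SAMPLE_LOG` are hypothetical, and the regular expression assumes the exact line format quoted in this report (other etcd versions or logger configurations may format these lines differently). It extracts the timestamp, member ID and term of each "is starting a new election" line and reports the time between consecutive elections.

```python
import re
from datetime import datetime

# Assumed format, copied from the log lines quoted in this report:
# "2020-02-21 01:26:19.140279 I | raft: cf452c7e4ed8ffa9 is starting a new election at term 105"
ELECTION_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6}) I \| raft: "
    r"(?P<member>[0-9a-f]+) is starting a new election at term (?P<term>\d+)"
)

# The three lines from this report, used as sample input.
SAMPLE_LOG = """\
2020-02-21 01:26:19.140279 I | raft: cf452c7e4ed8ffa9 is starting a new election at term 105
2020-02-21 01:26:20.440293 I | raft: cf452c7e4ed8ffa9 is starting a new election at term 106
2020-02-21 01:26:22.340240 I | raft: cf452c7e4ed8ffa9 is starting a new election at term 107
"""


def parse_elections(log_text):
    """Return a list of (timestamp, member_id, term) for every election-start line."""
    elections = []
    # finditer also works if the lines were flattened onto a single line.
    for m in ELECTION_RE.finditer(log_text):
        ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
        elections.append((ts, m.group("member"), int(m.group("term"))))
    return elections


def election_gaps(elections):
    """Return the number of seconds between consecutive election starts."""
    return [
        (later[0] - earlier[0]).total_seconds()
        for earlier, later in zip(elections, elections[1:])
    ]


if __name__ == "__main__":
    elections = parse_elections(SAMPLE_LOG)
    for ts, member, term in elections:
        print(f"{ts.isoformat()} member={member} term={term}")
    print("seconds between elections:", election_gaps(elections))
```

On the sample above this shows three elections in about three seconds. It only measures election frequency; confirming the fdatasync hypothesis would additionally require correlating these gaps with etcd's `etcd_disk_wal_fsync_duration_seconds` histogram.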
*** Bug 1798785 has been marked as a duplicate of this bug. ***
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days