Description of problem: At scale, with many OVSDB clients, the Raft leader is so busy in its event loop that it doesn't have time to send the Raft heartbeat message. As a result, the election timer fires, which leads to a leadership change. The new leader then has the same problem, and the result is endless election thrashing. OVSDB should be able to detect that it is approaching the deadline to send the heartbeat, break out of its client loop, and send the heartbeat.
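To illustrate the requested behavior, here is a minimal simulation sketch of a client loop that checks the heartbeat deadline between clients. It is not the PoC patch and not OVS code: the class, its fields, the simulated clock, and the safety margin are all hypothetical names and assumptions chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class LeaderLoop:
    """Toy model of a leader's event loop (hypothetical, not OVS code)."""
    heartbeat_interval_ms: int      # assumed: max allowed gap between heartbeats
    now_ms: int = 0                 # simulated monotonic clock
    last_heartbeat_ms: int = 0
    heartbeats: list = field(default_factory=list)

    def heartbeat_due(self, margin_ms: int) -> bool:
        # True once we are within margin_ms of the heartbeat deadline.
        elapsed = self.now_ms - self.last_heartbeat_ms
        return elapsed >= self.heartbeat_interval_ms - margin_ms

    def send_heartbeat(self) -> None:
        # Stand-in for sending the real heartbeat message to followers.
        self.heartbeats.append(self.now_ms)
        self.last_heartbeat_ms = self.now_ms

    def serve_clients(self, client_costs_ms, margin_ms: int) -> None:
        # Check the deadline before each client instead of only once
        # per full pass over all clients.
        for cost in client_costs_ms:
            if self.heartbeat_due(margin_ms):
                self.send_heartbeat()
            self.now_ms += cost     # simulate time spent on one client


def max_gap(loop: LeaderLoop) -> int:
    # Largest interval between consecutive heartbeats, starting at t=0.
    times = [0] + loop.heartbeats
    return max(b - a for a, b in zip(times, times[1:]))


# Ten clients costing 30 ms each: one pass takes 300 ms, three times the
# 100 ms heartbeat interval assumed here.
loop = LeaderLoop(heartbeat_interval_ms=100)
loop.serve_clients([30] * 10, margin_ms=30)
print(loop.heartbeats, max_gap(loop))
```

The margin is set to the cost of the slowest single client in this toy setup, so a heartbeat goes out before the deadline even when the next client takes its full time. A loop that only sent heartbeats after a full pass would go 300 ms without one in the same scenario.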
Anton had some PoC code for this at https://github.com/kot-begemot-uk/ovs/commit/056417310763e6a55468bd954ac5c47e45205390. Reassigning to him to finish that patch and push it upstream. More context is at https://bugzilla.redhat.com/show_bug.cgi?id=1943631#c25
v1 posted upstream: http://patchwork.ozlabs.org/project/openvswitch/patch/20210608092708.15711-1-anton.ivanov@cambridgegreys.com/
v8 posted: http://patchwork.ozlabs.org/project/openvswitch/patch/20210817180035.10909-1-anton.ivanov@cambridgegreys.com/
v9 posted: http://patchwork.ozlabs.org/project/openvswitch/patch/20210820091630.7334-1-anton.ivanov@cambridgegreys.com/
Anton, are you still going to work on this patch?
There is no interest upstream, no interest in reviewing and no interest in this. So why?
I am closing this since nothing has been done on the posted patch series in a while and it appears this has withered on the vine. I also suspect that the numerous performance improvements made to OVSDB since May 2021 have lessened the severity of this issue. Tim, if this is still an observed problem, then feel free to re-open.