Bug 1960393

Summary: OVSDB RAFT leader should not miss sending heartbeats due to client handling
Product: Red Hat Enterprise Linux Fast Datapath
Reporter: Tim Rozet <trozet>
Component: ovsdb
Assignee: Anton Ivanov <anivanov>
Status: CLOSED WONTFIX
QA Contact: Jianlin Shi <jishi>
Severity: urgent
Priority: high
Version: RHEL 8.0
CC: anivanov, ctrautma, dcbw, jhsiao, jishi, mmichels, ralongi
Last Closed: 2023-07-28 19:27:09 UTC
Type: Bug
Bug Blocks: 1943631, 1959597, 1962951    

Description Tim Rozet 2021-05-13 18:49:35 UTC
Description of problem:
We see that at scale, with many OVSDB clients, the RAFT leader is so busy in its event loop that it has no time to send the RAFT heartbeat message. As a result the election timer fires, which leads to a leadership change. The new leader then has the same problem, so the cluster ends up in endless election thrashing. OVSDB should be able to detect that it is approaching the deadline to send the heartbeat, break out of its client loop, and send the heartbeat.
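
The requested behavior can be illustrated with a small simulation. This is only a sketch under stated assumptions, not OVSDB code and not the PoC patch: the function name `max_heartbeat_gap`, the `FakeClock` class, the fixed per-request cost, and the `guard_ms` safety margin are all hypothetical. It compares a loop that drains every queued client request before sending a heartbeat with a loop that stops serving clients once the heartbeat deadline is close.

```python
from dataclasses import dataclass


@dataclass
class FakeClock:
    """Simulated monotonic clock in milliseconds (hypothetical helper)."""
    now_ms: int = 0

    def advance(self, ms: int) -> None:
        self.now_ms += ms


def max_heartbeat_gap(num_requests: int,
                      request_cost_ms: int,
                      heartbeat_interval_ms: int,
                      guard_ms: int,
                      yield_for_heartbeat: bool) -> int:
    """Return the longest simulated time between two leader heartbeats.

    All parameters are illustrative assumptions: every client request
    costs the same fixed time, and sending a heartbeat is instantaneous.
    """
    if request_cost_ms <= 0:
        raise ValueError("request_cost_ms must be positive")
    if yield_for_heartbeat and guard_ms >= heartbeat_interval_ms:
        # Otherwise the client loop would never be allowed to run.
        raise ValueError("guard_ms must be smaller than the interval")

    clock = FakeClock()
    pending = num_requests
    last_heartbeat_ms = 0
    worst_gap_ms = 0

    while pending > 0:
        deadline_ms = last_heartbeat_ms + heartbeat_interval_ms

        # Client loop: serve queued requests one at a time.
        while pending > 0:
            # Deadline-aware variant: leave the client loop early when
            # the heartbeat deadline is within the guard margin.
            if yield_for_heartbeat and clock.now_ms + guard_ms >= deadline_ms:
                break
            pending -= 1
            clock.advance(request_cost_ms)

        # "Send" the heartbeat and record how long followers waited.
        worst_gap_ms = max(worst_gap_ms, clock.now_ms - last_heartbeat_ms)
        last_heartbeat_ms = clock.now_ms

    return worst_gap_ms
```

Calling the function twice with the same load, once with `yield_for_heartbeat=False` and once with `True`, gives the worst heartbeat gap for each strategy. If the first value exceeds the followers' election timeout and the second does not, the simulation reproduces the thrashing scenario described above and shows how yielding before the deadline would avoid it. A real server would also need to bound the cost of a single request, since the check only runs between requests.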

Comment 1 Dan Williams 2021-05-26 16:56:48 UTC
Anton had some PoC code for this at https://github.com/kot-begemot-uk/ovs/commit/056417310763e6a55468bd954ac5c47e45205390

Reassigning to him to finish that patch and push upstream.

More context is at https://bugzilla.redhat.com/show_bug.cgi?id=1943631#c25

Comment 5 Tim Rozet 2022-02-21 19:58:03 UTC
Anton, are you still going to work on this patch?

Comment 6 Anton Ivanov 2022-02-22 09:06:45 UTC
There is no interest upstream, no interest in reviewing and no interest in this. So why?

Comment 7 Mark Michelson 2023-07-28 19:27:09 UTC
I am closing this since nothing has been done on the posted patch series in a while and it appears this has withered on the vine. I also suspect that the numerous performance improvements made to OVSDB since May 2021 have lessened the severity of this issue. Tim, if this is still an observed problem, then feel free to re-open.