Description of problem: While the Raft leader is writing its snapshot, it may fail to send Raft heartbeats. To alleviate this, the OVSDB leader should transfer leadership when it needs to write a snapshot.
Patch sent for review: https://patchwork.ozlabs.org/project/openvswitch/patch/20210506124731.3599531-1-i.maximets@ovn.org/
Verified this RFE on openvswitch2.15-2.15.0-24.el8fdp.x86_64 using a method similar to the one in BZ#1964573. For ovs2.15, we created many more resources than for ovs2.13 to trigger a db compaction and snapshot.

RPMs used:

[root@wsfd-advnetlab35 ~]# rpm -qa |egrep "ovn|openvsw"
openvswitch-selinux-extra-policy-1.0-28.el8fdp.noarch
openvswitch2.15-2.15.0-24.el8fdp.x86_64
ovn-2021-21.03.0-40.el8fdp.x86_64
ovn-2021-host-21.03.0-40.el8fdp.x86_64
ovn-2021-central-21.03.0-40.el8fdp.x86_64
[root@wsfd-advnetlab35 ~]#

Both the Northbound db and the Southbound db logged leadership transfer and snapshot events.

[root@wsfd-advnetlab35 ~]# cat /var/log/ovn/ovsdb-server-nb.log | grep leader
2021-06-16T14:45:47.644Z|00003|raft|INFO|term 1: elected leader by 1+ of 1 servers
2021-06-16T16:24:28.365Z|00921|raft|INFO|Transferring leadership to write a snapshot. <--------
2021-06-16T16:24:28.369Z|00922|raft|INFO|rejected append_reply (not leader)
2021-06-16T16:24:28.369Z|00923|raft|INFO|rejected append_reply (not leader)
2021-06-16T16:24:28.538Z|00925|raft|INFO|server 6496 is leader for term 3
[root@wsfd-advnetlab35 ~]#
[root@wsfd-advnetlab35 ~]# cat /var/log/ovn/ovsdb-server-sb.log | grep leader
2021-06-16T14:45:47.753Z|00003|raft|INFO|term 1: elected leader by 1+ of 1 servers
2021-06-16T15:52:17.278Z|00033|raft|INFO|Transferring leadership to write a snapshot. <--------
2021-06-16T15:52:17.640Z|00034|raft|INFO|server 6881 is leader for term 2
[root@wsfd-advnetlab35 ~]#

The Northbound db leadership transfer was captured as shown below. We also observed that ovnnb_db.db was compacted and shrank from about 10 MB to 1.2 MB.
##############
Wed Jun 16 12:24:28 EDT 2021
59f6
Name: OVN_Northbound
Cluster ID: eb61 (eb61e5f8-d97e-46f3-a796-f4f1b53e9b67)
Server ID: 59f6 (59f6afaf-be98-4d23-8ae7-928f8245dd5d)
Address: tcp:wsfd-advnetlab35.xyz:6643
Status: cluster member
Role: leader
Term: 1
Leader: self
Vote: self
Last Election started 5920500 ms ago, reason: timeout
Last Election won: 5920499 ms ago
Election timer: 1000
Log: [2, 20380]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: <-6496 ->6496 <-9585 ->9585
Disconnections: 0
Servers:
    9585 (9585 at tcp:netqe6.xyz:6643) next_index=20380 match_index=20379 last msg 165 ms ago
    6496 (6496 at tcp:netqe5.xyz:6643) next_index=20380 match_index=20379 last msg 165 ms ago
    59f6 (59f6 at tcp:wsfd-advnetlab35.xyz:6643) (self) next_index=2 match_index=20379
total 27588
-rw-r-----. 1 root root 10485306 Jun 16 12:24 ovnnb_db.db
-rw-r-----. 1 root root 10214438 Jun 16 12:24 ovnsb_db.db
##############
Wed Jun 16 12:24:28 EDT 2021
59f6
Name: OVN_Northbound
Cluster ID: eb61 (eb61e5f8-d97e-46f3-a796-f4f1b53e9b67)
Server ID: 59f6 (59f6afaf-be98-4d23-8ae7-928f8245dd5d)
Address: tcp:wsfd-advnetlab35.xyz:6643
Status: cluster member
Role: follower
Term: 3
Leader: 6496
Vote: 6496
Last Election started 5921008 ms ago, reason: timeout
Last Election won: 5921007 ms ago
Election timer: 1000
Log: [20381, 20383]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: <-6496 ->6496 <-9585 ->9585
Disconnections: 0
Servers:
    9585 (9585 at tcp:netqe6.xyz:6643) last msg 119 ms ago
    6496 (6496 at tcp:netqe5.xyz:6643) last msg 113 ms ago
    59f6 (59f6 at tcp:wsfd-advnetlab35.xyz:6643) (self)
total 18240
-rw-r-----. 1 root root 1224987 Jun 16 12:24 ovnnb_db.db
-rw-r-----. 1 root root 10215771 Jun 16 12:24 ovnsb_db.db
##############
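For anyone repeating this verification, the manual log check could be scripted roughly as follows. This is only a sketch, not the procedure used in this report: the function names find_snapshot_transfers and size_reduction_percent are hypothetical, and the parser assumes the pipe-separated log layout and the exact message text seen in the ovsdb-server-nb.log excerpt quoted in this comment. The sample lines and file sizes are copied from the captured output.

```python
import re

# Message text as it appears in the raft log excerpts in this report.
TRANSFER_MSG = "Transferring leadership to write a snapshot."
# Matches messages such as "server 6496 is leader for term 3".
NEW_LEADER_RE = re.compile(r"server (\w+) is leader for term (\d+)")


def find_snapshot_transfers(log_lines):
    """Return (transfer_timestamp, new_leader_id, new_term) tuples.

    Each transfer message is paired with the first "is leader for term"
    line that follows it. A transfer with no such follow-up line is
    reported with None for both the leader and the term.
    """
    events = []
    pending = None  # timestamp of a transfer still waiting for a new leader
    for line in log_lines:
        # Assumed layout: timestamp|sequence|module|level|message
        fields = line.split("|", 4)
        if len(fields) < 5 or fields[2] != "raft":
            continue
        timestamp, message = fields[0], fields[4]
        if TRANSFER_MSG in message:
            if pending is not None:
                events.append((pending, None, None))
            pending = timestamp
            continue
        match = NEW_LEADER_RE.search(message)
        if match and pending is not None:
            events.append((pending, match.group(1), int(match.group(2))))
            pending = None
    if pending is not None:
        events.append((pending, None, None))
    return events


def size_reduction_percent(before_bytes, after_bytes):
    """Percentage by which a file shrank between two size samples."""
    return 100.0 * (before_bytes - after_bytes) / before_bytes


# Sample input: the Northbound log lines quoted in this comment.
NB_LOG = """\
2021-06-16T14:45:47.644Z|00003|raft|INFO|term 1: elected leader by 1+ of 1 servers
2021-06-16T16:24:28.365Z|00921|raft|INFO|Transferring leadership to write a snapshot.
2021-06-16T16:24:28.369Z|00922|raft|INFO|rejected append_reply (not leader)
2021-06-16T16:24:28.369Z|00923|raft|INFO|rejected append_reply (not leader)
2021-06-16T16:24:28.538Z|00925|raft|INFO|server 6496 is leader for term 3
""".splitlines()

if __name__ == "__main__":
    for event in find_snapshot_transfers(NB_LOG):
        print(event)
    # ovnnb_db.db sizes from the two ls listings above.
    print(round(size_reduction_percent(10485306, 1224987), 1))
```

With the Northbound lines above, the sketch pairs the 16:24:28.365Z transfer with server 6496 taking over in term 3, and the two ovnnb_db.db sizes work out to a reduction of roughly 88%.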
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (openvswitch2.15 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:2509