Clone for ovsdb2.13 component.

+++ This bug was initially created as a clone of Bug #1960391 +++

Description of problem:

While the raft leader is writing its snapshot, it may fail to send raft heartbeats. To alleviate this, the OVSDB leader should transfer leadership when it needs to write its snapshot.

--- Additional comment from Ilya Maximets on 2021-05-13 18:55:26 UTC ---

Patch sent for review:
https://patchwork.ozlabs.org/project/openvswitch/patch/20210506124731.3599531-1-i.maximets@ovn.org/
Verified this RFE on openvswitch2.13-2.13.0-95.el7fdp.x86_64, using the same method as in BZ#1964573.

Test bed: a 3-host OVN raft cluster and one OVN chassis (a host with ovn-controller installed).

Method to trigger a snapshot:
A snapshot is triggered once the database has grown by more than 50% and is larger than 10MB. After 10-20 minutes, ovsdb-server checks these conditions and decides whether to compact the database, i.e. create a snapshot. In this test, the database was grown by adding 3000 LSPs (logical switch ports) in a short period of time.

RPMs used:
[root@wsfd-advnetlab35 ~]# rpm -aq | egrep "ovn|openv"
openvswitch-selinux-extra-policy-1.0-18.el7fdp.noarch
ovn2.13-central-20.12.0-135.el7fdp.x86_64
ovn2.13-host-20.12.0-135.el7fdp.x86_64
openvswitch2.13-2.13.0-95.el7fdp.x86_64
ovn2.13-20.12.0-135.el7fdp.x86_64
[root@wsfd-advnetlab35 ~]#

OVN_Southbound db leadership transfer:
#cat /var/log/ovn/ovsdb-server-nb.log
...
2021-06-16T19:59:30.812Z|00041|raft|INFO|Transferring leadership to write a snapshot.
2021-06-16T19:59:30.812Z|00042|raft|INFO|rejected append_reply (not leader)
...
2021-06-16T19:59:31.860Z|00060|raft|INFO|rejected append_reply (not leader)
2021-06-16T19:59:31.860Z|00061|raft|INFO|server 037c is leader for term 2
2021-06-16T20:02:36.933Z|00062|raft|INFO|received leadership transfer from 037c in term 2
2021-06-16T20:02:36.933Z|00063|raft|INFO|term 3: starting election
2021-06-16T20:02:36.934Z|00064|raft|INFO|term 3: elected leader by 2+ of 3 servers
...
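
For illustration only, the trigger rule described under "Method to trigger snapshot" can be written as a small predicate. This is a minimal Python sketch of the rule exactly as worded in this comment, not the ovsdb-server implementation; the function name, parameter names, and constants are hypothetical, and the 10-20 minute check interval is not modelled.

```python
# Thresholds taken from the wording of the comment above (assumptions,
# not values read from the ovsdb-server source).
MIN_SIZE_BYTES = 10 * 1024 * 1024  # database must be larger than 10MB
GROWTH_FACTOR = 1.5                # database must have grown by more than 50%


def should_snapshot(size_at_last_snapshot: int, current_size: int) -> bool:
    """Return True when both conditions stated in the comment hold."""
    grew_enough = current_size > size_at_last_snapshot * GROWTH_FACTOR
    big_enough = current_size > MIN_SIZE_BYTES
    return grew_enough and big_enough


MB = 1024 * 1024
# Grew from 8MB to 13MB: more than 50% growth and above 10MB.
example_large_growth = should_snapshot(8 * MB, 13 * MB)
# Grew from 1MB to 9MB: large relative growth, but still under 10MB.
example_too_small = should_snapshot(1 * MB, 9 * MB)
```

Under this reading, adding 3000 LSPs in a short time is simply a way to push a small, freshly created database past both thresholds.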
# ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound
############## Wed Jun 16 15:59:30 EDT 2021
079b
Name: OVN_Southbound
Cluster ID: d908 (d9082509-96e1-4d97-b777-7fa7cc472cd1)
Server ID: 079b (079b530e-1030-4204-90e0-3413beac73df)
Address: tcp:wsfd-advnetlab35.xyz:6644
Status: cluster member
Role: leader
Term: 1
Leader: self
Vote: self
Election timer: 1000
Log: [2, 3002]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: <-037c ->037c <-1774 ->1774
Servers:
    037c (037c at tcp:netqe5.xyz:6644) next_index=3002 match_index=3001
    079b (079b at tcp:wsfd-advnetlab35.xyz:6644) (self) next_index=2 match_index=3001
    1774 (1774 at tcp:netqe6.xyz:6644) next_index=3002 match_index=3001
############## Wed Jun 16 15:59:30 EDT 2021
079b
Name: OVN_Southbound
Cluster ID: d908 (d9082509-96e1-4d97-b777-7fa7cc472cd1)
Server ID: 079b (079b530e-1030-4204-90e0-3413beac73df)
Address: tcp:wsfd-advnetlab35.xyz:6644
Status: cluster member
Role: follower
Term: 1
Leader: unknown
Vote: self
Election timer: 1000
Log: [3002, 3002]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: <-037c ->037c <-1774 ->1774
Servers:
    037c (037c at tcp:netqe5.xyz:6644)
    079b (079b at tcp:wsfd-advnetlab35.xyz:6644) (self)
    1774 (1774 at tcp:netqe6.xyz:6644)
############## Wed Jun 16 15:59:32 EDT 2021
079b
Name: OVN_Southbound
Cluster ID: d908 (d9082509-96e1-4d97-b777-7fa7cc472cd1)
Server ID: 079b (079b530e-1030-4204-90e0-3413beac73df)
Address: tcp:wsfd-advnetlab35.xyz:6644
Status: cluster member
Role: follower
Term: 2
Leader: 037c
Vote: 037c
Election timer: 1000
Log: [3002, 3003]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: <-037c ->037c <-1774 ->1774
Servers:
    037c (037c at tcp:netqe5.xyz:6644)
    079b (079b at tcp:wsfd-advnetlab35.xyz:6644) (self)
    1774 (1774 at tcp:netqe6.xyz:6644)
##############

OVN_Northbound db leadership transfer:
# cat /var/log/ovn/ovsdb-server-nb.log
...
2021-06-16T20:04:19.189Z|00224|raft|INFO|Transferring leadership to write a snapshot.
2021-06-16T20:04:19.190Z|00225|raft|INFO|rejected append_reply (not leader)
2021-06-16T20:04:19.703Z|00226|raft|INFO|rejected append_reply (not leader)
2021-06-16T20:04:19.705Z|00227|raft|INFO|server 2e70 is leader for term 2
...

# ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
############## Wed Jun 16 16:04:18 EDT 2021
3793
Name: OVN_Northbound
Cluster ID: d0d5 (d0d58acf-16c4-4a5b-b646-3cd9c2961a16)
Server ID: 3793 (3793f874-4202-4ddb-871d-0544671483df)
Address: tcp:wsfd-advnetlab35.xyz:6643
Status: cluster member
Role: leader
Term: 1
Leader: self
Vote: self
Election timer: 1000
Log: [2, 5999]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: <-7f27 ->7f27 <-2e70 ->2e70
Servers:
    2e70 (2e70 at tcp:netqe6.xyz:6643) next_index=5999 match_index=5998
    3793 (3793 at tcp:wsfd-advnetlab35.xyz:6643) (self) next_index=2 match_index=5998
    7f27 (7f27 at tcp:netqe5.xyz:6643) next_index=5999 match_index=5998
############## Wed Jun 16 16:04:19 EDT 2021
3793
Name: OVN_Northbound
Cluster ID: d0d5 (d0d58acf-16c4-4a5b-b646-3cd9c2961a16)
Server ID: 3793 (3793f874-4202-4ddb-871d-0544671483df)
Address: tcp:wsfd-advnetlab35.xyz:6643
Status: cluster member
Role: follower
Term: 1
Leader: unknown
Vote: self
Election timer: 1000
Log: [5999, 5999]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: <-7f27 ->7f27 <-2e70 ->2e70
Servers:
    2e70 (2e70 at tcp:netqe6.xyz:6643)
    3793 (3793 at tcp:wsfd-advnetlab35.xyz:6643) (self)
    7f27 (7f27 at tcp:netqe5.xyz:6643)
############## Wed Jun 16 16:04:20 EDT 2021
3793
Name: OVN_Northbound
Cluster ID: d0d5 (d0d58acf-16c4-4a5b-b646-3cd9c2961a16)
Server ID: 3793 (3793f874-4202-4ddb-871d-0544671483df)
Address: tcp:wsfd-advnetlab35.xyz:6643
Status: cluster member
Role: follower
Term: 2
Leader: 2e70
Vote: 2e70
Election timer: 1000
Log: [5999, 6000]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: <-7f27 ->7f27 <-2e70 ->2e70
Servers:
    2e70 (2e70 at tcp:netqe6.xyz:6643)
    3793 (3793 at tcp:wsfd-advnetlab35.xyz:6643) (self)
    7f27 (7f27 at tcp:netqe5.xyz:6643)
##############
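
The role change visible in the cluster/status snapshots above (leader in term 1, then follower with a new leader in term 2) could also be checked mechanically. The following is a minimal Python sketch and was not part of the original verification; parse_status and stepped_down are hypothetical helper names, and only the Role, Term and Leader fields shown in the output above are assumed.

```python
import re

# Field labels as they appear in the cluster/status output above.
FIELDS = {
    "role": re.compile(r"\bRole:\s*(\S+)"),
    "term": re.compile(r"\bTerm:\s*(\d+)"),
    "leader": re.compile(r"\bLeader:\s*(\S+)"),
}


def parse_status(text: str) -> dict:
    """Extract role, term and leader from one cluster/status snapshot."""
    result = {}
    for name, pattern in FIELDS.items():
        match = pattern.search(text)
        if match is None:
            raise ValueError(f"field {name!r} not found in status output")
        result[name] = match.group(1)
    result["term"] = int(result["term"])
    return result


def stepped_down(before: dict, after: dict) -> bool:
    """True if this server was leader, is now a follower of another
    server, and the term has advanced."""
    return (
        before["role"] == "leader"
        and after["role"] == "follower"
        and after["leader"] not in ("self", "unknown")
        and after["term"] > before["term"]
    )


# Field values copied from the first and last OVN_Southbound snapshots.
before = parse_status("Role: leader\nTerm: 1\nLeader: self\nVote: self")
after = parse_status("Role: follower\nTerm: 2\nLeader: 037c\nVote: 037c")
transferred = stepped_down(before, after)
```

The intermediate snapshot (follower, term 1, leader unknown) does not satisfy stepped_down on its own, which matches the short window between giving up leadership and the new leader being elected.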
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (openvswitch2.13 bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2506