Bug 1960391 - [RFE] Transfer RAFT leadership during snapshot writing
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: ovsdb2.15
Version: RHEL 8.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Ilya Maximets
QA Contact: Zhiqiang Fang
URL:
Whiteboard:
Depends On:
Blocks: 1943631 1963948
Reported: 2021-05-13 18:40 UTC by Tim Rozet
Modified: 2021-06-21 14:25 UTC (History)

Fixed In Version: openvswitch2.15-2.15.0-21.el8fdp
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1963948 (view as bug list)
Environment:
Last Closed: 2021-06-21 14:25:07 UTC
Target Upstream Version:
Embargoed:


Links
System                  ID              Private  Priority  Status  Summary  Last Updated
Red Hat Product Errata  RHBA-2021:2509  0        None      None    None     2021-06-21 14:25:33 UTC

Description Tim Rozet 2021-05-13 18:40:10 UTC
Description of problem:
While the Raft leader is writing its snapshot, it may fail to send Raft heartbeats. To alleviate this, the OVSDB leader should transfer leadership when it needs to write a snapshot.
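
The requested behavior can be illustrated with a toy model. This is only a sketch under stated assumptions, not the ovsdb-server implementation: `ToyServer` and `maybe_snapshot` are hypothetical names, the hand-off is modeled as instantaneous (a real transfer goes through an election), and a server with no peers is assumed to write its snapshot anyway.

```python
from dataclasses import dataclass


@dataclass
class ToyServer:
    """Hypothetical stand-in for one cluster member; not ovsdb-server code."""
    name: str
    is_leader: bool = False
    snapshot_pending: bool = False
    snapshots_written: int = 0


def maybe_snapshot(server, peers):
    """Write a pending snapshot, handing off leadership first if needed.

    Returns the name of the server that leads after the call, or None if
    no server in the toy cluster is leader.
    """
    if server.snapshot_pending:
        if server.is_leader and peers:
            # Hand leadership to a peer so heartbeats keep flowing while
            # this server is busy writing the snapshot.
            server.is_leader = False
            peers[0].is_leader = True
        # Simulated snapshot write.
        server.snapshots_written += 1
        server.snapshot_pending = False
    for s in [server, *peers]:
        if s.is_leader:
            return s.name
    return None


old_leader = ToyServer("59f6", is_leader=True, snapshot_pending=True)
follower = ToyServer("6496")
print(maybe_snapshot(old_leader, [follower]))
```

In this model the old leader ends up as a follower with one snapshot written, and the peer takes over as leader.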

Comment 4 Zhiqiang Fang 2021-06-16 16:42:39 UTC
Verified this RFE on openvswitch2.15-2.15.0-24.el8fdp.x86_64 using a method similar to the one in BZ#1964573.
For ovs2.15, we created many more resources than for ovs2.13 to trigger a DB compaction and snapshot.


RPMs used:

[root@wsfd-advnetlab35 ~]# rpm -qa |egrep "ovn|openvsw"
openvswitch-selinux-extra-policy-1.0-28.el8fdp.noarch
openvswitch2.15-2.15.0-24.el8fdp.x86_64
ovn-2021-21.03.0-40.el8fdp.x86_64
ovn-2021-host-21.03.0-40.el8fdp.x86_64
ovn-2021-central-21.03.0-40.el8fdp.x86_64
[root@wsfd-advnetlab35 ~]# 


Both the Northbound DB and the Southbound DB had leadership transfer and snapshot events.


[root@wsfd-advnetlab35 ~]# cat /var/log/ovn/ovsdb-server-nb.log | grep leader
2021-06-16T14:45:47.644Z|00003|raft|INFO|term 1: elected leader by 1+ of 1 servers
2021-06-16T16:24:28.365Z|00921|raft|INFO|Transferring leadership to write a snapshot.           <--------
2021-06-16T16:24:28.369Z|00922|raft|INFO|rejected append_reply (not leader)
2021-06-16T16:24:28.369Z|00923|raft|INFO|rejected append_reply (not leader)
2021-06-16T16:24:28.538Z|00925|raft|INFO|server 6496 is leader for term 3
[root@wsfd-advnetlab35 ~]# 


[root@wsfd-advnetlab35 ~]# cat /var/log/ovn/ovsdb-server-sb.log | grep leader
2021-06-16T14:45:47.753Z|00003|raft|INFO|term 1: elected leader by 1+ of 1 servers
2021-06-16T15:52:17.278Z|00033|raft|INFO|Transferring leadership to write a snapshot.           <--------
2021-06-16T15:52:17.640Z|00034|raft|INFO|server 6881 is leader for term 2
[root@wsfd-advnetlab35 ~]# 
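
The same check can be scripted instead of read off the grep output. The sketch below is an illustration only: it assumes the `timestamp|seq|module|level|message` layout seen in the log lines above, and `parse_line` and `transfers_with_next_leader` are hypothetical helper names. It pairs each "Transferring leadership" event with the next "is leader for term" message.

```python
TRANSFER_MSG = "Transferring leadership to write a snapshot."


def parse_line(line):
    """Split 'timestamp|seq|module|level|message' into a dict, or None."""
    parts = line.strip().split("|", 4)
    if len(parts) != 5:
        return None
    keys = ("timestamp", "seq", "module", "level", "message")
    return dict(zip(keys, parts))


def transfers_with_next_leader(lines):
    """Return (transfer_timestamp, next_leader_message) pairs."""
    results = []
    pending = None  # timestamp of a transfer still waiting for a new leader
    for line in lines:
        rec = parse_line(line)
        if rec is None or rec["module"] != "raft":
            continue
        msg = rec["message"].strip()
        if msg.startswith(TRANSFER_MSG):
            pending = rec["timestamp"]
        elif pending is not None and "is leader for term" in msg:
            results.append((pending, msg))
            pending = None
    return results


# Lines copied from the Southbound log excerpt above.
SAMPLE = [
    "2021-06-16T14:45:47.753Z|00003|raft|INFO|term 1: elected leader by 1+ of 1 servers",
    "2021-06-16T15:52:17.278Z|00033|raft|INFO|Transferring leadership to write a snapshot.",
    "2021-06-16T15:52:17.640Z|00034|raft|INFO|server 6881 is leader for term 2",
]

for ts, leader_msg in transfers_with_next_leader(SAMPLE):
    print(ts, "->", leader_msg)
```

Matching with `startswith` keeps the check tolerant of trailing annotations on the message, such as the arrows added in the excerpts above.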


The Northbound DB leadership transfer was captured as shown below. We also observed that ovnnb_db.db was compacted, shrinking from about 10 MB to about 1.2 MB.


##############
Wed Jun 16 12:24:28 EDT 2021
59f6
Name: OVN_Northbound
Cluster ID: eb61 (eb61e5f8-d97e-46f3-a796-f4f1b53e9b67)
Server ID: 59f6 (59f6afaf-be98-4d23-8ae7-928f8245dd5d)
Address: tcp:wsfd-advnetlab35.xyz:6643
Status: cluster member
Role: leader
Term: 1
Leader: self
Vote: self

Last Election started 5920500 ms ago, reason: timeout
Last Election won: 5920499 ms ago
Election timer: 1000
Log: [2, 20380]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: <-6496 ->6496 <-9585 ->9585
Disconnections: 0
Servers:
    9585 (9585 at tcp:netqe6.xyz:6643) next_index=20380 match_index=20379 last msg 165 ms ago
    6496 (6496 at tcp:netqe5.xyz:6643) next_index=20380 match_index=20379 last msg 165 ms ago
    59f6 (59f6 at tcp:wsfd-advnetlab35.xyz:6643) (self) next_index=2 match_index=20379
total 27588
-rw-r-----. 1 root root 10485306 Jun 16 12:24 ovnnb_db.db
-rw-r-----. 1 root root 10214438 Jun 16 12:24 ovnsb_db.db
##############
Wed Jun 16 12:24:28 EDT 2021
59f6
Name: OVN_Northbound
Cluster ID: eb61 (eb61e5f8-d97e-46f3-a796-f4f1b53e9b67)
Server ID: 59f6 (59f6afaf-be98-4d23-8ae7-928f8245dd5d)
Address: tcp:wsfd-advnetlab35.xyz:6643
Status: cluster member
Role: follower
Term: 3
Leader: 6496
Vote: 6496

Last Election started 5921008 ms ago, reason: timeout
Last Election won: 5921007 ms ago
Election timer: 1000
Log: [20381, 20383]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: <-6496 ->6496 <-9585 ->9585
Disconnections: 0
Servers:
    9585 (9585 at tcp:netqe6.xyz:6643) last msg 119 ms ago
    6496 (6496 at tcp:netqe5.xyz:6643) last msg 113 ms ago
    59f6 (59f6 at tcp:wsfd-advnetlab35.xyz:6643) (self)
total 18240
-rw-r-----. 1 root root  1224987 Jun 16 12:24 ovnnb_db.db
-rw-r-----. 1 root root 10215771 Jun 16 12:24 ovnsb_db.db
##############
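
The before/after comparison from the two captures can be reproduced with a small sketch. It is illustrative only: `parse_status` and `percent_reduction` are hypothetical helpers, and the inputs are the Role/Term/Leader lines and the ovnnb_db.db byte counts copied from the captures above.

```python
def parse_status(text):
    """Collect the Role, Term, and Leader fields from status-style text."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in ("Role", "Term", "Leader"):
            fields[key.strip()] = value.strip()
    return fields


def percent_reduction(before, after):
    """Size reduction as a percentage of the original size."""
    return 100.0 * (before - after) / before


# Field values copied from the two Northbound captures above.
BEFORE = "Role: leader\nTerm: 1\nLeader: self"
AFTER = "Role: follower\nTerm: 3\nLeader: 6496"

before, after = parse_status(BEFORE), parse_status(AFTER)
print("role:", before["Role"], "->", after["Role"])
print("term:", before["Term"], "->", after["Term"])

# ovnnb_db.db sizes in bytes, taken from the directory listings above.
print("reduction: %.1f%%" % percent_reduction(10485306, 1224987))
```

The role change from leader to follower, together with the smaller file, is what indicates that the snapshot was written after leadership had been handed off.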

Comment 6 errata-xmlrpc 2021-06-21 14:25:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (openvswitch2.15 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2509

