Bug 1963948 - [RHEL7] [RFE] Transfer RAFT leadership during snapshot writing
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: ovsdb2.13
Version: RHEL 8.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Ilya Maximets
QA Contact: Zhiqiang Fang
URL:
Whiteboard:
Depends On: 1960391
Blocks: 1943631
Reported: 2021-05-24 13:12 UTC by Ilya Maximets
Modified: 2021-06-21 14:44 UTC (History)

Fixed In Version: openvswitch2.13-2.13.0-94.el7fdp
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1960391
Environment:
Last Closed: 2021-06-21 14:44:07 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2021:2506 | 0 | None | None | None | 2021-06-21 14:44:21 UTC

Description Ilya Maximets 2021-05-24 13:12:13 UTC
Clone for ovsdb2.13 component.

+++ This bug was initially created as a clone of Bug #1960391 +++

Description of problem:
While the Raft leader is writing its snapshot, it may fail to send Raft heartbeats. To alleviate this, the OVSDB leader should transfer leadership to another server when it needs to write its snapshot.

--- Additional comment from Ilya Maximets on 2021-05-13 18:55:26 UTC ---

Patch sent for review:
https://patchwork.ozlabs.org/project/openvswitch/patch/20210506124731.3599531-1-i.maximets@ovn.org/

Comment 4 Zhiqiang Fang 2021-06-17 03:01:19 UTC
Verified this RFE on openvswitch2.13-2.13.0-95.el7fdp.x86_64, using the same method as in BZ#1964573.
Test bed: a 3-host OVN raft cluster and an OVN chassis (a host with ovn-controller installed).
Method to trigger a snapshot:
  A snapshot is triggered when the database has grown by more than 50% and is at least 10 MB in size. ovsdb-server checks these conditions after 10-20 minutes and then decides whether to compact the database (create a snapshot).
  In this test, the database was grown by adding 3000 logical switch ports (lsp) in a short period of time.
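
The trigger rule described above can be written down as a small predicate. This is only a sketch of the rule as stated in this comment, not the actual ovsdb-server code: the function name, the parameter names, and the choice of "more than" for growth versus "at least" for size are assumptions made for illustration, and the 10-20 minute check interval is not modeled.

```python
MIB = 1024 * 1024  # assumption: "10MB" is treated as 10 MiB here


def should_snapshot(current_bytes: int,
                    bytes_after_last_snapshot: int,
                    min_bytes: int = 10 * MIB,
                    growth_factor: float = 1.5) -> bool:
    """Hypothetical helper: apply the rule stated in the comment.

    The database must have grown by more than 50% since the last
    snapshot AND be at least 10 MB in size.
    """
    grew_enough = current_bytes > bytes_after_last_snapshot * growth_factor
    big_enough = current_bytes >= min_bytes
    return grew_enough and big_enough


# A 10 MiB database that grew to 16 MiB satisfies both conditions.
print(should_snapshot(16 * MIB, 10 * MIB))
# A small database that tripled in size is still below the 10 MB floor.
print(should_snapshot(9 * MIB, 3 * MIB))
```

Adding many logical switch ports in a short time, as done in this test, is one way to push both conditions over their thresholds.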

RPMs used:

[root@wsfd-advnetlab35 ~]# rpm -aq | egrep "ovn|openv"
openvswitch-selinux-extra-policy-1.0-18.el7fdp.noarch
ovn2.13-central-20.12.0-135.el7fdp.x86_64
ovn2.13-host-20.12.0-135.el7fdp.x86_64
openvswitch2.13-2.13.0-95.el7fdp.x86_64
ovn2.13-20.12.0-135.el7fdp.x86_64
[root@wsfd-advnetlab35 ~]# 


OVN_Southbound db leadership transfer:

# cat /var/log/ovn/ovsdb-server-sb.log
...
2021-06-16T19:59:30.812Z|00041|raft|INFO|Transferring leadership to write a snapshot.
2021-06-16T19:59:30.812Z|00042|raft|INFO|rejected append_reply (not leader)
...
2021-06-16T19:59:31.860Z|00060|raft|INFO|rejected append_reply (not leader)
2021-06-16T19:59:31.860Z|00061|raft|INFO|server 037c is leader for term 2
2021-06-16T20:02:36.933Z|00062|raft|INFO|received leadership transfer from 037c in term 2
2021-06-16T20:02:36.933Z|00063|raft|INFO|term 3: starting election
2021-06-16T20:02:36.934Z|00064|raft|INFO|term 3: elected leader by 2+ of 3 servers
...
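
To pick the leadership events out of a log like the one above, the lines can be split on the `|` separator. The following is a minimal sketch that assumes the five-field layout visible in the excerpt (timestamp, sequence number, module, level, message). The helper names and the list of message fragments are illustrative and only cover the messages shown in this report.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional


class LogRecord(NamedTuple):
    timestamp: str
    seq: int
    module: str
    level: str
    message: str


def parse_line(line: str) -> Optional[LogRecord]:
    """Split one 'timestamp|seq|module|level|message' line; None if it does not match."""
    parts = line.strip().split("|", 4)
    if len(parts) != 5 or not parts[1].isdigit():
        return None
    ts, seq, module, level, message = parts
    return LogRecord(ts, int(seq), module, level, message)


# Message fragments taken from the excerpt in this report (not an exhaustive list).
LEADERSHIP = re.compile(
    r"Transferring leadership|received leadership transfer|"
    r"starting election|elected leader|is leader for term"
)


def leadership_events(lines: Iterable[str]) -> Iterator[LogRecord]:
    """Yield only raft records that describe a leadership change."""
    for line in lines:
        rec = parse_line(line)
        if rec and rec.module == "raft" and LEADERSHIP.search(rec.message):
            yield rec


SAMPLE = """\
...
2021-06-16T19:59:30.812Z|00041|raft|INFO|Transferring leadership to write a snapshot.
2021-06-16T19:59:30.812Z|00042|raft|INFO|rejected append_reply (not leader)
2021-06-16T19:59:31.860Z|00061|raft|INFO|server 037c is leader for term 2
""".splitlines()

for event in leadership_events(SAMPLE):
    print(event.timestamp, event.message)
```

The "rejected append_reply (not leader)" lines are filtered out here because they are a side effect of the transfer rather than a change of leader.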


# ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound

##############
Wed Jun 16 15:59:30 EDT 2021
079b
Name: OVN_Southbound
Cluster ID: d908 (d9082509-96e1-4d97-b777-7fa7cc472cd1)
Server ID: 079b (079b530e-1030-4204-90e0-3413beac73df)
Address: tcp:wsfd-advnetlab35.xyz:6644
Status: cluster member
Role: leader
Term: 1
Leader: self
Vote: self

Election timer: 1000
Log: [2, 3002]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: <-037c ->037c <-1774 ->1774
Servers:
    037c (037c at tcp:netqe5.xyz:6644) next_index=3002 match_index=3001
    079b (079b at tcp:wsfd-advnetlab35.xyz:6644) (self) next_index=2 match_index=3001
    1774 (1774 at tcp:netqe6.xyz:6644) next_index=3002 match_index=3001
##############
Wed Jun 16 15:59:30 EDT 2021
079b
Name: OVN_Southbound
Cluster ID: d908 (d9082509-96e1-4d97-b777-7fa7cc472cd1)
Server ID: 079b (079b530e-1030-4204-90e0-3413beac73df)
Address: tcp:wsfd-advnetlab35.xyz:6644
Status: cluster member
Role: follower
Term: 1
Leader: unknown
Vote: self

Election timer: 1000
Log: [3002, 3002]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: <-037c ->037c <-1774 ->1774
Servers:
    037c (037c at tcp:netqe5.xyz:6644)
    079b (079b at tcp:wsfd-advnetlab35.xyz:6644) (self)
    1774 (1774 at tcp:netqe6.xyz:6644)
##############
Wed Jun 16 15:59:32 EDT 2021
079b
Name: OVN_Southbound
Cluster ID: d908 (d9082509-96e1-4d97-b777-7fa7cc472cd1)
Server ID: 079b (079b530e-1030-4204-90e0-3413beac73df)
Address: tcp:wsfd-advnetlab35.xyz:6644
Status: cluster member
Role: follower
Term: 2
Leader: 037c
Vote: 037c

Election timer: 1000
Log: [3002, 3003]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: <-037c ->037c <-1774 ->1774
Servers:
    037c (037c at tcp:netqe5.xyz:6644)
    079b (079b at tcp:wsfd-advnetlab35.xyz:6644) (self)
    1774 (1774 at tcp:netqe6.xyz:6644)
##############
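
The three status dumps above show the role going from leader to follower and the term advancing from 1 to 2. A short parser makes that transition easy to see. This is a sketch that assumes the dumps are separated by `#` lines, as they are in this comment; that separator comes from the tester's capture, not from `cluster/status` itself. The function name is hypothetical and the sample is abbreviated to the relevant fields.

```python
from typing import Dict, List


def parse_status_blocks(text: str) -> List[Dict[str, str]]:
    """Collect top-level 'Key: value' fields from each '#'-delimited dump."""
    blocks: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for line in text.splitlines():
        if line.startswith("####"):
            if current:
                blocks.append(current)
            current = {}
        elif ": " in line and not line.startswith(" "):
            # Indented lines (the per-server entries) are skipped on purpose.
            key, value = line.split(": ", 1)
            current[key] = value.strip()
    if current:
        blocks.append(current)
    return blocks


# Abbreviated copy of the three OVN_Southbound dumps from this comment.
SAMPLE = """\
##############
Wed Jun 16 15:59:30 EDT 2021
Role: leader
Term: 1
Leader: self
##############
Wed Jun 16 15:59:30 EDT 2021
Role: follower
Term: 1
Leader: unknown
##############
Wed Jun 16 15:59:32 EDT 2021
Role: follower
Term: 2
Leader: 037c
##############
"""

for block in parse_status_blocks(SAMPLE):
    print(block["Role"], block["Term"], block["Leader"])
```

The middle dump, with role follower and leader unknown, is the short window after the old leader stepped down and before it learned of the new leader for term 2.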



OVN_Northbound db leadership transfer:

# cat /var/log/ovn/ovsdb-server-nb.log
...
2021-06-16T20:04:19.189Z|00224|raft|INFO|Transferring leadership to write a snapshot.
2021-06-16T20:04:19.190Z|00225|raft|INFO|rejected append_reply (not leader)
2021-06-16T20:04:19.703Z|00226|raft|INFO|rejected append_reply (not leader)
2021-06-16T20:04:19.705Z|00227|raft|INFO|server 2e70 is leader for term 2
...
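
Note that the log timestamps are in UTC, while the status dumps in this comment are stamped in EDT. To line the two up, the log time can be shifted by the EDT offset. This is a small sketch with a fixed UTC-4 offset, which is an assumption that holds for these June 2021 timestamps; the helper name is illustrative.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed UTC-4 offset, valid for the June 2021 timestamps in this report.
EDT = timezone(timedelta(hours=-4), "EDT")


def log_time_to_edt(stamp: str) -> datetime:
    """Convert a log timestamp such as '2021-06-16T20:04:19.189Z' to EDT."""
    utc = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ")
    return utc.replace(tzinfo=timezone.utc).astimezone(EDT)


local = log_time_to_edt("2021-06-16T20:04:19.189Z")
print(local.strftime("%H:%M:%S %Z"))  # prints "16:04:19 EDT"
```

The "Transferring leadership to write a snapshot." message at 20:04:19Z therefore corresponds to the status dump taken at 16:04:19 EDT.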



# ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound

##############
Wed Jun 16 16:04:18 EDT 2021
3793
Name: OVN_Northbound
Cluster ID: d0d5 (d0d58acf-16c4-4a5b-b646-3cd9c2961a16)
Server ID: 3793 (3793f874-4202-4ddb-871d-0544671483df)
Address: tcp:wsfd-advnetlab35.xyz:6643
Status: cluster member
Role: leader
Term: 1
Leader: self
Vote: self

Election timer: 1000
Log: [2, 5999]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: <-7f27 ->7f27 <-2e70 ->2e70
Servers:
    2e70 (2e70 at tcp:netqe6.xyz:6643) next_index=5999 match_index=5998
    3793 (3793 at tcp:wsfd-advnetlab35.xyz:6643) (self) next_index=2 match_index=5998
    7f27 (7f27 at tcp:netqe5.xyz:6643) next_index=5999 match_index=5998
##############
Wed Jun 16 16:04:19 EDT 2021
3793
Name: OVN_Northbound
Cluster ID: d0d5 (d0d58acf-16c4-4a5b-b646-3cd9c2961a16)
Server ID: 3793 (3793f874-4202-4ddb-871d-0544671483df)
Address: tcp:wsfd-advnetlab35.xyz:6643
Status: cluster member
Role: follower
Term: 1
Leader: unknown
Vote: self

Election timer: 1000
Log: [5999, 5999]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: <-7f27 ->7f27 <-2e70 ->2e70
Servers:
    2e70 (2e70 at tcp:netqe6.xyz:6643)
    3793 (3793 at tcp:wsfd-advnetlab35.xyz:6643) (self)
    7f27 (7f27 at tcp:netqe5.xyz:6643)
##############
Wed Jun 16 16:04:20 EDT 2021
3793
Name: OVN_Northbound
Cluster ID: d0d5 (d0d58acf-16c4-4a5b-b646-3cd9c2961a16)
Server ID: 3793 (3793f874-4202-4ddb-871d-0544671483df)
Address: tcp:wsfd-advnetlab35.xyz:6643
Status: cluster member
Role: follower
Term: 2
Leader: 2e70
Vote: 2e70

Election timer: 1000
Log: [5999, 6000]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: <-7f27 ->7f27 <-2e70 ->2e70
Servers:
    2e70 (2e70 at tcp:netqe6.xyz:6643)
    3793 (3793 at tcp:wsfd-advnetlab35.xyz:6643) (self)
    7f27 (7f27 at tcp:netqe5.xyz:6643)
##############

Comment 6 errata-xmlrpc 2021-06-21 14:44:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (openvswitch2.13 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2506

