Bug 1876793

Summary: raft doesn't work when restart cluster after delete ovnsb_db.db
Product: Red Hat Enterprise Linux Fast Datapath
Reporter: Jianlin Shi <jishi>
Component: ovn2.13
Assignee: OVN Team <ovnteam>
Status: CLOSED WONTFIX
QA Contact: Jianlin Shi <jishi>
Severity: medium
Priority: medium
Version: FDP 20.E
CC: ctrautma, jishi, ralongi
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Last Closed: 2024-02-14 21:11:18 UTC
Type: Bug

Description Jianlin Shi 2020-09-08 07:56:47 UTC
Description of problem:
Raft doesn't work when the cluster is restarted after ovnsb_db.db has been deleted on every node.

Version-Release number of selected component (if applicable):
ovn2.13-20.06.2-3

How reproducible:
Always

Steps to Reproduce:
1. start cluster

# master
ctl_cmd="/usr/share/ovn/scripts/ovn-ctl"
ip_s=1.1.1.16
ip_c1=1.1.1.17
ip_c2=1.1.1.18
$ctl_cmd --db-nb-addr=$ip_s --db-nb-create-insecure-remote=yes \
                        --db-sb-addr=$ip_s --db-sb-create-insecure-remote=yes \
                        --db-nb-cluster-local-addr=$ip_s --db-sb-cluster-local-addr=$ip_s \
                        --ovn-northd-nb-db=tcp:$ip_s:6641,tcp:$ip_c1:6641,tcp:$ip_c2:6641 \
                        --ovn-northd-sb-db=tcp:$ip_s:6642,tcp:$ip_c1:6642,tcp:$ip_c2:6642 start_northd

# slave1
ctl_cmd=/usr/share/ovn/scripts/ovn-ctl
ip_s=1.1.1.16
ip_c1=1.1.1.17
ip_c2=1.1.1.18

$ctl_cmd --db-nb-addr=$ip_c1 --db-nb-create-insecure-remote=yes \
                        --db-sb-addr=$ip_c1 --db-sb-create-insecure-remote=yes \
                        --db-nb-cluster-local-addr=$ip_c1 --db-sb-cluster-local-addr=$ip_c1 \
                        --db-nb-cluster-remote-addr=$ip_s --db-sb-cluster-remote-addr=$ip_s \
                        --ovn-northd-nb-db=tcp:$ip_s:6641,tcp:$ip_c1:6641,tcp:$ip_c2:6641 \
                        --ovn-northd-sb-db=tcp:$ip_s:6642,tcp:$ip_c1:6642,tcp:$ip_c2:6642 start_northd
# slave2
ctl_cmd=/usr/share/ovn/scripts/ovn-ctl
ip_s=1.1.1.16
ip_c1=1.1.1.17
ip_c2=1.1.1.18

$ctl_cmd --db-nb-addr=$ip_c2 --db-nb-create-insecure-remote=yes \
                        --db-sb-addr=$ip_c2 --db-sb-create-insecure-remote=yes \
                        --db-nb-cluster-local-addr=$ip_c2 --db-sb-cluster-local-addr=$ip_c2 \
                        --db-nb-cluster-remote-addr=$ip_s --db-sb-cluster-remote-addr=$ip_s \
                        --ovn-northd-nb-db=tcp:$ip_s:6641,tcp:$ip_c1:6641,tcp:$ip_c2:6641 \
                        --ovn-northd-sb-db=tcp:$ip_s:6642,tcp:$ip_c1:6642,tcp:$ip_c2:6642 start_northd
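The three start commands above differ only in the node's own IP and in whether the --db-nb-cluster-remote-addr / --db-sb-cluster-remote-addr options are passed. A minimal sketch of a helper that builds the option list from those two inputs follows. ovn_ctl_args is a hypothetical name introduced here, not part of ovn-ctl, and the snippet only prints the resulting commands rather than running them.

```shell
# Sketch only: ovn_ctl_args is a hypothetical helper, not part of ovn-ctl.
ctl_cmd=/usr/share/ovn/scripts/ovn-ctl
ip_s=1.1.1.16
ip_c1=1.1.1.17
ip_c2=1.1.1.18

# Print the ovn-ctl options for one node.
#   $1: this node's IP
#   $2: optional IP of an existing member to join (omitted on the master)
ovn_ctl_args() {
    node_ip=$1
    remote_ip=${2:-}
    printf '%s ' \
        "--db-nb-addr=$node_ip" "--db-nb-create-insecure-remote=yes" \
        "--db-sb-addr=$node_ip" "--db-sb-create-insecure-remote=yes" \
        "--db-nb-cluster-local-addr=$node_ip" \
        "--db-sb-cluster-local-addr=$node_ip"
    if [ -n "$remote_ip" ]; then
        printf '%s ' \
            "--db-nb-cluster-remote-addr=$remote_ip" \
            "--db-sb-cluster-remote-addr=$remote_ip"
    fi
    printf '%s %s\n' \
        "--ovn-northd-nb-db=tcp:$ip_s:6641,tcp:$ip_c1:6641,tcp:$ip_c2:6641" \
        "--ovn-northd-sb-db=tcp:$ip_s:6642,tcp:$ip_c1:6642,tcp:$ip_c2:6642"
}

# Print (do not run) the start command for each node.
echo "master: $ctl_cmd $(ovn_ctl_args "$ip_s") start_northd"
echo "slave1: $ctl_cmd $(ovn_ctl_args "$ip_c1" "$ip_s") start_northd"
echo "slave2: $ctl_cmd $(ovn_ctl_args "$ip_c2" "$ip_s") start_northd"
```

Printing first makes it easy to diff the generated command against the hand-written one before executing anything on a node.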

2. delete ovnsb_db.db and restart cluster

ctl_cmd="/usr/share/ovn/scripts/ovn-ctl"
ip_s=1.1.1.16
ip_c1=1.1.1.17
ip_c2=1.1.1.18

rm -f /etc/ovn/ovnsb_db.db
ssh -q $ip_c1 rm -f /etc/ovn/ovnsb_db.db
ssh -q $ip_c2 rm -f /etc/ovn/ovnsb_db.db



ssh -q $ip_c1 $ctl_cmd --db-nb-addr=$ip_c1 --db-nb-create-insecure-remote=yes \
                        --db-sb-addr=$ip_c1 --db-sb-create-insecure-remote=yes \
                        --db-nb-cluster-local-addr=$ip_c1 --db-sb-cluster-local-addr=$ip_c1 \
                        --db-nb-cluster-remote-addr=$ip_s --db-sb-cluster-remote-addr=$ip_s \
                        --ovn-northd-nb-db=tcp:$ip_s:6641,tcp:$ip_c1:6641,tcp:$ip_c2:6641 \
                        --ovn-northd-sb-db=tcp:$ip_s:6642,tcp:$ip_c1:6642,tcp:$ip_c2:6642 restart_northd


ssh -q $ip_c2 $ctl_cmd --db-nb-addr=$ip_c2 --db-nb-create-insecure-remote=yes \
                        --db-sb-addr=$ip_c2 --db-sb-create-insecure-remote=yes \
                        --db-nb-cluster-local-addr=$ip_c2 --db-sb-cluster-local-addr=$ip_c2 \
                        --db-nb-cluster-remote-addr=$ip_s --db-sb-cluster-remote-addr=$ip_s \
                        --ovn-northd-nb-db=tcp:$ip_s:6641,tcp:$ip_c1:6641,tcp:$ip_c2:6641 \
                        --ovn-northd-sb-db=tcp:$ip_s:6642,tcp:$ip_c1:6642,tcp:$ip_c2:6642 restart_northd

$ctl_cmd --db-nb-addr=$ip_s --db-nb-create-insecure-remote=yes \
                        --db-sb-addr=$ip_s --db-sb-create-insecure-remote=yes \
                        --db-nb-cluster-local-addr=$ip_s --db-sb-cluster-local-addr=$ip_s \
                        --ovn-northd-nb-db=tcp:$ip_s:6641,tcp:$ip_c1:6641,tcp:$ip_c2:6641 \
                        --ovn-northd-sb-db=tcp:$ip_s:6642,tcp:$ip_c1:6642,tcp:$ip_c2:6642 restart_northd

3. check sb status with: ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound

Actual results:
[root@wsfd-advnetlab17 bz1829109]# ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound
8caf
Name: OVN_Southbound                                                                                  
Cluster ID: 96cc (96cc6df5-ff59-4f15-9cbf-52446c6148bd)
Server ID: 8caf (8cafc269-8e76-47ca-9bd3-c0dc341b6b1a)                                                
Address: tcp:1.1.1.17:6644                                                                            
Status: cluster member                                                                                
Role: follower
Term: 226
Leader: unknown
Vote: 41b1

Election timer: 1000                                                                                  
Log: [2, 10]                                                                                          
Entries not yet committed: 0                                                                          
Entries not yet applied: 0                                                                            
Connections: ->0000 ->e87c <-41b1                                                                     
Servers:
    8caf (8caf at tcp:1.1.1.17:6644) (self)
    e87c (e87c at tcp:1.1.1.18:6644)
    dcf6 (dcf6 at tcp:1.1.1.17:6644)
    902e (902e at tcp:1.1.1.16:6644)
    41b1 (41b1 at tcp:1.1.1.18:6644)

<=== 1.1.1.17 and 1.1.1.18 each appear twice in the server list
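The duplicate entries in the Servers list can be spotted mechanically. A minimal sketch, assuming the cluster/status output format shown above: dup_server_addrs is a hypothetical helper name, and it simply counts how often each address appears after the "Servers:" line.

```shell
# Sketch only: dup_server_addrs is a hypothetical helper.
# Reads cluster/status output on stdin and prints every address that
# is listed more than once under "Servers:".
dup_server_addrs() {
    awk '
        /^Servers:/ { in_servers = 1; next }
        in_servers {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^(tcp|ssl):/) {
                    addr = $i
                    sub(/\)$/, "", addr)   # drop the closing parenthesis
                    count[addr]++
                }
            }
        }
        END { for (a in count) if (count[a] > 1) print a }
    ' | sort
}

# Usage on a live node (not executed here):
#   ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound | dup_server_addrs
```

Fed the output above, this should report tcp:1.1.1.17:6644 and tcp:1.1.1.18:6644; on a healthy cluster it prints nothing.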

Expected results:
Raft works normally after the restart.

Additional info:

If the master is restarted first, the issue doesn't occur.
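The ordering that avoids the problem can be written down as a restart plan with the master first. This is a sketch under the assumption that the same options as in step 2 are used; common_opts and print_restart_plan are hypothetical helper names, and the snippet only prints the commands so they can be reviewed before being run. This report only states that the issue does not occur with this ordering, not that it is a fix.

```shell
# Sketch only: common_opts and print_restart_plan are hypothetical helpers.
ctl_cmd=/usr/share/ovn/scripts/ovn-ctl
ip_s=1.1.1.16
ip_c1=1.1.1.17
ip_c2=1.1.1.18

# Options shared by every node; $1 is the node's own IP.
common_opts() {
    printf '%s %s %s %s %s\n' \
        "--db-nb-addr=$1 --db-nb-create-insecure-remote=yes" \
        "--db-sb-addr=$1 --db-sb-create-insecure-remote=yes" \
        "--db-nb-cluster-local-addr=$1 --db-sb-cluster-local-addr=$1" \
        "--ovn-northd-nb-db=tcp:$ip_s:6641,tcp:$ip_c1:6641,tcp:$ip_c2:6641" \
        "--ovn-northd-sb-db=tcp:$ip_s:6642,tcp:$ip_c1:6642,tcp:$ip_c2:6642"
}

# Print the restart commands in order: master first, then the two slaves.
# Nothing is executed; the output is meant to be reviewed by hand.
print_restart_plan() {
    join_opts="--db-nb-cluster-remote-addr=$ip_s --db-sb-cluster-remote-addr=$ip_s"
    echo "$ctl_cmd $(common_opts "$ip_s") restart_northd"
    for ip in "$ip_c1" "$ip_c2"; do
        echo "ssh -q $ip $ctl_cmd $(common_opts "$ip") $join_opts restart_northd"
    done
}

print_restart_plan
```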

Comment 1 Jianlin Shi 2020-09-08 07:58:29 UTC
[root@wsfd-advnetlab16 bz1829109]# rpm -qa | grep -E "openvswitch|ovn"                                
ovn2.13-20.06.2-3.el8fdp.x86_64                                                                       
kernel-kernel-networking-openvswitch-ovn_ha-1.0-57.noarch                                             
ovn2.13-central-20.06.2-3.el8fdp.x86_64                                                               
openvswitch-selinux-extra-policy-1.0-23.el8fdp.noarch                                                 
ovn2.13-host-20.06.2-3.el8fdp.x86_64                                                                  
openvswitch2.13-2.13.0-58.el8fdp.x86_64                                                               
[root@wsfd-advnetlab16 bz1829109]# ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound
497e                                                                                                  
Name: OVN_Southbound                                                                                  
Cluster ID: e1fe (e1fe58fa-6f10-4b4b-8357-f410c189b7bc)                                               
Server ID: 497e (497e68b8-b98b-4fc8-a332-55909876b83b)                                                
Address: tcp:1.1.1.16:6644                                                                            
Status: cluster member                                                                                
Role: leader                                                                                          
Term: 1                                                                                               
Leader: self                                                                                          
Vote: self                                                                                            
                                                                                                      
Election timer: 1000                                                                                  
Log: [2, 5]                                                                                           
Entries not yet committed: 0                                                                          
Entries not yet applied: 0                                                                            
Connections: <-0000 <-0000                                                                            
Servers:                                                                                              
    497e (497e at tcp:1.1.1.16:6644) (self) next_index=2 match_index=4

<== only one server is listed on the master
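The two status outputs pasted in this report also show different cluster IDs: 96cc on 1.1.1.17 and e1fe on 1.1.1.16. That may indicate the nodes are no longer members of the same Raft cluster, although the report itself does not draw that conclusion. A minimal sketch for comparing the IDs, assuming the "Cluster ID:" line format shown above; cluster_id is a hypothetical helper name.

```shell
# Sketch only: cluster_id is a hypothetical helper.
# Reads cluster/status output on stdin and prints the full cluster UUID.
cluster_id() {
    sed -n 's/^Cluster ID: [0-9a-f]* (\([0-9a-f-]*\)).*/\1/p'
}

# The two "Cluster ID" lines below are copied from the outputs in this report.
id_17=$(printf 'Cluster ID: 96cc (96cc6df5-ff59-4f15-9cbf-52446c6148bd)\n' | cluster_id)
id_16=$(printf 'Cluster ID: e1fe (e1fe58fa-6f10-4b4b-8357-f410c189b7bc)\n' | cluster_id)

if [ "$id_17" != "$id_16" ]; then
    echo "cluster ID mismatch: 1.1.1.17 has $id_17, 1.1.1.16 has $id_16"
fi

# Usage on a live node (not executed here):
#   ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound | cluster_id
```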

Comment 2 Jianlin Shi 2020-09-08 08:02:14 UTC
Even after running ovn-appctl -t ovn-northd sb-cluster-state-reset on every node, Raft still doesn't work.

Comment 3 OVN Bot 2024-02-14 21:11:16 UTC
This issue is being closed as an automatic process due to the issue's age. If you wish to re-open this issue, please do so in Jira (https://issues.redhat.com) in the 'FDP' project. Please be sure to set the component to the latest OVN version where this issue is known to occur. If this is a feature request or improvement, please set the component to 'OVN'.