Description of problem:
Raft doesn't work when the cluster is restarted after deleting ovnsb_db.db.

Version-Release number of selected component (if applicable):
ovn20.06.2-3

How reproducible:
Always

Steps to Reproduce:
1. start cluster

# master
ctl_cmd="/usr/share/ovn/scripts/ovn-ctl"
ip_s=1.1.1.16
ip_c1=1.1.1.17
ip_c2=1.1.1.18
$ctl_cmd --db-nb-addr=$ip_s --db-nb-create-insecure-remote=yes \
    --db-sb-addr=$ip_s --db-sb-create-insecure-remote=yes \
    --db-nb-cluster-local-addr=$ip_s --db-sb-cluster-local-addr=$ip_s \
    --ovn-northd-nb-db=tcp:$ip_s:6641,tcp:$ip_c1:6641,tcp:$ip_c2:6641 \
    --ovn-northd-sb-db=tcp:$ip_s:6642,tcp:$ip_c1:6642,tcp:$ip_c2:6642 start_northd

# slave1
ctl_cmd=/usr/share/ovn/scripts/ovn-ctl
ip_s=1.1.1.16
ip_c1=1.1.1.17
ip_c2=1.1.1.18
$ctl_cmd --db-nb-addr=$ip_c1 --db-nb-create-insecure-remote=yes \
    --db-sb-addr=$ip_c1 --db-sb-create-insecure-remote=yes \
    --db-nb-cluster-local-addr=$ip_c1 --db-sb-cluster-local-addr=$ip_c1 \
    --db-nb-cluster-remote-addr=$ip_s --db-sb-cluster-remote-addr=$ip_s \
    --ovn-northd-nb-db=tcp:$ip_s:6641,tcp:$ip_c1:6641,tcp:$ip_c2:6641 \
    --ovn-northd-sb-db=tcp:$ip_s:6642,tcp:$ip_c1:6642,tcp:$ip_c2:6642 start_northd

# slave 2
ctl_cmd=/usr/share/ovn/scripts/ovn-ctl
ip_s=1.1.1.16
ip_c1=1.1.1.17
ip_c2=1.1.1.18
$ctl_cmd --db-nb-addr=$ip_c2 --db-nb-create-insecure-remote=yes \
    --db-sb-addr=$ip_c2 --db-sb-create-insecure-remote=yes \
    --db-nb-cluster-local-addr=$ip_c2 --db-sb-cluster-local-addr=$ip_c2 \
    --db-nb-cluster-remote-addr=$ip_s --db-sb-cluster-remote-addr=$ip_s \
    --ovn-northd-nb-db=tcp:$ip_s:6641,tcp:$ip_c1:6641,tcp:$ip_c2:6641 \
    --ovn-northd-sb-db=tcp:$ip_s:6642,tcp:$ip_c1:6642,tcp:$ip_c2:6642 start_northd

2. delete ovnsb_db.db and restart cluster

ctl_cmd="/usr/share/ovn/scripts/ovn-ctl"
ip_s=1.1.1.16
ip_c1=1.1.1.17
ip_c2=1.1.1.18
rm /etc/ovn/ovnsb_db.db -f
ssh -q $ip_c1 rm /etc/ovn/ovnsb_db.db -f
ssh -q $ip_c2 rm /etc/ovn/ovnsb_db.db -f
ssh -q $ip_c1 $ctl_cmd --db-nb-addr=$ip_c1 --db-nb-create-insecure-remote=yes \
    --db-sb-addr=$ip_c1 --db-sb-create-insecure-remote=yes \
    --db-nb-cluster-local-addr=$ip_c1 --db-sb-cluster-local-addr=$ip_c1 \
    --db-nb-cluster-remote-addr=$ip_s --db-sb-cluster-remote-addr=$ip_s \
    --ovn-northd-nb-db=tcp:$ip_s:6641,tcp:$ip_c1:6641,tcp:$ip_c2:6641 \
    --ovn-northd-sb-db=tcp:$ip_s:6642,tcp:$ip_c1:6642,tcp:$ip_c2:6642 restart_northd
ssh -q $ip_c2 $ctl_cmd --db-nb-addr=$ip_c2 --db-nb-create-insecure-remote=yes \
    --db-sb-addr=$ip_c2 --db-sb-create-insecure-remote=yes \
    --db-nb-cluster-local-addr=$ip_c2 --db-sb-cluster-local-addr=$ip_c2 \
    --db-nb-cluster-remote-addr=$ip_s --db-sb-cluster-remote-addr=$ip_s \
    --ovn-northd-nb-db=tcp:$ip_s:6641,tcp:$ip_c1:6641,tcp:$ip_c2:6641 \
    --ovn-northd-sb-db=tcp:$ip_s:6642,tcp:$ip_c1:6642,tcp:$ip_c2:6642 restart_northd
$ctl_cmd --db-nb-addr=$ip_s --db-nb-create-insecure-remote=yes \
    --db-sb-addr=$ip_s --db-sb-create-insecure-remote=yes \
    --db-nb-cluster-local-addr=$ip_s --db-sb-cluster-local-addr=$ip_s \
    --ovn-northd-nb-db=tcp:$ip_s:6641,tcp:$ip_c1:6641,tcp:$ip_c2:6641 \
    --ovn-northd-sb-db=tcp:$ip_s:6642,tcp:$ip_c1:6642,tcp:$ip_c2:6642 restart_northd

3. check sb status with: ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound

Actual results:

[root@wsfd-advnetlab17 bz1829109]# ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound
8caf
Name: OVN_Southbound
Cluster ID: 96cc (96cc6df5-ff59-4f15-9cbf-52446c6148bd)
Server ID: 8caf (8cafc269-8e76-47ca-9bd3-c0dc341b6b1a)
Address: tcp:1.1.1.17:6644
Status: cluster member
Role: follower
Term: 226
Leader: unknown
Vote: 41b1

Election timer: 1000
Log: [2, 10]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: ->0000 ->e87c <-41b1
Servers:
    8caf (8caf at tcp:1.1.1.17:6644) (self)
    e87c (e87c at tcp:1.1.1.18:6644)
    dcf6 (dcf6 at tcp:1.1.1.17:6644)   <=== there are two entries each for 1.1.1.17 and 1.1.1.18
    902e (902e at tcp:1.1.1.16:6644)
    41b1 (41b1 at tcp:1.1.1.18:6644)

Expected results:
Raft works well.

Additional info:
If the master is restarted first, the issue doesn't occur.

[root@wsfd-advnetlab16 bz1829109]# rpm -qa | grep -E "openvswitch|ovn"
ovn2.13-20.06.2-3.el8fdp.x86_64
kernel-kernel-networking-openvswitch-ovn_ha-1.0-57.noarch
ovn2.13-central-20.06.2-3.el8fdp.x86_64
openvswitch-selinux-extra-policy-1.0-23.el8fdp.noarch
ovn2.13-host-20.06.2-3.el8fdp.x86_64
openvswitch2.13-2.13.0-58.el8fdp.x86_64

[root@wsfd-advnetlab16 bz1829109]# ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound
497e
Name: OVN_Southbound
Cluster ID: e1fe (e1fe58fa-6f10-4b4b-8357-f410c189b7bc)
Server ID: 497e (497e68b8-b98b-4fc8-a332-55909876b83b)
Address: tcp:1.1.1.16:6644
Status: cluster member
Role: leader
Term: 1
Leader: self
Vote: self

Election timer: 1000
Log: [2, 5]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: <-0000 <-0000
Servers:
    497e (497e at tcp:1.1.1.16:6644) (self) next_index=2 match_index=4   <== only one on master

Even after running "ovn-appctl -t ovn-northd sb-cluster-state-reset" on every node, raft still doesn't work.
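
To make the duplicated entries easier to spot when re-running the reproducer, here is a minimal sketch of a filter for the cluster/status output. The function name dup_server_addrs is hypothetical (it is not part of OVS or OVN), and it assumes the "Servers:" section has the layout shown in the outputs above, i.e. each server line contains "at <address>)". It prints every address listed more than once.

```shell
# Hypothetical helper (not part of OVS/OVN): read `cluster/status` output on
# stdin and print each server address that appears more than once under
# "Servers:". Assumes server lines look like:
#     8caf (8caf at tcp:1.1.1.17:6644) (self)
dup_server_addrs() {
    awk '
        /^Servers:/ { in_servers = 1; next }
        in_servers {
            for (i = 1; i < NF; i++) {
                if ($i == "at") {
                    addr = $(i + 1)
                    sub(/\)$/, "", addr)   # drop the closing parenthesis
                    count[addr]++
                }
            }
        }
        END {
            for (a in count)
                if (count[a] > 1) print a
        }
    ' | sort
}
```

Possible usage on each node, with the same control socket as in step 3:

ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound | dup_server_addrs

An empty result means no address is listed twice on that node.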