Bug 1876793
| Summary: | raft doesn't work when the cluster is restarted after deleting ovnsb_db.db | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux Fast Datapath | Reporter: | Jianlin Shi <jishi> |
| Component: | ovn2.13 | Assignee: | OVN Team <ovnteam> |
| Status: | NEW --- | QA Contact: | Jianlin Shi <jishi> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | FDP 20.E | CC: | ctrautma, jishi, ralongi |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
```
[root@wsfd-advnetlab16 bz1829109]# rpm -qa | grep -E "openvswitch|ovn"
ovn2.13-20.06.2-3.el8fdp.x86_64
kernel-kernel-networking-openvswitch-ovn_ha-1.0-57.noarch
ovn2.13-central-20.06.2-3.el8fdp.x86_64
openvswitch-selinux-extra-policy-1.0-23.el8fdp.noarch
ovn2.13-host-20.06.2-3.el8fdp.x86_64
openvswitch2.13-2.13.0-58.el8fdp.x86_64
[root@wsfd-advnetlab16 bz1829109]# ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound
497e
Name: OVN_Southbound
Cluster ID: e1fe (e1fe58fa-6f10-4b4b-8357-f410c189b7bc)
Server ID: 497e (497e68b8-b98b-4fc8-a332-55909876b83b)
Address: tcp:1.1.1.16:6644
Status: cluster member
Role: leader
Term: 1
Leader: self
Vote: self
Election timer: 1000
Log: [2, 5]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: <-0000 <-0000
Servers:
    497e (497e at tcp:1.1.1.16:6644) (self) next_index=2 match_index=4  <== only one server listed
```

Only one server is listed on the master; even after running `ovn-appctl -t ovn-northd sb-cluster-state-reset` on every node, raft still doesn't work.
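The cluster/status output above can be checked mechanically. Below is a minimal sketch that counts the entries under `Servers:` and warns when fewer than the expected three are present. The sample text is copied from the output above; a live check would pipe the real `ovs-appctl ... cluster/status` output instead.

```shell
# Sketch: count servers in cluster/status output. In real use, $status
# would come from:
#   status=$(ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound)
status='Name: OVN_Southbound
Connections: <-0000 <-0000
Servers:
    497e (497e at tcp:1.1.1.16:6644) (self) next_index=2 match_index=4'

# Each server entry contains " at tcp:"; the headers do not.
count=$(printf '%s\n' "$status" | grep -c ' at tcp:')
echo "servers listed: $count"
if [ "$count" -lt 3 ]; then
    echo "WARNING: expected 3 servers; the cluster did not fully form" >&2
fi
```

With the sample above this prints `servers listed: 1`, matching the broken state in the report.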
Description of problem:

raft doesn't work when the cluster is restarted after deleting ovnsb_db.db

Version-Release number of selected component (if applicable):

ovn20.06.2-3

How reproducible:

Always

Steps to Reproduce:

1. Start the cluster:

```
# master
ctl_cmd="/usr/share/ovn/scripts/ovn-ctl"
ip_s=1.1.1.16
ip_c1=1.1.1.17
ip_c2=1.1.1.18
$ctl_cmd --db-nb-addr=$ip_s --db-nb-create-insecure-remote=yes \
    --db-sb-addr=$ip_s --db-sb-create-insecure-remote=yes \
    --db-nb-cluster-local-addr=$ip_s --db-sb-cluster-local-addr=$ip_s \
    --ovn-northd-nb-db=tcp:$ip_s:6641,tcp:$ip_c1:6641,tcp:$ip_c2:6641 \
    --ovn-northd-sb-db=tcp:$ip_s:6642,tcp:$ip_c1:6642,tcp:$ip_c2:6642 start_northd

# slave 1
ctl_cmd=/usr/share/ovn/scripts/ovn-ctl
ip_s=1.1.1.16
ip_c1=1.1.1.17
ip_c2=1.1.1.18
$ctl_cmd --db-nb-addr=$ip_c1 --db-nb-create-insecure-remote=yes \
    --db-sb-addr=$ip_c1 --db-sb-create-insecure-remote=yes \
    --db-nb-cluster-local-addr=$ip_c1 --db-sb-cluster-local-addr=$ip_c1 \
    --db-nb-cluster-remote-addr=$ip_s --db-sb-cluster-remote-addr=$ip_s \
    --ovn-northd-nb-db=tcp:$ip_s:6641,tcp:$ip_c1:6641,tcp:$ip_c2:6641 \
    --ovn-northd-sb-db=tcp:$ip_s:6642,tcp:$ip_c1:6642,tcp:$ip_c2:6642 start_northd

# slave 2
ctl_cmd=/usr/share/ovn/scripts/ovn-ctl
ip_s=1.1.1.16
ip_c1=1.1.1.17
ip_c2=1.1.1.18
$ctl_cmd --db-nb-addr=$ip_c2 --db-nb-create-insecure-remote=yes \
    --db-sb-addr=$ip_c2 --db-sb-create-insecure-remote=yes \
    --db-nb-cluster-local-addr=$ip_c2 --db-sb-cluster-local-addr=$ip_c2 \
    --db-nb-cluster-remote-addr=$ip_s --db-sb-cluster-remote-addr=$ip_s \
    --ovn-northd-nb-db=tcp:$ip_s:6641,tcp:$ip_c1:6641,tcp:$ip_c2:6641 \
    --ovn-northd-sb-db=tcp:$ip_s:6642,tcp:$ip_c1:6642,tcp:$ip_c2:6642 start_northd
```
2. Delete ovnsb_db.db on every node and restart the cluster:

```
ctl_cmd="/usr/share/ovn/scripts/ovn-ctl"
ip_s=1.1.1.16
ip_c1=1.1.1.17
ip_c2=1.1.1.18
rm /etc/ovn/ovnsb_db.db -f
ssh -q $ip_c1 rm /etc/ovn/ovnsb_db.db -f
ssh -q $ip_c2 rm /etc/ovn/ovnsb_db.db -f
ssh -q $ip_c1 $ctl_cmd --db-nb-addr=$ip_c1 --db-nb-create-insecure-remote=yes \
    --db-sb-addr=$ip_c1 --db-sb-create-insecure-remote=yes \
    --db-nb-cluster-local-addr=$ip_c1 --db-sb-cluster-local-addr=$ip_c1 \
    --db-nb-cluster-remote-addr=$ip_s --db-sb-cluster-remote-addr=$ip_s \
    --ovn-northd-nb-db=tcp:$ip_s:6641,tcp:$ip_c1:6641,tcp:$ip_c2:6641 \
    --ovn-northd-sb-db=tcp:$ip_s:6642,tcp:$ip_c1:6642,tcp:$ip_c2:6642 restart_northd
ssh -q $ip_c2 $ctl_cmd --db-nb-addr=$ip_c2 --db-nb-create-insecure-remote=yes \
    --db-sb-addr=$ip_c2 --db-sb-create-insecure-remote=yes \
    --db-nb-cluster-local-addr=$ip_c2 --db-sb-cluster-local-addr=$ip_c2 \
    --db-nb-cluster-remote-addr=$ip_s --db-sb-cluster-remote-addr=$ip_s \
    --ovn-northd-nb-db=tcp:$ip_s:6641,tcp:$ip_c1:6641,tcp:$ip_c2:6641 \
    --ovn-northd-sb-db=tcp:$ip_s:6642,tcp:$ip_c1:6642,tcp:$ip_c2:6642 restart_northd
$ctl_cmd --db-nb-addr=$ip_s --db-nb-create-insecure-remote=yes \
    --db-sb-addr=$ip_s --db-sb-create-insecure-remote=yes \
    --db-nb-cluster-local-addr=$ip_s --db-sb-cluster-local-addr=$ip_s \
    --ovn-northd-nb-db=tcp:$ip_s:6641,tcp:$ip_c1:6641,tcp:$ip_c2:6641 \
    --ovn-northd-sb-db=tcp:$ip_s:6642,tcp:$ip_c1:6642,tcp:$ip_c2:6642 restart_northd
```
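A restart like the one above is asynchronous: `restart_northd` returns before the SB server has finished (re)joining the cluster. A small check along these lines could be polled before checking the final status. This is a sketch: `is_cluster_member` and the sample status text are illustrative, not part of ovn-ctl.

```shell
# Sketch: decide from cluster/status output whether the local server has
# joined the cluster. In real use, $status would come from:
#   status=$(ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound)
# and this check would run in a retry loop with a sleep between attempts.
is_cluster_member() {
    printf '%s\n' "$1" | grep -q '^Status: cluster member$'
}

# Sample output fragment (assumption: copied from the report above).
status='Name: OVN_Southbound
Status: cluster member
Role: follower'

if is_cluster_member "$status"; then
    echo "SB DB has joined the cluster"
else
    echo "SB DB has not joined yet"
fi
```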
3. Check SB status with:

```
ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound
```

Actual results:

```
[root@wsfd-advnetlab17 bz1829109]# ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound
8caf
Name: OVN_Southbound
Cluster ID: 96cc (96cc6df5-ff59-4f15-9cbf-52446c6148bd)
Server ID: 8caf (8cafc269-8e76-47ca-9bd3-c0dc341b6b1a)
Address: tcp:1.1.1.17:6644
Status: cluster member
Role: follower
Term: 226
Leader: unknown
Vote: 41b1
Election timer: 1000
Log: [2, 10]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: ->0000 ->e87c <-41b1
Servers:
    8caf (8caf at tcp:1.1.1.17:6644) (self)
    e87c (e87c at tcp:1.1.1.18:6644)
    dcf6 (dcf6 at tcp:1.1.1.17:6644)  <=== two entries each for 1.1.1.17 and 1.1.1.18
    902e (902e at tcp:1.1.1.16:6644)
    41b1 (41b1 at tcp:1.1.1.18:6644)
```

Expected results:

raft works well

Additional info:

If the master is restarted first, the issue does not occur.
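The duplicate entries in the Servers list are stale members left behind when the DB files were deleted; `ovs-appctl ... cluster/kick` can remove a named server from a running cluster. The sketch below finds IDs that share an address with an earlier entry and prints the corresponding kick commands. The sample lines are copied from the actual results above; in practice, deciding which of two duplicates is the stale one needs care and is assumed here to be the later entry.

```shell
# Sketch: generate cluster/kick commands for duplicate-address entries.
# In real use, $servers would be the "Servers:" section of:
#   ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound
servers='8caf (8caf at tcp:1.1.1.17:6644) (self)
e87c (e87c at tcp:1.1.1.18:6644)
dcf6 (dcf6 at tcp:1.1.1.17:6644)
902e (902e at tcp:1.1.1.16:6644)
41b1 (41b1 at tcp:1.1.1.18:6644)'

# For every address seen more than once, print a kick command for the
# later (assumed stale) server ID. Field 4 is "tcp:IP:PORT)".
kicks=$(printf '%s\n' "$servers" | awk '{
    addr = $4
    sub(/\)$/, "", addr)
    if (seen[addr]++)
        print "ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/kick OVN_Southbound " $1
}')
printf '%s\n' "$kicks"
```

With the sample data this prints kick commands for `dcf6` and `41b1`, the second entries for 1.1.1.17 and 1.1.1.18.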