Description of problem:

OVN server is still shown as present in cluster/status after being fenced from the rest of the cluster.

Version-Release number of selected component (if applicable):

network-scripts-openvswitch2.13-2.13.0-79.el8.x86_64
openvswitch2.13-2.13.0-79.el8.x86_64
ovn2.13-host-20.12.0-1.el8.x86_64
openvswitch-selinux-extra-policy-1.0-28.el8.noarch
ovn2.13-20.12.0-1.el8.x86_64
ovn2.13-central-20.12.0-1.el8.x86_64

How reproducible:
100%

Steps to Reproduce:

1. Deploy a 3-node OVN cluster (hosts: ovn1, ovn2, ovn3)

2. Check that ovn1 is the leader, ovn2 and ovn3 are followers:
[ovn1] $ ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound

3. Fence ovn3 from ovn1 and ovn2:
[ovn3] $ sudo iptables -A INPUT -i eth0 -p tcp --match multiport --dports 6641:6644 -j DROP
[ovn3] $ sudo iptables -A POSTROUTING -p tcp --match multiport --dports 6641:6644 -j DROP

4. Check that ovn1 and ovn2 detected ovn3 as down. ovn1 and ovn2 start logging lots of the following messages:
Feb 17 12:57:36 ovn2 ovsdb-server[17973]: ovs|00025|raft|WARN|ignoring vote request received after only 278 ms (minimum election time is 1000 ms)

5. Check that ovn3 is in role "candidate":
[ovn3] $ ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound

6. The cluster status output does not seem to indicate that ovn3 is now disconnected from the rest of the cluster (split-brain).

Previously established TCP connections are still present after fencing ovn3 with the above-mentioned iptables rules:

[root@ovn3 ~]# netstat -na | grep 664
tcp        0      0 192.168.14.34:6641      0.0.0.0:*               LISTEN
tcp        0      0 192.168.14.34:6642      0.0.0.0:*               LISTEN
tcp        0      0 192.168.14.34:6643      0.0.0.0:*               LISTEN
tcp        0      0 192.168.14.34:6644      0.0.0.0:*               LISTEN
tcp        0      0 192.168.14.34:55090     192.168.14.32:6641      ESTABLISHED
tcp        0      0 192.168.14.34:6644      192.168.14.33:42192     ESTABLISHED
tcp        0      0 192.168.14.34:6644      192.168.14.32:51522     ESTABLISHED
tcp        0      0 192.168.14.34:55182     192.168.14.32:6642      ESTABLISHED
tcp        0      0 192.168.14.34:6643      192.168.14.32:45026     ESTABLISHED
tcp        0      0 192.168.14.34:56518     192.168.14.33:6643      ESTABLISHED
tcp        0      0 192.168.14.34:54228     192.168.14.32:6643      ESTABLISHED
tcp        0      0 192.168.14.34:58724     192.168.14.33:6644      ESTABLISHED
tcp        0      0 192.168.14.34:6643      192.168.14.33:58756     ESTABLISHED
tcp        0      0 192.168.14.34:43102     192.168.14.32:6644      ESTABLISHED

ovn3 changed role to "candidate" after fencing:

[root@ovn3 ~]# ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
9431
Name: OVN_Northbound
Cluster ID: dcc3 (dcc308a6-0634-416d-a37e-57848aa6ee3e)
Server ID: 9431 (94314df7-af85-438f-8004-3f6ba2737fa7)
Address: tcp:192.168.14.34:6643
Status: cluster member
Role: candidate
Term: 915
Leader: 3234
Vote: self
Election timer: 1000
Log: [2, 14]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: ->3234 ->8554 <-8554 <-3234 <-3234 <-8554 <-3234
Servers:
    3234 (3234 at tcp:192.168.14.32:6643)
    8554 (8554 at tcp:192.168.14.33:6643)
    9431 (9431 at tcp:192.168.14.34:6643) (self) (voted for 9431)

ovn1 is still the leader after fencing ovn3:

[root@ovn1 ~]# ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
3234
Name: OVN_Northbound
Cluster ID: dcc3 (dcc308a6-0634-416d-a37e-57848aa6ee3e)
Server ID: 3234 (32341975-62c8-4507-a8e1-0a74dfac4edc)
Address: tcp:192.168.14.32:6643
Status: cluster member
Role: leader
Term: 818
Leader: self
Vote: self
Election timer: 1000
Log: [2, 14]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: <-8554 ->8554 ->9431 <-9431
Servers:
    3234 (3234 at tcp:192.168.14.32:6643) (self) next_index=13 match_index=13
    8554 (8554 at tcp:192.168.14.33:6643) next_index=14 match_index=13
    9431 (9431 at tcp:192.168.14.34:6643) next_index=14 match_index=13
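For convenience when reproducing, here is a small sketch that collects the NB cluster view from all three nodes in one go (hypothetical: it assumes passwordless ssh to the ovn1-ovn3 hostnames used above and the default ovnnb_db.ctl socket path):

  for h in ovn1 ovn2 ovn3; do
      echo "=== $h ==="
      # print the local view of the NB RAFT cluster on each node
      ssh "$h" ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
  done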
Created attachment 1757566 [details] OVN cluster logs
OK. There are several separate issues here.

1. ovn3 actually has no connection to ovn1 or ovn2. It's missing several heartbeats sent by the leader, so it decides to start an election. It constantly sends vote requests and doesn't receive any replies, staying in the 'candidate' role. On a second attempt to start an election, ovn3 will set the flag "candidate_retrying", which is translated to 'is_connected':false in the _Server database and informs clients that the storage is disconnected. Clients should reconnect to other servers in this case. To verify, we could run:

  ovn-nbctl --db=<local db on ovn3> --no-leader-only -v show

This command will fail, and the logs will show the reason: "clustered database server is disconnected from cluster". So there is no actual split-brain.

The issue here is that the cluster/status command still shows "cluster member" in the status field for ovn3. That could be fixed by taking the "candidate_retrying" flag into account and reporting "disconnected from cluster" instead of "cluster member", probably with an extra note "(election timeout)". I will send a patch for this.

2. Approximately a year ago there was a patch to disable inactivity probes for RAFT connections in ovsdb-server: db5a066c17bd ("raft: Disable RAFT jsonrpc inactivity probe."). The patch was accepted. The rationale was that RAFT heartbeats are enough. Well... they're not. RAFT heartbeats are unidirectional, i.e. the leader sends them to followers but not the other way around. Missing heartbeats provoke followers to start an election, but if the leader doesn't receive any replies it will not do anything as long as there is a quorum, i.e. as long as there are enough other servers to make decisions. This leads to a situation where, while the TCP connection is established, the leader continues to blindly send messages to it. In our case this leads to a growing send backlog. The connection will be terminated eventually due to the excessive send backlog, but this might take a lot of time and wasted process memory.

To fix that we need to reintroduce inactivity probes that drop the connection if there was no incoming traffic for a long time and the remote server doesn't reply to the "echo" request. The probe interval might be chosen based on the election timeout to avoid the issues described in commit db5a066c17bd. Will send a patch for this issue too.

3. The test itself looks a bit strange. It uses the following firewall rules:

  iptables -A INPUT -i eth0 -p tcp --match multiport --dports 6641:6644 -j DROP
  iptables -A POSTROUTING -p tcp --match multiport --dports 6641:6644 -j DROP

IIUC, these rules will drop all incoming and outgoing traffic whose TCP destination port is in the database port range. However, each pair of servers usually has 2 connections between them:

  ovn1:<random port> <-----> ovn3:6641
  ovn1:6641          <-----> ovn3:<random port>

The above iptables rules will only cut one direction of each connection, keeping the following schema:

  ovn1:<random port> <------ ovn3:6641
  ovn1:6641          ------> ovn3:<random port>

And we can't reliably tell which connection will be used by one server while sending messages to the other one. It could be that communication is perfectly fine if ovn1 always uses the second connection and ovn3 uses the first one. Is this intentional? Or should the testcase be changed to also block by the TCP source port?
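As a side note on point 3, here is a sketch of fencing rules that cut both directions of every database connection (an illustration only, not something tested in this bug; it keeps the eth0 interface and the 6641:6644 port range from the original reproducer and additionally matches on the TCP source port, as suggested above):

  # drop packets whose destination port is a database/RAFT port
  iptables -A INPUT  -i eth0 -p tcp --match multiport --dports 6641:6644 -j DROP
  iptables -A OUTPUT -o eth0 -p tcp --match multiport --dports 6641:6644 -j DROP
  # drop packets whose source port is a database/RAFT port, so replies on
  # already-established connections are blocked as well
  iptables -A INPUT  -i eth0 -p tcp --match multiport --sports 6641:6644 -j DROP
  iptables -A OUTPUT -o eth0 -p tcp --match multiport --sports 6641:6644 -j DROP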
Ilya, thank you very much for the prompt triaging and detailed commentary!

OK, I now understand that the fencing mechanism used was not sufficient to completely fence ovn3 off. I started from scratch, with clean OVN databases. Taking into consideration the issues mentioned in point 1 and point 2 of comment #2, I think there is no additional issue, but I would kindly ask you to confirm please. OVN and OVS logs attached next.

1. ovn1 elected leader, ovn2 and ovn3 followers:

[root@ovn1 ~]# ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
33c1
Name: OVN_Northbound
Cluster ID: bfc0 (bfc03944-bfcd-4681-aec3-f53f864b046d)
Server ID: 33c1 (33c1c9d8-db98-47d1-bb5c-84eb5b9bddf8)
Address: tcp:192.168.14.32:6643
Status: cluster member
Role: leader
Term: 1
Leader: self
Vote: self
Election timer: 1000
Log: [2, 7]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: <-1962 ->1962 <-2ac8 ->2ac8
Servers:
    33c1 (33c1 at tcp:192.168.14.32:6643) (self) next_index=2 match_index=6
    1962 (1962 at tcp:192.168.14.33:6643) next_index=7 match_index=6
    2ac8 (2ac8 at tcp:192.168.14.34:6643) next_index=7 match_index=6

[root@ovn3 ~]# ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
2ac8
Name: OVN_Northbound
Cluster ID: bfc0 (bfc03944-bfcd-4681-aec3-f53f864b046d)
Server ID: 2ac8 (2ac8ab9f-b547-431f-9dab-9aa64548a9b1)
Address: tcp:192.168.14.34:6643
Status: cluster member
Role: follower
Term: 1
Leader: 33c1
Vote: unknown
Election timer: 1000
Log: [2, 7]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: ->0000 ->1962 <-33c1 <-1962
Servers:
    33c1 (33c1 at tcp:192.168.14.32:6643)
    1962 (1962 at tcp:192.168.14.33:6643)
    2ac8 (2ac8 at tcp:192.168.14.34:6643) (self)

2. Interface eth0 was shut off for 5 minutes:

[root@ovn3 ~]# date; ifdown eth0; sleep 300; ifup eth0
Fri Feb 19 08:35:27 UTC 2021

3. ovn1 detected inactivity. The cluster/status command output remained unchanged:

Feb 19 08:35:36 ovn1 ovsdb-server[27848]: ovs|00023|reconnect|ERR|tcp:192.168.14.34:56670: no response to inactivity probe after 5.02 seconds, disconnecting
Feb 19 08:35:36 ovn1 ovsdb-server[27868]: ovs|00023|reconnect|ERR|tcp:192.168.14.34:56758: no response to inactivity probe after 5.02 seconds, disconnecting

[root@ovn1 ~]# date; ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
Fri Feb 19 08:37:07 UTC 2021
33c1
Name: OVN_Northbound
Cluster ID: bfc0 (bfc03944-bfcd-4681-aec3-f53f864b046d)
Server ID: 33c1 (33c1c9d8-db98-47d1-bb5c-84eb5b9bddf8)
Address: tcp:192.168.14.32:6643
Status: cluster member
Role: leader
Term: 1
Leader: self
Vote: self
Election timer: 1000
Log: [2, 7]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: <-1962 ->1962 <-2ac8 ->2ac8
Servers:
    33c1 (33c1 at tcp:192.168.14.32:6643) (self) next_index=2 match_index=6
    1962 (1962 at tcp:192.168.14.33:6643) next_index=7 match_index=6
    2ac8 (2ac8 at tcp:192.168.14.34:6643) next_index=7 match_index=6

4. ovn1 started logging the following messages a while after ovn3 was fenced off (~3 minutes):

Feb 19 08:38:18 ovn1 ovsdb-server[27848]: ovs|00024|jsonrpc|INFO|excessive sending backlog, jsonrpc: tcp:192.168.14.34:55800, num of msgs: 50, backlog: 13952.
Feb 19 08:38:18 ovn1 ovsdb-server[27868]: ovs|00024|jsonrpc|INFO|excessive sending backlog, jsonrpc: tcp:192.168.14.34:44674, num of msgs: 50, backlog: 13952.
Feb 19 08:38:18 ovn1 ovsdb-server[27848]: ovs|00025|jsonrpc|INFO|excessive sending backlog, jsonrpc: tcp:192.168.14.34:55800, num of msgs: 51, backlog: 14236.
Feb 19 08:38:18 ovn1 ovsdb-server[27868]: ovs|00025|jsonrpc|INFO|excessive sending backlog, jsonrpc: tcp:192.168.14.34:44674, num of msgs: 51, backlog: 14236.
Feb 19 08:38:18 ovn1 ovsdb-server[27868]: ovs|00026|jsonrpc|INFO|excessive sending backlog, jsonrpc: tcp:192.168.14.34:44674, num of msgs: 52, backlog: 14520.
Feb 19 08:38:18 ovn1 ovsdb-server[27848]: ovs|00026|jsonrpc|INFO|excessive sending backlog, jsonrpc: tcp:192.168.14.34:55800, num of msgs: 52, backlog: 14520.
Feb 19 08:38:19 ovn1 ovsdb-server[27868]: ovs|00027|jsonrpc|INFO|excessive sending backlog, jsonrpc: tcp:192.168.14.34:44674, num of msgs: 53, backlog: 14804.
Feb 19 08:38:19 ovn1 ovsdb-server[27848]: ovs|00027|jsonrpc|INFO|excessive sending backlog, jsonrpc: tcp:192.168.14.34:55800, num of msgs: 53, backlog: 14804.
Feb 19 08:38:19 ovn1 ovsdb-server[27848]: ovs|00028|jsonrpc|INFO|excessive sending backlog, jsonrpc: tcp:192.168.14.34:55800, num of msgs: 54, backlog: 15088.
Feb 19 08:38:19 ovn1 ovsdb-server[27868]: ovs|00028|jsonrpc|INFO|excessive sending backlog, jsonrpc: tcp:192.168.14.34:44674, num of msgs: 54, backlog: 15088.
Feb 19 08:38:30 ovn1 ovsdb-server[27848]: ovs|00029|jsonrpc|INFO|Dropped 31 log messages in last 10 seconds (most recently, 0 seconds ago) due to excessive rate

5. ovn3 came back online after 5 minutes and was elected leader this time, with ovn1 and ovn2 as followers:

[root@ovn3 ~]# ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
2ac8
Name: OVN_Northbound
Cluster ID: bfc0 (bfc03944-bfcd-4681-aec3-f53f864b046d)
Server ID: 2ac8 (2ac8ab9f-b547-431f-9dab-9aa64548a9b1)
Address: tcp:192.168.14.34:6643
Status: cluster member
Role: leader
Term: 206
Leader: self
Vote: self
Election timer: 1000
Log: [2, 8]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: ->0000 ->1962 <-33c1 <-1962
Servers:
    33c1 (33c1 at tcp:192.168.14.32:6643) next_index=8 match_index=7
    1962 (1962 at tcp:192.168.14.33:6643) next_index=8 match_index=7
    2ac8 (2ac8 at tcp:192.168.14.34:6643) (self) next_index=7 match_index=7

[root@ovn1 ~]# ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
33c1
Name: OVN_Northbound
Cluster ID: bfc0 (bfc03944-bfcd-4681-aec3-f53f864b046d)
Server ID: 33c1 (33c1c9d8-db98-47d1-bb5c-84eb5b9bddf8)
Address: tcp:192.168.14.32:6643
Status: cluster member
Role: follower
Term: 206
Leader: 2ac8
Vote: 2ac8
Election timer: 1000
Log: [2, 8]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: <-1962 ->1962 ->2ac8 <-2ac8
Servers:
    33c1 (33c1 at tcp:192.168.14.32:6643) (self)
    1962 (1962 at tcp:192.168.14.33:6643)
    2ac8 (2ac8 at tcp:192.168.14.34:6643)
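For anyone reproducing this, a minimal sketch (assuming ovsdb-server messages land in the journal, as in the excerpts above) to watch the backlog build up on the leader while the follower is fenced:

  # in one terminal: follow the RAFT send-backlog warnings from ovsdb-server
  journalctl -f | grep 'excessive sending backlog'

  # in another terminal: poll the local NB cluster status
  watch -n 5 'ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound'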
Created attachment 1758098 [details] OVN, OVS logs
Yes, these logs show that the cluster works as expected (taking into account the 2 issues mentioned in comment #2).

One clarification, though: the inactivity probe disconnection happened not on a RAFT connection, but on some client connection. I suppose it was a connection from northd on the ovn3 node to the ovsdb-server on the ovn1 node. Since you shut down the whole network, this connection died and was killed due to inactivity (inactivity probes are enabled on client connections, but not on RAFT ones). The RAFT connections remained established and backlog accumulated on them, as expected.
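A quick, informal way to tell the two kinds of connections apart on a node (a sketch based on the port layout used in this setup: 6641/6642 are the NB/SB client ports, 6643/6644 the NB/SB RAFT ports):

  # client connections (northd, ovn-nbctl, ovn-sbctl) to the databases
  ss -tn '( dport = :6641 or sport = :6641 or dport = :6642 or sport = :6642 )'

  # RAFT connections between cluster members
  ss -tn '( dport = :6643 or sport = :6643 or dport = :6644 or sport = :6644 )'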
Patches posted for review:
1. http://patchwork.ozlabs.org/project/openvswitch/patch/20210223131651.53640-1-i.maximets@ovn.org/
2. http://patchwork.ozlabs.org/project/openvswitch/patch/20210223131500.53470-1-i.maximets@ovn.org/
I've created a scratch build with the patches applied: http://brew-task-repos.usersys.redhat.com/repos/scratch/imaximet/openvswitch2.13/2.13.0/97.el8fdp/

@Carlos, could you please test with this package?
Created the RAFT cluster with the following scripts:

server:
  ctl_cmd="/usr/share/ovn/scripts/ovn-ctl"
  ip_s=1.1.1.16
  ip_c1=1.1.1.17
  ip_c2=1.1.1.18
  $ctl_cmd --db-nb-addr=$ip_s --db-nb-create-insecure-remote=yes \
      --db-sb-addr=$ip_s --db-sb-create-insecure-remote=yes \
      --db-nb-cluster-local-addr=$ip_s --db-sb-cluster-local-addr=$ip_s \
      --ovn-northd-nb-db=tcp:$ip_s:6641,tcp:$ip_c1:6641,tcp:$ip_c2:6641 \
      --ovn-northd-sb-db=tcp:$ip_s:6642,tcp:$ip_c1:6642,tcp:$ip_c2:6642 start_northd

client1:
  ctl_cmd="/usr/share/ovn/scripts/ovn-ctl"
  ip_s=1.1.1.16
  ip_c1=1.1.1.17
  ip_c2=1.1.1.18
  $ctl_cmd --db-nb-addr=$ip_c1 --db-nb-create-insecure-remote=yes \
      --db-sb-addr=$ip_c1 --db-sb-create-insecure-remote=yes \
      --db-nb-cluster-local-addr=$ip_c1 --db-sb-cluster-local-addr=$ip_c1 \
      --db-nb-cluster-remote-addr=$ip_s --db-sb-cluster-remote-addr=$ip_s \
      --ovn-northd-nb-db=tcp:$ip_s:6641,tcp:$ip_c1:6641,tcp:$ip_c2:6641 \
      --ovn-northd-sb-db=tcp:$ip_s:6642,tcp:$ip_c1:6642,tcp:$ip_c2:6642 start_northd

client2:
  ctl_cmd="/usr/share/ovn/scripts/ovn-ctl"
  ip_s=1.1.1.16
  ip_c1=1.1.1.17
  ip_c2=1.1.1.18
  $ctl_cmd --db-nb-addr=$ip_c2 --db-nb-create-insecure-remote=yes \
      --db-sb-addr=$ip_c2 --db-sb-create-insecure-remote=yes \
      --db-nb-cluster-local-addr=$ip_c2 --db-sb-cluster-local-addr=$ip_c2 \
      --db-nb-cluster-remote-addr=$ip_s --db-sb-cluster-remote-addr=$ip_s \
      --ovn-northd-nb-db=tcp:$ip_s:6641,tcp:$ip_c1:6641,tcp:$ip_c2:6641 \
      --ovn-northd-sb-db=tcp:$ip_s:6642,tcp:$ip_c1:6642,tcp:$ip_c2:6642 start_northd

Result on openvswitch2.13-98:

[root@wsfd-advnetlab18 bz1888829]# rpm -qa | grep -E "openvswitch2.13|ovn2.13"
openvswitch2.13-2.13.0-98.el8fdp.x86_64
ovn2.13-host-20.12.0-86.el8fdp.x86_64
ovn2.13-20.12.0-86.el8fdp.x86_64
ovn2.13-central-20.12.0-86.el8fdp.x86_64

After startup, set the interface down on client2:

[root@wsfd-advnetlab18 bz1888829]# ip link set ens1f0 down

After a while, check the status on client2:

[root@wsfd-advnetlab18 bz1888829]# ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
315b
Name: OVN_Northbound
Cluster ID: 250a (250aa765-598a-4b2a-a9d8-417502ef0317)
Server ID: 315b (315bf920-b6ad-41fd-a39f-50c7e1611ae5)
Address: tcp:1.1.1.18:6643
Status: disconnected from the cluster (election timeout)    <=== disconnected
Role: candidate
Term: 469
Leader: 2a74
Vote: self
Election timer: 1000
Log: [2, 7]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: (->0000) (->6f6f)
Servers:
    315b (315b at tcp:1.1.1.18:6643) (self) (voted for 315b)
    6f6f (6f6f at tcp:1.1.1.17:6643)
    2a74 (2a74 at tcp:1.1.1.16:6643)

Status on the server:

[root@wsfd-advnetlab16 bz1888829]# ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
2a74
Name: OVN_Northbound
Cluster ID: 250a (250aa765-598a-4b2a-a9d8-417502ef0317)
Server ID: 2a74 (2a749a1b-ddce-42db-8103-db2e474c6c17)
Address: tcp:1.1.1.16:6643
Status: cluster member
Role: leader
Term: 7
Leader: self
Vote: self
Election timer: 1000
Log: [2, 4011]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: ->6f6f (->315b) <-6f6f
Servers:
    315b (315b at tcp:1.1.1.18:6643) next_index=4011 match_index=0
    6f6f (6f6f at tcp:1.1.1.17:6643) next_index=4011 match_index=4010
    2a74 (2a74 at tcp:1.1.1.16:6643) (self) next_index=4010 match_index=4010
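For completeness, a small sketch of the recovery check on client2 after the test (it assumes the same ens1f0 interface name; the expected result is based on the rejoin behaviour seen earlier in this bug when the fenced node came back online):

  # bring the fenced node back online
  ip link set ens1f0 up
  # once it reconnects, the status line should read "cluster member" again
  ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound | grep '^Status:'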
Hi Ilya,

so we tested this on CentOS Stream with openvswitch2.13-2.13.0-101.el8.x86_64, which I double-checked (https://cbs.centos.org/koji/buildinfo?buildID=32502) has the two linked patches.

I think the case where you cut off controller-0 from controller-1 and controller-2 via a network failure and then run cluster/status on the cut-off controller-0 will work (I haven't tested this personally, but I trust c#9 ;).

The case that seemingly still does not work is the display of the cluster status on the other two nodes. There we still see no indication that one node has died. Here is what we did. We have the following three nodes (running in VMs):

controller-0 172.16.2.241
controller-1 172.16.2.79
controller-2 172.16.2.57

ovn-dbs is clustered across those three nodes and we virsh destroy controller-0. Then on controller-1 we see the following:

[root@controller-1 ovs-2.13.0]# podman exec -ti ovn_cluster_north_db_server sh
sh-4.4# ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
45fb
Name: OVN_Northbound
Cluster ID: 613c (613c0b6e-65af-4810-bb48-c9cbea43d442)
Server ID: 45fb (45fba88d-5980-49ab-b562-1ea6e0db266c)
Address: ssl:172.16.2.79:6643
Status: cluster member
Role: follower
Term: 2
Leader: c7dd
Vote: c7dd
Election timer: 1000
Log: [2, 118]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: ->c7dd (->9287) <-c7dd
Servers:
    45fb (45fb at ssl:172.16.2.79:6643) (self)
    c7dd (c7dd at ssl:172.16.2.57:6643)
    9287 (9287 at ssl:172.16.2.241:6643)

And above I have no indication that controller-0 (aka 172.16.2.241) is really gone when we query from the surviving, quorate partition.

Does the above make sense?

thanks,
Michele
(In reply to Michele Baldessari from comment #10)
> Election timer: 1000
> Log: [2, 118]
> Entries not yet committed: 0
> Entries not yet applied: 0
> Connections: ->c7dd (->9287) <-c7dd
> Servers:
>     45fb (45fb at ssl:172.16.2.79:6643) (self)
>     c7dd (c7dd at ssl:172.16.2.57:6643)
>     9287 (9287 at ssl:172.16.2.241:6643)
>
> And above I have no indication that controller-0 (aka 172.16.2.241) is
> really gone when we query from the surviving, quorate partition.

The list of connections says that you have no established connection with server 9287. "(->9287)" means that there is an outgoing connection that is in the "connecting" state, and there is no incoming connection, which would be presented as "<-9287". You can see that this server has established incoming and outgoing connections with c7dd, but not with 9287.

Also, OVS 2.15 will include extra information on when the last message was received from each server, which will give an extra clue.
(In reply to Ilya Maximets from comment #11)
> (In reply to Michele Baldessari from comment #10)
> > Election timer: 1000
> > Log: [2, 118]
> > Entries not yet committed: 0
> > Entries not yet applied: 0
> > Connections: ->c7dd (->9287) <-c7dd
> > Servers:
> >     45fb (45fb at ssl:172.16.2.79:6643) (self)
> >     c7dd (c7dd at ssl:172.16.2.57:6643)
> >     9287 (9287 at ssl:172.16.2.241:6643)
> >
> > And above I have no indication that controller-0 (aka 172.16.2.241) is
> > really gone when we query from the surviving, quorate partition.
>
> The list of connections says that you have no established connection
> with server 9287. "(->9287)" means that there is an outgoing connection
> that is in the "connecting" state, and there is no incoming connection,
> which would be presented as "<-9287". You can see that this server has
> established incoming and outgoing connections with c7dd, but not with
> 9287.
>
> Also, OVS 2.15 will include extra information on when the last message
> was received from each server, which will give an extra clue.

Thanks. Can we not make this output a little more user-friendly, though? "(->1234)" is hardly a clue that the raft cluster agrees a node is gone.

A nicely spelled out "Not connected to the cluster" would give us a hint, and displaying "disconnected" only in the unquorate partition does not cover the case when a node has died. "Last message received" is certainly an improvement, but a clear "Not part of the cluster right now", like any other cluster manager provides, would be helpful.

cheers,
Michele
(In reply to Michele Baldessari from comment #12)
> (In reply to Ilya Maximets from comment #11)
> > (In reply to Michele Baldessari from comment #10)
> > > Election timer: 1000
> > > Log: [2, 118]
> > > Entries not yet committed: 0
> > > Entries not yet applied: 0
> > > Connections: ->c7dd (->9287) <-c7dd
> > > Servers:
> > >     45fb (45fb at ssl:172.16.2.79:6643) (self)
> > >     c7dd (c7dd at ssl:172.16.2.57:6643)
> > >     9287 (9287 at ssl:172.16.2.241:6643)
> > >
> > > And above I have no indication that controller-0 (aka 172.16.2.241) is
> > > really gone when we query from the surviving, quorate partition.
> >
> > The list of connections says that you have no established connection
> > with server 9287. "(->9287)" means that there is an outgoing connection
> > that is in the "connecting" state, and there is no incoming connection,
> > which would be presented as "<-9287". You can see that this server has
> > established incoming and outgoing connections with c7dd, but not with
> > 9287.
> >
> > Also, OVS 2.15 will include extra information on when the last message
> > was received from each server, which will give an extra clue.
>
> Thanks. Can we not make this output a little more user-friendly, though?
> "(->1234)" is hardly a clue that the raft cluster agrees a node is gone.
>
> A nicely spelled out "Not connected to the cluster" would give us a hint,
> and displaying "disconnected" only in the unquorate partition does not
> cover the case when a node has died. "Last message received" is certainly
> an improvement, but a clear "Not part of the cluster right now", like any
> other cluster manager provides, would be helpful.

The problem here is that servers do not agree on who is part of the cluster and who is not. The only thing we can say is whether this particular server has a connection to another particular server, and that is reflected in the list of connections. We can also have a connection in only one direction, and this will work for some time until the inactivity probe cuts it off.

Technically, the only requirement for the raft cluster to function correctly is that half of the followers have a connection with the leader. It's not required for followers to have established connections between each other, as they do not send anything to one another. So, from the point of view of one follower, it's not possible to say whether another follower is part of the cluster or not.

We could add a clearer indication to cluster/status that this server has no established outgoing connection to the other one. If you think that it's needed, we can file an RFE and schedule this work. As a solution for now,

  ovs-appctl cluster/status OVN_Northbound | grep -q -E "\(->[a-z0-9]*\)"

if executed on a leader, will say whether one of the followers is not connected right now. Note that even if one of the followers is not connected right now, it doesn't mean that it's not a transient network issue that could go away without raft even noticing. The leader doesn't care whether all followers are alive as long as there is a quorum.

Basically, if the question is "is this server logically a member of the raft cluster?", each server can only answer for itself with 100% certainty. The leader can say that it hasn't seen one of the followers for a while, but that's not a reason to kick the server out of the cluster.
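To make that one-liner a bit easier to reuse, here is a small wrapper sketch (an illustration only, not part of any patch; it assumes it is run on the current leader and uses the ovnnb_db.ctl socket path from this setup):

  #!/bin/sh
  # Warn if the local server has any outgoing RAFT connection that is not established.
  status=$(ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound)
  if echo "$status" | grep -q -E '\(->[a-z0-9]*\)'; then
      echo "WARNING: no established outgoing connection to the following server(s):"
      echo "$status" | grep -o -E '\(->[a-z0-9]*\)'
  else
      echo "all outgoing RAFT connections are established"
  fi

As noted above, a match only means this server currently has no established outgoing connection; it may be a transient network issue rather than a dead member.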
I agree with Michele here -- the output is not user-friendly at all. Are we expecting users to possess advanced internal knowledge of OVS/OVN, to the point that they can intuitively understand that a combination of server IDs (not even their IPs), parentheses and arrows has a special meaning, what that meaning is, and its implications for the functional state of the cluster? I would argue not. The fact that Michele and I, two engineers with HA and network backgrounds, could not even parse such status output may indicate that other users would not be able to either.

We would be happy to continue working with the OVN folks to further improve the output for a better UX by testing and providing feedback. Thanks for the bug fixing thus far, much appreciated!
Verified on ovn2.13-20.12.0-104.el8fdp.x86_64:

[root@wsfd-advnetlab21 bz1929690]# rpm -qa | grep ovn2.13
ovn2.13-20.12.0-104.el8fdp.x86_64
ovn2.13-central-20.12.0-104.el8fdp.x86_64
ovn2.13-host-20.12.0-104.el8fdp.x86_64

After setup with the setup script in comment 9:

server:
[root@wsfd-advnetlab21 bz1929690]# ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
82b9
Name: OVN_Northbound
Cluster ID: 9353 (935355fa-9d7c-4c49-8b1a-4bc0f371bee5)
Server ID: 82b9 (82b9bef6-8b20-4b40-be5b-b888e3dc1790)
Address: tcp:1.1.1.21:6643
Status: cluster member
Role: leader
Term: 1
Leader: self
Vote: self
Last Election started 49330 ms ago, reason: timeout
Last Election won: 49330 ms ago
Election timer: 1000
Log: [2, 7]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: <-7516 ->7516 <-5625 ->5625
Disconnections: 0
Servers:
    82b9 (82b9 at tcp:1.1.1.21:6643) (self) next_index=2 match_index=6
    7516 (7516 at tcp:1.1.1.18:6643) next_index=7 match_index=6 last msg 45 ms ago
    5625 (5625 at tcp:1.1.1.17:6643) next_index=7 match_index=6 last msg 45 ms ago

client2:
[root@wsfd-advnetlab18 bz1929690]# ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
7516
Name: OVN_Northbound
Cluster ID: 9353 (935355fa-9d7c-4c49-8b1a-4bc0f371bee5)
Server ID: 7516 (751659ba-4cac-4520-a61b-04e951312b18)
Address: tcp:1.1.1.18:6643
Status: cluster member
Role: follower
Term: 1
Leader: 82b9
Vote: unknown
Election timer: 1000
Log: [2, 7]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: ->0000 <-82b9 <-5625 ->5625
Disconnections: 0
Servers:
    82b9 (82b9 at tcp:1.1.1.21:6643) last msg 101 ms ago
    7516 (7516 at tcp:1.1.1.18:6643) (self)
    5625 (5625 at tcp:1.1.1.17:6643)

Set the interface on client2 down:

[root@wsfd-advnetlab18 bz1929690]# ip link set ens1f0 down

After several seconds:

server:
[root@wsfd-advnetlab21 bz1929690]# ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
82b9
Name: OVN_Northbound
Cluster ID: 9353 (935355fa-9d7c-4c49-8b1a-4bc0f371bee5)
Server ID: 82b9 (82b9bef6-8b20-4b40-be5b-b888e3dc1790)
Address: tcp:1.1.1.21:6643
Status: cluster member
Role: leader
Term: 1
Leader: self
Vote: self
Last Election started 234737 ms ago, reason: timeout
Last Election won: 234737 ms ago
Election timer: 1000
Log: [2, 7]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: (->7516) <-5625 ->5625    <=== connection for 7516 is not connected
Disconnections: 1
Servers:
    82b9 (82b9 at tcp:1.1.1.21:6643) (self) next_index=2 match_index=6
    7516 (7516 at tcp:1.1.1.18:6643) next_index=7 match_index=6 last msg 65341 ms ago
    5625 (5625 at tcp:1.1.1.17:6643) next_index=7 match_index=6 last msg 292 ms ago

client2:
[root@wsfd-advnetlab18 bz1929690]# ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
7516
Name: OVN_Northbound
Cluster ID: 9353 (935355fa-9d7c-4c49-8b1a-4bc0f371bee5)
Server ID: 7516 (751659ba-4cac-4520-a61b-04e951312b18)
Address: tcp:1.1.1.18:6643
Status: disconnected from the cluster (election timeout)    <=== disconnected
Role: candidate
Term: 92
Leader: unknown
Vote: self
Last Election started 1162 ms ago, reason: timeout
Election timer: 1000
Log: [2, 7]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: (->0000) (->5625)
Disconnections: 2
Servers:
    82b9 (82b9 at tcp:1.1.1.21:6643) last msg 135209 ms ago
    7516 (7516 at tcp:1.1.1.18:6643) (self) (voted for 7516)
    5625 (5625 at tcp:1.1.1.17:6643)
Also verified on ovn-2021-21.03.0-21.el8fdp.x86_64:

[root@wsfd-advnetlab21 bz1929690]# rpm -qa | grep ovn
ovn-2021-central-21.03.0-21.el8fdp.x86_64
ovn-2021-21.03.0-21.el8fdp.x86_64
ovn-2021-host-21.03.0-21.el8fdp.x86_64

server:
[root@wsfd-advnetlab21 bz1929690]# ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
e915
Name: OVN_Northbound
Cluster ID: 63b2 (63b268cb-495f-40bd-9133-8105bca56a26)
Server ID: e915 (e915b002-865e-469e-b8f1-fec881c09526)
Address: tcp:1.1.1.21:6643
Status: cluster member
Role: leader
Term: 1
Leader: self
Vote: self
Last Election started 271136 ms ago, reason: timeout
Last Election won: 271136 ms ago
Election timer: 1000
Log: [2, 7]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: (->ad9e) <-9c0f ->9c0f    <=== disconnected
Disconnections: 1
Servers:
    ad9e (ad9e at tcp:1.1.1.18:6643) next_index=7 match_index=6 last msg 237056 ms ago
    e915 (e915 at tcp:1.1.1.21:6643) (self) next_index=2 match_index=6
    9c0f (9c0f at tcp:1.1.1.17:6643) next_index=7 match_index=6 last msg 222 ms ago

client2:
[root@wsfd-advnetlab18 bz1929690]# ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
ad9e
Name: OVN_Northbound
Cluster ID: 63b2 (63b268cb-495f-40bd-9133-8105bca56a26)
Server ID: ad9e (ad9eae10-f9e5-436d-83ff-edcda589523e)
Address: tcp:1.1.1.18:6643
Status: disconnected from the cluster (election timeout)    <== disconnected
Role: candidate
Term: 177
Leader: unknown
Vote: self
Last Election started 642 ms ago, reason: timeout
Election timer: 1000
Log: [2, 7]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: (->0000) (->9c0f)
Disconnections: 2
Servers:
    ad9e (ad9e at tcp:1.1.1.18:6643) (self) (voted for ad9e)
    e915 (e915 at tcp:1.1.1.21:6643) last msg 260801 ms ago
    9c0f (9c0f at tcp:1.1.1.17:6643)
FYI, I opened https://bugzilla.redhat.com/show_bug.cgi?id=1953688 soliciting improvements to the output of the cluster status command.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (openvswitch2.13 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:2083