Bug 1929690 - Fenced OVN server still present in cluster/status
Summary: Fenced OVN server still present in cluster/status
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: openvswitch2.13
Version: RHEL 8.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Ilya Maximets
QA Contact: Jianlin Shi
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-02-17 13:10 UTC by Carlos Goncalves
Modified: 2021-05-20 19:30 UTC
CC List: 8 users

Fixed In Version: openvswitch2.13-2.13.0-96.el8fdp openvswitch2.13-2.13.0-85.el7fdp
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-05-20 19:29:38 UTC
Target Upstream Version:
Embargoed:


Attachments
OVN cluster logs (480.80 KB, application/gzip), 2021-02-17 13:39 UTC, Carlos Goncalves
OVN, OVS logs (14.18 KB, application/gzip), 2021-02-19 09:00 UTC, Carlos Goncalves


Links
Red Hat Product Errata RHBA-2021:2083 (last updated 2021-05-20 19:30:00 UTC)

Description Carlos Goncalves 2021-02-17 13:10:26 UTC
Description of problem:
OVN server still present in cluster/status after being fenced from rest of the cluster.

Version-Release number of selected component (if applicable):
network-scripts-openvswitch2.13-2.13.0-79.el8.x86_64
openvswitch2.13-2.13.0-79.el8.x86_64
ovn2.13-host-20.12.0-1.el8.x86_64
openvswitch-selinux-extra-policy-1.0-28.el8.noarch
ovn2.13-20.12.0-1.el8.x86_64
ovn2.13-central-20.12.0-1.el8.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Deploy 3-node OVN cluster (hosts: ovn1, ovn2, ovn3)
2. Check ovn1 is leader, ovn2 and ovn3 are followers:
   [ovn1] $ ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
3. Fence ovn3 from ovn1, ovn2:
   [ovn3] $ sudo iptables -A INPUT -i eth0 -p tcp --match multiport --dports 6641:6644 -j DROP
   [ovn3] $ sudo iptables -A POSTROUTING -p tcp --match multiport --dports 6641:6644 -j DROP
4. Check that ovn1 and ovn2 detected ovn3 down: ovn1 and ovn2 start logging lots of following messages:
   Feb 17 12:57:36 ovn2 ovsdb-server[17973]: ovs|00025|raft|WARN|ignoring vote request received after only 278 ms (minimum election time is 1000 ms)
5. Check that ovn3 is in role "candidate"
   [ovn1] $ ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
6. Cluster status output does not seem to indicate that ovn3 is now disconnected from the rest of the cluster (split-brain)
 
Previously established TCP connections are still present after fencing ovn3 with the abovementioned iptables rules:

[root@ovn3 ~]# netstat -na | grep 664
tcp        0      0 192.168.14.34:6641      0.0.0.0:*               LISTEN     
tcp        0      0 192.168.14.34:6642      0.0.0.0:*               LISTEN     
tcp        0      0 192.168.14.34:6643      0.0.0.0:*               LISTEN     
tcp        0      0 192.168.14.34:6644      0.0.0.0:*               LISTEN     
tcp        0      0 192.168.14.34:55090     192.168.14.32:6641      ESTABLISHED
tcp        0      0 192.168.14.34:6644      192.168.14.33:42192     ESTABLISHED
tcp        0      0 192.168.14.34:6644      192.168.14.32:51522     ESTABLISHED
tcp        0      0 192.168.14.34:55182     192.168.14.32:6642      ESTABLISHED
tcp        0      0 192.168.14.34:6643      192.168.14.32:45026     ESTABLISHED
tcp        0      0 192.168.14.34:56518     192.168.14.33:6643      ESTABLISHED
tcp        0      0 192.168.14.34:54228     192.168.14.32:6643      ESTABLISHED
tcp        0      0 192.168.14.34:58724     192.168.14.33:6644      ESTABLISHED
tcp        0      0 192.168.14.34:6643      192.168.14.33:58756     ESTABLISHED
tcp        0      0 192.168.14.34:43102     192.168.14.32:6644      ESTABLISHED

ovn3 changed role to "candidate" after fencing:

[root@ovn3 ~]# ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound              
9431
Name: OVN_Northbound
Cluster ID: dcc3 (dcc308a6-0634-416d-a37e-57848aa6ee3e)                                          
Server ID: 9431 (94314df7-af85-438f-8004-3f6ba2737fa7)                                            
Address: tcp:192.168.14.34:6643
Status: cluster member
Role: candidate
Term: 915
Leader: 3234
Vote: self
 
Election timer: 1000
Log: [2, 14]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: ->3234 ->8554 <-8554 <-3234 <-3234 <-8554 <-3234                                    
Servers:
    3234 (3234 at tcp:192.168.14.32:6643)
    8554 (8554 at tcp:192.168.14.33:6643)
    9431 (9431 at tcp:192.168.14.34:6643) (self) (voted for 9431)
 
ovn1 is still the leader after fencing ovn3:

[root@ovn1 ~]# ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound              
3234
Name: OVN_Northbound
Cluster ID: dcc3 (dcc308a6-0634-416d-a37e-57848aa6ee3e)                                          
Server ID: 3234 (32341975-62c8-4507-a8e1-0a74dfac4edc)                                            
Address: tcp:192.168.14.32:6643
Status: cluster member
Role: leader
Term: 818
Leader: self
Vote: self
 
Election timer: 1000
Log: [2, 14]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: <-8554 ->8554 ->9431 <-9431
Servers:
    3234 (3234 at tcp:192.168.14.32:6643) (self) next_index=13 match_index=13                    
    8554 (8554 at tcp:192.168.14.33:6643) next_index=14 match_index=13                            
    9431 (9431 at tcp:192.168.14.34:6643) next_index=14 match_index=13

Comment 1 Carlos Goncalves 2021-02-17 13:39:01 UTC
Created attachment 1757566 [details]
OVN cluster logs

Comment 2 Ilya Maximets 2021-02-18 17:33:59 UTC
OK.  There are several separate issues here.

1. ovn3 actually has no connection to ovn1 or ovn2.  It's missing several
   heartbeats sent by the leader, so it decides to start an election.
   It constantly sends vote requests and doesn't receive any replies, staying
   in the 'candidate' role.  On the second attempt to start an election, ovn3
   sets the "candidate_retrying" flag, which is translated to
   'is_connected':false in the _Server database and informs clients that the
   storage is disconnected.  Clients should reconnect to other servers in this
   case.  To verify, we could run
       ovn-nbctl --db=<local db on ovn3> --no-leader-only -v show
   This command will fail, and the logs will show the reason: "clustered
   database server is disconnected from cluster".  So there is no actual
   split-brain.  (See also the first sketch at the end of this comment.)
   
   The issue here is that the cluster/status command still shows "cluster
   member" in the status field for ovn3.  That could be fixed by taking into
   account the "candidate_retrying" flag and reporting "disconnected from
   cluster" instead of "cluster member", probably with an extra note
   "(election timeout)".
   
   I will send a patch for this.

2. Approximately a year ago there was a patch to disable inactivity probes for
   RAFT connections in ovsdb-server:
     db5a066c17bd ("raft: Disable RAFT jsonrpc inactivity probe.")
   The patch was accepted.
   The rationale was that RAFT heartbeats are enough.  Well... they are not.

   RAFT heartbeats are unidirectional, i.e. the leader sends them to followers
   but not the other way around.  Missing heartbeats provoke followers to
   start an election, but if the leader doesn't receive any replies it will
   not do anything as long as it still has a quorum, i.e. enough other servers
   to make decisions.
   This leads to a situation where, while the TCP connection is established,
   the leader continues to blindly send messages to it.  In our case this
   results in a growing send backlog.  The connection will eventually be
   terminated due to the excessive send backlog, but that might take a lot of
   time and waste process memory.  (See the second sketch at the end of this
   comment for a way to observe this from the leader side.)
   
   To fix that, we need to reintroduce inactivity probes that will drop the
   connection if there was no incoming traffic for a long time and the remote
   server doesn't reply to the "echo" request.  The probe interval might be
   chosen based on the election timeout to avoid the issues described in
   commit db5a066c17bd.
   
   Will send a patch for this issue too.

3. The test itself looks a bit strange.  It uses the following firewall rules:

    iptables -A INPUT -i eth0 -p tcp --match multiport --dports 6641:6644 -j DROP
    iptables -A POSTROUTING -p tcp --match multiport --dports 6641:6644 -j DROP

   IIUC, these rules will drop all incoming and outgoing traffic whose TCP
   destination port falls in the database port range.  However, each pair of
   servers usually has 2 connections between them:
   
      ovn1:<random port> <-----> ovn3:6641
      ovn1:6641          <-----> ovn3:<random port>
      
   The above iptables rules will only cut one direction of each connection,
   keeping the following schema:
   
      ovn1:<random port> <------ ovn3:6641
      ovn1:6641          ------> ovn3:<random port>

   And we can't reliably tell which connection a server will use when sending
   messages to the other one.  It could be that communication is perfectly
   fine if ovn1 always uses the second connection and ovn3 uses the first one.
   
   Is that intentional?  Or should the testcase be changed to also block by
   the TCP source port (see the third sketch below)?
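
A few command sketches for the points above (minimal examples only; they
assume the socket paths, addresses and ports used elsewhere in this report):

For point 1, the connection flag can also be read directly from the _Server
database of the fenced node (assuming the standard _Server schema and the
default local socket created by ovn-ctl):

    # On ovn3: "connected": false means the server considers itself
    # disconnected from the cluster (candidate_retrying is set).
    ovsdb-client query unix:/var/run/ovn/ovnnb_db.sock \
      '["_Server",
        {"op": "select",
         "table": "Database",
         "where": [["name", "==", "OVN_Northbound"]],
         "columns": ["name", "connected", "leader"]}]'

For point 2, the growing backlog can be observed from the leader side before
any fix is in place, e.g. by watching the kernel send queue on the NB RAFT
port:

    # On ovn1: a steadily growing Send-Q towards ovn3 shows RAFT messages
    # piling up on a dead-but-still-established connection.
    watch -n 5 "ss -tn state established '( sport = :6643 or dport = :6643 )'"

For point 3, a rule set that cuts both directions of every DB/RAFT connection
would also need to match on source ports, e.g.:

    # Drop DB/RAFT traffic regardless of which side owns the well-known port.
    iptables -A INPUT  -i eth0 -p tcp --match multiport --dports 6641:6644 -j DROP
    iptables -A INPUT  -i eth0 -p tcp --match multiport --sports 6641:6644 -j DROP
    iptables -A OUTPUT -o eth0 -p tcp --match multiport --dports 6641:6644 -j DROP
    iptables -A OUTPUT -o eth0 -p tcp --match multiport --sports 6641:6644 -j DROP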

Comment 3 Carlos Goncalves 2021-02-19 09:00:09 UTC
Ilya, thank you very much for the prompt triaging and detailed commentary!

OK. I understand now that the fencing mechanism used was not sufficient to completely fence off ovn3.

I started from scratch with clean OVN databases. Taking into account the issues mentioned in points 1 and 2 of comment #2, I think there is no additional issue, but I would kindly ask you to confirm. OVN and OVS logs are attached next.

1. ovn1 elected leader, ovn2 and ovn3 followers:

	[root@ovn1 ~]# ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
	33c1
	Name: OVN_Northbound
	Cluster ID: bfc0 (bfc03944-bfcd-4681-aec3-f53f864b046d)
	Server ID: 33c1 (33c1c9d8-db98-47d1-bb5c-84eb5b9bddf8)
	Address: tcp:192.168.14.32:6643
	Status: cluster member
	Role: leader
	Term: 1
	Leader: self
	Vote: self

	Election timer: 1000
	Log: [2, 7]
	Entries not yet committed: 0
	Entries not yet applied: 0
	Connections: <-1962 ->1962 <-2ac8 ->2ac8
	Servers:
	    33c1 (33c1 at tcp:192.168.14.32:6643) (self) next_index=2 match_index=6
	    1962 (1962 at tcp:192.168.14.33:6643) next_index=7 match_index=6
	    2ac8 (2ac8 at tcp:192.168.14.34:6643) next_index=7 match_index=6

	[root@ovn3 ~]# ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
	2ac8
	Name: OVN_Northbound
	Cluster ID: bfc0 (bfc03944-bfcd-4681-aec3-f53f864b046d)
	Server ID: 2ac8 (2ac8ab9f-b547-431f-9dab-9aa64548a9b1)
	Address: tcp:192.168.14.34:6643
	Status: cluster member
	Role: follower
	Term: 1
	Leader: 33c1
	Vote: unknown

	Election timer: 1000
	Log: [2, 7]
	Entries not yet committed: 0
	Entries not yet applied: 0
	Connections: ->0000 ->1962 <-33c1 <-1962
	Servers:
	    33c1 (33c1 at tcp:192.168.14.32:6643)
	    1962 (1962 at tcp:192.168.14.33:6643)
	    2ac8 (2ac8 at tcp:192.168.14.34:6643) (self)

2. Interface eth0 was shut off for 5 minutes:

	[root@ovn3 ~]# date; ifdown eth0; sleep 300; ifup eth0
	Fri Feb 19 08:35:27 UTC 2021

3. ovn1 detected inactivity. cluster/status command output remained unchanged:
	Feb 19 08:35:36 ovn1 ovsdb-server[27848]: ovs|00023|reconnect|ERR|tcp:192.168.14.34:56670: no response to inactivity probe after 5.02 seconds, disconnecting
	Feb 19 08:35:36 ovn1 ovsdb-server[27868]: ovs|00023|reconnect|ERR|tcp:192.168.14.34:56758: no response to inactivity probe after 5.02 seconds, disconnecting

	[root@ovn1 ~]# date; ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
	Fri Feb 19 08:37:07 UTC 2021
	33c1
	Name: OVN_Northbound
	Cluster ID: bfc0 (bfc03944-bfcd-4681-aec3-f53f864b046d)
	Server ID: 33c1 (33c1c9d8-db98-47d1-bb5c-84eb5b9bddf8)
	Address: tcp:192.168.14.32:6643
	Status: cluster member
	Role: leader
	Term: 1
	Leader: self
	Vote: self

	Election timer: 1000
	Log: [2, 7]
	Entries not yet committed: 0
	Entries not yet applied: 0
	Connections: <-1962 ->1962 <-2ac8 ->2ac8
	Servers:
	    33c1 (33c1 at tcp:192.168.14.32:6643) (self) next_index=2 match_index=6
	    1962 (1962 at tcp:192.168.14.33:6643) next_index=7 match_index=6
	    2ac8 (2ac8 at tcp:192.168.14.34:6643) next_index=7 match_index=6

4. ovn1 started logging the following messages a while after ovn3 was fenced off (~3 minutes):

	Feb 19 08:38:18 ovn1 ovsdb-server[27848]: ovs|00024|jsonrpc|INFO|excessive sending backlog, jsonrpc: tcp:192.168.14.34:55800, num of msgs: 50, backlog:13952.
	Feb 19 08:38:18 ovn1 ovsdb-server[27868]: ovs|00024|jsonrpc|INFO|excessive sending backlog, jsonrpc: tcp:192.168.14.34:44674, num of msgs: 50, backlog: 13952.
	Feb 19 08:38:18 ovn1 ovsdb-server[27848]: ovs|00025|jsonrpc|INFO|excessive sending backlog, jsonrpc: tcp:192.168.14.34:55800, num of msgs: 51, backlog: 14236.
	Feb 19 08:38:18 ovn1 ovsdb-server[27868]: ovs|00025|jsonrpc|INFO|excessive sending backlog, jsonrpc: tcp:192.168.14.34:44674, num of msgs: 51, backlog: 14236.
	Feb 19 08:38:18 ovn1 ovsdb-server[27868]: ovs|00026|jsonrpc|INFO|excessive sending backlog, jsonrpc: tcp:192.168.14.34:44674, num of msgs: 52, backlog: 14520.
	Feb 19 08:38:18 ovn1 ovsdb-server[27848]: ovs|00026|jsonrpc|INFO|excessive sending backlog, jsonrpc: tcp:192.168.14.34:55800, num of msgs: 52, backlog: 14520.
	Feb 19 08:38:19 ovn1 ovsdb-server[27868]: ovs|00027|jsonrpc|INFO|excessive sending backlog, jsonrpc: tcp:192.168.14.34:44674, num of msgs: 53, backlog: 14804.
	Feb 19 08:38:19 ovn1 ovsdb-server[27848]: ovs|00027|jsonrpc|INFO|excessive sending backlog, jsonrpc: tcp:192.168.14.34:55800, num of msgs: 53, backlog: 14804.
	Feb 19 08:38:19 ovn1 ovsdb-server[27848]: ovs|00028|jsonrpc|INFO|excessive sending backlog, jsonrpc: tcp:192.168.14.34:55800, num of msgs: 54, backlog: 15088.
	Feb 19 08:38:19 ovn1 ovsdb-server[27868]: ovs|00028|jsonrpc|INFO|excessive sending backlog, jsonrpc: tcp:192.168.14.34:44674, num of msgs: 54, backlog: 15088.
	Feb 19 08:38:30 ovn1 ovsdb-server[27848]: ovs|00029|jsonrpc|INFO|Dropped 31 log messages in last 10 seconds (most recently, 0 seconds ago) due to excessive rate

5. ovn3 came back online after 5 minutes and was elected leader this time, ovn1 and ovn2 followers:

	[root@ovn3 ~]# ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
	2ac8
	Name: OVN_Northbound
	Cluster ID: bfc0 (bfc03944-bfcd-4681-aec3-f53f864b046d)
	Server ID: 2ac8 (2ac8ab9f-b547-431f-9dab-9aa64548a9b1)
	Address: tcp:192.168.14.34:6643
	Status: cluster member
	Role: leader
	Term: 206
	Leader: self
	Vote: self

	Election timer: 1000
	Log: [2, 8]
	Entries not yet committed: 0
	Entries not yet applied: 0
	Connections: ->0000 ->1962 <-33c1 <-1962
	Servers:
	    33c1 (33c1 at tcp:192.168.14.32:6643) next_index=8 match_index=7
	    1962 (1962 at tcp:192.168.14.33:6643) next_index=8 match_index=7
	    2ac8 (2ac8 at tcp:192.168.14.34:6643) (self) next_index=7 match_index=7
	    
	[root@ovn1 ~]# ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
	33c1
	Name: OVN_Northbound
	Cluster ID: bfc0 (bfc03944-bfcd-4681-aec3-f53f864b046d)
	Server ID: 33c1 (33c1c9d8-db98-47d1-bb5c-84eb5b9bddf8)
	Address: tcp:192.168.14.32:6643
	Status: cluster member
	Role: follower
	Term: 206
	Leader: 2ac8
	Vote: 2ac8

	Election timer: 1000
	Log: [2, 8]
	Entries not yet committed: 0
	Entries not yet applied: 0
	Connections: <-1962 ->1962 ->2ac8 <-2ac8
	Servers:
	    33c1 (33c1 at tcp:192.168.14.32:6643) (self)
	    1962 (1962 at tcp:192.168.14.33:6643)
	    2ac8 (2ac8 at tcp:192.168.14.34:6643)

Comment 4 Carlos Goncalves 2021-02-19 09:00:46 UTC
Created attachment 1758098 [details]
OVN, OVS logs

Comment 5 Ilya Maximets 2021-02-19 16:18:27 UTC
Yes, these logs show that the cluster works as expected (taking into account the 2 issues
mentioned in comment #2).

One clarification, though: the inactivity probe disconnection happened not on
a RAFT connection, but on some client connection.  I suppose it was a connection
from northd on the ovn3 node to the ovsdb-server on the ovn1 node.  Since you shut
down the whole network, this connection died and was killed due to inactivity
(inactivity probes are enabled on client connections, but not on RAFT ones).  RAFT
connections remained established and backlog accumulated on them, as expected.
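
For reference, the disconnection in the comment #3 logs ("no response to
inactivity probe after 5.02 seconds") shows the server-side probe on a client
connection.  When the DB remote is configured through the Connection table
(ovn-nbctl set-connection ...), that interval can be inspected and tuned; a
sketch, assuming a single Connection record (values are in milliseconds,
0 disables the probe):

    # Run against the current NB leader, or any member with --no-leader-only.
    ovn-nbctl --no-leader-only list connection
    ovn-nbctl set connection . inactivity_probe=30000

RAFT connections are not affected by this setting; they are what the planned
patch from point 2 of comment #2 addresses.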

Comment 7 Ilya Maximets 2021-02-23 15:23:59 UTC
I've created a scratch build with patches applied:
  http://brew-task-repos.usersys.redhat.com/repos/scratch/imaximet/openvswitch2.13/2.13.0/97.el8fdp/

@Carlos, could you please test with this package?

Comment 9 Jianlin Shi 2021-03-26 02:47:31 UTC
Created the RAFT cluster with the following scripts:

server:
ctl_cmd="/usr/share/ovn/scripts/ovn-ctl"                                                              
ip_s=1.1.1.16                                                                                         
ip_c1=1.1.1.17                                                                                        
ip_c2=1.1.1.18                                                                                        
                                                                                                      
$ctl_cmd --db-nb-addr=$ip_s --db-nb-create-insecure-remote=yes \
                        --db-sb-addr=$ip_s --db-sb-create-insecure-remote=yes \
                        --db-nb-cluster-local-addr=$ip_s --db-sb-cluster-local-addr=$ip_s \
                        --ovn-northd-nb-db=tcp:$ip_s:6641,tcp:$ip_c1:6641,tcp:$ip_c2:6641 \
                        --ovn-northd-sb-db=tcp:$ip_s:6642,tcp:$ip_c1:6642,tcp:$ip_c2:6642 start_northd

client1:
ctl_cmd="/usr/share/ovn/scripts/ovn-ctl"                                                              
ip_s=1.1.1.16
ip_c1=1.1.1.17                                                                                        
ip_c2=1.1.1.18                                                                                        

$ctl_cmd --db-nb-addr=$ip_c1 --db-nb-create-insecure-remote=yes \
                        --db-sb-addr=$ip_c1 --db-sb-create-insecure-remote=yes \
                        --db-nb-cluster-local-addr=$ip_c1 --db-sb-cluster-local-addr=$ip_c1 \         
                        --db-nb-cluster-remote-addr=$ip_s --db-sb-cluster-remote-addr=$ip_s \         
                        --ovn-northd-nb-db=tcp:$ip_s:6641,tcp:$ip_c1:6641,tcp:$ip_c2:6641 \           
                        --ovn-northd-sb-db=tcp:$ip_s:6642,tcp:$ip_c1:6642,tcp:$ip_c2:6642 start_northd
client2:
ctl_cmd="/usr/share/ovn/scripts/ovn-ctl"
ip_s=1.1.1.16
ip_c1=1.1.1.17
ip_c2=1.1.1.18                                                                                        
                                                                                                      
$ctl_cmd --db-nb-addr=$ip_c2 --db-nb-create-insecure-remote=yes \
                        --db-sb-addr=$ip_c2 --db-sb-create-insecure-remote=yes \
                        --db-nb-cluster-local-addr=$ip_c2 --db-sb-cluster-local-addr=$ip_c2 \
                        --db-nb-cluster-remote-addr=$ip_s --db-sb-cluster-remote-addr=$ip_s \
                        --ovn-northd-nb-db=tcp:$ip_s:6641,tcp:$ip_c1:6641,tcp:$ip_c2:6641 \
                        --ovn-northd-sb-db=tcp:$ip_s:6642,tcp:$ip_c1:6642,tcp:$ip_c2:6642 start_northd
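
A quick way to verify that the three-node cluster formed (a sketch, assuming
passwordless ssh between the test hosts and the same socket paths as above):

    # One node should report Role: leader, the other two Role: follower.
    for host in 1.1.1.16 1.1.1.17 1.1.1.18; do
        echo "=== $host ==="
        ssh "$host" "ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound" \
            | grep -E '^(Address|Status|Role):'
    done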

result on openvswitch2.13-98:

[root@wsfd-advnetlab18 bz1888829]# rpm -qa | grep -E "openvswitch2.13|ovn2.13"
openvswitch2.13-2.13.0-98.el8fdp.x86_64
ovn2.13-host-20.12.0-86.el8fdp.x86_64
ovn2.13-20.12.0-86.el8fdp.x86_64
ovn2.13-central-20.12.0-86.el8fdp.x86_64

after startup, set interface down for client2:

[root@wsfd-advnetlab18 bz1888829]# ip link set ens1f0 down 
after a while, check the status on client2:

[root@wsfd-advnetlab18 bz1888829]# ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound                                                                                                   
315b                                                   
Name: OVN_Northbound
Cluster ID: 250a (250aa765-598a-4b2a-a9d8-417502ef0317)
Server ID: 315b (315bf920-b6ad-41fd-a39f-50c7e1611ae5)     
Address: tcp:1.1.1.18:6643                          
Status: disconnected from the cluster (election timeout)

<=== disconnected

Role: candidate                                           
Term: 469
Leader: 2a74                                            
Vote: self                                                
         
Election timer: 1000                            
Log: [2, 7]                                   
Entries not yet committed: 0                          
Entries not yet applied: 0                            
Connections: (->0000) (->6f6f)                        
Servers:
    315b (315b at tcp:1.1.1.18:6643) (self) (voted for 315b)                                          
    6f6f (6f6f at tcp:1.1.1.17:6643)                       
    2a74 (2a74 at tcp:1.1.1.16:6643)

status on server:

[root@wsfd-advnetlab16 bz1888829]# ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
2a74                                                                                                  
Name: OVN_Northbound                                                                                  
Cluster ID: 250a (250aa765-598a-4b2a-a9d8-417502ef0317)                                               
Server ID: 2a74 (2a749a1b-ddce-42db-8103-db2e474c6c17)                                                
Address: tcp:1.1.1.16:6643                                                                            
Status: cluster member                                                                                
Role: leader                                                                                          
Term: 7                                                                                               
Leader: self                                                                                          
Vote: self                                                                                            
                                                                                                      
Election timer: 1000                                                                                  
Log: [2, 4011]                                                                                        
Entries not yet committed: 0                                                                          
Entries not yet applied: 0                                                                            
Connections: ->6f6f (->315b) <-6f6f                                                                   
Servers:                                                                                              
    315b (315b at tcp:1.1.1.18:6643) next_index=4011 match_index=0                                    
    6f6f (6f6f at tcp:1.1.1.17:6643) next_index=4011 match_index=4010                                 
    2a74 (2a74 at tcp:1.1.1.16:6643) (self) next_index=4010 match_index=4010

Comment 10 Michele Baldessari 2021-04-22 09:01:51 UTC
Hi Ilya,

so we tested this on CentOS Stream with openvswitch2.13-2.13.0-101.el8.x86_64, which I double-checked (https://cbs.centos.org/koji/buildinfo?buildID=32502) has the two linked patches.

I think the case where you cut off controller-0 from controller-1 and controller-2 via a network failure and then run cluster/status on the cut-off controller-0 works (I haven't tested this personally, but I trust c#9 ;). 

The case that seemingly still does not work is displaying the cluster status on the other two nodes. There we still see no indication that one node has died.

Here is what we did. We have the following three nodes (running in VMs):
controller-0 172.16.2.241
controller-1 172.16.2.79
controller-2 172.16.2.57

ovn-dbs is clustered across those three nodes and we virsh destroy controller-0.
Then on controller-1 we see the following:
[root@controller-1 ovs-2.13.0]# podman exec -ti ovn_cluster_north_db_server sh
sh-4.4# ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
45fb
Name: OVN_Northbound
Cluster ID: 613c (613c0b6e-65af-4810-bb48-c9cbea43d442)
Server ID: 45fb (45fba88d-5980-49ab-b562-1ea6e0db266c)
Address: ssl:172.16.2.79:6643
Status: cluster member
Role: follower
Term: 2
Leader: c7dd
Vote: c7dd

Election timer: 1000
Log: [2, 118]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: ->c7dd (->9287) <-c7dd
Servers:
    45fb (45fb at ssl:172.16.2.79:6643) (self)
    c7dd (c7dd at ssl:172.16.2.57:6643)
    9287 (9287 at ssl:172.16.2.241:6643)

And above I have no indication that controller-0 (aka 172.16.2.241) is really gone when we query from the surviving, quorate partition.

Does the above make sense?

thanks,
Michele

Comment 11 Ilya Maximets 2021-04-22 09:41:52 UTC
(In reply to Michele Baldessari from comment #10)
> Election timer: 1000
> Log: [2, 118]
> Entries not yet committed: 0
> Entries not yet applied: 0
> Connections: ->c7dd (->9287) <-c7dd
> Servers:
>     45fb (45fb at ssl:172.16.2.79:6643) (self)
>     c7dd (c7dd at ssl:172.16.2.57:6643)
>     9287 (9287 at ssl:172.16.2.241:6643)
> 
> And above I have no indication that controller-0 (aka 172.16.2.241) is
> really gone when we query from the surviving, quorate partition.

The list of connections says that you have no established connection
with server 9287.  "(->9287)" means that there is an outgoing connection
that is in the "connecting" state.  And there is no incoming connection,
which would be shown as "<-9287".  You can see that this server has
established incoming and outgoing connections with c7dd, but not with
9287.

Also, on OVS 2.15 there will be extra information on when the last message
was received from each server, which will give an extra clue.

Comment 12 Michele Baldessari 2021-04-22 09:53:20 UTC
(In reply to Ilya Maximets from comment #11)
> (In reply to Michele Baldessari from comment #10)
> > Election timer: 1000
> > Log: [2, 118]
> > Entries not yet committed: 0
> > Entries not yet applied: 0
> > Connections: ->c7dd (->9287) <-c7dd
> > Servers:
> >     45fb (45fb at ssl:172.16.2.79:6643) (self)
> >     c7dd (c7dd at ssl:172.16.2.57:6643)
> >     9287 (9287 at ssl:172.16.2.241:6643)
> > 
> > And above I have no indication that controller-0 (aka 172.16.2.241) is
> > really gone when we query from the surviving, quorate partition.
> 
> The list of connections says that you have no established connection
> with server 9287.  "(->9287)" means that there is outgoing connection
> that is in "connecting" state.  And there is no incoming connection
> that would be presented as "<-9287".  You can see that this server has
> established incoming and outgoing  connections with c7dd, but not with
> 9287.
> 
> Also, on OVS 2.15 there will be extra information on when the last message
> was received from each server, this will give an extra clue.

Thanks. Can we not make this output a little more user-friendly though?
(->1234) is hardly a clue that the raft cluster agrees a node is gone.

A nicely spelled out "Not connected to the cluster" would give us a hint,
and displaying "disconnected" only in the unquorate partition does not cover
the case when a node died. "Last message received" is certainly an improvement,
but a clear "Not part of the cluster right now", like any other cluster manager
provides, would be helpful.

cheers,
Michele

Comment 13 Ilya Maximets 2021-04-22 13:28:51 UTC
(In reply to Michele Baldessari from comment #12)
> (In reply to Ilya Maximets from comment #11)
> > (In reply to Michele Baldessari from comment #10)
> > > Election timer: 1000
> > > Log: [2, 118]
> > > Entries not yet committed: 0
> > > Entries not yet applied: 0
> > > Connections: ->c7dd (->9287) <-c7dd
> > > Servers:
> > >     45fb (45fb at ssl:172.16.2.79:6643) (self)
> > >     c7dd (c7dd at ssl:172.16.2.57:6643)
> > >     9287 (9287 at ssl:172.16.2.241:6643)
> > > 
> > > And above I have no indication that controller-0 (aka 172.16.2.241) is
> > > really gone when we query from the surviving, quorate partition.
> > 
> > The list of connections says that you have no established connection
> > with server 9287.  "(->9287)" means that there is outgoing connection
> > that is in "connecting" state.  And there is no incoming connection
> > that would be presented as "<-9287".  You can see that this server has
> > established incoming and outgoing  connections with c7dd, but not with
> > 9287.
> > 
> > Also, on OVS 2.15 there will be extra information on when the last message
> > was received from each server, this will give an extra clue.
> 
> Thanks. Can we not make this output a little more user-friendly though?
> (->1234) is hardly a clue that the raft cluster agrees a node is gone.
> 
> A nicely spelled out "Not connected to the cluster" would give us a hint
> and displaying disconnected only in the unquorate partion does not cover
> the case when a node died. Last message received is certainly an improvement
> but a clear "Not part of the cluster right now" like any other cluster
> manager
> does would be helpful.

The problem here is that servers do not agree on who is part of the cluster and
who is not.  The only thing we can say is whether this particular server has a
connection to another particular server.  And this is reflected in the list of
connections.  We can also have a connection in only one direction, and this will
work for some time until the inactivity probe cuts it off.

Technically, the only requirement for the raft cluster to function correctly
is that half of the followers have a connection with the leader.  But it's not
required for followers to have established connections between each other, as
they do not send anything to each other.

So, from the point of view of one follower, it's not possible to say whether
another follower is part of the cluster or not.

We could add a clearer indication to cluster/status that this server has no
established outgoing connection to the other one.
If you think that it's needed, we can file an RFE and schedule this work.
As a solution for now,

   ovs-appctl cluster/status OVN_Northbound | grep -q -E "\(->[a-z0-9]*\)"

if executed on the leader, will say whether one of the followers is not connected
right now.
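
Expanding on that one-liner (a sketch only, assuming the NB control socket
path used elsewhere in this report):

   # Run on the leader: warn if any outgoing RAFT connection is still in the
   # "connecting" state, i.e. shown in parentheses on the Connections line.
   out=$(ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound \
         | grep '^Connections:')
   if echo "$out" | grep -qE '\(->[a-f0-9]+\)'; then
       echo "WARNING: outgoing RAFT connection(s) not established: $out"
   else
       echo "all outgoing RAFT connections established"
   fi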

Note that even if one of the followers is not connected right now, it doesn't
mean this isn't a transient network issue that could go away without raft even
noticing.  The leader doesn't care whether all followers are alive as long as
there is a quorum.  Basically, if the question is "is this server logically a
member of the raft cluster?", each server can only answer for itself with 100%
certainty.  The leader can say that it hasn't seen one of the followers for a
while, but that's not a reason to kick the server out of the cluster.

Comment 14 Carlos Goncalves 2021-04-22 14:07:52 UTC
I agree with Michele here -- the output is not user-friendly at all.

Are we expecting users to possess advanced internal knowledge of OVS/OVN to the point that they can intuitively understand that a combination of server IDs (not even their IPs), parentheses and arrows has a special meaning, what that meaning is, and its implications for the functional state of the cluster? I would argue not. The fact that Michele and I, two engineers with HA and network backgrounds, could not even parse such status output may indicate that other users would not be able to either. We would be happy to continue working with the OVN folks to further improve the output for a better UX by testing and providing feedback.

Thanks for the bug fixing thus far, much appreciated!

Comment 17 Jianlin Shi 2021-04-26 02:18:40 UTC
Verified on ovn2.13-20.12.0-104.el8fdp.x86_64:

[root@wsfd-advnetlab21 bz1929690]# rpm -qa | grep ovn2.13
ovn2.13-20.12.0-104.el8fdp.x86_64
ovn2.13-central-20.12.0-104.el8fdp.x86_64
ovn2.13-host-20.12.0-104.el8fdp.x86_64

after setting up with the setup script in comment 9:

server:
[root@wsfd-advnetlab21 bz1929690]# ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound                                                                                                   
82b9
Name: OVN_Northbound
Cluster ID: 9353 (935355fa-9d7c-4c49-8b1a-4bc0f371bee5)
Server ID: 82b9 (82b9bef6-8b20-4b40-be5b-b888e3dc1790)
Address: tcp:1.1.1.21:6643
Status: cluster member
Role: leader
Term: 1
Leader: self
Vote: self

Last Election started 49330 ms ago, reason: timeout
Last Election won: 49330 ms ago
Election timer: 1000
Log: [2, 7]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: <-7516 ->7516 <-5625 ->5625
Disconnections: 0
Servers:
    82b9 (82b9 at tcp:1.1.1.21:6643) (self) next_index=2 match_index=6
    7516 (7516 at tcp:1.1.1.18:6643) next_index=7 match_index=6 last msg 45 ms ago
    5625 (5625 at tcp:1.1.1.17:6643) next_index=7 match_index=6 last msg 45 ms ago

client2:

[root@wsfd-advnetlab18 bz1929690]# ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound                                                                                                   
7516
Name: OVN_Northbound
Cluster ID: 9353 (935355fa-9d7c-4c49-8b1a-4bc0f371bee5)
Server ID: 7516 (751659ba-4cac-4520-a61b-04e951312b18)
Address: tcp:1.1.1.18:6643
Status: cluster member
Role: follower
Term: 1
Leader: 82b9
Vote: unknown

Election timer: 1000
Log: [2, 7]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: ->0000 <-82b9 <-5625 ->5625
Disconnections: 0
Servers:
    82b9 (82b9 at tcp:1.1.1.21:6643) last msg 101 ms ago
    7516 (7516 at tcp:1.1.1.18:6643) (self)
    5625 (5625 at tcp:1.1.1.17:6643)

set interface on client2 down:

[root@wsfd-advnetlab18 bz1929690]# ip link set ens1f0 down

after several seconds:
server:
[root@wsfd-advnetlab21 bz1929690]# ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound                                                                                                   
82b9
Name: OVN_Northbound
Cluster ID: 9353 (935355fa-9d7c-4c49-8b1a-4bc0f371bee5)
Server ID: 82b9 (82b9bef6-8b20-4b40-be5b-b888e3dc1790)
Address: tcp:1.1.1.21:6643
Status: cluster member
Role: leader
Term: 1
Leader: self
Vote: self

Last Election started 234737 ms ago, reason: timeout
Last Election won: 234737 ms ago
Election timer: 1000
Log: [2, 7]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: (->7516) <-5625 ->5625

<=== connection to 7516 is not established

Disconnections: 1
Servers:
    82b9 (82b9 at tcp:1.1.1.21:6643) (self) next_index=2 match_index=6
    7516 (7516 at tcp:1.1.1.18:6643) next_index=7 match_index=6 last msg 65341 ms ago
    5625 (5625 at tcp:1.1.1.17:6643) next_index=7 match_index=6 last msg 292 ms ago

client2:

[root@wsfd-advnetlab18 bz1929690]# ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
7516
Name: OVN_Northbound
Cluster ID: 9353 (935355fa-9d7c-4c49-8b1a-4bc0f371bee5)
Server ID: 7516 (751659ba-4cac-4520-a61b-04e951312b18)
Address: tcp:1.1.1.18:6643
Status: disconnected from the cluster (election timeout)

<=== disconnected

Role: candidate
Term: 92
Leader: unknown
Vote: self

Last Election started 1162 ms ago, reason: timeout
Election timer: 1000
Log: [2, 7]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: (->0000) (->5625)
Disconnections: 2
Servers:
    82b9 (82b9 at tcp:1.1.1.21:6643) last msg 135209 ms ago
    7516 (7516 at tcp:1.1.1.18:6643) (self) (voted for 7516)
    5625 (5625 at tcp:1.1.1.17:6643)

Comment 18 Jianlin Shi 2021-04-26 02:30:25 UTC
also verified on ovn-2021-21.03.0-21.el8fdp.x86_64:

[root@wsfd-advnetlab21 bz1929690]# rpm -qa | grep ovn
ovn-2021-central-21.03.0-21.el8fdp.x86_64
ovn-2021-21.03.0-21.el8fdp.x86_64
ovn-2021-host-21.03.0-21.el8fdp.x86_64

server:

[root@wsfd-advnetlab21 bz1929690]# ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
e915
Name: OVN_Northbound
Cluster ID: 63b2 (63b268cb-495f-40bd-9133-8105bca56a26)
Server ID: e915 (e915b002-865e-469e-b8f1-fec881c09526)
Address: tcp:1.1.1.21:6643
Status: cluster member
Role: leader
Term: 1
Leader: self
Vote: self

Last Election started 271136 ms ago, reason: timeout
Last Election won: 271136 ms ago
Election timer: 1000
Log: [2, 7]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: (->ad9e) <-9c0f ->9c0f

<=== connection to ad9e is not established

Disconnections: 1
Servers:
    ad9e (ad9e at tcp:1.1.1.18:6643) next_index=7 match_index=6 last msg 237056 ms ago
    e915 (e915 at tcp:1.1.1.21:6643) (self) next_index=2 match_index=6
    9c0f (9c0f at tcp:1.1.1.17:6643) next_index=7 match_index=6 last msg 222 ms ago

client2:

[root@wsfd-advnetlab18 bz1929690]# ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
ad9e
Name: OVN_Northbound
Cluster ID: 63b2 (63b268cb-495f-40bd-9133-8105bca56a26)
Server ID: ad9e (ad9eae10-f9e5-436d-83ff-edcda589523e)
Address: tcp:1.1.1.18:6643
Status: disconnected from the cluster (election timeout)

<== disconnected

Role: candidate
Term: 177
Leader: unknown
Vote: self

Last Election started 642 ms ago, reason: timeout
Election timer: 1000
Log: [2, 7]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: (->0000) (->9c0f)
Disconnections: 2
Servers:
    ad9e (ad9e at tcp:1.1.1.18:6643) (self) (voted for ad9e)
    e915 (e915 at tcp:1.1.1.21:6643) last msg 260801 ms ago
    9c0f (9c0f at tcp:1.1.1.17:6643)

Comment 19 Carlos Goncalves 2021-04-26 16:32:57 UTC
FYI, I opened https://bugzilla.redhat.com/show_bug.cgi?id=1953688 soliciting improvements to the output of the cluster status command.

Comment 21 errata-xmlrpc 2021-05-20 19:29:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (openvswitch2.13 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2083

