Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.
The FDP team is no longer accepting new bugs in Bugzilla. Please report your issues under FDP project in Jira. Thanks.

Bug 1788906

Summary: ovsdb-server running in standby mode reconnects to active because of no probe interval response
Product: Red Hat Enterprise Linux Fast Datapath
Component: openvswitch2.12
Version: RHEL 7.7
Status: CLOSED ERRATA
Reporter: Numan Siddique <nusiddiq>
Assignee: Numan Siddique <nusiddiq>
QA Contact: Jianlin Shi <jishi>
CC: ctrautma, jhsiao, jishi, kfida, ovs-qe, ralongi, tredaelli
Severity: unspecified
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Fixed In Version: openvswitch2.11-2.11.0-17.el7fdn
Clone Of: 1788800
Last Closed: 2020-03-10 09:36:07 UTC
Attachments: ovnnb_db.db file (flags: none)

Description Numan Siddique 2020-01-08 11:04:35 UTC
+++ This bug was initially created as a clone of Bug #1788800 +++

Description of problem:

If the active ovsdb-server does not respond within 5 seconds to the echo (inactivity probe) request sent by a standby ovsdb-server in an active/passive deployment, the standby ovsdb-server disconnects. If the active ovsdb-server is heavily loaded, this can result in a continuous connect/disconnect loop.
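A rough way to check whether the active server is slow enough to hit the 5-second window is to time an ordinary request against it. This is a sketch under stated assumptions: it uses the duration of one JSON-RPC request (for example `ovsdb-client list-dbs`) as a proxy for responsiveness, which is not the echo/inactivity probe itself, and the helper names `elapsed_ms` and `exceeds_probe_window` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers: approximate the active server's responsiveness by
# timing an ordinary request. This is NOT the inactivity probe itself.

PROBE_MS=5000  # the 5-second probe window described in this bug

# Print how long a command took, in milliseconds (requires GNU date's %N).
elapsed_ms() {
    start=$(date +%s%N)
    "$@" >/dev/null 2>&1
    end=$(date +%s%N)
    echo $(( (end - start) / 1000000 ))
}

# Exit 0 when a latency in ms is longer than the window (default PROBE_MS).
exceeds_probe_window() {
    [ "$1" -gt "${2:-$PROBE_MS}" ]
}
```

For example, from the standby: `ms=$(elapsed_ms timeout 30 ovsdb-client list-dbs tcp:20.0.30.100:6642); exceeds_probe_window "$ms" && echo "active took ${ms} ms"`. A single slow reply does not by itself prove a disconnect will happen; it only suggests the server is busy for longer than the probe window.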


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 2 Jianlin Shi 2020-02-05 09:33:06 UTC
Reproduced with the following steps:

Install pcs on both systems:
yum -y install pcs pacemaker fence-agents-all

setenforce 0
systemctl start openvswitch

Then set up pcs with the following script:

setenforce 0
systemctl start openvswitch
ip_c1=20.0.30.26
ip_c2=20.0.30.25
ip_v=20.0.30.100
(sleep 2;echo "hacluster"; sleep 2; echo "redhat" ) |pcs cluster auth  $ip_c1 $ip_c2
sleep 5
pcs cluster setup --force --start --name my_cluster $ip_c1 $ip_c2
pcs cluster enable --all

pcs property set stonith-enabled=false
pcs property set no-quorum-policy=ignore
pcs cluster cib tmp-cib.xml
sleep 10
cp tmp-cib.xml tmp-cib.deltasrc
pcs resource delete ip-$ip_v
pcs resource delete ovndb_servers-master
sleep 5
pcs status

pcs -f tmp-cib.xml resource create ip-$ip_v ocf:heartbeat:IPaddr2 ip=$ip_v op monitor interval=30s
sleep 5
pcs -f tmp-cib.xml resource create ovndb_servers  ocf:ovn:ovndb-servers manage_northd=yes master_ip=$ip_v nb_master_port=6641 sb_master_port=6642 master
sleep 5
pcs -f tmp-cib.xml resource meta ovndb_servers-master notify=true
pcs -f tmp-cib.xml constraint order start ip-$ip_v then promote ovndb_servers-master
pcs -f tmp-cib.xml constraint colocation add ip-$ip_v with master ovndb_servers-master

#pcs -f tmp-cib.xml constraint location ip-$ip_v prefers $ip_c2=1000
#pcs -f tmp-cib.xml constraint location ovndb_servers-master prefers $ip_c2=1000
#pcs -f tmp-cib.xml constraint location ip-$ip_v prefers $ip_c1=500
#pcs -f tmp-cib.xml constraint location ovndb_servers-master prefers $ip_c1=500

pcs cluster cib-push tmp-cib.xml diff-against=tmp-cib.deltasrc

Then copy the attached ovnnb_db.db to /etc/ovn and restart the resource with:
pcs resource restart ovndb_servers
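After the cib-push or the restart, it can help to wait until pacemaker has promoted a master before looking at the logs. The sketch below is an assumption-laden helper, not part of the original steps: it polls `pcs status` and matches the pacemaker 1.1 `Masters: [ ... ]` line seen in the output below (newer pacemaker releases may word this differently), and the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers around `pcs status` output (pacemaker 1.1 wording).

# Exit 0 when the status text on stdin lists at least one master node.
has_master() {
    grep -q '^ *Masters: \[ [^]]'
}

# Print the master node name(s) from the status text on stdin.
master_node() {
    sed -n 's/^ *Masters: \[ \(.*\) ] *$/\1/p'
}

# Poll `pcs status` every 5 s until a master appears or timeout_s expires.
wait_for_master() {
    timeout_s=${1:-120}
    while [ "$timeout_s" -gt 0 ]; do
        if pcs status 2>/dev/null | has_master; then
            return 0
        fi
        sleep 5
        timeout_s=$((timeout_s - 5))
    done
    return 1
}
```

Usage: `wait_for_master 120 && pcs status | master_node` prints the node currently running the active ovsdb-servers.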

Reproduced on ovs2.12.0-10 (with ovn2.12-2.12.0-26):

[root@hp-dl380pg8-12 ovs2.12.0-10]# rpm -ivh *
Preparing...                          ################################# [100%]
Updating / installing...
   1:openvswitch2.12-2.12.0-10.el7fdp ################################# [100%]
[root@hp-dl380pg8-12 ovn2.12.0-26]# rpm -ivh *
Preparing...                          ################################# [100%]
Updating / installing...
   1:ovn2.12-2.12.0-26.el7fdp         ################################# [ 33%]
Unit ovn-northd.service could not be found.
   2:ovn2.12-central-2.12.0-26.el7fdp ################################# [ 67%]
Unit ovn-controller.service could not be found.
   3:ovn2.12-host-2.12.0-26.el7fdp    ################################# [100%]

[root@hp-dl380pg8-12 bz1788800]# pcs status
Cluster name: my_cluster

WARNINGS:
Corosync and pacemaker node names do not match (IPs used in setup?)

Stack: corosync
Current DC: dell-per740-12.rhts.eng.pek2.redhat.com (version 1.1.20-5.el7-3c4c782f70) - partition with quorum
Last updated: Wed Feb  5 04:05:39 2020
Last change: Wed Feb  5 04:05:02 2020 by root via crm_resource on hp-dl380pg8-12.rhts.eng.pek2.redhat.com

2 nodes configured
3 resources configured

Online: [ dell-per740-12.rhts.eng.pek2.redhat.com hp-dl380pg8-12.rhts.eng.pek2.redhat.com ]

Full list of resources:

 ip-20.0.30.100 (ocf::heartbeat:IPaddr2):       Started hp-dl380pg8-12.rhts.eng.pek2.redhat.com
 Master/Slave Set: ovndb_servers-master [ovndb_servers]
     Masters: [ hp-dl380pg8-12.rhts.eng.pek2.redhat.com ]
     Slaves: [ dell-per740-12.rhts.eng.pek2.redhat.com ]

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/disabled

top output on the master (after about 5 minutes):

Tasks: 334 total,   3 running, 331 sleeping,   0 stopped,   0 zombie
%Cpu(s):  3.6 us,  0.6 sy,  0.0 ni, 95.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 32736216 total, 25530636 free,  4704328 used,  2501252 buff/cache
KiB Swap: 16515068 total, 16515068 free,        0 used. 27531560 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
32726 root      20   0 3623956   3.4g   1780 R 100.0 10.9   2:20.24 ovsdb-server
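To spot the same symptom without reading a full top screen, one can filter `ps` output by CPU usage. This is a hypothetical helper, not something used in the original report; note that `ps` reports `pcpu` averaged over the process lifetime, whereas top shows usage over its last refresh interval, so the two numbers can differ.

```shell
#!/bin/sh
# Hypothetical filter: keep "pid pcpu comm" lines (stdin) whose CPU % is at
# or above a threshold (default 90). pcpu from ps is a lifetime average.
busy_procs() {
    awk -v t="${1:-90}" '$2 + 0 >= t + 0'
}
```

Usage on the master: `ps -eo pid=,pcpu=,comm= | busy_procs 90`.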

Log from ovsdb-server-sb.log on the slave:

2020-02-05T09:07:39.691Z|00048|reconnect|ERR|tcp:20.0.30.100:6642: no response to inactivity probe after 5 seconds, disconnecting
2020-02-05T09:07:39.691Z|00049|reconnect|INFO|tcp:20.0.30.100:6642: connection dropped
2020-02-05T09:07:40.693Z|00050|reconnect|INFO|tcp:20.0.30.100:6642: connecting...
2020-02-05T09:07:40.694Z|00051|reconnect|INFO|tcp:20.0.30.100:6642: connected
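Whether the standby is stuck in the connect/disconnect loop can be judged by counting the probe-timeout messages shown above. A minimal sketch, assuming the log lives at /var/log/openvswitch/ovsdb-server-sb.log (the default path is an assumption; pass the real file as the first argument), with hypothetical function names:

```shell
#!/bin/sh
# Hypothetical helpers for the standby's ovsdb-server log.
# The default log path is an assumption; pass the real path as $1.
LOG_DEFAULT=/var/log/openvswitch/ovsdb-server-sb.log

# Print how many inactivity-probe disconnects the log contains.
count_probe_timeouts() {
    # grep -c prints 0 but exits 1 when nothing matches; keep the 0.
    grep -c 'no response to inactivity probe' "${1:-$LOG_DEFAULT}" || true
}

# Print the timestamp (first '|' field) of each probe-timeout disconnect.
probe_timeout_times() {
    grep 'no response to inactivity probe' "${1:-$LOG_DEFAULT}" | cut -d'|' -f1
}
```

A steadily growing count, with timestamps a few seconds apart, indicates the loop described in this bug; on a fixed build the count should stay at 0.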

Verified on ovs2.12.0-21:

[root@hp-dl380pg8-12 bz1788800]# pcs status
Cluster name: my_cluster

WARNINGS:
Corosync and pacemaker node names do not match (IPs used in setup?)

Stack: corosync
Current DC: dell-per740-12.rhts.eng.pek2.redhat.com (version 1.1.20-5.el7-3c4c782f70) - partition with quorum
Last updated: Wed Feb  5 04:31:59 2020
Last change: Wed Feb  5 04:12:43 2020 by root via crm_resource on dell-per740-12.rhts.eng.pek2.redhat.com

2 nodes configured
3 resources configured

Online: [ dell-per740-12.rhts.eng.pek2.redhat.com hp-dl380pg8-12.rhts.eng.pek2.redhat.com ]

Full list of resources:

 ip-20.0.30.100 (ocf::heartbeat:IPaddr2):       Started hp-dl380pg8-12.rhts.eng.pek2.redhat.com
 Master/Slave Set: ovndb_servers-master [ovndb_servers]
     Masters: [ hp-dl380pg8-12.rhts.eng.pek2.redhat.com ]
     Slaves: [ dell-per740-12.rhts.eng.pek2.redhat.com ]

Failed Resource Actions:
* ovndb_servers_monitor_10000 on hp-dl380pg8-12.rhts.eng.pek2.redhat.com 'unknown error' (1): call=34, status=Timed Out, exitreason='',
    last-rc-change='Wed Feb  5 04:13:35 2020', queued=0ms, exec=0ms

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/disabled
[root@hp-dl380pg8-12 bz1788800]# rpm -qa | grep -E "openvswitch|ovn"
kernel-kernel-networking-openvswitch-ovn-common-1.0-7.noarch
ovn2.12-central-2.12.0-26.el7fdp.x86_64
ovn2.12-host-2.12.0-26.el7fdp.x86_64
openvswitch-selinux-extra-policy-1.0-14.el7fdp.noarch
kernel-kernel-networking-openvswitch-ovn-basic-1.0-18.noarch
openvswitch2.12-2.12.0-21.el7fdp.x86_64
ovn2.12-2.12.0-26.el7fdp.x86_64
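To check whether another host runs at least the openvswitch2.12 build verified here, the installed version-release can be compared against 2.12.0-21.el7fdp. This is a sketch: the helper name is hypothetical, 2.12.0-21.el7fdp is simply the build verified in this comment (not necessarily the first fixed build), and `sort -V` only approximates rpm's own version comparison.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 when version-release $1 sorts at or after $2
# under GNU `sort -V`. This approximates, but is not, rpm's comparison.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}
```

Usage: `version_at_least "$(rpm -q --qf '%{VERSION}-%{RELEASE}' openvswitch2.12)" 2.12.0-21.el7fdp && echo "at or above the verified build"`.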

top - 04:32:19 up  7:38,  2 users,  load average: 0.01, 0.04, 0.14
Tasks: 333 total,   1 running, 332 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.1 us,  0.1 sy,  0.0 ni, 99.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 32736216 total, 25063884 free,  5166016 used,  2506316 buff/cache
KiB Swap: 16515068 total, 16515068 free,        0 used. 27067668 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
34444 root      rt   0  192140  95920  70836 S   1.0  0.3   0:13.14 corosync
    9 root      20   0       0      0      0 S   0.3  0.0   0:22.18 rcu_sched
43201 root      20   0  162292   2548   1580 R   0.3  0.0   0:00.03 top
    1 root      20   0  194168   7320   4216 S   0.0  0.0   0:35.60 sys

No reconnect messages in ovsdb-server-sb.log on the slave.

Comment 3 Jianlin Shi 2020-02-05 09:35:55 UTC
Created attachment 1657858 [details]
ovnnb_db.db file

Comment 5 errata-xmlrpc 2020-03-10 09:36:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0745