Bug 1788906
| Summary: | ovsdb-server running in standby mode reconnects to active because of no probe interval response | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux Fast Datapath | Reporter: | Numan Siddique <nusiddiq> |
| Component: | openvswitch2.12 | Assignee: | Numan Siddique <nusiddiq> |
| Status: | CLOSED ERRATA | QA Contact: | Jianlin Shi <jishi> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | RHEL 7.7 | CC: | ctrautma, jhsiao, jishi, kfida, ovs-qe, ralongi, tredaelli |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | openvswitch2.11-2.11.0-17.el7fdn | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1788800 | Environment: | |
| Last Closed: | 2020-03-10 09:36:07 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |

Description
Numan Siddique
2020-01-08 11:04:35 UTC
reproduced with following steps:

install pcs on two systems:

yum -y install pcs pacemaker fence-agents-all
setenforce 0
systemctl start openvswitch

then setup pcs with following script:

setenforce 0
systemctl start openvswitch
ip_c1=20.0.30.26
ip_c2=20.0.30.25
ip_v=20.0.30.100
(sleep 2;echo "hacluster"; sleep 2; echo "redhat" ) |pcs cluster auth $ip_c1 $ip_c2
sleep 5
pcs cluster setup --force --start --name my_cluster $ip_c1 $ip_c2
pcs cluster enable --all
pcs property set stonith-enabled=false
pcs property set no-quorum-policy=ignore
pcs cluster cib tmp-cib.xml
sleep 10
cp tmp-cib.xml tmp-cib.deltasrc
pcs resource delete ip-$ip_v
pcs resource delete ovndb_servers-master
sleep 5
pcs status
pcs -f tmp-cib.xml resource create ip-$ip_v ocf:heartbeat:IPaddr2 ip=$ip_v op monitor interval=30s
sleep 5
pcs -f tmp-cib.xml resource create ovndb_servers ocf:ovn:ovndb-servers manage_northd=yes master_ip=$ip_v nb_master_port=6641 sb_master_port=6642 master
sleep 5
pcs -f tmp-cib.xml resource meta ovndb_servers-master notify=true
pcs -f tmp-cib.xml constraint order start ip-$ip_v then promote ovndb_servers-master
pcs -f tmp-cib.xml constraint colocation add ip-$ip_v with master ovndb_servers-master
#pcs -f tmp-cib.xml constraint location ip-$ip_v prefers $ip_c2=1000
#pcs -f tmp-cib.xml constraint location ovndb_servers-master prefers $ip_c2=1000
#pcs -f tmp-cib.xml constraint location ip-$ip_v prefers $ip_c1=500
#pcs -f tmp-cib.xml constraint location ovndb_servers-master prefers $ip_c1=500
pcs cluster cib-push tmp-cib.xml diff-against=tmp-cib.deltasrc

then copy ovnnb_db.db attached to /etc/ovn

then restart resource with:

pcs resource restart ovndb_servers

reproduced on ovs2.12.0-10:

[root@hp-dl380pg8-12 ovs2.12.0-10]# rpm -ivh *
Preparing... ################################# [100%]
Updating / installing...
1:openvswitch2.12-2.12.0-10.el7fdp ################################# [100%]
[root@hp-dl380pg8-12 ovn2.12.0-26]# rpm -ivh *
Preparing... ################################# [100%]
Updating / installing...
1:ovn2.12-2.12.0-26.el7fdp ################################# [ 33%]
Unit ovn-northd.service could not be found.
2:ovn2.12-central-2.12.0-26.el7fdp ################################# [ 67%]
Unit ovn-controller.service could not be found.
3:ovn2.12-host-2.12.0-26.el7fdp ################################# [100%]

[root@hp-dl380pg8-12 bz1788800]# pcs status
Cluster name: my_cluster
WARNINGS:
Corosync and pacemaker node names do not match (IPs used in setup?)
Stack: corosync
Current DC: dell-per740-12.rhts.eng.pek2.redhat.com (version 1.1.20-5.el7-3c4c782f70) - partition with quorum
Last updated: Wed Feb 5 04:05:39 2020
Last change: Wed Feb 5 04:05:02 2020 by root via crm_resource on hp-dl380pg8-12.rhts.eng.pek2.redhat.com
2 nodes configured
3 resources configured
Online: [ dell-per740-12.rhts.eng.pek2.redhat.com hp-dl380pg8-12.rhts.eng.pek2.redhat.com ]
Full list of resources:
ip-20.0.30.100 (ocf::heartbeat:IPaddr2): Started hp-dl380pg8-12.rhts.eng.pek2.redhat.com
Master/Slave Set: ovndb_servers-master [ovndb_servers]
Masters: [ hp-dl380pg8-12.rhts.eng.pek2.redhat.com ]
Slaves: [ dell-per740-12.rhts.eng.pek2.redhat.com ]
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/disabled

top result on master (after about 5m):

Tasks: 334 total, 3 running, 331 sleeping, 0 stopped, 0 zombie
%Cpu(s): 3.6 us, 0.6 sy, 0.0 ni, 95.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 32736216 total, 25530636 free, 4704328 used, 2501252 buff/cache
KiB Swap: 16515068 total, 16515068 free, 0 used. 27531560 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
32726 root 20 0 3623956 3.4g 1780 R 100.0 10.9 2:20.24 ovsdb-server

log in ovsdb-server-sb.log on slave:

2020-02-05T09:07:39.691Z|00048|reconnect|ERR|tcp:20.0.30.100:6642: no response to inactivity probe after 5 seconds, disconnecting
2020-02-05T09:07:39.691Z|00049|reconnect|INFO|tcp:20.0.30.100:6642: connection dropped
2020-02-05T09:07:40.693Z|00050|reconnect|INFO|tcp:20.0.30.100:6642: connecting...
2020-02-05T09:07:40.694Z|00051|reconnect|INFO|tcp:20.0.30.100:6642: connected

Verified on ovs2.12.0-21:

[root@hp-dl380pg8-12 bz1788800]# pcs status
Cluster name: my_cluster
WARNINGS:
Corosync and pacemaker node names do not match (IPs used in setup?)
Stack: corosync
Current DC: dell-per740-12.rhts.eng.pek2.redhat.com (version 1.1.20-5.el7-3c4c782f70) - partition with quorum
Last updated: Wed Feb 5 04:31:59 2020
Last change: Wed Feb 5 04:12:43 2020 by root via crm_resource on dell-per740-12.rhts.eng.pek2.redhat.com
2 nodes configured
3 resources configured
Online: [ dell-per740-12.rhts.eng.pek2.redhat.com hp-dl380pg8-12.rhts.eng.pek2.redhat.com ]
Full list of resources:
ip-20.0.30.100 (ocf::heartbeat:IPaddr2): Started hp-dl380pg8-12.rhts.eng.pek2.redhat.com
Master/Slave Set: ovndb_servers-master [ovndb_servers]
Masters: [ hp-dl380pg8-12.rhts.eng.pek2.redhat.com ]
Slaves: [ dell-per740-12.rhts.eng.pek2.redhat.com ]
Failed Resource Actions:
* ovndb_servers_monitor_10000 on hp-dl380pg8-12.rhts.eng.pek2.redhat.com 'unknown error' (1): call=34, status=Timed Out, exitreason='', last-rc-change='Wed Feb 5 04:13:35 2020', queued=0ms, exec=0ms
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/disabled

[root@hp-dl380pg8-12 bz1788800]# rpm -qa | grep -E "openvswitch|ovn"
kernel-kernel-networking-openvswitch-ovn-common-1.0-7.noarch
ovn2.12-central-2.12.0-26.el7fdp.x86_64
ovn2.12-host-2.12.0-26.el7fdp.x86_64
openvswitch-selinux-extra-policy-1.0-14.el7fdp.noarch
kernel-kernel-networking-openvswitch-ovn-basic-1.0-18.noarch
openvswitch2.12-2.12.0-21.el7fdp.x86_64
ovn2.12-2.12.0-26.el7fdp.x86_64

top - 04:32:19 up 7:38, 2 users, load average: 0.01, 0.04, 0.14
Tasks: 333 total, 1 running, 332 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.1 us, 0.1 sy, 0.0 ni, 99.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 32736216 total, 25063884 free, 5166016 used, 2506316 buff/cache
KiB Swap: 16515068 total, 16515068 free, 0 used. 27067668 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
34444 root rt 0 192140 95920 70836 S 1.0 0.3 0:13.14 corosync
9 root 20 0 0 0 0 S 0.3 0.0 0:22.18 rcu_sched
43201 root 20 0 162292 2548 1580 R 0.3 0.0 0:00.03 top
1 root 20 0 194168 7320 4216 S 0.0 0.0 0:35.60 sys

no reconnect log in ovsdb-server-sb.log on slave

Created attachment 1657858 [details]
ovnnb_db.db file
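
The check used in this report comes down to whether the standby's ovsdb-server-sb.log contains "no response to inactivity probe" entries. A minimal sketch of that check as a shell helper follows; the function name count_probe_drops is hypothetical, and the log path in the usage comment is an assumption about where the standby writes its log, so adjust it for your installation.

```shell
#!/bin/sh
# Count inactivity-probe disconnects recorded in an ovsdb-server log.
# count_probe_drops is a hypothetical helper name, not part of OVS or OVN.
# Usage: count_probe_drops LOGFILE
count_probe_drops() {
    # grep -c prints the number of matching lines. It exits non-zero when
    # nothing matches, so "|| true" keeps the helper usable under "set -e".
    grep -c 'no response to inactivity probe' "$1" || true
}

# Example call (the path is an assumption, adjust as needed):
#   count_probe_drops /var/log/openvswitch/ovsdb-server-sb.log
```

A result of 0 on the standby after the resource restart matches the "no reconnect log" observation on the fixed build, while a count that keeps growing matches the behaviour seen on ovs2.12.0-10.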
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0745