Bug 1512568 - OVN services are not recovered on the OVN master node after killing the ovsdb-server services [nb/sb] or the ovn-northd service
Status: CLOSED ERRATA
Product: Red Hat OpenStack
Classification: Red Hat
Component: openvswitch
Version: 12.0 (Pike)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: z1
Target Release: 12.0 (Pike)
Assigned To: Numan Siddique
QA Contact: Eran Kuris
Keywords: TechPreview, Triaged, ZStream
Blocks: 1433534
Reported: 2017-11-13 09:33 EST by Eran Kuris
Modified: 2018-02-15 17:33 EST
CC List: 9 users

Fixed In Version: openvswitch-2.7.3-3.git20180112.el7fdp
Last Closed: 2018-01-30 15:25:08 EST
Type: Bug




External Trackers
Tracker | ID | Priority | Status | Summary | Last Updated
Launchpad | 1731934 | None | None | None | 2017-11-13 09:33 EST
Red Hat Product Errata | RHBA-2018:0248 | normal | SHIPPED_LIVE | Red Hat OpenStack Platform 12 Bug Fix and Enhancement Advisory | 2018-02-15 22:46:52 EST

Description Eran Kuris 2017-11-13 09:33:53 EST
Description of problem:
Automatic recovery does not work on the OVN master node after killing the OVN DB services or the ovn-northd service.
This happens when running kill -9 on ovsdb-server [ovnnb_db.pid/ovnsb_db.pid] or on ovn-northd.
The expected behavior is that one of the slave nodes takes over the OVN master role: Pacemaker should detect that the services are down and move the master role to a slave node.
After debugging with the developers, it looks like the recovery script on the master is not being called.
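The expected recovery can be modeled roughly as follows. This is a simplified sketch, not Pacemaker's actual scheduler: it assumes the OCF agent's monitor action reports the standard OCF code OCF_FAILED_MASTER (9) when the master's services are dead, and that Pacemaker then promotes a healthy slave. The `choose_new_master` helper and the node names are hypothetical.

```python
from typing import Dict, Optional

# Standard OCF return codes relevant to master/slave resources.
OCF_SUCCESS = 0          # running as slave
OCF_NOT_RUNNING = 7
OCF_RUNNING_MASTER = 8
OCF_FAILED_MASTER = 9


def choose_new_master(monitor_results: Dict[str, int]) -> Optional[str]:
    """Return the node that should hold the master role.

    monitor_results maps node name -> last monitor return code.
    Hypothetical simplification: keep a healthy master if there is one,
    otherwise promote the first healthy slave (sorted by name).
    """
    for node, rc in monitor_results.items():
        if rc == OCF_RUNNING_MASTER:
            return node  # master is healthy, no failover needed
    slaves = sorted(n for n, rc in monitor_results.items() if rc == OCF_SUCCESS)
    return slaves[0] if slaves else None


# After kill -9 on the master's ovsdb-server, its monitor should report failure.
results = {
    "controller-0": OCF_FAILED_MASTER,
    "controller-1": OCF_SUCCESS,
    "controller-2": OCF_SUCCESS,
}
print(choose_new_master(results))  # controller-1
```

The failure described in this report breaks the input to this decision: if the monitor never runs on the master, its last recorded result stays OCF_RUNNING_MASTER and no failover is triggered.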

Version-Release number of selected component (if applicable):
rpm -qa |grep ovn 
openstack-nova-novncproxy-16.0.3-0.20171028031400.60d6e87.el7ost.noarch
puppet-ovn-11.3.1-0.20170825135756.c03c3ed.el7ost.noarch
python-networking-ovn-3.0.1-0.20171005161553.0cde8a5.el7ost.noarch
novnc-0.6.1-1.el7ost.noarch
openvswitch-ovn-central-2.7.2-4.git20170719.el7fdp.x86_64
openvswitch-ovn-host-2.7.2-4.git20170719.el7fdp.x86_64
openvswitch-ovn-common-2.7.2-4.git20170719.el7fdp.x86_64
(overcloud) [root@controller-2 ~]# rpm -qa |grep pacemaker
pacemaker-cli-1.1.16-12.el7_4.4.x86_64
ansible-pacemaker-1.0.3-2.el7ost.noarch
pacemaker-1.1.16-12.el7_4.4.x86_64
puppet-pacemaker-0.6.1-0.20171024215340.9a46ecd.el7ost.noarch
pacemaker-libs-1.1.16-12.el7_4.4.x86_64
pacemaker-cluster-libs-1.1.16-12.el7_4.4.x86_

How reproducible:
100%

Steps to Reproduce:
1. Deploy an HA setup with OVN.
2. Run kill -9 on the ovn-northd service or on ovsdb-server on the master node.
3. Verify that one of the slave nodes changes its status to "Master".
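Step 3 can be checked by comparing `pcs status` output from before and after the kill. A minimal sketch, assuming the Pacemaker 1.1 crm_mon text layout with `Masters: [ ... ]` lines and a resource named `ovndb_servers-master`; both are assumptions, since the report does not include the status output, and the sample snippets are made up.

```python
import re
from typing import List


def masters(status_text: str, resource: str = "ovndb_servers-master") -> List[str]:
    """Return the nodes listed under 'Masters:' for the given master/slave set."""
    in_set = False
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Master/Slave Set:"):
            # Only the Masters: line that belongs to our resource counts.
            in_set = resource in stripped
        elif in_set:
            match = re.match(r"Masters:\s*\[(.*)\]", stripped)
            if match:
                return match.group(1).split()
    return []


def failover_happened(before: str, after: str) -> bool:
    """Step 3: some node holds the master role, and it is not the old master."""
    old, new = masters(before), masters(after)
    return bool(new) and set(new) != set(old)


# Hypothetical status snippets in the Pacemaker 1.1 crm_mon layout.
BEFORE = """\
 Master/Slave Set: ovndb_servers-master [ovndb_servers]
     Masters: [ controller-0 ]
     Slaves: [ controller-1 controller-2 ]
"""
AFTER = """\
 Master/Slave Set: ovndb_servers-master [ovndb_servers]
     Masters: [ controller-1 ]
     Slaves: [ controller-2 ]
     Stopped: [ controller-0 ]
"""
print(failover_happened(BEFORE, AFTER))  # True
```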

https://drive.google.com/a/redhat.com/file/d/1v_4oDMM1jQaQ7Ey40vUgrIaiFlGjF4lK/view?usp=sharing
Comment 1 Numan Siddique 2017-11-13 09:48:28 EST
One thing missing from the OVN pacemaker OCF script is that, on the master node, it doesn't check the health of ovn-northd. This needs to be fixed.
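A minimal sketch of the kind of check described above: on the master, the monitor should fail if either ovsdb-server (NB or SB) or ovn-northd is dead. This is a Python model, not the shell OCF agent shipped with OVS. The pid file paths are assumptions, and so is the premise that ovn-northd runs only on the master. Liveness is tested with signal 0, which checks that the process exists without delivering a signal.

```python
import os
from typing import Dict

OCF_SUCCESS = 0
OCF_NOT_RUNNING = 7
OCF_RUNNING_MASTER = 8
OCF_FAILED_MASTER = 9

# Hypothetical pid file locations; the real paths depend on the packaging.
PIDFILES = {
    "ovsdb-server-nb": "/var/run/openvswitch/ovnnb_db.pid",
    "ovsdb-server-sb": "/var/run/openvswitch/ovnsb_db.pid",
    "ovn-northd": "/var/run/openvswitch/ovn-northd.pid",
}


def pid_alive(pidfile: str) -> bool:
    """True if the pid file exists and names a live process."""
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        return False  # missing or unreadable pid file
    if pid <= 0:
        return False  # kill(0, ...) would target the process group
    try:
        os.kill(pid, 0)  # signal 0: existence check only
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def monitor(is_master: bool, pidfiles: Dict[str, str] = PIDFILES) -> int:
    """Return an OCF code for this node's OVN DB resource."""
    dbs_ok = pid_alive(pidfiles["ovsdb-server-nb"]) and pid_alive(pidfiles["ovsdb-server-sb"])
    if not is_master:
        return OCF_SUCCESS if dbs_ok else OCF_NOT_RUNNING
    # On the master, ovn-northd must be alive too (the check reported missing).
    northd_ok = pid_alive(pidfiles["ovn-northd"])
    return OCF_RUNNING_MASTER if (dbs_ok and northd_ok) else OCF_FAILED_MASTER
```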

But the main issue here is that, on the node where the OVN DB servers run as master, Pacemaker does not call the OVN OCF script periodically with the "monitor" action, whereas it does call the script on the slave nodes. When a slave node is promoted to master, we see the same behavior on it; and when the former master becomes a slave, the OCF script starts being called periodically on it.
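For a master/slave resource, the Pacemaker documentation states that a separate monitor operation with role=Master is needed to detect failures of the master instance, and that monitor operations for different roles must use different intervals, because Pacemaker 1.1 distinguishes recurring operations only by resource and interval. The thread does not say whether this is what the upstream patch changed. The sketch below is only a hypothetical lint over a list of monitor operations (the `MonitorOp` type and sample values are made up), showing what to look for when the master never gets monitored.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MonitorOp:
    interval_s: int
    role: Optional[str] = None  # "Master", "Slave", or None when no role is set


def check_monitor_ops(ops: List[MonitorOp]) -> List[str]:
    """Return the problems found in a master/slave resource's monitor operations."""
    problems: List[str] = []
    if not any(op.role == "Master" for op in ops):
        problems.append("no monitor op with role=Master: failures of the master go undetected")
    counts = Counter(op.interval_s for op in ops)
    for interval, n in sorted(counts.items()):
        if n > 1:
            # Pacemaker 1.1 identifies recurring ops by resource + interval only.
            problems.append(f"{n} monitor ops share interval {interval}s")
    return problems


print(check_monitor_ops([MonitorOp(10, "Master"), MonitorOp(10, "Slave")]))
# ['2 monitor ops share interval 10s']
```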

It's an OSP 12 setup in which all the other Pacemaker services run as bundles and only the OVN DB service runs as a bare-metal resource.

@Michele - Do you have any comments on this?
Comment 3 Numan Siddique 2017-11-13 12:40:50 EST
Hi Michele,
We have a setup available and can look into this with you whenever it suits you.

Thanks
Numan
Comment 7 Numan Siddique 2017-11-22 07:20:22 EST
Submitted the patch upstream to fix the issue - https://patchwork.ozlabs.org/patch/839022/
Comment 8 Numan Siddique 2017-12-04 00:43:12 EST
The latest patch - https://patchwork.ozlabs.org/patch/844113/
Comment 9 Numan Siddique 2017-12-06 02:46:20 EST
The patch to fix this issue has been merged into master, branch-2.8 and branch-2.7 - https://github.com/openvswitch/ovs/commit/e7b9b17cd096c569b1c4d408b423ecedb9497c41
Comment 14 Eran Kuris 2018-01-28 10:38:26 EST
Fixed and verified.
[stack@undercloud-0 ~]$ cat /etc/yum.repos.d/latest-installed 
12   -p 2018-01-26.2
[root@controller-0 ~]# rpm -qa |grep openvswitch-2.7.3-3
python-openvswitch-2.7.3-3.git20180112.el7fdp.noarch
openvswitch-2.7.3-3.git20180112.el7fdp.x86_64
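To confirm that a node has the fixed build, the installed openvswitch package can be compared against the "Fixed In Version" (openvswitch-2.7.3-3.git20180112.el7fdp). The sketch below is a deliberate simplification, not rpm's real rpmvercmp algorithm: it compares the dotted numeric version, then only the leading integer of the release, which is enough for the package names in this report. The helper names are hypothetical.

```python
from typing import Tuple

ARCHES = {"x86_64", "noarch", "i686", "aarch64", "ppc64le", "s390x"}
FIXED = "openvswitch-2.7.3-3.git20180112.el7fdp"


def split_nvr(pkg: str) -> Tuple[str, str, str]:
    """Split an `rpm -qa` line into (name, version, release), dropping the arch."""
    base, _, arch = pkg.rpartition(".")
    if arch not in ARCHES:
        base = pkg  # no arch suffix present
    name, version, release = base.rsplit("-", 2)
    return name, version, release


def _key(version: str, release: str) -> Tuple[Tuple[int, ...], int]:
    # Simplified ordering: numeric dotted version, then the leading release integer.
    ver = tuple(int(part) for part in version.split("."))
    rel = int(release.split(".", 1)[0])
    return ver, rel


def has_fix(installed: str, fixed: str = FIXED) -> bool:
    name, ver, rel = split_nvr(installed)
    fname, fver, frel = split_nvr(fixed)
    if name != fname:
        raise ValueError(f"package mismatch: {name} vs {fname}")
    return _key(ver, rel) >= _key(fver, frel)


print(has_fix("openvswitch-2.7.3-3.git20180112.el7fdp.x86_64"))  # True
```

For comparison, a build matching the OVN package versions listed in the original description (2.7.2-4.git20170719.el7fdp) would return False.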
Comment 17 errata-xmlrpc 2018-01-30 15:25:08 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0248
