Bug 1636714
Summary: | [OVN] after restarting ovn-dbs-bundle-docker (ovn-northd) on master node the service stuck in Stopped status [regression] | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Eran Kuris <ekuris> | ||||
Component: | openvswitch | Assignee: | Numan Siddique <nusiddiq> | ||||
Status: | CLOSED ERRATA | QA Contact: | Eran Kuris <ekuris> | ||||
Severity: | urgent | Docs Contact: | |||||
Priority: | urgent | ||||||
Version: | 14.0 (Rocky) | CC: | aconole, apevec, atelang, bcafarel, chrisw, dalvarez, lhh, lmartins, majopela, nusiddiq, nyechiel, rhos-maint, sclewis, tvignaud | ||||
Target Milestone: | beta | Keywords: | Triaged | ||||
Target Release: | 14.0 (Rocky) | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | openvswitch2.10-2.10.0-28.el7fdp | Doc Type: | If docs needed, set a value | ||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2019-01-11 11:53:41 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
With "docker restart", the ovsdb-servers are not stopped gracefully, so their pid files remain. When the service is started again, ovn-ctl returns the status as "not-running" because it calls "pidfile_is_running" for the old file name. This needs a fix either in ovn-ctl, to delete stale pid files before starting the services, or in the actual binary, to delete the pid file before creating a new one when the --pidfile option is specified. I will propose a fix upstream.

Submitted the patch to fix this: https://patchwork.ozlabs.org/patch/981066/

Looks like this bug is also present in ovs 2.9, but the fix will look different. Is that true? I see this in start_ovsdb__():

    local pid
    ...
    eval pid=\$DB_${DB}_PID
    ...
    if pidfile_is_running $pid; then
        ...
    fi

In that case, would a possible fix be:

    - test -e "$pidfile" && pid=`cat "$pidfile"` && pid_exists "$pid"
    + test -e "$pidfile" && ispid=`cat "$pidfile"` && pid_exists "$ispid"

?

(In reply to Aaron Conole from comment #5)
> Looks like this bug is also present in ovs 2.9, but the fix will look different. Is that true?

I think the same fix will work there. The issue is that the function 'pidfile_is_running' overrides the caller's local 'pid' variable. In the upstream fix, I renamed the variable "pid" to "db_pid_file" in start_ovsdb__(). The fix is already committed in the upstream 2.9 branch: https://github.com/openvswitch/ovs/commit/4c7a432154a7b379cd97d26a51caaa155f35b449

I will backport it to the downstream 2.9 package.
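A minimal bash sketch of the root cause described in the reply: a variable assigned inside a shell function without `local` is resolved through dynamic scoping, so `pidfile_is_running` can overwrite a `pid` variable that is local to its caller. The function bodies here are simplified stand-ins written for illustration, not the real ovs-lib/ovn-ctl code; `start_db_buggy` and `start_db_fixed` are hypothetical names, and the `kill -0` liveness check in `pid_exists` is an assumption.

```shell
#!/usr/bin/env bash
# Illustrative stand-ins only; NOT the actual ovs-lib / ovn-ctl functions.

pid_exists () {
    # Assumption: treat a PID as alive if "kill -0" succeeds on it.
    test -n "$1" && kill -0 "$1" 2>/dev/null
}

pidfile_is_running () {
    pidfile=$1
    # BUG: 'pid' is not declared local here, so this assignment writes to
    # the nearest enclosing 'pid' variable, i.e. the caller's local one.
    test -e "$pidfile" && pid=$(cat "$pidfile") && pid_exists "$pid"
}

start_db_buggy () {
    local pid=$1                  # holds the pid *file name*
    pidfile_is_running "$pid"
    echo "$pid"                   # now the PID read from the file, not the name
}

start_db_fixed () {
    local db_pid_file=$1          # different name, as in the upstream rename
    pidfile_is_running "$db_pid_file"
    echo "$db_pid_file"           # still the file name
}
```

With a stale pid file whose process has already exited, `start_db_buggy` prints the dead PID instead of the file name, while `start_db_fixed` still prints the file name. Declaring `local pid` inside `pidfile_is_running` itself would be another way to stop the clobbering.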
The issue is fixed in OpenStack/14.0-RHEL-7/2018-11-22.2/:

    [root@controller-0 ~]# rpm -qa | grep openvsw
    rhosp-openvswitch-2.10-0.1.el7ost.noarch
    openvswitch2.10-2.10.0-28.el7fdp.x86_64
    openvswitch-selinux-extra-policy-1.0-5.el7fdp.noarch

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:0045
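One way to check whether a host already carries the fixed build (Fixed In Version: openvswitch2.10-2.10.0-28.el7fdp) is to compare the installed version-release against it. The sketch below is a hypothetical helper, `version_at_least`, built on GNU `sort -V`; that ordering only approximates rpm's own version comparison, so treat it as a convenience check, not an authoritative one.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if version $1 >= version $2.
# Assumption: GNU "sort -V" ordering is close enough to rpm's comparison
# for these package strings; it is not a full rpmvercmp implementation.
version_at_least () {
    # The smaller of the two versions sorts first; if that is $2, then $1 >= $2.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Example (assumes an RPM-based host with openvswitch2.10 installed):
#   installed=$(rpm -q --qf '%{VERSION}-%{RELEASE}\n' openvswitch2.10)
#   version_at_least "$installed" 2.10.0-28.el7fdp && echo "fix present"
```

With this ordering, the build from the original report (2.10.0-0.20180810git58a7ce6.el7fdp) sorts below 2.10.0-28.el7fdp, while the verified build compares as equal.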
Created attachment 1491282: logs

Description of problem:
When I restart the ovn-dbs-bundle-docker (ovn-northd) container on the master node (controller-0), Pacemaker promotes another controller (for example, controller-1) to master. In that case, controller-0 should return to Slave status. Instead, ovn-dbs-bundle-0 ends up in Stopped status.

Version-Release number of selected component (if applicable):
OpenStack/14.0-RHEL-7/2018-10-02.2

    puppet-ovn-13.3.1-0.20180907024738.b9a1e0b.el7ost.noarch
    rhosp-openvswitch-ovn-common-2.10-0.1.el7ost.noarch
    openvswitch2.10-ovn-central-2.10.0-0.20180810git58a7ce6.el7fdp.x86_64
    rhosp-openvswitch-ovn-central-2.10-0.1.el7ost.noarch
    openvswitch2.10-ovn-common-2.10.0-0.20180810git58a7ce6.el7fdp.x86_64

    (undercloud) [stack@undercloud-0 ~]$ rpm -qa | grep tripl
    python-tripleoclient-heat-installer-10.5.1-0.20180906012842.el7ost.noarch
    ansible-tripleo-ipsec-9.0.1-0.20180827143021.d2b9234.el7ost.noarch
    ansible-role-tripleo-modify-image-1.0.1-0.20180915144057.cb535e9.el7ost.noarch
    openstack-tripleo-heat-templates-9.0.0-0.20180919080941.0rc1.0rc1.el7ost.noarch
    openstack-tripleo-puppet-elements-9.0.0-0.20180906013709.daf9069.el7ost.noarch
    openstack-tripleo-validations-9.3.1-0.20180831205306.el7ost.noarch
    openstack-tripleo-common-containers-9.3.1-0.20180923215325.d22cb3e.el7ost.noarch
    openstack-tripleo-image-elements-9.0.0-0.20180831210308.2dc678a.el7ost.noarch
    openstack-tripleo-common-9.3.1-0.20180923215325.d22cb3e.el7ost.noarch
    python-tripleoclient-10.5.1-0.20180906012842.el7ost.noarch
    python2-tripleo-common-9.3.1-0.20180923215325.d22cb3e.el7ost.noarch
    puppet-tripleo-9.3.1-0.20180831202649.8ec6c86.el7ost.noarch

    [root@controller-0 ~]# docker ps | grep ovn
    4f716dacbce1  192.168.24.1:8787/rhosp14/openstack-ovn-northd:pcmklatest  "/bin/bash /usr/lo..."  27 minutes ago  Up 27 minutes  ovn-dbs-bundle-docker-0
    d2d80f7c2730  192.168.24.1:8787/rhosp14/openstack-ovn-controller:2018-10-01.1  "kolla_start"  3 days ago  Up 3 days  ovn_controller
    2f6351ac2b48  192.168.24.1:8787/rhosp14/openstack-neutron-server-ovn:2018-10-01.1  "kolla_start"  3 days ago  Up 3 days (healthy)  neutron_api
    6ffc3f666301  192.168.24.1:8787/rhosp14/openstack-nova-novncproxy:2018-10-01.1

How reproducible:
100%

Steps to Reproduce:
1. pcs status
2. docker ps | grep ovn
3. docker restart ovn-dbs-bundle-docker-0
4. pcs status
5. clear
6. pcs status
7. vi /var/log/containers/openvswitch/ovn-controller.log
8. vi /var/log/containers/openvswitch/ovn-northd.log.1
9. vi /var/log/containers/openvswitch/ovsdb-server-nb.log
10. vi /var/log/containers/openvswitch/ovsdb-server-sb.log
11. history
12. vi /var/log/containers/neutron/server.log
13. pcs status

Actual results:
ovn-dbs-bundle-0 is in Stopped status.

Expected results:
ovn-dbs-bundle-0 is in Slave status.

Additional info:
Logs & sos-report attached