Bug 1684419
| Summary: | [ovn_cluster][RHEL 7] master node can't be up after restart openvswitch | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux Fast Datapath | Reporter: | haidong li <haili> |
| Component: | OVN | Assignee: | Numan Siddique <nusiddiq> |
| Status: | CLOSED ERRATA | QA Contact: | Jianlin Shi <jishi> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | FDP 19.G | CC: | ctrautma, dceara, ekuris, fhallal, jhsiao, jiji, jishi, kfida, mmichels, nusiddiq, qding, ralongi |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | ovn2.12-2.12.0-2 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1787971 | Environment: | |
| Last Closed: | 2019-12-11 12:04:46 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1787971 | | |
Description
haidong li
2019-03-01 08:49:45 UTC
*** Bug 1723291 has been marked as a duplicate of this bug. ***

The main issue is that when openvswitch is restarted, the OVS runtime directory, /var/run/openvswitch, is deleted and recreated by the openvswitch systemd script. Because the OVN ovsdb-servers (and ovn-controller) use the same runtime directory, all of the OVN ovsdb-servers' runtime socket files are deleted as well. After that, the OVN OCF script can no longer stop the ovsdb-servers or monitor their status.

If you run "ps -aef | grep ovsdb-server", you will see that the old ovsdb-servers are still running. Killing those processes manually and then refreshing the pacemaker resource recovers the cluster.

I think this is an expected and known issue. The proper fix is to use a separate runtime directory for OVN.

*** Bug 1566412 has been marked as a duplicate of this bug. ***

This issue is blocked by bug: https://bugzilla.redhat.com/show_bug.cgi?id=1769202

Verified on ovn2.12.0-7:
[root@dell-per740-12 ovn_ha]# pcs status
Cluster name: my_cluster
WARNINGS:
Corosync and pacemaker node names do not match (IPs used in setup?)
Stack: corosync
Current DC: dell-per740-12.rhts.eng.pek2.redhat.com (version 1.1.20-5.el7-3c4c782f70) - partition with quorum
Last updated: Sat Nov 9 01:20:42 2019
Last change: Sat Nov 9 01:16:11 2019 by root via crm_attribute on dell-per740-12.rhts.eng.pek2.redhat.com
3 nodes configured
4 resources configured
Online: [ dell-per740-12.rhts.eng.pek2.redhat.com hp-dl380pg8-11.rhts.eng.pek2.redhat.com ibm-x3650m5-03.rhts.eng.pek2.redhat.com ]
Full list of resources:
ip-70.11.0.50 (ocf::heartbeat:IPaddr2): Started dell-per740-12.rhts.eng.pek2.redhat.com
Master/Slave Set: ovndb_servers-master [ovndb_servers]
Masters: [ dell-per740-12.rhts.eng.pek2.redhat.com ]
Slaves: [ hp-dl380pg8-11.rhts.eng.pek2.redhat.com ibm-x3650m5-03.rhts.eng.pek2.redhat.com ]
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
[root@dell-per740-12 ovn_ha]# hostname
dell-per740-12.rhts.eng.pek2.redhat.com
[root@dell-per740-12 ovn_ha]# systemctl restart openvswitch
<==== restart openvswitch
[root@dell-per740-12 ovn_ha]# pcs status
Cluster name: my_cluster
WARNINGS:
Corosync and pacemaker node names do not match (IPs used in setup?)
Stack: corosync
Current DC: dell-per740-12.rhts.eng.pek2.redhat.com (version 1.1.20-5.el7-3c4c782f70) - partition with quorum
Last updated: Sat Nov 9 01:21:07 2019
Last change: Sat Nov 9 01:16:11 2019 by root via crm_attribute on dell-per740-12.rhts.eng.pek2.redhat.com
3 nodes configured
4 resources configured
Online: [ dell-per740-12.rhts.eng.pek2.redhat.com hp-dl380pg8-11.rhts.eng.pek2.redhat.com ibm-x3650m5-03.rhts.eng.pek2.redhat.com ]
Full list of resources:
ip-70.11.0.50 (ocf::heartbeat:IPaddr2): Started dell-per740-12.rhts.eng.pek2.redhat.com
Master/Slave Set: ovndb_servers-master [ovndb_servers]
Masters: [ dell-per740-12.rhts.eng.pek2.redhat.com ]
Slaves: [ hp-dl380pg8-11.rhts.eng.pek2.redhat.com ibm-x3650m5-03.rhts.eng.pek2.redhat.com ]
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
<==== pcs is still up
[root@dell-per740-12 ovn_ha]# rpm -qa | grep -E "openvswitch|ovn"
openvswitch2.12-2.12.0-4.el7fdp.x86_64
ovn2.12-host-2.12.0-7.el7fdp.x86_64
kernel-kernel-networking-openvswitch-ovn_ha-1.0-43.noarch
ovn2.12-central-2.12.0-7.el7fdp.x86_64
ovn2.12-2.12.0-7.el7fdp.x86_64
openvswitch-selinux-extra-policy-1.0-14.el7fdp.noarch
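
As a rough illustration of the failure mode described earlier in this bug (OVN control sockets removed from the shared runtime directory while the old ovsdb-server processes keep running), the following shell sketch lists which OVN database control sockets are missing from a given runtime directory. The function name missing_ctl_sockets is hypothetical, and the socket names ovnnb_db.ctl and ovnsb_db.ctl, as well as the pgrep pattern in the usage comment, are assumptions about a typical OVN deployment rather than details taken from this report.

```shell
#!/bin/sh
# Sketch only: report OVN database control sockets that are absent from a
# runtime directory. The socket names are assumed, not taken from this bug.

missing_ctl_sockets() {
    rundir=$1
    for name in ovnnb_db.ctl ovnsb_db.ctl; do
        # -e (exists) is used instead of -S (is a socket) so that the check
        # can be exercised with plain files in a scratch directory.
        [ -e "$rundir/$name" ] || printf '%s\n' "$name"
    done
}

# Possible use on a cluster node (directory and pattern are assumptions):
#   if pgrep -f 'ovsdb-server.*ovnnb_db' >/dev/null &&
#      [ -n "$(missing_ctl_sockets /var/run/openvswitch)" ]; then
#       echo "OVN ovsdb-server is running but its control sockets are missing"
#   fi
```

If the check reports missing sockets while the processes are still alive, the node is probably in the state this bug describes, where the old processes have to be killed manually and the pacemaker resource refreshed.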
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:4208