Bug 1495224
Summary: | [osp12]Overcloud HA services isn't operable after reboot of overcloud nodes | ||
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Artem Hrechanychenko <ahrechan> |
Component: | openstack-containers | Assignee: | Assaf Muller <amuller> |
Status: | CLOSED DUPLICATE | QA Contact: | Omri Hochman <ohochman> |
Severity: | urgent | Docs Contact: | Andrew Burden <aburden> |
Priority: | urgent | ||
Version: | 12.0 (Pike) | CC: | bhaley, jlibosva, m.andre, michele, rhallise |
Target Milestone: | ga | Keywords: | TestBlocker |
Target Release: | 12.0 (Pike) | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Last Closed: | 2017-09-27 09:17:35 UTC | Type: | Bug |
Description
Artem Hrechanychenko
2017-09-25 14:40:29 UTC
So the HA containers are down because the network is busted. You can tell that from the following initial lines in 'pcs status':

Online: [ overcloud-controller-0 ]
OFFLINE: [ overcloud-controller-1 overcloud-controller-2 ]

So node 0 cannot talk to node 1 and 2.

[root@overcloud-controller-0 ~]# ping overcloud-controller-1
PING overcloud-controller-1.localdomain (172.17.1.25) 56(84) bytes of data.
^C
--- overcloud-controller-1.localdomain ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1000ms

[root@overcloud-controller-0 ~]# ping overcloud-controller-2
PING overcloud-controller-2.localdomain (172.17.1.20) 56(84) bytes of data.
^C
--- overcloud-controller-2.localdomain ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 999ms

Initially I thought this was a dupe of https://bugzilla.redhat.com/show_bug.cgi?id=1473763

But I see that you have:

[root@overcloud-controller-0 ~]# rpm -q os-net-config
os-net-config-7.3.0-0.20170910153345.77fe592.el7ost.noarch

And the above BZ (1473763) has fixed-in: os-net-config-7.2.1-0.20170825174722.77fe592.el7ost

And the patch from the BZ is in the package (I just checked). So not sure about this one. Maybe it is related to the '[rhos-dev] Overcloud Nodes Network Layout' thread on rhos-dev?

I don't exactly know where to start looking on the test setup, just some things I noticed:

1. overcloud-controller-0 had a full set of iptables rules, overcloud-controller-1 has none
2. overcloud-controller-1 is flooding the vlan20 network with ARP frames, don't know why that is yet

I'll keep looking tomorrow, but if I can get more information on how things are configured here it might help.

Artem, this is likely a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1490281

The configuration does look just like the other bug looking at ovs-vsctl output, as well as the ARP storm on vlan20. Closing as duplicate as the reproducer steps and symptoms are the same.
*** This bug has been marked as a duplicate of bug 1490281 ***
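
For anyone triaging a similar report, the checks made by hand in the first comment (reading the Online/OFFLINE lines of 'pcs status', looking at the ping loss, and comparing the installed os-net-config version with the fixed-in version) could be scripted roughly as below. This is only a minimal sketch: the function names are hypothetical, the sample strings are copied from the output quoted in this bug, and the version check compares just the dotted upstream version, which is a simplification and not rpm's real comparison algorithm.

```python
import re


def parse_pcs_nodes(pcs_status_text):
    """Return (online, offline) node-name lists from 'pcs status' text.

    Only plain 'Online:' and 'OFFLINE:' lines are considered (assumption:
    the output uses the same layout as the lines quoted in this bug).
    """
    online, offline = [], []
    for line in pcs_status_text.splitlines():
        m = re.match(r"\s*(Online|OFFLINE):\s*\[(.*?)\]", line)
        if not m:
            continue
        names = m.group(2).split()
        (online if m.group(1) == "Online" else offline).extend(names)
    return online, offline


def ping_packet_loss(ping_output):
    """Return the packet-loss percentage from a ping summary, or None."""
    m = re.search(r"(\d+(?:\.\d+)?)% packet loss", ping_output)
    return float(m.group(1)) if m else None


def upstream_version(package_nvr, name):
    """Extract the dotted upstream version as a tuple of ints.

    Simplified: ignores the release field entirely, unlike rpm itself.
    """
    m = re.match(re.escape(name) + r"-(\d+(?:\.\d+)*)-", package_nvr)
    return tuple(int(part) for part in m.group(1).split(".")) if m else None


# Sample data copied from the comments in this bug.
sample_status = (
    "Online: [ overcloud-controller-0 ]\n"
    "OFFLINE: [ overcloud-controller-1 overcloud-controller-2 ]"
)
sample_ping = "2 packets transmitted, 0 received, 100% packet loss, time 1000ms"
installed = "os-net-config-7.3.0-0.20170910153345.77fe592.el7ost.noarch"
fixed_in = "os-net-config-7.2.1-0.20170825174722.77fe592.el7ost"

online, offline = parse_pcs_nodes(sample_status)
loss = ping_packet_loss(sample_ping)
has_fix_version = (
    upstream_version(installed, "os-net-config")
    >= upstream_version(fixed_in, "os-net-config")
)

print("online:", online)
print("offline:", offline)
print("packet loss %:", loss)
print("installed version at or past fixed-in:", has_fix_version)
```

A version at or past fixed-in only suggests the earlier fix is present; as noted in the comment, the patch was also checked directly in the package.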