| Summary: | [Backwards Compatibility] UC10-OC9 deploy completes successfully, but the controllers are unreachable post-deployment and the entire setup is non-op | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Dan Yasny <dyasny> |
| Component: | rhosp-director | Assignee: | Angus Thomas <athomas> |
| Status: | CLOSED DUPLICATE | QA Contact: | Dan Yasny <dyasny> |
| Severity: | urgent | Docs Contact: | |
| Priority: | high | ||
| Version: | 9.0 (Mitaka) | CC: | apetrich, beagles, ccamacho, dbecker, dyasny, jcoufal, jschluet, jslagle, mandreou, mburns, morazi, nyechiel, ohochman, rhel-osp-director-maint, sasha |
| Target Milestone: | ga | Keywords: | ZStream |
| Target Release: | 9.0 (Mitaka) | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2016-11-30 23:59:12 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
|
Description
Dan Yasny
2016-11-23 21:09:26 UTC
I'm looking at this also.

Happens to me on clean OSP9.0 deployment:

Environment:
openstack-puppet-modules-8.1.8-3.el7ost.noarch
instack-undercloud-4.0.0-15.el7ost.noarch
openstack-tripleo-heat-templates-liberty-2.0.0-40.el7ost.noarch
openstack-tripleo-heat-templates-2.0.0-40.el7ost.noarch

The controllers became unreachable after a reboot.

(In reply to Alexander Chuzhoy from comment #4)
> Happens to me on clean OSP9.0 deployment:
> Environment:
> openstack-puppet-modules-8.1.8-3.el7ost.noarch
> instack-undercloud-4.0.0-15.el7ost.noarch
> openstack-tripleo-heat-templates-liberty-2.0.0-40.el7ost.noarch
> openstack-tripleo-heat-templates-2.0.0-40.el7ost.noarch
>
>
> The controllers became unreachable after a reboot.

Confirmed in my env - this is reproducing when I reboot the nodes

@jarda as discussed just now - reminder that you want to move to DFG:DF (happens on OSP9 deployment?) I believe some combination of dan/sasha will provide the env too

Additional findings:
1. As was recommended, I tried to restart the network service on the controllers. That made the controllers reachable, but trying to contact the overcloud endpoints produced a 503 error
2. Rebooted the controllers again - they are unreachable again

On a side note, issuing the "reboot" command on the controllers made them hang (probably some service or process holding everything back), had to do a powercycle instead.

[root@overcloud-controller-0 ~]# ovs-vsctl show
150c5a41-48b6-4e54-8fbf-58874b11a578
    Bridge br-ex
        Port "eth0"
            Interface "eth0"
        Port br-ex
            Interface br-ex
                type: internal
    Bridge br-int
        fail_mode: secure
        Port "tapd48b845e-02"
            tag: 2
            Interface "tapd48b845e-02"
                type: internal
        Port "tapadbb4a12-31"
            tag: 3
            Interface "tapadbb4a12-31"
                type: internal
        Port "ha-fb411560-ee"
            tag: 4
            Interface "ha-fb411560-ee"
                type: internal
        Port int-br-ex
            Interface int-br-ex
                type: patch
                options: {peer=phy-br-ex}
        Port "qg-8f932a91-18"
            tag: 5
            Interface "qg-8f932a91-18"
                type: internal
        Port "tap3ee36087-11"
            tag: 1
            Interface "tap3ee36087-11"
                type: internal
        Port patch-tun
            Interface patch-tun
                type: patch
                options: {peer=patch-int}
        Port "qr-bd5fdd91-cb"
            tag: 3
            Interface "qr-bd5fdd91-cb"
                type: internal
        Port br-int
            Interface br-int
                type: internal
        Port "qr-ea15e493-86"
            tag: 1
            Interface "qr-ea15e493-86"
                type: internal
        Port "qr-582362df-cb"
            tag: 2
            Interface "qr-582362df-cb"
                type: internal
    Bridge br-tun
        fail_mode: secure
        Port "gre-c0a8960d"
            Interface "gre-c0a8960d"
                type: gre
                options: {df_default="true", in_key=flow, local_ip="192.168.150.12", out_key=flow, remote_ip="192.168.150.13"}
        Port patch-int
            Interface patch-int
                type: patch
                options: {peer=patch-tun}
        Port "vxlan-c0a8960d"
            Interface "vxlan-c0a8960d"
                type: vxlan
                options: {df_default="true", in_key=flow, local_ip="192.168.150.12", out_key=flow, remote_ip="192.168.150.13"}
        Port "vxlan-c0a8960a"
            Interface "vxlan-c0a8960a"
                type: vxlan
                options: {df_default="true", in_key=flow, local_ip="192.168.150.12", out_key=flow, remote_ip="192.168.150.10"}
        Port "vxlan-c0a8960b"
            Interface "vxlan-c0a8960b"
                type: vxlan
                options: {df_default="true", in_key=flow, local_ip="192.168.150.12", out_key=flow, remote_ip="192.168.150.11"}
        Port "gre-c0a8960a"
            Interface "gre-c0a8960a"
                type: gre
                options: {df_default="true", in_key=flow, local_ip="192.168.150.12", out_key=flow, remote_ip="192.168.150.10"}
        Port br-tun
            Interface br-tun
                type: internal
        Port "gre-c0a8960b"
            Interface "gre-c0a8960b"
                type: gre
                options: {df_default="true", in_key=flow, local_ip="192.168.150.12", out_key=flow, remote_ip="192.168.150.11"}
    ovs_version: "2.5.0"
[root@overcloud-controller-0 ~]# ping 192.0.2.1 -c1
PING 192.0.2.1 (192.0.2.1) 56(84) bytes of data.
--- 192.0.2.1 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms
[root@overcloud-controller-0 ~]# ifdown br-ex; ifup br-ex; ifup eth0
[root@overcloud-controller-0 ~]# ping 192.0.2.1 -c1
PING 192.0.2.1 (192.0.2.1) 56(84) bytes of data.
64 bytes from 192.0.2.1: icmp_seq=1 ttl=64 time=2.38 ms
--- 192.0.2.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 2.388/2.388/2.388/0.000 ms
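
For anyone wanting to script the check-and-bounce sequence from the session above, here is a rough sketch. It only wraps the commands already shown in this report (`ping`, `ifdown br-ex; ifup br-ex; ifup eth0`). The function names and the `GATEWAY`, `BRIDGE`, `UPLINK` and `DRY_RUN` variables are made up for illustration and are not part of any OSP tooling. It is a stopgap under those assumptions, not the fix for the underlying bug.

```shell
#!/bin/bash
# Sketch only. Defaults mirror the values seen in this report; all variable
# and function names here are hypothetical.
GATEWAY="${GATEWAY:-192.0.2.1}"   # address pinged in the session above
BRIDGE="${BRIDGE:-br-ex}"
UPLINK="${UPLINK:-eth0}"
DRY_RUN="${DRY_RUN:-1}"           # 1 = only print the commands, 0 = run them

# Read ping output on stdin and print the packet-loss percentage
# found in the summary line (prints nothing if no summary line is present).
packet_loss() {
    sed -n 's/.* \([0-9][0-9]*\)% packet loss.*/\1/p'
}

# Bounce the bridge and re-attach the uplink, in the order used in this report.
bounce_bridge() {
    local cmd
    for cmd in "ifdown $BRIDGE" "ifup $BRIDGE" "ifup $UPLINK"; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "would run: $cmd"
        else
            $cmd   # intentionally unquoted so the string splits into command + argument
        fi
    done
}

# Ping the gateway once; bounce the bridge if any loss (or no reply at all) is seen.
check_and_recover() {
    local loss
    loss="$(ping -c1 -W2 "$GATEWAY" 2>/dev/null | packet_loss)"
    if [ "${loss:-100}" -ne 0 ]; then
        bounce_bridge
    fi
}
```

Sourcing the file and calling `check_and_recover` as root on a controller prints the three commands it would run; setting `DRY_RUN=0` actually runs them. The dry-run default is deliberate, since bouncing br-ex over an SSH session that goes through that bridge can cut the session off.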
Restarting network service works, but once the nodes are rebooted the issue is back.

But if one runs "ifdown br-ex; ifup br-ex; ifup eth0" then the nodes are reachable after reboot.

This reminds me of:
https://bugzilla.redhat.com/show_bug.cgi?id=1394890

Is this bug a duplicate of ^

Environment
openstack-neutron-common-8.1.2-12.el7ost.noarch
openstack-neutron-ml2-8.1.2-12.el7ost.noarch
openstack-neutron-openvswitch-8.1.2-12.el7ost.noarch
openstack-neutron-8.1.2-12.el7ost.noarch

#10 workaround seems to work for me also

(In reply to Alexander Chuzhoy from comment #10)
> Restarting network service works, but once the nodes are rebooted the issue
> is back.
>
> But if one runs "ifdown br-ex; ifup br-ex; ifup eth0" then the nodes are
> reachable after reboot.
>
>
> This reminds me of:
> https://bugzilla.redhat.com/show_bug.cgi?id=1394890
>
> Is this bug a duplicate of ^

It appears to be. Setting to DFG:Networking.

Issue gone with the current OSP9 images in place setting to verified

*** This bug has been marked as a duplicate of bug 1394890 *** |