Bug 1398013

Summary:	[Backwards Compatibility] UC10-OC9 deployment completes successfully, but the controllers are unreachable after deployment and the entire setup is non-operational
Product:	Red Hat OpenStack	Reporter:	Dan Yasny <dyasny>
Component:	rhosp-director	Assignee:	Angus Thomas <athomas>
Status:	CLOSED DUPLICATE	QA Contact:	Dan Yasny <dyasny>
Severity:	urgent	Docs Contact:
Priority:	high
Version:	9.0 (Mitaka)	CC:	apetrich, beagles, ccamacho, dbecker, dyasny, jcoufal, jschluet, jslagle, mandreou, mburns, morazi, nyechiel, ohochman, rhel-osp-director-maint, sasha
Target Milestone:	ga	Keywords:	ZStream
Target Release:	9.0 (Mitaka)
Hardware:	x86_64
OS:	Linux
Last Closed:	2016-11-30 23:59:12 UTC	Type:	Bug

Description Dan Yasny 2016-11-23 21:09:26 UTC

Description of problem:

Deployment command:

  openstack overcloud deploy --templates /home/stack/tht \
    --control-scale 3 --compute-scale 1 \
    --neutron-network-type vxlan --neutron-tunnel-types vxlan \
    --ntp-server clock.redhat.com --timeout 90 \
    -e /home/stack/tht/environments/puppet-pacemaker.yaml \
    -e /home/stack/tht/environments/storage-environment.yaml \
    -e /home/stack/tht/environments/network-isolation.yaml \
    -e network-environment.yaml \
    --ceph-storage-scale 1

After installation, the undercloud is operational but the overcloud is not. All nodes except the controllers are reachable:

[stack@instack ~]$ . stackrc 
[stack@instack ~]$ nova list
+--------------------------------------+-------------------------+--------+------------+-------------+---------------------+
| ID                                   | Name                    | Status | Task State | Power State | Networks            |
+--------------------------------------+-------------------------+--------+------------+-------------+---------------------+
| 63ca7ebe-2157-4d25-8f53-209f9f446ac0 | overcloud-cephstorage-0 | ACTIVE | -          | Running     | ctlplane=192.0.2.16 |
| 9570db77-bb36-41c0-9809-f892b36dda97 | overcloud-compute-0     | ACTIVE | -          | Running     | ctlplane=192.0.2.11 |
| 35457105-62ec-4042-a389-41ac56ffc2f3 | overcloud-controller-0  | ACTIVE | -          | Running     | ctlplane=192.0.2.10 |
| e0cd1863-6c11-4f38-9b04-bf6ef716a4d4 | overcloud-controller-1  | ACTIVE | -          | Running     | ctlplane=192.0.2.17 |
| 3788d1f1-9e65-48cc-9409-85adcd4a25b4 | overcloud-controller-2  | ACTIVE | -          | Running     | ctlplane=192.0.2.6  |
+--------------------------------------+-------------------------+--------+------------+-------------+---------------------+
[stack@instack ~]$ ping 192.0.2.16
PING 192.0.2.16 (192.0.2.16) 56(84) bytes of data.
64 bytes from 192.0.2.16: icmp_seq=1 ttl=64 time=0.255 ms
64 bytes from 192.0.2.16: icmp_seq=2 ttl=64 time=0.240 ms
^C
--- 192.0.2.16 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.240/0.247/0.255/0.017 ms
[stack@instack ~]$ ping 192.0.2.11
PING 192.0.2.11 (192.0.2.11) 56(84) bytes of data.
64 bytes from 192.0.2.11: icmp_seq=1 ttl=64 time=0.572 ms
64 bytes from 192.0.2.11: icmp_seq=2 ttl=64 time=0.217 ms
^C
--- 192.0.2.11 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 0.217/0.394/0.572/0.178 ms
[stack@instack ~]$ ping 192.0.2.6
PING 192.0.2.6 (192.0.2.6) 56(84) bytes of data.
^C
--- 192.0.2.6 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 1999ms
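On a larger overcloud, the per-node ping checks above can be automated. The sketch below is a hypothetical helper (not part of TripleO, the director, or this report) that parses the ping statistics line in the iputils format shown above and flags a node as unreachable when no replies came back. It only parses captured text; it does not run ping itself, and the function names are illustrative.

```python
import re
from typing import Dict

# Matches the ping summary line, e.g.
# "3 packets transmitted, 0 received, 100% packet loss, time 1999ms"
STATS_RE = re.compile(r"(\d+) packets? transmitted, (\d+) received")


def is_reachable(ping_output: str) -> bool:
    """Return True if the ping summary reports at least one reply.

    Hypothetical helper: raises ValueError if no summary line is found.
    """
    match = STATS_RE.search(ping_output)
    if match is None:
        raise ValueError("no ping statistics line found")
    received = int(match.group(2))
    return received > 0


def classify(nodes: Dict[str, str]) -> Dict[str, bool]:
    """Map each node name to its reachability, given captured ping output per node."""
    return {name: is_reachable(output) for name, output in nodes.items()}


# Summary lines copied from the transcript above.
samples = {
    "overcloud-compute-0": "2 packets transmitted, 2 received, 0% packet loss, time 1002ms",
    "overcloud-controller-2": "3 packets transmitted, 0 received, 100% packet loss, time 1999ms",
}
print(classify(samples))
# {'overcloud-compute-0': True, 'overcloud-controller-2': False}
```

Keying on the "received" count rather than the packet-loss percentage keeps the check independent of how many probes were sent before the run was interrupted with ^C.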


I tried restarting the entire setup (controllers and undercloud), with the same result.



Version-Release number of selected component (if applicable):

Undercloud (OSP10)
openstack-mistral-executor-3.0.2-8.el7ost.noarch
openstack-swift-container-2.10.0-5.el7ost.noarch
openstack-tripleo-ui-1.0.5-1.el7ost.noarch
python-openstacksdk-0.9.5-1.el7ost.noarch
openstack-tripleo-puppet-elements-5.1.0-2.el7ost.noarch
openstack-heat-templates-0-0.8.1e6015dgit.el7ost.noarch
openstack-ironic-inspector-4.2.0-3.el7ost.noarch
openstack-neutron-common-9.1.0-5.el7ost.noarch
openstack-swift-proxy-2.10.0-5.el7ost.noarch
openstack-ceilometer-central-7.0.0-2.5.el7ost.noarch
openstack-tripleo-0.0.8-0.2.4de13b3git.el7ost.noarch
openstack-glance-13.0.0-1.el7ost.noarch
openstack-neutron-ml2-9.1.0-5.el7ost.noarch
openstack-tripleo-heat-templates-5.1.0-3.el7ost.noarch
openstack-neutron-9.1.0-5.el7ost.noarch
openstack-ceilometer-notification-7.0.0-2.5.el7ost.noarch
openstack-ceilometer-collector-7.0.0-2.5.el7ost.noarch
openstack-tripleo-image-elements-5.1.0-1.el7ost.noarch
openstack-mistral-engine-3.0.2-8.el7ost.noarch
puppet-openstack_extras-9.4.0-1.el7ost.noarch
openstack-puppet-modules-9.3.0-1.el7ost.noarch
python-openstack-mistral-3.0.2-8.el7ost.noarch
openstack-heat-api-cfn-7.0.0-7.el7ost.noarch
openstack-keystone-10.0.0-3.el7ost.noarch
openstack-tripleo-heat-templates-compat-2.0.0-34.4.el7ost.noarch
openstack-tripleo-validations-5.1.0-5.el7ost.noarch
openstack-aodh-listener-3.0.1-4.el7ost.noarch
openstack-ceilometer-api-7.0.0-2.5.el7ost.noarch
openstack-nova-conductor-14.0.2-6.el7ost.noarch
openstack-aodh-notifier-3.0.1-4.el7ost.noarch
puppet-openstacklib-9.4.0-3.el7ost.noarch
openstack-selinux-0.7.12-1.el7ost.noarch
openstack-aodh-evaluator-3.0.1-4.el7ost.noarch
openstack-heat-engine-7.0.0-7.el7ost.noarch
openstack-swift-object-2.10.0-5.el7ost.noarch
openstack-nova-cert-14.0.2-6.el7ost.noarch
openstack-tempest-13.0.0-5.bafe630git.el7ost.noarch
openstack-ironic-conductor-6.2.1-5.el7ost.noarch
openstack-neutron-openvswitch-9.1.0-5.el7ost.noarch
openstack-heat-common-7.0.0-7.el7ost.noarch
openstack-nova-api-14.0.2-6.el7ost.noarch
openstack-ceilometer-polling-7.0.0-2.5.el7ost.noarch
openstack-utils-2016.1-1.el7ost.noarch
python-openstackclient-3.2.0-2.el7ost.noarch
openstack-tripleo-common-5.4.0-2.el7ost.noarch
openstack-nova-common-14.0.2-6.el7ost.noarch
openstack-nova-compute-14.0.2-6.el7ost.noarch
openstack-mistral-api-3.0.2-8.el7ost.noarch
openstack-aodh-common-3.0.1-4.el7ost.noarch
openstack-aodh-api-3.0.1-4.el7ost.noarch
openstack-heat-api-7.0.0-7.el7ost.noarch
openstack-nova-scheduler-14.0.2-6.el7ost.noarch
openstack-ceilometer-common-7.0.0-2.5.el7ost.noarch
openstack-swift-account-2.10.0-5.el7ost.noarch
openstack-mistral-common-3.0.2-8.el7ost.noarch
openstack-ironic-common-6.2.1-5.el7ost.noarch
openstack-zaqar-3.0.0-3.el7ost.noarch
openstack-ironic-api-6.2.1-5.el7ost.noarch

Overcloud (OSP9, collected on the compute node)
openstack-nova-novncproxy-13.1.1-7.el7ost.noarch
openstack-sahara-ui-4.0.0-3.el7ost.noarch
openstack-trove-api-5.0.1-1.el7ost.noarch
openstack-aodh-notifier-2.0.5-1.el7ost.noarch
openstack-sahara-engine-4.0.1-2.el7ost.noarch
openstack-neutron-bigswitch-lldp-2015.3.8-1.el7ost.noarch
openstack-gnocchi-indexer-sqlalchemy-2.1.3-3.el7ost.noarch
openstack-gnocchi-statsd-2.1.3-3.el7ost.noarch
openstack-heat-api-cloudwatch-6.0.0-11.el7ost.noarch
openstack-ceilometer-central-6.1.3-2.el7ost.noarch
python-openstacksdk-0.8.3-1.el7ost.noarch
openstack-neutron-ml2-8.1.2-5.el7ost.noarch
openstack-heat-common-6.0.0-11.el7ost.noarch
openstack-swift-2.7.0-2.el7ost.noarch
openstack-neutron-common-8.1.2-5.el7ost.noarch
openstack-utils-2015.2-1.el7ost.noarch
openstack-neutron-lbaas-8.0.0-1.el7ost.noarch
openstack-swift-object-2.7.0-2.el7ost.noarch
openstack-ceilometer-notification-6.1.3-2.el7ost.noarch
openstack-nova-api-13.1.1-7.el7ost.noarch
openstack-heat-api-6.0.0-11.el7ost.noarch
openstack-neutron-openvswitch-8.1.2-5.el7ost.noarch
openstack-ceilometer-collector-6.1.3-2.el7ost.noarch
openstack-sahara-4.0.1-2.el7ost.noarch
openstack-manila-share-2.0.0-6.el7ost.noarch
openstack-aodh-api-2.0.5-1.el7ost.noarch
openstack-neutron-metering-agent-8.1.2-5.el7ost.noarch
python-openstackclient-2.2.0-1.el7ost.noarch
openstack-dashboard-9.0.1-2.el7ost.noarch
openstack-swift-plugin-swift3-1.10-1.el7ost.noarch
openstack-gnocchi-api-2.1.3-3.el7ost.noarch
openstack-cinder-8.1.1-2.el7ost.noarch
openstack-keystone-9.0.2-1.el7ost.noarch
openstack-ceilometer-common-6.1.3-2.el7ost.noarch
openstack-neutron-bigswitch-agent-2015.3.8-1.el7ost.noarch
openstack-nova-console-13.1.1-7.el7ost.noarch
openstack-nova-conductor-13.1.1-7.el7ost.noarch
openstack-heat-engine-6.0.0-11.el7ost.noarch
openstack-ceilometer-api-6.1.3-2.el7ost.noarch
openstack-manila-2.0.0-6.el7ost.noarch
openstack-trove-conductor-5.0.1-1.el7ost.noarch
openstack-aodh-listener-2.0.5-1.el7ost.noarch
openstack-sahara-common-4.0.1-2.el7ost.noarch
openstack-nova-common-13.1.1-7.el7ost.noarch
openstack-swift-container-2.7.0-2.el7ost.noarch
openstack-gnocchi-common-2.1.3-3.el7ost.noarch
openstack-gnocchi-metricd-2.1.3-3.el7ost.noarch
openstack-swift-proxy-2.7.0-2.el7ost.noarch
openstack-nova-cert-13.1.1-7.el7ost.noarch
openstack-heat-api-cfn-6.0.0-11.el7ost.noarch
openstack-glance-12.0.0-1.el7ost.noarch
openstack-ceilometer-compute-6.1.3-2.el7ost.noarch
openstack-aodh-common-2.0.5-1.el7ost.noarch
openstack-ceilometer-polling-6.1.3-2.el7ost.noarch
openstack-neutron-8.1.2-5.el7ost.noarch
openstack-swift-account-2.7.0-2.el7ost.noarch
openstack-nova-scheduler-13.1.1-7.el7ost.noarch
openstack-trove-taskmanager-5.0.1-1.el7ost.noarch
openstack-aodh-evaluator-2.0.5-1.el7ost.noarch
python-django-openstack-auth-2.2.0-1.el7ost.noarch
openstack-selinux-0.7.11-1.el7ost.noarch
openstack-puppet-modules-8.1.8-3.el7ost.noarch
openstack-trove-common-5.0.1-1.el7ost.noarch
openstack-sahara-api-4.0.1-2.el7ost.noarch
openstack-dashboard-theme-9.0.1-2.el7ost.noarch
openstack-gnocchi-carbonara-2.1.3-3.el7ost.noarch
openstack-nova-compute-13.1.1-7.el7ost.noarch


How reproducible:
8 out of 8 attempts, with and without SSL, over both IPv4 and IPv6

Steps to Reproduce:
1. Deploy a new backwards-compatibility setup (OSP10 undercloud, OSP9 overcloud).

Actual results:
As described above.

Expected results:
All overcloud nodes, including the controllers, are reachable and the overcloud is operational.

Additional info:
A setup is available for troubleshooting.

Comment 3 Adriano Petrich 2016-11-24 15:44:43 UTC

I'm also looking at this.

Comment 4 Alexander Chuzhoy 2016-11-24 16:13:55 UTC

This also happens to me on a clean OSP 9.0 deployment:
Environment:
openstack-puppet-modules-8.1.8-3.el7ost.noarch
instack-undercloud-4.0.0-15.el7ost.noarch
openstack-tripleo-heat-templates-liberty-2.0.0-40.el7ost.noarch
openstack-tripleo-heat-templates-2.0.0-40.el7ost.noarch


The controllers became unreachable after a reboot.

Comment 5 Dan Yasny 2016-11-24 16:17:23 UTC

(In reply to Alexander Chuzhoy from comment #4)
> Happens to me on clean OSP9.0 deployment:
> Environment:
> openstack-puppet-modules-8.1.8-3.el7ost.noarch
> instack-undercloud-4.0.0-15.el7ost.noarch
> openstack-tripleo-heat-templates-liberty-2.0.0-40.el7ost.noarch
> openstack-tripleo-heat-templates-2.0.0-40.el7ost.noarch
> 
> 
> The controllers became unreachable after a reboot.

Confirmed in my environment: this reproduces when I reboot the nodes.

Comment 6 Marios Andreou 2016-11-24 16:50:48 UTC

@jarda, as discussed just now: a reminder that you wanted to move this to DFG:DF (it happens on an OSP9 deployment?). I believe some combination of Dan and Sasha will provide the environment too.

Comment 7 Dan Yasny 2016-11-24 18:49:11 UTC

Additional findings:
1. As recommended, I restarted the network service on the controllers. This made the controllers reachable, but requests to the overcloud endpoints returned HTTP 503 errors.
2. I rebooted the controllers again, and they became unreachable again.

On a side note, issuing the "reboot" command on the controllers made them hang (probably a service or process blocking the shutdown), so I had to power-cycle them instead.

Comment 9 Alexander Chuzhoy 2016-11-25 00:31:09 UTC

      
[root@overcloud-controller-0 ~]# ovs-vsctl show
150c5a41-48b6-4e54-8fbf-58874b11a578           
    Bridge br-ex                               
        Port "eth0"                            
            Interface "eth0"                   
        Port br-ex                             
            Interface br-ex                    
                type: internal                 
    Bridge br-int                              
        fail_mode: secure                      
        Port "tapd48b845e-02"                  
            tag: 2                             
            Interface "tapd48b845e-02"         
                type: internal                 
        Port "tapadbb4a12-31"                  
            tag: 3                             
            Interface "tapadbb4a12-31"         
                type: internal                 
        Port "ha-fb411560-ee"                  
            tag: 4                             
            Interface "ha-fb411560-ee"         
                type: internal                 
        Port int-br-ex                         
            Interface int-br-ex                
                type: patch                    
                options: {peer=phy-br-ex}      
        Port "qg-8f932a91-18"                  
            tag: 5                             
            Interface "qg-8f932a91-18"         
                type: internal                 
        Port "tap3ee36087-11"                  
            tag: 1                             
            Interface "tap3ee36087-11"         
                type: internal                 
        Port patch-tun                         
            Interface patch-tun                
                type: patch                    
                options: {peer=patch-int}      
        Port "qr-bd5fdd91-cb"                  
            tag: 3                             
            Interface "qr-bd5fdd91-cb"         
                type: internal                 
        Port br-int                            
            Interface br-int
                type: internal
        Port "qr-ea15e493-86"
            tag: 1
            Interface "qr-ea15e493-86"
                type: internal
        Port "qr-582362df-cb"
            tag: 2
            Interface "qr-582362df-cb"
                type: internal
    Bridge br-tun
        fail_mode: secure
        Port "gre-c0a8960d"
            Interface "gre-c0a8960d"
                type: gre
                options: {df_default="true", in_key=flow, local_ip="192.168.150.12", out_key=flow, remote_ip="192.168.150.13"}
        Port patch-int
            Interface patch-int
                type: patch
                options: {peer=patch-tun}
        Port "vxlan-c0a8960d"
            Interface "vxlan-c0a8960d"
                type: vxlan
                options: {df_default="true", in_key=flow, local_ip="192.168.150.12", out_key=flow, remote_ip="192.168.150.13"}
        Port "vxlan-c0a8960a"
            Interface "vxlan-c0a8960a"
                type: vxlan
                options: {df_default="true", in_key=flow, local_ip="192.168.150.12", out_key=flow, remote_ip="192.168.150.10"}
        Port "vxlan-c0a8960b"
            Interface "vxlan-c0a8960b"
                type: vxlan
                options: {df_default="true", in_key=flow, local_ip="192.168.150.12", out_key=flow, remote_ip="192.168.150.11"}
        Port "gre-c0a8960a"
            Interface "gre-c0a8960a"
                type: gre
                options: {df_default="true", in_key=flow, local_ip="192.168.150.12", out_key=flow, remote_ip="192.168.150.10"}
        Port br-tun
            Interface br-tun
                type: internal
        Port "gre-c0a8960b"
            Interface "gre-c0a8960b"
                type: gre
                options: {df_default="true", in_key=flow, local_ip="192.168.150.12", out_key=flow, remote_ip="192.168.150.11"}
    ovs_version: "2.5.0"
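In the br-tun section above, the hex suffix of each tunnel port name matches that port's remote_ip: for example, c0a8960d is 0xc0.0xa8.0x96.0x0d, i.e. 192.168.150.13. The sketch below assumes that naming pattern ("<type>-<8 hex digits>", taken from this output only, not from Neutron documentation) and decodes the names to show which tunnel types exist toward each peer. It is a hypothetical triage helper; the function names are illustrative.

```python
import ipaddress
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Tunnel port names as they appear in the ovs-vsctl output above (assumed pattern):
# "<tunnel type>-<remote IPv4 address as 8 lowercase hex digits>"
PORT_RE = re.compile(r"^(gre|vxlan)-([0-9a-f]{8})$")


def decode_port_name(name: str) -> Tuple[str, str]:
    """Split a tunnel port name into (tunnel_type, remote_ip)."""
    match = PORT_RE.match(name)
    if match is None:
        raise ValueError(f"not a tunnel port name: {name!r}")
    tunnel_type, hex_ip = match.groups()
    # Interpret the 8 hex digits as a 32-bit IPv4 address.
    remote_ip = str(ipaddress.IPv4Address(int(hex_ip, 16)))
    return tunnel_type, remote_ip


def tunnels_by_peer(port_names: List[str]) -> Dict[str, List[str]]:
    """Group the tunnel types present toward each remote peer IP."""
    peers: Dict[str, List[str]] = defaultdict(list)
    for name in port_names:
        tunnel_type, remote_ip = decode_port_name(name)
        peers[remote_ip].append(tunnel_type)
    return {ip: sorted(types) for ip, types in sorted(peers.items())}


# Tunnel port names copied from the br-tun section above.
ports = [
    "gre-c0a8960d", "vxlan-c0a8960d",
    "vxlan-c0a8960a", "vxlan-c0a8960b",
    "gre-c0a8960a", "gre-c0a8960b",
]
print(tunnels_by_peer(ports))
# {'192.168.150.10': ['gre', 'vxlan'], '192.168.150.11': ['gre', 'vxlan'], '192.168.150.13': ['gre', 'vxlan']}
```

Applied to this controller, the decoded names agree with the remote_ip options, and both a GRE and a VXLAN port exist toward every peer.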




[root@overcloud-controller-0 ~]# ping 192.0.2.1 -c1                  
PING 192.0.2.1 (192.0.2.1) 56(84) bytes of data.                     

--- 192.0.2.1 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms



[root@overcloud-controller-0 ~]# ifdown br-ex; ifup br-ex; ifup eth0




[root@overcloud-controller-0 ~]# ping 192.0.2.1 -c1
PING 192.0.2.1 (192.0.2.1) 56(84) bytes of data.   
64 bytes from 192.0.2.1: icmp_seq=1 ttl=64 time=2.38 ms

--- 192.0.2.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 2.388/2.388/2.388/0.000 ms

Comment 10 Alexander Chuzhoy 2016-11-25 00:43:38 UTC

Restarting the network service works, but the issue returns once the nodes are rebooted.

However, if one runs "ifdown br-ex; ifup br-ex; ifup eth0", the nodes are reachable after reboot.


This reminds me of:
https://bugzilla.redhat.com/show_bug.cgi?id=1394890

Is this bug a duplicate of the bug above?

Comment 11 Alexander Chuzhoy 2016-11-25 01:43:08 UTC

Environment:
openstack-neutron-common-8.1.2-12.el7ost.noarch
openstack-neutron-ml2-8.1.2-12.el7ost.noarch
openstack-neutron-openvswitch-8.1.2-12.el7ost.noarch
openstack-neutron-8.1.2-12.el7ost.noarch

Comment 12 Adriano Petrich 2016-11-25 08:00:35 UTC

The workaround from comment #10 seems to work for me as well.

Comment 13 James Slagle 2016-11-28 13:32:20 UTC

(In reply to Alexander Chuzhoy from comment #10)
> Restarting network service works, but once the nodes are rebooted the issue
> is back.
> 
> But if one runs "ifdown br-ex; ifup br-ex; ifup eth0"  then the nodes are
> reachable after reboot.
> 
> 
> This reminds me of:
> https://bugzilla.redhat.com/show_bug.cgi?id=1394890
> 
> Is this bug a duplicate of ^

It appears to be. Setting to DFG:Networking.

Comment 22 Dan Yasny 2016-11-30 19:16:25 UTC

The issue is gone with the current OSP9 images in place.

Setting to verified.

Comment 23 Jaromir Coufal 2016-11-30 23:59:12 UTC


*** This bug has been marked as a duplicate of bug 1394890 ***