Bug 1365153 - Broken network connectivity to instances launched post upgrade due to MTU mismatch
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 9.0 (Mitaka)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ga
Target Release: 9.0 (Mitaka)
Assignee: Brent Eagles
QA Contact: Ofer Blaut
URL:
Whiteboard:
Depends On: 1365617 1365622 1365678 1365949
Blocks:
Reported: 2016-08-08 13:38 UTC by Marius Cornea
Modified: 2016-08-24 13:31 UTC (History)

Fixed In Version: openstack-tripleo-heat-templates-2.0.0-32.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-08-24 13:31:21 UTC
Target Upstream Version:




Links
System | ID | Priority | Status | Summary | Last Updated
OpenStack gerrit | 333333 | None | None | None | 2016-08-09 16:59:03 UTC
Red Hat Bugzilla | 1365617 | None | None | None | Never
Red Hat Bugzilla | 1365622 | None | None | None | Never
Red Hat Bugzilla | 1365678 | None | None | None | Never
Red Hat Product Errata | RHBA-2016:1766 | normal | SHIPPED_LIVE | Red Hat OpenStack Platform 9 director 0day Advisory | 2016-08-24 17:29:37 UTC

Internal Links: 1365617 1365622 1365678

Description Marius Cornea 2016-08-08 13:38:10 UTC
Description of problem:

When trying to SSH to an instance launched on an upgraded OSP8->OSP9 overcloud the SSH connection fails:

[stack@undercloud ~]$ ssh fedora@172.16.18.143 -v
OpenSSH_6.6.1, OpenSSL 1.0.1e-fips 11 Feb 2013
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 56: Applying options for *
debug1: Connecting to 172.16.18.143 [172.16.18.143] port 22.
debug1: Connection established.
debug1: identity file /home/stack/.ssh/id_rsa type 1
debug1: identity file /home/stack/.ssh/id_rsa-cert type -1
debug1: identity file /home/stack/.ssh/id_dsa type -1
debug1: identity file /home/stack/.ssh/id_dsa-cert type -1
debug1: identity file /home/stack/.ssh/id_ecdsa type -1
debug1: identity file /home/stack/.ssh/id_ecdsa-cert type -1
debug1: identity file /home/stack/.ssh/id_ed25519 type -1
debug1: identity file /home/stack/.ssh/id_ed25519-cert type -1
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_6.6.1
debug1: Remote protocol version 2.0, remote software version OpenSSH_6.8
debug1: match: OpenSSH_6.8 pat OpenSSH* compat 0x04000000
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug1: kex: server->client aes128-ctr hmac-sha1-etm@openssh.com none
debug1: kex: client->server aes128-ctr hmac-sha1-etm@openssh.com none
debug1: kex: curve25519-sha256@libssh.org need=20 dh_need=20
debug1: kex: curve25519-sha256@libssh.org need=20 dh_need=20
debug1: sending SSH2_MSG_KEX_ECDH_INIT
debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
Connection closed by 172.16.18.143

It appears that the ports created on the compute node for the new instance have an MTU of 1350 bytes, while inside the instance the interface MTU is set to 1400 bytes:

[root@overcloud-compute-0 heat-admin]# brctl show
bridge name	bridge id		STP enabled	interfaces
qbr1a01fd91-90		8000.c65abb2b3e52	no		qvb1a01fd91-90
							tap1a01fd91-90
qbrfc90dc3f-fe		8000.beb702a152fd	no		qvbfc90dc3f-fe
							tapfc90dc3f-fe

[root@overcloud-compute-0 heat-admin]# ip a s dev qbrfc90dc3f-fe
22: qbrfc90dc3f-fe: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1350 qdisc noqueue state UP 
    link/ether be:b7:02:a1:52:fd brd ff:ff:ff:ff:ff:ff

[root@overcloud-compute-0 heat-admin]# ip a s dev tapfc90dc3f-fe
25: tapfc90dc3f-fe: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1350 qdisc pfifo_fast master qbrfc90dc3f-fe state UNKNOWN qlen 500
    link/ether fe:16:3e:95:18:d4 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc16:3eff:fe95:18d4/64 scope link 
       valid_lft forever preferred_lft forever

[root@overcloud-compute-0 heat-admin]# ip a s dev qvbfc90dc3f-fe
24: qvbfc90dc3f-fe@qvofc90dc3f-fe: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1350 qdisc pfifo_fast master qbrfc90dc3f-fe state UP qlen 1000
    link/ether be:b7:02:a1:52:fd brd ff:ff:ff:ff:ff:ff
    inet6 fe80::bcb7:2ff:fea1:52fd/64 scope link 
       valid_lft forever preferred_lft forever
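
For anyone repeating this check across several ports, the comparison above can be scripted. The following is only a minimal sketch: the function names (parse_mtus, mismatched) and the SAMPLE string are hypothetical and not part of any OpenStack tooling. SAMPLE simply repeats the interface header lines pasted above, and the guest MTU of 1400 is the value observed inside the instance.

```python
import re

# Matches the header line of `ip a s dev <name>` output, e.g.
# "22: qbrfc90dc3f-fe: <BROADCAST,...> mtu 1350 qdisc noqueue state UP".
# A veth peer suffix such as "@qvofc90dc3f-fe" is skipped.
HEADER_RE = re.compile(
    r"^\d+:\s+(?P<dev>[^:@\s]+)(?:@\S+)?:\s+<[^>]*>\s+mtu\s+(?P<mtu>\d+)"
)


def parse_mtus(ip_output):
    """Return {device_name: mtu} for every interface header line found."""
    mtus = {}
    for line in ip_output.splitlines():
        match = HEADER_RE.match(line.strip())
        if match:
            mtus[match.group("dev")] = int(match.group("mtu"))
    return mtus


def mismatched(mtus, guest_mtu):
    """Return the devices whose MTU differs from the MTU seen in the guest."""
    return sorted(dev for dev, mtu in mtus.items() if mtu != guest_mtu)


# Header lines copied from the compute node output in this report.
SAMPLE = """\
22: qbrfc90dc3f-fe: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1350 qdisc noqueue state UP
25: tapfc90dc3f-fe: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1350 qdisc pfifo_fast master qbrfc90dc3f-fe state UNKNOWN qlen 500
24: qvbfc90dc3f-fe@qvofc90dc3f-fe: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1350 qdisc pfifo_fast master qbrfc90dc3f-fe state UP qlen 1000
"""

if __name__ == "__main__":
    host_mtus = parse_mtus(SAMPLE)
    print(host_mtus)
    # Devices whose MTU disagrees with the 1400 seen inside the instance.
    print(mismatched(host_mtus, guest_mtu=1400))
```

With the sample above, all three devices are reported as mismatched against a guest MTU of 1400.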

Inside the instance I can see that the MTU of eth0 is set to 1400. This is the network that the instance is connected to:

[stack@undercloud ~]$ neutron net-show stack-76-tenant_net_ext_tagged-q3pqucnr3qnn-private_network-bdp7opz6hmht
/usr/lib/python2.7/site-packages/requests/packages/urllib3/connection.py:303: SubjectAltNameWarning: Certificate for 172.16.18.25 has no `subjectAltName`, falling back to check for a `commonName` for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See https://github.com/shazow/urllib3/issues/497 for details.)
  SubjectAltNameWarning
/usr/lib/python2.7/site-packages/requests/packages/urllib3/connection.py:303: SubjectAltNameWarning: Certificate for 172.16.18.25 has no `subjectAltName`, falling back to check for a `commonName` for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See https://github.com/shazow/urllib3/issues/497 for details.)
  SubjectAltNameWarning
+---------------------------+--------------------------------------------------------------------------+
| Field                     | Value                                                                    |
+---------------------------+--------------------------------------------------------------------------+
| admin_state_up            | True                                                                     |
| availability_zone_hints   |                                                                          |
| availability_zones        | nova                                                                     |
| created_at                | 2016-08-08T12:40:12                                                      |
| description               |                                                                          |
| id                        | 0161eb0f-8eba-4fff-b7a9-6ac6bc2244fd                                     |
| ipv4_address_scope        |                                                                          |
| ipv6_address_scope        |                                                                          |
| mtu                       | 1350                                                                     |
| name                      | stack-76-tenant_net_ext_tagged-q3pqucnr3qnn-private_network-bdp7opz6hmht |
| port_security_enabled     | True                                                                     |
| provider:network_type     | vxlan                                                                    |
| provider:physical_network |                                                                          |
| provider:segmentation_id  | 96                                                                       |
| qos_policy_id             |                                                                          |
| router:external           | False                                                                    |
| shared                    | False                                                                    |
| status                    | ACTIVE                                                                   |
| subnets                   | e77997c5-1e91-4cf8-a846-53cc8ca88bb3                                     |
| tags                      |                                                                          |
| tenant_id                 | 2c85f77a58f34fdc91cdb3d90ce5b3b0                                         |
| updated_at                | 2016-08-08T12:40:12                                                      |
+---------------------------+--------------------------------------------------------------------------+
[stack@undercloud ~]$ neutron subnet-show e77997c5-1e91-4cf8-a846-53cc8ca88bb3
/usr/lib/python2.7/site-packages/requests/packages/urllib3/connection.py:303: SubjectAltNameWarning: Certificate for 172.16.18.25 has no `subjectAltName`, falling back to check for a `commonName` for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See https://github.com/shazow/urllib3/issues/497 for details.)
  SubjectAltNameWarning
/usr/lib/python2.7/site-packages/requests/packages/urllib3/connection.py:303: SubjectAltNameWarning: Certificate for 172.16.18.25 has no `subjectAltName`, falling back to check for a `commonName` for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See https://github.com/shazow/urllib3/issues/497 for details.)
  SubjectAltNameWarning
+-------------------+-------------------------------------------------------------------------+
| Field             | Value                                                                   |
+-------------------+-------------------------------------------------------------------------+
| allocation_pools  | {"start": "10.10.10.2", "end": "10.10.10.254"}                          |
| cidr              | 10.10.10.0/24                                                           |
| created_at        | 2016-08-08T12:40:15                                                     |
| description       |                                                                         |
| dns_nameservers   | 8.8.8.8                                                                 |
|                   | 8.8.4.4                                                                 |
| enable_dhcp       | True                                                                    |
| gateway_ip        | 10.10.10.1                                                              |
| host_routes       |                                                                         |
| id                | e77997c5-1e91-4cf8-a846-53cc8ca88bb3                                    |
| ip_version        | 4                                                                       |
| ipv6_address_mode |                                                                         |
| ipv6_ra_mode      |                                                                         |
| name              | stack-76-tenant_net_ext_tagged-q3pqucnr3qnn-private_subnet-cuzos5vunata |
| network_id        | 0161eb0f-8eba-4fff-b7a9-6ac6bc2244fd                                    |
| subnetpool_id     |                                                                         |
| tenant_id         | 2c85f77a58f34fdc91cdb3d90ce5b3b0                                        |
| updated_at        | 2016-08-08T12:40:15                                                     |
+-------------------+-------------------------------------------------------------------------+

I'm trying to reach the instance via a floating IP. Note that instances created before the upgrade had their ports created with an MTU of 1400 bytes, so it appears that the network MTU changed from 1400 to 1350 during the upgrade.
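
One possible reading of the 1400 to 1350 change, which this report does not confirm, is that the 50-byte difference equals the per-packet overhead of VXLAN encapsulation over an IPv4 underlay. The sketch below only shows that arithmetic. The header sizes are the standard ones; treating 1400 as the pre-subtraction value is an assumption, and tenant_mtu is a hypothetical helper, not a Neutron function.

```python
# Per-packet overhead of VXLAN over an IPv4 underlay, in bytes.
OUTER_IPV4 = 20      # outer IPv4 header without options
OUTER_UDP = 8        # outer UDP header
VXLAN_HEADER = 8     # VXLAN header
INNER_ETHERNET = 14  # encapsulated Ethernet header without a VLAN tag
VXLAN_OVERHEAD = OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER + INNER_ETHERNET


def tenant_mtu(underlay_mtu, overhead=VXLAN_OVERHEAD):
    """Largest tenant MTU that still fits in the underlay after encapsulation."""
    return underlay_mtu - overhead


if __name__ == "__main__":
    print(VXLAN_OVERHEAD)
    # ASSUMPTION: 1400 is taken as the configured value before any subtraction.
    print(tenant_mtu(1400))
```

If that reading is right, the network MTU of 1350 reported by neutron net-show would be 1400 minus the encapsulation overhead, while the instance is still being given 1400.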

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-liberty-2.0.0-30.el7ost.noarch
openstack-tripleo-heat-templates-2.0.0-30.el7ost.noarch
openstack-tripleo-heat-templates-kilo-0.8.14-16.el7ost.noarch


How reproducible:
100% 

Steps to Reproduce:
1. Deploy OSP8 overcloud 
2. Upgrade overcloud to OSP9
3. Create a network, a router, and a floating IP on an external network
4. Launch instance 

Actual results:
Unable to SSH to the instance via the floating IP. There appears to be a mismatch between the MTU set inside the instance (1400 bytes) and the MTU of the devices on the compute node (1350 bytes).
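
A mismatch like this can be probed with pings that have the don't-fragment bit set, sized to each of the two MTUs. The sketch below just computes the ICMP payload sizes and builds the command lines. It assumes the iputils ping options -M do, -c and -s, and uses the floating IP from this report as the target; max_ping_payload and probe_command are hypothetical helper names.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header


def max_ping_payload(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def probe_command(target, mtu):
    """Build a don't-fragment ping sized to exactly fill the given MTU."""
    return "ping -M do -c 3 -s {} {}".format(max_ping_payload(mtu), target)


if __name__ == "__main__":
    # One probe sized for the compute node devices, one for the guest MTU.
    for mtu in (1350, 1400):
        print(probe_command("172.16.18.143", mtu))
```

If the mismatch is the cause, the probe sized for 1350 would be expected to get replies while the one sized for 1400 would not. That would also be consistent with the SSH session above stalling once the larger key exchange packets are sent.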

Expected results:
I'm able to SSH to instances created the same way as on OSP8.

Additional info:

Comment 2 Jay Dobies 2016-08-08 13:46:33 UTC
Potentially related to https://review.openstack.org/#/c/333333/  (I'll add it as an actual tracker when I get confirmation)

Comment 8 Scott Lewis 2016-08-09 14:12:55 UTC
Nir,
Can you review whether this is indeed an RC blocker, or whether it can wait for a fix for GA (0day) or a later maintenance release?

Thanks,
Scott

Comment 11 Thierry Vignaud 2016-08-12 15:08:39 UTC
Clearing flags now that build is done

Comment 13 Assaf Muller 2016-08-15 19:32:27 UTC
This bug depends on 4 other RHBZs; as of right now 3 are ready and 1 is pending backports. Once all 4 are ON_QA, this bug can be verified.

Comment 16 errata-xmlrpc 2016-08-24 13:31:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-1766.html

