Bug 1378052

Summary: Unable to reach instance via floating IP due to MTU mismatch in virtual environment with network isolation
Product: Red Hat OpenStack
Component: openstack-neutron
Version: 10.0 (Newton)
Status: CLOSED ERRATA
Severity: urgent
Priority: urgent
Reporter: Marius Cornea <mcornea>
Assignee: Assaf Muller <amuller>
QA Contact: Toni Freger <tfreger>
CC: amuller, beagles, chrisw, dbecker, ihrachys, jschluet, mburns, mcornea, misalunk, morazi, nyechiel, oblaut, ragiman, rhel-osp-director-maint, sasha, srevivo, tfreger
Target Milestone: beta
Target Release: 10.0 (Newton)
Keywords: Triaged
Hardware: Unspecified
OS: Unspecified
Fixed In Version: openstack-tripleo-heat-templates-5.0.0-0.20160907212643.90c852e.2.el7ost, openstack-neutron-9.0.0-0.20160907193737.dc6508a.1.el7ost
Type: Bug
Last Closed: 2016-12-14 16:03:05 UTC

Description Marius Cornea 2016-09-21 12:15:14 UTC
Description of problem:
An instance cannot be reached via its floating IP because of an MTU mismatch in a virtual environment deployed with network isolation.

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-5.0.0-0.20160907212643.90c852e.1.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy the overcloud.
2. Create a tenant network, a router, and an external network.
3. Launch an instance on the tenant network and attach a floating IP to it.
4. Try to reach the instance via the floating IP.

Actual results:
Unable to SSH to the instance:
[stack@undercloud-0 ~]$ ssh fedora@172.16.18.146
Connection closed by 172.16.18.146

Expected results:
Successful SSH connection.  

Additional info:
Here are the created network details:
http://paste.openstack.org/show/582392/

The MTU for the external network is 1500 and the one for the tenant network is 1450.

This is a default virtual environment and the deployment uses network isolation. To work around this issue, I had to pass the following parameter when deploying:

NeutronGlobalPhysnetMtu: 1496

which resulted in the external network having an MTU of 1500 and the tenant network an MTU of 1446.

Previously this used to work by default, without any adjustments.
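
The two tenant MTU values in this report (1450 and 1446) are consistent with subtracting the usual VXLAN-over-IPv4 encapsulation overhead of 50 bytes from the physical network MTU. The following is a minimal sketch of that arithmetic, not Neutron's actual code; the function and constant names are made up for illustration.

```python
# Sketch: derive a VXLAN tenant network MTU from the physical network MTU.
# Header sizes are the standard ones for VXLAN carried over IPv4.
OUTER_IPV4_HEADER = 20      # outer IP header without options
OUTER_UDP_HEADER = 8        # outer UDP header
VXLAN_HEADER = 8            # VXLAN header
INNER_ETHERNET_HEADER = 14  # encapsulated Ethernet header (no VLAN tag)

VXLAN_IPV4_OVERHEAD = (
    OUTER_IPV4_HEADER + OUTER_UDP_HEADER + VXLAN_HEADER + INNER_ETHERNET_HEADER
)


def vxlan_tenant_mtu(physnet_mtu: int) -> int:
    """Return the largest inner IP packet that fits in one VXLAN frame."""
    return physnet_mtu - VXLAN_IPV4_OVERHEAD


for physnet_mtu in (1500, 1496):
    # Prints "1500 -> 1450" and then "1496 -> 1446".
    print(physnet_mtu, "->", vxlan_tenant_mtu(physnet_mtu))
```

With a physical MTU of 1500 this gives 1450, and with the 1496 used in the workaround it gives 1446, matching the values reported here.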

Comment 2 Marius Cornea 2016-09-21 12:17:01 UTC
(In reply to Marius Cornea from comment #0)
> NeutronGlobalPhysnetMtu: 1496
> 
> which resulted in the external network having an MTU of 1500 and the tenant
> network an MTU of 1446

Correction: the external network ends up with an MTU of 1496.
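
For reference, a TripleO parameter such as NeutronGlobalPhysnetMtu is normally supplied through a Heat environment file under `parameter_defaults`. The sketch below only renders the text of such a file; the helper name is hypothetical, and only the parameter name and the value 1496 come from this report.

```python
# Sketch: render a minimal Heat environment file carrying the workaround.
# render_environment is a hypothetical helper, not part of any OpenStack tool.


def render_environment(physnet_mtu: int) -> str:
    """Return YAML text for an environment file that sets the physnet MTU."""
    return (
        "parameter_defaults:\n"
        f"  NeutronGlobalPhysnetMtu: {physnet_mtu}\n"
    )


print(render_environment(1496), end="")
```

The resulting file would be passed to `openstack overcloud deploy` with an additional `-e` argument, alongside the network isolation environment files already in use.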

Comment 4 Ihar Hrachyshka 2016-09-22 13:32:37 UTC
External 1500 to internal 1450 should still work, because fragmentation should happen at the L3 boundary (the router) if needed. The fact that a 4-byte reduction fixes the issue suggests that this may be upstream bug https://bugs.launchpad.net/neutron/+bug/1622017, which was fixed in RC1: https://review.openstack.org/#/c/368553/

Could you please retest with a newer python-neutron package that includes the patch mentioned above? Alternatively, you can check whether switching of_interface to ovs-ofctl helps.
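
To illustrate the fragmentation argument, here is a rough sketch of how a router would split a 1500-byte IPv4 packet arriving from the external network before forwarding it onto a link with a 1450-byte MTU. It assumes a 20-byte IPv4 header without options and that the Don't Fragment bit is not set (with DF set, the router would instead drop the packet and return an ICMP "fragmentation needed" message). The function name is made up for illustration.

```python
from typing import List

IPV4_HEADER = 20  # bytes, assuming no IP options


def fragment_lengths(packet_len: int, mtu: int) -> List[int]:
    """Return the total length (header included) of each IPv4 fragment."""
    if packet_len <= mtu:
        return [packet_len]  # fits as-is, no fragmentation needed
    payload = packet_len - IPV4_HEADER
    # Every fragment except the last must carry a multiple of 8 payload bytes,
    # because the fragment offset field counts in 8-byte units.
    max_chunk = (mtu - IPV4_HEADER) // 8 * 8
    lengths = []
    while payload > 0:
        chunk = min(max_chunk, payload)
        lengths.append(chunk + IPV4_HEADER)
        payload -= chunk
    return lengths


print(fragment_lengths(1500, 1450))  # [1444, 76]
```

Both fragments fit within the 1450-byte tenant MTU, which is why a 1500/1450 split on its own should not break connectivity.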

Comment 5 Marius Cornea 2016-09-22 14:08:39 UTC
(In reply to Ihar Hrachyshka from comment #4)

After applying https://review.openstack.org/#/c/368553/ to the overcloud image I was able to successfully SSH to the instance with the default MTU:

http://paste.openstack.org/show/582585/

Comment 6 Ihar Hrachyshka 2016-09-22 14:10:23 UTC
Comment 5 also suggests that the workaround of using an alternative driver for of_interface would help too. It's up to us how we proceed; I suggest a backport for neutron.
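
For anyone who wants to try the alternative driver, the sketch below shows one way to flip the option in an INI-style agent configuration. It assumes that of_interface lives in the `[ovs]` section of the Open vSwitch agent configuration and that `native` is the value being replaced; neither detail is stated in this report, and the sample text and helper name are hypothetical.

```python
import configparser
import io

# Hypothetical excerpt of an Open vSwitch agent configuration file.
# The [ovs] section and the "native" value are assumptions.
SAMPLE = """\
[ovs]
of_interface = native
"""


def set_of_interface(ini_text: str, driver: str) -> str:
    """Return ini_text with [ovs] of_interface set to the given driver."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    if not parser.has_section("ovs"):
        parser.add_section("ovs")
    parser.set("ovs", "of_interface", driver)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


print(set_of_interface(SAMPLE, "ovs-ofctl"))
```

The agent would need to be restarted after such a change, and on a director-managed overcloud a manual edit like this would likely be overwritten on the next deployment update.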

Comment 7 Assaf Muller 2016-09-22 14:21:03 UTC
Already fixed and merged; it will be available in the OSP 10 puddle based on RC1.

Comment 9 Assaf Muller 2016-09-22 15:44:00 UTC
We ended up backporting the fix, so it's available in an RPM based on M3. The cherry-pick will no longer be required once the RPM is rebased onto RC1.

Comment 10 Alexander Chuzhoy 2016-09-22 23:11:45 UTC
Verified:
Environment:
openstack-tripleo-heat-templates-5.0.0-0.20160907212643.90c852e.2.el7ost.noarch
openstack-neutron-9.0.0-0.20160907193737.dc6508a.1.el7ost.noarch

Was able to ssh into the launched instance.

Comment 11 Alexander Chuzhoy 2016-09-22 23:22:12 UTC
The MTU of the created VXLAN tenant network is: 1446

Comment 14 errata-xmlrpc 2016-12-14 16:03:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-2948.html