Description of problem:
Unable to reach an instance via its floating IP because of an MTU mismatch in a virtual environment with network isolation.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Deploy overcloud
2. Create a tenant network, a router and an external network
3. Launch an instance on the tenant network and attach a floating IP to it
4. Try to reach instance via floating IP
Actual results:
Unable to ssh:
[stack@undercloud-0 ~]$ ssh firstname.lastname@example.org
Connection closed by 172.16.18.146
Expected results:
Successful SSH connection.
Here are the created network details:
The MTU for the external network is 1500 and the one for the tenant network is 1450.
This is a default virt environment and the deployment uses network isolation. To work around this issue I had to pass the following parameter when deploying:
NeutronGlobalPhysnetMtu: 1496
which resulted in the external network having a MTU of 1500 and the tenant network an mtu of 1446
Previously this used to work by default, without any adjustments.
(In reply to Marius Cornea from comment #0)
> NeutronGlobalPhysnetMtu: 1496
> which resulted in the external network having a MTU of 1500 and the tenant
> network an mtu of 1446
Correction: the external network ends up with an MTU of 1496, not 1500.
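For reference, the MTU values reported above are consistent with subtracting a fixed VXLAN encapsulation overhead from the physical network MTU. The following is a minimal sketch of that arithmetic, assuming a 50-byte overhead for VXLAN over IPv4; the constant and the function name are illustrative and are not taken from the Neutron code.

```python
# Assumed VXLAN-over-IPv4 overhead in bytes:
# inner Ethernet 14 + outer IPv4 20 + UDP 8 + VXLAN 8 = 50.
VXLAN_OVERHEAD_IPV4 = 50


def vxlan_tenant_mtu(physnet_mtu: int, overhead: int = VXLAN_OVERHEAD_IPV4) -> int:
    """Return the largest tenant-network MTU that fits in physnet_mtu.

    Hypothetical helper for illustration only.
    """
    if physnet_mtu <= overhead:
        raise ValueError("physical MTU must be larger than the tunnel overhead")
    return physnet_mtu - overhead


if __name__ == "__main__":
    # 1500 is the default; 1496 is the NeutronGlobalPhysnetMtu workaround value.
    for physnet_mtu in (1500, 1496):
        print(physnet_mtu, "->", vxlan_tenant_mtu(physnet_mtu))
```

Under this assumption, a physical MTU of 1500 gives a tenant MTU of 1450 and 1496 gives 1446, matching the values seen in this report.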
External 1500 to internal 1450 should still work, because on L3 boundary (router) fragmentation should happen if needed. The fact that 4 bytes reduction helps to fix the issue suggests that it may be the upstream https://bugs.launchpad.net/neutron/+bug/1622017 that was fixed in RC1: https://review.openstack.org/#/c/368553/
Could you please retest with a newer python-neutron package that would include the patch I mentioned above? Alternatively, you can validate if switching to ovs-ofctl of_interface helps.
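To illustrate the fragmentation argument, here is a rough sketch of how a full-size 1500-byte IPv4 packet would be split when routed onto a link with a 1450 MTU. It assumes a 20-byte IPv4 header without options and that the DF (don't fragment) bit is not set; the function name is hypothetical and this is not how the router implements it.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options


def fragment_sizes(total_length: int, mtu: int, header: int = IPV4_HEADER) -> list[int]:
    """Return the total length of each IPv4 fragment for one packet.

    Every fragment except the last must carry a payload that is a
    multiple of 8 bytes, because fragment offsets are in 8-byte units.
    """
    if total_length <= mtu:
        return [total_length]  # fits as-is, no fragmentation needed
    max_payload = (mtu - header) // 8 * 8
    if max_payload <= 0:
        raise ValueError("MTU too small to carry any payload")
    payload = total_length - header
    sizes = []
    while payload > 0:
        chunk = min(payload, max_payload)
        sizes.append(chunk + header)
        payload -= chunk
    return sizes


if __name__ == "__main__":
    print(fragment_sizes(1500, 1450))
```

With these assumptions the packet is split into two fragments of 1444 and 76 bytes, so a 1500 to 1450 MTU step alone should not prevent connectivity.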
(In reply to Ihar Hrachyshka from comment #4)
> External 1500 to internal 1450 should still work, because on L3 boundary
> (router) fragmentation should happen if needed. The fact that 4 bytes
> reduction helps to fix the issue suggests that it may be the upstream
> https://bugs.launchpad.net/neutron/+bug/1622017 that was fixed in RC1:
> Could you please retest with a newer python-neutron package that would
> include the patch I mentioned above? Alternatively, you can validate if
> switching to ovs-ofctl of_interface helps.
After applying https://review.openstack.org/#/c/368553/ to the overcloud image I was able to successfully SSH to the instance with the default MTU:
Comment 5 also suggests that the workaround of using an alternative driver for of_interface would help too. It's up to us how we proceed. I suggest a backport for neutron.
Already fixed and merged, will be available in OSP 10 puddle based off RC1.
We ended up backporting the fix so it's available in an RPM based off M3. The cherry-pick will no longer be required once the RPM is rebased onto RC1.
Was able to ssh into the launched instance.
The MTU of the created VXLAN tenant network is: 1446
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.