Bug 1927047 - multiple external gateway pods will not work in ingress with IP fragmentation [NEEDINFO]
Summary: multiple external gateway pods will not work in ingress with IP fragmentation
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.8.0
Assignee: Tim Rozet
QA Contact: Anurag saxena
Duplicates: 1936010
Depends On:
Reported: 2021-02-09 22:28 UTC by Tim Rozet
Modified: 2021-07-27 22:43 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed: 2021-07-27 22:43:10 UTC
Target Upstream Version:
anusaxen: needinfo? (trozet)


System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift ovn-kubernetes pull 559 | 0 | None | open | Bug 1927047: Handling packet sizes greater than pod MTU | 2021-06-01 12:52:07 UTC
Github | ovn-org ovn-kubernetes pull 2225 | 0 | None | closed | Bug 1927047: Fixes handling large packets towards OVN | 2021-06-01 19:28:50 UTC
Red Hat Product Errata | RHSA-2021:2438 | 0 | None | None | None | 2021-07-27 22:43:32 UTC

Description Tim Rozet 2021-02-09 22:28:39 UTC
Description of problem:
Ingress cluster traffic sent directly towards pods will not be delivered if the traffic is IP fragmented. This includes UDP traffic. The traffic makes it into the node and OVS, then gets dropped as it is sent to the pod due to a refragmentation issue in OVS:


The purpose of this bug is to track this fix landing into OCP.

Comment 2 Tim Rozet 2021-03-16 15:03:04 UTC
*** Bug 1936010 has been marked as a duplicate of this bug. ***

Comment 3 Tim Rozet 2021-04-08 22:22:51 UTC
OVS patch was rejected upstream. So we need a different solution...

dcbw has an idea: make all of the MTUs in our cluster equal, then use ip route <pod network> mtu <max mtu - geneve overhead> to force pods to send traffic that will fit the tunnel. I think this is a great idea. Need to think about it more and how it will affect upgrades, etc.
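
As a rough illustration of the arithmetic behind that route, here is a minimal Python sketch. The 100-byte Geneve overhead, the pod network CIDR, and the device name are assumptions for the example, and the helper names are hypothetical; nothing here is taken from ovn-kubernetes.

```python
# Assumption: 100 bytes is reserved for Geneve encapsulation. Adjust this
# to whatever overhead applies in your cluster.
GENEVE_OVERHEAD = 100


def pod_route_mtu(max_mtu: int, overhead: int = GENEVE_OVERHEAD) -> int:
    """Return the largest packet size that still fits inside the tunnel."""
    if max_mtu <= overhead:
        raise ValueError("max_mtu must be larger than the tunnel overhead")
    return max_mtu - overhead


def route_command(pod_network: str, dev: str, max_mtu: int) -> str:
    """Build the ip route command that pins the MTU for the pod network."""
    return f"ip route replace {pod_network} dev {dev} mtu {pod_route_mtu(max_mtu)}"


if __name__ == "__main__":
    # Hypothetical values: example pod network CIDR and interface name.
    print(route_command("10.128.0.0/14", "eth0", 1500))
```

With every interface at the same MTU, only the route towards the pod network carries the reduced value, so traffic that never enters the tunnel could still use the full MTU.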

Comment 4 Tim Rozet 2021-04-20 14:41:16 UTC
Tried this idea out:

Unfortunately I think it is going to introduce more potential pitfalls: we would have to lower the MTU for some nodeport services as well as external IPs, because those would resolve to east/west endpoints. That is not very scalable because we would end up having to go into every pod and update routes based on service changes.

Spoke with the OVN team and the new plan is to just allow OVN to detect if the packet is too large (larger than the pod MTU) and then send the correct ICMP message to indicate that fragmentation is needed. Moving dependent bug to OVN team.
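
For reference, the IPv4 "fragmentation needed" message is ICMP type 3, code 4, carrying the next-hop MTU as described in RFC 1191. The Python sketch below only illustrates that wire format; it is not how OVN builds the message, and the function names are hypothetical. For IPv6 the equivalent is ICMPv6 type 2 (Packet Too Big).

```python
import struct


def internet_checksum(data: bytes) -> int:
    """Standard 16-bit one's-complement Internet checksum."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def icmp_frag_needed(next_hop_mtu: int, original_packet: bytes) -> bytes:
    """Build an ICMPv4 Destination Unreachable / fragmentation needed message.

    original_packet is the oversized IPv4 packet; the message quotes its
    IP header plus the first 8 bytes of its payload.
    """
    ihl = (original_packet[0] & 0x0F) * 4
    quoted = original_packet[: ihl + 8]
    # Fields: type=3, code=4, checksum, unused (16 bits), next-hop MTU.
    header = struct.pack("!BBHHH", 3, 4, 0, 0, next_hop_mtu)
    checksum = internet_checksum(header + quoted)
    return struct.pack("!BBHHH", 3, 4, checksum, 0, next_hop_mtu) + quoted
```

A sender that honours this message lowers its path MTU estimate to the advertised value and retransmits in smaller packets.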

Comment 6 Tim Rozet 2021-05-21 21:59:02 UTC
Note: in 4.8 this issue affects shared gateway mode when accessing services, but not local gateway mode. In local gateway mode, service packets are handled by the kernel, so the kernel responds with ICMP needs frag or packet too big. However, in 4.6->4.8 both gateway modes are affected for external gateway -> pod traffic: in both modes, packets go directly via br-ex to the pod and not via the kernel.

Rather than wait on an OVN fix to handle this case, I've implemented a fix in ovn-kubernetes for these releases where if we detect a packet in the respective mode that is going to be larger than the pod MTU, we send it to the kernel. The kernel then will send out the ICMP needs frag/pkt too big based on routes and next hop interfaces where we have the MTU set to the pod's MTU.
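
The decision described above can be modelled in a few lines of Python. This is only a sketch of the logic: the function names are hypothetical and this is not the ovn-kubernetes code.

```python
def steer(packet_len: int, pod_mtu: int) -> str:
    """Pick the path for an ingress packet headed to a pod.

    Packets larger than the pod MTU are handed to the kernel, which can
    answer with the appropriate ICMP error; everything else takes the
    direct path to the pod.
    """
    return "kernel" if packet_len > pod_mtu else "direct"


def kernel_reply(ip_version: int) -> str:
    """Name the ICMP error the kernel would be expected to send back."""
    if ip_version == 4:
        return "ICMP fragmentation needed"
    if ip_version == 6:
        return "ICMPv6 packet too big"
    raise ValueError("ip_version must be 4 or 6")


if __name__ == "__main__":
    # Hypothetical sizes: a 1500-byte packet arriving for a pod with MTU 1400.
    if steer(1500, 1400) == "kernel":
        print(kernel_reply(4))
```

Only oversized packets take the detour through the kernel, so traffic that already fits the pod MTU is handled as before.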


Comment 9 Anurag saxena 2021-06-10 14:14:50 UTC
@trozet@redhat.com ^

Comment 14 errata-xmlrpc 2021-07-27 22:43:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

