Bug 1927047

Summary: multiple external gateway pods will not work in ingress with IP fragmentation
Product: OpenShift Container Platform
Reporter: Tim Rozet <trozet>
Component: Networking
Assignee: Tim Rozet <trozet>
Networking sub component: ovn-kubernetes
QA Contact: Anurag saxena <anusaxen>
Status: CLOSED ERRATA
Severity: urgent
Priority: high
CC: aconstan, anbhat, astoycos, bbennett, dblack, jlema, mark.d.gray, murali, wking
Version: 4.6
Target Release: 4.8.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2021-07-27 22:43:10 UTC
Type: Bug

Description Tim Rozet 2021-02-09 22:28:39 UTC
Description of problem:
Ingress cluster traffic sent directly to pods is not delivered if the traffic is IP fragmented. This includes UDP traffic. The traffic reaches the node and OVS, but is then dropped when it is sent to the pod because of a refragmentation issue in OVS:

https://bugzilla.redhat.com/show_bug.cgi?id=1927046

The purpose of this bug is to track this fix landing into OCP.
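For readers less familiar with IP fragmentation, the sketch below shows how a sender splits an oversized UDP datagram into IPv4 fragments under the RFC 791 rules: every fragment except the last carries a multiple of 8 data bytes, the Fragment Offset field counts 8-byte units, and the More Fragments (MF) flag is set on all but the last fragment. It assumes a 20-byte IPv4 header with no options and models the protocol only; it is not the OVS reassembly or refragmentation code. Note that only the first fragment carries the UDP header, so anything that needs the ports has to reassemble the datagram first.

```python
IPV4_HEADER_LEN = 20  # assumption: minimal IPv4 header, no options


def fragment_ipv4(ip_payload_len: int, mtu: int) -> list[tuple[int, int, bool]]:
    """Split an IPv4 payload into fragments that fit a link MTU.

    Returns one (offset_bytes, data_len, more_fragments) tuple per fragment.
    The on-the-wire Fragment Offset field is offset_bytes // 8.
    """
    # Every fragment except the last must carry a multiple of 8 data bytes.
    max_data = (mtu - IPV4_HEADER_LEN) // 8 * 8
    if max_data <= 0:
        raise ValueError(f"MTU {mtu} is too small to carry any fragment data")

    fragments = []
    offset = 0
    remaining = ip_payload_len
    while remaining > max_data:
        fragments.append((offset, max_data, True))  # MF flag set
        offset += max_data
        remaining -= max_data
    fragments.append((offset, remaining, False))  # last fragment: MF clear
    return fragments


# A 3000-byte UDP payload plus the 8-byte UDP header, sent over a 1500-byte MTU link.
for off, length, more in fragment_ipv4(3000 + 8, 1500):
    print(f"offset={off:5d} (field={off // 8:4d}) len={length:4d} MF={int(more)}")
```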

Comment 2 Tim Rozet 2021-03-16 15:03:04 UTC
*** Bug 1936010 has been marked as a duplicate of this bug. ***

Comment 3 Tim Rozet 2021-04-08 22:22:51 UTC
The OVS patch was rejected upstream, so we need a different solution.

dcbw has an idea to make all of the MTUs in our cluster equal and then use ip route <pod network> mtu <max MTU - Geneve overhead> to force pods to send traffic that fits in the tunnel. I think this is a great idea, but I need to think more about it and how it will affect upgrades, etc.
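A minimal sketch of the arithmetic behind this idea, assuming a 100-byte Geneve overhead (the figure OpenShift documentation budgets for OVN-Kubernetes), a pod interface named eth0, and a hypothetical 10.128.0.0/14 pod network. The helper only builds the iproute2 command and does not run it; real pod routes would likely also need a `via` gateway, which is omitted here.

```python
import ipaddress

# Assumption: 100 bytes is the Geneve overhead OpenShift documentation budgets
# for OVN-Kubernetes; the exact value depends on the encapsulation in use.
GENEVE_OVERHEAD = 100


def pod_network_route_cmd(pod_network: str, max_mtu: int, dev: str = "eth0",
                          overhead: int = GENEVE_OVERHEAD) -> list[str]:
    """Build (but do not run) an iproute2 command that clamps the route MTU
    for the pod network to max_mtu minus the tunnel overhead."""
    net = ipaddress.ip_network(pod_network)  # raises ValueError on a bad CIDR
    route_mtu = max_mtu - overhead
    min_mtu = 1280 if net.version == 6 else 68  # protocol minimum MTUs
    if route_mtu < min_mtu:
        raise ValueError(f"route MTU {route_mtu} is below the IPv{net.version} minimum")
    return ["ip", "route", "replace", str(net), "dev", dev, "mtu", str(route_mtu)]


# Hypothetical values: a 10.128.0.0/14 pod network and a 1500-byte physical MTU.
print(" ".join(pod_network_route_cmd("10.128.0.0/14", 1500)))
```

Using `replace` instead of `add` keeps the command idempotent if it is reapplied, for example after the cluster MTU changes.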

Comment 4 Tim Rozet 2021-04-20 14:41:16 UTC
Tried this idea out:
https://github.com/trozet/ovn-kubernetes/commit/9c023c69fddc4ddb2f2d9720dc8fccc035140c0d

Unfortunately, I think it introduces more potential pitfalls: we would also have to lower the MTU for some NodePort services and external IPs, because those can resolve to east/west endpoints. That does not scale well, because we would have to go into every pod and update routes whenever services change.

I spoke with the OVN team, and the new plan is to let OVN detect when a packet is too large (larger than the pod MTU) and send the correct ICMP message indicating that fragmentation is needed. Moving the dependent bug to the OVN team.
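For IPv4, "the correct ICMP message" is Destination Unreachable (type 3) with code 4, "fragmentation needed and DF set", carrying the next-hop MTU as defined in RFC 1191 so the sender can lower its path MTU. The sketch below builds such a message purely to illustrate the wire format; it is not the OVN implementation, and the IPv6 equivalent (ICMPv6 Packet Too Big, whose checksum also covers a pseudo-header) is not covered.

```python
import struct

ICMP_DEST_UNREACH = 3
ICMP_FRAG_NEEDED = 4  # code 4: fragmentation needed and DF set


def internet_checksum(data: bytes) -> int:
    """RFC 1071 one's-complement checksum over 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def icmp_frag_needed(next_hop_mtu: int, offending_packet: bytes) -> bytes:
    """Build an ICMPv4 Destination Unreachable / Fragmentation Needed message.

    Layout (RFC 792, RFC 1191): type, code, checksum, 16 unused bits,
    16-bit Next-Hop MTU, then the offending IP header plus 8 payload bytes.
    """
    ihl = (offending_packet[0] & 0x0F) * 4  # IPv4 header length in bytes
    quoted = offending_packet[: ihl + 8]
    unsigned = struct.pack("!BBHHH", ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, 0, 0, next_hop_mtu)
    csum = internet_checksum(unsigned + quoted)
    header = struct.pack("!BBHHH", ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, csum, 0, next_hop_mtu)
    return header + quoted


# Hypothetical oversized packet: a zeroed 20-byte IPv4 header (IHL=5) plus payload.
packet = bytes([0x45]) + bytes(19) + bytes(range(8)) + bytes(1500)
print(icmp_frag_needed(1400, packet)[:8].hex())
```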

Comment 6 Tim Rozet 2021-05-21 21:59:02 UTC
Note: in 4.8, this issue affects service access in shared gateway mode but not in local gateway mode. In local gateway mode, service packets are handled by the kernel, so the kernel responds with ICMP "fragmentation needed" or "packet too big". However, in 4.6 through 4.8, both gateway modes are affected for external gateway -> pod traffic: in both modes, packets go directly via br-ex to the pod rather than through the kernel.

Rather than wait for an OVN fix to handle this case, I've implemented a fix in ovn-kubernetes for these releases: if we detect, in the affected mode, a packet that will be larger than the pod MTU, we send it to the kernel. The kernel then sends the ICMP "needs frag"/"packet too big" message based on its routes and next-hop interfaces, where we have set the MTU to the pod's MTU.

https://github.com/ovn-org/ovn-kubernetes/pull/2225
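The decision described above can be modeled as two steps: a length check that punts oversized packets to the host, and the kernel's standard forwarding behavior once it has them. The sketch below uses hypothetical function names and is not the code from the pull request; `kernel_action` is a simplified model of the IPv4/IPv6 forwarding rules (RFC 791, RFC 1191, RFC 8200), not a trace of Linux internals. It also shows one nuance: for IPv4 packets without DF set, a forwarding host fragments them instead of sending ICMP.

```python
from enum import Enum


class Action(Enum):
    FORWARD = "forward to the pod"
    FRAGMENT = "fragment, then forward"
    ICMP_FRAG_NEEDED = "ICMPv4 type 3 code 4 (fragmentation needed)"
    ICMPV6_PACKET_TOO_BIG = "ICMPv6 type 2 code 0 (packet too big)"


def punt_to_kernel(packet_len: int, pod_mtu: int) -> bool:
    """Hypothetical br-ex check: packets larger than the pod MTU go to the host kernel."""
    return packet_len > pod_mtu


def kernel_action(ip_version: int, packet_len: int, route_mtu: int, df_set: bool) -> Action:
    """Simplified forwarding decision toward a route whose MTU is route_mtu."""
    if packet_len <= route_mtu:
        return Action.FORWARD
    if ip_version == 6:
        return Action.ICMPV6_PACKET_TOO_BIG  # IPv6 routers never fragment in transit
    if df_set:
        return Action.ICMP_FRAG_NEEDED
    return Action.FRAGMENT


# Hypothetical case: pod MTU 1400, a 1500-byte IPv4 packet with DF set.
if punt_to_kernel(1500, 1400):
    print(kernel_action(4, 1500, 1400, df_set=True).value)
```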

Comment 9 Anurag saxena 2021-06-10 14:14:50 UTC
@trozet ^

Comment 14 errata-xmlrpc 2021-07-27 22:43:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

Comment 15 Red Hat Bugzilla 2023-09-15 01:00:53 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days