The current version of OVN pipelines relies on L4 port matching in some of the OpenFlow rules. However, L4 ports cannot be matched on later IP fragments, because only the first fragment carries the L4 header. For that reason, OVN even has a special stage called 'lr_in_defrag' that passes all traffic through conntrack in the hope that conntrack will reassemble the packet. But that is not a generally correct assumption.

OpenFlow defines several fragment handling modes for a switch:
- OFPC_FRAG_NORMAL = 0, /* No special handling for fragments. */
- OFPC_FRAG_DROP = 1, /* Drop fragments. */
- OFPC_FRAG_REASM = 2, /* Reassemble (only if OFPC_IP_REASM set). */

Open vSwitch adds the following extension to the list:
- OFPC_FRAG_NX_MATCH = 3, /* Make first fragments available for matching. */

OFPC_FRAG_NX_MATCH is the default mode in OVS, and OVS does not support OFPC_FRAG_REASM. So, from the OpenFlow point of view, users cannot use L4 information on later fragments, and users cannot expect Open vSwitch to reassemble fragmented IP packets.

The fact that kernel conntrack does reassemble IP fragments is an unfortunate side effect of re-using the kernel connection tracking implementation, and it causes a lot of issues for the OVS pipeline. A few examples:
- The need to re-fragment reassembled packets on egress. This is a problem because, in theory, re-fragmentation has to slice the packet into exactly the same fragments it was originally sliced into. That is hard to do, and the current implementation can lead to forwarding NEEDS_FRAG replies back for fragments that the source never sent.
- OVS actions like truncated output and check_pkt_len may not work correctly, because a reassembled packet obviously has a different length. See some lengthy discussions on this patch: https://lore.kernel.org/all/20210319204307.3128280-1-aconole@redhat.com/
- And the final point is that userspace conntrack behaves differently. :) It doesn't reassemble fragmented packets, but releases all the fragments back to the datapath in exactly the same form in which they came into it.
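To illustrate why L4 matching cannot work on later fragments, here is a minimal C sketch (not OVS or OVN code) that classifies an IPv4 packet from its flags/fragment-offset header field, roughly in the spirit of the OVS nw_frag match values "no", "first" and "later". The enum, macro and function names are hypothetical, and the field is assumed to already be in host byte order. Only a packet with a zero fragment offset carries the L4 header, so a rule matching on tp_src/tp_dst has nothing to match in a "later" fragment.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names, loosely modeled on the OVS nw_frag match values. */
enum frag_class {
    FRAG_NONE,  /* Not a fragment: the full L4 header is present. */
    FRAG_FIRST, /* First fragment: offset 0, MF set, L4 header present. */
    FRAG_LATER  /* Later fragment: offset != 0, no L4 header at all. */
};

/* IPv4 flags/fragment-offset field layout (RFC 791). */
#define IPV4_FLAG_MF     0x2000u /* "More Fragments" flag. */
#define IPV4_OFFSET_MASK 0x1fffu /* Fragment offset, in 8-byte units. */

/* Classify a packet from its frag_off field (host byte order assumed).
 * The DF flag (0x4000) does not affect the classification. */
enum frag_class classify_ipv4_frag(uint16_t frag_off)
{
    if (frag_off & IPV4_OFFSET_MASK) {
        return FRAG_LATER;   /* Payload starts mid-datagram. */
    }
    if (frag_off & IPV4_FLAG_MF) {
        return FRAG_FIRST;   /* Offset 0, but more fragments follow. */
    }
    return FRAG_NONE;        /* Offset 0 and no MF: unfragmented. */
}

/* L4 ports can be matched only if the packet carries the L4 header. */
bool l4_ports_matchable(uint16_t frag_off)
{
    return classify_ipv4_frag(frag_off) != FRAG_LATER;
}
```

For example, when a 3000-byte IPv4 packet is fragmented for a 1500-byte MTU, the second fragment starts at payload byte 1480, so its offset field is 1480 / 8 = 185 and, with MF set, frag_off is 0x2000 | 185. That fragment is classified as FRAG_LATER, and a port-based rule cannot apply to it unless something reassembles the packet first.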
So, after the ct() action these packets have a conntrack state, but they are still separate IP fragments, so their L4 information cannot be matched. In other words, the ability to reassemble IP fragments is backed by neither the OpenFlow specification nor the different datapath implementations. In general, OVN cannot rely on IP fragments being reassembled and needs to build its pipelines accordingly. One potential solution could be to use ct mark/label to store the required L4 information, so that it will be available to later fragments, or ct_tp_src/dst might be utilized somehow. But I didn't think this through. An alternative might be to make the userspace datapath mimic the kernel conntrack. However, that might not be acceptable due to a likely noticeable performance drop and a potential requirement to re-work the whole dp-packet management in order to accommodate multi-buffer packets, which has been attempted several times before without success.
Just to give more context: with the userspace datapath, the test in the following patch currently fails: https://patchwork.ozlabs.org/project/ovn/patch/20230309062144.726961-1-amusil@redhat.com/
We need to resolve https://issues.redhat.com/browse/FDP-124 first before proceeding.
This issue is being closed because it is one of three open OVN Bugzilla issues. If this issue is still a problem in modern OVN versions, please create a Jira issue.