Description of problem:
In the router pipeline, after conntrack recirculation (for NAT), we don't check the conntrack state and continue processing packets even if they were marked as +trk+inv. In specific cases, some of which are described in bug 2130939 (for example, FIN packets processed without any prior DATA packets, or out-of-order RST packets), the packets will not be NAT-ed. If NB_Global.options:use_ct_inv_match is "true" (the current OVN default), these packets should be dropped after the logical router NAT stages.

Steps to Reproduce:
1. Send TCP traffic that should be SNAT-ed by an OVN gateway.
2. Fail over to a new gateway and force the connection to be closed (without sending any data).

Actual results:
The FIN packet leaves the OVN cluster without being SNAT-ed.

Expected results:
The FIN packet should either be SNAT-ed or dropped.

Additional info:
The issue of RST packets not being SNAT-ed can be hit with the steps described in https://bugzilla.redhat.com/show_bug.cgi?id=2130939#c8
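
For illustration only, the expected behaviour described above can be modelled as a small decision function. This is a sketch, not OVN code: the `CtState` class and the `post_nat_verdict` function are hypothetical names, and the model assumes that only the +trk and +inv bits matter for the decision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CtState:
    """Hypothetical model of the two conntrack state bits discussed here."""
    trk: bool  # packet has been through conntrack
    inv: bool  # conntrack marked the packet as invalid


def post_nat_verdict(state: CtState, use_ct_inv_match: bool) -> str:
    """Return "drop" or "continue" for a packet after NAT recirculation.

    Assumption: with use_ct_inv_match enabled, a +trk+inv packet is
    dropped instead of being forwarded without NAT.
    """
    if use_ct_inv_match and state.trk and state.inv:
        return "drop"
    return "continue"


# A FIN with no prior DATA after a gateway failover would be +trk+inv.
fin_after_failover = CtState(trk=True, inv=True)
verdict = post_nat_verdict(fin_after_failover, use_ct_inv_match=True)
```

With `use_ct_inv_match` set to false, the sketch lets the packet continue, since in that mode the pipeline is assumed not to match on the invalid bit at all.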
upstream support: https://patchwork.ozlabs.org/project/ovn/patch/34c8edba46bedd90656fd5603a85c9cbe7a34e99.1675807627.git.lorenzo.bianconi@redhat.com/
ovn23.03 fast-datapath-rhel-8 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2181414
ovn23.03 fast-datapath-rhel-9 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2181415
Moving back to ASSIGNED as the fix was reverted shortly after it was applied. The revert was done via https://github.com/ovn-org/ovn/commit/0c71712b35. No released OVN version (upstream or downstream) contains the original patch anymore.
During our sprint planning meeting today, we discussed this issue. The idea we came up with was to send all packets that traverse a logical router with a load balancer to conntrack. This is similar to what we currently do on logical switches that have a stateful ACL or load balancer on them. This way, we can properly determine whether packets that bypassed conntrack to go directly to a load balancer backend are invalid or not. I have updated the devel whiteboard to remove the "ovn-synced" and clones since this issue will go through ovn-sync automation again and will need to be updated properly. I have also unassigned this issue from Lorenzo since he doesn't need to be on the hook for the enhanced scope of this issue.
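
As a rough sketch of the rule proposed above (not an implementation, and not how ovn-northd is structured), the decision of which datapaths send all traffic through conntrack could be written as a predicate. The function name `sends_all_traffic_to_conntrack` and its parameters are hypothetical.

```python
def sends_all_traffic_to_conntrack(kind: str,
                                   has_load_balancer: bool,
                                   has_stateful_acl: bool = False) -> bool:
    """Hypothetical predicate for the proposal in this comment.

    kind is "switch" or "router". Switches already send traffic to
    conntrack when they have a stateful ACL or a load balancer; the
    proposal extends the same idea to routers with a load balancer.
    """
    if kind == "switch":
        return has_stateful_acl or has_load_balancer
    if kind == "router":
        return has_load_balancer
    raise ValueError(f"unknown datapath kind: {kind!r}")
```

Under this model, a router without a load balancer keeps its current behaviour, so the extra conntrack pass would only affect routers where invalid packets could otherwise slip through to a backend.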
ovn23.06 fast-datapath-rhel-8 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2203010
ovn23.06 fast-datapath-rhel-9 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2203011
This issue is being closed as an automatic process due to the issue's age. If you wish to re-open this issue, please do so in Jira (https://issues.redhat.com) in the 'FDP' project. Please be sure to set the component to the latest OVN version where this issue is known to occur. If this is a feature request or improvement, please set the component to 'OVN'.