Bug 2160685

Summary: Drop packets with ct_state +trk+inv in the router pipeline.
Product: Red Hat Enterprise Linux Fast Datapath
Reporter: Dumitru Ceara <dceara>
Component: ovn22.12
Assignee: OVN Team <ovnteam>
Status: CLOSED WONTFIX
QA Contact: Jianlin Shi <jishi>
Severity: unspecified
Priority: high
Version: FDP 22.L
CC: ctrautma, jiji, lorenzo.bianconi, mmichels
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2024-02-14 21:15:38 UTC
Type: Bug

Description Dumitru Ceara 2023-01-13 10:30:55 UTC
Description of problem:

In the router pipeline, after conntrack recirculation (for NAT), we don't check conntrack state and continue processing packets even if they were marked as +trk+inv.

In specific cases, some of which are described in bug 2130939 (for example, FIN packets processed without any prior DATA packets, or out-of-order RST packets), the packets will not be NAT-ed.  If NB_Global.options:use_ct_inv_match is "true" (the current OVN default), these packets should be dropped after the logical router NAT stages.
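
To make the requested behavior concrete, here is a minimal Python sketch of the post-NAT decision. It is only an illustrative model: the CtState class and the post_nat_verdict function are hypothetical names, not OVN code, and the real pipeline is expressed as logical flows rather than Python.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CtState:
    """Hypothetical model of the two conntrack state bits discussed here."""
    trk: bool = False  # packet has been through conntrack
    inv: bool = False  # conntrack marked the packet as invalid


def post_nat_verdict(ct: CtState, use_ct_inv_match: bool = True) -> str:
    """Return "drop" or "continue" for a packet leaving the router NAT stages.

    Models the behavior requested in this report: when use_ct_inv_match
    is enabled, a packet that is both tracked and invalid (+trk+inv)
    is dropped instead of being forwarded without NAT.
    """
    if use_ct_inv_match and ct.trk and ct.inv:
        return "drop"
    return "continue"


# A FIN seen without prior data after a gateway failover would be +trk+inv.
print(post_nat_verdict(CtState(trk=True, inv=True)))         # drop
print(post_nat_verdict(CtState(trk=True, inv=False)))        # continue
print(post_nat_verdict(CtState(trk=True, inv=True), False))  # continue
```

With use_ct_inv_match disabled, the sketch lets the packet continue, since in that mode the pipeline is assumed not to match on the inv bit at all.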

Steps to Reproduce:

Send TCP traffic that should be SNATed by an OVN gateway.  Fail over to a new gateway and force the connection to be closed (without sending any data).

Actual results:
The FIN packet leaves the OVN cluster without being SNAT-ed.

Expected results:
The FIN packet should either be SNAT-ed or dropped.

Additional info:
The issue of RST packets not being SNAT-ed can be reproduced with the steps described in https://bugzilla.redhat.com/show_bug.cgi?id=2130939#c8

Comment 2 OVN Bot 2023-03-24 04:08:19 UTC
ovn23.03 fast-datapath-rhel-8 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2181414
ovn23.03 fast-datapath-rhel-9 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2181415

Comment 3 Dumitru Ceara 2023-04-20 08:02:23 UTC
Moving back to ASSIGNED as the fix actually got reverted quickly after it was applied.

The revert was done via https://github.com/ovn-org/ovn/commit/0c71712b35.

No released OVN version (upstream or downstream) has the original patch anymore.

Comment 4 Mark Michelson 2023-05-09 15:07:03 UTC
During our sprint planning meeting today, we discussed this issue.

The idea we came up with was to send all packets that traverse a logical router with a load balancer to conntrack.  This is similar to what we currently do on logical switches that have a stateful ACL or load balancer on them.

This way, we can properly determine whether packets that bypassed conntrack to go directly to a load balancer backend are invalid or not.
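
The proposal can be summarized with a small Python sketch. This is a simplified model of the idea only, under the assumption that the decision depends solely on the datapath type and its configuration; the Datapath class and send_all_to_conntrack function are hypothetical names and do not exist in OVN.

```python
from dataclasses import dataclass


@dataclass
class Datapath:
    """Hypothetical, simplified view of a logical switch or router."""
    kind: str                      # "switch" or "router"
    has_stateful_acl: bool = False
    has_load_balancer: bool = False


def send_all_to_conntrack(dp: Datapath) -> bool:
    """Decide whether every packet on this datapath goes through conntrack."""
    if dp.kind == "switch":
        # Existing behavior described above: stateful ACL or load balancer.
        return dp.has_stateful_acl or dp.has_load_balancer
    if dp.kind == "router":
        # Proposed behavior: any router with a load balancer.
        return dp.has_load_balancer
    raise ValueError(f"unknown datapath kind: {dp.kind!r}")


print(send_all_to_conntrack(Datapath("router", has_load_balancer=True)))  # True
print(send_all_to_conntrack(Datapath("router")))                          # False
print(send_all_to_conntrack(Datapath("switch", has_stateful_acl=True)))   # True
```

Once all packets on such a router are tracked, the +trk+inv check can be applied uniformly, including to packets that previously bypassed conntrack.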

I have updated the devel whiteboard to remove the "ovn-synced" and clones, since this issue will go through ovn-sync automation again and will need to be updated properly. I have also unassigned this issue from Lorenzo, since he doesn't need to be on the hook for the enhanced scope of this issue.

Comment 5 OVN Bot 2023-05-11 04:10:21 UTC
ovn23.06 fast-datapath-rhel-8 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2203010
ovn23.06 fast-datapath-rhel-9 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2203011

Comment 6 OVN Bot 2024-02-14 21:15:37 UTC
This issue is being closed as an automatic process due to the issue's age. If you wish to re-open this issue, please do so in Jira (https://issues.redhat.com) in the 'FDP' project. Please be sure to set the component to the latest OVN version where this issue is known to occur. If this is a feature request or improvement, please set the component to 'OVN'.