Bug 2178962 - Revisit OVN's logic of flushing conntrack
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: ovn23.03
Version: FDP 23.A
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Ales Musil
QA Contact: ying xu
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-03-16 09:54 UTC by Surya Seetharaman
Modified: 2023-07-06 20:05 UTC

Fixed In Version: ovn23.03-23.03.0-8.el8fdp
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-07-06 20:05:24 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker FD-2747 0 None None None 2023-03-16 09:55:50 UTC
Red Hat Product Errata RHBA-2023:3991 0 None None None 2023-07-06 20:05:36 UTC

Description Surya Seetharaman 2023-03-16 09:54:44 UTC
Description of problem:

After the RFE https://bugzilla.redhat.com/show_bug.cgi?id=1839103 was implemented (patch: https://patchwork.ozlabs.org/project/ovn/patch/20230124114622.37867-2-amusil@redhat.com/),
OVN started flushing conntrack entries for services (VIPs -> backends) whenever the 5-tuple changes, that is, when the LB is deleted or when the VIP, VIP port, backend IP, backend port, or protocol changes.

From the OVNK point of view, we only care about flushing for UDP. Flushing conntrack for TCP/SCTP is not desired because we want graceful termination for endpoints that are going away.
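
To make the behaviour described above concrete, here is a minimal sketch of it. It is a model only, not OVN's implementation: the `LbTuple` type, the function names, and the UDP sample entry are hypothetical and chosen for illustration. A mapping that existed before an LB update and is gone afterwards is treated as stale, and the CMS preference (flush UDP only) is expressed as a protocol filter.

```python
from typing import FrozenSet, NamedTuple, Set


class LbTuple(NamedTuple):
    """One VIP -> backend mapping of a load balancer (hypothetical model)."""
    proto: str
    vip: str
    vip_port: int
    backend_ip: str
    backend_port: int


def stale_tuples(old: Set[LbTuple], new: Set[LbTuple]) -> Set[LbTuple]:
    """Mappings that existed before the update and no longer exist after it."""
    return old - new


def tuples_to_flush(old: Set[LbTuple], new: Set[LbTuple],
                    protocols: FrozenSet[str] = frozenset({"udp"})) -> Set[LbTuple]:
    """Stale mappings restricted to the protocols the CMS wants flushed."""
    return {t for t in stale_tuples(old, new) if t.proto in protocols}


# Sample data: the TCP entries reuse addresses from this report,
# the UDP entry is made up for illustration.
old = {
    LbTuple("tcp", "192.168.2.100", 8080, "192.168.2.2", 80),
    LbTuple("tcp", "192.168.2.100", 8080, "192.168.2.3", 80),
    LbTuple("udp", "192.168.2.100", 53, "192.168.2.2", 53),
}
new = {LbTuple("tcp", "192.168.2.100", 8080, "192.168.2.2", 80)}

for t in sorted(tuples_to_flush(old, new)):
    print("would flush:", t)
```

With "flush on any 5-tuple change", both stale mappings (one TCP, one UDP) would be flushed; with the UDP-only filter, the TCP connection to the removed backend is left to terminate on its own.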

Another issue is that OVN currently flushes CT entries in all zones, including the 64xxx zones owned by OVNK.

This has been causing issues downstream in OpenShift CI:
we see a CT entry that is ESTABLISHED but UNREPLIED:

tcp      6 157 ESTABLISHED src=10.131.0.6 dst=10.0.178.50 sport=44680 dport=6443 [UNREPLIED] src=10.0.178.50 dst=10.0.139.173 sport=6443 dport=44680 mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
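
When scanning a large conntrack listing for this symptom, a small parser can help. The sketch below assumes the whitespace-separated `key=value` layout of the entry above; the helper names `parse_entry` and `is_established_unreplied` are hypothetical, and the parser is not a complete grammar for conntrack output.

```python
# The entry quoted in this report, split across lines for readability.
ENTRY = (
    "tcp      6 157 ESTABLISHED src=10.131.0.6 dst=10.0.178.50 "
    "sport=44680 dport=6443 [UNREPLIED] src=10.0.178.50 "
    "dst=10.0.139.173 sport=6443 dport=44680 mark=0 "
    "secctx=system_u:object_r:unlabeled_t:s0 use=1"
)

ADDR_KEYS = ("src", "dst", "sport", "dport")


def parse_entry(line: str) -> dict:
    """Split one conntrack line into protocol, state, flags and both tuples."""
    tokens = line.split()
    entry = {"proto": tokens[0], "state": None, "flags": [],
             "orig": {}, "reply": {}}
    for tok in tokens[1:]:
        if tok.startswith("[") and tok.endswith("]"):
            entry["flags"].append(tok[1:-1])        # e.g. UNREPLIED
        elif "=" in tok:
            key, value = tok.split("=", 1)
            if key in ADDR_KEYS:
                # The first occurrence of a key belongs to the original
                # direction, the second one to the reply direction.
                side = "orig" if key not in entry["orig"] else "reply"
                entry[side][key] = value
        elif tok.isupper():
            entry["state"] = tok                    # e.g. ESTABLISHED
    return entry


def is_established_unreplied(entry: dict) -> bool:
    """True for the combination reported here: ESTABLISHED yet UNREPLIED."""
    return entry["state"] == "ESTABLISHED" and "UNREPLIED" in entry["flags"]


parsed = parse_entry(ENTRY)
print(parsed["orig"], parsed["reply"], is_established_unreplied(parsed))
```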

Side note: there are other subtle issues as well. For example, when OVNK merges LBs it changes their names, which causes libovsdb to issue the LB update as a transaction with two ops, a deletion and an addition of the load balancer, and that triggers a flush on the OVN side. These flushes are causing network issues and disruptions all over. This is probably something that OVNK needs to fix, but the point is that OVN currently does not have enough information to determine when a flush is right for the CMS and when it is not. The "always flush as soon as any part of the 5-tuple changes" logic is not working well for OVNK.

We need to revisit the logic behind OVN's CT flush. For now, we could revert the patch, add a knob that makes flushing opt-in, or give the CMS a way to stop OVN from flushing.
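
The opt-in variant could look roughly like the sketch below: a per-load-balancer option that defaults to off. The option name `ct_flush` is the one used in the verification comment later in this report; the function name and the exact string handling are assumptions and are not taken from the OVN sources.

```python
def flush_enabled(lb_options: dict) -> bool:
    """Opt-in check: flush only if the LB explicitly sets ct_flush to true.

    `lb_options` stands in for the load balancer's options column as a plain
    dict of strings (an assumption made for this sketch).
    """
    return lb_options.get("ct_flush", "false").lower() == "true"


print(flush_enabled({"ct_flush": "true"}))   # option set explicitly
print(flush_enabled({}))                     # option absent: no flush
```

Defaulting to "no flush" keeps the behaviour from before the RFE for any CMS that does not set the option.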






Comment 1 Surya Seetharaman 2023-03-16 09:55:37 UTC
See https://redhat-internal.slack.com/archives/C01G7T6SYSD/p1678895123613709 for details

Comment 3 OVN Bot 2023-03-17 04:07:51 UTC
ovn23.03 fast-datapath-rhel-9 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2179215

Comment 7 ying xu 2023-05-05 06:45:54 UTC
Verified on version:
# rpm -qa|grep ovn23
ovn23.03-23.03.0-24.el8fdp.x86_64
ovn23.03-host-23.03.0-24.el8fdp.x86_64
ovn23.03-central-23.03.0-24.el8fdp.x86_64

With option ct_flush=true:
ovn-nbctl lb-add lb2 192.168.2.100:8080 192.168.2.2:80,192.168.2.3:80  -- set load_balancer lb2 options:ct_flush="true"
ovn-nbctl ls-lb-add foo lb2

Check the conntrack entries:
# ovs-appctl dpctl/dump-conntrack|grep 192.168.2.100
tcp,orig=(src=192.168.1.2,dst=192.168.2.100,sport=37316,dport=8080),reply=(src=192.168.2.3,dst=192.168.1.2,sport=80,dport=37316),zone=13,mark=2,protoinfo=(state=SYN_SENT)
tcp,orig=(src=192.168.1.2,dst=192.168.2.100,sport=53580,dport=8080),reply=(src=192.168.2.3,dst=192.168.1.2,sport=80,dport=53580),zone=13,mark=2,protoinfo=(state=SYN_SENT)

Delete the LB; the conntrack entries are flushed:
# ovn-nbctl lb-del lb2
# ovs-appctl dpctl/dump-conntrack|grep 192.168.2.100   ------------flushed

Without ct_flush=true, the conntrack entries are not flushed when the LB is deleted:
# ovn-nbctl lb-del lb2
# ovs-appctl dpctl/dump-conntrack|grep 192.168.2.100
tcp,orig=(src=192.168.1.2,dst=192.168.2.100,sport=40054,dport=8080),reply=(src=192.168.2.100,dst=192.168.1.2,sport=8080,dport=40054),zone=13,protoinfo=(state=SYN_SENT)  -------not flushed.
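
The grep-based check above can also be scripted. The following sketch assumes the `orig=(src=...,dst=...,` layout of the dump lines shown in this comment; the function name `entries_for_vip` is hypothetical. Unlike the plain grep, it matches only the destination of the original direction, so a VIP that merely appears in a reply tuple is not counted.

```python
import re


def entries_for_vip(dump: str, vip: str) -> list:
    """Return the dump lines whose original-direction destination is `vip`."""
    pattern = re.compile(r"orig=\(src=[^,]+,dst=" + re.escape(vip) + r",")
    return [line for line in dump.splitlines() if pattern.search(line)]


# Output captured before the LB was deleted (copied from this comment).
BEFORE_DELETE = (
    "tcp,orig=(src=192.168.1.2,dst=192.168.2.100,sport=37316,dport=8080),"
    "reply=(src=192.168.2.3,dst=192.168.1.2,sport=80,dport=37316),"
    "zone=13,mark=2,protoinfo=(state=SYN_SENT)\n"
    "tcp,orig=(src=192.168.1.2,dst=192.168.2.100,sport=53580,dport=8080),"
    "reply=(src=192.168.2.3,dst=192.168.1.2,sport=80,dport=53580),"
    "zone=13,mark=2,protoinfo=(state=SYN_SENT)\n"
)
# With ct_flush=true the same command printed nothing after lb-del.
AFTER_DELETE = ""

print(len(entries_for_vip(BEFORE_DELETE, "192.168.2.100")))
print(len(entries_for_vip(AFTER_DELETE, "192.168.2.100")))
```

In a real check the `dump` argument would be the text printed by `ovs-appctl dpctl/dump-conntrack`, captured once before and once after `ovn-nbctl lb-del`.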

Comment 11 errata-xmlrpc 2023-07-06 20:05:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (ovn23.03 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:3991

