Bug 2178962

Summary: Revisit OVN's logic of flushing conntrack
Product: Red Hat Enterprise Linux Fast Datapath
Reporter: Surya Seetharaman <surya>
Component: ovn23.03
Assignee: Ales Musil <amusil>
Status: CLOSED ERRATA
QA Contact: ying xu <yinxu>
Severity: urgent
Priority: unspecified
Version: FDP 23.A
CC: amusil, ctrautma, dceara, jiji, jishi
Hardware: Unspecified
OS: Unspecified
Fixed In Version: ovn23.03-23.03.0-8.el8fdp
Last Closed: 2023-07-06 20:05:24 UTC
Type: Bug

Description Surya Seetharaman 2023-03-16 09:54:44 UTC
Description of problem:

After the RFE https://bugzilla.redhat.com/show_bug.cgi?id=1839103 was implemented (patch: https://patchwork.ozlabs.org/project/ovn/patch/20230124114622.37867-2-amusil@redhat.com/),
OVN started flushing conntrack entries for services (VIPs -> backends) whenever the 5-tuple changes, that is, when the LB is deleted or when the VIP, VIP port, backend IP, backend port, or protocol changes.
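
To make the trigger concrete, here is a minimal sketch of how the set of stale VIP -> backend tuples could be computed when a load balancer changes. It is only an illustration of the behaviour described above, not OVN's implementation; `LbTuple`, `expand` and `stale_tuples` are hypothetical names, and the addresses are borrowed from the verification comment in this bug.

```python
from typing import NamedTuple


class LbTuple(NamedTuple):
    """One VIP -> backend mapping of a load balancer (hypothetical model)."""
    proto: str
    vip: str
    vip_port: int
    backend_ip: str
    backend_port: int


def expand(proto, vip, vip_port, backends):
    """Expand one VIP and its (ip, port) backends into a set of tuples."""
    return {LbTuple(proto, vip, vip_port, ip, port) for ip, port in backends}


def stale_tuples(old, new):
    """Tuples that existed before the update and no longer exist after it.

    Under the "flush on any 5-tuple change" logic, conntrack entries
    matching these tuples are the ones that would be flushed.
    """
    return old - new


old = expand("tcp", "192.168.2.100", 8080,
             [("192.168.2.2", 80), ("192.168.2.3", 80)])
# One backend is removed; deleting the whole LB would correspond to set().
new = expand("tcp", "192.168.2.100", 8080, [("192.168.2.2", 80)])

removed = stale_tuples(old, new)
```

Changing any single field (for example the VIP port) makes every old tuple stale, which is why a delete-and-re-add of the same LB looks like a full flush from this point of view.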

From the OVNK point of view, we only care about UDP. Flushing conntrack for TCP/SCTP is not desired because we want graceful termination for endpoints that are going away.

Another issue is that OVN currently flushes CT entries in all zones, including the 64xxx zones owned by OVNK.

This has been causing issues downstream in OpenShift CI:
we see a CT entry that is ESTABLISHED but still [UNREPLIED]:

tcp      6 157 ESTABLISHED src=10.131.0.6 dst=10.0.178.50 sport=44680 dport=6443 [UNREPLIED] src=10.0.178.50 dst=10.0.139.173 sport=6443 dport=44680 mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
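
A small sketch for spotting such entries in a conntrack listing. It assumes the whitespace-separated format of the entry shown above and simply looks for the two tokens; `is_established_unreplied` is a hypothetical helper, not part of any OVN or conntrack tooling.

```python
def is_established_unreplied(entry: str) -> bool:
    """True if the entry is in ESTABLISHED state but flagged [UNREPLIED]."""
    fields = entry.split()
    return "ESTABLISHED" in fields and "[UNREPLIED]" in fields


# The entry reported in this bug.
sample = (
    "tcp      6 157 ESTABLISHED src=10.131.0.6 dst=10.0.178.50 "
    "sport=44680 dport=6443 [UNREPLIED] src=10.0.178.50 dst=10.0.139.173 "
    "sport=6443 dport=44680 mark=0 "
    "secctx=system_u:object_r:unlabeled_t:s0 use=1"
)

stuck = is_established_unreplied(sample)
```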

Side note: there are other subtle issues as well. For example, when OVNK merges LBs it changes their names, which makes libovsdb issue the LB update as a transaction with two ops, a deletion and an addition of load balancers, and that triggers a flush on the OVN side. These flushes are causing network issues and disruptions all over. This is probably something OVNK needs to fix, but the point is that OVN currently doesn't have enough information to determine when a flush is right for the CMS and when it is not. The "flush always as soon as any part of the 5-tuple changes" logic isn't working well for OVNK.

We need to revisit the logic behind OVN's CT flush. For now, we could either revert the fix, add a knob that makes flushing opt-in, or give the CMS a way to stop OVN from flushing.
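
The opt-in variant could look roughly like the sketch below. The option name `ct_flush` is taken from the verification comment in this bug; treating a missing option as "do not flush" and comparing the value case-insensitively are assumptions, and `should_flush` and `cms_lb_options` are hypothetical helpers rather than OVN or OVNK code.

```python
def should_flush(lb_options: dict) -> bool:
    """Flush only when the load balancer explicitly opted in.

    Assumption: a missing "ct_flush" option means "do not flush".
    """
    return lb_options.get("ct_flush", "false").strip().lower() == "true"


def cms_lb_options(proto: str) -> dict:
    """Hypothetical CMS-side policy: opt in for UDP only.

    TCP/SCTP load balancers get no option, so their connections are
    left alone and can terminate gracefully.
    """
    return {"ct_flush": "true"} if proto.lower() == "udp" else {}


decisions = {
    proto: should_flush(cms_lb_options(proto))
    for proto in ("udp", "tcp", "sctp")
}
```

With a per-LB option like this, the decision about when a flush is appropriate stays with the CMS, which is the party that knows whether a delete-and-add is a real change or a rename.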






Comment 1 Surya Seetharaman 2023-03-16 09:55:37 UTC
See https://redhat-internal.slack.com/archives/C01G7T6SYSD/p1678895123613709 for details

Comment 3 OVN Bot 2023-03-17 04:07:51 UTC
ovn23.03 fast-datapath-rhel-9 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2179215

Comment 7 ying xu 2023-05-05 06:45:54 UTC
verified on version:
# rpm -qa|grep ovn23
ovn23.03-23.03.0-24.el8fdp.x86_64
ovn23.03-host-23.03.0-24.el8fdp.x86_64
ovn23.03-central-23.03.0-24.el8fdp.x86_64

with option ct_flush=true:
ovn-nbctl lb-add lb2 192.168.2.100:8080 192.168.2.2:80,192.168.2.3:80  -- set load_balancer lb2 options:ct_flush="true"
ovn-nbctl ls-lb-add foo lb2

check the conntrack:
# ovs-appctl dpctl/dump-conntrack|grep 192.168.2.100
tcp,orig=(src=192.168.1.2,dst=192.168.2.100,sport=37316,dport=8080),reply=(src=192.168.2.3,dst=192.168.1.2,sport=80,dport=37316),zone=13,mark=2,protoinfo=(state=SYN_SENT)
tcp,orig=(src=192.168.1.2,dst=192.168.2.100,sport=53580,dport=8080),reply=(src=192.168.2.3,dst=192.168.1.2,sport=80,dport=53580),zone=13,mark=2,protoinfo=(state=SYN_SENT)

Delete the LB; the conntrack entries are flushed:
# ovn-nbctl lb-del lb2
# ovs-appctl dpctl/dump-conntrack|grep 192.168.2.100   ------------flushed

Without ct_flush=true, the entry is not flushed after the LB is deleted:
# ovn-nbctl lb-del lb2
[root@dell-per740-53 nat]#  ovs-appctl dpctl/dump-conntrack|grep 192.168.2.100
tcp,orig=(src=192.168.1.2,dst=192.168.2.100,sport=40054,dport=8080),reply=(src=192.168.2.100,dst=192.168.1.2,sport=8080,dport=40054),zone=13,protoinfo=(state=SYN_SENT)  -------not flushed.
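
The grep checks above could be scripted along these lines. This is a sketch that assumes the `ovs-appctl dpctl/dump-conntrack` line format shown in this comment; `entries_for_vip` is a hypothetical helper. Unlike a plain grep, it matches the VIP only as the original-direction destination, so an address that merely contains the VIP as a substring is not counted.

```python
import re

# Matches the original-direction tuple of a dump-conntrack line.
ORIG_RE = re.compile(
    r"orig=\(src=(?P<src>[^,]+),dst=(?P<dst>[^,]+),"
    r"sport=(?P<sport>\d+),dport=(?P<dport>\d+)\)"
)


def entries_for_vip(dump: str, vip: str, port: int) -> list:
    """Return the dump lines whose original destination is vip:port."""
    hits = []
    for line in dump.splitlines():
        match = ORIG_RE.search(line)
        if match and match.group("dst") == vip and int(match.group("dport")) == port:
            hits.append(line)
    return hits


# Output captured after lb-del without ct_flush (entry from this comment).
after_delete_no_flush = (
    "tcp,orig=(src=192.168.1.2,dst=192.168.2.100,sport=40054,dport=8080),"
    "reply=(src=192.168.2.100,dst=192.168.1.2,sport=8080,dport=40054),"
    "zone=13,protoinfo=(state=SYN_SENT)"
)
# With ct_flush=true the dump showed no matching entries after lb-del.
after_delete_with_flush = ""

left_over = entries_for_vip(after_delete_no_flush, "192.168.2.100", 8080)
flushed = entries_for_vip(after_delete_with_flush, "192.168.2.100", 8080)
```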

Comment 11 errata-xmlrpc 2023-07-06 20:05:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (ovn23.03 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:3991