Description of problem:

After the RFE https://bugzilla.redhat.com/show_bug.cgi?id=1839103 and its fix https://patchwork.ozlabs.org/project/ovn/patch/20230124114622.37867-2-amusil@redhat.com/, OVN started flushing conntrack entries for load-balancer services (VIPs -> backends) whenever the 5-tuple changes, i.e. when the LB is deleted, or when the VIP, VIP port, backend IP, backend port, or protocol changes.

From the OVN-Kubernetes (OVNK) point of view, we only care about UDP; flushing conntrack for TCP/SCTP is not desired because we want graceful termination for endpoints going away. Another issue is that OVN currently flushes CT entries in all zones, including the 64xxx zones owned by OVNK. This has been causing issues downstream in OpenShift CI: we see CT entries that are established but unreplied, e.g.

tcp 6 157 ESTABLISHED src=10.131.0.6 dst=10.0.178.50 sport=44680 dport=6443 [UNREPLIED] src=10.0.178.50 dst=10.0.139.173 sport=6443 dport=44680 mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1

Side note: there are other subtle issues, e.g. when OVNK merges load balancers it changes their names, and libovsdb then transacts the LB update as two ops (a deletion plus an addition), which OVN treats as an LB delete and so triggers a flush. These are causing network issues and disruptions all over. This is probably something OVNK needs to fix, but the point is that OVN currently does not have enough information to decide for the CMS when a flush is the right thing to do; the "always flush as soon as any part of the 5-tuple changes" logic isn't working well for OVNK.

We need to revisit the logic behind OVN's CT flush. For now, we could either revert the fix, add a knob to make the flushing opt-in, or provide a way for the CMS to disable it.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
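For illustration, a minimal sketch of the trigger on an affected build (the logical switch foo, the LB name, and the addresses are taken from the verification comment below):

Create an LB and attach it to a logical switch:
# ovn-nbctl lb-add lb2 192.168.2.100:8080 192.168.2.2:80,192.168.2.3:80
# ovn-nbctl ls-lb-add foo lb2

Establish traffic through the VIP, then change any element of the 5-tuple, e.g. delete the LB (or remove a backend). On an affected build OVN flushes the related conntrack entries, which can be seen with:
# ovn-nbctl lb-del lb2
# ovs-appctl dpctl/dump-conntrack | grep 192.168.2.100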
See https://redhat-internal.slack.com/archives/C01G7T6SYSD/p1678895123613709 for details
ovn23.03 fast-datapath-rhel-9 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2179215
Upstream patch posted: https://patchwork.ozlabs.org/project/ovn/patch/20230317104836.384388-1-amusil@redhat.com/
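Based on the patch and the verification below, the fix makes the conntrack flush opt-in per load balancer; a CMS that still wants the flushing behavior sets the LB option explicitly (the LB name lb2 matches the verification comment):

# ovn-nbctl set load_balancer lb2 options:ct_flush="true"

With the option unset or false, OVN leaves existing conntrack entries alone when the LB or its 5-tuple changes.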
verified on version:

# rpm -qa | grep ovn23
ovn23.03-23.03.0-24.el8fdp.x86_64
ovn23.03-host-23.03.0-24.el8fdp.x86_64
ovn23.03-central-23.03.0-24.el8fdp.x86_64

With option ct_flush=true:

# ovn-nbctl lb-add lb2 192.168.2.100:8080 192.168.2.2:80,192.168.2.3:80 -- set load_balancer lb2 options:ct_flush="true"
# ovn-nbctl ls-lb-add foo lb2

Check the conntrack:

# ovs-appctl dpctl/dump-conntrack | grep 192.168.2.100
tcp,orig=(src=192.168.1.2,dst=192.168.2.100,sport=37316,dport=8080),reply=(src=192.168.2.3,dst=192.168.1.2,sport=80,dport=37316),zone=13,mark=2,protoinfo=(state=SYN_SENT)
tcp,orig=(src=192.168.1.2,dst=192.168.2.100,sport=53580,dport=8080),reply=(src=192.168.2.3,dst=192.168.1.2,sport=80,dport=53580),zone=13,mark=2,protoinfo=(state=SYN_SENT)

Delete the LB; the conntrack entries are flushed:

# ovn-nbctl lb-del lb2
# ovs-appctl dpctl/dump-conntrack | grep 192.168.2.100
------------ flushed

Without ct_flush=true:

# ovn-nbctl lb-del lb2
# ovs-appctl dpctl/dump-conntrack | grep 192.168.2.100
tcp,orig=(src=192.168.1.2,dst=192.168.2.100,sport=40054,dport=8080),reply=(src=192.168.2.100,dst=192.168.1.2,sport=8080,dport=40054),zone=13,protoinfo=(state=SYN_SENT)
------- not flushed
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (ovn23.03 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:3991