Description of problem:

Revisit OVN's logic of flushing conntrack for LRs. This bug is related to https://bugzilla.redhat.com/show_bug.cgi?id=2178962: when the LB is attached to an LR, ct_flush=true doesn't work.

Version-Release number of selected component (if applicable):

ovn23.03-23.03.0-24.el8fdp.x86_64

How reproducible:

always

Steps to Reproduce:

1. Create an LB with option ct_flush=true and attach it to an LR:

ovn-nbctl lb-add lb2 192.168.2.100:8080 192.168.2.2:80,192.168.2.3:80 \
    -- set load_balancer lb2 options:ct_flush="true"
ovn-nbctl lr-lb-add lr1 lb2

2. Check the conntrack (after sending traffic to the VIP):

# ovs-appctl dpctl/dump-conntrack | grep 192.168.2.100
tcp,orig=(src=192.168.1.2,dst=192.168.2.100,sport=37316,dport=8080),reply=(src=192.168.2.3,dst=192.168.1.2,sport=80,dport=37316),zone=1,mark=2,protoinfo=(state=SYN_SENT)
tcp,orig=(src=192.168.1.2,dst=192.168.2.100,sport=53580,dport=8080),reply=(src=192.168.2.3,dst=192.168.1.2,sport=80,dport=53580),zone=1,mark=2,protoinfo=(state=SYN_SENT)

3. Delete the LB:

ovn-nbctl lb-del lb2

Actual results:

The conntrack entries are still there:

# ovs-appctl dpctl/dump-conntrack | grep 192.168.2.100
tcp,orig=(src=192.168.1.2,dst=192.168.2.100,sport=37316,dport=8080),reply=(src=192.168.2.3,dst=192.168.1.2,sport=80,dport=37316),zone=1,mark=2,protoinfo=(state=SYN_SENT)
tcp,orig=(src=192.168.1.2,dst=192.168.2.100,sport=53580,dport=8080),reply=(src=192.168.2.3,dst=192.168.1.2,sport=80,dport=53580),zone=1,mark=2,protoinfo=(state=SYN_SENT)

Expected results:

The conntrack entries are flushed:

# ovs-appctl dpctl/dump-conntrack | grep 192.168.2.100
(no output)

Additional info:
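Until a fixed version is available, the stale entries can be removed by hand with OVS's generic conntrack flush command. A minimal workaround sketch, assuming the zone=1 seen in the dumps above (the zone is assigned per datapath, so take it from your own dump output):

# Flush only the conntrack entries in the load balancer's zone:
# ovs-appctl dpctl/flush-conntrack zone=1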
I reproduced this with the following test added to tests/system-ovn.at:

OVN_FOR_EACH_NORTHD([
AT_SETUP([ct_flush on logical router load balancer])
CHECK_CONNTRACK()
CHECK_CONNTRACK_NAT()
ovn_start
OVS_TRAFFIC_VSWITCHD_START()
ADD_BR([br-int])

# Set external-ids in br-int needed for ovn-controller
ovs-vsctl \
        -- set Open_vSwitch . external-ids:system-id=hv1 \
        -- set Open_vSwitch . external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \
        -- set Open_vSwitch . external-ids:ovn-encap-type=geneve \
        -- set Open_vSwitch . external-ids:ovn-encap-ip=169.0.0.1 \
        -- set bridge br-int fail-mode=secure other-config:disable-in-band=true

start_daemon ovn-controller

check ovn-nbctl lr-add R1
check ovn-nbctl ls-add sw0
check ovn-nbctl ls-add public
check ovn-nbctl lrp-add R1 rp-sw0 00:00:01:01:02:03 192.168.1.1/24
check ovn-nbctl lrp-add R1 rp-public 00:00:02:01:02:03 172.16.1.1/24
check ovn-nbctl set logical_router R1 options:chassis=hv1

check ovn-nbctl lsp-add sw0 sw0-rp -- set Logical_Switch_Port sw0-rp \
    type=router options:router-port=rp-sw0 \
    -- lsp-set-addresses sw0-rp router
check ovn-nbctl lsp-add sw0 sw0-vm \
    -- lsp-set-addresses sw0-vm "00:00:01:01:02:04 192.168.1.2/24"

check ovn-nbctl lsp-add public public-rp -- set Logical_Switch_Port public-rp \
    type=router options:router-port=rp-public \
    -- lsp-set-addresses public-rp router
check ovn-nbctl lsp-add public public-vm \
    -- lsp-set-addresses public-vm "00:00:02:01:02:04 172.16.1.2/24"

ADD_NAMESPACES(sw0-vm)
ADD_VETH(sw0-vm, sw0-vm, br-int, "192.168.1.2/24", "00:00:01:01:02:04", \
         "192.168.1.1")
OVS_WAIT_UNTIL([test "$(ip netns exec sw0-vm ip a | grep fe80 | grep tentative)" = ""])

ADD_NAMESPACES(public-vm)
ADD_VETH(public-vm, public-vm, br-int, "172.16.1.2/24", "00:00:02:01:02:04", \
         "172.16.1.1")
OVS_WAIT_UNTIL([test "$(ip netns exec public-vm ip a | grep fe80 | grep tentative)" = ""])

# Start webservers in 'server'.
OVS_START_L7([sw0-vm], [http])

# Create a load balancer and associate to R1
check ovn-nbctl lb-add lb1 172.16.1.150:80 192.168.1.2:80 \
    -- set load_balancer lb1 options:ct_flush="true"
check ovn-nbctl lr-lb-add R1 lb1
check ovn-nbctl --wait=hv sync

for i in $(seq 1 5); do
    echo Request $i
    NS_CHECK_EXEC([public-vm], [wget 172.16.1.150 -t 5 -T 1 --retry-connrefused -v -o wget$i.log])
done

OVS_WAIT_FOR_OUTPUT([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(172.16.1.150) | wc -l], [0], [dnl
1
])

check ovn-nbctl lb-del lb1

# XXX This check fails because the conntrack entry remains
OVS_WAIT_FOR_OUTPUT([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(172.16.1.150) | wc -l], [0], [dnl
0
])

OVS_APP_EXIT_AND_WAIT([ovn-controller])

as ovn-sb
OVS_APP_EXIT_AND_WAIT([ovsdb-server])

as ovn-nb
OVS_APP_EXIT_AND_WAIT([ovsdb-server])

as northd
OVS_APP_EXIT_AND_WAIT([NORTHD_TYPE])

as
OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d
/Failed to acquire.*/d
/connection dropped.*/d"])

AT_CLEANUP
])
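For reference, a sketch of how this test can be run in isolation from an OVN source tree (the check-kernel target comes from the shared OVS/OVN test infrastructure; adjust the keyword if the AT_SETUP title changes):

# Run just this test against the kernel datapath (needs root):
make check-kernel TESTSUITEFLAGS="-k 'ct_flush on logical router load balancer'"

check-system-userspace is the equivalent target for the userspace datapath.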
What needs to be done is pretty clear: we also need to sync LBs attached to LRs. How to achieve that isn't as clear; there are a couple of options (a quick way to see the current gap is sketched after the list):

1) Have a single DP group for both the LR and the LS. This will be significantly harder to achieve since https://github.com/ovn-org/ovn/commit/53febfbc37768f4d6c4a1fce837cd11d593d4c43.

2) Allow two DP groups to be present in the SB DB.

3) Duplicate the LB in the SB DB if it is applied to both an LS and an LR. This will also require changes to the CT flush logic in ovn-controller.

One thing to keep in mind is that this should be backported down to 23.03.
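To see the gap concretely, one can inspect the southbound database in a reproduction setup; a sketch, assuming the lb2 name from the original report:

ovn-sbctl list Load_Balancer lb2
ovn-sbctl list Logical_DP_Group

Per the note above, an LB attached only to an LR is not synced with its router datapath, so the per-LB conntrack flush in ovn-controller has nothing to match against.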
ovn23.09 fast-datapath-rhel-9 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2245944
I'm closing this issue. It is fixed in ovn23.09 and later. The implementation required adding new columns to the database and was deemed too risky to backport. Therefore, this issue, raised against ovn23.03, will not get the fix.
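For anyone verifying on ovn23.09 or later, a minimal re-check reusing the names and commands from the original report (traffic generation elided):

ovn-nbctl lb-add lb2 192.168.2.100:8080 192.168.2.2:80,192.168.2.3:80 \
    -- set load_balancer lb2 options:ct_flush="true"
ovn-nbctl lr-lb-add lr1 lb2
(send some TCP traffic to 192.168.2.100:8080 to populate conntrack)
ovn-nbctl lb-del lb2
# ovs-appctl dpctl/dump-conntrack | grep 192.168.2.100
(no output expected: the LR-attached LB's entries are flushed on deletion)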