Description of problem:

Revisit OVN's logic of flushing conntrack for LRs. This bug is related to https://bugzilla.redhat.com/show_bug.cgi?id=2178962: when the LB is attached to an LR, ct_flush=true doesn't work.

Version-Release number of selected component (if applicable):

ovn23.03-23.03.0-24.el8fdp.x86_64

How reproducible:

always

Steps to Reproduce:

1. Create an LB with option ct_flush=true and attach it to an LR:

ovn-nbctl lb-add lb2 192.168.2.100:8080 192.168.2.2:80,192.168.2.3:80 \
    -- set load_balancer lb2 options:ct_flush="true"
ovn-nbctl lr-lb-add lr1 lb2

2. Check the conntrack (after sending traffic to the VIP):

# ovs-appctl dpctl/dump-conntrack | grep 192.168.2.100
tcp,orig=(src=192.168.1.2,dst=192.168.2.100,sport=37316,dport=8080),reply=(src=192.168.2.3,dst=192.168.1.2,sport=80,dport=37316),zone=1,mark=2,protoinfo=(state=SYN_SENT)
tcp,orig=(src=192.168.1.2,dst=192.168.2.100,sport=53580,dport=8080),reply=(src=192.168.2.3,dst=192.168.1.2,sport=80,dport=53580),zone=1,mark=2,protoinfo=(state=SYN_SENT)

3. Delete the LB:

ovn-nbctl lb-del lb2

Actual results:

The conntrack entries are still there:

# ovs-appctl dpctl/dump-conntrack | grep 192.168.2.100
tcp,orig=(src=192.168.1.2,dst=192.168.2.100,sport=37316,dport=8080),reply=(src=192.168.2.3,dst=192.168.1.2,sport=80,dport=37316),zone=1,mark=2,protoinfo=(state=SYN_SENT)
tcp,orig=(src=192.168.1.2,dst=192.168.2.100,sport=53580,dport=8080),reply=(src=192.168.2.3,dst=192.168.1.2,sport=80,dport=53580),zone=1,mark=2,protoinfo=(state=SYN_SENT)

Expected results:

The conntrack entries are flushed:

# ovs-appctl dpctl/dump-conntrack | grep 192.168.2.100
(no output)

Additional info:
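Until a fixed version is available, the stale entries can be removed by hand with OVS's generic conntrack flush command. A minimal workaround sketch, assuming the zone=1 seen in the dumps above (the zone is assigned per datapath, so take it from your own dump output):

# Flush only the conntrack entries in the load balancer's zone:
# ovs-appctl dpctl/flush-conntrack zone=1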
I reproduced this with the following test added to tests/system-ovn.at:

OVN_FOR_EACH_NORTHD([
AT_SETUP([ct_flush on logical router load balancer])
CHECK_CONNTRACK()
CHECK_CONNTRACK_NAT()
ovn_start
OVS_TRAFFIC_VSWITCHD_START()
ADD_BR([br-int])

# Set external-ids in br-int needed for ovn-controller
ovs-vsctl \
        -- set Open_vSwitch . external-ids:system-id=hv1 \
        -- set Open_vSwitch . external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \
        -- set Open_vSwitch . external-ids:ovn-encap-type=geneve \
        -- set Open_vSwitch . external-ids:ovn-encap-ip=169.0.0.1 \
        -- set bridge br-int fail-mode=secure other-config:disable-in-band=true

start_daemon ovn-controller

check ovn-nbctl lr-add R1
check ovn-nbctl ls-add sw0
check ovn-nbctl ls-add public
check ovn-nbctl lrp-add R1 rp-sw0 00:00:01:01:02:03 192.168.1.1/24
check ovn-nbctl lrp-add R1 rp-public 00:00:02:01:02:03 172.16.1.1/24
check ovn-nbctl set logical_router R1 options:chassis=hv1

check ovn-nbctl lsp-add sw0 sw0-rp -- set Logical_Switch_Port sw0-rp \
    type=router options:router-port=rp-sw0 \
    -- lsp-set-addresses sw0-rp router
check ovn-nbctl lsp-add sw0 sw0-vm \
    -- lsp-set-addresses sw0-vm "00:00:01:01:02:04 192.168.1.2/24"

check ovn-nbctl lsp-add public public-rp -- set Logical_Switch_Port public-rp \
    type=router options:router-port=rp-public \
    -- lsp-set-addresses public-rp router
check ovn-nbctl lsp-add public public-vm \
    -- lsp-set-addresses public-vm "00:00:02:01:02:04 172.16.1.2/24"

ADD_NAMESPACES(sw0-vm)
ADD_VETH(sw0-vm, sw0-vm, br-int, "192.168.1.2/24", "00:00:01:01:02:04", \
         "192.168.1.1")
OVS_WAIT_UNTIL([test "$(ip netns exec sw0-vm ip a | grep fe80 | grep tentative)" = ""])

ADD_NAMESPACES(public-vm)
ADD_VETH(public-vm, public-vm, br-int, "172.16.1.2/24", "00:00:02:01:02:04", \
         "172.16.1.1")
OVS_WAIT_UNTIL([test "$(ip netns exec public-vm ip a | grep fe80 | grep tentative)" = ""])

# Start webservers in 'server'.
OVS_START_L7([sw0-vm], [http])

# Create a load balancer and associate to R1
check ovn-nbctl lb-add lb1 172.16.1.150:80 192.168.1.2:80 \
    -- set load_balancer lb1 options:ct_flush="true"
check ovn-nbctl lr-lb-add R1 lb1
check ovn-nbctl --wait=hv sync

for i in $(seq 1 5); do
    echo Request $i
    NS_CHECK_EXEC([public-vm], [wget 172.16.1.150 -t 5 -T 1 --retry-connrefused -v -o wget$i.log])
done

OVS_WAIT_FOR_OUTPUT([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(172.16.1.150) | wc -l], [0], [dnl
1
])

check ovn-nbctl lb-del lb1

# XXX This check fails because the conntrack entry remains
OVS_WAIT_FOR_OUTPUT([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(172.16.1.150) | wc -l], [0], [dnl
0
])

OVS_APP_EXIT_AND_WAIT([ovn-controller])

as ovn-sb
OVS_APP_EXIT_AND_WAIT([ovsdb-server])

as ovn-nb
OVS_APP_EXIT_AND_WAIT([ovsdb-server])

as northd
OVS_APP_EXIT_AND_WAIT([NORTHD_TYPE])

as
OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d
/Failed to acquire.*/d
/connection dropped.*/d"])

AT_CLEANUP
])
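For reference, a sketch of how this test can be run in isolation from an OVN source tree (the check-kernel target comes from the shared OVS/OVN test infrastructure; adjust the keyword if the AT_SETUP title changes):

# Run just this test against the kernel datapath (needs root):
make check-kernel TESTSUITEFLAGS="-k 'ct_flush on logical router load balancer'"

check-system-userspace is the equivalent target for the userspace datapath.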
What needs to be done is pretty clear: we also need to sync LBs attached to LRs. How to achieve that isn't as clear; there are a couple of options (a quick way to see the current gap is sketched after the list):

1) Have a single DP group for both the LR and the LS. This will be significantly harder to achieve since https://github.com/ovn-org/ovn/commit/53febfbc37768f4d6c4a1fce837cd11d593d4c43.

2) Allow two DP groups to be present in the SB DB.

3) Duplicate the LB in the SB DB if it is applied to both an LS and an LR. This will also require changes to the CT flush logic in ovn-controller.

One thing to keep in mind is that this should be backported down to 23.03.
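To see the gap concretely, one can inspect the southbound database in a reproduction setup; a sketch, assuming the lb2 name from the original report:

ovn-sbctl list Load_Balancer lb2
ovn-sbctl list Logical_DP_Group

Per the note above, an LB attached only to an LR is not synced with its router datapath, so the per-LB conntrack flush in ovn-controller has nothing to match against.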
ovn23.09 fast-datapath-rhel-9 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2245944
I'm closing this issue. It is fixed in ovn23.09 and later. The implementation required adding new columns to the database and was deemed too risky to backport. Therefore, this issue, raised against ovn23.03, will not get the fix.
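For anyone verifying on ovn23.09 or later, a minimal re-check reusing the names and commands from the original report (traffic generation elided):

ovn-nbctl lb-add lb2 192.168.2.100:8080 192.168.2.2:80,192.168.2.3:80 \
    -- set load_balancer lb2 options:ct_flush="true"
ovn-nbctl lr-lb-add lr1 lb2
(send some TCP traffic to 192.168.2.100:8080 to populate conntrack)
ovn-nbctl lb-del lb2
# ovs-appctl dpctl/dump-conntrack | grep 192.168.2.100
(no output expected: the LR-attached LB's entries are flushed on deletion)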