Bug 2189924 - [RFE] Implement zone-limits to be set per port
Summary: [RFE] Implement zone-limits to be set per port
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: ovn23.09
Version: RHEL 9.0
Hardware: All
OS: All
Priority: medium
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Ales Musil
QA Contact: Jianlin Shi
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-04-26 13:03 UTC by Alex Stupnikov
Modified: 2024-10-29 00:13 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2024-10-29 00:13:37 UTC
Target Upstream Version:
Embargoed:
skaplons: needinfo-



Links
System                  ID              Private  Priority  Status  Summary  Last Updated
Red Hat Issue Tracker   FD-2835         0        None      None    None     2023-04-26 13:05:13 UTC
Red Hat Product Errata  RHBA-2024:8555  0        None      None    None     2024-10-29 00:13:39 UTC

Description Alex Stupnikov 2023-04-26 13:03:37 UTC
Description of problem:

OpenStack Neutron can have an enormous number of entities and connections to track. There is a long-standing RFE, bug #1399987, requesting separate connection tracking for each tenant (group of VMs/routers/networks/ports/etc.).

Bug #1399987 was originally reported for the ML2/OVS plugin and depended on an iptables feature. It was implemented recently, but the ML2/OVS plugin itself is going to be deprecated quite soon, so the same RFE should now be implemented for the ML2/OVN plugin.

Ihar helped me understand the current status. It looks like OVS has the ct-set-limits and ct-del-limits CLI commands for the dpctl tool, but nothing similar is hooked into the OVN code base (nor does anything relevant show up in its documentation).

https://bugzilla.redhat.com/show_bug.cgi?id=1399987#c39

Comment 1 Ales Musil 2023-04-26 13:57:50 UTC
There is already an effort on the OvS side: https://patchwork.ozlabs.org/project/openvswitch/patch/20230330081718.196496-1-naveen.yerramneni@nutanix.com/
The commit message mentions that this extension would later be used in OVN, so it seems there is community work towards this.

Comment 2 Ales Musil 2023-08-28 04:23:40 UTC
I'm sorry, I used this BZ ID on another patch series by accident. This is still pending in OvS.

Comment 3 Ales Musil 2024-01-05 11:37:00 UTC
After some discussion, it seems the solution should be the following:

A new option for LS and LR (let's call it "ct-zone-limit") that would apply the limit to both the SNAT and DNAT zones of the LR/LS. On an LS, the limit would additionally apply to every LSP connected to that LS. To make this more flexible, "ct-zone-limit" could also be set directly on a specific port, which would override the limit inherited from the LS.

Can you please confirm that this would be sufficient for OSP needs?
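
The inheritance rule proposed here can be sketched as a small shell function. This is only an illustration of the precedence (a limit set directly on the port wins over the one inherited from the switch), not OVN's actual implementation. The function name `effective_ct_limit`, the convention that an empty argument means "not set", and the use of 0 for "no limit configured" are assumptions made for this sketch.

```shell
#!/bin/bash
# Hypothetical helper: resolve the limit that applies to an LSP zone.
#   $1 = limit set directly on the port (empty if unset)
#   $2 = limit set on the parent logical switch (empty if unset)
effective_ct_limit() {
    local port_limit=$1 switch_limit=$2
    if [ -n "$port_limit" ]; then
        echo "$port_limit"      # direct setting overrides the inherited one
    elif [ -n "$switch_limit" ]; then
        echo "$switch_limit"    # inherited from the LS
    else
        echo 0                  # assumption: 0 stands for "no limit configured"
    fi
}

effective_ct_limit "" 5000     # port has no own setting, inherits from the LS
effective_ct_limit 200 5000    # port-level setting wins
```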

Comment 4 Alex Stupnikov 2024-01-05 17:31:42 UTC
Thank you very much for taking this bug, Ales.

I am not 100% sure that this will be a consistent approach covering all possible conntrack entities: I know too little about OVN internals. From the customer's perspective, a conntrack limit for an LR/LS should be clear. I am not sure how beneficial the per-port work would be because, AFAIU, a port's conntrack entries will contribute to the LR/LS number of tracked connections, so it may not be worth the effort.

If it is impossible to implement this per tenant, then fine, but the Neutron people should be asked to confirm that the selected approach is fine, that they will benefit from it, and that they will be able to adopt the OVN changes and implement the related features in Neutron (so I am setting needinfo for Slawek, who helped me with the original bug #1399987).

P.S. In any case, I wanted to ask you to consider improving logging, so that possible conntrack problems are logged clearly and are easy to troubleshoot...

Comment 5 Ihar Hrachyshka 2024-01-05 17:49:10 UTC
I believe the original request was about security (one tenant affecting another). If the limit is per port / router, then Neutron would have to somehow translate it into a global limit. (Of course, the CMS admin can also set a limit on the number of ports / routers created, and in this way set an upper bound on the total number of ct entries, but that is cumbersome and doesn't consider the fact that some users may need a single large router while others may need a large number of smaller routers.)

I think it's wise to ask Slawek if this per-port limit would cover the per-tenant need.

Comment 6 Slawek Kaplonski 2024-01-11 15:28:57 UTC
The original upstream (u/s) RFE for the ML2/OVS backend (https://bugs.launchpad.net/neutron/+bug/2020358) was just about separate conntrack limits between tenants. Unfortunately, we can't really translate this theoretical "per tenant" setting into "per LS/LR" in OVN. So I think that having a separate limit set for each LS/LR in OVN would be enough to address the use case mentioned in the LP bug ("A tenant can cause network issues for other tenants: nf_conntrack: table full, dropping packet."). I don't think we really need to have it "per port" (LSP).

The other thing (though this is for an even more distant future) is to use e.g. QoS rules to set a different number of conntrack entries per network or router. If we allowed that at some point, the way QoS policies are implemented in Neutron means that we would then probably benefit from setting it per port (per LSP/LRP).

Comment 7 Ihar Hrachyshka 2024-01-16 16:01:48 UTC
I suspect the conundrum is that in OVN there is no such thing as tenants, and zone IDs are not aggregated / shared either. So from the OVN perspective, it may be hard or impossible to aggregate these zones for the CMS. On the other hand, the CMS wants to set a limit per tenant (= per subset of ports/routers) and not care about per-resource (per-port, per-router) limits: in terms of multitenancy guarantees, it makes no difference to the CMS whether a single router wastes 1 mln conntrack entries or there are 1000 routers with 1000 conntrack entries each, if they belong to the same user.

Let me know if this is helpful.

Comment 12 Ales Musil 2024-05-29 10:26:23 UTC
v2 is up for review: https://patchwork.ozlabs.org/project/ovn/list/?series=408257

Comment 15 Jianlin Shi 2024-10-17 03:21:19 UTC
Tested with the following script:
systemctl start openvswitch
systemctl start ovn-northd 
ovn-nbctl set-connection ptcp:6641                                       
ovn-sbctl set-connection ptcp:6642
ovs-vsctl set open . external_ids:system-id=hv1 external_ids:ovn-remote=tcp:127.0.0.1:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=127.0.0.1
systemctl restart ovn-controller
                                                                                                
ovn-nbctl lr-add lr

ovn-nbctl ls-add ls
ovn-nbctl lsp-add ls ls-lr
ovn-nbctl lsp-set-type ls-lr router
ovn-nbctl lsp-set-addresses ls-lr router
ovn-nbctl lrp-add lr lr-ls 00:00:00:00:00:01 10.0.0.1

ovn-nbctl lsp-add ls lsp
ovn-nbctl lsp-set-addresses lsp "00:00:00:00:00:02 10.0.0.2"

ovn-nbctl lrp-add lr lrp-gw 01:00:00:00:00:01 172.16.0.1
ovn-nbctl lrp-set-gateway-chassis lrp-gw hv1
ovs-vsctl add-port br-int lsp -- set Interface lsp external-ids:iface-id=lsp type=internal

ovn-nbctl --wait=hv sync

ct_zones=$(ovn-appctl -t ovn-controller ct-zone-list)
lr_dnat_num=$(ovn-appctl -t ovn-controller ct-zone-list | grep -w lr_dnat | cut -d ' ' -f 2)
lr_snat_num=$(ovn-appctl -t ovn-controller ct-zone-list | grep -w lr_snat | cut -d ' ' -f 2)
ls_snat_num=$(ovn-appctl -t ovn-controller ct-zone-list | grep -w ls_snat | cut -d ' ' -f 2)
ls_dnat_num=$(ovn-appctl -t ovn-controller ct-zone-list | grep -w ls_dnat | cut -d ' ' -f 2)
lsp_num=$(ovn-appctl -t ovn-controller ct-zone-list | grep -w lsp | cut -d ' ' -f 2)

ovn-nbctl --wait=hv set Logical_Router lr options:ct-zone-limit=-1
ovn-nbctl --wait=hv set Logical_Router lr options:ct-zone-limit=0
ovn-nbctl --wait=hv set Logical_Router lr options:ct-zone-limit=4294967296
ovn-nbctl --wait=hv set Logical_Router lr options:ct-zone-limit=4294967295
ovs-appctl dpctl/ct-get-limits zone=$lr_dnat_num
ovs-appctl dpctl/ct-get-limits zone=$lr_snat_num
ovs-appctl dpctl/ct-get-limits zone=$ls_snat_num
ovs-appctl dpctl/ct-get-limits zone=$ls_dnat_num
ovs-appctl dpctl/ct-get-limits zone=$lsp_num

ovn-nbctl --wait=hv set Logical_Switch ls other_config:ct-zone-limit=-1
ovn-nbctl --wait=hv set Logical_Switch ls other_config:ct-zone-limit=0
ovn-nbctl --wait=hv set Logical_Switch ls other_config:ct-zone-limit=4294967296
ovn-nbctl --wait=hv set Logical_Switch ls other_config:ct-zone-limit=4294967294
ovs-appctl dpctl/ct-get-limits zone=$lr_dnat_num
ovs-appctl dpctl/ct-get-limits zone=$lr_snat_num
ovs-appctl dpctl/ct-get-limits zone=$ls_snat_num
ovs-appctl dpctl/ct-get-limits zone=$ls_dnat_num
ovs-appctl dpctl/ct-get-limits zone=$lsp_num

ovn-nbctl --wait=hv set Logical_Switch_Port lsp options:ct-zone-limit=-1
ovn-nbctl --wait=hv set Logical_Switch_Port lsp options:ct-zone-limit=0
ovn-nbctl --wait=hv set Logical_Switch_Port lsp options:ct-zone-limit=4294967296
ovn-nbctl --wait=hv set Logical_Switch_Port lsp options:ct-zone-limit=4294967293
ovs-appctl dpctl/ct-get-limits zone=$lr_dnat_num
ovs-appctl dpctl/ct-get-limits zone=$lr_snat_num
ovs-appctl dpctl/ct-get-limits zone=$ls_snat_num
ovs-appctl dpctl/ct-get-limits zone=$ls_dnat_num
ovs-appctl dpctl/ct-get-limits zone=$lsp_num

ovn-nbctl --wait=hv remove Logical_Switch ls other_config ct-zone-limit
ovn-nbctl --wait=hv remove logical_router lr other_config ct-zone-limit
ovn-nbctl --wait=hv remove logical_switch_port lsp other_config ct-zone-limit
ovs-appctl dpctl/ct-get-limits zone=$lr_dnat_num
ovs-appctl dpctl/ct-get-limits zone=$lr_snat_num
ovs-appctl dpctl/ct-get-limits zone=$ls_snat_num
ovs-appctl dpctl/ct-get-limits zone=$ls_dnat_num
ovs-appctl dpctl/ct-get-limits zone=$lsp_num
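
The five zone lookups in the script each re-run `ct-zone-list` and rely on `grep -w`, which matches the name as a word anywhere on the line (a port named `lsp-2` would also match `lsp`). A possible tightening, assuming the `name id` output format that the script's `cut -d ' ' -f 2` already implies, is to capture the list once and compare the name exactly. The function name `zone_id` is made up for this sketch, and the sample data stands in for live `ovn-appctl` output.

```shell
#!/bin/bash
# Hypothetical helper: print the zone id whose name is exactly $1.
# Reads "name id" pairs from $ct_zones, one pair per line.
zone_id() {
    printf '%s\n' "$ct_zones" | awk -v name="$1" '$1 == name { print $2 }'
}

# Sample data in the "name id" format; on a live system this would be
#   ct_zones=$(ovn-appctl -t ovn-controller ct-zone-list)
ct_zones='lr_dnat 1
lsp 2
ls_dnat 5
lr_snat 4
ls_snat 6'

lr_dnat_num=$(zone_id lr_dnat)
lsp_num=$(zone_id lsp)
echo "lr_dnat=$lr_dnat_num lsp=$lsp_num"
```

With this helper, a name that is not present yields an empty string, so a missing zone is easy to detect before it is passed on to `ct-get-limits`.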

result on ovn24.09-24.09.0-33.el9:

[root@wsfd-advnetlab18 bz2189924]# rpm -qa | grep -E "openvswitch3.3|ovn24.09"
openvswitch3.3-3.3.0-54.el9fdp.x86_64
python3-openvswitch3.3-3.3.0-54.el9fdp.x86_64
ovn24.09-24.09.0-33.el9fdp.x86_64
ovn24.09-central-24.09.0-33.el9fdp.x86_64
ovn24.09-host-24.09.0-33.el9fdp.x86_64

[root@wsfd-advnetlab18 bz2189924]# bash -x rep.sh 
+ systemctl start openvswitch
+ systemctl start ovn-northd
+ ovn-nbctl set-connection ptcp:6641
+ ovn-sbctl set-connection ptcp:6642
+ ovs-vsctl set open . external_ids:system-id=hv1 external_ids:ovn-remote=tcp:127.0.0.1:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=127.0.0.1
+ systemctl restart ovn-controller
+ ovn-nbctl lr-add lr
+ ovn-nbctl ls-add ls
+ ovn-nbctl lsp-add ls ls-lr
+ ovn-nbctl lsp-set-type ls-lr router
+ ovn-nbctl lsp-set-addresses ls-lr router
+ ovn-nbctl lrp-add lr lr-ls 00:00:00:00:00:01 10.0.0.1
+ ovn-nbctl lsp-add ls lsp
+ ovn-nbctl lsp-set-addresses lsp '00:00:00:00:00:02 10.0.0.2'
+ ovn-nbctl lrp-add lr lrp-gw 01:00:00:00:00:01 172.16.0.1
+ ovn-nbctl lrp-set-gateway-chassis lrp-gw hv1
+ ovs-vsctl add-port br-int lsp -- set Interface lsp external-ids:iface-id=lsp type=internal
+ ovn-nbctl --wait=hv sync
++ ovn-appctl -t ovn-controller ct-zone-list
+ ct_zones='lr_dnat 1
lsp 2
ls_dnat 5
cr-lrp-gw 3
lr_snat 4
ls_snat 6'
++ ovn-appctl -t ovn-controller ct-zone-list
++ grep -w lr_dnat
++ cut -d ' ' -f 2
+ lr_dnat_num=1
++ ovn-appctl -t ovn-controller ct-zone-list
++ grep -w lr_snat
++ cut -d ' ' -f 2
+ lr_snat_num=4
++ ovn-appctl -t ovn-controller ct-zone-list
++ grep -w ls_snat
++ cut -d ' ' -f 2
+ ls_snat_num=6
++ ovn-appctl -t ovn-controller ct-zone-list
++ grep -w ls_dnat
++ cut -d ' ' -f 2
+ ls_dnat_num=5
++ ovn-appctl -t ovn-controller ct-zone-list
++ grep -w lsp
++ cut -d ' ' -f 2
+ lsp_num=2
+ ovn-nbctl --wait=hv set Logical_Router lr options:ct-zone-limit=-1
+ ovn-nbctl --wait=hv set Logical_Router lr options:ct-zone-limit=0
+ ovn-nbctl --wait=hv set Logical_Router lr options:ct-zone-limit=4294967296
+ ovn-nbctl --wait=hv set Logical_Router lr options:ct-zone-limit=4294967295
+ ovs-appctl dpctl/ct-get-limits zone=1
default limit=0
zone=1,limit=4294967295,count=0
+ ovs-appctl dpctl/ct-get-limits zone=4
default limit=0
zone=4,limit=4294967295,count=0
+ ovs-appctl dpctl/ct-get-limits zone=6
default limit=0
zone=6,limit=0,count=0
+ ovs-appctl dpctl/ct-get-limits zone=5
default limit=0
zone=5,limit=0,count=0
+ ovs-appctl dpctl/ct-get-limits zone=2
default limit=0
zone=2,limit=4294967293,count=0
+ ovn-nbctl --wait=hv set Logical_Switch ls other_config:ct-zone-limit=-1
+ ovn-nbctl --wait=hv set Logical_Switch ls other_config:ct-zone-limit=0
+ ovn-nbctl --wait=hv set Logical_Switch ls other_config:ct-zone-limit=4294967296
+ ovn-nbctl --wait=hv set Logical_Switch ls other_config:ct-zone-limit=4294967294
+ ovs-appctl dpctl/ct-get-limits zone=1
default limit=0
zone=1,limit=4294967295,count=0
+ ovs-appctl dpctl/ct-get-limits zone=4
default limit=0
zone=4,limit=4294967295,count=0
+ ovs-appctl dpctl/ct-get-limits zone=6
default limit=0
zone=6,limit=4294967294,count=0
+ ovs-appctl dpctl/ct-get-limits zone=5
default limit=0
zone=5,limit=4294967294,count=0
+ ovs-appctl dpctl/ct-get-limits zone=2
default limit=0
zone=2,limit=4294967294,count=0
+ ovn-nbctl --wait=hv set Logical_Switch_Port lsp options:ct-zone-limit=-1
+ ovn-nbctl --wait=hv set Logical_Switch_Port lsp options:ct-zone-limit=0
+ ovn-nbctl --wait=hv set Logical_Switch_Port lsp options:ct-zone-limit=4294967296
+ ovn-nbctl --wait=hv set Logical_Switch_Port lsp options:ct-zone-limit=4294967293
+ ovs-appctl dpctl/ct-get-limits zone=1
default limit=0
zone=1,limit=4294967295,count=0
+ ovs-appctl dpctl/ct-get-limits zone=4
default limit=0
zone=4,limit=4294967295,count=0
+ ovs-appctl dpctl/ct-get-limits zone=6
default limit=0
zone=6,limit=4294967294,count=0
+ ovs-appctl dpctl/ct-get-limits zone=5
default limit=0
zone=5,limit=4294967294,count=0
+ ovs-appctl dpctl/ct-get-limits zone=2
default limit=0
zone=2,limit=4294967293,count=0
+ ovn-nbctl --wait=hv remove Logical_Switch ls other_config ct-zone-limit
+ ovn-nbctl --wait=hv remove logical_router lr other_config ct-zone-limit
ovn-nbctl: Logical_Router does not contain a column whose name matches "other_config"
+ ovn-nbctl --wait=hv remove logical_switch_port lsp other_config ct-zone-limit
ovn-nbctl: Logical_Switch_Port does not contain a column whose name matches "other_config"
+ ovs-appctl dpctl/ct-get-limits zone=1
default limit=0
zone=1,limit=4294967295,count=0
+ ovs-appctl dpctl/ct-get-limits zone=4
default limit=0
zone=4,limit=4294967295,count=0
+ ovs-appctl dpctl/ct-get-limits zone=6
default limit=0
zone=6,limit=0,count=0
+ ovs-appctl dpctl/ct-get-limits zone=5
default limit=0
zone=5,limit=0,count=0
+ ovs-appctl dpctl/ct-get-limits zone=2
default limit=0
zone=2,limit=4294967293,count=0

<=== the newly added ct-zone-limit options for LS, LR and LSP take effect
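
Note that in the trace above the two `remove ... other_config` commands for `logical_router` and `logical_switch_port` failed, so only the Logical_Switch limit was actually cleared; the router and port limits were still in place in the final `ct-get-limits` output. Since the script sets the option under `options` for Logical_Router and Logical_Switch_Port and under `other_config` for Logical_Switch, a cleanup step would presumably need to use the same columns. A minimal sketch, where the helper names and the `NBCTL` override (used only to allow a dry run) are assumptions of this example:

```shell
#!/bin/bash
# Hypothetical helper: column that holds ct-zone-limit for each table,
# taken from the "set" commands used in the test script.
ct_limit_column() {
    case "$1" in
        Logical_Switch) echo other_config ;;
        Logical_Router|Logical_Switch_Port) echo options ;;
        *) return 1 ;;
    esac
}

# Remove ct-zone-limit from a record. NBCTL defaults to ovn-nbctl;
# set NBCTL=echo to print the arguments instead of running the command.
clear_ct_limit() {
    local table=$1 record=$2 column
    column=$(ct_limit_column "$table") || return 1
    "${NBCTL:-ovn-nbctl}" --wait=hv remove "$table" "$record" "$column" ct-zone-limit
}

# Dry run: print the arguments that would be passed to ovn-nbctl.
NBCTL=echo
clear_ct_limit Logical_Switch ls
clear_ct_limit Logical_Router lr
clear_ct_limit Logical_Switch_Port lsp
```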

Comment 17 errata-xmlrpc 2024-10-29 00:13:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (ovn24.09 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2024:8555

