Bug 2189924
| Summary: | [RFE] Implement zone-limits to be set per port | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux Fast Datapath | Reporter: | Alex Stupnikov <astupnik> |
| Component: | ovn23.09 | Assignee: | Ales Musil <amusil> |
| Status: | CLOSED ERRATA | QA Contact: | Jianlin Shi <jishi> |
| Severity: | high | Docs Contact: | |
| Priority: | medium | ||
| Version: | RHEL 9.0 | CC: | alisci, amusil, ctrautma, gurpsing, ihrachys, jiji, mmichels, skaplons |
| Target Milestone: | --- | Flags: | skaplons: needinfo- |
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | All | ||
| Last Closed: | 2024-10-29 00:13:37 UTC | Type: | Bug |
Description
Alex Stupnikov
2023-04-26 13:03:37 UTC
There is already an effort on the OvS side of things: https://patchwork.ozlabs.org/project/openvswitch/patch/20230330081718.196496-1-naveen.yerramneni@nutanix.com/ The commit message mentions that this extension will later be used in OVN, so it seems there is community work towards this.

I'm sorry, I used this BZ ID on another patch series by accident. This is still pending in OvS.

After some discussion, it seems the solution should be the following: a new option for LS and LR (let's call it "ct-zone-limit") that applies the limit to both the SNAT and DNAT zones of the LR/LS. For an LS, the limit would additionally apply to every LSP connected to that LS. For more flexibility, "ct-zone-limit" could also be set directly on a specific port, which would override the limit inherited from the LS. Can you please confirm that this would be sufficient for OSP needs?

Thank you very much for taking this bug, Ales. I am not 100% sure that this approach will consistently cover all possible conntrack entities: I know too little about OVN internals. From the customer's perspective, a conntrack limit for LR/LS should be clear. I am not sure how beneficial the work on a separate per-port limit would be because, AFAIU, these conntrack entries will contribute to the LR/LS number of tracked connections, so it may not be worth the effort. If it is impossible to implement this per tenant, then fine, but the Neutron people should be asked to confirm that the selected approach is fine, that they will benefit from it, and that they will be able to adopt the changes in OVN and implement related features in Neutron (so I am setting needinfo for Slawek, who helped me with the original bug #1399987). P.S. In any case, I wanted to ask you to consider improving logging, so that possible conntrack problems are logged clearly and are easy to troubleshoot.

I believe the original request was in scope of security (one tenant affecting another one). If the limit is per-port / router, then Neutron would have to somehow translate it into a global limit. (Of course, the CMS admin can also set a limit on the number of ports / routers created, and in this way set an upper bound on the total number of ct entries, but that is cumbersome and doesn't consider the fact that some users may need a single large router while others may need a large number of smaller routers.) I think it's wise to ask Slawek if this per-port limit would cover the per-tenant need.

The original u/s RFE for the ML2/OVS backend (https://bugs.launchpad.net/neutron/+bug/2020358) was just about separate conntrack limits between tenants. Unfortunately, we can't really translate this theoretical "per tenant" setting into "per LS/LR" in OVN. So I think that having a separate limit set for each LS/LR in OVN would be enough to address the use case mentioned in the LP bug ("A tenant can cause network issues for other tenants: nf_conntrack: table full, dropping packet."). I don't think we really need to have it "per port" (LSP). The other thing (but this is for the even more distant future) is to use e.g. QoS rules to set a different number of conntrack entries per network or router. If we allow that at some point, the way QoS policies are implemented in Neutron means that we would then probably benefit from setting it per port (per LSP/LRP).

I suspect the conundrum is that in OVN there is no such thing as tenants, and zone IDs are not aggregated / shared either. So from the OVN perspective, it may be hard or impossible to aggregate these zones for the CMS. On the other hand, the CMS wants to set a limit per tenant (= per subset of ports/routers) and not care about per-resource (per-port, per-router) limits. In terms of multitenancy guarantees, it makes no difference for the CMS whether a single router uses 1 mln conntrack entries or there are 1000 routers with 1000 conntrack entries each, if they belong to the same user. Let me know if this is helpful.
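The inheritance rule proposed in this thread (a limit on the LS applies to its ports unless a port carries its own value) can be sketched as a small shell function. This is only an illustration, not OVN's implementation: `effective_limit` and its arguments are hypothetical names, and printing 0 when nothing is configured is an assumption.

```shell
# Hypothetical helper, not part of OVN: pick the limit that applies to a
# port's conntrack zone under the proposed inheritance rule.
#   $1 = ct-zone-limit set directly on the port (empty if unset)
#   $2 = ct-zone-limit set on the parent logical switch (empty if unset)
effective_limit() {
    if [ -n "$1" ]; then
        echo "$1"      # a port-level value overrides the inherited one
    elif [ -n "$2" ]; then
        echo "$2"      # otherwise inherit from the logical switch
    else
        echo 0         # assumption: 0 stands for "no limit configured"
    fi
}

effective_limit 4294967293 4294967294   # port value wins
effective_limit "" 4294967294           # inherited from the switch
effective_limit "" ""                   # nothing configured
```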
v2 is up for review: https://patchwork.ozlabs.org/project/ovn/list/?series=408257

Tested with the following script:
systemctl start openvswitch
systemctl start ovn-northd
ovn-nbctl set-connection ptcp:6641
ovn-sbctl set-connection ptcp:6642
ovs-vsctl set open . external_ids:system-id=hv1 external_ids:ovn-remote=tcp:127.0.0.1:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=127.0.0.1
systemctl restart ovn-controller
ovn-nbctl lr-add lr
ovn-nbctl ls-add ls
ovn-nbctl lsp-add ls ls-lr
ovn-nbctl lsp-set-type ls-lr router
ovn-nbctl lsp-set-addresses ls-lr router
ovn-nbctl lrp-add lr lr-ls 00:00:00:00:00:01 10.0.0.1
ovn-nbctl lsp-add ls lsp
ovn-nbctl lsp-set-addresses lsp "00:00:00:00:00:02 10.0.0.2"
ovn-nbctl lrp-add lr lrp-gw 01:00:00:00:00:01 172.16.0.1
ovn-nbctl lrp-set-gateway-chassis lrp-gw hv1
ovs-vsctl add-port br-int lsp -- set Interface lsp external-ids:iface-id=lsp type=internal
ovn-nbctl --wait=hv sync
ct_zones=$(ovn-appctl -t ovn-controller ct-zone-list)
lr_dnat_num=$(ovn-appctl -t ovn-controller ct-zone-list | grep -w lr_dnat | cut -d ' ' -f 2)
lr_snat_num=$(ovn-appctl -t ovn-controller ct-zone-list | grep -w lr_snat | cut -d ' ' -f 2)
ls_snat_num=$(ovn-appctl -t ovn-controller ct-zone-list | grep -w ls_snat | cut -d ' ' -f 2)
ls_dnat_num=$(ovn-appctl -t ovn-controller ct-zone-list | grep -w ls_dnat | cut -d ' ' -f 2)
lsp_num=$(ovn-appctl -t ovn-controller ct-zone-list | grep -w lsp | cut -d ' ' -f 2)
ovn-nbctl --wait=hv set Logical_Router lr options:ct-zone-limit=-1
ovn-nbctl --wait=hv set Logical_Router lr options:ct-zone-limit=0
ovn-nbctl --wait=hv set Logical_Router lr options:ct-zone-limit=4294967296
ovn-nbctl --wait=hv set Logical_Router lr options:ct-zone-limit=4294967295
ovs-appctl dpctl/ct-get-limits zone=$lr_dnat_num
ovs-appctl dpctl/ct-get-limits zone=$lr_snat_num
ovs-appctl dpctl/ct-get-limits zone=$ls_snat_num
ovs-appctl dpctl/ct-get-limits zone=$ls_dnat_num
ovs-appctl dpctl/ct-get-limits zone=$lsp_num
ovn-nbctl --wait=hv set Logical_Switch ls other_config:ct-zone-limit=-1
ovn-nbctl --wait=hv set Logical_Switch ls other_config:ct-zone-limit=0
ovn-nbctl --wait=hv set Logical_Switch ls other_config:ct-zone-limit=4294967296
ovn-nbctl --wait=hv set Logical_Switch ls other_config:ct-zone-limit=4294967294
ovs-appctl dpctl/ct-get-limits zone=$lr_dnat_num
ovs-appctl dpctl/ct-get-limits zone=$lr_snat_num
ovs-appctl dpctl/ct-get-limits zone=$ls_snat_num
ovs-appctl dpctl/ct-get-limits zone=$ls_dnat_num
ovs-appctl dpctl/ct-get-limits zone=$lsp_num
ovn-nbctl --wait=hv set Logical_Switch_Port lsp options:ct-zone-limit=-1
ovn-nbctl --wait=hv set Logical_Switch_Port lsp options:ct-zone-limit=0
ovn-nbctl --wait=hv set Logical_Switch_Port lsp options:ct-zone-limit=4294967296
ovn-nbctl --wait=hv set Logical_Switch_Port lsp options:ct-zone-limit=4294967293
ovs-appctl dpctl/ct-get-limits zone=$lr_dnat_num
ovs-appctl dpctl/ct-get-limits zone=$lr_snat_num
ovs-appctl dpctl/ct-get-limits zone=$ls_snat_num
ovs-appctl dpctl/ct-get-limits zone=$ls_dnat_num
ovs-appctl dpctl/ct-get-limits zone=$lsp_num
ovn-nbctl --wait=hv remove Logical_Switch ls other_config ct-zone-limit
ovn-nbctl --wait=hv remove logical_router lr other_config ct-zone-limit
ovn-nbctl --wait=hv remove logical_switch_port lsp other_config ct-zone-limit
ovs-appctl dpctl/ct-get-limits zone=$lr_dnat_num
ovs-appctl dpctl/ct-get-limits zone=$lr_snat_num
ovs-appctl dpctl/ct-get-limits zone=$ls_snat_num
ovs-appctl dpctl/ct-get-limits zone=$ls_dnat_num
ovs-appctl dpctl/ct-get-limits zone=$lsp_num
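The script stores the `ct-zone-list` output in `ct_zones` but then queries ovn-controller again for every zone. A possible alternative, sketched here, looks the names up in the captured text and extracts the limit from a `ct-get-limits` reply. `zone_id` and `zone_limit` are hypothetical helper names, and the sample text assumes the `name id` and `zone=N,limit=L,count=C` line formats seen in the recorded run.

```shell
# Hypothetical helpers, assuming "name id" lines from ct-zone-list and
# "zone=N,limit=L,count=C" lines from dpctl/ct-get-limits.

# Print the zone ID whose name matches $1 exactly; $2 is the captured list.
zone_id() {
    printf '%s\n' "$2" | awk -v name="$1" '$1 == name { print $2 }'
}

# Print the limit field from ct-get-limits output passed as $1.
zone_limit() {
    printf '%s\n' "$1" | sed -n 's/^zone=[0-9]*,limit=\([0-9]*\),.*/\1/p'
}

# Sample data shaped like the output of the two commands.
ct_zones='lr_dnat 1
lsp 2
ls_dnat 5'
limits='default limit=0
zone=2,limit=4294967293,count=0'

lsp_num=$(zone_id lsp "$ct_zones")
echo "lsp zone: $lsp_num, limit: $(zone_limit "$limits")"
```

Matching on the whole first field avoids partial hits between similar names such as `ls_dnat` and `lsp`, and a single captured listing keeps all lookups consistent with each other.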
Result on ovn24.09-24.09.0-33.el9:
[root@wsfd-advnetlab18 bz2189924]# rpm -qa | grep -E "openvswitch3.3|ovn24.09"
openvswitch3.3-3.3.0-54.el9fdp.x86_64
python3-openvswitch3.3-3.3.0-54.el9fdp.x86_64
ovn24.09-24.09.0-33.el9fdp.x86_64
ovn24.09-central-24.09.0-33.el9fdp.x86_64
ovn24.09-host-24.09.0-33.el9fdp.x86_64
[root@wsfd-advnetlab18 bz2189924]# bash -x rep.sh
+ systemctl start openvswitch
+ systemctl start ovn-northd
+ ovn-nbctl set-connection ptcp:6641
+ ovn-sbctl set-connection ptcp:6642
+ ovs-vsctl set open . external_ids:system-id=hv1 external_ids:ovn-remote=tcp:127.0.0.1:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=127.0.0.1
+ systemctl restart ovn-controller
+ ovn-nbctl lr-add lr
+ ovn-nbctl ls-add ls
+ ovn-nbctl lsp-add ls ls-lr
+ ovn-nbctl lsp-set-type ls-lr router
+ ovn-nbctl lsp-set-addresses ls-lr router
+ ovn-nbctl lrp-add lr lr-ls 00:00:00:00:00:01 10.0.0.1
+ ovn-nbctl lsp-add ls lsp
+ ovn-nbctl lsp-set-addresses lsp '00:00:00:00:00:02 10.0.0.2'
+ ovn-nbctl lrp-add lr lrp-gw 01:00:00:00:00:01 172.16.0.1
+ ovn-nbctl lrp-set-gateway-chassis lrp-gw hv1
+ ovs-vsctl add-port br-int lsp -- set Interface lsp external-ids:iface-id=lsp type=internal
+ ovn-nbctl --wait=hv sync
++ ovn-appctl -t ovn-controller ct-zone-list
+ ct_zones='lr_dnat 1
lsp 2
ls_dnat 5
cr-lrp-gw 3
lr_snat 4
ls_snat 6'
++ ovn-appctl -t ovn-controller ct-zone-list
++ grep -w lr_dnat
++ cut -d ' ' -f 2
+ lr_dnat_num=1
++ ovn-appctl -t ovn-controller ct-zone-list
++ grep -w lr_snat
++ cut -d ' ' -f 2
+ lr_snat_num=4
++ ovn-appctl -t ovn-controller ct-zone-list
++ grep -w ls_snat
++ cut -d ' ' -f 2
+ ls_snat_num=6
++ ovn-appctl -t ovn-controller ct-zone-list
++ grep -w ls_dnat
++ cut -d ' ' -f 2
+ ls_dnat_num=5
++ ovn-appctl -t ovn-controller ct-zone-list
++ grep -w lsp
++ cut -d ' ' -f 2
+ lsp_num=2
+ ovn-nbctl --wait=hv set Logical_Router lr options:ct-zone-limit=-1
+ ovn-nbctl --wait=hv set Logical_Router lr options:ct-zone-limit=0
+ ovn-nbctl --wait=hv set Logical_Router lr options:ct-zone-limit=4294967296
+ ovn-nbctl --wait=hv set Logical_Router lr options:ct-zone-limit=4294967295
+ ovs-appctl dpctl/ct-get-limits zone=1
default limit=0
zone=1,limit=4294967295,count=0
+ ovs-appctl dpctl/ct-get-limits zone=4
default limit=0
zone=4,limit=4294967295,count=0
+ ovs-appctl dpctl/ct-get-limits zone=6
default limit=0
zone=6,limit=0,count=0
+ ovs-appctl dpctl/ct-get-limits zone=5
default limit=0
zone=5,limit=0,count=0
+ ovs-appctl dpctl/ct-get-limits zone=2
default limit=0
zone=2,limit=4294967293,count=0
+ ovn-nbctl --wait=hv set Logical_Switch ls other_config:ct-zone-limit=-1
+ ovn-nbctl --wait=hv set Logical_Switch ls other_config:ct-zone-limit=0
+ ovn-nbctl --wait=hv set Logical_Switch ls other_config:ct-zone-limit=4294967296
+ ovn-nbctl --wait=hv set Logical_Switch ls other_config:ct-zone-limit=4294967294
+ ovs-appctl dpctl/ct-get-limits zone=1
default limit=0
zone=1,limit=4294967295,count=0
+ ovs-appctl dpctl/ct-get-limits zone=4
default limit=0
zone=4,limit=4294967295,count=0
+ ovs-appctl dpctl/ct-get-limits zone=6
default limit=0
zone=6,limit=4294967294,count=0
+ ovs-appctl dpctl/ct-get-limits zone=5
default limit=0
zone=5,limit=4294967294,count=0
+ ovs-appctl dpctl/ct-get-limits zone=2
default limit=0
zone=2,limit=4294967294,count=0
+ ovn-nbctl --wait=hv set Logical_Switch_Port lsp options:ct-zone-limit=-1
+ ovn-nbctl --wait=hv set Logical_Switch_Port lsp options:ct-zone-limit=0
+ ovn-nbctl --wait=hv set Logical_Switch_Port lsp options:ct-zone-limit=4294967296
+ ovn-nbctl --wait=hv set Logical_Switch_Port lsp options:ct-zone-limit=4294967293
+ ovs-appctl dpctl/ct-get-limits zone=1
default limit=0
zone=1,limit=4294967295,count=0
+ ovs-appctl dpctl/ct-get-limits zone=4
default limit=0
zone=4,limit=4294967295,count=0
+ ovs-appctl dpctl/ct-get-limits zone=6
default limit=0
zone=6,limit=4294967294,count=0
+ ovs-appctl dpctl/ct-get-limits zone=5
default limit=0
zone=5,limit=4294967294,count=0
+ ovs-appctl dpctl/ct-get-limits zone=2
default limit=0
zone=2,limit=4294967293,count=0
+ ovn-nbctl --wait=hv remove Logical_Switch ls other_config ct-zone-limit
+ ovn-nbctl --wait=hv remove logical_router lr other_config ct-zone-limit
ovn-nbctl: Logical_Router does not contain a column whose name matches "other_config"
+ ovn-nbctl --wait=hv remove logical_switch_port lsp other_config ct-zone-limit
ovn-nbctl: Logical_Switch_Port does not contain a column whose name matches "other_config"
+ ovs-appctl dpctl/ct-get-limits zone=1
default limit=0
zone=1,limit=4294967295,count=0
+ ovs-appctl dpctl/ct-get-limits zone=4
default limit=0
zone=4,limit=4294967295,count=0
+ ovs-appctl dpctl/ct-get-limits zone=6
default limit=0
zone=6,limit=0,count=0
+ ovs-appctl dpctl/ct-get-limits zone=5
default limit=0
zone=5,limit=0,count=0
+ ovs-appctl dpctl/ct-get-limits zone=2
default limit=0
zone=2,limit=4294967293,count=0
<=== the newly added ct-zone-limit option for ls, lr and lsp takes effect
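The two `remove` commands for the router and the port fail in the run above because they name the `other_config` column, while the script set the option in `options` for those tables. A hedged sketch of a cleanup helper that picks the column used by the `set` commands follows; `limit_column` and `remove_limit` are hypothetical names, and the table-to-column mapping is taken only from this script, not from the OVN schema documentation.

```shell
# Hypothetical helpers. The mapping mirrors the "set" commands in the test
# script: other_config for Logical_Switch, options for the other two tables.
limit_column() {
    case "$1" in
        Logical_Switch) echo other_config ;;
        Logical_Router|Logical_Switch_Port) echo options ;;
        *) echo "unknown table: $1" >&2; return 1 ;;
    esac
}

# Remove the ct-zone-limit key from record $2 of table $1.
remove_limit() {
    column=$(limit_column "$1") || return 1
    ovn-nbctl --wait=hv remove "$1" "$2" "$column" ct-zone-limit
}

# Usage on a host with a running OVN northbound database:
#   remove_limit Logical_Switch ls
#   remove_limit Logical_Router lr
#   remove_limit Logical_Switch_Port lsp
```

Because of the failed removals, the final check in the recorded run still shows the limits on zones 1, 4 and 2, while the switch zones 5 and 6 are back to 0.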
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (ovn24.09 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2024:8555