Description of problem:

Create a localnet-type OVN LSP named ln_p1, map this LSP to an OVS bridge, and add an mlx5 25G port to that bridge; this provides network connectivity to the external network, e.g.:

```
# external network
ovs-vsctl add-br ext_net
ip link set ext_net up
ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings=external_net:ext_net
ovs-vsctl add-port ext_net $port2
ip link set $port2 up

ovn-nbctl ls-add public
ovn-nbctl lsp-add public ln_p1
ovn-nbctl lsp-set-addresses ln_p1 unknown
ovn-nbctl lsp-set-type ln_p1 localnet
ovn-nbctl lsp-set-options ln_p1 network_name=external_net

ovn-nbctl lrp-add r1 r1_public 40:44:00:00:00:03 172.16.104.1/24
ovn-nbctl lsp-add public public_r1
ovn-nbctl lsp-set-type public_r1 router
ovn-nbctl lsp-set-addresses public_r1 router
ovn-nbctl lsp-set-options public_r1 router-port=r1_public nat-addresses=router
ovn-nbctl lrp-set-gateway-chassis r1_public hv1
```

Then set ln_p1 qos_max_rate to 200M:

```
# qos
ovs-vsctl set interface $port2 external-ids:ovn-egress-iface=true
ovn-nbctl set Logical_Switch_Port ln_p1 options:qos_min_rate=100000000
ovn-nbctl set Logical_Switch_Port ln_p1 options:qos_max_rate=200000000
ovn-nbctl set Logical_Switch_Port ln_p1 options:qos_burst=220000000
echo "#### check 100M - 200M"
tc class show dev $port2
tc class show dev $port2|grep rate.*100M.*ceil.*200M || echo "BUG: 100M 200M"
```

But the actual value is 100M:

```
# tc class show dev ens1f1np1
class htb 1:fffe root rate 100Mbit ceil 100Mbit burst 1500b cburst 1500b
class htb 1:2 parent 1:fffe prio 0 rate 100Mbit ceil 100Mbit burst 27500000b cburst 27500000b
```

Version-Release number of selected component (if applicable):

```
[root@dell-per740-17 qos]# rpm -qa|grep ovn
ovn22.03-22.03.0-106.el9fdp.x86_64
ovn22.03-central-22.03.0-106.el9fdp.x86_64
ovn22.03-host-22.03.0-106.el9fdp.x86_64
[root@dell-per740-17 qos]# rpm -qa|grep openvswitch
openvswitch-selinux-extra-policy-1.0-31.el9fdp.noarch
openvswitch2.17-2.17.0-49.el9fdp.x86_64
python3-openvswitch2.17-2.17.0-49.el9fdp.x86_64
[root@dell-per740-17 qos]# uname -r
5.14.0-70.30.1.el9_0.x86_64
[root@dell-per740-17 qos]# ethtool -i ens1f1np1
driver: mlx5_core
version: 5.14.0-70.30.1.el9_0.x86_64
firmware-version: 16.27.2008 (MT_0000000080)
expansion-rom-version:
bus-info: 0000:3b:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes
[root@dell-per740-17 qos]# lspci -s 0000:3b:00.1
3b:00.1 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
[root@dell-per740-17 qos]# lspci -s 0000:3b:00.1 -v
3b:00.1 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
        Subsystem: Mellanox Technologies ConnectX®-5 EN network interface card, 10/25GbE dual-port SFP28, PCIe3.0 x8, tall bracket; MCX512A-ACAT
        Flags: bus master, fast devsel, latency 0, IRQ 149, NUMA node 0, IOMMU group 63
        Memory at 94000000 (64-bit, prefetchable) [size=32M]
        Expansion ROM at 93d00000 [disabled] [size=1M]
        Capabilities: [60] Express Endpoint, MSI 00
        Capabilities: [48] Vital Product Data
        Capabilities: [9c] MSI-X: Enable+ Count=64 Masked-
        Capabilities: [c0] Vendor Specific Information: Len=18 <?>
        Capabilities: [40] Power Management version 3
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
        Capabilities: [180] Single Root I/O Virtualization (SR-IOV)
        Capabilities: [230] Access Control Services
        Kernel driver in use: mlx5_core
        Kernel modules: mlx5_core
```

How reproducible:
always

Steps to Reproduce:

```
port1=ens1f0np0
port2=ens1f1np1
ip link set $port1 up
ip addr add 177.1.1.1/16 dev $port1 &>/dev/null

systemctl start openvswitch
systemctl start ovn-northd
ovn-sbctl set-connection ptcp:6642
ovn-nbctl set-connection ptcp:6641
ovs-vsctl set Open_vSwitch . external-ids:system-id=hv1
ovs-vsctl set Open_vSwitch . external-ids:ovn-remote=tcp:177.1.1.1:6642
ovs-vsctl set Open_vSwitch . external-ids:ovn-encap-type=geneve
ovs-vsctl set Open_vSwitch . external-ids:ovn-encap-ip=177.1.1.1
systemctl restart ovn-controller

# dhcp options
dhcp_102="$(ovn-nbctl create DHCP_Options cidr=172.16.102.0/24 \
options="\"server_id\"=\"172.16.102.1\" \"server_mac\"=\"00:de:ad:ff:01:02\" \
\"lease_time\"=\"3600\" \"router\"=\"172.16.102.1\"")"

# r1
ovn-nbctl lr-add r1
ovn-nbctl lrp-add r1 r1_s2 00:de:ad:ff:01:02 172.16.102.1/24
ovn-nbctl lrp-add r1 r1_s3 00:de:ad:ff:01:03 172.16.103.1/24

# s2
ovn-nbctl ls-add s2

# s2 - r1
ovn-nbctl lsp-add s2 s2_r1
ovn-nbctl lsp-set-type s2_r1 router
#ovn-nbctl lsp-set-addresses s2_r1 00:de:ad:ff:01:02
ovn-nbctl lsp-set-addresses s2_r1 "00:de:ad:ff:01:02 172.16.102.1"
ovn-nbctl lsp-set-options s2_r1 router-port=r1_s2

# s2 - hv1_vm00_vnet1
ovn-nbctl lsp-add s2 hv1_vm00_vnet1
ovn-nbctl lsp-set-addresses hv1_vm00_vnet1 "00:de:ad:01:00:01 172.16.102.11"
ovn-nbctl lsp-set-dhcpv4-options hv1_vm00_vnet1 $dhcp_102

# s2 - hv1_vm01_vnet1
ovn-nbctl lsp-add s2 hv1_vm01_vnet1
ovn-nbctl lsp-set-addresses hv1_vm01_vnet1 "00:de:ad:01:01:01 172.16.102.12"
ovn-nbctl lsp-set-dhcpv4-options hv1_vm01_vnet1 $dhcp_102

# external network
ovs-vsctl add-br ext_net
ip link set ext_net up
ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings=external_net:ext_net
ovs-vsctl add-port ext_net $port2
ip link set $port2 up

ovn-nbctl ls-add public
ovn-nbctl lsp-add public ln_p1
ovn-nbctl lsp-set-addresses ln_p1 unknown
ovn-nbctl lsp-set-type ln_p1 localnet
ovn-nbctl lsp-set-options ln_p1 network_name=external_net

ovn-nbctl lrp-add r1 r1_public 40:44:00:00:00:03 172.16.104.1/24
ovn-nbctl lsp-add public public_r1
ovn-nbctl lsp-set-type public_r1 router
ovn-nbctl lsp-set-addresses public_r1 router
ovn-nbctl lsp-set-options public_r1 router-port=r1_public nat-addresses=router
ovn-nbctl lrp-set-gateway-chassis r1_public hv1

# create virtual vm00
ovs-vsctl add-port br-int hv1_vm00_vnet1 -- set interface hv1_vm00_vnet1 type=internal
ip netns add hv1_vm00_vnet1
ip link set hv1_vm00_vnet1 netns hv1_vm00_vnet1
ip netns exec hv1_vm00_vnet1 ip link set lo up
ip netns exec hv1_vm00_vnet1 ip link set hv1_vm00_vnet1 up
ip netns exec hv1_vm00_vnet1 ip link set hv1_vm00_vnet1 address 00:de:ad:01:00:01
#pkill dhclient
#ip netns exec hv1_vm00_vnet1 dhclient -v hv1_vm00_vnet1
ip netns exec hv1_vm00_vnet1 ip addr add 172.16.102.11/24 dev hv1_vm00_vnet1
ip netns exec hv1_vm00_vnet1 ip route add default via 172.16.102.1 dev hv1_vm00_vnet1
ovs-vsctl set Interface hv1_vm00_vnet1 external_ids:iface-id=hv1_vm00_vnet1

# create virtual vm01
ovs-vsctl add-port br-int hv1_vm01_vnet1 -- set interface hv1_vm01_vnet1 type=internal
ip netns add hv1_vm01_vnet1
ip link set hv1_vm01_vnet1 netns hv1_vm01_vnet1
ip netns exec hv1_vm01_vnet1 ip link set lo up
ip netns exec hv1_vm01_vnet1 ip link set hv1_vm01_vnet1 up
ip netns exec hv1_vm01_vnet1 ip link set hv1_vm01_vnet1 address 00:de:ad:01:01:01
#pkill dhclient
#ip netns exec hv1_vm01_vnet1 dhclient -v hv1_vm01_vnet1
ip netns exec hv1_vm01_vnet1 ip addr add 172.16.102.12/24 dev hv1_vm01_vnet1
ip netns exec hv1_vm01_vnet1 ip route add default via 172.16.102.1 dev hv1_vm01_vnet1
ovs-vsctl set Interface hv1_vm01_vnet1 external_ids:iface-id=hv1_vm01_vnet1
sleep 5
dmesg -C

# qos
ovs-vsctl set interface $port2 external-ids:ovn-egress-iface=true

#ovn-nbctl set Logical_Switch_Port ln_p1 options:qos_min_rate=10000000
#ovn-nbctl set Logical_Switch_Port ln_p1 options:qos_max_rate=20000000
#ovn-nbctl set Logical_Switch_Port ln_p1 options:qos_burst=22000000
#echo "#### check 10M - 20M"
#tc class show dev $port2
#tc class show dev $port2|grep rate.*10M.*ceil.*20M || echo "BUG: 10M 20M"

ovn-nbctl set Logical_Switch_Port ln_p1 options:qos_min_rate=100000000
ovn-nbctl set Logical_Switch_Port ln_p1 options:qos_max_rate=200000000
ovn-nbctl set Logical_Switch_Port ln_p1 options:qos_burst=220000000
echo "#### check 100M - 200M"
tc class show dev $port2
tc class show dev $port2|grep rate.*100M.*ceil.*200M || echo "BUG: 100M 200M"
```

Actual results:
qos_max_rate cannot be set higher than 100M; the requested 200M ceiling is capped at 100Mbit.

Expected results:
The tc class for the queue shows rate 100Mbit ceil 200Mbit.

Additional info:
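A diagnostic that may help here (not part of the original report; `link_speed` is a standard column of the OVS Interface table, reported in bits per second) is to check what link speed OVS detected for the egress port:

```
# An empty link_speed ([]) means OVS could not determine the speed
# of the port and will fall back to its built-in default.
ovs-vsctl get Interface $port2 link_speed
ethtool $port2 | grep -i speed
```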
This issue happens when using the NIC described in comment #0, but does not happen when using the NIC below:

```
[root@dell-per740-23 ~]# ethtool -i ens3f0np0
driver: mlx5_core
version: 5.14.0-70.30.1.rt21.102.el9_0.x
firmware-version: 22.34.4000 (MT_0000000359)
expansion-rom-version:
bus-info: 0000:5e:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes
[root@dell-per740-23 ~]# lspci -s 0000:5e:00.0 -v
5e:00.0 Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
        Subsystem: Mellanox Technologies Device 0016
        Flags: bus master, fast devsel, latency 0, IRQ 89, NUMA node 0, IOMMU group 81
        Memory at bc000000 (64-bit, prefetchable) [size=32M]
        Expansion ROM at b8800000 [disabled] [size=1M]
        Capabilities: [60] Express Endpoint, MSI 00
        Capabilities: [48] Vital Product Data
        Capabilities: [9c] MSI-X: Enable+ Count=64 Masked-
        Capabilities: [c0] Vendor Specific Information: Len=18 <?>
        Capabilities: [40] Power Management version 3
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
        Capabilities: [180] Single Root I/O Virtualization (SR-IOV)
        Capabilities: [1c0] Secondary PCI Express
        Capabilities: [230] Access Control Services
        Capabilities: [320] Lane Margining at the Receiver <?>
        Capabilities: [370] Physical Layer 16.0 GT/s <?>
        Capabilities: [420] Data Link Feature <?>
        Kernel driver in use: mlx5_core
        Kernel modules: mlx5_core
```
Using the tc command directly, I can set the NIC QoS rate to a value higher than 200M:

```
# tc qdisc show dev ens1f1np1
qdisc htb 1: root refcnt 641 r2q 10 default 0x1 direct_packets_stat 6 direct_qlen 1000
# tc class add dev ens1f1np1 parent root classid 1:fffe htb prio 0 rate 200000000 ceil 3000000000
# tc class show dev ens1f1np1
class htb 1:fffe root prio 0 rate 200Mbit ceil 3Gbit burst 1600b cburst 1125b
```
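A side note on units, based on tc(8): a bare number passed to rate/ceil is interpreted as bits per second, which is why rate 200000000 is displayed as 200Mbit and ceil 3000000000 as 3Gbit in the output above. OVN's qos_min_rate/qos_max_rate options are likewise expressed in bits per second.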
This is an interesting case. Since you show that the same settings in OVN didn't work with one NIC (NIC A) but did work for another (NIC B), I immediately thought that this must be a tc issue, since you've proven that OVN is capable of configuring a NIC with the desired QoS. But then you showed how you can get the QoS settings to work properly by using tc directly. So tc is apparently capable of setting QoS properly on NIC A.

OVN always uses the same API calls to set QoS regardless of the NIC, and it worked properly for NIC B. And since this worked properly with NIC B, it should also mean that OVS has no issues getting these values passed to tc. I can only assume, then, that OVS is passing the values properly to tc with NIC A as well. So this to me still points to something being wrong with tc or the NIC driver itself.

I'm going to pass this issue over to the OVS team to begin with, so that they can analyze the OVS code to ensure my assumptions are correct. If they are, then this likely needs to be punted down another layer.
OVN basically uses two functions to set up QoS:

1. netdev_set_qos(netdev, type, NULL);
2. netdev_set_queue(netdev_phy, sb_info->queue_id, &queue_details);

The first one configures the parent QoS class and the second configures each queue within that class. The key point here is the 'NULL' in the first call. It means that we have no configuration provided for the class itself. Since there is no configuration provided, netdev_set_qos() will try to determine the max-rate on its own. The logic for that is to try to get the link speed of the port and use it as a global max-rate that caps the total of the max-rates of the individual queues. Sounds reasonable.

Here is the issue: OVS can't determine the link speed of the 25Gbps link. There is no support for that, and it's not a bug. This may also happen for other reasons or with more exotic link speeds for which there are no known enumerations. According to the documentation, OVS uses 100 Mbps instead in the case where it is unable to determine the speed. The queue speeds that OVN is trying to set are capped by that value, since the rate of an individual queue cannot be larger than the global rate of the class.

I have a patch in the works to increase the default value up to 10Gbps, and we can rework the link speed detection and add more enum items for other link speeds. But that doesn't mean that OVN is configuring QoS correctly in the first place. There are cases where the link speed genuinely cannot be determined; for example, tap interfaces always report a speed of 10 Mbps. So OVN will not be able to configure meaningful QoS values for such ports. One way to fix this would be to take the sum of all the individual queue rates and use that value as the max-rate for the class, perhaps choosing a slightly higher value to avoid re-configuring the class every time.

I'll move this back to OVN for now. I will also open an RFE for OVS to better support link speeds, though this will likely be OVS 3.1 material, as it will require some internal rework and a move to new ethtool APIs.
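To illustrate the capping, here is roughly the same hierarchy expressed as plain tc commands (a sketch only; OVS performs the equivalent configuration through netlink rather than by running tc, and the device name and class ids follow the output shown in the report):

```
# Root HTB qdisc plus the 1:fffe root class that OVS creates.
tc qdisc add dev ens1f1np1 root handle 1: htb default 1
# OVS sets the root class rate/ceil to the detected link speed,
# falling back to 100 Mbps when the speed cannot be determined.
tc class add dev ens1f1np1 parent 1: classid 1:fffe htb rate 100mbit ceil 100mbit
# Per-queue child class: in HTB a child can only borrow up to its
# parent's ceiling, so a 200 Mbps queue under a 100 Mbps root class
# is effectively capped at 100 Mbps (the tc output in the report
# shows the queue's ceiling clamped to 100Mbit as well).
tc class add dev ens1f1np1 parent 1:fffe classid 1:2 htb rate 100mbit ceil 200mbit
```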
(In reply to Ilya Maximets from comment #4)
> One way to fix this would be to take the sum of all the individual
> queue rates and use that value as the max-rate for the class,
> perhaps choosing a slightly higher value to avoid re-configuring
> the class every time.

That will probably not work, because not all of the traffic should be limited. I suppose something like UINT64_MAX would be a better solution, or whatever maximum value tc will accept.
Based on Ilya's comment, I am re-classifying this issue as being an openvswitch issue.
(In reply to Mark Michelson from comment #6)
> Based on Ilya's comment, I am re-classifying this issue as being an
> openvswitch issue.

I would disagree with that. Yes, there are things we can change in OVS to make this work, but ultimately OVN doesn't configure QoS correctly in the first place.
I created an RFE for OVS to detect more link speeds here: BZ 2137567.

I suggest moving the current BZ back to OVN, so this bug can be fixed without waiting for OVS 3.1+.
Moving back to OVN as suggested by Ilya.
Posted an OVN fix for review: https://patchwork.ozlabs.org/project/ovn/patch/20221101140032.734440-1-i.maximets@ovn.org/
Ilya,

Is (2^32 - 1) * 8 equal to 32 Gbps? Is this enough? Because there are 100 Gbps NICs.

Liang.
(In reply to LiLiang from comment #11)
> Is (2^32 - 1) * 8 equal to 32 Gbps?
> Is this enough? Because there are 100 Gbps NICs.

It's ~34 Gbps, but yes, that might not be enough. Users are typically not using values that high, though. There is an RFE to start using 64-bit netlink attributes, which will allow configuring higher values: BZ 2137619. At the same time, OVN is currently limited to just 4 Gbps, not even 34; see BZ 2139100.
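For reference, the arithmetic behind the ~34 Gbps figure (assuming the 2^32 - 1 limit is a bytes-per-second value, as in the 32-bit tc rate fields):

```
# (2^32 - 1) bytes/s converted to bits/s:
echo $(( (2**32 - 1) * 8 ))   # 34359738360, i.e. ~34.36 Gbit/s
```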
ovn22.03 fast-datapath-rhel-9 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2144963
ovn22.06 fast-datapath-rhel-8 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2144964
ovn22.06 fast-datapath-rhel-9 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2144965
ovn22.09 fast-datapath-rhel-8 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2144966
ovn22.09 fast-datapath-rhel-9 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2144967
I guess ovn22.03-22.03.0-120.el8fdp has not been built yet because I can't find it at https://download.eng.bos.redhat.com/brewroot/packages/ovn22.03/22.03.0/ ?
verified https://beaker.engineering.redhat.com/recipes/13040241#tasks
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (ovn22.03 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:9059