Bug 2136716

Summary: ovn qos_max_rate can't be set to bigger than 200M when using a 25G mlx5 NIC with localnet port
Product: Red Hat Enterprise Linux Fast Datapath
Reporter: LiLiang <liali>
Component: ovn22.03
Assignee: Ilya Maximets <i.maximets>
Status: CLOSED ERRATA
QA Contact: LiLiang <liali>
Severity: medium
Priority: medium
Version: FDP 22.J
CC: ctrautma, fleitner, i.maximets, jhsiao, jiji, mmichels, qding, ralongi
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: ovn22.03-22.03.0-120.el8fdp
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-12-15 15:26:33 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description LiLiang 2022-10-21 03:31:42 UTC
Description of problem:

Create a localnet-type OVN LSP named ln_p1, map this LSP to an OVS bridge, and add a mlx5 25G port to that bridge; this provides network connectivity to the external network.

e.g.
```
# external network
ovs-vsctl add-br ext_net
ip link set ext_net up
ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings=external_net:ext_net
ovs-vsctl add-port ext_net $port2
ip link set $port2 up

ovn-nbctl ls-add public
ovn-nbctl lsp-add public ln_p1
ovn-nbctl lsp-set-addresses ln_p1 unknown
ovn-nbctl lsp-set-type ln_p1 localnet
ovn-nbctl lsp-set-options ln_p1 network_name=external_net

ovn-nbctl lrp-add r1 r1_public 40:44:00:00:00:03 172.16.104.1/24
ovn-nbctl lsp-add public public_r1
ovn-nbctl lsp-set-type public_r1 router
ovn-nbctl lsp-set-addresses public_r1 router
ovn-nbctl lsp-set-options public_r1 router-port=r1_public nat-addresses=router
ovn-nbctl lrp-set-gateway-chassis r1_public hv1
```

Set ln_p1 qos_max_rate to 200M:
```
# qos
ovs-vsctl set interface $port2 external-ids:ovn-egress-iface=true
ovn-nbctl set Logical_Switch_Port ln_p1 options:qos_min_rate=100000000
ovn-nbctl set Logical_Switch_Port ln_p1 options:qos_max_rate=200000000
ovn-nbctl set Logical_Switch_Port ln_p1 options:qos_burst=220000000
echo "#### check 100M - 200M"
tc class show dev $port2
tc class show dev $port2 | grep "rate.*100M.*ceil.*200M" || echo "BUG: 100M 200M"
```

But the actual value is 100M:

```
# tc class show dev ens1f1np1
class htb 1:fffe root rate 100Mbit ceil 100Mbit burst 1500b cburst 1500b
class htb 1:2 parent 1:fffe prio 0 rate 100Mbit ceil 100Mbit burst 27500000b cburst 27500000b
```

Version-Release number of selected component (if applicable):
[root@dell-per740-17 qos]# rpm -qa|grep ovn
ovn22.03-22.03.0-106.el9fdp.x86_64
ovn22.03-central-22.03.0-106.el9fdp.x86_64
ovn22.03-host-22.03.0-106.el9fdp.x86_64
[root@dell-per740-17 qos]# rpm -qa|grep openvswitch
openvswitch-selinux-extra-policy-1.0-31.el9fdp.noarch
openvswitch2.17-2.17.0-49.el9fdp.x86_64
python3-openvswitch2.17-2.17.0-49.el9fdp.x86_64

[root@dell-per740-17 qos]# uname -r
5.14.0-70.30.1.el9_0.x86_64
[root@dell-per740-17 qos]# ethtool -i ens1f1np1
driver: mlx5_core
version: 5.14.0-70.30.1.el9_0.x86_64
firmware-version: 16.27.2008 (MT_0000000080)
expansion-rom-version: 
bus-info: 0000:3b:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes
[root@dell-per740-17 qos]# lspci -s 0000:3b:00.1
3b:00.1 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
[root@dell-per740-17 qos]# lspci -s 0000:3b:00.1 -v
3b:00.1 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
	Subsystem: Mellanox Technologies ConnectX®-5 EN network interface card, 10/25GbE dual-port SFP28, PCIe3.0 x8, tall bracket ; MCX512A-ACAT
	Flags: bus master, fast devsel, latency 0, IRQ 149, NUMA node 0, IOMMU group 63
	Memory at 94000000 (64-bit, prefetchable) [size=32M]
	Expansion ROM at 93d00000 [disabled] [size=1M]
	Capabilities: [60] Express Endpoint, MSI 00
	Capabilities: [48] Vital Product Data
	Capabilities: [9c] MSI-X: Enable+ Count=64 Masked-
	Capabilities: [c0] Vendor Specific Information: Len=18 <?>
	Capabilities: [40] Power Management version 3
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
	Capabilities: [180] Single Root I/O Virtualization (SR-IOV)
	Capabilities: [230] Access Control Services
	Kernel driver in use: mlx5_core
	Kernel modules: mlx5_core

How reproducible:
always

Steps to Reproduce:
```
port1=ens1f0np0
port2=ens1f1np1

ip link set $port1 up
ip addr add 177.1.1.1/16 dev $port1 &>/dev/null

systemctl start openvswitch
systemctl start ovn-northd
ovn-sbctl set-connection ptcp:6642
ovn-nbctl set-connection ptcp:6641

ovs-vsctl set Open_vSwitch . external-ids:system-id=hv1
ovs-vsctl set Open_vSwitch . external-ids:ovn-remote=tcp:177.1.1.1:6642
ovs-vsctl set Open_vSwitch . external-ids:ovn-encap-type=geneve
ovs-vsctl set Open_vSwitch . external-ids:ovn-encap-ip=177.1.1.1
systemctl restart ovn-controller


# dhcp options
dhcp_102="$(ovn-nbctl create DHCP_Options cidr=172.16.102.0/24 \
        options="\"server_id\"=\"172.16.102.1\" \"server_mac\"=\"00:de:ad:ff:01:02\" \
        \"lease_time\"=\"3600\" \"router\"=\"172.16.102.1\"")" 

# r1
ovn-nbctl lr-add r1
ovn-nbctl lrp-add r1 r1_s2 00:de:ad:ff:01:02 172.16.102.1/24
ovn-nbctl lrp-add r1 r1_s3 00:de:ad:ff:01:03 172.16.103.1/24

# s2
ovn-nbctl ls-add s2

# s2 - r1
ovn-nbctl lsp-add s2 s2_r1
ovn-nbctl lsp-set-type s2_r1 router
#ovn-nbctl lsp-set-addresses s2_r1 00:de:ad:ff:01:02
ovn-nbctl lsp-set-addresses s2_r1 "00:de:ad:ff:01:02 172.16.102.1"
ovn-nbctl lsp-set-options s2_r1 router-port=r1_s2

# s2 - hv1_vm00_vnet1
ovn-nbctl lsp-add s2 hv1_vm00_vnet1
ovn-nbctl lsp-set-addresses hv1_vm00_vnet1 "00:de:ad:01:00:01 172.16.102.11"
ovn-nbctl lsp-set-dhcpv4-options hv1_vm00_vnet1 $dhcp_102

# s2 - hv1_vm01_vnet1
ovn-nbctl lsp-add s2 hv1_vm01_vnet1
ovn-nbctl lsp-set-addresses hv1_vm01_vnet1 "00:de:ad:01:01:01 172.16.102.12"
ovn-nbctl lsp-set-dhcpv4-options hv1_vm01_vnet1 $dhcp_102

# external network
ovs-vsctl add-br ext_net
ip link set ext_net up
ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings=external_net:ext_net
ovs-vsctl add-port ext_net $port2
ip link set $port2 up

ovn-nbctl ls-add public
ovn-nbctl lsp-add public ln_p1
ovn-nbctl lsp-set-addresses ln_p1 unknown
ovn-nbctl lsp-set-type ln_p1 localnet
ovn-nbctl lsp-set-options ln_p1 network_name=external_net

ovn-nbctl lrp-add r1 r1_public 40:44:00:00:00:03 172.16.104.1/24
ovn-nbctl lsp-add public public_r1
ovn-nbctl lsp-set-type public_r1 router
ovn-nbctl lsp-set-addresses public_r1 router
ovn-nbctl lsp-set-options public_r1 router-port=r1_public nat-addresses=router
ovn-nbctl lrp-set-gateway-chassis r1_public hv1

# create virtual vm00
ovs-vsctl add-port br-int hv1_vm00_vnet1 -- set interface hv1_vm00_vnet1 type=internal
ip netns add hv1_vm00_vnet1
ip link set hv1_vm00_vnet1 netns hv1_vm00_vnet1
ip netns exec hv1_vm00_vnet1 ip link set lo up
ip netns exec hv1_vm00_vnet1 ip link set hv1_vm00_vnet1 up
ip netns exec hv1_vm00_vnet1 ip link set hv1_vm00_vnet1 address 00:de:ad:01:00:01
#pkill dhclient
#ip netns exec hv1_vm00_vnet1 dhclient -v hv1_vm00_vnet1
ip netns exec hv1_vm00_vnet1 ip addr add 172.16.102.11/24 dev hv1_vm00_vnet1
ip netns exec hv1_vm00_vnet1 ip route add default via 172.16.102.1 dev hv1_vm00_vnet1
ovs-vsctl set Interface hv1_vm00_vnet1 external_ids:iface-id=hv1_vm00_vnet1

# create virtual vm01
ovs-vsctl add-port br-int hv1_vm01_vnet1 -- set interface hv1_vm01_vnet1 type=internal
ip netns add hv1_vm01_vnet1
ip link set hv1_vm01_vnet1 netns hv1_vm01_vnet1
ip netns exec hv1_vm01_vnet1 ip link set lo up
ip netns exec hv1_vm01_vnet1 ip link set hv1_vm01_vnet1 up
ip netns exec hv1_vm01_vnet1 ip link set hv1_vm01_vnet1 address 00:de:ad:01:01:01
#pkill dhclient
#ip netns exec hv1_vm01_vnet1 dhclient -v hv1_vm01_vnet1
ip netns exec hv1_vm01_vnet1 ip addr add 172.16.102.12/24 dev hv1_vm01_vnet1
ip netns exec hv1_vm01_vnet1 ip route add default via 172.16.102.1 dev hv1_vm01_vnet1
ovs-vsctl set Interface hv1_vm01_vnet1 external_ids:iface-id=hv1_vm01_vnet1

sleep 5
dmesg -C

# qos
ovs-vsctl set interface $port2 external-ids:ovn-egress-iface=true
#ovn-nbctl set Logical_Switch_Port ln_p1 options:qos_min_rate=10000000
#ovn-nbctl set Logical_Switch_Port ln_p1 options:qos_max_rate=20000000
#ovn-nbctl set Logical_Switch_Port ln_p1 options:qos_burst=22000000
#echo "#### check 10M - 20M"
#tc class show dev $port2
#tc class show dev $port2|grep rate.*10M.*ceil.*20M || echo "BUG: 10M 20M"

ovn-nbctl set Logical_Switch_Port ln_p1 options:qos_min_rate=100000000
ovn-nbctl set Logical_Switch_Port ln_p1 options:qos_max_rate=200000000
ovn-nbctl set Logical_Switch_Port ln_p1 options:qos_burst=220000000
echo "#### check 100M - 200M"
tc class show dev $port2
tc class show dev $port2 | grep "rate.*100M.*ceil.*200M" || echo "BUG: 100M 200M"

```

Actual results:
The configured qos_max_rate of 200M is not applied; tc reports rate 100Mbit and ceil 100Mbit.

Expected results:
tc reports rate 100Mbit and ceil 200Mbit for the queue class, so the check in the script above passes.

Additional info:

Comment 1 LiLiang 2022-10-21 03:53:42 UTC
This issue happens with the NIC described in comment #0, but doesn't happen with the NIC below:

[root@dell-per740-23 ~]# ethtool -i ens3f0np0
driver: mlx5_core
version: 5.14.0-70.30.1.rt21.102.el9_0.x
firmware-version: 22.34.4000 (MT_0000000359)
expansion-rom-version: 
bus-info: 0000:5e:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes
[root@dell-per740-23 ~]# lspci -s 0000:5e:00.0 -v
5e:00.0 Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
	Subsystem: Mellanox Technologies Device 0016
	Flags: bus master, fast devsel, latency 0, IRQ 89, NUMA node 0, IOMMU group 81
	Memory at bc000000 (64-bit, prefetchable) [size=32M]
	Expansion ROM at b8800000 [disabled] [size=1M]
	Capabilities: [60] Express Endpoint, MSI 00
	Capabilities: [48] Vital Product Data
	Capabilities: [9c] MSI-X: Enable+ Count=64 Masked-
	Capabilities: [c0] Vendor Specific Information: Len=18 <?>
	Capabilities: [40] Power Management version 3
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
	Capabilities: [180] Single Root I/O Virtualization (SR-IOV)
	Capabilities: [1c0] Secondary PCI Express
	Capabilities: [230] Access Control Services
	Capabilities: [320] Lane Margining at the Receiver <?>
	Capabilities: [370] Physical Layer 16.0 GT/s <?>
	Capabilities: [420] Data Link Feature <?>
	Kernel driver in use: mlx5_core
	Kernel modules: mlx5_core

Comment 2 LiLiang 2022-10-21 04:08:45 UTC
Using the tc command directly, I can set the NIC QoS rate to more than 200M.

# tc qdisc show dev ens1f1np1
qdisc htb 1: root refcnt 641 r2q 10 default 0x1 direct_packets_stat 6 direct_qlen 1000

# tc class add dev ens1f1np1 parent root classid 1:fffe htb prio 0 rate 200000000 ceil 3000000000 
# tc class show dev ens1f1np1
class htb 1:fffe root prio 0 rate 200Mbit ceil 3Gbit burst 1600b cburst 1125b

Comment 3 Mark Michelson 2022-10-21 20:13:06 UTC
This is an interesting case. Since you show that the same settings in OVN didn't work with one NIC (NIC A) but did work with another (NIC B), I immediately thought that this must be a tc issue, since you've proven that OVN is capable of configuring a NIC with the desired QoS.

But then you showed how you can get the QoS settings to work properly by using tc directly. So tc is apparently capable of setting QoS properly on NIC A.

OVN always uses the same API calls to set QoS regardless of the NIC, and those calls worked properly for NIC B. That also means OVS has no issue getting these values passed to tc, so I can only assume that OVS is passing the values properly to tc with NIC A as well. This still points to something being wrong in tc or the NIC driver itself.

I'm going to pass this issue over to the OVS team to begin with so that they can analyze the OVS code to ensure my assumptions are correct. If they are, then this likely needs to be punted down another layer.

Comment 4 Ilya Maximets 2022-10-21 20:51:49 UTC
OVN is basically using 2 functions to setup QoS:

1. netdev_set_qos(netdev, type, NULL);
2. netdev_set_queue(netdev_phy, sb_info->queue_id, &queue_details);

The first one configures the parent QoS class, and the second
configures each queue within that class.

The key point here is the 'NULL' in the first call.  It means that
we have no configuration provided for the class itself.

Since there is no configuration provided, netdev_set_qos() will
try to determine the max-rate on its own.  The logic is to get
the link speed of the port and use it as a global max-rate that
caps the total of the individual queues' max-rates.
Sounds reasonable.

Here goes the issue.  OVS can't determine the link speed of the
25Gbps link.  There is no support for that.  And it's not a bug.
This may also happen for other reasons or with more exotic link
speeds for which there are no known enumerations.

According to the documentation, OVS uses 100 Mbps instead when it
is unable to determine the speed.
The queue rates that OVN tries to set are capped by that value,
since an individual queue's rate cannot be larger than the global
rate of the class.
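The fallback and capping described above can be sketched in plain shell. This is an illustration only: `effective_queue_rate` is a hypothetical name, not an OVS function, and the 100 Mbps value is the documented fallback mentioned above.

```shell
# Illustration only: mimics the capping behaviour described above.
# effective_queue_rate is a hypothetical name, not OVS code.
effective_queue_rate() {
    local link_speed_bps=$1    # 0 means the link speed could not be determined
    local requested_bps=$2     # qos_max_rate requested for the queue
    local class_max_bps=$link_speed_bps
    if [ "$class_max_bps" -eq 0 ]; then
        class_max_bps=100000000    # documented OVS fallback: 100 Mbps
    fi
    # An individual queue's rate cannot exceed the class max-rate.
    if [ "$requested_bps" -gt "$class_max_bps" ]; then
        echo "$class_max_bps"
    else
        echo "$requested_bps"
    fi
}

effective_queue_rate 0 200000000            # undetected 25G link: prints 100000000
effective_queue_rate 10000000000 200000000  # detected 10G link: prints 200000000
```

This matches the tc output in the description: on the 25G mlx5 port whose speed OVS cannot detect, the requested 200M ceil collapses to the 100M class cap.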

I have a patch in the works to increase the default value up to
10Gbps, and we can re-work the link speed detection and add more
enum items for other link speeds.
But that doesn't mean that OVN is configuring QoS correctly.
There are cases where the link speed can not actually be
determined and, for example, tap interfaces always report a speed
of 10 Mbps.  So, OVN will not be able to configure meaningful QoS
values for such ports.

One way to fix this would be to take the sum of all the individual
queue rates and use that as the max-rate for the class.  Maybe
choose a slightly higher value to avoid re-configuring the class
every time.

I'll move this back to OVN for now.

I will also open an RFE for OVS to better support link speeds,
though this will likely be OVS 3.1 material as it will require
some internal re-work and a move to new ethtool APIs.

Comment 5 Ilya Maximets 2022-10-21 21:01:06 UTC
(In reply to Ilya Maximets from comment #4)
> One way to fix this would be to take the sum of all the individual
> queue rates and use that as the max-rate for the class.  Maybe
> choose a slightly higher value to avoid re-configuring the class
> every time.

That will probably not work, because not all the traffic should be
limited.  I suppose something like UINT64_MAX would be a better
solution, or whatever maximum value tc will accept.
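As a rough sketch of that direction (illustrative only; the actual patch may choose differently): summing queue rates would also throttle traffic that matches no queue, so the class max-rate would instead be pinned at the largest value tc can carry.

```shell
# tc's 32-bit rate attribute is in bytes per second, so the largest
# class max-rate it can express in bits per second is (2^32 - 1) * 8.
# The variable name is illustrative, not taken from the OVN patch.
TC_MAX_CLASS_RATE_BPS=$(( ((1 << 32) - 1) * 8 ))
echo "$TC_MAX_CLASS_RATE_BPS"    # 34359738360 bit/s, ~34 Gbps
```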

Comment 6 Mark Michelson 2022-10-25 13:09:04 UTC
Based on Ilya's comment, I am re-classifying this issue as being an openvswitch issue.

Comment 7 Ilya Maximets 2022-10-25 13:40:15 UTC
(In reply to Mark Michelson from comment #6)
> Based on Ilya's comment, I am re-classifying this issue as being an
> openvswitch issue.

I would disagree with that.  Yes, there are things we can change in OVS
to make this work, but ultimately OVN doesn't configure QoS correctly
in the first place.

Comment 8 Ilya Maximets 2022-10-25 14:23:02 UTC
I created an RFE for OVS to detect more link speeds here: BZ 2137567

I suggest moving the current BZ back to OVN, so this bug can be
fixed without waiting for OVS 3.1+.

Comment 9 Eelco Chaudron 2022-10-31 10:03:17 UTC
Moving back to OVN as suggested by Ilya.

Comment 10 Ilya Maximets 2022-11-01 14:05:38 UTC
Posted an OVN fix for review:
  https://patchwork.ozlabs.org/project/ovn/patch/20221101140032.734440-1-i.maximets@ovn.org/

Comment 11 LiLiang 2022-11-02 01:54:35 UTC
Ilya,

Is (2^32 - 1) * 8 equal to 32 Gbps?
Is this enough?  Because there are 100 Gbps NICs.

Liang.

Comment 12 Ilya Maximets 2022-11-02 10:13:59 UTC
(In reply to LiLiang from comment #11)
> Is (2^32 - 1) * 8 equal to 32Gbps ? 
> Is this enough? Because there are 100Gbps NICs.

It's 34 Gbps, but yes, that might not be enough.  Users are
typically not using values that high, though.

There is an RFE to start using 64-bit netlink attributes that will
allow configuring higher values: BZ 2137619.

At the same time OVN is currently limited to just 4 Gbps, not
even 34.  See BZ 2139100.
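For reference, the arithmetic behind the 34 Gbps figure can be checked in plain shell:

```shell
# (2^32 - 1) bytes/s expressed in bits/s: the ceiling of tc's
# 32-bit rate attribute.
max_bps=$(( ((1 << 32) - 1) * 8 ))
echo "$max_bps bit/s"                     # 34359738360 bit/s
echo "~$(( max_bps / 1000000000 )) Gbps"  # ~34 Gbps, not 32
```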

Comment 13 OVN Bot 2022-11-22 19:38:33 UTC
ovn22.03 fast-datapath-rhel-9 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2144963
ovn22.06 fast-datapath-rhel-8 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2144964
ovn22.06 fast-datapath-rhel-9 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2144965
ovn22.09 fast-datapath-rhel-8 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2144966
ovn22.09 fast-datapath-rhel-9 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2144967

Comment 14 LiLiang 2022-11-23 03:41:47 UTC
I guess ovn22.03-22.03.0-120.el8fdp has not been built yet? I can't find it at https://download.eng.bos.redhat.com/brewroot/packages/ovn22.03/22.03.0/

Comment 19 errata-xmlrpc 2022-12-15 15:26:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (ovn22.03 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:9059