Bug 2239793 - [cx5 HWOL]The max_tx_rate attribute cannot accurately limit traffic (the tolerance value exceeds 20%)
Keywords:
Status: NEW
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: openvswitch
Version: RHEL 8.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Amir Tzin (Mellanox)
QA Contact: mhou
URL:
Whiteboard:
Depends On:
Blocks: 2172622
Reported: 2023-09-20 09:00 UTC by mhou
Modified: 2023-09-21 12:05 UTC
CC List: 10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:
Embargoed:


Links
System                  ID       Private  Priority  Status  Summary  Last Updated
Red Hat Issue Tracker   FD-3188  0        None      None    None     2023-09-20 09:01:22 UTC

Description mhou 2023-09-20 09:00:30 UTC
Description of problem:
Limited the vf_rep rate to 5000 Mb/s, but netperf run in the guest reports more than 6000 Mb/s:
ip link set $pf vf $vf max_tx_rate 5000 

Version-Release number of selected component (if applicable):
Distro: RHEL-8.9.0-updates-20230917.32
OVS: openvswitch3.1-3.1.0-50.el8fdp and openvswitch3.1-3.1.0-41.el8fdp: failed
     openvswitch3.1-3.1.0-61.el8fdp: passed

# ethtool -i ens1f0
driver: mlx5_core
version: 4.18.0-513.2.1.el8_9.x86_64
firmware-version: 16.35.2000 (MT_0000000080)
expansion-rom-version: 
bus-info: 0000:3b:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes

How reproducible: 100%


Steps to Reproduce:
1. Create the VFs and VF representors (switchdev mode).
nic_name=ens1f0
nic_pci="$(ethtool -i ${nic_name} | sed -n '/bus-info: / s/bus-info: //p')"
echo 0 > /sys/bus/pci/devices/${nic_pci}/sriov_numvfs
devlink dev eswitch set pci/${nic_pci} mode legacy
devlink dev param set pci/${nic_pci} name flow_steering_mode value smfs cmode runtime
devlink dev param show pci/${nic_pci} name flow_steering_mode
ip link set ${nic_name} vf 0 mac 00:de:ad:02:00:01
ip link set ${nic_name} vf 1 mac 00:de:ad:02:00:02
cat > /etc/udev/rules.d/80-persistent-${nic_name}.rules <<-EOF
	SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", KERNELS=="${nic_pci}", NAME="${nic_name}"
EOF
echo 2 > /sys/bus/pci/devices/${nic_pci}/sriov_numvfs
virtfn=$(ls -l /sys/bus/pci/devices/${nic_pci}/ | grep virtfn | sed 's/.*virtfn[0-9]\+ -> ..\/\(.*\)/\1/' | xargs)
for vf in $virtfn
do
	echo "echo $vf > /sys/bus/pci/drivers/mlx5_core/unbind"
	echo $vf > /sys/bus/pci/drivers/mlx5_core/unbind
done
sleep 10
devlink dev eswitch set pci/$nic_pci mode switchdev
sleep 5
phys_switch_id=$(cat /sys/class/net/${nic_name}/phys_switch_id)
for iface in $(ls /sys/class/net/)
do
	[[ "$(cat /sys/class/net/$iface/phys_switch_id 2>/dev/null)" = "$phys_switch_id" ]] && ip link set $iface up
done
# enable tc offloading
ethtool -K ${nic_name} hw-tc-offload on
ethtool -k ${nic_name}
devlink dev eswitch show pci/$nic_pci
lspci | grep -i Mellanox

2. Enable hardware offload in OVS and add the PF and the VF representor to the OVS bridge.
systemctl status openvswitch &>/dev/null || systemctl start openvswitch
ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
ovs-vsctl --if-exists del-br ovsbr0
ovs-vsctl add-br ovsbr0
ovs-vsctl add-port ovsbr0 ens1f0
ovs-vsctl add-port ovsbr0 eth0

3. Attach the VF to the guest (eth0.xml) and configure an IP address inside the guest.
virsh attach-device g1 eth0.xml
virsh console g1
ip addr add 192.168.124.1/24 dev eth0
4. Set the VF max_tx_rate to 5000 (Mb/s).
ip link set ens1f0 vf 0 max_tx_rate 5000

5. Run netperf from the guest to the peer side.
[root@localhost ~]# netperf -H 192.168.124.2
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.124.2 () port 0 AF_INET
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 87380  16384  16384    10.01    7206.41   
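Step 1 extracts the VF PCI addresses by parsing `ls -l` output with `sed`. A minimal sketch of an alternative that resolves the `virtfnN` symlinks directly, assuming (as that `sed` expression does) that each link points to `../<pci-address>`; the function name `list_vf_pci_addrs` is hypothetical and not part of the QE test.

```shell
# Hypothetical helper: print the PCI addresses of a PF's VFs, one per line,
# by resolving the virtfnN symlinks in the PF's sysfs directory.
# Usage: list_vf_pci_addrs /sys/bus/pci/devices/0000:3b:00.0
list_vf_pci_addrs() {
    local pf_dir=$1 link
    for link in "$pf_dir"/virtfn*; do
        # If the glob matched nothing it stays literal and is not a symlink.
        [ -L "$link" ] || continue
        # e.g. virtfn0 -> ../0000:3b:00.2 prints 0000:3b:00.2
        basename "$(readlink "$link")"
    done
}
```

It could stand in for the `virtfn=$(...)` line, for example `virtfn=$(list_vf_pci_addrs /sys/bus/pci/devices/${nic_pci} | xargs)`, leaving the unbind loop unchanged.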

Actual results:
Throughput exceeds 6000 Mb/s, the upper bound allowed by a 20% tolerance (5000 * (1 + 0.2)).
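To check results like this automatically, the throughput first has to be pulled out of the netperf report. A minimal sketch, assuming the classic TCP_STREAM report shown in step 5, where the result row is the only line of five numeric fields and the fifth is Throughput in 10^6 bits/sec; `netperf_throughput` is a hypothetical helper name.

```shell
# Hypothetical helper: read a netperf TCP_STREAM report on stdin and print the
# Throughput column (10^6 bits/sec) of the result row.
netperf_throughput() {
    # Header lines start with words ("Recv", "Size", "bytes", ...); the result
    # row is the one whose first and fifth fields are numbers.
    awk 'NF == 5 && $1 ~ /^[0-9]+$/ && $5 ~ /^[0-9.]+$/ { tput = $5 }
         END { if (tput == "") exit 1; print tput }'
}
```

For example, `netperf -H 192.168.124.2 | netperf_throughput` would print `7206.41` for the run above; the function returns non-zero if no result row is found.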

Beaker jobs:
RHEL-8.9.0-updates-20230917.32 openvswitch3.1-3.1.0-50.el8fdp
https://beaker.engineering.redhat.com/recipes/14648208#task166425057 -- failed (got 7206.41 Mb/s)

RHEL-8.9.0-updates-20230917.32 openvswitch3.1-3.1.0-41.el8fdp
https://beaker.engineering.redhat.com/recipes/14648613#task166428239 -- failed (got 6032.34 Mb/s)

Expected results:
QE expects the measured throughput to stay within 20% of the configured max_tx_rate, i.e. no more than 6000 Mb/s.
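The pass/fail rule used in this report is: measured throughput <= max_tx_rate * (1 + tolerance), which gives 5000 * 1.2 = 6000 Mb/s for a 20% tolerance. A minimal sketch of that check, assuming the rate and the measurement are both in Mb/s; `check_rate_limit` is a hypothetical name, and awk does the comparison because bash arithmetic is integer-only.

```shell
# Hypothetical helper: compare a measured throughput against max_tx_rate
# plus a percentage tolerance (default 20%), both values in Mb/s.
# Prints PASS/FAIL with the numbers and returns 0 on PASS, 1 on FAIL.
check_rate_limit() {
    local rate=$1 measured=$2 tol=${3:-20}
    awk -v r="$rate" -v m="$measured" -v t="$tol" 'BEGIN {
        limit = r * (1 + t / 100)          # e.g. 5000 * 1.2 = 6000
        if (m + 0 <= limit) { printf "PASS (%.2f <= %.2f)\n", m, limit; exit 0 }
        printf "FAIL (%.2f > %.2f)\n", m, limit
        exit 1
    }'
}
```

With the Beaker results above, `check_rate_limit 5000 7206.41` prints `FAIL (7206.41 > 6000.00)` and `check_rate_limit 5000 6032.34` prints `FAIL (6032.34 > 6000.00)`; both return 1.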

Additional info:
After an internal sync within QE, it appears this may be an instability in the RHEL + OVS combination.

With 23.G, this case passes:
test info:
kernel-4.18.0-372.70.1.el8_6 + openvswitch3.1-3.1.0-50.el8fdp.x86_64

nic info:
ethtool -i enp4s0f0
driver: mlx5_core
version: 4.18.0-372.70.1.el8_6.x86_64
firmware-version: 16.35.2000 (MT_0000000012)
expansion-rom-version: 
bus-info: 0000:04:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes

beaker job:
https://beaker.engineering.redhat.com/recipes/14446382#task164824831
https://beaker.engineering.redhat.com/recipes/14446382/tasks/164824831/results/770672838/logs/resultoutputfile.log --pass

With RHEL-8.9.0-updates-20230917.32 + openvswitch3.1-3.1.0-61.el8fdp, this case also passes:
https://beaker.engineering.redhat.com/recipes/14648919#tasks

However, with RHEL-8.6.0-updates-20230919.5 + openvswitch3.1-3.1.0-61.el8fdp, this case fails:
https://beaker.engineering.redhat.com/jobs/8334719

Comment 1 Marcelo Ricardo Leitner 2023-09-20 19:14:47 UTC
Hi,

(In reply to mhou from comment #0)
> Description of problem:
> limit the vf_rep rate to 5000Mb/s but run netperf on guest got over
> 60000Mb/s.
> ip link set $pf vf $vf max_tx_rate 5000 

This command is independent of OVS, so we're talking only about the driver/NIC here.

> # ethtool -i ens1f0
> driver: mlx5_core
> version: 4.18.0-513.2.1.el8_9.x86_64

So this would be a bad kernel.

> When run RHEL-8.9.0-updates-20230917.32 + openvswitch3.1-3.1.0-61.el8fdp,
> this case can also pass
> https://beaker.engineering.redhat.com/recipes/14648919#tasks

But this one is also using kernel-4.18.0-513.2.1.el8_9.
Even the OVS version is the same here. What's the difference between this and the bad test case above?


Comment 2 mhou 2023-09-21 01:51:43 UTC
(In reply to Marcelo Ricardo Leitner from comment #1)
> But this one is also using kernel-4.18.0-513.2.1.el8_9.
> Even the OVS version is the same here. What's the difference between this and the bad test case above?

Let me draw a table to explain the current situation:

OVS \ kernel                      4.18.0-513.2.1.el8_9   4.18.0-372.70.1.el8_6   4.18.0-372.74.1.el8_6
openvswitch3.1-3.1.0-41.el8fdp    FAIL
openvswitch3.1-3.1.0-50.el8fdp    FAIL                   PASS
openvswitch3.1-3.1.0-61.el8fdp    PASS                                           FAIL

With the same OVS version (e.g. -50 or -61), we see different results on 8.6.z and 8.9.

What are your thoughts on this, and how should QE narrow down the issue?

Comment 3 Marcelo Ricardo Leitner 2023-09-21 12:05:11 UTC
(In reply to mhou from comment #2)
> What are your thoughts on this and how QE should narrow down the issue?

We really need to hear from Nvidia now. This is very weird.
As I said, that command goes directly to the driver/NIC and OVS shouldn't interfere.
Maybe when the driver configures the HW, something is conflicting somehow.

Are you sure the test is conclusive?
It would be nice if you could test 372.70.1 with OVS -61 and 372.74.1 with OVS -50. I'm trying to understand what went wrong in that update.

