Description of problem:
Limiting the vf_rep rate to 5000 Mb/s, but netperf run on the guest got over
6000 Mb/s.
ip link set $pf vf $vf max_tx_rate 5000

Version-Release number of selected component (if applicable):
Distro: RHEL-8.9.0-updates-20230917.32
ovs: openvswitch3.1-3.1.0-50.el8fdp & 41.el8fdp -- failed
     openvswitch3.1-3.1.0-61.el8fdp -- pass

# ethtool -i ens1f0
driver: mlx5_core
version: 4.18.0-513.2.1.el8_9.x86_64
firmware-version: 16.35.2000 (MT_0000000080)
expansion-rom-version:
bus-info: 0000:3b:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes

How reproducible: 100%

Steps to Reproduce:
1. Create the vf_rep:
nic_name=ens1f0
nic_pci="$(ethtool -i ${nic_name} | sed -n '/bus-info: / s/bus-info: //p')"
echo 0 > /sys/bus/pci/devices/${nic_pci}/sriov_numvfs
devlink dev eswitch set pci/${nic_pci} mode legacy
devlink dev param set pci/${nic_pci} name flow_steering_mode value smfs cmode runtime
devlink dev param show pci/${nic_pci} name flow_steering_mode
ip link set ${nic_name} vf 0 mac 00:de:ad:02:00:01
ip link set ${nic_name} vf 1 mac 00:de:ad:02:00:02
cat > /etc/udev/rules.d/80-persistent-${nic_name}.rules <<-EOF
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", KERNELS=="${nic_pci}", NAME="${nic_name}"
EOF
echo 2 > /sys/bus/pci/devices/${nic_pci}/sriov_numvfs
virtfn=$(ls -l /sys/bus/pci/devices/${nic_pci}/ | grep virtfn | sed 's/.*virtfn[0-9]\+ -> ..\/\(.*\)/\1/' | xargs)
for vf in $virtfn
do
    echo "echo $vf > /sys/bus/pci/drivers/mlx5_core/unbind"
    echo $vf > /sys/bus/pci/drivers/mlx5_core/unbind
done
sleep 10
devlink dev eswitch set pci/$nic_pci mode switchdev
sleep 5
phys_switch_id=$(cat /sys/class/net/${nic_name}/phys_switch_id)
for iface in $(ls /sys/class/net/)
do
    [[ "$(cat /sys/class/net/$iface/phys_switch_id 2>/dev/null)" = "$phys_switch_id" ]] && ip link set $iface up
done
# enable tc offloading
ethtool -K ${nic_name} hw-tc-offload on
ethtool -k ${nic_name}
devlink dev eswitch show pci/$nic_pci
lspci | grep -i Mellanox

2. Enable offload in OVS and add the PF & vf_rep to an OVS bridge:
systemctl status openvswitch &>/dev/null || systemctl start openvswitch
ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
ovs-vsctl --if-exists del-br ovsbr0
ovs-vsctl add-br ovsbr0
ovs-vsctl add-port ovsbr0 ens1f0
ovs-vsctl add-port ovsbr0 eth0

3. Attach the VF XML to the guest and configure an IP address:
virsh attach-device eth0.xml
virsh console g1
ip addr add 192.168.124.1/24 dev eth0

4. Set the VF max_tx_rate to 5000:
ip link set ens1f0 vf 0 max_tx_rate 5000

5. Run netperf on the guest towards the peer side:
[root@localhost ~]# netperf -H 192.168.124.2
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.124.2 () port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

87380  16384  16384    10.01    7206.41

Actual results:
Got a throughput result of more than 6000 Mb/s (5000 * (1 + 0.2)).

Beaker jobs:
RHEL-8.9.0-updates-20230917.32 openvswitch3.1-3.1.0-50.el8fdp
https://beaker.engineering.redhat.com/recipes/14648208#task166425057 -- failed (result 7206.41 Mb/s)

RHEL-8.9.0-updates-20230917.32 openvswitch3.1-3.1.0-41.el8fdp
https://beaker.engineering.redhat.com/recipes/14648613#task166428239 -- failed (result 6032.34 Mb/s)

Expected results:
QE expects the measured throughput to stay within a 20% tolerance of the configured limit.

Additional info:
After syncing internally within QE, this may be an instability issue of the RHEL + OVS combination.

When running 23.G, this case passes.
Test info: kernel-4.18.0-372.70.1.el8_6 + openvswitch3.1-3.1.0-50.el8fdp.x86_64

NIC info:
ethtool -i enp4s0f0
driver: mlx5_core
version: 4.18.0-372.70.1.el8_6.x86_64
firmware-version: 16.35.2000 (MT_0000000012)
expansion-rom-version:
bus-info: 0000:04:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes

Beaker job:
https://beaker.engineering.redhat.com/recipes/14446382#task164824831
https://beaker.engineering.redhat.com/recipes/14446382/tasks/164824831/results/770672838/logs/resultoutputfile.log -- pass

When running RHEL-8.9.0-updates-20230917.32 + openvswitch3.1-3.1.0-61.el8fdp, this case also passes:
https://beaker.engineering.redhat.com/recipes/14648919#tasks

But running RHEL-8.6.0-updates-20230919.5 + openvswitch3.1-3.1.0-61.el8fdp, the case failed:
https://beaker.engineering.redhat.com/jobs/8334719
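For reference, a minimal sketch of the pass/fail check implied by the 20% tolerance above (assumptions: the classic netperf TCP_STREAM summary where throughput in 10^6 bits/sec is the last field of the last output line, and the peer address from the reproducer):

#!/bin/bash
# Sketch of the tolerance check; limit and peer address taken from the
# reproducer above, the 20% margin from the expected results.
limit_mbps=5000
tolerance_pct=20
peer=192.168.124.2

# netperf's TCP_STREAM summary ends with the throughput in 10^6 bits/sec.
tput=$(netperf -H "$peer" | awk 'END {print $NF}')
max_allowed=$(awk -v l="$limit_mbps" -v t="$tolerance_pct" 'BEGIN {print l * (1 + t / 100)}')

if awk -v m="$tput" -v a="$max_allowed" 'BEGIN {exit !(m <= a)}'; then
    echo "PASS: $tput Mb/s <= $max_allowed Mb/s"
else
    echo "FAIL: $tput Mb/s > $max_allowed Mb/s"
fi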
Hi,

(In reply to mhou from comment #0)
> Description of problem:
> Limiting the vf_rep rate to 5000 Mb/s, but netperf run on the guest got over
> 6000 Mb/s.
> ip link set $pf vf $vf max_tx_rate 5000

This command is independent from OVS, so we're talking about only the
driver/NIC here.

> # ethtool -i ens1f0
> driver: mlx5_core
> version: 4.18.0-513.2.1.el8_9.x86_64

So this would be a bad kernel.
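As a quick sanity check that the limit really lands in the driver/NIC and not in OVS, something along these lines can be inspected on the host (the devlink-rate query is an assumption -- it is only available where the kernel and iproute2 expose per-VF rate objects):

# The vf 0 line of the PF should report the configured cap once the driver
# accepted it, e.g. "... max_tx_rate 5000Mbps ...".
ip link show ens1f0 | grep 'vf 0'

# On switchdev setups where devlink-rate is exposed (assumption: mlx5 +
# recent iproute2), the per-VF rate objects can be listed directly.
devlink port function rate show 2>/dev/null || echo "devlink-rate not available here"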
[...]

> When running RHEL-8.9.0-updates-20230917.32 + openvswitch3.1-3.1.0-61.el8fdp,
> this case also passes:
> https://beaker.engineering.redhat.com/recipes/14648919#tasks

But this one is also using kernel-4.18.0-513.2.1.el8_9. Even the OVS version
is the same here. What's the difference between this and the bad test case
above?

> But running RHEL-8.6.0-updates-20230919.5 + openvswitch3.1-3.1.0-61.el8fdp,
> the case failed:
> https://beaker.engineering.redhat.com/jobs/8334719
(In reply to Marcelo Ricardo Leitner from comment #1)
> But this one is also using kernel-4.18.0-513.2.1.el8_9. Even the OVS version
> is the same here. What's the difference between this and the bad test case
> above?

Let's draw a table to try to explain the current situation (- = no result):

ovs \ kernel                      4.18.0-513.2.1.el8_9   4.18.0-372.70.1.el8_6   4.18.0-372.74.1.el8_6
openvswitch3.1-3.1.0-41.el8fdp    FAIL                   -                       -
openvswitch3.1-3.1.0-50.el8fdp    FAIL                   PASS                    -
openvswitch3.1-3.1.0-61.el8fdp    PASS                   -                       FAIL

Based on the same OVS version (e.g. -50 or -61), we see different results on
8.6.z and 8.9. What are your thoughts on this, and how should QE narrow down
the issue?
(In reply to mhou from comment #2)
> What are your thoughts on this, and how should QE narrow down the issue?

We really need to hear from Nvidia now. This is very weird. As I said, that
command goes directly to the driver/NIC and OVS shouldn't interfere. Maybe
when the driver configures the HW, something is conflicting somehow.

Are you sure the test is conclusive? It would be nice if you could test
372.70.1 with ovs -61 and 372.74.1 with ovs -50, trying to understand what
went wrong in that update.
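If it helps while rerunning the matrix, a trivial sketch for recording the exact combination under test in each cell (package name taken from the versions quoted in this bug, interface name from the reproducer):

# Capture the kernel, OVS and NIC driver/firmware versions for each run.
uname -r
rpm -q openvswitch3.1
ethtool -i ens1f0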