Bug 2310427

Summary: [bug][RHOS17.1] Infra vlans not working when deploying a compute with its bond on a nic-partitioned VF
Product: Red Hat OpenStack    Reporter: Luigi Tamagnone <ltamagno>
Component: openstack-tripleo-heat-templates    Assignee: RHOSP:NFV_Eng <rhosp-nfv-int>
Status: CLOSED ERRATA    QA Contact: Miguel Angel Nieto <mnietoji>
Severity: medium    Docs Contact:
Priority: high
Version: 17.1 (Wallaby)    CC: apevec, bpoirier, chrisw, dhill, ekuris, eshulman, gregraka, jelle.hoylaerts.ext, jfindysz, jpalanis, ksundara, ktordeur, madgupta, mariel, mburns, mnietoji, njohnston, vcandapp
Target Milestone: z4    Keywords: Triaged
Target Release: 17.1
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: openstack-tripleo-heat-templates-14.3.1-17.1.20240919130756.el9ost tripleo-ansible-3.3.1-17.1.20240920151437.el9ost    Doc Type: Release Note
Doc Text:
Before this release, when upgrading from RHOSP 16.x to RHOSP 17.x with NIC partitioning on NVIDIA Mellanox cards, connectivity was lost on the Linux bond. With this update, this issue has been fixed. To use this fix, ensure that you set the Ansible variable, `dpdk_extra` in your bare metal node definition file before upgrading to RHOSP 17.1.4. For more information, see link:https://docs.redhat.com/en/documentation/red_hat_openstack_platform/17.1/html/configuring_network_functions_virtualization/config-dpdk-deploy_rhosp-nfv#create-bare-metal-nodes-def-file_cfgdpdk-nfv[Creating a bare metal nodes definition file] in _Configuring network functions virtualization_.
Story Points: ---
Clone Of:    Environment:
Clones: 2323844 (view as bug list)
Last Closed: 2024-11-21 09:30:54 UTC    Type: Bug
Bug Depends On:    
Bug Blocks: 2323844    
Attachments:
Config.yaml (flags: none)

Description Luigi Tamagnone 2024-09-06 14:59:00 UTC
Description of problem:
- If we set up NIC partitioning on two NICs and run bond0 with all infra VLANs on top of the 2 VFs.
  Network configuration is as follows:
           ┌───────┐
           │       ┼─────┐
           │  PF   │     │
           └───────┘     ┌────┐┌───┐
     ConnectX-6 Lx NIC#1 │ VF ││   │
                         └────┘│   │             ┌────────┐
                               │ bond0 (mode=1)──│vlan39XX├─ 192.168.2.X
     ConnectX-6 Lx NIC#2 ┌────┐│   │
           ┌───────┐     │ VF ││   │
           │  PF   │     ├────┘└───┘
           │       ┼─────┘
           └───────┘
- This setup works fine on osp16.2 / RHEL8.4.
- On osp17.1/RHEL9.2 this works only if the VF is in promisc mode:
10: p1p1_0: <BROADCAST,MULTICAST,PROMISC,SLAVE,UP,LOWER_UP> mtu 9050 qdisc mq master bond0 state UP group default qlen 1000

- The NFV docs mention that you can indeed put the VF in promiscuous mode,
  but they don't specify whether that is required to run infra VLANs on top of it.

Version-Release number of selected component (if applicable):
[redhat-release] Red Hat Enterprise Linux release 9.2 (Plow)
[rhosp-release] Red Hat OpenStack Platform release 17.1.3 (Wallaby)
openvswitch3.1-3.1.0-104.el9fdp.x86_64

How reproducible:
Every time.

Steps to Reproduce:
- OSP17.1 environments that have been upgraded from OSP16.2

Actual results:
Works only if:
- the VF is set to promisc mode, or
- openvswitch is disabled:
 . systemctl enable tripleo*
 . systemctl disable openvswitch.service
 . moved away /usr/lib/systemd/system/ovsdb-server.service
 . moved away /usr/lib/systemd/system/ovs-delete-transient-ports.service
 . moved away /usr/lib/systemd/system/ovs-vswitchd.service

Expected results:
Should work as on 16.2 with the exact same configuration.

Additional info:
  - The problem reproduces as well if the VLAN is configured on top of the VF (without bonding)
     ConnectX-6 Lx NIC#1
              ┌───────┐      ┌────┐    ┌────────┐
              │   PF  ┼──────┼ VF ├────│vlan39xx├── 192.168.2.x
              └───────┘      └────┘    └────────┘

  - If the VLAN is configured on top of the PF interface, everything works and no promisc mode is needed.
     ConnectX-6 Lx NIC#1
              ┌───────┐     ┌────────┐
              │  PF   ┼─────│vlan39xx├── 192.168.2.x
              └───────┘     └────────┘
- tried firmware for osp17:
 * 26.41.1002
 * 26.39.1002
 * 26.38.1002
 * 26.36.1010

Comment 4 Benjamin Poirier 2024-09-10 14:45:12 UTC
I passed on the information from this ticket to Maor Dickman from Nvidia. He thinks this issue is not related to OVS and he asked:
> Did you try to reproduce with a simple OVS configuration? Or legacy SR-IOV?

Comment 9 Benjamin Poirier 2024-09-20 14:40:51 UTC
I tried a few different ways based on the ascii art diagrams and the problem
did not reproduce. For instance, I tried the following:

devlink dev eswitch set pci/0000:08:00.0 mode switchdev
echo 1 > /sys/bus/pci/devices/0000:08:00.0/sriov_numvfs
udevadm settle
ip link add br0 up type bridge
ip link set dev eth2 up master br0  # PF
ip link set dev eth4 up master br0  # VF PR
ip link set dev eth5 up  # actual VF
ip addr add 192.168.1.1/24 dev eth5
ping -c4 192.168.1.2  # ok
ip link add eth5.39 link eth5 up type vlan id 39
ip addr add 192.168.2.1/24 dev eth5.39
ping -c4 192.168.2.2  # ok
systemctl start openvswitch.service
ip link show dev eth5  # no "PROMISC" flag
ping -c4 192.168.2.2  # ok

In the above, I used kernel 5.14.0-284.30.1.el9_2.x86_64, adapter CX-6 Lx with
firmware 26.41.1000.

Presumably, more specific openvswitch configuration is needed to reproduce the
problem but I can't guess what it is, especially given that I have next to no
experience with OVS.

Can you try to simplify the reproduction environment (i.e. without OSP)
and provide detailed reproduction instructions?

Comment 10 Karthik Sundaravel 2024-09-23 09:27:37 UTC
Hi Benjamin,

I'll try to make a simplified reproducer without OSP.
The issue is seen with legacy SR-IOV and not switchdev.

Comment 16 Karthik Sundaravel 2024-09-26 18:02:19 UTC
Steps to reproduce
------------------
STEP 1) download the config file. Please modify the entries tagged with " => CHANGE ME"
STEP 2) 
    download os-net-config from the git repo https://github.com/os-net-config/os-net-config.git
    cd os-net-config; git fetch -v --all; git switch -c stable/wallaby origin/stable/wallaby
    python setup.py install --prefix=/usr
    os-net-config -d -c <path to the config file>
    Dependencies:
    Python 3.7.0 or higher is required. Other modules can be installed via pip.

STEP 3) repeat the above steps on a second machine with a different IP address

STEP 4) Ping from one machine to another. It works now.

STEP 5) Reboot one machine. Ping doesn't work.

STEP 6) On the rebooted machine, do
        ip link set dev <device name for VF-id> promisc on => repeat this for the second interface as well.
        Ping works now.
       
        or 
       
        ip link set dev <device name for VF-id> down => Ping works in my setup
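The STEP 6 workarounds can be captured as a small dry-run helper; a minimal Python sketch that only builds the `ip link` argument vectors (the device names used below are hypothetical placeholders, not from the original):

```python
# Sketch: build the STEP 6 workaround commands as argument vectors.
# Device names here are hypothetical placeholders; substitute the real
# VF netdev names on the rebooted machine.

def promisc_on_cmds(vf_devices):
    """Workaround 1: set every VF bond member to promiscuous mode."""
    return [["ip", "link", "set", "dev", dev, "promisc", "on"]
            for dev in vf_devices]

def link_down_cmd(vf_device):
    """Workaround 2: take one VF down (worked in the reporter's setup)."""
    return ["ip", "link", "set", "dev", vf_device, "down"]

if __name__ == "__main__":
    for cmd in promisc_on_cmds(["enp8s0f0v1", "enp8s0f1v1"]):
        print(" ".join(cmd))
    print(" ".join(link_down_cmd("enp8s0f0v1")))
```

Printing the vectors keeps the sketch safe to run anywhere; to actually apply a workaround you would pass each vector to subprocess.run() as root.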

Kernel version
[tripleo-admin@compute-0 ~]$ uname -r
5.14.0-284.82.1.el9_2.x86_64

Driver/FW version:
[tripleo-admin@compute-0 ~]$ ethtool -i ens2f0np0
driver: mlx5_core
version: 5.14.0-284.82.1.el9_2.x86_64
firmware-version: 26.36.1010 (MT_0000000532)
expansion-rom-version: 
bus-info: 0000:17:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes

Device: Ethernet controller: Mellanox Technologies MT2894 Family [ConnectX-6 Lx]

I did the above steps to reproduce the issue.

Comment 17 Benjamin Poirier 2024-09-27 21:37:07 UTC
I followed the instructions in comment 16 but faced a few errors and
ultimately there was no "mellanox_bond" interface.

I used an up to date RHEL-9.2 install. Here are the commands that I ran:

# git clone https://github.com/os-net-config/os-net-config.git
# cd os-net-config/
# git fetch -v --all
# git switch -c stable/wallaby origin/stable/wallaby
# python setup.py install --prefix=/usr
# os-net-config -d -c ~/config_mellanox_no_promisc.yaml
# pip install oslo_concurrency
# os-net-config -d -c ~/config_mellanox_no_promisc.yaml
# pip install pyudev
# os-net-config -d -c ~/config_mellanox_no_promisc.yaml
# pip install jsonschema
# os-net-config -d -c ~/config_mellanox_no_promisc.yaml
[...]
NoneType: None
Traceback (most recent call last):
  File "/usr/bin/os-net-config", line 10, in <module>
    sys.exit(main())
  File "/usr/lib/python3.9/site-packages/os_net_config/cli.py", line 360, in main
    pf_files_changed = provider.apply(cleanup=opts.cleanup,
  File "/usr/lib/python3.9/site-packages/os_net_config/impl_ifcfg.py", line 2020, in apply
    self.ifdown(interface)
  File "/usr/lib/python3.9/site-packages/os_net_config/__init__.py", line 500, in ifdown
    self.execute(msg, '/sbin/ifdown', interface, check_exit_code=False)
  File "/usr/lib/python3.9/site-packages/os_net_config/__init__.py", line 480, in execute
    out, err = processutils.execute(cmd, *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/oslo_concurrency/processutils.py", line 401, in execute
    obj = subprocess.Popen(cmd,
  File "/usr/lib64/python3.9/subprocess.py", line 951, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/lib64/python3.9/subprocess.py", line 1821, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: '/sbin/ifdown'
# dnf install -y NetworkManager-initscripts-updown
# os-net-config -d -c ~/config_mellanox_no_promisc.yaml
[...]
2024-09-28 00:14:00.203 INFO os_net_config.execute running ifup on interface: enp8s0f0v1
2024-09-28 00:14:00.394 INFO os_net_config.execute running ifup on interface: enp8s0f1v1
2024-09-28 00:14:00.582 INFO os_net_config.execute running ifup on interface: mellanox_bond
2024-09-28 00:14:00.612 ERROR os_net_config.impl_ifcfg.apply Failure(s) occurred when applying configuration
2024-09-28 00:14:00.612 ERROR os_net_config.impl_ifcfg.apply stdout: , stderr: Error: unknown connection '/etc/sysconfig/network-scripts/ifcfg-enp8s0f0v1'.
Failure to activate file "enp8s0f0v1"!

See all profiles with `nmcli connection`.
Reload files from disk with `nmcli connection reload`
Activate the desired profile with `nmcli connection up \"$NAME\"`

2024-09-28 00:14:00.612 ERROR os_net_config.impl_ifcfg.apply stdout: , stderr: Error: unknown connection '/etc/sysconfig/network-scripts/ifcfg-enp8s0f1v1'.
Failure to activate file "enp8s0f1v1"!

See all profiles with `nmcli connection`.
Reload files from disk with `nmcli connection reload`
Activate the desired profile with `nmcli connection up \"$NAME\"`

2024-09-28 00:14:00.612 ERROR os_net_config.impl_ifcfg.apply stdout: , stderr: Error: unknown connection '/etc/sysconfig/network-scripts/ifcfg-mellanox_bond'.
Failure to activate file "mellanox_bond"!

See all profiles with `nmcli connection`.
Reload files from disk with `nmcli connection reload`
Activate the desired profile with `nmcli connection up \"$NAME\"`

2024-09-28 00:14:00.612 ERROR os_net_config.main ***Failed to configure with ifcfg provider***
ConfigurationError('Failure(s) occurred when applying configuration')
2024-09-28 00:14:00.612 ERROR os_net_config.common.log_exceptions Traceback (most recent call last):
  File "/usr/bin/os-net-config", line 10, in <module>
    sys.exit(main())
  File "/usr/lib/python3.9/site-packages/os_net_config/cli.py", line 392, in main
    files_changed = provider.apply(cleanup=opts.cleanup,
  File "/usr/lib/python3.9/site-packages/os_net_config/impl_ifcfg.py", line 2147, in apply
    raise os_net_config.ConfigurationError(message)
os_net_config.ConfigurationError: Failure(s) occurred when applying configuration
NoneType: None
Traceback (most recent call last):
  File "/usr/bin/os-net-config", line 10, in <module>
    sys.exit(main())
  File "/usr/lib/python3.9/site-packages/os_net_config/cli.py", line 392, in main
    files_changed = provider.apply(cleanup=opts.cleanup,
  File "/usr/lib/python3.9/site-packages/os_net_config/impl_ifcfg.py", line 2147, in apply
    raise os_net_config.ConfigurationError(message)
os_net_config.ConfigurationError: Failure(s) occurred when applying configuration
# ls /etc/sysconfig/network-scripts/
ifcfg-enp8s0f0np0  ifcfg-enp8s0f0v1  ifcfg-enp8s0f1np1  ifcfg-enp8s0f1v1  ifcfg-mellanox_bond  readme-ifcfg-rh.txt
# nmcli con
NAME                UUID                                  TYPE      DEVICE
enp5s0              bbb03040-9469-4436-9537-4e6ecafadeff  ethernet  enp5s0
enp4s0              d05673ca-6f4f-44be-ae6b-353b18a83f1d  ethernet  enp4s0
lo                  205e4428-2079-4e7a-89da-4cb811c0ce8d  loopback  lo
System enp8s0f0np0  8cfe20f3-2c47-a269-16cf-ed6e17919c74  ethernet  enp8s0f0np0
System enp8s0f1np1  a3c65d4a-a91d-7bd5-63bd-f6f55fd22cc8  ethernet  enp8s0f1np1


All of the ifcfg-* files under /etc/sysconfig/network-scripts/ were created by
os-net-config but NetworkManager only loads ifcfg-enp8s0f0np0 and
ifcfg-enp8s0f1np1. I noticed this difference:

# grep NM_CONTROLLED ifcfg-*
ifcfg-enp8s0f0np0:NM_CONTROLLED=yes
ifcfg-enp8s0f0v1:NM_CONTROLLED=no
ifcfg-enp8s0f1np1:NM_CONTROLLED=yes
ifcfg-enp8s0f1v1:NM_CONTROLLED=no
ifcfg-mellanox_bond:NM_CONTROLLED=no

So it seems expected that NetworkManager will not load some of those files.
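One quick way to predict which ifcfg files NetworkManager will load is to check the NM_CONTROLLED flag seen in the grep output above; a minimal sketch with file contents inlined for illustration (on a real system you would read /etc/sysconfig/network-scripts/ifcfg-*):

```python
# Sketch: decide which ifcfg profiles NetworkManager will load, based on
# the NM_CONTROLLED flag. File contents are inlined here for illustration.

def nm_controlled(ifcfg_text):
    """Return True unless the file explicitly sets NM_CONTROLLED=no."""
    for line in ifcfg_text.splitlines():
        key, _, value = line.strip().partition("=")
        if key == "NM_CONTROLLED":
            return value.strip().strip('"').lower() != "no"
    return True  # NetworkManager assumes control when the flag is absent

files = {
    "ifcfg-enp8s0f0np0": "DEVICE=enp8s0f0np0\nNM_CONTROLLED=yes\n",
    "ifcfg-enp8s0f0v1": "DEVICE=enp8s0f0v1\nNM_CONTROLLED=no\n",
    "ifcfg-mellanox_bond": "DEVICE=mellanox_bond\nNM_CONTROLLED=no\n",
}
loaded = sorted(name for name, text in files.items() if nm_controlled(text))
print(loaded)  # only the NM_CONTROLLED=yes profile survives
```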

Do the files have similar content when you follow the instructions? Does
NetworkManager load them?

Since you did not mention installing NetworkManager-initscripts-updown, is it
expected that I ran into the first quoted error (FileNotFoundError) before
installing that package?

Let me know if you have some additional suggestions.

Comment 18 Karthik Sundaravel 2024-09-28 10:30:41 UTC
Hi Benjamin,

In OSP, we use the package openstack-network-scripts (aka initscripts) for the ifup / ifdown commands.
So please remove the package `NetworkManager-initscripts-updown` and install openstack-network-scripts.

I fetched the version of openstack-network-scripts from another system; it should be more or less the same as the one I used to reproduce the issue.

Name         : openstack-network-scripts
Version      : 10.11.1
Release      : 9.17_1.1.el9ost
Architecture : x86_64
Size         : 161 k
Source       : openstack-network-scripts-10.11.1-9.17_1.1.el9ost.src.rpm
Repository   : @System
From repo    : rhos-17.1
Summary      : Legacy scripts for manipulating of network devices
URL          : https://github.com/fedora-sysv/initscripts
License      : GPLv2

Comment 20 Benjamin Poirier 2024-10-02 20:59:10 UTC
I installed os-net-config from the rhoso-18.0-for-rhel-9-x86_64-rpms
repository.

> STEP 4) Ping from one machine to another. It works now.

Indeed

> STEP 5) Reboot one machine. Ping doesn't work.

After reboot, the mellanox_bond interface does not exist.
I started the "network" init script (part of os-net-config) manually but it
reported some errors and failed:

Oct 02 23:30:49 c-236-4-240-243 network[2110]: Bringing up interface mellanox_bond:
Oct 02 23:30:49 c-236-4-240-243 network[2477]: ERROR     : [/etc/sysconfig/network-scripts/ifup-eth] Device enp8s0f0
v1 does not seem to be present, delaying initialization.
Oct 02 23:30:49 c-236-4-240-243 /etc/sysconfig/network-scripts/ifup-eth[2500]: Device enp8s0f0v1 does not seem to be
 present, delaying initialization.
Oct 02 23:30:49 c-236-4-240-243 network[2407]: WARN      : [/etc/sysconfig/network-scripts/ifup-eth] Unable to start
 slave device ifcfg-enp8s0f0v1 for master mellanox_bond.
Oct 02 23:30:49 c-236-4-240-243 /etc/sysconfig/network-scripts/ifup-eth[2501]: Unable to start slave device ifcfg-enp8s0f0v1 for master mellanox_bond.
Oct 02 23:30:49 c-236-4-240-243 network[2502]: ERROR     : [/etc/sysconfig/network-scripts/ifup-eth] Device enp8s0f1v1 does not seem to be present, delaying initialization.
Oct 02 23:30:49 c-236-4-240-243 /etc/sysconfig/network-scripts/ifup-eth[2525]: Device enp8s0f1v1 does not seem to be present, delaying initialization.
Oct 02 23:30:49 c-236-4-240-243 network[2407]: WARN      : [/etc/sysconfig/network-scripts/ifup-eth] Unable to start slave device ifcfg-enp8s0f1v1 for master mellanox_bond.

The VF interfaces are not present. While config_mellanox_no_promisc.yaml
includes a directive to create 4 VFs:

- type: sriov_pf
  name: nic11 => CHANGE ME
  mtu: 9000
  numvfs: 4

... this information does not seem to be reflected in the files that were
created under /etc/sysconfig/network-scripts:

root@c-236-4-240-243:/etc/sysconfig/network-scripts# cat ifcfg-enp8s0f0np0
# This file is autogenerated by os-net-config
DEVICE=enp8s0f0np0
ONBOOT=yes
HOTPLUG=no
NM_CONTROLLED=yes
PEERDNS=no
BOOTPROTO=none
MTU=9000
DEFROUTE=no
root@c-236-4-240-243:/etc/sysconfig/network-scripts# cat ifcfg-enp8s0f0v1
# This file is autogenerated by os-net-config
DEVICE=enp8s0f0v1
ONBOOT=yes
HOTPLUG=no
NM_CONTROLLED=no
PEERDNS=no
MASTER=mellanox_bond
SLAVE=yes
BOOTPROTO=none

So I'm not sure how this is supposed to work.

Did you try the reproduction instructions on RHEL-9.2? How do the interfaces
defined in the yaml file get created after boot?

Comment 21 Benjamin Poirier 2024-10-02 21:02:02 UTC
> I installed os-net-config from the rhoso-18.0-for-rhel-9-x86_64-rpms
              ^
I meant "openstack-network-scripts", sorry.

Comment 22 Karthik Sundaravel 2024-10-03 01:54:15 UTC
os-net-config creates /var/lib/os-net-config/sriov_config.yaml, where the numvfs and other VF configurations are present.
Also os-net-config adds a service file sriov_config.
During reboot, os-net-config sriov_config service will read the sriov_config.yaml and apply the settings.

And then network service brings up the bonds configured in the ifcfg files.
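The boot-time flow described above can be sketched for the numvfs step; the exact schema of sriov_config.yaml is an assumption here, modelled on the sriov_pf entries (type/name/numvfs) shown in the os-net-config input:

```python
# Sketch: how a boot-time sriov_config step could recover numvfs for each
# PF from /var/lib/os-net-config/sriov_config.yaml. The file schema below
# is an assumption, mirroring the sriov_pf entries in the input yaml.

sriov_config = {  # stands in for parsing the real sriov_config.yaml
    "sriov_config": [
        {"device_type": "pf", "name": "enp8s0f0np0", "numvfs": 4},
        {"device_type": "pf", "name": "enp8s0f1np1", "numvfs": 4},
    ]
}

def numvfs_by_pf(config):
    """Map each PF name to the number of VFs to create at boot."""
    return {entry["name"]: entry["numvfs"]
            for entry in config.get("sriov_config", [])
            if entry.get("device_type") == "pf"}

for pf, n in numvfs_by_pf(sriov_config).items():
    # The boot-time service would write n to
    # /sys/class/net/<pf>/device/sriov_numvfs before network.service runs.
    print(pf, n)
```

This is why the ifcfg files alone are not enough: the VF netdevs referenced by ifcfg-enp8s0f0v1 etc. only exist once the sriov_config service has applied numvfs.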

Comment 23 Benjamin Poirier 2024-10-04 13:10:22 UTC
> During reboot, os-net-config sriov_config service will read the sriov_config.yaml and apply the settings.

At the time when I wrote comment 20, "sriov_config.service" was failing and I
didn't notice. It was failing because I had installed os-net-config in a venv
instead of system-wide and the service file doesn't handle that. I installed
it under /usr like the original instructions said, I also enabled
"network.service" and then the network config was applied at boot as expected.

> STEP 5) Reboot one machine. Ping doesn't work.

In my case, now that the network services are starting properly, the problem
does not reproduce; ping works after reboot and the vf interfaces do NOT have
the promisc flag. I had a call with Karthik yesterday and showed him that.

I guess the problem depends on some more specific configuration to reproduce.
Can you please try to narrow it down?

Comment 25 Karthik Sundaravel 2024-10-12 01:29:28 UTC
Benjamin (partner engineer from Nvidia) is working on the issue. This needs investigation from Nvidia, since the PF/VF configurations applied by os-net-config are the same in both the working (OSP16.2) and non-working (OSP17.1) cases, yet we are seeing different behaviour from the SR-IOV NIC.

@Madhur
We have reproduced the issue on our development machines and given Benjamin access to investigate. We have "ConnectX-5 Ex" in our lab, while the customer has seen this issue on "ConnectX-6 Lx". If we could get a couple of machines from the customer (where the issue is seen) for Benjamin, that would be helpful as well.

@Benjamin, we have a deadline of 1st November. Please note that this is high priority and there is date pressure to have a fix by then.

Comment 26 Benjamin Poirier 2024-10-15 16:12:11 UTC
Karthik provided access to a system at Red Hat where the problem occurs. I
began to investigate the situation on that system. It did not use vlans, it
was just a bond over two VFs. I observed the following:
*)
When the problem occurred, I deleted the bond and assigned the ip address
directly on the VF that was the active bond member. The problem continued, so
might not be related to bond or vlan. In the same way as reported in the
description, after setting that VF to promisc mode, the problem was resolved
(ping worked).
*)
When the problem occurs, `ip -s link` shows that the packet RX counter on the
VF section of the PF netdev increases, but the packet RX counter on the VF
netdev itself does not increase.
`ethtool -S` on the VF shows that the rx_steer_missed_packets counter
increases.
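The counter check above can be automated by diffing two `ethtool -S` samples; a minimal sketch (the sample outputs are inlined for illustration, not captured from the affected system):

```python
# Sketch: detect the symptom from comment 26 by diffing the
# rx_steer_missed_packets counter between two `ethtool -S <vf>` samples.
# The sample outputs below are illustrative, not from the affected system.

def steer_missed(ethtool_s_output):
    """Extract rx_steer_missed_packets from `ethtool -S` text."""
    for line in ethtool_s_output.splitlines():
        name, _, value = line.strip().partition(":")
        if name.strip() == "rx_steer_missed_packets":
            return int(value)
    return None  # counter not present in this driver's statistics

before = "NIC statistics:\n     rx_steer_missed_packets: 120\n"
after = "NIC statistics:\n     rx_steer_missed_packets: 188\n"

delta = steer_missed(after) - steer_missed(before)
if delta > 0:
    # Packets reached the VF slice of the PF but were dropped by steering:
    # the signature of this bug when the VF is not in promisc mode.
    print("steering drops:", delta)
```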

I tried to dump the steering rules on the adapter using 'mlxdump fsdump' but
it did not work. I opened a ticket for this at Nvidia (RM-4124320).
*)
If I do `systemctl disable openvswitch.service` and reboot, the problem does
not occur. However, openvswitch still gets started at boot by network.service.
So there might be different behavior depending on how/when OVS is started.
Moreover, the OVS configuration does not actually include the ConnectX nic
AFAIK. It includes two Intel nics.

Can you try again to provide simple but complete reproduction instructions?

Comment 27 Karthik Sundaravel 2024-10-16 14:23:16 UTC
Benjamin,

I'll try to reproduce the issue on non openstack setup. I'll share the steps when I have one.

Meanwhile, even though the OVS bridges have all been cleaned up on those machines, we still see some interference between openvswitch and the Mellanox cards.
Does this call for a look from the openvswitch team?

Comment 29 Karthik Sundaravel 2024-10-16 16:42:17 UTC
Hi Madhur,

We (Benjamin and I) have found that disabling DPDK solves the connectivity issue. We would like to understand whether, in OSP16.2, the customer uses DPDK on any port (not necessarily Mellanox NICs) in the affected node.

Comment 30 Madhur Gupta 2024-10-17 13:28:17 UTC
(In reply to Karthik Sundaravel from comment #29)
> Hi Madhur,
> 
> We (Benjamin and myself) have found that disabling DPDK solves the
> connectivity issue. We would like to understand if in OSP16.2, does the
> customer use DPDK on any port (need not be mellanox nics) in the affected
> node ?

Hi Karthik,

>We would like to understand if in OSP16.2, does the customer use DPDK on any port (need not be mellanox nics) in the affected node ?

Yes, the customer has confirmed that with DPDK enabled workloads they faced the issues, but the customer will try to reproduce it with non-dpdk environment.

However, for the customer dpdk is important for their workload.

Let me know if you both need anything else.

Comment 31 Madhur Gupta 2024-10-17 17:03:31 UTC
Hello Karthik and Benjamin,

Here is the response from the customer contact:


"
Hey Guys,

 

I just had a look on it and indeed we only see the issue on the computes that have also dpdk.

We have similar computes which don’t have dpdk but still the same vf setup, they are not affected by the issue.

1 caveat to make and it's also mentioned in the case already and I think even one of the engineers mentioned it again is.

That the interfaces that are used for ovs-dpdk are not the same interfaces as the ones used for the vf’s and infra vlans.

They come from completely different network cards.

Yet for some reason the fact of having dpdk in the host appears to make some difference."

Comment 33 Karthik Sundaravel 2024-10-18 04:37:10 UTC
Hi Benjamin

Here are the steps performed on a standalone machine to reproduce the issue on CX5 cards.

Prerequisites
---------------
RHEL 9.2 (5.14.0-284.66.1.el9_2.x86_64)
Python 3.9
Python3-pip
openstack-network-scripts
Openvswitch
systemctl start openvswitch
systemctl enable openvswitch
systemctl enable network
ovs-vsctl set o . other_config:dpdk-init=true
systemctl restart openvswitch


Download and install os-net-config
----------------------------------
git clone https://github.com/os-net-config/os-net-config.git -b stable/wallaby
pip install pyroute2 jsonschema oslo_concurrency
cd os-net-config
python setup.py install --prefix=/usr

Generate the config.yaml
-----------------------
Download the config.yaml from the BZ and modify 'CHANGEME' to appropriate nics/vlans/ip address.
The nic mapping could be found by running 'os-net-config -i'

Generate the ifcfgs
--------------------
os-net-config -c ~/config.yaml -p ifcfg -d

Test
-----
Run ping test from one machine to another
Ping test fails

Workaround to enable ping
-------------------------
Option A:
ovs-vsctl set o . other_config:dpdk-extra="-a 0000:00:00.0"
systemctl restart openvswitch
Check if ping works; if not, 'systemctl restart network'


Option B:
ip link set dev <vf device> promisc on

Option C: (may not work always)
ifdown <first member of the bond>

Comment 46 Benjamin Poirier 2024-10-25 22:02:49 UTC
By using Karthik's instructions, I was able to reproduce the problem at
Nvidia. I was also able to simplify the instructions so that os-net-config is
not needed:

Prepare host 1:
subscription-manager repos --enable fast-datapath-for-rhel-9-x86_64-rpms
dnf install --allowerasing -y openvswitch3.3

grubby --update-kernel ALL --args="hugepages=512"
grub2-mkconfig -o /boot/grub2/grub.cfg

systemctl start openvswitch.service
ovs-vsctl set o . other_config:dpdk-init=true

reboot

Prepare host 2:
ip link set dev eth2 up
ip addr add 192.168.1.2/24 dev eth2

Reproduce problem on host 1:
echo 1 > /sys/class/net/eth2/device/sriov_numvfs
systemctl start openvswitch.service
ip link set dev eth4 up  # eth4 is the new vf netdev
ip addr add 192.168.1.1/24 dev eth4

From host 2, ping 192.168.1.1. Does not work, rx_steer_missed_packets
increases.

As we can see, vlan and bond are not needed to reproduce the problem.

Also, if we change the reproduction command sequence to:
systemctl start openvswitch.service
echo 1 > /sys/class/net/eth2/device/sriov_numvfs
ip link set dev eth4 up
ip addr add 192.168.1.1/24 dev eth4

The result is good. So the problem seems related to something that ovs
configures at startup.

> I tried to dump the steering rules on the adapter using 'mlxdump fsdump' but
> it did not work. I opened a ticket for this at Nvidia (RM-4124320).

It did not work because a special license is needed. I was able to run the
tool on Nvidia systems. In both the bad and good cases above, the steering
rules are almost the same. The only difference is related to the vf mac
address, which changes each time the vf is created. So this did not provide any
insight into why traffic is dropped. I asked my coworkers for advice on how to
get more info on why the rx_steer_missed_packets counter is increasing but
didn't get any reply. Note that many of them are on vacation.

Meanwhile, I also tried different ovs package versions on RHEL-9 and noticed
that the problem also reproduces with openvswitch3.1 but not with
openvswitch3.0.

I reproduced the issue using upstream ovs and dpdk releases and, after testing
various combinations, narrowed it down to the following two:
* openvswitch-3.0.7 dpdk-21.11.8
	good
* openvswitch-3.0.7 dpdk-22.03
	bad

I then bisected on the dpdk repository which identified the following commit:
87af0d1e1bcc15ca414060263091a0f880ad3a86 is the first bad commit
commit 87af0d1e1bcc15ca414060263091a0f880ad3a86
Author: Michael Baum <michaelba>
Date:   Mon Feb 14 11:35:06 2022 +0200

    net/mlx5: concentrate all device configurations

    Move all device configure to be performed by mlx5_os_cap_config()
    function instead of the spawn function.
    In addition move all relevant fields from mlx5_dev_config structure to
    mlx5_dev_cap.

    Signed-off-by: Michael Baum <michaelba>
    Acked-by: Matan Azrad <matan>
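The bisection described above is a plain binary search over the commit history; a minimal sketch of the logic with made-up commit IDs (the real session ran between dpdk v21.11.8, good, and v22.03, bad, converging on 87af0d1e1bcc):

```python
# Sketch: the binary search that `git bisect` performs, applied to an
# ordered commit list with a single good-to-bad transition. Commit IDs
# here are made up; they are not from the dpdk repository.

def first_bad_commit(commits, is_bad):
    """Return the first bad commit; commits[0] is good, commits[-1] is bad."""
    lo, hi = 0, len(commits) - 1
    while lo + 1 < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # first bad commit is at mid or earlier
        else:
            lo = mid  # still good; first bad commit is later
    return commits[hi]

history = ["c0", "c1", "c2", "c3", "c4", "c5"]
# Pretend the regression landed at c3, so c3..c5 all test "bad".
print(first_bad_commit(history, lambda c: c >= "c3"))
```

Each test builds and runs the reproducer at the midpoint commit, so the number of builds grows only logarithmically with the size of the v21.11.8..v22.03 range.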

I will contact the respective developers.

Comment 50 Benjamin Poirier 2024-11-04 13:53:52 UTC
> I will contact the respective developers.

I explained the issue to Michael Baum last week. He later said that he reviewed
the commit and did not find a problem.

We (Inbox team) are still trying to get help from someone who is familiar with
OVS and/or dpdk.

Comment 60 errata-xmlrpc 2024-11-21 09:30:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: RHOSP 17.1.4 (openstack-tripleo-heat-templates) security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:9978

Comment 69 Red Hat Bugzilla 2025-04-17 04:25:05 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days