Bug 1986423 - os-net-config failing on reboot when using nic partitioning and there is a vm with vf/pf
Summary: os-net-config failing on reboot when using nic partitioning and there is a vm...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: os-net-config
Version: 16.2 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: z3
Target Release: 16.2 (Train on RHEL 8.4)
Assignee: Karthik Sundaravel
QA Contact: Miguel Angel Nieto
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-07-27 13:56 UTC by Miguel Angel Nieto
Modified: 2022-06-23 10:38 UTC
CC List: 14 users

Fixed In Version: os-net-config-11.5.1-2.20220404114957.173ef73.el8ost
Doc Type: Known Issue
Doc Text:
Rebooting a node that has a virtual function (VF) attached to OVS-DPDK (vfio-pci driver) leaves the VFs on that physical function (PF) uninitialized. As a result, virtual machines are unable to use the VFs from that PF. If a second VF is used for another OSP network, it does not function as expected after the reboot.

As a workaround, perform the following steps on the Compute node before you reboot the node:

1. Delete the file `/etc/udev/rules.d/70-os-net-config-sriov.rules`.
2. Modify the `Before` criteria of the `/etc/systemd/system/sriov_config.service` file to add `network-pre.target`. The modified `Before` line should look like:

    Before=network-pre.target openvswitch.service

The workaround fixes the issue and all the VFs initialize correctly.
Clone Of:
Environment:
Last Closed: 2022-06-23 10:38:32 UTC
Target Upstream Version:
Embargoed:
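Step 2 of the workaround in the Doc Text above can be scripted instead of edited by hand. The following is a minimal Python sketch and is not part of os-net-config. `add_before_target` is a hypothetical helper. The sample unit content is an assumption: only `Before=openvswitch.service` is implied by the expected result. The sketch only rewrites the unit text. Writing the file back, deleting the udev rule from step 1, and running `systemctl daemon-reload` are left to the operator.

```python
import re


def add_before_target(unit_text: str, target: str = "network-pre.target") -> str:
    """Return unit_text with `target` prepended to the Before= line.

    Sketch only: assumes the unit has at most one Before= line. If there is
    none, a new Before= line is inserted right after the [Unit] header.
    """
    lines = unit_text.splitlines()
    for i, line in enumerate(lines):
        match = re.match(r"^\s*Before\s*=\s*(.*)$", line)
        if match:
            targets = match.group(1).split()
            if target not in targets:  # keep the edit idempotent
                targets.insert(0, target)
            lines[i] = "Before=" + " ".join(targets)
            return "\n".join(lines) + "\n"
    # No Before= line found: add one directly under [Unit].
    for i, line in enumerate(lines):
        if line.strip() == "[Unit]":
            lines.insert(i + 1, "Before=" + target)
            return "\n".join(lines) + "\n"
    raise ValueError("no [Unit] section found")


if __name__ == "__main__":
    # Assumed sample unit; the real sriov_config.service has more keys.
    sample = "[Unit]\nDescription=SR-IOV numvfs configuration\nBefore=openvswitch.service\n"
    print(add_before_target(sample))
```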




Links
System | ID | Private | Priority | Status | Summary | Last Updated
OpenStack gerrit | 811780 | 0 | None | MERGED | Updating sriov_config.service to run before driverctl.slice and network-pre | 2021-11-11 07:12:49 UTC
OpenStack gerrit | 814426 | 0 | None | MERGED | Update sriov config service to handle nic partitioned PF | 2021-11-11 07:12:49 UTC
OpenStack gerrit | 832753 | 0 | None | MERGED | Fix failure in dpdk driver binding with VF during reboot | 2022-07-12 08:37:20 UTC
OpenStack gerrit | 836227 | 0 | None | MERGED | Fix failure in dpdk driver binding with VF during reboot | 2022-10-06 12:45:57 UTC
Red Hat Issue Tracker | NFV-2246 | 0 | None | None | None | 2021-08-17 16:57:51 UTC
Red Hat Issue Tracker | NFV-2283 | 0 | None | None | None | 2022-02-03 06:14:28 UTC
Red Hat Issue Tracker | OSP-6455 | 0 | None | None | None | 2021-11-11 07:16:27 UTC

Description Miguel Angel Nieto 2021-07-27 13:56:11 UTC
Description of problem:
1. Deploy the NIC partitioning templates. The following bonds are configured:
- linux bond: enp130s0f0v0 and enp130s0f1v0
- linux bond: enp130s0f0v1 and enp130s0f1v1
- ovs bond: enp130s0f0v2 and enp130s0f1v2
I can see that the VFs are properly configured:
12: enp130s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether f8:f2:1e:03:a5:40 brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether f6:b3:8f:c1:9a:db brd ff:ff:ff:ff:ff:ff, vlan 120, spoof checking off, link-state auto, trust on
    vf 1     link/ether 3a:2e:2b:38:2c:2e brd ff:ff:ff:ff:ff:ff, vlan 122, spoof checking off, link-state auto, trust on
    vf 2     link/ether be:c6:46:8f:27:c7 brd ff:ff:ff:ff:ff:ff, vlan 121, spoof checking off, link-state auto, trust on
    vf 3     link/ether 4e:2b:bd:bf:26:dc brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 4     link/ether 9e:ed:26:7d:88:a3 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 5     link/ether ea:b0:85:fe:fc:fd brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 6     link/ether 0a:49:f7:f7:76:13 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 7     link/ether ea:c3:e5:69:f0:88 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 8     link/ether 5e:dc:17:f7:c1:fa brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 9     link/ether ca:b4:be:bc:43:b0 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
13: enp130s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether f8:f2:1e:03:a5:42 brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether 1a:97:59:f4:0b:bc brd ff:ff:ff:ff:ff:ff, vlan 120, spoof checking off, link-state auto, trust on
    vf 1     link/ether 52:a7:06:26:7e:07 brd ff:ff:ff:ff:ff:ff, vlan 122, spoof checking off, link-state auto, trust on
    vf 2     link/ether 26:96:8a:1c:50:52 brd ff:ff:ff:ff:ff:ff, vlan 121, spoof checking off, link-state auto, trust on
    vf 3     link/ether 42:47:7e:b1:21:4d brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 4     link/ether 3e:b5:9a:71:1f:b4 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 5     link/ether 2e:85:b6:cf:c3:7a brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 6     link/ether 9e:da:60:02:c2:96 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 7     link/ether a6:11:85:d3:85:4d brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 8     link/ether fe:2e:4c:7d:fb:e0 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 9     link/ether ca:1d:81:80:a0:37 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off

2. Execute the test case nfv_tempest_plugin.tests.scenario.day2.test_hypervisor_usecases.TestHypervisorScenarios.test_hypervisor_reboot, which performs the following actions:
a. Create a VM with a Geneve port for management, a VF on enp6s0f2 and a PF on enp6s0f3.
b. Check that ping works.
c. Shut down the VM.
d. Reboot the hypervisor.
e. Start the VM.
f. Check ping. There is no connectivity to the floating IP of the VM.

After the reboot, this is the status of the VFs. VF 2 is not configured properly: it has lost its VLAN (121) and its trust setting. This is the VF used for tenant traffic through an OVS bond, so there is no connectivity to the floating IP.
12: enp130s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether f8:f2:1e:03:a5:40 brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether 5e:d5:a2:42:4e:56 brd ff:ff:ff:ff:ff:ff, vlan 120, spoof checking off, link-state auto, trust on
    vf 1     link/ether 9e:2b:51:f7:f0:e5 brd ff:ff:ff:ff:ff:ff, vlan 122, spoof checking off, link-state auto, trust on
    vf 2     link/ether 7e:01:35:68:57:ac brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 3     link/ether 4e:2b:bd:bf:26:dc brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 4     link/ether 9e:ed:26:7d:88:a3 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 5     link/ether ea:b0:85:fe:fc:fd brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 6     link/ether 0a:49:f7:f7:76:13 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 7     link/ether ea:c3:e5:69:f0:88 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 8     link/ether 5e:dc:17:f7:c1:fa brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 9     link/ether ca:b4:be:bc:43:b0 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
13: enp130s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether f8:f2:1e:03:a5:42 brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether 5e:d5:a2:42:4e:56 brd ff:ff:ff:ff:ff:ff, vlan 120, spoof checking off, link-state auto, trust on
    vf 1     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, vlan 122, spoof checking off, link-state auto, trust on
    vf 2     link/ether b2:45:3b:f0:ad:ce brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 3     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 4     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 5     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 6     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 7     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 8     link/ether fe:2e:4c:7d:fb:e0 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 9     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
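Comparing the VF listings from before and after the reboot by eye is error-prone. The following minimal Python sketch parses the per-VF lines of `ip link show` output, as printed above. It reports the VFs whose VLAN, spoof-checking or trust setting changed. The helper names are hypothetical, and the regular expressions assume the exact line format shown in this report. MAC addresses are ignored on purpose, because the listings above show them changing across reboots even for VFs that stay correctly configured.

```python
import re

# Per-VF line from `ip link show <pf>`, e.g.
# "    vf 2     link/ether be:c6:... brd ..., vlan 121, spoof checking off, link-state auto, trust on"
VF_LINE = re.compile(r"^\s*vf\s+(\d+)\s+link/ether\s")


def parse_vfs(ip_link_output: str) -> dict:
    """Map VF index -> {"vlan": int or None, "spoofchk": bool, "trust": bool}."""
    vfs = {}
    for line in ip_link_output.splitlines():
        m = VF_LINE.match(line)
        if not m:
            continue  # skip the PF header and link/ether lines
        vlan = re.search(r"\bvlan (\d+)", line)
        spoof = re.search(r"spoof checking (on|off)", line)
        trust = re.search(r"trust (on|off)", line)
        vfs[int(m.group(1))] = {
            "vlan": int(vlan.group(1)) if vlan else None,
            "spoofchk": spoof is not None and spoof.group(1) == "on",
            "trust": trust is not None and trust.group(1) == "on",
        }
    return vfs


def changed_vfs(before: str, after: str) -> dict:
    """Return {vf_index: (before_state, after_state)} for VFs whose settings differ."""
    old, new = parse_vfs(before), parse_vfs(after)
    return {idx: (old[idx], new.get(idx)) for idx in old if old[idx] != new.get(idx)}
```

When given the enp130s0f0 listings from before and after the reboot, this should flag only VF 2. That VF went from VLAN 121 with trust on to no VLAN with trust off.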

If I configure the VF properly again, connectivity is restored.
I found these errors in the logs at the time the test case was executed:
Jul 27 11:13:04 computeovndpdksriov-0 systemd[1]: Starting SR-IOV numvfs configuration...
Jul 27 11:13:17 computeovndpdksriov-0 os-net-config-sriov[1632]: [2021/07/27 11:13:17 AM] [ERROR] Failed to execute ip link set dev enp130s0f0v1 promisc off
Jul 27 11:13:17 computeovndpdksriov-0 os-net-config-sriov[1632]: Traceback (most recent call last):
Jul 27 11:13:17 computeovndpdksriov-0 os-net-config-sriov[1632]:   File "/usr/bin/os-net-config-sriov", line 10, in <module>
Jul 27 11:13:17 computeovndpdksriov-0 os-net-config-sriov[1632]:     sys.exit(main())
Jul 27 11:13:17 computeovndpdksriov-0 os-net-config-sriov[1632]:   File "/usr/lib/python3.6/site-packages/os_net_config/sriov_config.py", line 617, in main
Jul 27 11:13:17 computeovndpdksriov-0 os-net-config-sriov[1632]:     configure_sriov_vf()
Jul 27 11:13:17 computeovndpdksriov-0 os-net-config-sriov[1632]:   File "/usr/lib/python3.6/site-packages/os_net_config/sriov_config.py", line 553, in configure_sriov_vf
Jul 27 11:13:17 computeovndpdksriov-0 os-net-config-sriov[1632]:     'promisc', item['promisc'])
Jul 27 11:13:17 computeovndpdksriov-0 os-net-config-sriov[1632]:   File "/usr/lib/python3.6/site-packages/os_net_config/sriov_config.py", line 451, in run_ip_config_cmd
Jul 27 11:13:17 computeovndpdksriov-0 os-net-config-sriov[1632]:     processutils.execute(*cmd, delay_on_retry=True, attempts=10, **kwargs)
Jul 27 11:13:17 computeovndpdksriov-0 os-net-config-sriov[1632]:   File "/usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py", line 431, in execute
Jul 27 11:13:17 computeovndpdksriov-0 os-net-config-sriov[1632]:     cmd=sanitized_cmd)
Jul 27 11:13:17 computeovndpdksriov-0 os-net-config-sriov[1632]: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.



Version-Release number of selected component (if applicable):
RHOS-16.2-RHEL-8-20210722.n.0

How reproducible:
See description

Actual results:
Connectivity to the VM is lost.


Expected results:
Connectivity to the VM should not be lost.


Additional info:

Comment 2 Dan Sneddon 2021-07-27 20:11:21 UTC
It appears that NetworkManager is attempting to manage the NIC:

Jul 27 11:13:11 computeovndpdksriov-0 NetworkManager[1980]: <info>  [1627384391.4649] device (enp130s0f1): state change: ip-config -> ip-check (reason 'none', sys-iface-state: 'managed')

In general, when os-net-config is used to configure the NICs, NetworkManager is not used. Can you please attach the NIC config template from the undercloud that was used? I want to confirm that the templates do not include "nm_controlled: true" for the PF configuration.

If that is not the case, do you know when these NICs were configured as NetworkManager connections?

It would also be helpful if you could paste the contents of these two files:

/etc/sysconfig/network-scripts/ifcfg-enp130s0f1
/etc/sysconfig/network-scripts/ifcfg-enp130s0f1v1

Comment 3 Miguel Angel Nieto 2021-07-28 07:44:59 UTC
Hi, these are the templates I have used:
https://code.engineering.redhat.com/gerrit/gitweb?p=nfv-qe.git;a=tree;f=tht/ospd-16.2-geneve-ovn-dpdk-sriov-ctlplane-dataplane-bonding-hybrid;h=1c4e355418ca13163bb557e5df74d37171e56dcb;hb=HEAD

We have nm_controlled: true. Could that be the issue? We didn't have any issues before.

[root@computeovndpdksriov-1 heat-admin]# cat /etc/sysconfig/network-scripts/ifcfg-enp130s0f1
# This file is autogenerated by os-net-config
DEVICE=enp130s0f1
ONBOOT=yes
HOTPLUG=no
NM_CONTROLLED=yes
PEERDNS=no
BOOTPROTO=none
MTU=9000
DEFROUTE=no
[root@computeovndpdksriov-1 heat-admin]# cat /etc/sysconfig/network-scripts/ifcfg-enp130s0f1v1 
# This file is autogenerated by os-net-config
DEVICE=enp130s0f1v1
ONBOOT=yes
HOTPLUG=no
NM_CONTROLLED=no
PEERDNS=no
MASTER=storage_bond
SLAVE=yes
BOOTPROTO=none
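Dan's question about NetworkManager control can be answered per file. The following minimal Python sketch reads the simple `KEY=value` lines of ifcfg files like the two above and reports `NM_CONTROLLED`. The helper names are hypothetical, and the sketch does not implement full shell quoting. For the files shown, it would report `True` for the PF enp130s0f1 and `False` for the VF enp130s0f1v1.

```python
from typing import Optional


def parse_ifcfg(text: str) -> dict:
    """Parse the KEY=value lines of an ifcfg-* file into a dict (sketch only)."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip simple surrounding quotes, e.g. NM_CONTROLLED="no".
        values[key.strip()] = value.strip().strip("\"'")
    return values


def nm_controlled(text: str) -> Optional[bool]:
    """True/False from NM_CONTROLLED, or None if the key is absent."""
    value = parse_ifcfg(text).get("NM_CONTROLLED")
    if value is None:
        return None
    # os-net-config writes "yes" or "no", as in the files above.
    return value.lower() == "yes"
```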

Comment 4 Miguel Angel Nieto 2021-07-28 10:16:44 UTC
I configured nm_controlled: false, but I reproduce the same issue, so it is not related to this parameter.
https://gitlab.cee.redhat.com/mnietoji/deployment_templates/-/commit/64c0c113b012761b5f907fad13ecbe154b23f175

Comment 5 Miguel Angel Nieto 2021-07-28 10:32:02 UTC
I can see the following in /var/log/extra/failed_services.txt:
-- Logs begin at Wed 2021-07-28 09:14:38 UTC, end at Wed 2021-07-28 10:10:51 UTC. --
Jul 28 10:01:28 computeovndpdksriov-0 systemd[1]: Starting SR-IOV numvfs configuration...
Jul 28 10:01:41 computeovndpdksriov-0 os-net-config-sriov[1645]: [2021/07/28 10:01:41 AM] [ERROR] Failed to execute ip link set dev enp130s0f1v1 promisc off
Jul 28 10:01:41 computeovndpdksriov-0 os-net-config-sriov[1645]: Traceback (most recent call last):
Jul 28 10:01:41 computeovndpdksriov-0 os-net-config-sriov[1645]:   File "/usr/bin/os-net-config-sriov", line 10, in <module>
Jul 28 10:01:41 computeovndpdksriov-0 os-net-config-sriov[1645]:     sys.exit(main())
Jul 28 10:01:41 computeovndpdksriov-0 os-net-config-sriov[1645]:   File "/usr/lib/python3.6/site-packages/os_net_config/sriov_config.py", line 617, in main
Jul 28 10:01:41 computeovndpdksriov-0 os-net-config-sriov[1645]:     configure_sriov_vf()
Jul 28 10:01:41 computeovndpdksriov-0 os-net-config-sriov[1645]:   File "/usr/lib/python3.6/site-packages/os_net_config/sriov_config.py", line 553, in configure_sriov_vf
Jul 28 10:01:41 computeovndpdksriov-0 os-net-config-sriov[1645]:     'promisc', item['promisc'])
Jul 28 10:01:41 computeovndpdksriov-0 os-net-config-sriov[1645]:   File "/usr/lib/python3.6/site-packages/os_net_config/sriov_config.py", line 451, in run_ip_config_cmd
Jul 28 10:01:41 computeovndpdksriov-0 os-net-config-sriov[1645]:     processutils.execute(*cmd, delay_on_retry=True, attempts=10, **kwargs)
Jul 28 10:01:41 computeovndpdksriov-0 os-net-config-sriov[1645]:   File "/usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py", line 431, in execute
Jul 28 10:01:41 computeovndpdksriov-0 os-net-config-sriov[1645]:     cmd=sanitized_cmd)
Jul 28 10:01:41 computeovndpdksriov-0 os-net-config-sriov[1645]: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
Jul 28 10:01:41 computeovndpdksriov-0 os-net-config-sriov[1645]: Command: ip link set dev enp130s0f1v1 promisc off
Jul 28 10:01:41 computeovndpdksriov-0 os-net-config-sriov[1645]: Exit code: 1
Jul 28 10:01:41 computeovndpdksriov-0 os-net-config-sriov[1645]: Stdout: ''
Jul 28 10:01:41 computeovndpdksriov-0 os-net-config-sriov[1645]: Stderr: 'Cannot find device "enp130s0f1v1"\n'
Jul 28 10:01:41 computeovndpdksriov-0 systemd[1]: sriov_config.service: Main process exited, code=exited, status=1/FAILURE
Jul 28 10:01:41 computeovndpdksriov-0 systemd[1]: sriov_config.service: Failed with result 'exit-code'.
Jul 28 10:01:41 computeovndpdksriov-0 systemd[1]: Failed to start SR-IOV numvfs configuration.
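The reboot test may need to be repeated many times. Pulling the failing command out of the journal makes it easier to compare failures across runs. The following minimal Python sketch assumes the `os-net-config-sriov[PID]: Command: ...`, `Exit code: ...` and `Stderr: ...` line format shown above. `failed_commands` is a hypothetical helper.

```python
import re

# Matches e.g. "... os-net-config-sriov[1645]: Command: ip link set dev enp130s0f1v1 promisc off"
FIELD = re.compile(r"os-net-config-sriov\[\d+\]: (Command|Exit code|Stderr): (.*)$")


def failed_commands(journal_text: str) -> list:
    """Group Command / Exit code / Stderr lines into one dict per failed command."""
    records, current = [], {}
    for line in journal_text.splitlines():
        m = FIELD.search(line)
        if not m:
            continue  # traceback lines, [ERROR] lines, systemd lines, etc.
        key, value = m.group(1), m.group(2)
        if key == "Command" and current:
            records.append(current)  # a new command starts a new record
            current = {}
        current[key] = value
    if current:
        records.append(current)
    return records
```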


os-net-config was executed successfully during the deployment, so I do not understand why it is executed again on reboot.
[root@computeovndpdksriov-0 ~]# ip a |  grep enp130s0f1
13: enp130s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP group default qlen 1000
51: enp130s0f1v9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
52: enp130s0f1v3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
53: enp130s0f1v4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
54: enp130s0f1v5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
55: enp130s0f1v6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
56: enp130s0f1v7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
59: enp130s0f1v0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond_api state UP group default qlen 1000


enp130s0f1v1 should not be missing: it is used by a Linux bond, not by DPDK. enp130s0f1v1 has PCI address 0000:82:06.1, which is not in the driverctl override list:
[root@computeovndpdksriov-0 ~]# driverctl  list-overrides
0000:82:02.2 vfio-pci
0000:82:06.2 vfio-pci
0000:82:0a.0 vfio-pci
0000:82:0e.0 vfio-pci
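The reasoning above can be written as a check. A VF whose PCI address has no vfio-pci override in `driverctl list-overrides` is expected to stay on its kernel driver and therefore keep a netdev such as enp130s0f1v1. The following is a minimal Python sketch with hypothetical helper names. It assumes the two-column `<pci-address> <driver>` output shown above.

```python
def parse_overrides(output: str) -> dict:
    """Parse `driverctl list-overrides` output into {pci_address: driver}."""
    overrides = {}
    for line in output.splitlines():
        fields = line.split()
        # Expect "<pci-address> <driver>"; ignore anything shorter.
        if len(fields) >= 2:
            overrides[fields[0]] = fields[1]
    return overrides


def expects_kernel_netdev(pci_addr: str, overrides: dict) -> bool:
    """Sketch: treat only a vfio-pci override as removing the VF's kernel netdev,
    since vfio-pci is the only driver in the override list above."""
    return overrides.get(pci_addr) != "vfio-pci"
```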

Comment 7 Miguel Angel Nieto 2021-07-28 15:37:53 UTC
I have seen in this doc [1] that the recommended way to define the tenant VLAN is different, so I configured it as described in that doc.

I replaced this:
              - type: ovs_user_bridge
                name: br-link0
                use_dhcp: false
                addresses:
                  - ip_netmask:
                      get_param: TenantIpSubnet
                members:
                  - type: ovs_dpdk_bond
                    name: dpdkbond0
                    mtu: 9000
                    rx_queue: 1
                    members:
                      - type: ovs_dpdk_port
                        name: dpdk0
                        members:
                          - type: sriov_vf
                            device: nic3
                            vfid: 2
                            vlan_id:
                              get_param: TenantNetworkVlanID

with this:

              - type: ovs_user_bridge
                name: br-link0
                use_dhcp: false
                ovs_extra:
                  - str_replace:
                      template: set port br-link0 tag=_VLAN_TAG_
                      params:
                        _VLAN_TAG_:
                          get_param: TenantNetworkVlanID
                addresses:
                  - ip_netmask:
                      get_param: TenantIpSubnet
                members:
                  - type: ovs_dpdk_bond
                    name: dpdkbond0
                    mtu: 9000
                    rx_queue: 1
                    members:
                      - type: ovs_dpdk_port
                        name: dpdk0
                        members:
                          - type: sriov_vf
                            device: nic3
                            vfid: 2


But the same issue happens. After rebooting several times, I can see:
1. VF 1 is unconfigured; VLAN 121 is missing.
2. In this case VF 2 should not have a VLAN configured, because we used the new configuration, but for some reason the tenant network is not working either.

So this bug is not related to the way the tenant network is defined.

12: enp130s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether f8:f2:1e:03:a5:40 brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether 0a:9d:6e:56:de:8c brd ff:ff:ff:ff:ff:ff, vlan 120, spoof checking off, link-state auto, trust on
    vf 1     link/ether 46:4d:5c:83:d1:71 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 2     link/ether e6:29:99:53:87:e7 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 3     link/ether ce:ca:f6:a7:19:df brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 4     link/ether fa:13:22:50:c7:49 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 5     link/ether a6:63:af:95:8c:71 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 6     link/ether b2:eb:b5:46:fa:2f brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 7     link/ether 86:7d:84:4d:f5:8a brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 8     link/ether 36:7d:e3:3b:b6:96 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 9     link/ether 1e:ad:46:2e:68:48 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
13: enp130s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether f8:f2:1e:03:a5:42 brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, vlan 120, spoof checking off, link-state auto, trust on
    vf 1     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 2     link/ether 1a:f9:19:46:d0:6e brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 3     link/ether f6:7b:74:83:09:6a brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 4     link/ether 2a:0e:01:58:52:c1 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 5     link/ether ba:f1:d9:81:d7:e9 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off



[1] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html/network_functions_virtualization_planning_and_configuration_guide/assembly_config-vxlan-dpdk-sriov-hybrid

Comment 9 Saravanan KR 2021-07-29 06:37:46 UTC
Jul 28 10:01:41 computeovndpdksriov-0 os-net-config-sriov[1645]: Stderr: 'Cannot find device "enp130s0f1v1"\n'

This happens when VF creation takes time and the VF is not yet available for the kernel to configure, or when the VF is bound to the vfio-pci driver. The command works for the other VFs and fails only for the VF attached to ovs_bond (the other VFs are attached to linux_bond), which rules out the delay scenario. We need to confirm whether the ifup scripts were triggered before sriov_config.service completed. Can you share the complete boot logs after the reboot?
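The two causes named above can be told apart by waiting for the netdev. A slowly created VF eventually appears under `/sys/class/net`, whereas a VF bound to vfio-pci never gets a kernel netdev. The traceback shows that os-net-config already retries the `ip link` command (`attempts=10`, `delay_on_retry=True`). The following is a minimal Python sketch of an explicit wait. `wait_for_netdev` and its timeout values are hypothetical and are not part of os-net-config.

```python
import os
import time


def wait_for_netdev(ifname: str, timeout: float = 10.0, interval: float = 0.5,
                    sys_class_net: str = "/sys/class/net") -> bool:
    """Poll until <sys_class_net>/<ifname> exists or `timeout` seconds pass.

    Returns True if the interface appeared and False on timeout. A VF bound
    to vfio-pci has no kernel netdev, so for it this returns False.
    """
    deadline = time.monotonic() + timeout
    path = os.path.join(sys_class_net, ifname)
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```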

How do you apply the configuration after the reboot is completed to make it work?

Does it happen only with specific hardware, or does NIC partitioning fail after reboot on all types of nodes?

Comment 10 Saravanan KR 2021-07-29 07:48:13 UTC
(In reply to Miguel Angel Nieto from comment #7)
> I have seen in this doc [1] that the recomended way to define tenant vlan is
> diferent. I configured as in this doc.
> 

The doc referenced above is specific to a DPDK port on the interface and does not apply to a DPDK port on a VF. According to the os-net-config sample, your earlier configuration is correct.
https://github.com/openstack/os-net-config/blob/master/etc/os-net-config/samples/sriov_pf_ovs_dpdk.yaml#L71

If this is not present in the NIC partitioning document, it would be good to add it.

Comment 11 Haresh Khandelwal 2021-07-29 08:59:10 UTC
We did a couple of manual runs (reboot and validations) and didn't find any issue. This may be a timing issue related to the automation.
Miguel is going to test a few more times and report back.

Thanks

Comment 12 Miguel Angel Nieto 2021-07-29 09:23:44 UTC
There must be some kind of race condition: this time I needed to execute the test case 4 or 5 times to make it fail. After the failure, tenant traffic ping does not work, and I can see that the VF configuration changed:

before reboot
12: enp130s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether f8:f2:1e:03:b4:80 brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether f2:54:9c:3b:63:da brd ff:ff:ff:ff:ff:ff, vlan 120, spoof checking off, link-state auto, trust on
    vf 1     link/ether ca:7e:f1:58:29:61 brd ff:ff:ff:ff:ff:ff, vlan 122, spoof checking off, link-state auto, trust on
    vf 2     link/ether 02:20:e0:a5:1a:67 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust on

after reboot
12: enp130s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether f8:f2:1e:03:b4:80 brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether 1e:8b:b3:38:b7:cb brd ff:ff:ff:ff:ff:ff, vlan 120, spoof checking off, link-state auto, trust on
    vf 1     link/ether 86:69:7e:62:85:ce brd ff:ff:ff:ff:ff:ff, vlan 122, spoof checking off, link-state auto, trust on
    vf 2     link/ether ca:49:53:a9:a4:09 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off

Comment 13 Miguel Angel Nieto 2021-07-29 09:28:17 UTC
I didn't include enp130s0f1 in my previous comment, but it shows the same behaviour: VF 2 is unconfigured.
13: enp130s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether f8:f2:1e:03:b4:82 brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether 1e:8b:b3:38:b7:cb brd ff:ff:ff:ff:ff:ff, vlan 120, spoof checking off, link-state auto, trust on
    vf 1     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, vlan 122, spoof checking off, link-state auto, trust on
    vf 2     link/ether 16:2f:f1:b5:14:a7 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off

Something else I do not understand: for enp130s0f1, virtual functions are missing from the kernel.
[root@computeovndpdksriov-1 extra]# ip a | grep enp130s0f0
12: enp130s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP group default qlen 1000
34: enp130s0f0v0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond_api state UP group default qlen 1000
35: enp130s0f0v5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
36: enp130s0f0v6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
37: enp130s0f0v7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
38: enp130s0f0v4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
39: enp130s0f0v3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
40: enp130s0f0v9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
41: enp130s0f0v1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master storage_bond state UP group default qlen 1000
42: enp130s0f0v8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
[root@computeovndpdksriov-1 extra]# ip a | grep enp130s0f1
13: enp130s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP group default qlen 1000
44: enp130s0f1v0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond_api state UP group default qlen 1000
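The PF's sysfs entries can show whether the VFs missing from `ip a` above still exist as PCI functions but have no kernel netdev. The following minimal Python sketch assumes the standard Linux sysfs layout. In that layout, `/sys/class/net/<pf>/device/virtfn<N>` links to each VF's PCI device, and a VF bound to a kernel network driver has a `net/<ifname>` entry beneath it. `vf_netdevs` is a hypothetical helper.

```python
import glob
import os


def vf_netdevs(pf: str, sys_class_net: str = "/sys/class/net") -> dict:
    """Map VF index -> kernel netdev name, or None when the VF has no netdev.

    A VF bound to vfio-pci, or not bound to any driver, has no net/ entry
    under its virtfn<N> link, so it is reported as None.
    """
    result = {}
    for vf_dir in glob.glob(os.path.join(sys_class_net, pf, "device", "virtfn*")):
        index = int(os.path.basename(vf_dir)[len("virtfn"):])
        net_dir = os.path.join(vf_dir, "net")
        names = sorted(os.listdir(net_dir)) if os.path.isdir(net_dir) else []
        result[index] = names[0] if names else None
    # Sort numerically so that virtfn10 comes after virtfn9.
    return dict(sorted(result.items()))
```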

Comment 14 Sanjay Upadhyay 2021-07-29 13:53:35 UTC
Since we are seeing this issue randomly in our CI, causing random CI failures, I am marking this as a test blocker.

Comment 19 Sanjay Upadhyay 2021-08-06 16:43:29 UTC
@hareshkhandelwal I tried with a reduced number of VFs and also with the changes listed in https://docs.google.com/document/d/1CbyPTQ7ZDpcIDaqYAobNGZ_92Iprfon0f7YoAm9GgVA/edit#heading=h.v3t78c1gzj59

However, all day2 tests still fail intermittently: I ran them 4 times and they failed once. This is with the compose RHOS-16.2-RHEL-8-20210804.n.0.

Comment 28 Saravanan KR 2021-08-16 10:37:00 UTC
RHEL Clone - https://bugzilla.redhat.com/show_bug.cgi?id=1993882

Comment 45 Vijayalakshmi Candappa 2022-02-16 06:32:44 UTC
@sanjay, I checked the compose at http://rhos-qe-mirror-tlv.usersys.redhat.com/rcm-guest/puddles/OpenStack/16.2-RHEL-8/RHOS-16.2-RHEL-8-20220210.n.1/compose/OpenStack/x86_64/os/Packages/,
and the patches (there are 2 patches for this BZ) are merged in it.
I updated the Fixed In Version.

Comment 53 Miguel Angel Nieto 2022-06-15 13:41:17 UTC
I have rebooted the server several times and have not been able to reproduce the issue with:
RHOS-16.2-RHEL-8-20220610.n.1
os-net-config-11.5.1-2.20220404114957.173ef73.el8ost.noarch


10: enp4s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 04:3f:72:b8:be:f6 brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, vlan 170, spoof checking off, link-state auto, trust on, query_rss off
    vf 1     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, vlan 172, spoof checking off, link-state auto, trust on, query_rss off
    vf 2     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust on, query_rss off
    vf 3     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust on, query_rss off
    vf 4     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 5     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 6     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 7     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 8     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 9     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off

Comment 54 OSP Team 2022-06-23 10:38:32 UTC
According to our records, this should be resolved by os-net-config-11.5.1-2.20220404114957.173ef73.el8ost.  This build is available now.

