Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 2290818

Summary: OVS-dpdk intermittently fails to attach dpdk interfaces on reboot
Product: Red Hat OpenStack
Reporter: David Sedgmen <dsedgmen>
Component: openvswitch
Assignee: RHOSP:NFV_Eng <rhosp-nfv-int>
Status: CLOSED MIGRATED
QA Contact: Eran Kuris <ekuris>
Severity: medium
Priority: medium
Version: 16.2 (Train)
CC: aconole, apevec, chrisw, dmarchan, fleitner, fpalin, hakhande, jmarti, jveiraca, ksundara, njohnston, spapa
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Last Closed: 2024-12-15 11:20:44 UTC
Type: Bug

Description David Sedgmen 2024-06-07 03:46:44 UTC
Description of problem: Intermittently on reboot, OVS gets a "Permission denied" error when it tries to open the VFIO container.

~~~
2024-06-06T05:35:39.857Z|00013|dpdk|INFO|EAL ARGS: ovs-vswitchd -n 4 --socket-mem 4096,4096 --socket-limit 4096,4096 -l 0.
2024-06-06T05:35:39.861Z|00014|dpdk|INFO|EAL: Detected 112 lcore(s)
2024-06-06T05:35:39.861Z|00015|dpdk|INFO|EAL: Detected 2 NUMA nodes
2024-06-06T05:35:39.861Z|00016|dpdk|INFO|EAL: Detected static linkage of DPDK
2024-06-06T05:35:39.862Z|00017|dpdk|INFO|EAL: Multi-process socket /var/run/openvswitch/dpdk/rte/mp_socket
2024-06-06T05:35:39.884Z|00018|dpdk|INFO|EAL: rte_mem_virt2phy(): cannot open /proc/self/pagemap: Permission denied
2024-06-06T05:35:39.884Z|00019|dpdk|INFO|EAL: Selected IOVA mode 'VA'
2024-06-06T05:35:39.884Z|00020|dpdk|INFO|EAL: 5120 hugepages of size 2097152 reserved, but no mounted hugetlbfs found for that size
2024-06-06T05:35:39.885Z|00021|dpdk|INFO|EAL: Probing VFIO support...
2024-06-06T05:35:39.885Z|00022|dpdk|ERR|EAL:   cannot open VFIO container, error 13 (Permission denied)
2024-06-06T05:35:39.885Z|00023|dpdk|INFO|EAL: VFIO support could not be initialized
2024-06-06T05:35:41.005Z|00024|dpdk|ERR|EAL: Requested device 0000:aa:00.0 cannot be used
2024-06-06T05:35:41.005Z|00025|dpdk|ERR|EAL: Requested device 0000:aa:00.1 cannot be used
2024-06-06T05:35:41.005Z|00026|dpdk|ERR|EAL: Requested device 0000:ff:00.0 cannot be used
2024-06-06T05:35:41.005Z|00027|dpdk|ERR|EAL: Requested device 0000:ff:00.1 cannot be used
2024-06-06T05:35:41.005Z|00028|dpdk|INFO|EAL: No legacy callbacks, legacy socket not created
2024-06-06T05:35:41.007Z|00029|dpdk|INFO|DPDK Enabled - initialized
~~~

This causes the DPDK interfaces to fail to attach to the OVS bridges:

~~~
    Bridge br-ex1
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        datapath_type: netdev
        Port dpdkbond3
            Interface dpdk1
                type: dpdk
                options: {dpdk-devargs="0000:aa:00.1", n_rxq="2"}
                error: "Error attaching device '0000:aa:00.1' to DPDK"
            Interface dpdk0
                type: dpdk
                options: {dpdk-devargs="0000:aa:00.0", n_rxq="2"}
                error: "Error attaching device '0000:aa:00.0' to DPDK"
    Bridge br-ex2
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        datapath_type: netdev
        Port br-ex2
            Interface br-ex2
                type: internal
        Port dpdkbond4
            Interface dpdk3
                type: dpdk
                options: {dpdk-devargs="0000:ff:00.1", n_rxq="2"}
                error: "Error attaching device '0000:ff:00.1' to DPDK"
            Interface dpdk2
                type: dpdk
                options: {dpdk-devargs="0000:ff:00.0", n_rxq="2"}
                error: "Error attaching device '0000:ff:00.0' to DPDK"
~~~
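When checking many compute nodes after a reboot, it can help to scan `ovs-vsctl show` output for interfaces that carry an `error:` field. The Python sketch that follows is a hypothetical triage helper (`find_interface_errors` is an illustrative name, not part of OVS). It assumes the nesting seen in the output above, where `Bridge`, `Port`, `Interface` and `error:` each start their own line, and it ignores every other column.

```python
import re
from typing import List, NamedTuple, Optional


class InterfaceError(NamedTuple):
    bridge: Optional[str]
    port: Optional[str]
    interface: Optional[str]
    error: str


def find_interface_errors(show_output: str) -> List[InterfaceError]:
    """Return every Interface in `ovs-vsctl show` text that has an error field.

    Keys off the keyword at the start of each stripped line, so it assumes
    the layout shown in this report rather than exact indentation widths.
    """
    bridge = port = iface = None
    errors: List[InterfaceError] = []
    for raw in show_output.splitlines():
        line = raw.strip()
        if line.startswith("Bridge "):
            # A new bridge resets the current port and interface.
            bridge, port, iface = line.split(None, 1)[1], None, None
        elif line.startswith("Port "):
            port, iface = line.split(None, 1)[1], None
        elif line.startswith("Interface "):
            iface = line.split(None, 1)[1]
        elif line.startswith("error:"):
            # The value is quoted, e.g. error: "Error attaching device ..."
            match = re.match(r'error:\s*"(.*)"$', line)
            message = match.group(1) if match else line[len("error:"):].strip()
            errors.append(InterfaceError(bridge, port, iface, message))
    return errors
```

Run against the failed-boot output above, it would report four entries, one each for dpdk0 through dpdk3. Querying the `Interface` table directly, for example with `ovs-vsctl --columns=name,error list Interface`, avoids depending on the `show` layout at all.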

After OVS is restarted, the DPDK interfaces attach successfully:

~~~
2024-06-06T01:28:29.679Z|00014|dpdk|INFO|EAL: Detected 112 lcore(s)
2024-06-06T01:28:29.679Z|00015|dpdk|INFO|EAL: Detected 2 NUMA nodes
2024-06-06T01:28:29.680Z|00016|dpdk|INFO|EAL: Detected static linkage of DPDK
2024-06-06T01:28:29.680Z|00017|dpdk|INFO|EAL: Multi-process socket /var/run/openvswitch/dpdk/rte/mp_socket
2024-06-06T01:28:29.710Z|00018|dpdk|INFO|EAL: rte_mem_virt2phy(): cannot open /proc/self/pagemap: Permission denied
2024-06-06T01:28:29.710Z|00019|dpdk|INFO|EAL: Selected IOVA mode 'VA'
2024-06-06T01:28:29.710Z|00020|dpdk|INFO|EAL: 5120 hugepages of size 2097152 reserved, but no mounted hugetlbfs found for that size
2024-06-06T01:28:29.710Z|00021|dpdk|INFO|EAL: Probing VFIO support...
2024-06-06T01:28:29.710Z|00022|dpdk|INFO|EAL: VFIO support initialized
2024-06-06T01:28:31.089Z|00023|dpdk|INFO|EAL:   using IOMMU type 1 (Type 1)
2024-06-06T01:28:31.689Z|00024|dpdk|INFO|EAL: Probe PCI driver: net_ice (8086:159b) device: 0000:aa:00.0 (socket 0)
2024-06-06T01:28:32.186Z|00025|dpdk|INFO|ice_load_pkg_type(): Active package is: 1.3.16.0, ICE OS Default Package
2024-06-06T01:28:32.446Z|00026|dpdk|INFO|EAL: Probe PCI driver: net_ice (8086:159b) device: 0000:aa:00.1 (socket 0)
2024-06-06T01:28:32.520Z|00027|dpdk|INFO|ice_load_pkg_type(): Active package is: 1.3.16.0, ICE OS Default Package
2024-06-06T01:28:32.905Z|00028|dpdk|INFO|EAL: Probe PCI driver: net_ice (8086:159b) device: 0000:ff:00.0 (socket 1)
2024-06-06T01:28:33.291Z|00029|dpdk|INFO|ice_load_pkg_type(): Active package is: 1.3.16.0, ICE OS Default Package
2024-06-06T01:28:33.558Z|00030|dpdk|INFO|EAL: Probe PCI driver: net_ice (8086:159b) device: 0000:ff:00.1 (socket 1)
2024-06-06T01:28:33.618Z|00031|dpdk|INFO|ice_load_pkg_type(): Active package is: 1.3.16.0, ICE OS Default Package
2024-06-06T01:28:33.674Z|00032|dpdk|INFO|EAL: No legacy callbacks, legacy socket not created
2024-06-06T01:28:33.677Z|00033|dpdk|INFO|DPDK Enabled - initialized
~~~
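The failed-boot log and the post-restart log differ in two places: whether VFIO support initializes, and whether each requested PCI device is probed or reported as unusable. A hypothetical helper such as `summarize_eal_log` (illustrative only, not an OVS or DPDK tool) can extract those facts from the ovs-vswitchd log so that a failed boot and a restart can be compared side by side. It matches only the message strings that appear in this report; other DPDK releases may word them differently.

```python
import re
from typing import Any, Dict

# Patterns built from the EAL message texts quoted in this report.
PROBE_RE = re.compile(r"Probe PCI driver: \S+ \([0-9a-f:]+\) device: (\S+)")
UNUSABLE_RE = re.compile(r"Requested device (\S+) cannot be used")


def summarize_eal_log(log_text: str) -> Dict[str, Any]:
    """Summarize VFIO initialization and per-device outcome from OVS log text.

    vfio_initialized is True/False when a VFIO result line is present and
    None when the text contains no such line.
    """
    summary: Dict[str, Any] = {"vfio_initialized": None, "probed": [], "unusable": []}
    for line in log_text.splitlines():
        # Check the failure wording first; it does not contain the success text.
        if "VFIO support could not be initialized" in line:
            summary["vfio_initialized"] = False
        elif "VFIO support initialized" in line:
            summary["vfio_initialized"] = True
        else:
            probe = PROBE_RE.search(line)
            if probe:
                summary["probed"].append(probe.group(1))
                continue
            unusable = UNUSABLE_RE.search(line)
            if unusable:
                summary["unusable"].append(unusable.group(1))
    return summary
```

For the failed boot above it would return `vfio_initialized` as False with all four devices under `unusable`; for the log after the restart it would return `vfio_initialized` as True with all four devices under `probed`.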

Version-Release number of selected component (if applicable):

openvswitch2.15-2.15.0-142.el8fdp.x86_64                    
rhosp-openvswitch-2.15-4.el8ost.1.noarch

How reproducible:

It occurs in about 1 out of 5 reboots of the compute nodes.


Actual results:

Sometimes compute nodes start without OVS-DPDK networking.

Expected results:

Compute nodes start with OVS-DPDK networking.

Additional info:

We tried the following workaround, but it did not work:
https://access.redhat.com/solutions/4093751
https://mail.openvswitch.org/pipermail/ovs-dev/2019-April/358322.html

A bug was opened for OSP 10 but was closed as INSUFFICIENT_DATA:
https://bugzilla.redhat.com/show_bug.cgi?id=1683817

Looking through the logs and comparing them with what happens after OVS is restarted, I think we are hitting a race condition.

Even on the compute nodes that are working, we can see permission errors, but they occur when opening the VFIO group devices (/dev/vfio/<group>), not /dev/vfio/vfio:
~~~
2024-06-06T05:36:02.026Z|00021|dpdk|INFO|EAL: Probing VFIO support...
2024-06-06T05:36:02.026Z|00022|dpdk|INFO|EAL: VFIO support initialized
2024-06-06T05:36:03.491Z|00023|dpdk|ERR|EAL: Cannot open /dev/vfio/34: Permission denied
2024-06-06T05:36:03.491Z|00024|dpdk|ERR|EAL: Failed to open group 34
2024-06-06T05:36:03.491Z|00025|dpdk|ERR|EAL: Requested device 0000:aa:00.0 cannot be used
2024-06-06T05:36:03.491Z|00026|dpdk|ERR|EAL: Cannot open /dev/vfio/35: Permission denied
2024-06-06T05:36:03.491Z|00027|dpdk|ERR|EAL: Failed to open group 35
2024-06-06T05:36:03.491Z|00028|dpdk|ERR|EAL: Requested device 0000:aa:00.1 cannot be used
2024-06-06T05:36:03.491Z|00029|dpdk|ERR|EAL: Cannot open /dev/vfio/173: Permission denied
2024-06-06T05:36:03.491Z|00030|dpdk|ERR|EAL: Failed to open group 173
2024-06-06T05:36:03.491Z|00031|dpdk|ERR|EAL: Requested device 0000:ff:00.0 cannot be used
2024-06-06T05:36:03.491Z|00032|dpdk|ERR|EAL: Cannot open /dev/vfio/174: Permission denied
2024-06-06T05:36:03.491Z|00033|dpdk|ERR|EAL: Failed to open group 174
2024-06-06T05:36:03.491Z|00034|dpdk|ERR|EAL: Requested device 0000:ff:00.1 cannot be used
2024-06-06T05:36:03.491Z|00035|dpdk|INFO|EAL: No legacy callbacks, legacy socket not created
2024-06-06T05:36:03.494Z|00036|dpdk|INFO|DPDK Enabled - initialized
2024-06-06T05:36:03.498Z|00037|pmd_perf|INFO|DPDK provided TSC frequency: 2190000 KHz
~~~
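Both failure modes are `EACCES` (error 13) when ovs-vswitchd opens a VFIO node: the container `/dev/vfio/vfio` on the failed boot, and group nodes such as `/dev/vfio/34` in the last log. One way to test the race-condition theory is to record the owner, group and mode of those nodes at the moment OVS starts. The sketch that follows is a hypothetical helper (`has_rw_access` and `vfio_node_report` are illustrative names). It evaluates only the classic owner/group/other permission bits and ignores POSIX ACLs and SELinux, which can also produce "Permission denied" (SELinux AVC denials would appear in `ausearch -m avc`). The uid and group IDs of the OVS daemon are assumed to be supplied by the caller.

```python
import os
import stat
from typing import Iterable, List, Tuple


def has_rw_access(st: os.stat_result, uid: int, gids: Iterable[int]) -> bool:
    """Read+write check using only owner/group/other mode bits.

    Simplification: ignores POSIX ACLs, SELinux and any capability other
    than root's DAC override.
    """
    mode = st.st_mode
    if uid == 0:
        return True  # root bypasses DAC permission bits
    if st.st_uid == uid:
        # As in the kernel, owner bits apply even if group bits would allow more.
        return bool(mode & stat.S_IRUSR and mode & stat.S_IWUSR)
    if st.st_gid in set(gids):
        return bool(mode & stat.S_IRGRP and mode & stat.S_IWGRP)
    return bool(mode & stat.S_IROTH and mode & stat.S_IWOTH)


def vfio_node_report(
    paths: Iterable[str], uid: int, gids: Iterable[int]
) -> List[Tuple[str, str, bool]]:
    """Describe each node and whether uid/gids could open it read-write."""
    gid_list = list(gids)
    report: List[Tuple[str, str, bool]] = []
    for path in paths:
        try:
            st = os.stat(path)
        except FileNotFoundError:
            # Node not present (for example, not created yet).
            report.append((path, "missing", False))
            continue
        desc = "uid=%d gid=%d mode=%s" % (
            st.st_uid, st.st_gid, oct(stat.S_IMODE(st.st_mode)))
        report.append((path, desc, has_rw_access(st, uid, gid_list)))
    return report
```

For this report, the paths to sample would be `/dev/vfio/vfio` plus `/dev/vfio/34`, `/dev/vfio/35`, `/dev/vfio/173` and `/dev/vfio/174` from the log above, checked against the account ovs-vswitchd runs as (on RHEL-based hosts this is typically set by `OVS_USER_ID` in `/etc/sysconfig/openvswitch`; the numeric IDs can be looked up with `pwd.getpwnam` and `grp.getgrnam`).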