This bug has been migrated to another issue tracking site. It has been closed here and may no longer be being monitored.

If you would like to get updates for this issue, or to participate in it, you may do so at Red Hat Issue Tracker.
Bug 2290818 - OVS-dpdk intermittently fails to attach dpdk interfaces on reboot
Summary: OVS-dpdk intermittently fails to attach dpdk interfaces on reboot
Keywords:
Status: CLOSED MIGRATED
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openvswitch
Version: 16.2 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: RHOSP:NFV_Eng
QA Contact: Eran Kuris
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2024-06-07 03:46 UTC by David Sedgmen
Modified: 2024-12-15 11:20 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2024-12-15 11:20:44 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker OSP-32228 0 None None None 2024-06-07 03:48:11 UTC
Red Hat Issue Tracker OSP-33235 0 None None None 2024-12-15 11:20:33 UTC
Red Hat Issue Tracker OSPRH-12451 0 None None None 2024-12-15 11:20:43 UTC

Description David Sedgmen 2024-06-07 03:46:44 UTC
Description of problem: On reboot, OVS intermittently gets a "Permission denied" error when it tries to open the VFIO container.

~~~
2024-06-06T05:35:39.857Z|00013|dpdk|INFO|EAL ARGS: ovs-vswitchd -n 4 --socket-mem 4096,4096 --socket-limit 4096,4096 -l 0.
2024-06-06T05:35:39.861Z|00014|dpdk|INFO|EAL: Detected 112 lcore(s)
2024-06-06T05:35:39.861Z|00015|dpdk|INFO|EAL: Detected 2 NUMA nodes
2024-06-06T05:35:39.861Z|00016|dpdk|INFO|EAL: Detected static linkage of DPDK
2024-06-06T05:35:39.862Z|00017|dpdk|INFO|EAL: Multi-process socket /var/run/openvswitch/dpdk/rte/mp_socket
2024-06-06T05:35:39.884Z|00018|dpdk|INFO|EAL: rte_mem_virt2phy(): cannot open /proc/self/pagemap: Permission denied
2024-06-06T05:35:39.884Z|00019|dpdk|INFO|EAL: Selected IOVA mode 'VA'
2024-06-06T05:35:39.884Z|00020|dpdk|INFO|EAL: 5120 hugepages of size 2097152 reserved, but no mounted hugetlbfs found for that size
2024-06-06T05:35:39.885Z|00021|dpdk|INFO|EAL: Probing VFIO support...
2024-06-06T05:35:39.885Z|00022|dpdk|ERR|EAL:   cannot open VFIO container, error 13 (Permission denied)
2024-06-06T05:35:39.885Z|00023|dpdk|INFO|EAL: VFIO support could not be initialized
2024-06-06T05:35:41.005Z|00024|dpdk|ERR|EAL: Requested device 0000:aa:00.0 cannot be used
2024-06-06T05:35:41.005Z|00025|dpdk|ERR|EAL: Requested device 0000:aa:00.1 cannot be used
2024-06-06T05:35:41.005Z|00026|dpdk|ERR|EAL: Requested device 0000:ff:00.0 cannot be used
2024-06-06T05:35:41.005Z|00027|dpdk|ERR|EAL: Requested device 0000:ff:00.1 cannot be used
2024-06-06T05:35:41.005Z|00028|dpdk|INFO|EAL: No legacy callbacks, legacy socket not created
2024-06-06T05:35:41.007Z|00029|dpdk|INFO|DPDK Enabled - initialized
~~~

This causes the dpdk interfaces to fail to attach to the OVS bridges:

~~~
    Bridge br-ex1
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        datapath_type: netdev
        Port dpdkbond3
            Interface dpdk1
                type: dpdk
                options: {dpdk-devargs="0000:aa:00.1", n_rxq="2"}
                error: "Error attaching device '0000:aa:00.1' to DPDK"
            Interface dpdk0
                type: dpdk
                options: {dpdk-devargs="0000:aa:00.0", n_rxq="2"}
                error: "Error attaching device '0000:aa:00.0' to DPDK"
    Bridge br-ex2
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        datapath_type: netdev
        Port br-ex2
            Interface br-ex2
                type: internal
        Port dpdkbond4
            Interface dpdk3
                type: dpdk
                options: {dpdk-devargs="0000:ff:00.1", n_rxq="2"}
                error: "Error attaching device '0000:ff:00.1' to DPDK"
            Interface dpdk2
                type: dpdk
                options: {dpdk-devargs="0000:ff:00.0", n_rxq="2"}
                error: "Error attaching device '0000:ff:00.0' to DPDK"
~~~
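
To spot this state quickly across many compute nodes, the bridge listing above can be scanned for interfaces that carry an `error:` field. The following is a minimal sketch, assuming the text has the same layout as the listing above; the function name `failed_interfaces` and the `sample` string are illustrative only and not part of any OVS tooling.

```python
import re


def failed_interfaces(show_output):
    """Map interface name -> error text for interfaces that report an error.

    Assumes the indentation-based layout shown in the listing above:
    an "Interface <name>" line followed later by an optional "error:" line.
    """
    failed = {}
    current = None
    for line in show_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("Interface "):
            # Remember which interface the following fields belong to.
            current = stripped.split(None, 1)[1].strip('"')
        elif stripped.startswith("error:") and current is not None:
            match = re.match(r'error:\s*"(.*)"$', stripped)
            failed[current] = match.group(1) if match else stripped[len("error:"):].strip()
    return failed


# Excerpt copied from the bridge listing in this report.
sample = """
    Bridge br-ex2
        Port br-ex2
            Interface br-ex2
                type: internal
        Port dpdkbond4
            Interface dpdk3
                type: dpdk
                options: {dpdk-devargs="0000:ff:00.1", n_rxq="2"}
                error: "Error attaching device '0000:ff:00.1' to DPDK"
"""

if __name__ == "__main__":
    for name, error in failed_interfaces(sample).items():
        print(name, "->", error)
```

Interfaces without an `error:` line, such as the internal port `br-ex2`, are not reported.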

After restarting OVS, the interfaces attach:

~~~
2024-06-06T01:28:29.679Z|00014|dpdk|INFO|EAL: Detected 112 lcore(s)
2024-06-06T01:28:29.679Z|00015|dpdk|INFO|EAL: Detected 2 NUMA nodes
2024-06-06T01:28:29.680Z|00016|dpdk|INFO|EAL: Detected static linkage of DPDK
2024-06-06T01:28:29.680Z|00017|dpdk|INFO|EAL: Multi-process socket /var/run/openvswitch/dpdk/rte/mp_socket
2024-06-06T01:28:29.710Z|00018|dpdk|INFO|EAL: rte_mem_virt2phy(): cannot open /proc/self/pagemap: Permission denied
2024-06-06T01:28:29.710Z|00019|dpdk|INFO|EAL: Selected IOVA mode 'VA'
2024-06-06T01:28:29.710Z|00020|dpdk|INFO|EAL: 5120 hugepages of size 2097152 reserved, but no mounted hugetlbfs found for that size
2024-06-06T01:28:29.710Z|00021|dpdk|INFO|EAL: Probing VFIO support...
2024-06-06T01:28:29.710Z|00022|dpdk|INFO|EAL: VFIO support initialized
2024-06-06T01:28:31.089Z|00023|dpdk|INFO|EAL:   using IOMMU type 1 (Type 1)
2024-06-06T01:28:31.689Z|00024|dpdk|INFO|EAL: Probe PCI driver: net_ice (8086:159b) device: 0000:aa:00.0 (socket 0)
2024-06-06T01:28:32.186Z|00025|dpdk|INFO|ice_load_pkg_type(): Active package is: 1.3.16.0, ICE OS Default Package
2024-06-06T01:28:32.446Z|00026|dpdk|INFO|EAL: Probe PCI driver: net_ice (8086:159b) device: 0000:aa:00.1 (socket 0)
2024-06-06T01:28:32.520Z|00027|dpdk|INFO|ice_load_pkg_type(): Active package is: 1.3.16.0, ICE OS Default Package
2024-06-06T01:28:32.905Z|00028|dpdk|INFO|EAL: Probe PCI driver: net_ice (8086:159b) device: 0000:ff:00.0 (socket 1)
2024-06-06T01:28:33.291Z|00029|dpdk|INFO|ice_load_pkg_type(): Active package is: 1.3.16.0, ICE OS Default Package
2024-06-06T01:28:33.558Z|00030|dpdk|INFO|EAL: Probe PCI driver: net_ice (8086:159b) device: 0000:ff:00.1 (socket 1)
2024-06-06T01:28:33.618Z|00031|dpdk|INFO|ice_load_pkg_type(): Active package is: 1.3.16.0, ICE OS Default Package
2024-06-06T01:28:33.674Z|00032|dpdk|INFO|EAL: No legacy callbacks, legacy socket not created
2024-06-06T01:28:33.677Z|00033|dpdk|INFO|DPDK Enabled - initialized
~~~

Version-Release number of selected component (if applicable):

openvswitch2.15-2.15.0-142.el8fdp.x86_64                    
rhosp-openvswitch-2.15-4.el8ost.1.noarch

How reproducible:

It occurs in about 1 out of 5 reboots of the compute nodes.


Actual results:

Sometimes compute nodes start without networking for ovs-dpdk.

Expected results:

Compute nodes always start with networking for ovs-dpdk.

Additional info:

We tried this workaround, but it did not work:
https://access.redhat.com/solutions/4093751
https://mail.openvswitch.org/pipermail/ovs-dev/2019-April/358322.html

A bug was opened for OSP 10 but was closed with INSUFFICIENT_DATA:
https://bugzilla.redhat.com/show_bug.cgi?id=1683817

Looking through the logs and comparing them with what happens after OVS is restarted, I think we are hitting a race condition.
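
One way to gather evidence for or against a race would be to record the ownership and mode of the VFIO device nodes at the moment ovs-vswitchd starts and compare them with the values seen after a manual restart. The sketch below is only a diagnostic idea, not something taken from the logs: the helper name `describe_nodes` is hypothetical, and `/dev/vfio` is assumed as the default directory because that is where the log messages in this report point.

```python
import os
import stat


def describe_nodes(directory="/dev/vfio"):
    """Return (name, octal mode, uid, gid) for each entry in `directory`.

    The default path is an assumption based on the /dev/vfio/<group>
    paths that appear in the EAL log messages.
    """
    rows = []
    for name in sorted(os.listdir(directory)):
        info = os.stat(os.path.join(directory, name))
        rows.append((name, oct(stat.S_IMODE(info.st_mode)), info.st_uid, info.st_gid))
    return rows
```

Calling `describe_nodes()` from a hook that runs just before ovs-vswitchd starts, and again after the manual restart, would show whether the node permissions differ between the failing and the working case.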

Even on the compute nodes that are working, we can see permission errors, but they occur when opening the VFIO group devices (/dev/vfio/<group>), not the container /dev/vfio/vfio.
~~~
2024-06-06T05:36:02.026Z|00021|dpdk|INFO|EAL: Probing VFIO support...
2024-06-06T05:36:02.026Z|00022|dpdk|INFO|EAL: VFIO support initialized
2024-06-06T05:36:03.491Z|00023|dpdk|ERR|EAL: Cannot open /dev/vfio/34: Permission denied
2024-06-06T05:36:03.491Z|00024|dpdk|ERR|EAL: Failed to open group 34
2024-06-06T05:36:03.491Z|00025|dpdk|ERR|EAL: Requested device 0000:aa:00.0 cannot be used
2024-06-06T05:36:03.491Z|00026|dpdk|ERR|EAL: Cannot open /dev/vfio/35: Permission denied
2024-06-06T05:36:03.491Z|00027|dpdk|ERR|EAL: Failed to open group 35
2024-06-06T05:36:03.491Z|00028|dpdk|ERR|EAL: Requested device 0000:aa:00.1 cannot be used
2024-06-06T05:36:03.491Z|00029|dpdk|ERR|EAL: Cannot open /dev/vfio/173: Permission denied
2024-06-06T05:36:03.491Z|00030|dpdk|ERR|EAL: Failed to open group 173
2024-06-06T05:36:03.491Z|00031|dpdk|ERR|EAL: Requested device 0000:ff:00.0 cannot be used
2024-06-06T05:36:03.491Z|00032|dpdk|ERR|EAL: Cannot open /dev/vfio/174: Permission denied
2024-06-06T05:36:03.491Z|00033|dpdk|ERR|EAL: Failed to open group 174
2024-06-06T05:36:03.491Z|00034|dpdk|ERR|EAL: Requested device 0000:ff:00.1 cannot be used
2024-06-06T05:36:03.491Z|00035|dpdk|INFO|EAL: No legacy callbacks, legacy socket not created
2024-06-06T05:36:03.494Z|00036|dpdk|INFO|DPDK Enabled - initialized
2024-06-06T05:36:03.498Z|00037|pmd_perf|INFO|DPDK provided TSC frequency: 2190000 KHz
~~~
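
The two failure signatures in this report (the container open failing versus the per-group opens failing) can be told apart mechanically from the ovs-vswitchd log. The following is a rough sketch that matches only the exact message texts quoted above; the function name `classify_vfio_failure` and the returned labels are made up for illustration.

```python
import re

# Message texts copied from the EAL log lines quoted in this report.
CONTAINER_DENIED = re.compile(r"cannot open VFIO container, error 13")
GROUP_DENIED = re.compile(r"Cannot open /dev/vfio/(\d+): Permission denied")


def classify_vfio_failure(log_text):
    """Return (kind, groups) where kind is 'container', 'group' or 'none'.

    'container': the VFIO container itself could not be opened.
    'group':     the container opened, but one or more group nodes did not;
                 `groups` lists the affected group numbers.
    'none':      neither signature was found.
    """
    if CONTAINER_DENIED.search(log_text):
        return ("container", [])
    groups = [int(number) for number in GROUP_DENIED.findall(log_text)]
    if groups:
        return ("group", groups)
    return ("none", [])


if __name__ == "__main__":
    excerpt = (
        "2024-06-06T05:36:03.491Z|00023|dpdk|ERR|EAL: Cannot open /dev/vfio/34: Permission denied\n"
        "2024-06-06T05:36:03.491Z|00026|dpdk|ERR|EAL: Cannot open /dev/vfio/35: Permission denied\n"
    )
    print(classify_vfio_failure(excerpt))
```

Running this over the logs from each boot would give a count of how often each signature appears, which may help narrow down which device node is not ready when ovs-vswitchd starts.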

