Bug 2055731 - dpdk drivers not loaded when booting a compute
Summary: dpdk drivers not loaded when booting a compute
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: os-net-config
Version: 16.2 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: z2
Target Release: 16.2 (Train on RHEL 8.4)
Assignee: Karthik Sundaravel
QA Contact: Miguel Angel Nieto
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-02-17 15:26 UTC by Miguel Angel Nieto
Modified: 2022-03-23 22:13 UTC (History)

Fixed In Version: os-net-config-11.5.1-2.20211207004924.el8ost
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-03-23 22:13:09 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker NFV-2421 0 None None None 2022-02-23 15:18:43 UTC
Red Hat Issue Tracker OSP-12754 0 None None None 2022-02-17 15:31:13 UTC
Red Hat Product Errata RHBA-2022:1001 0 None None None 2022-03-23 22:13:15 UTC

Description Miguel Angel Nieto 2022-02-17 15:26:49 UTC
Description of problem:
After rebooting a compute node, the DPDK interfaces are broken because the driver is no longer bound to those interfaces.

Before reboot
        Port dpdkbond0
            Interface dpdk1
                type: dpdk
                options: {dpdk-devargs="0000:06:00.1", n_rxq="1"}
            Interface dpdk0
                type: dpdk
                options: {dpdk-devargs="0000:06:00.0", n_rxq="1"}
        Port br-link0
--
    Bridge br-dpdk0
        fail_mode: standalone
--
        Port br-dpdk0
            Interface br-dpdk0
                type: internal
        Port dpdk2
            Interface dpdk2
                type: dpdk
                options: {dpdk-devargs="0000:82:00.2"}
    Bridge br-int
--
    Bridge br-dpdk1
        fail_mode: standalone
--
        Port dpdk3
            Interface dpdk3
                type: dpdk
                options: {dpdk-devargs="0000:82:00.3"}
        Port br-dpdk1
            Interface br-dpdk1
                type: internal

[root@computedpdksriov-r730-0 heat-admin]# driverctl list-overrides
0000:06:00.0 vfio-pci
0000:06:00.1 vfio-pci
0000:82:00.2 vfio-pci
0000:82:00.3 vfio-pci

After reboot
[root@computedpdksriov-r730-0 heat-admin]# ovs-vsctl show | grep -A1 dpdk
    Bridge br-dpdk0
        fail_mode: standalone
--
        Port dpdk2
            Interface dpdk2
                type: dpdk
                options: {dpdk-devargs="0000:82:00.2"}
                error: "Error attaching device '0000:82:00.2' to DPDK"
        Port br-dpdk0
            Interface br-dpdk0
                type: internal
--
    Bridge br-dpdk1
        fail_mode: standalone
--
        Port dpdk3
            Interface dpdk3
                type: dpdk
                options: {dpdk-devargs="0000:82:00.3"}
                error: "Error attaching device '0000:82:00.3' to DPDK"
        Port br-dpdk1
            Interface br-dpdk1
                type: internal
--
        Port dpdkbond0
            Interface dpdk1
                type: dpdk
                options: {dpdk-devargs="0000:06:00.1", n_rxq="1"}
                error: "Error attaching device '0000:06:00.1' to DPDK"
            Interface dpdk0
                type: dpdk
                options: {dpdk-devargs="0000:06:00.0", n_rxq="1"}
                error: "Error attaching device '0000:06:00.0' to DPDK"

[root@computedpdksriov-r730-0 heat-admin]# driverctl list-overrides
driverctl: No overridable devices found. Kernel too old?

If I run os-net-config manually, the drivers are bound again:
[root@computedpdksriov-r730-0 heat-admin]# os-net-config -c /etc/os-net-config/config.json -v --detailed-exit-codes

[root@computedpdksriov-r730-0 heat-admin]# ovs-vsctl show | grep -A1 dpdk
    Bridge br-dpdk0
        fail_mode: standalone
--
        Port dpdk2
            Interface dpdk2
                type: dpdk
                options: {dpdk-devargs="0000:82:00.2"}
        Port br-dpdk0
            Interface br-dpdk0
                type: internal
--
    Bridge br-dpdk1
        fail_mode: standalone
--
        Port dpdk3
            Interface dpdk3
                type: dpdk
                options: {dpdk-devargs="0000:82:00.3"}
        Port br-dpdk1
            Interface br-dpdk1
                type: internal
--
        Port dpdkbond0
            Interface dpdk1
                type: dpdk
                options: {dpdk-devargs="0000:06:00.1", n_rxq="1"}
            Interface dpdk0
                type: dpdk
                options: {dpdk-devargs="0000:06:00.0", n_rxq="1"}
    ovs_version: "2.15.4"

[root@computedpdksriov-r730-0 heat-admin]# driverctl list-overrides
0000:06:00.0 vfio-pci
0000:06:00.1 vfio-pci
0000:82:00.2 vfio-pci
0000:82:00.3 vfio-pci
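
The before/after comparison of `driverctl list-overrides` above can be scripted. The following is only a sketch: `missing_overrides` is a hypothetical helper name, and the PCI addresses in the usage line are the ones from this compute. It reads the `driverctl list-overrides` output on stdin and prints every expected address that has no override for the given driver.

```shell
#!/bin/sh
# missing_overrides DRIVER PCI_ADDR...
# Hypothetical helper (not part of driverctl or os-net-config).
# Reads `driverctl list-overrides` output on stdin and prints each PCI_ADDR
# that is not listed as overridden to DRIVER.
missing_overrides() {
    driver=$1
    shift
    overrides=$(cat)
    for addr in "$@"; do
        # -x: match the whole line, -F: treat the address as a fixed string
        printf '%s\n' "$overrides" | grep -qxF "$addr $driver" \
            || printf '%s\n' "$addr"
    done
}

# Usage on the compute node:
#   driverctl list-overrides | missing_overrides vfio-pci \
#       0000:06:00.0 0000:06:00.1 0000:82:00.2 0000:82:00.3
```

An empty result means every listed device still has its override. After the reboot shown above, all four addresses would be printed.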






Version-Release number of selected component (if applicable):
RHOS-16.2-RHEL-8-20220210.n.1(undercloud) 

How reproducible:
1. Deploy OSP 16.2. The templates must configure DPDK on one or more NICs.
2. Reboot one compute node.
3. Check whether DPDK is properly configured.
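
For step 3, one way to check is to look for the attach errors that `ovs-vsctl show` prints in the outputs above. This is a minimal sketch, assuming the error text has exactly the form shown in this report; `failed_dpdk_devices` is a hypothetical helper name.

```shell
#!/bin/sh
# failed_dpdk_devices
# Hypothetical helper: reads `ovs-vsctl show` output on stdin and prints the
# PCI address of every interface that reports
#   error: "Error attaching device '<addr>' to DPDK"
failed_dpdk_devices() {
    sed -n "s/.*error: \"Error attaching device '\([^']*\)' to DPDK\".*/\1/p"
}

# Usage on the compute node:
#   ovs-vsctl show | failed_dpdk_devices
```

If nothing is printed, no DPDK interface reported an attach error.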



Actual results:
os-net-config is not executed after the reboot, and the DPDK NICs are not properly configured.


Expected results:
os-net-config should be executed, and the DPDK NICs should be properly configured.


Additional info:
I will upload the sos report and the templates used.

Comment 8 Karthik Sundaravel 2022-02-24 05:20:43 UTC
After the reboot, I ran the commands below.

[root@computedpdksriov-r740-0 heat-admin]# lsmod | grep vfio
vfio_iommu_type1       36864  0
vfio                   36864  2 vfio_iommu_type1

[root@computedpdksriov-r740-0 heat-admin]# modprobe vfio-pci

[root@computedpdksriov-r740-0 heat-admin]# lsmod | grep vfio
vfio_pci               61440  0   
vfio_virqfd            16384  1 vfio_pci
irqbypass              16384  2 vfio_pci,kvm                               
vfio_iommu_type1       36864  0
vfio                   36864  3 vfio_iommu_type1,vfio_pci

[root@computedpdksriov-r740-0 heat-admin]# driverctl list-overrides
driverctl: No overridable devices found. Kernel too old?


[root@computedpdksriov-r740-0 heat-admin]# driverctl set-override 0000:af:00.2 vfio-pci
[root@computedpdksriov-r740-0 heat-admin]# sudo ovs-vsctl show             
c9cf7aa8-60ef-46ef-9afc-0c322c12ebf3
    Manager "ptcp:6640:127.0.0.1" 
        is_connected: true  
    Bridge br-link0                                                        
        fail_mode: standalone
        datapath_type: netdev
        Port dpdkbond0       
            Interface dpdk0
                type: dpdk    
                options: {dpdk-devargs="0000:af:00.2", n_rxq="1"}
            Interface dpdk1
                type: dpdk 
                options: {dpdk-devargs="0000:af:00.3", n_rxq="1"}
                error: "Error attaching device '0000:af:00.3' to DPDK"
        Port br-link0                                                 
            tag: 171     
            Interface br-link0            
                type: internal
    Bridge br-dpdk0
        fail_mode: standalone
        datapath_type: netdev
        Port dpdk2
            Interface dpdk2
                type: dpdk
                options: {dpdk-devargs="0000:3b:00.0"}
                error: "Error attaching device '0000:3b:00.0' to DPDK"
        Port br-dpdk0
            Interface br-dpdk0
                type: internal

**** Conclusion *****
So, when re-running os-net-config, the commands below help fix the issue:
modprobe vfio-pci
driverctl set-override 

We still need to understand which change made these steps necessary.
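
The two manual steps from this comment can be wrapped in a small script. This is only a sketch of the manual workaround, not the fix shipped in the errata. `run` and `rebind_vfio` are hypothetical helper names, and `DRY_RUN` is an assumed variable that prints the commands instead of executing them.

```shell
#!/bin/sh
# run CMD...
# Executes CMD, or only prints it when DRY_RUN=1 (assumed convention).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# rebind_vfio PCI_ADDR...
# Loads the vfio-pci module, then sets a driverctl override to vfio-pci
# for each PCI address given. Stops at the first failing command.
rebind_vfio() {
    run modprobe vfio-pci || return 1
    for addr in "$@"; do
        run driverctl set-override "$addr" vfio-pci || return 1
    done
}

# Usage as root on the compute node, previewing the commands first:
#   DRY_RUN=1; rebind_vfio 0000:af:00.2 0000:af:00.3
```

In the output above only 0000:af:00.2 was overridden, and only dpdk0 attached without error, so each DPDK device presumably needs its own override.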

Comment 23 errata-xmlrpc 2022-03-23 22:13:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 16.2.2), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:1001

