Bug 1847651

Summary: Failing to configure nic partitioning over mellanox network cards
Product: Red Hat OpenStack
Reporter: Miguel Angel Nieto <mnietoji>
Component: os-net-config
Assignee: RHOS Maint <rhos-maint>
Status: CLOSED NOTABUG
QA Contact: nlevinki <nlevinki>
Severity: unspecified
Priority: unspecified
Version: 13.0 (Queens)
CC: bfournie, cfields, hakhande, hbrock, jslagle, mburns, supadhya
Hardware: Unspecified
OS: Unspecified
Last Closed: 2020-06-17 15:52:46 UTC
Type: Bug

Description Miguel Angel Nieto 2020-06-16 18:38:06 UTC
Description of problem:
os-net-config fails to configure Open vSwitch with VF ports belonging to a Mellanox network card.

Template used:

              - type: sriov_pf
                name: nic11
                mtu: 9000
                numvfs: 14
                use_dhcp: false
                defroute: false
                nm_controlled: true
                hotplug: true
                promisc: false

              - type: sriov_pf
                name: nic12
                mtu: 9000
                numvfs: 14
                use_dhcp: false
                defroute: false
                nm_controlled: true
                hotplug: true
                promisc: false

              - type: linux_bond
                name: storage_bond
                bonding_options: mode=active-backup
                use_dhcp: false
                members:
                - type: sriov_vf
                  device: nic11
                  vfid: 2
                - type: sriov_vf
                  device: nic12
                  vfid: 2

              - type: ovs_user_bridge
                name: br-link0
                use_dhcp: false
                ovs_extra:
                  - str_replace:
                      template: set port br-link0 tag=_VLAN_TAG_
                      params:
                        _VLAN_TAG_:
                          get_param: TenantNetworkVlanID
                addresses:
                  - ip_netmask:
                      get_param: TenantIpSubnet
                members:
                  - type: ovs_dpdk_bond
                    name: dpdkbond0
                    mtu: 9000
                    rx_queue: 2
                    members:
                      - type: ovs_dpdk_port
                        name: dpdk0
                        members:
                          - type: sriov_vf
                            device: nic12
                            vfid: 3
                      - type: ovs_dpdk_port
                        name: dpdk1
                        members:
                          - type: sriov_vf
                            device: nic11
                            vfid: 3
nic11 and nic12 map to:
      nic11: p6p1
      nic12: p6p2
On the compute node I can see the VFs:
[root@computeovsdpdksriov-1 log]# lspci | grep Mellanox
04:00.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
04:00.1 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
04:00.2 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5 Virtual Function]
04:00.3 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5 Virtual Function]
04:00.4 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5 Virtual Function]
04:00.5 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5 Virtual Function]
04:00.6 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5 Virtual Function]
04:00.7 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5 Virtual Function]
04:01.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5 Virtual Function]
04:01.1 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5 Virtual Function]
04:01.2 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5 Virtual Function]
04:01.3 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5 Virtual Function]
04:01.4 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5 Virtual Function]
04:01.5 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5 Virtual Function]
04:01.6 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5 Virtual Function]
04:01.7 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5 Virtual Function]
04:02.2 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5 Virtual Function]
04:02.3 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5 Virtual Function]
04:02.4 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5 Virtual Function]
04:02.5 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5 Virtual Function]
04:02.6 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5 Virtual Function]
04:02.7 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5 Virtual Function]
04:03.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5 Virtual Function]
04:03.1 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5 Virtual Function]
04:03.2 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5 Virtual Function]
04:03.3 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5 Virtual Function]
04:03.4 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5 Virtual Function]
04:03.5 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5 Virtual Function]
04:03.6 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5 Virtual Function]
04:03.7 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5 Virtual Function]

[root@computeovsdpdksriov-1 log]# find /sys | grep p6p1 | awk -F '/' '{print $6,$8}' | sort | uniq
 
0000:04:00.0 p6p1
0000:04:00.2 p6p1_0
0000:04:00.3 p6p1_1
0000:04:00.4 p6p1_2
0000:04:00.6 p6p1_4
0000:04:00.7 p6p1_5
0000:04:01.0 p6p1_6
0000:04:01.1 p6p1_7
0000:04:01.2 p6p1_8
0000:04:01.3 p6p1_9
0000:04:01.4 p6p1_10
0000:04:01.5 p6p1_11
0000:04:01.6 p6p1_12
0000:04:01.7 p6p1_13
internal_bond 
storage_bond 
[root@computeovsdpdksriov-1 log]# find /sys | grep p6p2 | awk -F '/' '{print $6,$8}' | sort | uniq
 
0000:04:00.1 p6p2
0000:04:02.2 p6p2_0
0000:04:02.3 p6p2_1
0000:04:02.4 p6p2_2
0000:04:02.6 p6p2_4
0000:04:02.7 p6p2_5
0000:04:03.0 p6p2_6
0000:04:03.1 p6p2_7
0000:04:03.2 p6p2_8
0000:04:03.3 p6p2_9
0000:04:03.4 p6p2_10
0000:04:03.5 p6p2_11
0000:04:03.6 p6p2_12
0000:04:03.7 p6p2_13
internal_bond 
storage_bond 
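
The PF-to-VF mapping listed above can also be read straight from sysfs, where each VF appears as a virtfnN symlink under the PF's PCI device directory. The following is a minimal sketch, not part of the original report: the function name vf_pci_addr and the SYSFS_ROOT override are hypothetical and exist only so the lookup can be exercised outside a real host.

```shell
# Print the PCI address of VF number $2 on PF interface $1.
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.
vf_pci_addr() {
    pf="$1"
    vfid="$2"
    link="${SYSFS_ROOT:-/sys}/class/net/${pf}/device/virtfn${vfid}"
    # Each VF is exposed as a symlink such as virtfn3 -> ../0000:04:00.5
    if [ ! -L "$link" ]; then
        echo "no such VF: ${pf} vf ${vfid}" >&2
        return 1
    fi
    basename "$(readlink "$link")"
}

# Example call on a compute node (not executed here):
#   vf_pci_addr p6p1 3
```

Going by the listing above, VF 3 of p6p1 would be expected to resolve to 0000:04:00.5, which is the address os-net-config hands to OVS for dpdk1.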

os-net-config logs these warnings:
[2020/06/16 02:30:19 PM] [INFO] Active nics are ['em1', 'em2', 'p4p1', 'p4p2', 'p6p1', 'p6p1_3', 'p6p2', 'p6p2_3', 'p7p3', 'p7p4']
[2020/06/16 02:30:19 PM] [WARNING] no mapping for interface p6p1_3 because nic6 is mapped to p4p4
[2020/06/16 02:30:19 PM] [WARNING] no mapping for interface p6p2_3 because nic8 is mapped to p7p2

OVS
[root@computeovsdpdksriov-1 log]# ovs-vsctl show
22f91518-3484-4708-9844-0d72d34c28a9
    Bridge "br-link0"
        fail_mode: standalone
        Port "dpdkbond0"
            Interface "dpdk1"
                type: dpdk
                options: {dpdk-devargs="0000:04:00.5", n_rxq="2"}
                error: "Error attaching device '0000:04:00.5' to DPDK"
            Interface "dpdk0"
                type: dpdk
                options: {dpdk-devargs="0000:04:02.5", n_rxq="2"}
                error: "Error attaching device '0000:04:02.5' to DPDK"
        Port "br-link0"
            tag: 121
            Interface "br-link0"
                type: internal
    ovs_version: "2.11.0"

OVS is being configured with PCI addresses 0000:04:00.5 and 0000:04:02.5, which do not exist, so it fails to attach them to DPDK.
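
To pull only the failing interfaces out of the ovs-vsctl show output, a small filter along these lines may help. This is an illustrative sketch, not something from the original report: the function name dpdk_attach_errors is hypothetical, and it assumes the "Interface" / "error:" layout shown in the output above.

```shell
# Read `ovs-vsctl show` output on stdin and print "<interface>: <error>"
# for every interface that reports an error line.
dpdk_attach_errors() {
    awk '
        # Remember the most recent interface name, without its quotes.
        $1 == "Interface" { iface = $2; gsub(/"/, "", iface) }
        # An error line belongs to the interface seen just before it.
        $1 == "error:"    { sub(/^ *error: /, ""); print iface ": " $0 }
    '
}

# Example call on a compute node (not executed here):
#   ovs-vsctl show | dpdk_attach_errors
```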




Version-Release number of selected component (if applicable):
2020-06-09.2(undercloud)

How reproducible:
Configure a bond in OVS using Mellanox VFs.


Actual results:
Deployment fails


Expected results:
Deployment should work


Additional info:

Comment 1 Miguel Angel Nieto 2020-06-16 18:48:25 UTC
Sorry, the addresses do exist, but OVS fails to attach to them:

04:00.5 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5 Virtual Function]
04:02.5 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5 Virtual Function]

[root@computeovsdpdksriov-1 etc]# ovs-vsctl -t 10 -- --if-exists del-port br-link0 dpdkbond0 -- add-bond br-link0 dpdkbond0 dpdk0 dpdk1 -- set interface dpdk0 type=dpdk -- set interface dpdk1 type=dpdk -- set Interface dpdk0 options:dpdk-devargs=0000:04:02.5 -- set Interface dpdk1 options:dpdk-devargs=0000:04:00.5 -- set Interface dpdk0 mtu_request=9000 -- set Interface dpdk1 mtu_request=9000 -- set Interface dpdk0 options:n_rxq=2 -- set Interface dpdk1 options:n_rxq=2
ovs-vsctl: Error detected while setting up 'dpdk0': Error attaching device '0000:04:02.5' to DPDK.  See ovs-vswitchd log for details.
ovs-vsctl: Error detected while setting up 'dpdk1': Error attaching device '0000:04:00.5' to DPDK.  See ovs-vswitchd log for details.
ovs-vsctl: The default log directory is "/var/log/openvswitch".

2020-06-16T18:46:13.798Z|00563|dpdk|INFO|EAL: PCI device 0000:04:02.5 on NUMA socket 0
2020-06-16T18:46:13.798Z|00564|dpdk|INFO|EAL:   probe driver: 15b3:1018 net_mlx5
2020-06-16T18:46:13.800Z|00565|dpdk|WARN|net_mlx5: no Verbs device matches PCI device 0000:04:02.5, are kernel drivers loaded?
2020-06-16T18:46:13.800Z|00566|dpdk|ERR|EAL: Driver cannot attach the device (0000:04:02.5)
2020-06-16T18:46:13.800Z|00567|dpdk|ERR|EAL: Failed to attach device on primary process
2020-06-16T18:46:13.800Z|00568|netdev_dpdk|WARN|Error attaching device '0000:04:02.5' to DPDK
2020-06-16T18:46:13.800Z|00569|netdev|WARN|dpdk0: could not set configuration (Invalid argument)
2020-06-16T18:46:13.800Z|00570|dpdk|ERR|Invalid port_id=128
2020-06-16T18:46:43.660Z|00571|dpdk|INFO|EAL: PCI device 0000:04:00.5 on NUMA socket 0
2020-06-16T18:46:43.660Z|00572|dpdk|INFO|EAL:   probe driver: 15b3:1018 net_mlx5
2020-06-16T18:46:43.662Z|00573|dpdk|WARN|net_mlx5: no Verbs device matches PCI device 0000:04:00.5, are kernel drivers loaded?
2020-06-16T18:46:43.662Z|00574|dpdk|ERR|EAL: Driver cannot attach the device (0000:04:00.5)
2020-06-16T18:46:43.662Z|00575|dpdk|ERR|EAL: Failed to attach device on primary process
2020-06-16T18:46:43.662Z|00576|netdev_dpdk|WARN|Error attaching device '0000:04:00.5' to DPDK
2020-06-16T18:46:43.662Z|00577|netdev|WARN|dpdk1: could not set configuration (Invalid argument)
2020-06-16T18:46:43.662Z|00578|dpdk|ERR|Invalid port_id=128
2020-06-16T18:46:43.707Z|00579|dpdk|INFO|EAL: PCI device 0000:04:02.5 on NUMA socket 0
2020-06-16T18:46:43.707Z|00580|dpdk|INFO|EAL:   probe driver: 15b3:1018 net_mlx5
2020-06-16T18:46:43.709Z|00581|dpdk|WARN|net_mlx5: no Verbs device matches PCI device 0000:04:02.5, are kernel drivers loaded?
2020-06-16T18:46:43.709Z|00582|dpdk|ERR|EAL: Driver cannot attach the device (0000:04:02.5)
2020-06-16T18:46:43.709Z|00583|dpdk|ERR|EAL: Failed to attach device on primary process
2020-06-16T18:46:43.709Z|00584|netdev_dpdk|WARN|Error attaching device '0000:04:02.5' to DPDK
2020-06-16T18:46:43.709Z|00585|netdev|WARN|dpdk0: could not set configuration (Invalid argument)
2020-06-16T18:46:43.709Z|00586|dpdk|ERR|Invalid port_id=128
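
The net_mlx5 warning ("no Verbs device matches PCI device ..., are kernel drivers loaded?") suggests the VF is not bound to the Mellanox kernel driver, which the DPDK mlx5 PMD relies on. One way to check which kernel driver currently owns a PCI device is to read its driver symlink in sysfs. The sketch below is illustrative only: the function name pci_driver and the SYSFS_ROOT override are hypothetical and were not part of the original report.

```shell
# Print the kernel driver bound to PCI address $1, or "none" if unbound.
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.
pci_driver() {
    link="${SYSFS_ROOT:-/sys}/bus/pci/devices/$1/driver"
    if [ -L "$link" ]; then
        # The symlink target ends in the driver name, e.g. .../drivers/mlx5_core
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
}

# Example call on a compute node (not executed here):
#   pci_driver 0000:04:02.5
```

If this prints anything other than mlx5_core for the VFs handed to OVS, the attach failure above is the likely outcome. Running lspci -k -s 04:02.5 reports the same information under "Kernel driver in use".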

Comment 2 Miguel Angel Nieto 2020-06-17 15:52:46 UTC
Closing because the configuration was wrong: the driver setting was missing from the ovs_dpdk_port entries. It should be something like this:

              - type: ovs_user_bridge
                name: br-link0
                use_dhcp: false
                ovs_extra:
                  - str_replace:
                      template: set port br-link0 tag=_VLAN_TAG_
                      params:
                        _VLAN_TAG_:
                          get_param: TenantNetworkVlanID
                addresses:
                  - ip_netmask:
                      get_param: TenantIpSubnet
                members:
                  - type: ovs_dpdk_bond
                    name: dpdkbond0
                    mtu: 9000
                    rx_queue: 2
                    members:
                      - type: ovs_dpdk_port
                        driver: mlx5_core
                        name: dpdk0
                        members:
                          - type: sriov_vf
                            device: nic12
                            vfid: 3
                      - type: ovs_dpdk_port
                        driver: mlx5_core
                        name: dpdk1
                        members:
                          - type: sriov_vf
                            device: nic11
                            vfid: 3

Comment 3 Chris Fields 2020-06-17 19:09:26 UTC
Well documented, Miguel. I wrote this KCS article based on your findings to get these symptoms and the solution into customers' hands:

https://access.redhat.com/solutions/5165331 


CFields