Bug 2023595

Summary: RHOSP 17.0 dpdk Command: ethtool -i error Stderr: 'Cannot get driver information: No such device'

Product: Red Hat OpenStack          Reporter: XinhuaLi <xili>
Component: os-net-config            Assignee: Karthik Sundaravel <ksundara>
Status: CLOSED ERRATA               QA Contact: nlevinki <nlevinki>
Severity: high                      Docs Contact:
Priority: high
Version: 16.2 (Train)               CC: bfournie, eshulman, hakhande, hbrock, jschluet, jslagle, ksundara, mburns, sbaker
Target Milestone: ga                Keywords: Bugfix, Triaged
Target Release: 17.0
Hardware: Unspecified
OS: Linux
Whiteboard:
Fixed In Version: os-net-config-14.2.1-0.20220427221831.19307e0.el9ost
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-09-21 12:17:17 UTC
Type: Bug

Description XinhuaLi 2021-11-16 06:04:14 UTC
Description of problem:
We see an error when deploying the overcloud with a Mellanox NIC card using DPDK.
The error messages are shown below.
--------------------------------------------
"Traceback (most recent call last):",
        "  File \"/usr/lib/python3.6/site-packages/os_net_config/utils.py\", line 327, in bind_dpdk_interfaces",
        "    attempts=10)",
        "  File \"/usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py\", line 431, in execute",
        "    cmd=sanitized_cmd)",
        "oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.",
        "Command: ethtool -i ens5f1",
        "Exit code: 71",
        "Stdout: ''",
        "Stderr: 'Cannot get driver information: No such device\\n'",
        "",
        "During handling of the above exception, another exception occurred:",
        "",
        "Traceback (most recent call last):",
        "  File \"/bin/os-net-config\", line 10, in <module>",
        "    sys.exit(main())",
        "  File \"/usr/lib/python3.6/site-packages/os_net_config/cli.py\", line 343, in main",
        "    provider.add_object(obj)",
        "  File \"/usr/lib/python3.6/site-packages/os_net_config/__init__.py\", line 70, in add_object",
        "    self.add_object(member)",
        "  File \"/usr/lib/python3.6/site-packages/os_net_config/__init__.py\", line 104, in add_object",
        "    self.add_ovs_dpdk_bond(obj)",
        "  File \"/usr/lib/python3.6/site-packages/os_net_config/impl_ifcfg.py\", line 973, in add_ovs_dpdk_bond",
        "    utils.bind_dpdk_interfaces(ifname, dpdk_port.driver, self.noop)",
        "  File \"/usr/lib/python3.6/site-packages/os_net_config/utils.py\", line 331, in bind_dpdk_interfaces",
        "    raise OvsDpdkBindException(msg)",
        "os_net_config.utils.OvsDpdkBindException: Failed to bind interface ens5f1 with dpdk",
        "+ RETVAL=1",
        "+ set -e",
        "+ [[ 1 == 2 ]]",
        "+ [[ 1 != 0 ]]",
        "+ echo 'ERROR: os-net-config configuration failed.'",
        "ERROR: os-net-config configuration failed.",
        "+ exit 1",
        "+ configure_safe_defaults",
        "+ [[ 1 == 0 ]]",
        "+ cat",
--------------------------------------------------

Version-Release number of selected component (if applicable):
- os-net-config-11.5.0-2.20210528113720.48c6710.el8ost.2.noarch
- RHOSP 16.2

How reproducible:
Deploy DPDK via a Mellanox NIC.

Steps to Reproduce:
1.
2.
3.

Actual results:
Error messages shown as above.

Expected results:

No error 

Additional info:

def bind_dpdk_interfaces(ifname, driver, noop):
    iface_driver = get_interface_driver(ifname)
    if iface_driver == driver:
        logger.info("Driver (%s) is already bound to the device (%s)" %
                    (driver, ifname))
        return
    pci_address = get_pci_address(ifname, noop)
    if not noop:
        if pci_address:
            # modprobe of the driver has to be done before binding.
            # For reboots, puppet will add the modprobe to /etc/rc.modules.
            if 'vfio-pci' in driver:
                try:
                    processutils.execute('modprobe', 'vfio-pci')
                except processutils.ProcessExecutionError:
                    msg = "Failed to modprobe vfio-pci module"
                    raise OvsDpdkBindException(msg)

            mac_address = interface_mac(ifname)
            vendor_id = get_vendor_id(ifname)
            try:
                out, err = processutils.execute('driverctl', 'set-override',
                                                pci_address, driver)
                if err:
                    msg = "Failed to bind dpdk interface err - %s" % err
                    raise OvsDpdkBindException(msg)
                else:
                    _update_dpdk_map(ifname, pci_address, mac_address, driver)
                    # Unlike other NICs, Mellanox NICs keep the
                    # interface after binding it to DPDK, so we run an
                    # ethtool command with 10 attempts after binding the
                    # driver just to make sure that the interface is
                    # initialized successfully, in order not to fail in
                    # each of these cases:
                    # - get_dpdk_devargs() in case of OvsDpdkPort and
                    #   OvsDpdkBond.
                    # - bind_dpdk_interface() in case of OvsDpdkBond.
                    if vendor_id == "0x15b3":
                        processutils.execute('ethtool', '-i', ifname,
                                             attempts=10)  <<=== It seems the actual behavior does not work as expected: when we bind the Mellanox NIC, the interface does not show up, so ethtool cannot see it and raises an exception.

            except processutils.ProcessExecutionError:
                msg = "Failed to bind interface %s with dpdk" % ifname
                raise OvsDpdkBindException(msg)
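
For reference, the check above keys off the PCI vendor ID (0x15b3 for Mellanox). A minimal standalone sketch of that check, reading the standard sysfs path directly rather than using os-net-config's own get_vendor_id() helper (the interface name is taken from the traceback above; this is only an illustration, not the actual helper):

import os


def read_vendor_id(ifname):
    """Return the PCI vendor ID of a NIC, e.g. '0x15b3' for Mellanox."""
    vendor_path = os.path.join('/sys/class/net', ifname, 'device', 'vendor')
    try:
        with open(vendor_path) as f:
            return f.read().strip()
    except OSError:
        return None


if __name__ == '__main__':
    ifname = 'ens5f1'  # interface name from the traceback above
    vendor = read_vendor_id(ifname)
    if vendor == '0x15b3':
        print('%s is a Mellanox NIC; keep the kernel driver (mlx5_core)' % ifname)
    else:
        print('%s has vendor %s; a vfio-pci override may apply' % (ifname, vendor))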

Comment 1 Steve Baker 2021-11-16 20:34:57 UTC
Assigning to Networking for dsneddon triage

Comment 5 Karthik Sundaravel 2021-11-30 15:25:36 UTC
Hi,

Please use driver: mlx5_core for the DPDK ports in the NIC config templates. For Mellanox cards, there is no need to override the default driver with vfio-pci.

The function would then return from the first if statement:
def bind_dpdk_interfaces(ifname, driver, noop):
    iface_driver = get_interface_driver(ifname)
    if iface_driver == driver:
        logger.info("Driver (%s) is already bound to the device (%s)" %
                    (driver, ifname))
        return

Sample templates: https://github.com/openstack/os-net-config/blob/master/etc/os-net-config/samples/ovs_dpdk.yaml#L16
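
For illustration, a DPDK port entry along the lines of the linked sample, with the driver set to mlx5_core (bridge, port and NIC names here are placeholders, not taken from this deployment):

network_config:
  - type: ovs_user_bridge
    name: br-dpdk
    members:
      - type: ovs_dpdk_port
        name: dpdk0
        driver: mlx5_core   # Mellanox: keep the kernel driver, do not override with vfio-pci
        members:
          - type: interface
            name: nic3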

Comment 6 XinhuaLi 2021-12-01 00:25:03 UTC
Hi Karthik,

Good day.
Thanks very much for your help.
In that case, should we add advice to the documentation, such as "For Mellanox cards, use mlx5_core rather than vfio-pci", or add a detection mechanism to the code?

Regards
Sam

Comment 7 Karthik Sundaravel 2021-12-01 12:27:52 UTC
Hi 

We have a note added in our documentation [1]. We'll also make a patch that states the error clearly.

[1] https://access.redhat.com/documentation/fr-fr/red_hat_openstack_platform/16.2/html/network_functions_virtualization_planning_and_configuration_guide/ch-hardware-requirements#ref_supported-nics

Comment 18 errata-xmlrpc 2022-09-21 12:17:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.0 (Wallaby)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2022:6543