Bug 2023595 - RHOSP17.0 dpdk Command: ethtool -I error Stderr: 'Cannot get driver information: No such device\\n'",
Summary: RHOSP17.0 dpdk Command: ethtool -I error Stderr: 'Cannot get driver informati...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: os-net-config
Version: 16.2 (Train)
Hardware: Unspecified
OS: Linux
Priority: high
Severity: high
Target Milestone: ga
Target Release: 17.0
Assignee: Karthik Sundaravel
QA Contact: nlevinki
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-11-16 06:04 UTC by XinhuaLi
Modified: 2022-09-21 12:17 UTC (History)

Fixed In Version: os-net-config-14.2.1-0.20220427221831.19307e0.el9ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-09-21 12:17:17 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 819999 0 None NEW Notify the need for overriding the default driver for Mellanox NIC 2021-12-02 06:35:59 UTC
OpenStack gerrit 827401 0 None NEW Notify the need for overriding the default driver for Mellanox NIC 2022-02-02 10:56:24 UTC
Red Hat Issue Tracker NFV-2342 0 None None None 2021-12-01 12:00:46 UTC
Red Hat Issue Tracker OSP-10817 0 None None None 2021-11-16 06:11:07 UTC
Red Hat Product Errata RHEA-2022:6543 0 None None None 2022-09-21 12:17:41 UTC

Description XinhuaLi 2021-11-16 06:04:14 UTC
Description of problem:
An error occurs when deploying the overcloud with DPDK on a Mellanox NIC.
The error messages are shown below.
--------------------------------------------
"Traceback (most recent call last):",
        "  File \"/usr/lib/python3.6/site-packages/os_net_config/utils.py\", line 327, in bind_dpdk_interfaces",
        "    attempts=10)",
        "  File \"/usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py\", line 431, in execute",
        "    cmd=sanitized_cmd)",
        "oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.",
        "Command: ethtool -i ens5f1",
        "Exit code: 71",
        "Stdout: ''",
        "Stderr: 'Cannot get driver information: No such device\\n'",
        "",
        "During handling of the above exception, another exception occurred:",
        "",
        "Traceback (most recent call last):",
        "  File \"/bin/os-net-config\", line 10, in <module>",
        "    sys.exit(main())",
        "  File \"/usr/lib/python3.6/site-packages/os_net_config/cli.py\", line 343, in main",
        "    provider.add_object(obj)",
        "  File \"/usr/lib/python3.6/site-packages/os_net_config/__init__.py\", line 70, in add_object",
        "    self.add_object(member)",
        "  File \"/usr/lib/python3.6/site-packages/os_net_config/__init__.py\", line 104, in add_object",
        "    self.add_ovs_dpdk_bond(obj)",
        "  File \"/usr/lib/python3.6/site-packages/os_net_config/impl_ifcfg.py\", line 973, in add_ovs_dpdk_bond",
        "    utils.bind_dpdk_interfaces(ifname, dpdk_port.driver, self.noop)",
        "  File \"/usr/lib/python3.6/site-packages/os_net_config/utils.py\", line 331, in bind_dpdk_interfaces",
        "    raise OvsDpdkBindException(msg)",
        "os_net_config.utils.OvsDpdkBindException: Failed to bind interface ens5f1 with dpdk",
        "+ RETVAL=1",
        "+ set -e",
        "+ [[ 1 == 2 ]]",
        "+ [[ 1 != 0 ]]",
        "+ echo 'ERROR: os-net-config configuration failed.'",
        "ERROR: os-net-config configuration failed.",
        "+ exit 1",
        "+ configure_safe_defaults",
        "+ [[ 1 == 0 ]]",
        "+ cat",
--------------------------------------------------

Version-Release number of selected component (if applicable):
- os-net-config-11.5.0-2.20210528113720.48c6710.el8ost.2.noarch
- RHOSP 16.2

How reproducible:
Deploy DPDK on a Mellanox NIC.

Steps to Reproduce:
1.
2.
3.

Actual results:
The error messages shown above are raised.

Expected results:

No error 

Additional info:

def bind_dpdk_interfaces(ifname, driver, noop):
    iface_driver = get_interface_driver(ifname)
    if iface_driver == driver:
        logger.info("Driver (%s) is already bound to the device (%s)" %
                    (driver, ifname))
        return
    pci_address = get_pci_address(ifname, noop)
    if not noop:
        if pci_address:
            # modbprobe of the driver has to be done before binding.
            # for reboots, puppet will add the modprobe to /etc/rc.modules
            if 'vfio-pci' in driver:
                try:
                    processutils.execute('modprobe', 'vfio-pci')
                except processutils.ProcessExecutionError:
                    msg = "Failed to modprobe vfio-pci module"
                    raise OvsDpdkBindException(msg)

            mac_address = interface_mac(ifname)
            vendor_id = get_vendor_id(ifname)
            try:
                out, err = processutils.execute('driverctl', 'set-override',
                                                pci_address, driver)
                if err:
                    msg = "Failed to bind dpdk interface err - %s" % err
                    raise OvsDpdkBindException(msg)
                else:
                    _update_dpdk_map(ifname, pci_address, mac_address, driver)
                    # Not like other nics, beacause mellanox nics keep the
                    # interface after binding it to dpdk, so we are adding
                    # ethtool command with 10 attempts after binding the driver
                    # just to make sure that the interface is initialized
                    # successfully in order not to fail in each of this cases:
                    # - get_dpdk_devargs() in case of OvsDpdkPort and
                    #   OvsDpdkBond.
                    # - bind_dpdk_interface() in case of OvsDpdkBond.
                    if vendor_id == "0x15b3":
                        # Reporter's note: this does not seem to work as
                        # expected. After the Mellanox NIC is bound, the
                        # interface is no longer shown, so ethtool cannot
                        # see it and an exception is raised.
                        processutils.execute('ethtool', '-i', ifname,
                                             attempts=10)

            except processutils.ProcessExecutionError:
                msg = "Failed to bind interface %s with dpdk" % ifname
                raise OvsDpdkBindException(msg)

Comment 1 Steve Baker 2021-11-16 20:34:57 UTC
Assigning to Networking for dsneddon triage

Comment 5 Karthik Sundaravel 2021-11-30 15:25:36 UTC
Hi,

Please use driver: mlx5_core for the DPDK ports in the NIC config templates. For Mellanox cards, the default driver does not need to be overridden with vfio-pci.

With that setting, bind_dpdk_interfaces returns from the first if statement:
def bind_dpdk_interfaces(ifname, driver, noop):
    iface_driver = get_interface_driver(ifname)
    if iface_driver == driver:
        logger.info("Driver (%s) is already bound to the device (%s)" %
                    (driver, ifname))
        return

Sample templates: https://github.com/openstack/os-net-config/blob/master/etc/os-net-config/samples/ovs_dpdk.yaml#L16
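
To confirm which kernel driver an interface is bound to before editing the templates, something like the following could be used. This is a minimal sketch, not the os-net-config implementation: current_driver is a hypothetical helper, and it assumes the usual sysfs layout in which /sys/class/net/<ifname>/device/driver is a symlink to the driver directory.

```python
import os

SYS_CLASS_NET = "/sys/class/net"  # standard Linux sysfs location


def current_driver(ifname, sysfs_root=SYS_CLASS_NET):
    """Return the kernel driver name bound to ifname, or None.

    Hypothetical helper for illustration; not part of os-net-config.
    """
    link = os.path.join(sysfs_root, ifname, "device", "driver")
    try:
        # The symlink target ends with the driver name, e.g. .../mlx5_core
        target = os.readlink(link)
    except OSError:
        # The interface is absent or has no driver symlink.
        return None
    return os.path.basename(target)
```

If this reports mlx5_core for the Mellanox port, setting driver: mlx5_core in the template should make the comparison in bind_dpdk_interfaces succeed and skip the rebind.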

Comment 6 XinhuaLi 2021-12-01 00:25:03 UTC
Hi Karthik,

Good day.
Thanks very much for your help.
In that case, should we add advice to the documentation, such as "For Mellanox cards, use mlx5_core instead of vfio-pci", or add a detection mechanism to the code?

Regards
Sam

Comment 7 Karthik Sundaravel 2021-12-01 12:27:52 UTC
Hi 

We have a note added in our documentation [1]. We'll also make a patch that states the error clearly.

[1] https://access.redhat.com/documentation/fr-fr/red_hat_openstack_platform/16.2/html/network_functions_virtualization_planning_and_configuration_guide/ch-hardware-requirements#ref_supported-nics
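
A clearer error of the kind mentioned above might look like the sketch below. It is an assumption about the approach, not the merged patch: check_mellanox_driver and DriverMismatchError are hypothetical names, and only the vendor ID 0x15b3 and the mlx5_core driver name come from this report.

```python
MELLANOX_VENDOR_ID = "0x15b3"  # vendor ID checked in bind_dpdk_interfaces
MELLANOX_DRIVER = "mlx5_core"  # driver recommended in comment 5


class DriverMismatchError(Exception):
    """Hypothetical exception used only in this sketch."""


def check_mellanox_driver(ifname, vendor_id, driver):
    """Fail early with an explicit message for Mellanox NICs.

    Raises DriverMismatchError when a Mellanox device is configured
    with any driver other than mlx5_core; otherwise returns None.
    """
    if vendor_id == MELLANOX_VENDOR_ID and driver != MELLANOX_DRIVER:
        raise DriverMismatchError(
            "Interface %s is a Mellanox NIC (vendor %s); set "
            "'driver: %s' in the NIC config instead of '%s'"
            % (ifname, vendor_id, MELLANOX_DRIVER, driver))
```

Calling such a check before driverctl set-override would surface the misconfiguration directly instead of the later "No such device" failure from ethtool.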

Comment 18 errata-xmlrpc 2022-09-21 12:17:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.0 (Wallaby)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2022:6543

