Description of problem:

We see an error when deploying the overcloud with a Mellanox NIC card configured for DPDK. The error messages are below.
--------------------------------------------
"Traceback (most recent call last):",
"  File \"/usr/lib/python3.6/site-packages/os_net_config/utils.py\", line 327, in bind_dpdk_interfaces",
"    attempts=10)",
"  File \"/usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py\", line 431, in execute",
"    cmd=sanitized_cmd)",
"oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.",
"Command: ethtool -i ens5f1",
"Exit code: 71",
"Stdout: ''",
"Stderr: 'Cannot get driver information: No such device\\n'",
"",
"During handling of the above exception, another exception occurred:",
"",
"Traceback (most recent call last):",
"  File \"/bin/os-net-config\", line 10, in <module>",
"    sys.exit(main())",
"  File \"/usr/lib/python3.6/site-packages/os_net_config/cli.py\", line 343, in main",
"    provider.add_object(obj)",
"  File \"/usr/lib/python3.6/site-packages/os_net_config/__init__.py\", line 70, in add_object",
"    self.add_object(member)",
"  File \"/usr/lib/python3.6/site-packages/os_net_config/__init__.py\", line 104, in add_object",
"    self.add_ovs_dpdk_bond(obj)",
"  File \"/usr/lib/python3.6/site-packages/os_net_config/impl_ifcfg.py\", line 973, in add_ovs_dpdk_bond",
"    utils.bind_dpdk_interfaces(ifname, dpdk_port.driver, self.noop)",
"  File \"/usr/lib/python3.6/site-packages/os_net_config/utils.py\", line 331, in bind_dpdk_interfaces",
"    raise OvsDpdkBindException(msg)",
"os_net_config.utils.OvsDpdkBindException: Failed to bind interface ens5f1 with dpdk",
"+ RETVAL=1",
"+ set -e",
"+ [[ 1 == 2 ]]",
"+ [[ 1 != 0 ]]",
"+ echo 'ERROR: os-net-config configuration failed.'",
"ERROR: os-net-config configuration failed.",
"+ exit 1",
"+ configure_safe_defaults",
"+ [[ 1 == 0 ]]",
"+ cat",
--------------------------------------------------

Version-Release number of selected component (if applicable):
- os-net-config-11.5.0-2.20210528113720.48c6710.el8ost.2.noarch
- RHOSP 16.2

How reproducible:
Deploy DPDK via a Mellanox NIC

Steps to Reproduce:
1.
2.
3.

Actual results:
Error messages as shown above

Expected results:
No error

Additional info:

The failure comes from bind_dpdk_interfaces() in os_net_config/utils.py:

def bind_dpdk_interfaces(ifname, driver, noop):
    iface_driver = get_interface_driver(ifname)
    if iface_driver == driver:
        logger.info("Driver (%s) is already bound to the device (%s)" %
                    (driver, ifname))
        return
    pci_address = get_pci_address(ifname, noop)
    if not noop:
        if pci_address:
            # modprobe of the driver has to be done before binding.
            # for reboots, puppet will add the modprobe to /etc/rc.modules
            if 'vfio-pci' in driver:
                try:
                    processutils.execute('modprobe', 'vfio-pci')
                except processutils.ProcessExecutionError:
                    msg = "Failed to modprobe vfio-pci module"
                    raise OvsDpdkBindException(msg)

            mac_address = interface_mac(ifname)
            vendor_id = get_vendor_id(ifname)
            try:
                out, err = processutils.execute('driverctl', 'set-override',
                                                pci_address, driver)
                if err:
                    msg = "Failed to bind dpdk interface err - %s" % err
                    raise OvsDpdkBindException(msg)
                else:
                    _update_dpdk_map(ifname, pci_address, mac_address,
                                     driver)
                    # Not like other nics, because mellanox nics keep the
                    # interface after binding it to dpdk, so we are adding
                    # ethtool command with 10 attempts after binding the
                    # driver just to make sure that the interface is
                    # initialized successfully in order not to fail in each
                    # of these cases:
                    # - get_dpdk_devargs() in case of OvsDpdkPort and
                    #   OvsDpdkBond.
                    # - bind_dpdk_interface() in case of OvsDpdkBond.
                    if vendor_id == "0x15b3":
                        processutils.execute('ethtool', '-i', ifname,
                                             attempts=10)   <<===
            except processutils.ProcessExecutionError:
                msg = "Failed to bind interface %s with dpdk" % ifname
                raise OvsDpdkBindException(msg)

<<=== It seems that the actual behavior does not work as expected: when we bind a Mellanox NIC, the interface no longer shows up, so ethtool cannot see it and the exception is raised.
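As a quick cross-check before deploying, the card vendor and the currently bound driver can be read from sysfs and ethtool. This is only a sketch using the interface name from the traceback (ens5f1); adjust it to the environment:

    # PCI vendor ID of the port; Mellanox devices report 0x15b3
    cat /sys/class/net/ens5f1/device/vendor

    # Driver currently bound to the device
    ethtool -i ens5f1 | grep '^driver'
    readlink /sys/class/net/ens5f1/device/driver

If the port is already bound to the driver requested in the nic config, bind_dpdk_interfaces() returns at its first check and the failing ethtool retry is never reached.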
Assigning to Networking for dsneddon triage
Hi, please use driver: mlx5_core for the DPDK ports in the nic config templates. In the case of Mellanox cards, we do not need to override the default driver with vfio-pci, so the function returns from the first if statement:

def bind_dpdk_interfaces(ifname, driver, noop):
    iface_driver = get_interface_driver(ifname)
    if iface_driver == driver:
        logger.info("Driver (%s) is already bound to the device (%s)" %
                    (driver, ifname))
        return

Sample templates: https://github.com/openstack/os-net-config/blob/master/etc/os-net-config/samples/ovs_dpdk.yaml#L16
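For illustration, a minimal nic config sketch in the style of that sample, with the driver set to mlx5_core; the bridge, bond, port, and interface names below are placeholders, not taken from this report:

    - type: ovs_user_bridge
      name: br-dpdk
      use_dhcp: false
      members:
        - type: ovs_dpdk_bond
          name: dpdkbond0
          members:
            - type: ovs_dpdk_port
              name: dpdk0
              driver: mlx5_core    # keep the Mellanox kernel driver; do not override with vfio-pci
              members:
                - type: interface
                  name: nic3
            - type: ovs_dpdk_port
              name: dpdk1
              driver: mlx5_core
              members:
                - type: interface
                  name: nic4

With mlx5_core already bound by the kernel, get_interface_driver() matches the requested driver and bind_dpdk_interfaces() returns immediately.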
Hi Karthik,

Good day, and thanks very much for your help. In that case, should we add advice to the documentation along the lines of "In the case of Mellanox cards, use mlx5_core rather than vfio-pci", or add some detection mechanism to the code?

Regards,
Sam
Hi,

We have a note added in our documentation [1]. We'll also make a patch that states the error clearly.

[1] https://access.redhat.com/documentation/fr-fr/red_hat_openstack_platform/16.2/html/network_functions_virtualization_planning_and_configuration_guide/ch-hardware-requirements#ref_supported-nics
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Release of components for Red Hat OpenStack Platform 17.0 (Wallaby)), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2022:6543