Bug 1915299 - os-net-config fails to re-provision networking config on compute node with DPDK interfaces mapped to numbered interfaces
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: os-net-config
Version: 13.0 (Queens)
Hardware: All
OS: All
Priority: high
Severity: high
Target Milestone: z6
Target Release: 16.1 (Train on RHEL 8.2)
Assignee: Dan Sneddon
QA Contact: Paras Babbar
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-01-12 12:12 UTC by Alex Stupnikov
Modified: 2024-06-13 23:53 UTC (History)

Fixed In Version: os-net-config-11.3.2-1.20210406083710.f49ab16.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-05-26 13:50:32 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1918036 0 high CLOSED os-net-config maps DPDK NICs twice if they are already active 2023-09-18 00:24:16 UTC
Red Hat Bugzilla 1918979 0 high CLOSED os-net-config maps DPDK NICs twice if they are already active 2024-10-01 17:26:32 UTC
Red Hat Issue Tracker OSP-1517 0 None None None 2022-08-30 14:28:30 UTC
Red Hat Product Errata RHBA-2021:2097 0 None None None 2021-05-26 13:51:21 UTC

Internal Links: 1918036

Description Alex Stupnikov 2021-01-12 12:12:26 UTC
Description of problem:

The customer reported that the deployment command fails for an existing overcloud when invoked with templates that contain definition [1]. From the provided sosreport (attached to the case), it looks like the issue is caused by a failed os-net-config run.

Exception [2] is logged. From the os-net-config code, it looks like the failure occurs because /var/lib/os-net-config/dpdk_mapping.yaml doesn't contain a record for nic6. From the provided sosreport I can see that the var/lib/os-net-config/dpdk_mapping.yaml file is valid, but it contains information for real interfaces instead of numbered interfaces (for example, it contains a record for p2p2 instead of nic6).

Version-Release number of selected component (if applicable):

RHOSP 13, os-net-config-8.4.4-6.el7ost.noarch


How reproducible:

Run the deployment command for an existing overcloud whose DPDK interfaces were provisioned using numbered NICs, with "NetworkDeploymentActions: ['CREATE','UPDATE']" set.


[1]
NodeDPDKNetworkDeploymentActions: ['CREATE','UPDATE']

[2]
Jan 11 12:03:14 dpdk-compute0 os-collect-config: Traceback (most recent call last):
Jan 11 12:03:14 dpdk-compute0 os-collect-config: File "/usr/bin/os-net-config", line 10, in <module>
Jan 11 12:03:14 dpdk-compute0 os-collect-config: sys.exit(main())
Jan 11 12:03:14 dpdk-compute0 os-collect-config: File "/usr/lib/python2.7/site-packages/os_net_config/cli.py", line 303, in main
Jan 11 12:03:14 dpdk-compute0 os-collect-config: provider.add_object(obj)
Jan 11 12:03:14 dpdk-compute0 os-collect-config: File "/usr/lib/python2.7/site-packages/os_net_config/__init__.py", line 70, in add_object
Jan 11 12:03:14 dpdk-compute0 os-collect-config: self.add_object(member)
Jan 11 12:03:14 dpdk-compute0 os-collect-config: File "/usr/lib/python2.7/site-packages/os_net_config/__init__.py", line 104, in add_object
Jan 11 12:03:14 dpdk-compute0 os-collect-config: self.add_ovs_dpdk_bond(obj)
Jan 11 12:03:14 dpdk-compute0 os-collect-config: File "/usr/lib/python2.7/site-packages/os_net_config/impl_ifcfg.py", line 920, in add_ovs_dpdk_bond
Jan 11 12:03:14 dpdk-compute0 os-collect-config: utils.bind_dpdk_interfaces(ifname, dpdk_port.driver, self.noop)
Jan 11 12:03:14 dpdk-compute0 os-collect-config: File "/usr/lib/python2.7/site-packages/os_net_config/utils.py", line 298, in bind_dpdk_interfaces
Jan 11 12:03:14 dpdk-compute0 os-collect-config: raise OvsDpdkBindException(msg)
Jan 11 12:03:14 dpdk-compute0 os-collect-config: os_net_config.utils.OvsDpdkBindException: Interface nic6 cannot be found

Comment 1 Dan Sneddon 2021-01-12 21:00:56 UTC
Looking at the attached support case, I see that the NICs are not being detected correctly. The NICs p2p1 and p2p2 (and likewise p3p1 and p3p2) are detected twice, so the numbered NIC ordering skips nic6 and nic8. Those positions are taken by the duplicate entries for p2p1 and p2p2, which have already been mapped as nic5 and nic7:


Jan 11 12:03:14 cpt0-dpdk-dell-tovb os-collect-config: [2021/01/11 11:58:07 AM] [INFO] Active nics are ['em1', 'em2', 'p1p1', 'p1p2', 'p2p1', 'p2p1', 'p2p2', 'p2p2', 'p3p1', 'p3p1', 'p3p2', 'p3p2']
Jan 11 12:03:14 cpt0-dpdk-dell-tovb os-collect-config: [2021/01/11 11:58:07 AM] [INFO] nic2 mapped to: em2
Jan 11 12:03:14 cpt0-dpdk-dell-tovb os-collect-config: [2021/01/11 11:58:07 AM] [INFO] nic3 mapped to: p1p1
Jan 11 12:03:14 cpt0-dpdk-dell-tovb os-collect-config: [2021/01/11 11:58:07 AM] [INFO] nic4 mapped to: p1p2
Jan 11 12:03:14 cpt0-dpdk-dell-tovb os-collect-config: [2021/01/11 11:58:07 AM] [INFO] nic7 mapped to: p2p2
Jan 11 12:03:14 cpt0-dpdk-dell-tovb os-collect-config: [2021/01/11 11:58:07 AM] [INFO] nic5 mapped to: p2p1
Jan 11 12:03:14 cpt0-dpdk-dell-tovb os-collect-config: [2021/01/11 11:58:07 AM] [INFO] nic1 mapped to: em1
Jan 11 12:03:14 cpt0-dpdk-dell-tovb os-collect-config: [2021/01/11 11:58:07 AM] [INFO] nic11 mapped to: p3p2
Jan 11 12:03:14 cpt0-dpdk-dell-tovb os-collect-config: [2021/01/11 11:58:07 AM] [INFO] nic9 mapped to: p3p1
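
The skipped numbers in the log above can be reproduced with a small sketch. This is not the os-net-config source: number_nics is a hypothetical helper, and it assumes that each alias is derived from the first position of a NIC name in the active list, which is consistent with the log output (nic5, nic7, nic9, nic11 and no nic6).

```python
def number_nics(active_nics):
    """Hypothetical helper: alias each NIC by its first position in the list."""
    mapping = {}
    for name in set(active_nics):
        # list.index() returns the first occurrence, so a duplicated name
        # still occupies the following slot without ever being assigned to it.
        alias = "nic%d" % (active_nics.index(name) + 1)
        mapping[alias] = name
    return mapping


# Active NIC list copied from the log above (DPDK NICs listed twice).
active = ['em1', 'em2', 'p1p1', 'p1p2', 'p2p1', 'p2p1',
          'p2p2', 'p2p2', 'p3p1', 'p3p1', 'p3p2', 'p3p2']

mapping = number_nics(active)
for alias in sorted(mapping, key=lambda a: int(a[3:])):
    print(alias, "mapped to:", mapping[alias])
print("nic6 present:", "nic6" in mapping)
```

With the duplicates removed from the list, the same helper maps nic6 to p2p2, which is the interface the templates expect.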

In order to troubleshoot this, I need to see the NIC config templates that are being used in the stack update, as well as more information about what changes were made manually. What was the goal of the manual changes? What changes were made to the NIC config templates (or network environment files) before running a stack update with NetworkDeploymentActions set to ["CREATE","UPDATE"]?

Comment 5 Dan Sneddon 2021-01-19 21:32:27 UTC
I think I have discovered where the bug lies here. When os-net-config runs for the first time, the DPDK NICs have no entry in /sys/class/net. Since the NICs are not present there, we look at the DPDK mapping and add the NICs to the list of active NICs.

When you made the LACP change and updated the stack, the DPDK NICs would have been active and would have had an entry in /sys/class/net. The NICs were added to the list of active NICs from there, and then the DPDK mapping added the same NICs to the list a second time.

To fix this we probably have to make sure we only add each DPDK NIC to the list once.
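
A minimal sketch of that idea follows. It is not the actual patch: merge_active_nics and its two arguments are hypothetical names, assuming one list of NICs found under sysfs and one list of NIC names read from the DPDK mapping file.

```python
def merge_active_nics(sysfs_nics, dpdk_mapped_nics):
    """Hypothetical helper: combine both sources, adding each NIC only once."""
    merged = list(sysfs_nics)
    for name in dpdk_mapped_nics:
        # Skip NICs that were already discovered through sysfs.
        if name not in merged:
            merged.append(name)
    return merged


# First run: DPDK NICs are known only from the mapping file.
print(merge_active_nics(['em1', 'em2'], ['p2p1', 'p2p2']))

# Later run: the same NICs are visible in both places.
print(merge_active_nics(['em1', 'em2', 'p2p1', 'p2p2'], ['p2p1', 'p2p2']))
```

Both calls yield the same four-entry list, so the numbered aliases stay stable between the initial deployment and later updates.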

I can file an upstream bug and patch, but I don't know whether, or how soon, the change would be made in OSP 13. It is probably best to use the following workaround instead.

My recommendation is to use real NIC names in the computeDPDK.yaml template. If the nodes do not all have the same NIC name configuration, then a mapping will have to be provided. See the file firstboot/os-net-config-mappings.yaml in the openstack-tripleo-heat-templates directory and the associated documentation for more information.

Comment 6 Alex Stupnikov 2021-01-20 08:54:08 UTC
Thank you so much, Dan! We will try to explain the available options to the customer.

Comment 7 pweeks 2021-01-20 20:47:55 UTC
Dan, can you fill in the fixed in version field and link to the patch?
I'll set tags appropriate for 16.1.5.

Comment 23 errata-xmlrpc 2021-05-26 13:50:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.6 bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2097

