Bug 1733492 - [RCA] Udev rules and nic mapping changed after node delete
Summary: [RCA] Udev rules and nic mapping changed after node delete
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: os-net-config
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: RHOS Maint
QA Contact: nlevinki
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-07-26 08:56 UTC by Eduard Barrera
Modified: 2019-10-16 22:14 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-16 22:14:42 UTC
Target Upstream Version:


Description Eduard Barrera 2019-07-26 08:56:27 UTC
Description of problem:

An overcloud node delete failed and caused loss of connectivity on 20 compute nodes.
* In the os-collect-config logs, we noticed that nic1 was being mapped to br-XXXXX instead of the correct interface[1]. We ran os-net-config manually, but nic1 was again mapped to br-XXXXX. Because the NIC was not being mapped correctly, we proposed a workaround of creating a mapping.yaml file, which worked on one compute node. Since this is manual work across 20 compute nodes, we looked for other possible workarounds.
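
The mapping.yaml workaround could be scripted rather than written by hand on each node. The following is only a minimal sketch: it assumes the `interface_mapping:` layout that os-net-config reads from its mapping file, and the helper name `render_mapping` and the device names `em1`/`em2` are hypothetical placeholders, not values taken from this environment.

```python
def render_mapping(nic_to_device):
    """Return os-net-config mapping file text for an alias -> device dict.

    Keys are nic aliases (nic1, nic2, ...); values are the real device
    names (or MAC addresses) they should resolve to. The values used in
    the example below are hypothetical.
    """
    lines = ["interface_mapping:"]
    for alias, device in nic_to_device.items():
        lines.append(f"  {alias}: {device}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical devices; replace with the node's actual interfaces.
    print(render_mapping({"nic1": "em1", "nic2": "em2"}), end="")
```

The generated text would still need to be copied to each affected node and reviewed there before os-net-config is run again.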

We found that the udev rules differ between good and affected compute nodes: the affected compute nodes have an entry for "br-XXXXXX"[3]. We moved /etc/udev/rules.d/70-persistent-net.rules aside and rebooted one compute node, which worked.
/etc/udev/rules.d/70-persistent-net.rules was recreated[3], and connectivity was restored after restarting the network and openvswitch services.
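
To tell affected nodes from good ones without reading each rules file by eye, a check along these lines may help. It is a sketch under assumptions: it expects the usual `ATTR{address}=="..."` and `NAME="..."` keys in 70-persistent-net.rules, and the function name `find_bridge_entries` and the sample names in the usage comment are hypothetical.

```python
import re

# NAME="..." is the assignment key; NAME=="..." (a match key) is not matched.
NAME_RE = re.compile(r'NAME="([^"]+)"')
ADDR_RE = re.compile(r'ATTR\{address\}=="([^"]+)"')


def find_bridge_entries(rules_text, prefix="br-"):
    """Return (mac, name) pairs for rules that name a device with `prefix`.

    A non-empty result suggests the file pins a bridge name to a physical
    NIC, which is the symptom described above.
    """
    hits = []
    for line in rules_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        name = NAME_RE.search(line)
        if name and name.group(1).startswith(prefix):
            addr = ADDR_RE.search(line)
            hits.append((addr.group(1) if addr else None, name.group(1)))
    return hits


if __name__ == "__main__":
    with open("/etc/udev/rules.d/70-persistent-net.rules") as fh:
        for mac, name in find_bridge_entries(fh.read()):
            print(f"suspicious rule: {mac} -> {name}")
```

This only reports the entries; moving the file aside and rebooting, as described above, remains a manual decision per node.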


This environment was recently upgraded from OSP8 to OSP10.

Version-Release number of selected component (if applicable):
OSP10

How reproducible:
Unsure

Steps to Reproduce:
1. Delete an overcloud node

Actual results:
NIC mapping changed, connectivity lost, node delete failed

Expected results:
Only the node is deleted

Additional info:

Comment 37 Bob Fournier 2019-10-15 19:39:52 UTC
Angela - it's interesting that this workaround came up from Andreas on another bug, see https://bugzilla.redhat.com/show_bug.cgi?id=1760806. We think it's a reasonable workaround to prevent cloud-init from overwriting the config.

Comment 39 Bob Fournier 2019-10-15 20:35:23 UTC
It seems that a support exception isn't needed in this case - it's a workaround for cloud-init behaviour, but I'm not entirely clear in which cases we require an SE.

Comment 40 Angela Soni 2019-10-16 22:14:42 UTC
Marking this bug as closed based on the above workaround.

