Bug 2249024

Summary: RHOSP16.2 to 17.1 upgrade: During Leapp upgrade steps the network interface names are not preserved
Product: Red Hat OpenStack Reporter: Shravan Kumar Tiwari <shtiwari>
Component: openstack-tripleo-heat-templates Assignee: Sergii Golovatiuk <sgolovat>
Status: CLOSED ERRATA QA Contact: Archana Singh <arcsingh>
Severity: medium Docs Contact:
Priority: medium    
Version: 17.1 (Wallaby) CC: abhijadh, bshephar, fpiccion, jbadiapa, jelle.hoylaerts.ext, jpretori, madgupta, mariel, mburns, pgodwin, sapaul, sgolovat
Target Milestone: z3 Keywords: Triaged
Target Release: 17.1   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-tripleo-heat-templates-14.3.1-17.1.20231103010835.el9ost Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2024-05-22 20:42:17 UTC Type: Bug
Bug Depends On:    
Bug Blocks: 2263838    

Description Shravan Kumar Tiwari 2023-11-10 09:43:59 UTC
Description of problem:
One of our telco customers is performing an RHOSP 16.2 to 17.1 upgrade. After the OSP upgrade completed successfully, they had to perform the Leapp upgrade of the node from RHEL 8 to RHEL 9 (following the documentation section https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/17.1/html-single/framework_for_upgrades_16.2_to_17.1/index#upgrading-the-undercloud-operating-system).

The problem they experience is that NICs eno49 and eno50 belong to Bond0, and after the Leapp upgrade both eno49 and eno50 are represented with the same MAC address.
The customer thinks that, "due to a bug in the Ansible code generating these files, it writes the same MAC for eno49 and eno50, which are part of the same bond.
It writes the same MAC instead of taking into account that they have the same MAC because they are in an LACP bond; for eno50 it should take its permaddr."
When this is changed manually (the eno50 entry set to the address ending in :61), it works fine after a server reboot.
Without this change, the interfaces get badly mixed up.

Problematic code part:
------------------------
The faulty code can be found in the rendered Ansible code, for example:
- name: Keep nics with prefix in NICsPrefixesToUdev from renaming
  vars:
    nics_prefixes_to_keep: {get_attr: [RoleParametersValue, value, 'nics_prefixes_to_keep']}
  # (.ifname | test("^.*\\..*$") | not) removes vlan nics like ens1.1
  # (.ifname | test("^.*v[0-9]*$") | not) removes virtual function nics ens1v1
  # (.ifname | test("^.*_[0-9]*$") | not) also removes virtual function nics ens1_1
  shell: >
      ip -j link show | jq -r --arg prefix "{{ item }}" '.[] | select((.ifname | startswith($prefix)) and (.ifname | test("^.*v[0-9]*$")|not) and (.ifname | test("^.*_[0-9]*$") | not) and (.ifname | test("^.*\\..*$") | not)) | "SUBSYSTEM==\"net\",ACTION==\"add\",DRIVERS==\"?*\"," + "NAME=\"" + .ifname +"\" ,ATTR{address}==\"" + .address + "\""' >> /etc/udev/rules.d/70-rhosp-persistent-net.rules
  loop: "{{ nics_prefixes_to_keep|list }}"
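The effect of this filter can be reproduced offline. The sketch below feeds a hand-written sample of `ip -j link show` JSON through the same jq program used in the task, with `en` passed as the prefix in place of `{{ item }}`. The MAC addresses, and the fact that only eno50 carries a `permaddr` key, are hypothetical sample data modelled on two LACP bond members, not output from the customer's system. It assumes bash and `jq` are installed.

```shell
#!/usr/bin/env bash
# Sketch: run the jq filter from the task against HYPOTHETICAL sample
# `ip -j link show` output for two LACP bond members (eno49, eno50).
set -euo pipefail

# Hypothetical sample data: the bond has given both members the same active
# MAC; eno50 additionally reports its permanent address as "permaddr".
sample_links() {
  cat <<'EOF'
[
  {"ifname": "lo", "address": "00:00:00:00:00:00"},
  {"ifname": "eno49", "address": "aa:bb:cc:dd:ee:60"},
  {"ifname": "eno50", "address": "aa:bb:cc:dd:ee:60", "permaddr": "aa:bb:cc:dd:ee:61"},
  {"ifname": "bond0", "address": "aa:bb:cc:dd:ee:60"}
]
EOF
}

# Same jq program as the task; $1 plays the role of "{{ item }}".
# It always emits .address, i.e. the current (bond-shared) MAC.
rules_original() {
  jq -r --arg prefix "$1" '.[] | select((.ifname | startswith($prefix)) and (.ifname | test("^.*v[0-9]*$")|not) and (.ifname | test("^.*_[0-9]*$") | not) and (.ifname | test("^.*\\..*$") | not)) | "SUBSYSTEM==\"net\",ACTION==\"add\",DRIVERS==\"?*\"," + "NAME=\"" + .ifname +"\" ,ATTR{address}==\"" + .address + "\""'
}

sample_links | rules_original "en"
```

Both generated rules match on the same `ATTR{address}`, so udev has no way to tell the two bond members apart by MAC, which is consistent with the interfaces getting mixed up after reboot.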

Workaround the customer tried:
------------------------------
Modifying the code as follows, so that the permanent address (permaddr) is used instead of the duplicated address, does the trick:
ip -j link show | jq -r --arg prefix "en" '.[] | select((.ifname | startswith($prefix)) and (.ifname | test("^.*v[0-9]*$")|not) and (.ifname | test("^.*_[0-9]*$") | not) and (.ifname | test("^.*\\..*$") | not)) | if .permaddr? then .address=.permaddr else . end | "SUBSYSTEM==\"net\",ACTION==\"add\",DRIVERS==\"?*\"," + "NAME=\"" + .ifname +"\" ,ATTR{address}==\"" + .address + "\""'
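The following sketch applies the customer's modified filter to hypothetical sample data of the same shape (two bond members sharing an active MAC, with `permaddr` reported only for eno50). It assumes, as the workaround does, that `ip -j link show` exposes the permanent hardware address as a `permaddr` key for an interface whose current address differs from it; the addresses themselves are made up. Requires bash and `jq`.

```shell
#!/usr/bin/env bash
# Sketch: the customer's workaround filter applied to HYPOTHETICAL sample
# `ip -j link show` output for two LACP bond members (eno49, eno50).
set -euo pipefail

# Hypothetical sample data: both members share the bond's active MAC;
# only eno50 reports a differing permanent address.
sample_links() {
  cat <<'EOF'
[
  {"ifname": "eno49", "address": "aa:bb:cc:dd:ee:60"},
  {"ifname": "eno50", "address": "aa:bb:cc:dd:ee:60", "permaddr": "aa:bb:cc:dd:ee:61"},
  {"ifname": "bond0", "address": "aa:bb:cc:dd:ee:60"}
]
EOF
}

# Workaround filter: if a permaddr key is present, overwrite .address with it
# before building the udev rule; otherwise keep the object unchanged.
rules_with_permaddr() {
  jq -r --arg prefix "$1" '.[] | select((.ifname | startswith($prefix)) and (.ifname | test("^.*v[0-9]*$")|not) and (.ifname | test("^.*_[0-9]*$") | not) and (.ifname | test("^.*\\..*$") | not)) | if .permaddr? then .address=.permaddr else . end | "SUBSYSTEM==\"net\",ACTION==\"add\",DRIVERS==\"?*\"," + "NAME=\"" + .ifname +"\" ,ATTR{address}==\"" + .address + "\""'
}

sample_links | rules_with_permaddr "en"
```

With `permaddr` substituted, each bond member gets a rule keyed on its own hardware address, mirroring the manual change of the eno50 entry described earlier. Interfaces without a `permaddr` key fall through the `else .` branch and keep their current address.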

Expected results:
- The deployment scripts used for the RHOSP upgrade and the Leapp upgrade should handle this case, and the upgrade should not fail.


Additional info:

1- This issue happens for all roles. Here it is reported for the director, but all other roles also suffer from this code bug.
2- If the first upgrade attempt fails and the upgrade is run a second time, the /etc/udev/rules.d/70-rhosp-persistent-net.rules file contains duplicate entries. It seems that when the upgrade is relaunched after a failure, the udev file is not cleaned; the task only appends to it.
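One way to make the rule generation safe to re-run is to append a rule only when an identical line is not already in the file. The sketch below uses a hypothetical helper, `append_unique`, which is not part of tripleo-heat-templates and is not necessarily how the errata fix works. It writes to a temporary file rather than /etc/udev/rules.d/70-rhosp-persistent-net.rules.

```shell
#!/usr/bin/env bash
# Sketch: append udev rules without duplicating lines already in the file.
# append_unique is a HYPOTHETICAL helper; RULES_FILE is a scratch file standing
# in for /etc/udev/rules.d/70-rhosp-persistent-net.rules.
set -euo pipefail

# append_unique FILE: read candidate rule lines on stdin and append only those
# that are not already present verbatim (whole-line, fixed-string match).
append_unique() {
  local file="$1" line
  touch "$file"
  while IFS= read -r line; do
    [ -n "$line" ] || continue
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
  done
}

RULES_FILE="$(mktemp)"
trap 'rm -f "$RULES_FILE"' EXIT

rule='SUBSYSTEM=="net",ACTION=="add",DRIVERS=="?*",NAME="eno49" ,ATTR{address}=="aa:bb:cc:dd:ee:60"'

# Simulate a failed first run followed by a relaunch: the same rule is
# generated twice, but only one copy ends up in the file.
printf '%s\n' "$rule" | append_unique "$RULES_FILE"
printf '%s\n' "$rule" | append_unique "$RULES_FILE"
wc -l < "$RULES_FILE"
```

Exact-line matching only suppresses verbatim duplicates. If a NIC's address changes between runs (for example after switching to `permaddr`), the stale rule stays in the file, so regenerating the whole file on each run may be the more robust design.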

Comment 9 Juan Badia Payno 2024-02-13 14:20:56 UTC
*** Bug 2263838 has been marked as a duplicate of this bug. ***

Comment 24 errata-xmlrpc 2024-05-22 20:42:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: openstack-tripleo-heat-templates and tripleo-ansible update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:2736

Comment 25 Juan Badia Payno 2024-10-08 12:22:22 UTC
*** Bug 2314924 has been marked as a duplicate of this bug. ***