Description of problem:
One of our Telco customers is performing an RHOSP 16.2 to 17.1 upgrade. After the OSP upgrade completed successfully, they have to do the Leapp upgrade of the node from RHEL 8 to RHEL 9 (they are following doc section https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/17.1/html-single/framework_for_upgrades_16.2_to_17.1/index#upgrading-the-undercloud-operating-system). The problem they experience is that NICs eno49 and eno50 belong to Bond0, and after the Leapp upgrade both eno49 and eno50 are represented with the same MAC address. The customer thinks: "due to a bug in the ansible code generating this files it writes eno49 and eno50 which are part of the same bond the same mac. It writes the same mac instead of taking care of the fact that they have the same mac because they are in a LACP bond, it would for eno50 take it's permaddr." When you change this manually (for eno50 to :61), it works perfectly fine after a server reboot. If you don't modify it, the interfaces get badly mixed up.

Problematic code part:
----------------------
The wrong code can be found in the rendered ansible code, for example:

    - name: Keep nics with prefix in NICsPrefixesToUdev from renaming
      vars:
        nics_prefixes_to_keep: {get_attr: [RoleParametersValue, value, 'nics_prefixes_to_keep']}
      # (.ifname | test("^.*\\..*$") | not) removes vlan nics like ens1.1
      # (.ifname | test("^.*v[0-9]*$") | not) removes virtual function nics ens1v1
      # (.ifname | test("^.*_[0-9]*$") | not) also removes virtual function nics ens1_1
      shell: >
        ip -j link show | jq -r --arg prefix "{{ item }}" '.[] | select((.ifname | startswith($prefix)) and
        (.ifname | test("^.*v[0-9]*$")|not) and
        (.ifname | test("^.*_[0-9]*$") | not) and
        (.ifname | test("^.*\\..*$") | not)) |
        "SUBSYSTEM==\"net\",ACTION==\"add\",DRIVERS==\"?*\"," + "NAME=\"" + .ifname +"\" ,ATTR{address}==\"" + .address + "\""'
        >> /etc/udev/rules.d/70-rhosp-persistent-net.rules
      loop: "{{ nics_prefixes_to_keep|list }}"

What workaround customer tried:
-------------------------------
Modifying the code as follows, so that the permaddr is used instead of the duplicate address, does the trick:

    ip -j link show | jq -r --arg prefix "en" '.[] | select((.ifname | startswith($prefix)) and
    (.ifname | test("^.*v[0-9]*$")|not) and
    (.ifname | test("^.*_[0-9]*$") | not) and
    (.ifname | test("^.*\\..*$") | not)) |
    if .permaddr? then .address=.permaddr else . end |
    "SUBSYSTEM==\"net\",ACTION==\"add\",DRIVERS==\"?*\"," + "NAME=\"" + .ifname +"\" ,ATTR{address}==\"" + .address + "\""'

Expected results:
- The deployment scripts used for the RHOSP upgrade/Leapp should take care of this, and the upgrade should not fail.

Additional info:
1- This issue happens for all roles. It is reported here for director, but all other roles are affected by this code bug as well.
2- If the first upgrade fails and you run the upgrade a second time, the /etc/udev/rules.d/70-rhosp-persistent-net.rules file will contain duplicate entries.
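
For readers who want to check the workaround logic without touching a host, here is a minimal Python sketch of the same rule generation. It is not the code shipped in the templates or in the errata; it only mirrors the jq filter quoted above. The function name `udev_rules`, the `SAMPLE_LINKS` data, and the MAC addresses are made up for illustration, and the sample assumes `ip -j link show` reports a `permaddr` key for a bonded NIC whose current address differs from its permanent one, as the customer's workaround implies.

```python
import json
import re

# Same exclusions as the jq filter: virtual functions (ens1v1, ens1_1)
# and VLAN nics (ens1.1).
EXCLUDE_PATTERNS = [r"^.*v[0-9]*$", r"^.*_[0-9]*$", r"^.*\..*$"]


def udev_rules(links, prefix):
    """Build one udev rule per matching NIC, preferring permaddr over address."""
    rules = []
    for link in links:
        name = link["ifname"]
        if not name.startswith(prefix):
            continue
        if any(re.search(pattern, name) for pattern in EXCLUDE_PATTERNS):
            continue
        # A bonded NIC can report the bond's MAC as "address";
        # "permaddr", when present, is the NIC's own hardware MAC.
        mac = link.get("permaddr") or link["address"]
        rules.append(
            'SUBSYSTEM=="net",ACTION=="add",DRIVERS=="?*",'
            f'NAME="{name}" ,ATTR{{address}}=="{mac}"'
        )
    return rules


# Hypothetical stand-in for the output of `ip -j link show` (illustrative MACs).
SAMPLE_LINKS = json.loads("""
[
  {"ifname": "eno49", "address": "aa:bb:cc:00:00:60"},
  {"ifname": "eno50", "address": "aa:bb:cc:00:00:60", "permaddr": "aa:bb:cc:00:00:61"},
  {"ifname": "eno50.100", "address": "aa:bb:cc:00:00:60"},
  {"ifname": "bond0", "address": "aa:bb:cc:00:00:60"}
]
""")

if __name__ == "__main__":
    for rule in udev_rules(SAMPLE_LINKS, "en"):
        print(rule)
```

With this sample data, eno49 keeps its current address and eno50 gets its permaddr, so the two rules no longer share a MAC address.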
So it seems that if the upgrade fails the first time, the relaunch does not clean the udev rules file; it just appends to it.
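
One possible way to avoid the duplicate entries on a rerun is to replace the rules file on every run instead of appending to it. The Python sketch below illustrates that idea only; it is an assumption about a reasonable approach, not a description of what the errata actually changed. The helper name `write_rules` and the demo file name are hypothetical.

```python
import os
import tempfile


def write_rules(path, rules):
    """Replace the rules file with exactly the given rules, dropping repeats."""
    # dict.fromkeys keeps the first occurrence of each rule and preserves order.
    unique = list(dict.fromkeys(rules))
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory, then swap it in,
    # so a failed run never leaves a half-written rules file behind.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        handle.write("".join(rule + "\n" for rule in unique))
    os.chmod(tmp_path, 0o644)
    os.replace(tmp_path, path)


if __name__ == "__main__":
    # Demo in a scratch directory; a real run would target the udev rules path.
    with tempfile.TemporaryDirectory() as scratch:
        target = os.path.join(scratch, "70-example.rules")
        example = ["rule-for-eno49", "rule-for-eno50"]
        write_rules(target, example)
        write_rules(target, example)  # a second run leaves the file unchanged
        with open(target) as handle:
            print(handle.read(), end="")
```

Because the file is rewritten from the current rule list each time, running the step twice yields the same contents as running it once.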
*** Bug 2263838 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: openstack-tripleo-heat-templates and tripleo-ansible update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2024:2736
*** Bug 2314924 has been marked as a duplicate of this bug. ***