Description Shravan Kumar Tiwari
2023-11-10 09:43:59 UTC
Description of problem:
One of our Telco customers is performing an RHOSP 16.2 to 17.1 upgrade. After the OSP upgrade completed successfully, the customer has to perform the Leapp upgrade of the node from RHEL 8 to RHEL 9 (following doc section https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/17.1/html-single/framework_for_upgrades_16.2_to_17.1/index#upgrading-the-undercloud-operating-system).
The problem they experience is that NICs eno49 and eno50 belong to Bond0, and after the Leapp upgrade both eno49 and eno50 are represented with the same MAC address.
Customer thinks "due to a bug in the ansible code generating this files it writes eno49 and eno50 which are part of the same bond the same mac.
It writes the same mac instead of taking care of the fact that they have the same mac because they are in a LACP bond, it would for eno50 take it's permaddr."
When the entry is changed manually (for eno50, to the MAC ending in :61), everything works fine after a server reboot.
Without this modification, the interfaces get badly mixed up.
Problematic code part:
------------------------
The faulty code can be found in the rendered Ansible code, for example:
- name: Keep nics with prefix in NICsPrefixesToUdev from renaming
  vars:
    nics_prefixes_to_keep: {get_attr: [RoleParametersValue, value, 'nics_prefixes_to_keep']}
  # (.ifname | test("^.*\\..*$") | not) removes vlan nics like ens1.1
  # (.ifname | test("^.*v[0-9]*$") | not) removes virtual function nics ens1v1
  # (.ifname | test("^.*_[0-9]*$") | not) also removes virtual function nics ens1_1
  shell: >
    ip -j link show | jq -r --arg prefix "{{ item }}" '.[] | select((.ifname | startswith($prefix)) and (.ifname | test("^.*v[0-9]*$")|not) and (.ifname | test("^.*_[0-9]*$") | not) and (.ifname | test("^.*\\..*$") | not)) | "SUBSYSTEM==\"net\",ACTION==\"add\",DRIVERS==\"?*\"," + "NAME=\"" + .ifname +"\" ,ATTR{address}==\"" + .address + "\""' >> /etc/udev/rules.d/70-rhosp-persistent-net.rules
  loop: "{{ nics_prefixes_to_keep|list }}"
Workaround the customer tried:
-------------------------------
Modifying the code as follows, so that permaddr is used instead of the duplicate address, does the trick:
ip -j link show | jq -r --arg prefix "en" '.[] | select((.ifname | startswith($prefix)) and (.ifname | test("^.*v[0-9]*$")|not) and (.ifname | test("^.*_[0-9]*$") | not) and (.ifname | test("^.*\\..*$") | not)) | if .permaddr? then .address=.permaddr else . end | "SUBSYSTEM==\"net\",ACTION==\"add\",DRIVERS==\"?*\"," + "NAME=\"" + .ifname +"\" ,ATTR{address}==\"" + .address + "\""'
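To make the effect of the workaround easier to see, here is a minimal Python sketch of the same selection logic. It is not the code shipped in the templates or in the erratum; the function name `udev_rules`, the sample JSON, and the MAC values are hypothetical and only mirror the shape of `ip -j link show` output for two bonded NICs.

```python
import json
import re

# Hypothetical sample shaped like `ip -j link show` output. Inside a bond the
# member NICs report the same .address; permaddr carries the burned-in MAC
# where it differs. All MAC values here are invented for illustration.
SAMPLE = json.dumps([
    {"ifname": "eno49", "address": "aa:bb:cc:dd:ee:60"},
    {"ifname": "eno50", "address": "aa:bb:cc:dd:ee:60",
     "permaddr": "aa:bb:cc:dd:ee:61"},
    {"ifname": "eno49.100", "address": "aa:bb:cc:dd:ee:60"},
    {"ifname": "lo", "address": "00:00:00:00:00:00"},
])

# Same exclusions as the jq filter: VF names (v<N>, _<N>) and VLAN names.
EXCLUDE = [re.compile(p) for p in (r"^.*v[0-9]*$", r"^.*_[0-9]*$", r"^.*\..*$")]


def udev_rules(links, prefix):
    """Build one udev rule per matching NIC, preferring permaddr over address."""
    rules = []
    for link in links:
        name = link["ifname"]
        if not name.startswith(prefix):
            continue
        if any(p.search(name) for p in EXCLUDE):
            continue
        # Equivalent of: if .permaddr? then .address=.permaddr else . end
        mac = link.get("permaddr") or link["address"]
        rules.append(
            'SUBSYSTEM=="net",ACTION=="add",DRIVERS=="?*",'
            f'NAME="{name}" ,ATTR{{address}}=="{mac}"'
        )
    return rules


if __name__ == "__main__":
    for rule in udev_rules(json.loads(SAMPLE), "en"):
        print(rule)
```

With the permaddr fallback, eno49 and eno50 get rules with distinct MAC addresses; without it, both rules would match the shared bond address.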
Expected results:
- The deployment scripts used for RHOSP upgrade/Leapp should take care of this and upgrade should not fail.
Additional info:
1- This issue happens for all roles. It is reported here for director, but all other roles are affected by the same code bug.
2- If the first upgrade fails and the upgrade is run a second time, /etc/udev/rules.d/70-rhosp-persistent-net.rules ends up with duplicate entries. It seems that on relaunch the udev file is not cleaned; the rules are simply appended again.
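One way to avoid the duplicate entries described in item 2 would be to merge new rules into the existing ones instead of blindly appending. The sketch below only illustrates that idea; `merge_rules` is a hypothetical helper, not part of tripleo-heat-templates or tripleo-ansible, and it does not describe how the erratum actually addresses the problem.

```python
def merge_rules(existing_lines, new_rules):
    """Return existing and new rules combined, in order, without duplicates."""
    seen = set()
    merged = []
    for line in [*existing_lines, *new_rules]:
        line = line.strip()
        # Skip blank lines and anything already present from an earlier run.
        if line and line not in seen:
            seen.add(line)
            merged.append(line)
    return merged


if __name__ == "__main__":
    first_run = merge_rules([], ["rule-eno49", "rule-eno50"])
    # A relaunch after a failed upgrade produces the same rules again.
    second_run = merge_rules(first_run, ["rule-eno49", "rule-eno50"])
    print(second_run)
```

Because the merge is idempotent, running it any number of times with the same input leaves the rule list unchanged.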
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory (Moderate: openstack-tripleo-heat-templates and tripleo-ansible update), and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2024:2736
Comment 25 Juan Badia Payno
2024-10-08 12:22:22 UTC
*** Bug 2314924 has been marked as a duplicate of this bug. ***