Description of problem:
After restoring the undercloud, we lose connectivity with the controller nodes because, for some reason, the network interface names change from eth0, eth1, eth2 to ensX. This change in the interface names breaks the br-ctlplane bridge, and as a result we see in the logs: "ssh: connect to host 192.168.24.34 port 22: No route to host".

Version-Release number of selected component (if applicable):
RHOS-17.1-RHEL-9-20230216.n.1
rhel-guest-image-9.2-20230207.8.x86_64.qcow2

How reproducible:
Executing the bnr workflow with the infrared plugin

Steps to Reproduce:
1. Deploy the stack
2. Install rear and nfs:
   infrared backup-restore --ospversion 17.1 --setup-nfs-rear true --backup-dir /home/ctl_plane_backups
3. Execute rear backup:
   infrared backup-restore --ospversion 17.1 --backup-undercloud true --backup-overcloud true --backup-dir /home/ctl_plane_backups
4. Restore the undercloud:
   infrared backup-restore --ospversion 17.1 --restore-undercloud true --restore-overcloud true --backup-dir /home/ctl_plane_backups

Actual results:
The change in the interface names breaks the br-ctlplane bridge, and as a result the undercloud restore logs show: "ssh: connect to host 192.168.24.34 port 22: No route to host".

Expected results:
Restoration is successful

Additional info:
As a workaround, the networking problem can be fixed with the os-net-config utility:
sed -i 's/eth0/ens3/g' /etc/os-net-config/config.yaml; os-net-config -c /etc/os-net-config/config.yaml
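The sed workaround above rewrites every occurrence of the string eth0, including inside longer names, and covers only one interface. A minimal Python sketch of a stricter variant follows: it renames whole interface names only, in a single pass. The function name rename_interfaces and the eth1 -> ens4 pair are hypothetical; only eth0 -> ens3 comes from this report, and the real names would have to be read from the restored node (for example with `ip link`). This is plain text substitution, not an os-net-config API.

```python
import re


def rename_interfaces(config_text: str, mapping: dict) -> str:
    """Replace whole interface names in config_text according to mapping.

    Word boundaries keep 'eth0' from matching inside 'eth01'. All names are
    replaced in one pass, so a new name is never rewritten a second time.
    """
    if not mapping:
        return config_text
    # Longest names first so the alternation never prefers a shorter prefix.
    names = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], config_text)


# Hypothetical mapping: only eth0 -> ens3 appears in the report.
EXAMPLE_MAPPING = {"eth0": "ens3", "eth1": "ens4"}
```

The rewritten text would still need to be written back to /etc/os-net-config/config.yaml and applied with `os-net-config -c /etc/os-net-config/config.yaml`, as in the original workaround.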
You are destroying the undercloud VM set up by the infrared virsh plugin and re-creating it without using the virsh plugin, I guess in one of these playbooks?
* https://github.com/redhat-openstack/infrared/blob/master/plugins/tripleo-undercloud/restore.yml#L97-L115
* https://github.com/redhat-openstack/infrared/blob/55ba05ca0d9f5aca6f605816da02dac053537254/plugins/tripleo-undercloud/restore_containerized.yml

The virsh plugin does it in a similar way, but there are many things that can differ depending on the options.
* https://github.com/redhat-openstack/infrared/blob/master/plugins/virsh/tasks/vms_2_install.yml#L37-L78

For example:

{% if provision.bootmode == 'uefi' %}
--boot {{ 'hd' if topology_node.deploy_os|default(True) else 'uefi' }} \
{% else %}

{%- if interface.model is defined and interface.model %},model={{ interface.model }}{% endif %}

{% if topology_node.machine_type is defined and topology_node.machine_type %}
--machine {{ topology_node.machine_type }} \
{% endif %}
--os-variant {{ topology_node.os.variant }} \

I think something is different, i.e. the hardware the undercloud sees is different, and based on that the interface names are different. Is it also possible that the undercloud was initially installed with net.ifnames disabled?

I doubt that this is a product bug; this looks like an issue with the infrastructure used for testing.
Thanks, Harald, for your comment. Let me clarify the procedure:

We are not using the tripleo-undercloud IR plugin. We use the backup and restore plugin [1], which executes the backup and restore tripleo role [2] using the openstack backup commands implemented in the CLI [3]. The backup and restore role relies on ReaR [4] to back up and restore the undercloud and controller nodes, so when we restore the undercloud node using ReaR, we expect to get back exactly the same VM, with the same network interfaces as in the backup image.

I wonder why the interface names are changing, since the scripts that enable the network interfaces on the restored node use the ethX naming.

[1] https://gitlab.cee.redhat.com/osp-dfg-enterprise/infrared-plugin-backup-restore
[2] https://github.com/openstack/tripleo-ansible/tree/master/tripleo_ansible/roles/backup_and_restore
[3] https://github.com/openstack/python-tripleoclient
[4] https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/configuring_basic_system_settings/assembly_recovering-and-restoring-a-system_configuring-basic-system-settings
I can confirm that, for some reason, the undercloud was initially installed with net.ifnames disabled:

[root@undercloud-0 stack]# cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt3)/vmlinuz-5.14.0-283.el9.x86_64 root=UUID=62b51192-13b0-4838-a267-e410f86ee01e console=tty0 console=ttyS0,115200n8 no_timer_check net.ifnames=0 crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M
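The check above can be scripted, for instance to compare the original and the restored undercloud. A minimal Python sketch, assuming only that the kernel command line is a space-separated list of parameters as in the output above; the function names are hypothetical:

```python
def predictable_names_disabled(cmdline: str) -> bool:
    """Return True if the kernel command line contains net.ifnames=0."""
    return "net.ifnames=0" in cmdline.split()


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the kernel command line of the running system (Linux only)."""
    with open(path) as handle:
        return handle.read().strip()
```

Calling predictable_names_disabled(read_cmdline()) on both the original and the restored node would show whether the restored system still boots with net.ifnames=0. If the parameter is missing after the restore, that would be consistent with the interfaces coming up as ensX.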
We will investigate further on the ReaR side.