Description of problem:
Deployed node does not boot; it fails during switchroot and drops to an emergency shell.

Console log:
Timed out waiting for device dev-di…5c\x2da829\x2d37e4267c9978.device.

It is failing to mount the EFI partition.

On working node deployment:
Feb 24 13:55:37 host-192-168-24-34 ironic-python-agent[669]: 2022-02-24 13:55:37.379 669 DEBUG ironic_python_agent.extensions.image [-] Added entry to /etc/fstab for EFI partition auto-mount with uuid 481E-F2FD _append_uefi_to_fstab /usr/lib/python3.6/site-packages/ironic_python_agent/extensions/image.py:909

On failing node deployment:
Feb 24 13:55:37 host-192-168-24-8 ironic-python-agent[670]: 2022-02-24 13:55:37.075 670 DEBUG ironic_python_agent.extensions.image [-] Added entry to /etc/fstab for EFI partition auto-mount with uuid 68fe0417-0a56-445c-a829-37e4267c9978 _append_uefi_to_fstab /usr/lib/python3.6/site-packages/ironic_python_agent/extensions/image.py:909

The difference is the UUID used in fstab. On the failing node, the "partuuid" of the EFI partition as listed by lsblk is used. On the working node, the "uuid" as listed by lsblk is used.

Version-Release number of selected component (if applicable):
rhosp-director-images-ipa-x86_64-17.0-20220216.2.el8ost.noarch

How reproducible:
10%

Steps to Reproduce:
1. Attempt to deploy many nodes in parallel
2. Most nodes deploy successfully
3.

Actual results:
fstab entry written:
UUID=68fe0417-0a56-445c-a829-37e4267c9978 /boot/efi vfat umask=0077 0 1

Console logs:
Timed out waiting for device dev-di…5c\x2da829\x2d37e4267c9978.device.

Expected results:
fstab entry written:
UUID=46A5-0817 /boot/efi vfat umask=0077 0 1

Additional info:
qemu-nbd --connect=/dev/nbd0 /root/debug_switchroot_issue/compute-3-disk1.qcow2

Using `lsblk --output-all --json /dev/nbd0 | jq .`, the EFI partition shows:
"partuuid": "68fe0417-0a56-445c-a829-37e4267c9978"
"uuid": "482E-AAF9"

It seems IPA is not consistently using "uuid".
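To make the uuid/partuuid distinction concrete, here is a minimal Python sketch that reads `lsblk --json` output and builds the expected fstab line from the filesystem "uuid" rather than the "partuuid". The helper name `efi_fstab_line` and the trimmed `SAMPLE` document are hypothetical; the sample only mirrors the values quoted in this report and assumes the partition carries the label "efi-part". This is an illustration, not IPA's implementation.

```python
import json

# Hypothetical, trimmed lsblk --json document using the values from this report.
SAMPLE = """
{
  "blockdevices": [
    {
      "name": "nbd0",
      "uuid": null,
      "partuuid": null,
      "label": null,
      "children": [
        {
          "name": "nbd0p1",
          "uuid": "482E-AAF9",
          "partuuid": "68fe0417-0a56-445c-a829-37e4267c9978",
          "label": "efi-part"
        }
      ]
    }
  ]
}
"""


def efi_fstab_line(lsblk_json, label="efi-part"):
    """Return an fstab line for the partition with `label`, keyed on the filesystem UUID."""
    tree = json.loads(lsblk_json)
    # Walk the device tree iteratively; partitions are nested under "children".
    stack = list(tree.get("blockdevices", []))
    while stack:
        dev = stack.pop()
        stack.extend(dev.get("children") or [])
        if dev.get("label") == label:
            fs_uuid = dev.get("uuid")
            if not fs_uuid:
                # Refuse to substitute the partuuid under a UUID= tag.
                raise LookupError("no filesystem UUID reported for " + dev["name"])
            return "UUID=%s /boot/efi vfat umask=0077 0 1" % fs_uuid
    raise LookupError("no partition labelled " + label)


print(efi_fstab_line(SAMPLE))
```

Raising when "uuid" is empty, instead of silently substituting "partuuid", is a design choice for this sketch: a UUID= entry holding a partition UUID is exactly what leaves the device unit waiting until it times out.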
OK DEPLOY
#########
Feb 24 13:55:32 host-192-168-24-34 ironic-python-agent[669]: 2022-02-24 13:55:32.694 669 DEBUG ironic_lib.utils [-] Command stdout is: "KNAME="vda" UUID="" PARTUUID="" TYPE="disk" LABEL=""
KNAME="vda1" UUID="481E-F2FD" PARTUUID="b7dac7a5-eefa-4966-bbcd-358f074618f3" TYPE="part" LABEL="efi-part"
KNAME="vda2" UUID="2022-02-24-18-48-59-00" PARTUUID="6be3281a-34e7-4ab0-b30a-0417a06f101b" TYPE="part" LABEL="config-2"
KNAME="vda3" UUID="e8bf71d2-a8ef-4176-a277-fd26220ef3fb" PARTUUID="3d0b48e3-00e4-40c1-99f0-653f5f694c01" TYPE="part" LABEL="img-rootfs"
" _log /usr/lib/python3.6/site-packages/ironic_lib/utils.py:99
Feb 24 13:55:32 host-192-168-24-34 ironic-python-agent[669]: 2022-02-24 13:55:32.704 669 DEBUG ironic_lib.utils [-] Command stderr is: "" _log /usr/lib/python3.6/site-packages/ironic_lib/utils.py:100
Feb 24 13:55:32 host-192-168-24-34 ironic-python-agent[669]: 2022-02-24 13:55:32.707 669 DEBUG ironic_python_agent.extensions.image [-] Partition 481E-F2FD found on device /dev/vda _get_partition /usr/lib/python3.6/site-packages/ironic_python_agent/extensions/image.py:99

FAILED DEPLOY
#############
Feb 24 13:55:32 host-192-168-24-8 ironic-python-agent[670]: 2022-02-24 13:55:32.632 670 DEBUG oslo_concurrency.processutils [-] CMD "lsblk -PbioKNAME,UUID,PARTUUID,TYPE,LABEL /dev/vda" returned: 0 in 0.010s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423
Feb 24 13:55:32 host-192-168-24-8 ironic-python-agent[670]: 2022-02-24 13:55:32.636 670 DEBUG ironic_lib.utils [-] Command stdout is: "KNAME="vda" UUID="" PARTUUID="" TYPE="disk" LABEL=""
KNAME="vda1" UUID="482E-AAF9" PARTUUID="68fe0417-0a56-445c-a829-37e4267c9978" TYPE="part" LABEL="efi-part"
KNAME="vda2" UUID="2022-02-24-18-49-07-00" PARTUUID="2a41400b-765e-4fa3-98e8-b6d19f86ae25" TYPE="part" LABEL="config-2"
KNAME="vda3" UUID="e8bf71d2-a8ef-4176-a277-fd26220ef3fb" PARTUUID="3bec0aa9-f641-4a58-b5ba-14bb2147b2e5" TYPE="part" LABEL="img-rootfs"
" _log /usr/lib/python3.6/site-packages/ironic_lib/utils.py:99
Feb 24 13:55:32 host-192-168-24-8 ironic-python-agent[670]: 2022-02-24 13:55:32.645 670 DEBUG ironic_lib.utils [-] Command stderr is: "" _log /usr/lib/python3.6/site-packages/ironic_lib/utils.py:100
Feb 24 13:55:32 host-192-168-24-8 ironic-python-agent[670]: 2022-02-24 13:55:32.647 670 DEBUG ironic_python_agent.extensions.image [-] Partition 68fe0417-0a56-445c-a829-37e4267c9978 found on device /dev/vda _get_partition /usr/lib/python3.6/site-packages/ironic_python_agent/extensions/image.py:103
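The two "Partition ... found on device" messages come from different source lines (image.py:99 and image.py:103), which suggests the lookup matched on a different field in each case. The following Python sketch shows one way such a lookup over `lsblk -P` output could distinguish a UUID match from a PARTUUID match. The function names `parse_lsblk_pairs` and `find_partition` are hypothetical and this is not the IPA code; `SAMPLE` copies the stdout from the failed deploy above.

```python
import shlex

# lsblk -P stdout from the failed deploy, one device per line.
SAMPLE = '''KNAME="vda" UUID="" PARTUUID="" TYPE="disk" LABEL=""
KNAME="vda1" UUID="482E-AAF9" PARTUUID="68fe0417-0a56-445c-a829-37e4267c9978" TYPE="part" LABEL="efi-part"
KNAME="vda2" UUID="2022-02-24-18-49-07-00" PARTUUID="2a41400b-765e-4fa3-98e8-b6d19f86ae25" TYPE="part" LABEL="config-2"
KNAME="vda3" UUID="e8bf71d2-a8ef-4176-a277-fd26220ef3fb" PARTUUID="3bec0aa9-f641-4a58-b5ba-14bb2147b2e5" TYPE="part" LABEL="img-rootfs"
'''


def parse_lsblk_pairs(output):
    """Turn KEY="value" lines into a list of dicts, one per block device."""
    devices = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        # shlex strips the quotes, leaving tokens such as 'KNAME=vda1'.
        devices.append(dict(tok.split("=", 1) for tok in shlex.split(line)))
    return devices


def find_partition(output, ident):
    """Return (device path, matched field) for a partition, or None if absent."""
    if not ident:
        return None
    for dev in parse_lsblk_pairs(output):
        if dev.get("TYPE") != "part":
            continue
        if dev.get("UUID") == ident:
            return "/dev/" + dev["KNAME"], "UUID"
        if dev.get("PARTUUID") == ident:
            return "/dev/" + dev["KNAME"], "PARTUUID"
    return None


print(find_partition(SAMPLE, "482E-AAF9"))
print(find_partition(SAMPLE, "68fe0417-0a56-445c-a829-37e4267c9978"))
```

Returning the matched field alongside the device path lets a caller know which kind of identifier it was handed, which matters later when the same value is written to fstab.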
Failed:
Feb 24 13:55:25 host-192-168-24-8 ironic-python-agent[670]: 2022-02-24 13:55:25.881 670 DEBUG root [-] Executing command: image.install_bootloader with args: {'root_uuid': 'e8bf71d2-a8ef-4176-a277-fd26220ef3fb', 'efi_system_part_uuid': '68fe0417-0a56-445c-a829-37e4267c9978', 'prep_boot_part_uuid': None, 'target_boot_mode': 'uefi'} execute_command /usr/lib/python3.6/site-packages/ironic_python_agent/extensions/base.py:255

Working:
Feb 24 13:55:01 host-192-168-24-41 ironic-python-agent[678]: 2022-02-24 13:55:01.853 678 DEBUG root [-] Executing command: image.install_bootloader with args: {'root_uuid': 'e8bf71d2-a8ef-4176-a277-fd26220ef3fb', 'efi_system_part_uuid': '46E9-A0BD', 'prep_boot_part_uuid': None, 'target_boot_mode': 'uefi'} execute_command /usr/lib/python3.6/site-packages/ironic_python_agent/extensions/base.py:255

Looks like we're selecting the partuuid instead of the uuid, and that is why the mount is failing. Interestingly enough, this should work with just the partuuid, AIUI. The value passed in comes from ironic_lib's disk_utils.

Failed node:
Feb 24 13:54:59 host-192-168-24-8 ironic-python-agent[670]: 2022-02-24 13:54:59.050 670 DEBUG ironic_lib.disk_utils [-] Falling back to partition UUID as the block device UUID was not found while examining /dev/vda3 block_uuid /usr/lib/python3.6/site-packages/ironic_lib/disk_utils.py:563

A total of 2 partitions failed lookup on the node that failed, vs. only one (the root fs) on the node that succeeded.

The code explicitly tries to return the UUID field, and then falls back to PARTUUID:
Feb 24 13:54:59 host-192-168-24-8 ironic-python-agent[670]: 2022-02-24 13:54:59.072 670 DEBUG oslo_concurrency.processutils [-] CMD "lsblk /dev/vda1 --pairs --bytes --ascii --nodeps --output UUID,PARTUUID" returned: 0 in 0.016s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423
Feb 24 13:54:59 host-192-168-24-8 ironic-python-agent[670]: 2022-02-24 13:54:59.079 670 DEBUG ironic_lib.utils [-] Command stdout is: "UUID="" PARTUUID="68fe0417-0a56-445c-a829-37e4267c9978"
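The fallback described above can be sketched in Python as follows. This is not the ironic_lib code and not the fix that was shipped; the names `block_id` and `fstab_entry` are hypothetical. The sketch only illustrates the idea that a caller which knows whether it received a filesystem UUID or a partition UUID could write the matching fstab tag (fstab accepts both `UUID=` and `PARTUUID=`). `LINE` is modeled on the lsblk stdout in the log above.

```python
import shlex

# Modeled on the stdout logged for /dev/vda1 on the failed node.
LINE = 'UUID="" PARTUUID="68fe0417-0a56-445c-a829-37e4267c9978"'


def block_id(lsblk_pairs_line):
    """Return (tag, value): prefer the filesystem UUID, else fall back to PARTUUID."""
    fields = dict(tok.split("=", 1) for tok in shlex.split(lsblk_pairs_line))
    if fields.get("UUID"):
        return "UUID", fields["UUID"]
    if fields.get("PARTUUID"):
        # The filesystem UUID was not visible when lsblk ran.
        return "PARTUUID", fields["PARTUUID"]
    raise LookupError("neither UUID nor PARTUUID reported")


def fstab_entry(tag, value, mountpoint="/boot/efi"):
    """Build the EFI fstab line using the tag that matches the identifier kind."""
    return "%s=%s %s vfat umask=0077 0 1" % (tag, value, mountpoint)


tag, value = block_id(LINE)
print(fstab_entry(tag, value))
```

An alternative, under the same assumptions, would be to retry the lookup until the filesystem UUID becomes visible instead of accepting the fallback; which approach suits IPA is a question for the actual fix, not something this sketch settles.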
Fix cherry-picked downstream and in downstream review.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Release of components for Red Hat OpenStack Platform 17.0 (Wallaby)), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2022:6543