Bug 2058717

Summary: fstab entry for EFI partition written with partition UUID instead of uuid - deployed node does not boot
Product: Red Hat OpenStack Reporter: Harald Jensås <hjensas>
Component: openstack-ironic-python-agent Assignee: Julia Kreger <jkreger>
Status: CLOSED ERRATA QA Contact:
Severity: high Docs Contact:
Priority: high    
Version: 17.0 (Wallaby) CC: jkreger, jparoly, sbaker
Target Milestone: beta Keywords: Triaged
Target Release: 17.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-ironic-python-agent-7.0.3-0.20220315051950.881015a.el8ost Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-09-21 12:19:32 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Harald Jensås 2022-02-25 17:11:33 UTC
Description of problem:
Deployed node does not boot, fails during switchroot and drops to emergency shell.
Console log:
 Timed out waiting for device dev-di…5c\x2da829\x2d37e4267c9978.device.

It is failing to mount the EFI partition.
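
The device unit name in the console log is truncated, but its visible tail can be matched against the partition identifiers. A minimal sketch, assuming the `\x2d` sequences are systemd's escaping of `-`; the helper name `decode_systemd_escapes` is made up for illustration, and the elided part of the unit name is deliberately left out:

```python
import re


def decode_systemd_escapes(fragment):
    """Decode systemd-style hex escapes (backslash, x, two hex digits)."""
    return re.sub(
        r"\\x([0-9a-fA-F]{2})",
        lambda m: chr(int(m.group(1), 16)),
        fragment,
    )


# Only the visible tail of the unit name from the console log.
tail = r"5c\x2da829\x2d37e4267c9978"
decoded = decode_systemd_escapes(tail)

# PARTUUID of the EFI partition on the failing node, as reported by lsblk.
partuuid = "68fe0417-0a56-445c-a829-37e4267c9978"

print(decoded)
print(partuuid.endswith(decoded))
```

The decoded tail lines up with the end of the EFI partition's partuuid, which is consistent with the boot waiting on a device looked up by that value.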

On working node deployment:
Feb 24 13:55:37 host-192-168-24-34 ironic-python-agent[669]: 2022-02-24 13:55:37.379 669 DEBUG ironic_python_agent.extensions.image [-] Added entry to /etc/fstab for EFI partition auto-mount with uuid 481E-F2FD _append_uefi_to_fstab /usr/lib/python3.6/site-packages/ironic_python_agent/extensions/image.py:909

On failing node deployment:
Feb 24 13:55:37 host-192-168-24-8 ironic-python-agent[670]: 2022-02-24 13:55:37.075 670 DEBUG ironic_python_agent.extensions.image [-] Added entry to /etc/fstab for EFI partition auto-mount with uuid 68fe0417-0a56-445c-a829-37e4267c9978 _append_uefi_to_fstab /usr/lib/python3.6/site-packages/ironic_python_agent/extensions/image.py:909


The difference is the UUID used in fstab. On the failing node, the EFI partition's "partuuid" (as listed by lsblk) is used. On the working node, its "uuid" (as listed by lsblk) is used.


Version-Release number of selected component (if applicable):
rhosp-director-images-ipa-x86_64-17.0-20220216.2.el8ost.noarch

How reproducible:
10%

Steps to Reproduce:
1. Attempt to deploy many nodes in parallel
2. Most nodes deploy successfully

Actual results:
fstab entry written:

 UUID=68fe0417-0a56-445c-a829-37e4267c9978 /boot/efi       vfat    umask=0077      0       1

Console logs:
  Timed out waiting for device dev-di…5c\x2da829\x2d37e4267c9978.device.


Expected results:
fstab entry written:

 UUID=46A5-0817  /boot/efi       vfat    umask=0077      0       1

Additional info:
qemu-nbd --connect=/dev/nbd0 /root/debug_switchroot_issue/compute-3-disk1.qcow2
Using `lsblk --output-all --json /dev/nbd0 | jq .`, the EFI partition shows:
 "partuuid": "68fe0417-0a56-445c-a829-37e4267c9978" 
 "uuid": "482E-AAF9" 

It seems IPA is not consistently using "uuid".
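
A quick way to check an image for this mismatch is to compare the fstab `UUID=` value against both lsblk fields. A minimal sketch using the values from this report; the function name and return strings are made up for illustration and are not part of IPA:

```python
def classify_fstab_spec(fstab_line, partition):
    """Report which lsblk field the fstab 'UUID=' value actually matches."""
    spec = fstab_line.split()[0]
    key, _, value = spec.partition("=")
    if key != "UUID":
        return "not a UUID= entry"
    value = value.lower()
    if value == (partition.get("uuid") or "").lower():
        return "filesystem uuid"
    if value == (partition.get("partuuid") or "").lower():
        return "partuuid written under UUID="
    return "no match"


# EFI partition fields as shown by lsblk --json on the failing node's disk.
efi = {
    "partuuid": "68fe0417-0a56-445c-a829-37e4267c9978",
    "uuid": "482E-AAF9",
}

bad = "UUID=68fe0417-0a56-445c-a829-37e4267c9978 /boot/efi vfat umask=0077 0 1"
good = "UUID=482E-AAF9 /boot/efi vfat umask=0077 0 1"

print(classify_fstab_spec(bad, efi))
print(classify_fstab_spec(good, efi))
```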

Comment 2 Harald Jensås 2022-02-25 17:19:53 UTC
OK DEPLOY
#########
Feb 24 13:55:32 host-192-168-24-34 ironic-python-agent[669]: 2022-02-24 13:55:32.694 669 DEBUG ironic_lib.utils [-] Command stdout is: "KNAME="vda" UUID="" PARTUUID="" TYPE="disk" LABEL=""
                                                             KNAME="vda1" UUID="481E-F2FD" PARTUUID="b7dac7a5-eefa-4966-bbcd-358f074618f3" TYPE="part" LABEL="efi-part"
                                                             KNAME="vda2" UUID="2022-02-24-18-48-59-00" PARTUUID="6be3281a-34e7-4ab0-b30a-0417a06f101b" TYPE="part" LABEL="config-2"
                                                             KNAME="vda3" UUID="e8bf71d2-a8ef-4176-a277-fd26220ef3fb" PARTUUID="3d0b48e3-00e4-40c1-99f0-653f5f694c01" TYPE="part" LABEL="img-rootfs"
                                                             " _log /usr/lib/python3.6/site-packages/ironic_lib/utils.py:99
Feb 24 13:55:32 host-192-168-24-34 ironic-python-agent[669]: 2022-02-24 13:55:32.704 669 DEBUG ironic_lib.utils [-] Command stderr is: "" _log /usr/lib/python3.6/site-packages/ironic_lib/utils.py:100
Feb 24 13:55:32 host-192-168-24-34 ironic-python-agent[669]: 2022-02-24 13:55:32.707 669 DEBUG ironic_python_agent.extensions.image [-] Partition 481E-F2FD found on device /dev/vda _get_partition /usr/lib/python3.6/site-packages/ironic_python_agent/extensions/image.py:99

FAILED DEPLOY
#############
Feb 24 13:55:32 host-192-168-24-8 ironic-python-agent[670]: 2022-02-24 13:55:32.632 670 DEBUG oslo_concurrency.processutils [-] CMD "lsblk -PbioKNAME,UUID,PARTUUID,TYPE,LABEL /dev/vda" returned: 0 in 0.010s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423
Feb 24 13:55:32 host-192-168-24-8 ironic-python-agent[670]: 2022-02-24 13:55:32.636 670 DEBUG ironic_lib.utils [-] Command stdout is: "KNAME="vda" UUID="" PARTUUID="" TYPE="disk" LABEL=""
                                                            KNAME="vda1" UUID="482E-AAF9" PARTUUID="68fe0417-0a56-445c-a829-37e4267c9978" TYPE="part" LABEL="efi-part"
                                                            KNAME="vda2" UUID="2022-02-24-18-49-07-00" PARTUUID="2a41400b-765e-4fa3-98e8-b6d19f86ae25" TYPE="part" LABEL="config-2"
                                                            KNAME="vda3" UUID="e8bf71d2-a8ef-4176-a277-fd26220ef3fb" PARTUUID="3bec0aa9-f641-4a58-b5ba-14bb2147b2e5" TYPE="part" LABEL="img-rootfs"
                                                            " _log /usr/lib/python3.6/site-packages/ironic_lib/utils.py:99
Feb 24 13:55:32 host-192-168-24-8 ironic-python-agent[670]: 2022-02-24 13:55:32.645 670 DEBUG ironic_lib.utils [-] Command stderr is: "" _log /usr/lib/python3.6/site-packages/ironic_lib/utils.py:100
Feb 24 13:55:32 host-192-168-24-8 ironic-python-agent[670]: 2022-02-24 13:55:32.647 670 DEBUG ironic_python_agent.extensions.image [-] Partition 68fe0417-0a56-445c-a829-37e4267c9978 found on device /dev/vda _get_partition /usr/lib/python3.6/site-packages/ironic_python_agent/extensions/image.py:103

Comment 3 Julia Kreger 2022-02-25 17:52:47 UTC
Failed:

Feb 24 13:55:25 host-192-168-24-8 ironic-python-agent[670]: 2022-02-24 13:55:25.881 670 DEBUG root [-] Executing command: image.install_bootloader with args: {'root_uuid': 'e8bf71d2-a8ef-4176-a277-fd26220ef3fb', 'efi_system_part_uuid': '68fe0417-0a56-445c-a829-37e4267c9978', 'prep_boot_part_uuid': None, 'target_boot_mode': 'uefi'} execute_command /usr/lib/python3.6/site-packages/ironic_python_agent/extensions/base.py:255

Working:

Feb 24 13:55:01 host-192-168-24-41 ironic-python-agent[678]: 2022-02-24 13:55:01.853 678 DEBUG root [-] Executing command: image.install_bootloader with args: {'root_uuid': 'e8bf71d2-a8ef-4176-a277-fd26220ef3fb', 'efi_system_part_uuid': '46E9-A0BD', 'prep_boot_part_uuid': None, 'target_boot_mode': 'uefi'} execute_command /usr/lib/python3.6/site-packages/ironic_python_agent/extensions/base.py:255

Looks like we're selecting the partuuid instead of the uuid, and that is why the mount is failing. Interestingly enough, AIUI this should work with just the partuuid.

The selected value comes from ironic_lib's disk_utils.

Failed node:

Feb 24 13:54:59 host-192-168-24-8 ironic-python-agent[670]: 2022-02-24 13:54:59.050 670 DEBUG ironic_lib.disk_utils [-] Falling back to partition UUID as the block device UUID was not found while examining /dev/vda3 block_uuid /usr/lib/python3.6/site-packages/ironic_lib/disk_utils.py:563

UUID lookup failed for a total of 2 partitions on the node that failed, versus only one (the root fs) on the node that succeeded.

The code explicitly tries to return the UUID field, and then falls back to PARTUUID.
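
To illustrate why the fallback is a problem for fstab: once the value is returned as a bare string, the caller can no longer tell which kind of identifier it holds. A minimal sketch of a fallback that keeps the kind alongside the value. This is not the ironic_lib code and not the fix that was merged; `pick_identifier` and `fstab_line` are hypothetical names, and it relies only on fstab accepting both `UUID=` and `PARTUUID=` specifiers:

```python
def pick_identifier(lsblk_fields):
    """Prefer the filesystem UUID, fall back to PARTUUID, and keep track of which."""
    uuid = lsblk_fields.get("UUID", "")
    if uuid:
        return ("UUID", uuid)
    partuuid = lsblk_fields.get("PARTUUID", "")
    if partuuid:
        return ("PARTUUID", partuuid)
    return None


def fstab_line(identifier, mountpoint="/boot/efi"):
    """Build an fstab entry whose specifier prefix matches the identifier kind."""
    kind, value = identifier
    return f"{kind}={value}\t{mountpoint}\tvfat\tumask=0077\t0\t1"


# Fields as lsblk reported them on the failing node (empty UUID).
failing = {"UUID": "", "PARTUUID": "68fe0417-0a56-445c-a829-37e4267c9978"}
# Fields as lsblk reported them on a working node.
working = {"UUID": "481E-F2FD", "PARTUUID": "b7dac7a5-eefa-4966-bbcd-358f074618f3"}

print(fstab_line(pick_identifier(failing)))
print(fstab_line(pick_identifier(working)))
```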

Comment 4 Julia Kreger 2022-02-25 20:35:10 UTC
Feb 24 13:54:59 host-192-168-24-8 ironic-python-agent[670]: 2022-02-24 13:54:59.072 670 DEBUG oslo_concurrency.processutils [-] CMD "lsblk /dev/vda1 --pairs --bytes --ascii --nodeps --output UUID,PARTUUID" returned: 0 in 0.016s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423
Feb 24 13:54:59 host-192-168-24-8 ironic-python-agent[670]: 2022-02-24 13:54:59.079 670 DEBUG ironic_lib.utils [-] Command stdout is: "UUID="" PARTUUID="68fe0417-0a56-445c-a829-37e4267c9978"
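
For reference, the `--pairs` output above can be parsed to spot the empty UUID field. A minimal sketch, assuming the output is a single line of KEY="value" pairs; `parse_lsblk_pairs` is a hypothetical helper, not an ironic_lib function:

```python
import shlex


def parse_lsblk_pairs(line):
    """Parse one line of lsblk --pairs output into a dict of field -> value."""
    return dict(token.split("=", 1) for token in shlex.split(line))


# Output captured on the failing node for /dev/vda1.
stdout = 'UUID="" PARTUUID="68fe0417-0a56-445c-a829-37e4267c9978"'
fields = parse_lsblk_pairs(stdout)

if not fields["UUID"]:
    print("UUID is empty, only PARTUUID is available:", fields["PARTUUID"])
```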

Comment 7 Julia Kreger 2022-03-14 18:32:26 UTC
Fix cherry-picked downstream and in downstream review.

Comment 17 errata-xmlrpc 2022-09-21 12:19:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.0 (Wallaby)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2022:6543