Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1824844

Summary: Failed deployment of whole disk images with GPT in UEFI mode with local boot
Product: Red Hat OpenStack
Reporter: Bob Fournier <bfournie>
Component: openstack-ironic-python-agent
Assignee: RHOS Maint <rhos-maint>
Status: CLOSED ERRATA
QA Contact: Alistair Tonner <atonner>
Severity: high
Docs Contact:
Priority: high
Version: 16.0 (Train)
CC: bfournie, dtantsur, mburns, rpittau, slinaber
Target Milestone: beta
Keywords: Triaged
Target Release: 16.1 (Train on RHEL 8.2)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openstack-ironic-python-agent-5.0.2-0.20200521164433.7bad00c.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-07-29 07:51:40 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Bob Fournier 2020-04-16 14:35:11 UTC
Description of problem:

Deployment fails when installing a whole disk image built with GPT in UEFI mode with local boot.

12:21 <yolanda> i see in conductor, some logs... None"; prepare_image: result "{'result': 'prepare_image: image (rhcos-44.81.202003062006-0-compressed.x86_64.qcow2) written to device /dev/sda root_uuid=0x00000000'}", 2020-03-23 10:39:17.742 1 DEBUG ironic.drivers.modules.agent_base [-] Installing the bootloader for node ae7c756c-3b19-4d47-a203-fae69d2f065e on partition 0x00000000, 2020-03-23 10:39:17.903 1 DEBUG ironic.drivers.modules.agent_client [-] Agent
12:21 <yolanda> command image.install_bootloader for node ae7c756c-3b19-4d47-a203-fae69d2f065e returned result None, error {'message': 'Error finding the disk or partition device to deploy the image onto: No partition with UUID 0x00000000 found on device /dev/sda', 'code': 404, 'type': 'DeviceNotFound', 'details': 'No partition with UUID 0x00000000 found on device /dev/sda'},
The ID 0x00000000 comes from ironic-lib's get_disk_identifier, which appears to be explicitly MBR-centric. Our UEFI CI jobs also return 0x00000000 there, but I suspect this value is ignored or handled differently for netboot.
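
For illustration, a minimal Python sketch of why an MBR-style identifier comes back as 0x00000000 on a GPT disk. This is not ironic-lib's code: the function name mbr_disk_signature and the simulated sectors are made up for the example. It assumes the identifier is the 4-byte disk signature at offset 440 of the first sector, a field that a GPT protective MBR normally leaves zeroed.

```python
import struct

MBR_SIZE = 512
MBR_SIGNATURE_OFFSET = 0x1B8  # byte 440: 4-byte disk signature of a classic MBR


def mbr_disk_signature(first_sector: bytes) -> str:
    """Format the MBR disk signature of a first sector as 0x%08x."""
    if len(first_sector) < MBR_SIZE:
        raise ValueError("need at least the first 512 bytes of the disk")
    (signature,) = struct.unpack_from("<I", first_sector, MBR_SIGNATURE_OFFSET)
    return "0x%08x" % signature


# Simulated first sectors (assumptions for the example, no real disk is read).
# A GPT protective MBR: signature field left at zero, one partition entry of
# type 0xEE, and the usual 0x55AA boot signature at the end of the sector.
protective_mbr = bytearray(MBR_SIZE)
protective_mbr[450] = 0xEE            # type byte of the first partition entry
protective_mbr[510:512] = b"\x55\xaa"

# A legacy MBR disk with an arbitrary, made-up signature.
legacy_mbr = bytearray(MBR_SIZE)
legacy_mbr[440:444] = struct.pack("<I", 0x9F1B2C3D)
legacy_mbr[510:512] = b"\x55\xaa"

print(mbr_disk_signature(bytes(protective_mbr)))  # 0x00000000
print(mbr_disk_signature(bytes(legacy_mbr)))      # 0x9f1b2c3d
```

Under these assumptions the value identifies nothing on a GPT disk, so any later lookup keyed on it cannot succeed.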

For whole disk images with legacy boot we simply skip installing the bootloader. This leaves us with UEFI deployments, where we try to find a partition with ID 0x00000000. What we actually have is:

Mar 23 10:39:59 master-2 ironic-python-agent[562]: 2020-03-23 10:39:59.936 562 DEBUG ironic_lib.utils [-] Execution completed, command line is "lsblk -PbioKNAME,UUID,PARTUUID,TYPE /dev/sda" execute /usr/lib/python2.7/site-packages/ironic_lib/utils.py:101
Mar 23 10:39:59 master-2 ironic-python-agent[562]: 2020-03-23 10:39:59.937 562 DEBUG ironic_lib.utils [-] Command stdout is: "KNAME="sda" UUID="" PARTUUID="" TYPE="disk"
                                                   KNAME="sda1" UUID="6cbabcc8-9f5f-46f5-8c44-a52a334c4636" PARTUUID="f5b7a905-8d18-4caf-9c04-2022b0a0b560" TYPE="part"
                                                   KNAME="sda2" UUID="EC5D-23FD" PARTUUID="2ba267c1-603e-41b6-a468-b1f467bf991f" TYPE="part"
                                                   KNAME="sda3" UUID="" PARTUUID="44976824-89ad-40b9-b7da-b062e902d7b8" TYPE="part"
                                                   KNAME="sda4" UUID="00000000-0000-4000-a000-000000000002" PARTUUID="7f54e0c2-f228-46f3-9201-0931488f1aab" TYPE="part"
                                                   KNAME="sda5" UUID="2020-03-23-10-37-15-00" PARTUUID="5965c89f-cf89-4399-aff9-3b973c684c8a" TYPE="part"
                                                   " execute /usr/lib/python2.7/site-packages/ironic_lib/utils.py:103
Indeed, nothing here matches 0x00000000. The partitions on this disk are:

Mar 23 10:39:59 master-2 ironic-python-agent[562]: 2020-03-23 10:39:59.571 562 DEBUG ironic_lib.utils [-] Command stdout is: "BYT;
                                                   /dev/sda:476940MiB:scsi:512:512:gpt:ATA Samsung SSD 860:;
                                                   1:1.00MiB:385MiB:384MiB:ext4:boot:;
                                                   2:385MiB:512MiB:127MiB:fat16:EFI-SYSTEM:boot;
                                                   3:512MiB:513MiB:1.00MiB::BIOS-BOOT:bios_grub;
                                                   4:513MiB:3416MiB:2903MiB::luks_root:;
                                                   5:476876MiB:476940MiB:64.0MiB:::;
                                                   " execute /usr/lib/python2.7/site-packages/ironic_lib/utils.py:1033
Even if we did find the LUKS partition, we wouldn't be able to mount it, so it's a dead end.
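
The failed lookup can be reproduced offline against the lsblk output quoted above. The sketch below is an assumption about the shape of the lookup (match the requested ID against UUID and PARTUUID of each partition), not the agent's actual code; parse_lsblk_pairs and find_partition are hypothetical names.

```python
import shlex

# lsblk -P output copied from the agent log above.
LSBLK_OUTPUT = '''KNAME="sda" UUID="" PARTUUID="" TYPE="disk"
KNAME="sda1" UUID="6cbabcc8-9f5f-46f5-8c44-a52a334c4636" PARTUUID="f5b7a905-8d18-4caf-9c04-2022b0a0b560" TYPE="part"
KNAME="sda2" UUID="EC5D-23FD" PARTUUID="2ba267c1-603e-41b6-a468-b1f467bf991f" TYPE="part"
KNAME="sda3" UUID="" PARTUUID="44976824-89ad-40b9-b7da-b062e902d7b8" TYPE="part"
KNAME="sda4" UUID="00000000-0000-4000-a000-000000000002" PARTUUID="7f54e0c2-f228-46f3-9201-0931488f1aab" TYPE="part"
KNAME="sda5" UUID="2020-03-23-10-37-15-00" PARTUUID="5965c89f-cf89-4399-aff9-3b973c684c8a" TYPE="part"
'''


def parse_lsblk_pairs(output):
    """Turn KEY="value" lines into a list of dicts, one per block device."""
    devices = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        # shlex removes the quotes, leaving tokens such as 'UUID=' or 'KNAME=sda'.
        devices.append(dict(field.split("=", 1) for field in shlex.split(line)))
    return devices


def find_partition(devices, wanted_id):
    """Return the /dev path of the partition whose UUID or PARTUUID matches."""
    for dev in devices:
        if dev["TYPE"] == "part" and wanted_id in (dev["UUID"], dev["PARTUUID"]):
            return "/dev/" + dev["KNAME"]
    return None


devices = parse_lsblk_pairs(LSBLK_OUTPUT)
print(find_partition(devices, "0x00000000"))  # None: no partition carries this ID
print(find_partition(devices, "EC5D-23FD"))   # /dev/sda2
```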

We need to figure out how UEFI whole disk installation can work when the efibootmgr approach does not work.
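
One possible starting point, sketched here only as an illustration and not as a description of the fix that eventually shipped, is to locate the EFI system partition from the partition table instead of relying on a root UUID. The example parses the machine-readable parted output quoted above; find_esp_number is a hypothetical helper, and it assumes that on a GPT disk the parted "boot" (or "esp") flag marks the EFI system partition.

```python
# parted machine-readable output copied from the agent log above.
PARTED_OUTPUT = """BYT;
/dev/sda:476940MiB:scsi:512:512:gpt:ATA Samsung SSD 860:;
1:1.00MiB:385MiB:384MiB:ext4:boot:;
2:385MiB:512MiB:127MiB:fat16:EFI-SYSTEM:boot;
3:512MiB:513MiB:1.00MiB::BIOS-BOOT:bios_grub;
4:513MiB:3416MiB:2903MiB::luks_root:;
5:476876MiB:476940MiB:64.0MiB:::;
"""


def find_esp_number(output):
    """Return the number of the first partition flagged boot/esp on a GPT disk."""
    lines = [ln.strip().rstrip(";") for ln in output.splitlines() if ln.strip()]
    # lines[0] is the "BYT" unit marker, lines[1] describes the disk itself.
    disk_fields = lines[1].split(":")
    if disk_fields[5] != "gpt":
        return None
    for line in lines[2:]:
        fields = line.split(":")
        # Layout: number:start:end:size:filesystem:name:flags
        number, flags = fields[0], fields[-1]
        flag_set = {flag.strip() for flag in flags.split(",") if flag.strip()}
        if flag_set & {"boot", "esp"}:
            return int(number)
    return None


print(find_esp_number(PARTED_OUTPUT))  # 2
```

Partition 1 is only named "boot" and carries no flags, so it is skipped; partition 2 (fat16, EFI-SYSTEM) is the one flagged.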

Comment 3 Bob Fournier 2020-06-16 20:30:46 UTC
Verified fix is available in compose RHOS-16.1-RHEL-8-20200611.n.0

Comment 7 errata-xmlrpc 2020-07-29 07:51:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3148