Bug 1824844 - Failed deployment of whole disk images with GPT in UEFI mode with local boot
Summary: Failed deployment of whole disk images with GPT in UEFI mode with local boot
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-ironic-python-agent
Version: 16.0 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: beta
Target Release: 16.1 (Train on RHEL 8.2)
Assignee: RHOS Maint
QA Contact: Alistair Tonner
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-04-16 14:35 UTC by Bob Fournier
Modified: 2020-07-29 07:52 UTC (History)

Fixed In Version: openstack-ironic-python-agent-5.0.2-0.20200521164433.7bad00c.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-07-29 07:51:40 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
OpenStack Storyboard | 2007455 | 0 | None | None | None | 2020-04-16 14:38:50 UTC
OpenStack gerrit | 715024 | 0 | None | MERGED | Return false for MBR bootloader check on UEFI machines | 2020-06-22 09:10:50 UTC
OpenStack gerrit | 720549 | 0 | None | MERGED | A boot partition on a GPT disk should be considered an EFI partition | 2020-06-22 09:10:50 UTC
Red Hat Product Errata | RHBA-2020:3148 | 0 | None | None | None | 2020-07-29 07:52:02 UTC

Description Bob Fournier 2020-04-16 14:35:11 UTC
Description of problem:

Deployment fails when installing a whole disk image built with a GPT partition table in UEFI mode with local boot.

12:21 <yolanda> i see in conductor, some logs... None"; prepare_image: result "{'result': 'prepare_image: image (rhcos-44.81.202003062006-0-compressed.x86_64.qcow2) written to device /dev/sda root_uuid=0x00000000'}", 2020-03-23 10:39:17.742 1 DEBUG ironic.drivers.modules.agent_base [-] Installing the bootloader for node ae7c756c-3b19-4d47-a203-fae69d2f065e on partition 0x00000000, 2020-03-23 10:39:17.903 1 DEBUG ironic.drivers.modules.agent_client [-] Agent
12:21 <yolanda> command image.install_bootloader for node ae7c756c-3b19-4d47-a203-fae69d2f065e returned result None, error {'message': 'Error finding the disk or partition device to deploy the image onto: No partition with UUID 0x00000000 found on device /dev/sda', 'code': 404, 'type': 'DeviceNotFound', 'details': 'No partition with UUID 0x00000000 found on device /dev/sda'},
The ID 0x00000000 comes from ironic-lib's get_disk_identifier, which appears to be explicitly MBR-centric. Our UEFI CI jobs also get 0x00000000 from it, but I suspect the value is ignored or handled differently for netboot.
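
For context, here is a minimal sketch of why an MBR-style identifier comes out as all zeros on this disk. It assumes the identifier is the 32-bit little-endian disk signature stored at byte offset 440 of the first sector, a field that a GPT protective MBR normally leaves zeroed. The names `mbr_disk_identifier` and `make_sector` are hypothetical, and this is not the ironic-lib implementation.

```python
import struct

# Assumption: the MBR disk signature lives at byte offset 440 (0x1B8).
MBR_SIGNATURE_OFFSET = 440


def mbr_disk_identifier(first_sector: bytes) -> str:
    """Format the 4-byte MBR disk signature as a 0x-prefixed hex string."""
    if len(first_sector) < MBR_SIGNATURE_OFFSET + 4:
        raise ValueError("need at least the first 444 bytes of the disk")
    (signature,) = struct.unpack_from("<I", first_sector, MBR_SIGNATURE_OFFSET)
    return "0x%08x" % signature


def make_sector(signature: int) -> bytes:
    """Build a synthetic 512-byte first sector (hypothetical test helper)."""
    sector = bytearray(512)
    struct.pack_into("<I", sector, MBR_SIGNATURE_OFFSET, signature)
    sector[510:512] = b"\x55\xaa"  # MBR boot signature
    return bytes(sector)


# A protective MBR with a zeroed signature, as is typical on a GPT disk.
print(mbr_disk_identifier(make_sector(0)))
# A classic MBR disk carrying an arbitrary, made-up signature.
print(mbr_disk_identifier(make_sector(0x9A3B1C2D)))
```

Under that assumption the zero value carries no information about any partition on a GPT disk, so it cannot be used as a lookup key.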

For whole disk images with legacy boot we simply skip installing the bootloader. This leaves us with UEFI deployments, where we try to find a partition with ID 0x00000000. What we actually have is:

Mar 23 10:39:59 master-2 ironic-python-agent[562]: 2020-03-23 10:39:59.936 562 DEBUG ironic_lib.utils [-] Execution completed, command line is "lsblk -PbioKNAME,UUID,PARTUUID,TYPE /dev/sda" execute /usr/lib/python2.7/site-packages/ironic_lib/utils.py:101
Mar 23 10:39:59 master-2 ironic-python-agent[562]: 2020-03-23 10:39:59.937 562 DEBUG ironic_lib.utils [-] Command stdout is: "KNAME="sda" UUID="" PARTUUID="" TYPE="disk"
                                                   KNAME="sda1" UUID="6cbabcc8-9f5f-46f5-8c44-a52a334c4636" PARTUUID="f5b7a905-8d18-4caf-9c04-2022b0a0b560" TYPE="part"
                                                   KNAME="sda2" UUID="EC5D-23FD" PARTUUID="2ba267c1-603e-41b6-a468-b1f467bf991f" TYPE="part"
                                                   KNAME="sda3" UUID="" PARTUUID="44976824-89ad-40b9-b7da-b062e902d7b8" TYPE="part"
                                                   KNAME="sda4" UUID="00000000-0000-4000-a000-000000000002" PARTUUID="7f54e0c2-f228-46f3-9201-0931488f1aab" TYPE="part"
                                                   KNAME="sda5" UUID="2020-03-23-10-37-15-00" PARTUUID="5965c89f-cf89-4399-aff9-3b973c684c8a" TYPE="part"
                                                   " execute /usr/lib/python2.7/site-packages/ironic_lib/utils.py:103
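
To make the lookup concrete, the following sketch parses the `lsblk -P` key/value pairs from the log above and searches the partitions for a given identifier. Matching against both UUID and PARTUUID is an assumption about how the agent performs the lookup, and the function names are hypothetical.

```python
import shlex

# lsblk output copied from the agent log above.
LSBLK_OUTPUT = '''KNAME="sda" UUID="" PARTUUID="" TYPE="disk"
KNAME="sda1" UUID="6cbabcc8-9f5f-46f5-8c44-a52a334c4636" PARTUUID="f5b7a905-8d18-4caf-9c04-2022b0a0b560" TYPE="part"
KNAME="sda2" UUID="EC5D-23FD" PARTUUID="2ba267c1-603e-41b6-a468-b1f467bf991f" TYPE="part"
KNAME="sda3" UUID="" PARTUUID="44976824-89ad-40b9-b7da-b062e902d7b8" TYPE="part"
KNAME="sda4" UUID="00000000-0000-4000-a000-000000000002" PARTUUID="7f54e0c2-f228-46f3-9201-0931488f1aab" TYPE="part"
KNAME="sda5" UUID="2020-03-23-10-37-15-00" PARTUUID="5965c89f-cf89-4399-aff9-3b973c684c8a" TYPE="part"
'''


def parse_lsblk_pairs(output):
    """Turn `lsblk -P` output into a list of dicts, one per device line."""
    devices = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        # shlex strips the double quotes, leaving tokens such as KNAME=sda.
        devices.append(dict(tok.split("=", 1) for tok in shlex.split(line)))
    return devices


def find_partition(devices, identifier):
    """Return the device path whose UUID or PARTUUID equals identifier."""
    for dev in devices:
        if dev.get("TYPE") != "part":
            continue
        if identifier in (dev.get("UUID"), dev.get("PARTUUID")):
            return "/dev/" + dev["KNAME"]
    return None


devices = parse_lsblk_pairs(LSBLK_OUTPUT)
print(find_partition(devices, "0x00000000"))  # the identifier from the error
print(find_partition(devices, "EC5D-23FD"))   # the FAT filesystem UUID on sda2
```
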
Indeed, nothing here matches 0x00000000. The partition layout is:

Mar 23 10:39:59 master-2 ironic-python-agent[562]: 2020-03-23 10:39:59.571 562 DEBUG ironic_lib.utils [-] Command stdout is: "BYT;
                                                   /dev/sda:476940MiB:scsi:512:512:gpt:ATA Samsung SSD 860:;
                                                   1:1.00MiB:385MiB:384MiB:ext4:boot:;
                                                   2:385MiB:512MiB:127MiB:fat16:EFI-SYSTEM:boot;
                                                   3:512MiB:513MiB:1.00MiB::BIOS-BOOT:bios_grub;
                                                   4:513MiB:3416MiB:2903MiB::luks_root:;
                                                   5:476876MiB:476940MiB:64.0MiB:::;
                                                   " execute /usr/lib/python2.7/site-packages/ironic_lib/utils.py:1033
Even if we did find the LUKS partition, we wouldn't be able to mount it, so it's a dead end.

We need to figure out how UEFI whole disk installation can work when the efibootmgr approach does not work.
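
One direction, suggested by the title of the linked gerrit change 720549, is to treat a boot-flagged partition on a GPT disk as the EFI system partition instead of looking it up by a root UUID. The sketch below applies that idea to the machine-readable parted output shown above. The field positions, the accepted flag names (`boot`, `esp`) and the function name are assumptions for illustration, not the merged change.

```python
# parted machine-readable output copied from the agent log above.
PARTED_OUTPUT = """BYT;
/dev/sda:476940MiB:scsi:512:512:gpt:ATA Samsung SSD 860:;
1:1.00MiB:385MiB:384MiB:ext4:boot:;
2:385MiB:512MiB:127MiB:fat16:EFI-SYSTEM:boot;
3:512MiB:513MiB:1.00MiB::BIOS-BOOT:bios_grub;
4:513MiB:3416MiB:2903MiB::luks_root:;
5:476876MiB:476940MiB:64.0MiB:::;
"""


def find_efi_partition(parted_output):
    """Return the number of the first boot/esp-flagged partition on a GPT disk.

    Returns None for non-GPT disks or when no partition carries such a flag.
    """
    lines = [ln.strip().rstrip(";") for ln in parted_output.splitlines() if ln.strip()]
    # lines[0] is the "BYT" unit header, lines[1] describes the disk itself.
    disk_fields = lines[1].split(":")
    if disk_fields[5] != "gpt":  # assumed position of the partition table type
        return None
    for line in lines[2:]:
        fields = line.split(":")
        number, flags = fields[0], fields[6]
        flag_set = {f.strip() for f in flags.split(",") if f.strip()}
        # Partition 1 is merely *named* "boot"; only the flags column counts.
        if flag_set & {"boot", "esp"}:
            return int(number)
    return None


print(find_efi_partition(PARTED_OUTPUT))
```

On this disk the sketch selects partition 2 (EFI-SYSTEM) and never needs to touch the LUKS root partition.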

Comment 3 Bob Fournier 2020-06-16 20:30:46 UTC
Verified fix is available in compose RHOS-16.1-RHEL-8-20200611.n.0

Comment 7 errata-xmlrpc 2020-07-29 07:51:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3148

