Bug 2110919 - uefi_pxe_bootfile_name refers to ipxe.efi even though ipxe.efi has been replaced by snponly.efi
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 16.2 (Train)
Hardware: x86_64
OS: Linux
low
medium
Target Milestone: ---
: ---
Assignee: Julia Kreger
QA Contact: Joe H. Rahme
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-07-26 08:00 UTC by yatanaka
Modified: 2023-08-08 19:24 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-12-05 21:26:34 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 851039 0 None NEW Fix ilo boot interface order 2022-07-26 14:56:23 UTC
Red Hat Issue Tracker OSP-17853 0 None None None 2022-07-26 08:03:32 UTC
Red Hat Knowledge Base (Solution) 6971038 0 None None None 2022-08-08 03:11:03 UTC

Description yatanaka 2022-07-26 08:00:35 UTC
Description of problem:

ipxe.efi was replaced by snponly.efi in the following change:
  https://opendev.org/openstack/puppet-ironic/commit/3864e15998b5b1eec7d2b1b4911add9bb899fdb8
It seems that this change was backported to RHOSP 16.2.3.

After the above change, ipxe.efi no longer exists; snponly.efi exists in its place.
~~~
<RHOSP 16.2.3 undercloud>
/var/lib/ironic/tftpboot:
total 432 
-rwxr--r--. 1 42422 42422  25076 Jul 20 17:42 chain.c32
-rwxr--r--. 1 42422 42422 116064 Jul 20 17:42 ldlinux.c32
-rw-r--r--. 1 42422 42422     37 Jul 20 17:42 map-file
drwxr-xr-x. 2 42422 42422    114 Jul 20 21:46 master_images
-rwxr--r--. 1 42422 42422  42376 Jul 20 17:42 pxelinux.0
drwxr-xr-x. 2 42422 42422     21 Jul 21 11:17 pxelinux.cfg
-rwxr--r--. 1 42422 42422 170048 Jul 20 17:42 snponly.efi <==== this file exists instead of ipxe.efi
-rwxr--r--. 1 42422 42422  73125 Jul 20 17:42 undionly.kpxe
~~~

`uefi_ipxe_bootfile_name` in ironic.conf was changed to snponly.efi.
However, `uefi_pxe_bootfile_name` in ironic.conf still refers to ipxe.efi.
~~~
</var/lib/config-data/puppet-generated/ironic/etc/ironic/ironic.conf of undercloud>
#pxe_bootfile_name = pxelinux.0
uefi_pxe_bootfile_name=ipxe.efi     <=========(*)
#ipxe_bootfile_name = undionly.kpxe
uefi_ipxe_bootfile_name=snponly.efi <=========(*)
#pxe_bootfile_name_by_arch =
#ipxe_bootfile_name_by_arch =
~~~
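
As a rough illustration, the mismatch above can be checked mechanically. This is a minimal sketch, not part of ironic or TripleO: it assumes both options are set in the `[pxe]` section of ironic.conf and that the TFTP root is a plain directory. `missing_bootfiles` and `demo` are hypothetical helper names.

```python
import configparser
import os
import tempfile

# Options inspected by this sketch; both appear in the ironic.conf excerpt above.
BOOTFILE_OPTIONS = ("uefi_pxe_bootfile_name", "uefi_ipxe_bootfile_name")


def missing_bootfiles(conf_text, tftp_root):
    """Return {option: filename} for bootfiles that are absent from tftp_root.

    Assumes the options live in the [pxe] section (an assumption of this
    sketch). Commented-out options are skipped by configparser.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    missing = {}
    for option in BOOTFILE_OPTIONS:
        name = parser.get("pxe", option, fallback=None)
        if name and not os.path.exists(os.path.join(tftp_root, name)):
            missing[option] = name
    return missing


# Same values as the excerpt above, with an explicit [pxe] section header.
SAMPLE_CONF = """\
[pxe]
#pxe_bootfile_name = pxelinux.0
uefi_pxe_bootfile_name=ipxe.efi
#ipxe_bootfile_name = undionly.kpxe
uefi_ipxe_bootfile_name=snponly.efi
"""


def demo():
    # Recreate a TFTP root that contains snponly.efi but no ipxe.efi.
    with tempfile.TemporaryDirectory() as root:
        open(os.path.join(root, "snponly.efi"), "wb").close()
        return missing_bootfiles(SAMPLE_CONF, root)


if __name__ == "__main__":
    print(demo())
```

On a real undercloud the same function could be pointed at the generated ironic.conf and /var/lib/ironic/tftpboot, assuming those paths match the ones shown in this report.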

`uefi_pxe_bootfile_name` is propagated to Neutron's DHCP configuration during a deployment.
~~~
</var/lib/neutron/dhcp/aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa/opts>
  :
tag:port-bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb,tag:ipxe,67,http://10.0.0.1:8088/boot.ipxe
tag:port-bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb,66,10.0.0.1
tag:port-bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb,tag:!ipxe,67,ipxe.efi <===============================(*)
tag:port-bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb,150,10.0.0.1
tag:port-bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb,option:server-ip-address,10.0.0.1
  :
~~~
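
To see which bootfile a given client would be offered, the opts lines can be read as "tags, option, value". The sketch below is a simplified model of that tag matching, not dnsmasq's actual implementation: it assumes the `ipxe` tag is set only for clients that are already running iPXE, and it ignores the per-port tag. `parse_opts_line` and `bootfile_for` are hypothetical helper names.

```python
def parse_opts_line(line):
    """Split one opts line into (tags, option, value); None if not an option line."""
    fields = line.strip().split(",")
    tags = []
    # Leading "tag:..." fields are conditions; the rest is option and value.
    while fields and fields[0].startswith("tag:"):
        tags.append(fields.pop(0)[len("tag:"):])
    if len(fields) < 2:
        return None
    return tags, fields[0], ",".join(fields[1:])


def bootfile_for(lines, client_is_ipxe):
    """Return the option 67 (bootfile name) value offered to the client."""
    for line in lines:
        parsed = parse_opts_line(line)
        if parsed is None:
            continue
        tags, option, value = parsed
        if option != "67":
            continue
        if "ipxe" in tags and not client_is_ipxe:
            continue  # entry applies only to clients already running iPXE
        if "!ipxe" in tags and client_is_ipxe:
            continue  # entry applies only to clients not yet running iPXE
        return value
    return None


# Lines copied from the opts excerpt above (annotation arrows removed).
PORT = "tag:port-bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb"
SAMPLE_OPTS = [
    "  :",
    PORT + ",tag:ipxe,67,http://10.0.0.1:8088/boot.ipxe",
    PORT + ",66,10.0.0.1",
    PORT + ",tag:!ipxe,67,ipxe.efi",
    PORT + ",150,10.0.0.1",
]

if __name__ == "__main__":
    print(bootfile_for(SAMPLE_OPTS, client_is_ipxe=False))
    print(bootfile_for(SAMPLE_OPTS, client_is_ipxe=True))
```

Under these assumptions, a UEFI firmware PXE client (not yet running iPXE) is handed ipxe.efi, which is the file missing from the TFTP root.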

This causes the deployment to fail when "boot_mode:uefi" is configured and "pxe" is used as the boot interface, because ipxe.efi doesn't exist.


uefi_pxe_bootfile_name is set by openstack-tripleo-heat-templates as shown below.
~~~
[yatanaka@yatanaka openstack-tripleo-heat-templates]$ ag uefi_pxe_bootfile_name
deployment/ironic/ironic-conductor-container-puppet.yaml
368:            ironic::drivers::pxe::uefi_pxe_bootfile_name: 'ipxe.efi'
~~~



Version-Release number of selected component (if applicable):
RHOSP 16.2.3
( I think this issue doesn't occur in RHOSP 16.2.2 because ipxe.efi is not replaced yet on 16.2.2 )


How reproducible:

My customer is encountering this issue when "pm_type: ilo" and "boot_mode:uefi" are configured.

Steps to Reproduce:
1. install undercloud with "ipxe_enabled = true"
2. enroll overcloud node with "pm_type: ilo" according to document[1]
3. enable UEFI according to document[2]
4. deploy overcloud


I suppose we can reproduce this issue by the following steps.

1. install undercloud with "ipxe_enabled = true"
2. enroll overcloud node with "pm_type: ipmi".
3. enable UEFI according to document[2]
4. set "pxe" as boot-interface by the following command
  $ openstack baremetal node set <node> --boot-interface pxe
5. deploy overcloud


I believe this issue doesn't occur in an RHOSP 16.2.3 environment that was upgraded from RHOSP 16.2.2 or earlier, because such an environment still has ipxe.efi: the update does not delete it.


Actual results:
Deployment fails because ipxe.efi is absent.


Expected results:
Deployment succeeds.


Additional info:

[1]https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.2/html/director_installation_and_usage/assembly_power-management-drivers#ref_integrated-lights-out-ilo_power-management-drivers
[2]https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.2/html/director_installation_and_usage/assembly_configuring-a-basic-overcloud#proc_setting-the-boot-mode-to-uefi-mode_basic

Comment 1 Julia Kreger 2022-07-26 14:04:05 UTC
Please have the customer set the boot_interface to "ipxe". The boot interfaces were split upstream to allow for multi-architecture support and interface-specific configuration differences. Since most deployers prefer iPXE, they really ought to be using the dedicated "ipxe" boot_interface rather than the general "pxe" interface, which is meant for network booting with other bootloaders such as GRUB.

It does appear that using the ilo hardware type will default them to the "pxe" interface, which means there are a few different smaller bugs there. Just setting the boot_interface to "ipxe" should resolve the issue for now.

Comment 2 Julia Kreger 2022-07-26 15:46:17 UTC
I did some checking in puppet-ironic as well as ironic itself on the default value being loaded for the uefi_pxe_bootfile_name parameter. When the parameter is not set, the value actually comes from the upstream project.

Explicitly setting an override in puppet-ironic to the default may not be possible as it would introduce additional unexpected behavior, which also changes again in future releases. The customer can change it locally by setting the configuration override parameter "pxe/uefi_pxe_bootfile_name" to "snponly.efi".

As noted above, the default switches to "bootx64.efi" for use with GRUB network booting, which would be an unintended, yet different user experience. Overall this is rooted back in the creation of the node and the overall "default_boot_interface", or lack thereof. The user could just change the boot interface of the node to "ipxe".

For further context, boot_interface selection occurs in one of two ways. The first is via an explicit "default_boot_interface" parameter, which is asserted as the default. This could be passed in as an override parameter setting the default to "ipxe". If no default is defined, then the hardware type's supported interfaces are matched against the enabled boot interfaces to identify the first value preferred by the hardware type. This is how the "ilo" hardware type ends up with this issue.
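
The selection described above can be sketched roughly as follows. This is a simplified model, not ironic's actual code: `select_boot_interface` is a hypothetical name, and the interface ordering in the example is illustrative only, not copied from the ilo hardware type.

```python
def select_boot_interface(supported_in_priority_order, enabled, default=None):
    """Pick a boot interface the way the comment above describes it.

    supported_in_priority_order: interfaces the hardware type supports,
        most preferred first (illustrative ordering in this sketch).
    enabled: interfaces enabled in the conductor configuration.
    default: explicit default_boot_interface, if one is configured.
    """
    # An explicit default wins outright.
    if default is not None:
        return default
    # Otherwise take the first supported interface that is also enabled.
    for interface in supported_in_priority_order:
        if interface in enabled:
            return interface
    raise ValueError("no enabled boot interface is supported by this hardware type")


# Illustrative values only: the ordering is an assumption of this sketch.
ILO_SUPPORTED = ["ilo-virtual-media", "ilo-pxe", "ilo-ipxe"]

if __name__ == "__main__":
    # Only ilo-pxe enabled and no explicit default: ilo-pxe is chosen.
    print(select_boot_interface(ILO_SUPPORTED, enabled={"ilo-pxe"}))
    # An explicit default overrides the ordering.
    print(select_boot_interface(ILO_SUPPORTED, {"ilo-pxe", "ilo-ipxe"},
                                default="ilo-ipxe"))
```

The sketch does not validate that an explicit default is itself enabled and supported; a real deployment would need both to hold.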


So to rehash, three options exist:
1) Just change the node's "boot_interface" to ipxe before inspecting/deploying.
2) Set up an override parameter for the puppet configuration to set a "pxe/uefi_pxe_bootfile_name" parameter to "snponly.efi". They will of course need to re-setup the undercloud to do this.
3) Set up an override parameter for the puppet configuration to set a "default_boot_interface" parameter to "ipxe". Again, they will need to change their deployed undercloud configuration for this to be effective.

I'll leave this open to allow the larger team to review it upon our next bug triage session. I have also uploaded a change to fix the overall boot_interface order for the ilo hardware type, but I'm unsure if we will be able to backport that change at this time.

Comment 3 Takashi Kajinami 2022-07-27 02:58:03 UTC
I think the overall problem is (as Yamato pointed out in his initial analysis) that we hard-code [pxe] uefi_pxe_bootfile_name to ipxe.efi.
 https://github.com/openstack/tripleo-heat-templates/blob/stable/train/deployment/ironic/ironic-conductor-container-puppet.yaml#L368

puppet-ironic provides the ironic::pxe class, which is commonly used by the other classes related to pxe/ipxe setup.
Currently THT sets the bootfile parameter of this base class according to the IronicIPXEUefiSnpOnly parameter here,
but the above override prevents that common definition from being used.
 https://github.com/openstack/tripleo-heat-templates/blob/stable/train/deployment/ironic/ironic-conductor-container-puppet.yaml#L468

I feel that a good solution here is to remove that hard-coded value, so that we use snponly.efi/ipxe.efi according to IronicIPXEUefiSnpOnly
in that pxe driver setting. The current default is somewhat broken and does not work.

Comment 4 Julia Kreger 2022-07-27 13:19:46 UTC
I agree it is a problem, but the node should not be using the "pxe" interface to begin with. It should be using the "ipxe" boot_interface.

I agree we ought to remove the hard code, unfortunately my plate is also full at the moment.

Comment 5 yatanaka 2022-08-01 03:07:50 UTC
Thank you for your help, Julia and Takashi.

> 1) Just change the node's "boot_interface" to ipxe before inspecting/deploying.

I asked the customer to run the following command,
  $ openstack baremetal node set <node> --boot-interface ipxe
but it failed due to the following error.
  boot interface implementation '<ironic.drivers.modules.ipxe.iPXEBoot object at 0x7f6784e886a0>' is not supported by hardware type IloHardware. (HTTP 400)

`ilo` driver only supports `ilo-pxe` boot interface, so I think this is expected behavior.
~~~
(undercloud) [stack@undercloud ~]$ openstack baremetal driver show ilo -c enabled_boot_interfaces
+-------------------------+---------+
| Field                   | Value   |
+-------------------------+---------+
| enabled_boot_interfaces | ilo-pxe |
+-------------------------+---------+
~~~

> 2) Setup an override parameter for the puppet configuration to set a "pxe/uefi_pxe_bootfile_name" parameter to "snponly.efi". They will of course need to re-setup the undercloud to do this.

I asked the customer to set the "pxe/uefi_pxe_bootfile_name" parameter to "snponly.efi", and the customer reported to us that the overcloud deployment succeeded.
Thank you for your help, again!

Comment 6 Julia Kreger 2022-08-01 21:26:53 UTC
Sorry, I forgot that the ilo hardware type actually has its own boot interface alias, ilo-ipxe.

It is also in the interface load order, but would need to be in the configuration as well. Obviously, in this case, they are hard-coded to ilo-pxe, and don't have ilo-ipxe enabled at all.

Interestingly enough, afaik, this is not a specific configuration we supply an example of. I guess they are just supplying a number of specific settings and one of them happens to just be the ilo-pxe boot interface on its own.

I'm glad to hear that just setting the template parameter worked for them.

Comment 7 Steve Baker 2022-12-05 21:26:34 UTC
The fix has landed upstream in Zed; targeting 18.0.

