Description of problem:

When metal3 deploys Supermicro nodes with Redfish and UEFI, the deployment initially goes fine: the nodes run inspection via fast-track, the RHCOS image is installed on the node, and the nodes boot to that image on disk. On the subsequent reboot, however, the node attempts to PXE boot instead of booting to disk, installs the inspector image, and eventually stays looping in IPA. The deployment fails.

The reason appears to be that the Redfish request to set a persistent boot to disk ('BootSourceOverrideEnabled': 'Continuous'):

2021-01-20 23:54:49.103 1 DEBUG sushy.connector [req-d3cc400a-5752-400d-b77d-f8b2b672de4c - - - - -] HTTP request: PATCH https://mgmt-f06-h15-000-1029p.rdu2.scalelab.redhat.com/redfish/v1/Systems/1; headers: {'OData-Version': '4.0'}; body: {'Boot': {'BootSourceOverrideTarget': 'Hdd', 'BootSourceOverrideEnabled': 'Continuous'}}

is then followed by a request to boot to disk WITHOUT BootSourceOverrideEnabled set:

2021-01-20 23:54:52.045 1 DEBUG sushy.connector [req-d3cc400a-5752-400d-b77d-f8b2b672de4c - - - - -] HTTP request: PATCH https://mgmt-f06-h15-000-1029p.rdu2.scalelab.redhat.com/redfish/v1/Systems/1; headers: {'OData-Version': '4.0'}; body: {'Boot': {'BootSourceOverrideTarget': 'Hdd'}}

This is then followed by a request that sets the boot mode (to fix https://bugzilla.redhat.com/show_bug.cgi?id=1888072):

2021-01-20 23:54:53.360 1 DEBUG sushy.connector [req-d3cc400a-5752-400d-b77d-f8b2b672de4c - - - - -] HTTP request: PATCH https://mgmt-f06-h15-000-1029p.rdu2.scalelab.redhat.com/redfish/v1/Systems/1; headers: {'OData-Version': '4.0'}; body: {'Boot': {'BootSourceOverrideMode': 'UEFI'}}

As a result, before the boot to disk the Redfish settings are:

{
  "BootSourceOverrideEnabled": "Once",
  "BootSourceOverrideMode": "UEFI",
  "BootSourceOverrideTarget": "Hdd",

This works fine for the first boot, but after that the Redfish settings read are:

  "BootSourceOverrideEnabled": "Disabled",
  "BootSourceOverrideMode": "Legacy",
  "BootSourceOverrideTarget": "None",

So all subsequent boots will be to PXE, which is the default. Note that we tried changing the boot order in BIOS so that HDD is first. We expected that boot order to be used when BootSourceOverrideTarget is not set, but it didn't work.

Version-Release number of selected component (if applicable):

This is seen on 4.7 build from 12/18

# rpm -qa | grep ironic
python3-ironic-prometheus-exporter-0.0.1-0.20190712090404.f7e9344.el8ost.noarch
python3-ironic-lib-4.4.1-0.20201120173800.e1b5e12.el8.noarch
openstack-ironic-common-16.0.3-0.20201219231205.4ae5375.el8.noarch
openstack-ironic-conductor-16.0.3-0.20201219231205.4ae5375.el8.noarch
openstack-ironic-api-16.0.3-0.20201219231205.4ae5375.el8.noarch

How reproducible:

Happens every time.

Steps to Reproduce:

1. Run ansible install script to deploy the cluster.

Actual results:

After the image is installed, nodes boot to disk once and then boot to PXE on the next reboot.

Expected results:

Nodes boot to disk every time after the image is installed.

Additional info:

It's likely the issue is here, in Ironic's Redfish management driver (which calls sushy) - https://github.com/openstack/ironic/blob/master/ironic/drivers/modules/redfish/management.py#L96

BootSourceOverrideEnabled won't be sent if it is already set to the requested value, which is why the second PATCH above carries only BootSourceOverrideTarget. However, Supermicro requires it to be set each time.
This is related to https://review.opendev.org/c/openstack/ironic/+/731644, which references http://lists.openstack.org/pipermail/openstack-discuss/2020-April/014543.html and provides context on how BootSourceOverrideEnabled should be used.
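For anyone following along, here is a minimal sketch of the kind of check that would produce the two different PATCH bodies seen in the logs. It is not the actual Ironic code; build_boot_patch and its parameters are hypothetical names used only for illustration, and the starting value "Disabled" is an assumption.

```python
from typing import Optional


def build_boot_patch(target: str, desired_enabled: str,
                     current_enabled: Optional[str]) -> dict:
    """Build a Redfish PATCH body, skipping the 'enabled' flag when unchanged.

    Hypothetical helper: it only mimics the check described in this report.
    """
    boot = {"BootSourceOverrideTarget": target}
    # The flag is sent only when it differs from what the BMC reports.
    if desired_enabled != current_enabled:
        boot["BootSourceOverrideEnabled"] = desired_enabled
    return {"Boot": boot}


# First call: the BMC does not yet report Continuous, so the flag is sent.
first = build_boot_patch("Hdd", "Continuous", current_enabled="Disabled")
# Second call: the BMC now reports Continuous, so the flag is dropped.
second = build_boot_patch("Hdd", "Continuous", current_enabled="Continuous")

print(first)
print(second)
```

The second body matches the PATCH in the report that carries only BootSourceOverrideTarget.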
We tried setting pxe.enable_netboot_fallback in ironic.conf, but that had no effect and the Supermicro node still booted to PXE after an external reboot. We then changed this line https://github.com/openstack/ironic/blob/master/ironic/drivers/modules/redfish/management.py#L96 so that it no longer checks whether BootSourceOverrideEnabled is already set, and it works fine. The second PATCH request now matches the first PATCH request and sets Continuous. After RHCOS reboots the node (outside of Ironic), the node boots to disk.
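To illustrate why always sending the flag helps, here is a rough sketch that replays both PATCH sequences against a toy BMC. The ToyBmc class and its reset rule (a target sent without the flag falls back to Once) are assumptions inferred from the settings observed in this report, not documented Supermicro behavior, and build_boot_patch_always is a hypothetical name rather than the real change.

```python
class ToyBmc:
    """Toy stand-in for the BMC; its reset rule is an assumption from the logs."""

    def __init__(self) -> None:
        self.boot = {"BootSourceOverrideEnabled": "Disabled",
                     "BootSourceOverrideTarget": "None"}

    def patch(self, body: dict) -> None:
        changes = body["Boot"]
        # Assumed behavior: a target sent without the flag falls back to Once.
        if ("BootSourceOverrideTarget" in changes
                and "BootSourceOverrideEnabled" not in changes):
            self.boot["BootSourceOverrideEnabled"] = "Once"
        self.boot.update(changes)


def build_boot_patch_always(target: str, enabled: str) -> dict:
    """Hypothetical workaround: send the flag on every request."""
    return {"Boot": {"BootSourceOverrideTarget": target,
                     "BootSourceOverrideEnabled": enabled}}


def replay(bodies: list) -> str:
    """Apply the PATCH bodies in order and return the resulting flag."""
    bmc = ToyBmc()
    for body in bodies:
        bmc.patch(body)
    return bmc.boot["BootSourceOverrideEnabled"]


mode = {"Boot": {"BootSourceOverrideMode": "UEFI"}}
always = build_boot_patch_always("Hdd", "Continuous")

# Sequence from the original failure: the second request omits the flag.
before = replay([always, {"Boot": {"BootSourceOverrideTarget": "Hdd"}}, mode])
# Sequence with the workaround: every boot-device request carries the flag.
after = replay([always, mode, always, mode])

print(before, after)
```

Under that assumed reset rule, the first sequence ends at Once and the second stays at Continuous, which lines up with what we saw on the node.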
The fix has merged and is available in brew: https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=1486293
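When checking a node with this build, one option is to look at the Boot object of the Redfish system resource and confirm the override is persistent. The helper below is only a sketch: boot_override_persists is a hypothetical name, and the sample payloads are trimmed to the fields quoted in this bug rather than taken from a full BMC response.

```python
import json


def boot_override_persists(system_json: str, target: str = "Hdd") -> bool:
    """Return True if the boot override is Continuous and points at target."""
    boot = json.loads(system_json).get("Boot", {})
    return (boot.get("BootSourceOverrideEnabled") == "Continuous"
            and boot.get("BootSourceOverrideTarget") == target)


# Trimmed sample payloads built from the values quoted in this report.
failing = json.dumps({"Boot": {"BootSourceOverrideEnabled": "Once",
                               "BootSourceOverrideMode": "UEFI",
                               "BootSourceOverrideTarget": "Hdd"}})
expected = json.dumps({"Boot": {"BootSourceOverrideEnabled": "Continuous",
                                "BootSourceOverrideMode": "UEFI",
                                "BootSourceOverrideTarget": "Hdd"}})

print(boot_override_persists(failing))
print(boot_override_persists(expected))
```

The input would be the JSON body returned for the system resource (the /redfish/v1/Systems/1 path in the logs), fetched with whatever HTTP client is at hand.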
Got a chance to test this patch on Supermicro 1029Ps using 4.7.0-0.nightly-2021-02-04-031352, and it is verified.

We could see the logs setting HDD boot with the "Continuous" flag:

2021-02-15 21:22:01.566 1 DEBUG sushy.connector [req-d8db6171-f384-48c6-b6ac-afe873710e9b - - - - -] HTTP request: PATCH https://mgmt-f06-h15-000-1029p.rdu2.scalelab.redhat.com/redfish/v1/Systems/1; headers: {'OData-Version': '4.0'}; body: {'Boot': {'BootSourceOverrideTarget': 'Hdd', 'BootSourceOverrideEnabled': 'Continuous'}}; blocking: False; timeout: 60; session arguments: {}; _op /usr/lib/python3.6/site-packages/sushy/connector.py:102
2021-02-15 21:22:03.056 1 DEBUG sushy.connector [req-d8db6171-f384-48c6-b6ac-afe873710e9b - - - - -] HTTP request: PATCH https://mgmt-f06-h15-000-1029p.rdu2.scalelab.redhat.com/redfish/v1/Systems/1; headers: {'OData-Version': '4.0'}; body: {'Boot': {'BootSourceOverrideMode': 'UEFI'}}; blocking: False; timeout: 60; session arguments: {}; _op /usr/lib/python3.6/site-packages/sushy/connector.py:102

and on the second request as well:

2021-02-15 21:22:04.501 1 DEBUG sushy.connector [req-d8db6171-f384-48c6-b6ac-afe873710e9b - - - - -] HTTP request: PATCH https://mgmt-f06-h15-000-1029p.rdu2.scalelab.redhat.com/redfish/v1/Systems/1; headers: {'OData-Version': '4.0'}; body: {'Boot': {'BootSourceOverrideTarget': 'Hdd', 'BootSourceOverrideEnabled': 'Continuous'}}; blocking: False; timeout: 60; session arguments: {}; _op /usr/lib/python3.6/site-packages/sushy/connector.py:102
2021-02-15 21:22:05.959 1 DEBUG sushy.connector [req-d8db6171-f384-48c6-b6ac-afe873710e9b - - - - -] HTTP request: PATCH https://mgmt-f06-h15-000-1029p.rdu2.scalelab.redhat.com/redfish/v1/Systems/1; headers: {'OData-Version': '4.0'}; body: {'Boot': {'BootSourceOverrideMode': 'UEFI'}}; blocking: False; timeout: 60; session arguments: {}; _op /usr/lib/python3.6/site-packages/sushy/connector.py:102

The Redfish boot config shows the same settings, and the node booted properly:

{
  "BootSourceOverrideEnabled": "Continuous",
  "BootSourceOverrideMode": "UEFI",
  "BootSourceOverrideTarget": "Hdd",
  "BootSourceOverrideTarget": [
    "None",
    "Pxe",
    "Floppy",
    "Cd",
    "Usb",
    "Hdd",
    "BiosSetup"
  ]
}

Thanks Bob for working on this.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633