Bug 1892302

Summary: Setting Supermicro node to PXE boot via Redfish doesn't take effect
Product: OpenShift Container Platform Reporter: Bob Fournier <bfournie>
Component: Bare Metal Hardware Provisioning Assignee: Bob Fournier <bfournie>
Bare Metal Hardware Provisioning sub component: ironic QA Contact: Raviv Bar-Tal <rbartal>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: high CC: augol, dblack, rbartal, smalleni, tsedovic
Version: 4.6.z Keywords: OtherQA, TestBlocker, Triaged
Target Milestone: ---   
Target Release: 4.6.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1888072 Environment:
Last Closed: 2020-11-30 16:45:31 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1888072    
Bug Blocks: 1888375    

Description Bob Fournier 2020-10-28 12:41:58 UTC
+++ This bug was initially created as a clone of Bug #1888072 +++

Description of problem:

Starting with a Supermicro node set to PXE boot (it was manually set via IPMI), Ironic successfully deploys the node and sets it to boot from disk using Redfish. However, a second deployment fails because the node keeps booting to disk. It appears that the Redfish command Ironic sends to switch to PXE boot is not taking effect, perhaps because of the BootSourceOverrideEnabled setting.

The first time the node is set to boot from disk after writing the image:
2020-10-13 21:28:31.152 1 DEBUG sushy.connector [req-4841e280-8461-4351-a09a-5c8cfbe2c17a - - - - -] HTTP request: PATCH https://mgmt-f07-h13-000-1029p.rdu2.scalelab.redhat.com/redfish/v1/Systems/1; headers: {'OData-Version': '4.0'}; body: {'Boot': {'BootSourceOverrideTarget': 'Hdd', 'BootSourceOverrideEnabled': 'Continuous'}}; blocking: False; timeout: 60; session arguments: {}; _op /usr/lib/python3.6/site-packages/sushy/connector.py:102

And it takes effect and does boot from disk:
'IndicatorLED': 'Off', 'PowerState': 'On', 'Boot': {'BootSourceOverrideEnabled': 'Continuous', 'BootSourceOverrideMode': 'Legacy', 'BootSourceOverrideTarget': 'Hdd',

We then see the command for a non-persistent boot:
2020-10-13 21:28:41.398 1 DEBUG sushy.connector [req-33f87403-3090-4afb-8c04-f8042aa61f81 - - - - -] HTTP request: PATCH https://mgmt-f06-h15-000-1029p.rdu2.scalelab.redhat.com/redfish/v1/Systems/1; headers: {'OData-Version': '4.0'}; body: {'Boot': {'BootSourceOverrideTarget': 'Hdd'}};

which results in BootSourceOverrideEnabled 'Once'
'PowerState': 'On', 'Boot': {'BootSourceOverrideEnabled': 'Once', 'BootSourceOverrideMode': 'Legacy', 'BootSourceOverrideTarget': 'Hdd',

And eventually:
'Boot': {'BootSourceOverrideEnabled': 'Disabled', 'BootSourceOverrideMode': 'Legacy', 'BootSourceOverrideTarget': 'Non

=========

On the second deployment we see:
'Boot': {'BootSourceOverrideEnabled': 'Disabled', 'BootSourceOverrideMode': 'Legacy', 'BootSourceOverrideTarget': 'None',

Then the command to set the node back to PXE boot for introspection:
2020-10-13 19:56:45.095 1 DEBUG sushy.connector [req-6194fdaf-04ad-4c58-a51d-678af46bb6d3 - - - - -] HTTP request: PATCH https://mgmt-f06-h14-000-1029p.rdu2.scalelab.redhat.com/redfish/v1/Systems/1; headers: {'OData-Version': '4.0'}; body: {'Boot': {'BootSourceOverrideTarget': 'Pxe', 'BootSourceOverrideEnabled': 'Once'}}; blocking: False; timeout: 60; session arguments: {}; _op /usr/lib/python3.6/site-packages/sushy/connector.py:102
2020-10-13 19:56:45.113 1 DEBUG sushy.connector [req-8b759858-1468-4051-98b8-a6bd4985df89 - - - - -] HTTP response for GET https://mgmt-f07-h13-000-1029p.rdu2.scalelab.redhat.com/redfish/v1/Systems/1: status code: 200 _op /usr/lib/python3.6/site-packages/sushy/connector.py:156

It is sent a 2nd time shortly after:
2020-10-13 19:56:45.113 1 DEBUG sushy.connector [req-8b759858-1468-4051-98b8-a6bd4985df89 - - - - -] HTTP request: PATCH https://mgmt-f07-h13-000-1029p.rdu2.scalelab.redhat.com/redfish/v1/Systems/1; headers: {'OData-Version': '4.0'}; body: {'Boot': {'BootSourceOverrideTarget': 'Pxe', 'BootSourceOverrideEnabled': 'Once'}}; blocking: False; timeout: 60; session arguments: {}; _op /usr/lib/python3.6/site-packages/sushy/connector.py:102

We can see in a subsequent get that the BootSourceOverrideEnabled and BootSourceOverrideTarget have changed:
'IndicatorLED': 'Off', 'PowerState': 'Off', 'Boot': {'BootSourceOverrideEnabled': 'Once', 'BootSourceOverrideMode': 'Legacy', 'BootSourceOverrideTarget': 'Pxe', 'BootSourceOverrideTarget': ['None', 'Pxe', 'Floppy', 'Cd', 'Usb', 'Hdd', 'BiosSetup']},

Ironic reboots the node (with this warning, which is a separate issue):
2020-10-13 19:56:59.846 1 WARNING sushy.resources.system.system [req-6194fdaf-04ad-4c58-a51d-678af46bb6d3 - - - - -] Could not figure out the allowed values for the reset system action for System 1
2020-10-13 19:56:59.846 1 DEBUG sushy.connector [req-6194fdaf-04ad-4c58-a51d-678af46bb6d3 - - - - -] HTTP request: POST https://mgmt-f06-h14-000-1029p.rdu2.scalelab.redhat.com/redfish/v1/Systems/1/Actions/ComputerSystem.Reset; headers: {'OData-Version': '4.0'}; body: {'ResetType': 'On'}; blocking: False; timeout: 60; session arguments: {}; _op /usr/lib/python3.6/site-packages/sushy/connector.py:102

** However the node boots to disk, not PXE. **

Eventually the node will return:
'Boot': {'BootSourceOverrideEnabled': 'Disabled', 'BootSourceOverrideMode': 'Legacy', 'BootSourceOverrideTarget': 'None',


This is with:
Hardware - Supermicro 1029P
Firmware Revision: 01.71.17
BIOS Version: 3.0a
Redfish Version: 1.0.1
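
The log excerpts above can be cross-checked mechanically. The following is a minimal sketch, not part of Ironic or sushy: the helper name override_mismatches is hypothetical, and the sample dictionaries are trimmed copies of the second-deployment PATCH body and the subsequent GET response. It suggests that the BMC reports exactly the target and enabled values that were requested, so the readback alone does not explain why the node still boots to disk.

```python
def override_mismatches(requested_boot, system):
    """Compare a PATCH 'Boot' body with the 'Boot' section of a later GET.

    Returns {field: (requested, reported)} for every requested field whose
    reported value differs; an empty dict means the BMC reports what was asked.
    (Hypothetical helper for illustration only.)
    """
    reported = system.get("Boot", {})
    return {
        field: (wanted, reported.get(field))
        for field, wanted in requested_boot.items()
        if reported.get(field) != wanted
    }


# Values copied from the second-deployment log excerpts above.
requested = {"BootSourceOverrideTarget": "Pxe", "BootSourceOverrideEnabled": "Once"}
readback = {
    "PowerState": "Off",
    "Boot": {
        "BootSourceOverrideEnabled": "Once",
        "BootSourceOverrideMode": "Legacy",
        "BootSourceOverrideTarget": "Pxe",
    },
}

print(override_mismatches(requested, readback))  # prints {}
```

Note that BootSourceOverrideMode is not part of the request, so it is not compared here; the readback reports it as 'Legacy'.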

--- Additional comment from Bob Fournier on 2020-10-14 12:04:57 UTC ---



--- Additional comment from Bob Fournier on 2020-10-14 18:26:13 UTC ---

Looks like the issue is that we need to set the mode to UEFI prior to PXE booting, as it otherwise ends up reverting to Legacy ('BootSourceOverrideMode': 'Legacy'). Working on a patch.

--- Additional comment from Bob Fournier on 2020-10-14 22:44:17 UTC ---

The Supermicro seems to require that the boot mode be set along with the boot device when using Redfish; otherwise it reverts the boot mode to "Legacy". This can be illustrated with a simple case:

Start with these settings:
$ curl -k --user XXXX https://10.1.41.239/redfish/v1/Systems/1/ | jq .
 "Boot": {
    "BootSourceOverrideEnabled": "Continuous",
    "BootSourceOverrideMode": "UEFI",
    "BootSourceOverrideTarget": "Pxe",

Then change only BootSourceOverrideEnabled and BootSourceOverrideTarget
$ curl -k --user XXXX -X PATCH -d '{"Boot": {"BootSourceOverrideTarget": "Pxe", "BootSourceOverrideEnabled": "Once"}}' https://10.1.41.239/redfish/v1/Systems/1/

The mode has flipped to Legacy:
$ curl -k --user XXXX https://10.1.41.239/redfish/v1/Systems/1/ | jq .
 "Boot": {
    "BootSourceOverrideEnabled": "Once",
    "BootSourceOverrideMode": "Legacy",
    "BootSourceOverrideTarget": "Pxe",
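
For comparison, a request that sets the boot mode together with the device might look like the sketch below. This is an illustration under stated assumptions, not the Ironic change: the function name build_boot_patch is hypothetical, the request is only constructed and never sent, and credentials plus the equivalent of curl's -k option are deliberately left out.

```python
import json
import urllib.request


def build_boot_patch(url, target, enabled, mode):
    """Build (but do not send) a Redfish PATCH that sets device and mode together.

    Hypothetical helper: authentication and TLS settings are omitted.
    """
    body = {
        "Boot": {
            "BootSourceOverrideTarget": target,
            "BootSourceOverrideEnabled": enabled,
            "BootSourceOverrideMode": mode,
        }
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )


# Same BMC address and values as the curl commands above, with the mode added.
request = build_boot_patch(
    "https://10.1.41.239/redfish/v1/Systems/1/", "Pxe", "Once", "UEFI"
)
print(request.get_method(), request.data.decode("utf-8"))
```

Sending the request (for example with urllib.request.urlopen) would additionally need BMC credentials and a suitable SSL context.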

--- Additional comment from Bob Fournier on 2020-10-22 17:19:18 UTC ---

We've confirmed with Supermicro that the boot mode ("BootSourceOverrideMode") must be set in the Redfish request when setting the device ("BootSourceOverrideTarget" and "BootSourceOverrideEnabled"). This differs from other vendors such as Dell and HPE, which require that the mode NOT be set in the same request - see https://review.opendev.org/#/c/710846/.

I have a patch upstream that will fix the issue for Supermicro and not break other vendors - https://review.opendev.org/#/c/758856/.  We've verified that nodes boot properly to PXE with this patch.  It's still pending upstream reviews.
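
The vendor difference described above could be expressed roughly as follows. This is only a sketch of the idea and not the upstream patch: the function name boot_override_body is hypothetical, and detecting Supermicro by matching the Redfish Manufacturer string is an assumption made for illustration.

```python
def boot_override_body(manufacturer, target, enabled, mode):
    """Return a Redfish PATCH body for a boot override.

    Assumption for illustration: the vendor is recognised from the
    ComputerSystem 'Manufacturer' string. The boot mode is included in the
    same request only for Supermicro; for other vendors it is left out.
    """
    boot = {
        "BootSourceOverrideTarget": target,
        "BootSourceOverrideEnabled": enabled,
    }
    if manufacturer and "supermicro" in manufacturer.lower():
        boot["BootSourceOverrideMode"] = mode
    return {"Boot": boot}


print(boot_override_body("Supermicro", "Pxe", "Once", "UEFI"))
print(boot_override_body("Dell Inc.", "Pxe", "Once", "UEFI"))
```

With this shape, vendors that reject a combined request would still receive the two-field body, while Supermicro would receive the mode in the same PATCH.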

Comment 5 errata-xmlrpc 2020-11-30 16:45:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6.6 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:5115