Bug 1885308 - Supermicro nodes failed to boot via disk during installation when using IPMI and UEFI
Summary: Supermicro nodes failed to boot via disk during installation when using IPMI ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Bare Metal Hardware Provisioning
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.7.0
Assignee: Bob Fournier
QA Contact: Raviv Bar-Tal
URL:
Whiteboard:
Depends On:
Blocks: 1888069
Reported: 2020-10-05 15:03 UTC by Murali Krishnasamy
Modified: 2021-02-24 15:23 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Supermicro handles the setting of boot device via IPMI differently than other vendors. Consequence: When using IPMI and UEFI with Supermicro devices the nodes failed to boot from disk after the image was written to disk. Fix: Detect that the node is Supermicro and set the appropriate code in IPMI to boot from disk. Result: The Supermicro nodes correctly boot from disk after deployment.
Clone Of:
: 1888069 (view as bug list)
Environment:
Last Closed: 2021-02-24 15:23:10 UTC
Target Upstream Version:
Embargoed:


Attachments
ramdisk log after BOOT Mode changed to UEFI in BIOS config (572.04 KB, text/plain)
2020-10-07 21:39 UTC, Bob Fournier
ramdisk in uefi mode (638.37 KB, text/plain)
2020-10-08 16:04 UTC, Bob Fournier


Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 757198 0 None MERGED IPMI: Handle vendor set boot device differences 2021-02-11 22:20:37 UTC
OpenStack gerrit 767583 0 None MERGED IPMI: Handle vendor set boot device differences 2021-02-11 22:20:37 UTC
Red Hat Product Errata RHSA-2020:5633 0 None None None 2021-02-24 15:23:35 UTC

Description Murali Krishnasamy 2020-10-05 15:03:57 UTC
Description of problem:

    OCP 4.6.0 deployment using baremetal-deploy (openshift-installer) on Supermicro nodes is not working as expected.

Once openshift-installer kicks in, it goes through a couple of boots using "PXE" in UEFI mode for RHCOS installation and finally boots from persistent "disk" in UEFI mode. This doesn't seem to work properly on Supermicro (especially 1029P) nodes. The logs were captured during installation from the bootstrap VM and verified via ipmitool.

The issue is that the final reboot with "disk" goes through "PXE" (the server's default boot order) and performs the OS installation again. This can be avoided only by interrupting manually via the console and changing the boot order to disk.

The same flow works fine on Dell servers in UEFI mode.

Ironic conductor logs
======================
2020-09-30 00:13:13.766 1 DEBUG ironic.common.utils [req-fd37b81f-fc6f-498e-beb4-77900f7b8b45 - - - - -] Execution completed, command line is "ipmitool -I lanplus -H mgmt-f06-h14-000-1029p.rdu2.scalelab.redhat.com -L ADMINISTRATOR -p 623 -U quads -R 1 -N 1 -f /tmp/tmp500y8ha3 chassis bootdev pxe options=efiboot" execute /usr/lib/python3.6/site-packages/ironic/common/utils.py:77

2020-09-30 00:06:30.874 1 DEBUG ironic.common.utils [req-579626f8-3b40-4824-ae41-ac4036196f22 - - - - -] Execution completed, command line is "ipmitool -I lanplus -H mgmt-f06-h14-000-1029p.rdu2.scalelab.redhat.com -L ADMINISTRATOR -p 623 -U quads -R 1 -N 1 -f /tmp/tmpdzwtq6pz chassis bootdev pxe options=efiboot" execute /usr/lib/python3.6/site-packages/ironic/common/utils.py:77

2020-09-30 00:19:46.761 1 DEBUG oslo_concurrency.processutils [req-14dae137-9147-40d4-bbf6-48244f15d950 - - - - -] Running cmd (subprocess): ipmitool -I lanplus -H mgmt-f06-h14-000-1029p.rdu2.scalelab.redhat.com -L ADMINISTRATOR -p 623 -U quads -R 1 -N 1 -f /tmp/tmp0e5_f3xu raw 0x00 0x08 0x05 0xe0 0x08 0x00 0x00 0x00 execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:372

IPMI tool on boot status
========================
While PXE boot
# ipmitool -I lanplus -U xxxxx -P xxxxx -H mgmt-f06-h14-000-1029p.rdu2.scalelab.redhat.com chassis bootparam get 5
Boot parameter version: 1
Boot parameter 5 is valid/unlocked
Boot parameter data: a004000000
 Boot Flags :
   - Boot Flag Valid
   - Options apply to only next boot
   - BIOS EFI boot 
   - Boot Device Selector : Force PXE
   - Console Redirection control : System Default
   - BIOS verbosity : Console redirection occurs per BIOS configuration setting (default)
   - BIOS Mux Control Override : BIOS uses recommended setting of the mux at the end of POST

While Disk boot
# ipmitool -I lanplus -U xxxxx -P xxxxx -H mgmt-f06-h14-000-1029p.rdu2.scalelab.redhat.com chassis bootparam get 5
Boot parameter version: 1
Boot parameter 5 is valid/unlocked
Boot parameter data: e008000000
 Boot Flags :
   - Boot Flag Valid
   - Options apply to all future boots
   - BIOS EFI boot 
   - Boot Device Selector : Force Boot from default Hard-Drive
   - Console Redirection control : System Default
   - BIOS verbosity : Console redirection occurs per BIOS configuration setting (default)
   - BIOS Mux Control Override : BIOS uses recommended setting of the mux at the end of POST
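
For reference, the two "Boot parameter data" values above can be decoded by hand. The following is a minimal Python sketch, assuming the bit layout of IPMI boot option parameter 5 (boot flags). The function name and the selector table are illustrative only and are not taken from ipmitool or Ironic; the table lists just the selectors seen in this report.

```python
# Device selectors live in bits 5:2 of the second data byte.
DEVICE_SELECTORS = {
    0b0000: "No override",
    0b0001: "Force PXE",
    0b0010: "Force boot from default hard drive",
}


def decode_boot_flags(hex_data):
    """Decode the hex string printed by `chassis bootparam get 5`."""
    data = bytes.fromhex(hex_data)
    if len(data) < 2:
        raise ValueError("expected at least two bytes of boot flag data")
    flags, device = data[0], data[1]
    selector = (device >> 2) & 0x0F
    return {
        "valid": bool(flags & 0x80),       # bit 7: boot flags valid
        "persistent": bool(flags & 0x40),  # bit 6: apply to all future boots
        "efi": bool(flags & 0x20),         # bit 5: BIOS EFI boot
        "device": DEVICE_SELECTORS.get(selector, "selector 0x%x" % selector),
    }


for sample in ("a004000000", "e008000000"):
    print(sample, decode_boot_flags(sample))
```

Decoded this way, a004000000 is a one-time EFI PXE boot and e008000000 is a persistent EFI boot from the default hard drive, which matches the ipmitool output above.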


Version-Release number of selected component (if applicable):
OCP 4.6.0


How reproducible: Consistently on Supermicro nodes


Steps to Reproduce:
1. Just deploy a fresh cluster on Supermicro nodes using baremetal-deploy or openshift-installer
2. Observe the final boot order and the behavior on console


Actual results:
Final boot is using PXE and does OS installation again by the PXE server


Expected results:
Final boot should be via disk as in Dell server


Additional info:
Hardware - Supermicro 1029P
Firmware Revision: 01.71.17
BIOS Version: 3.0a
Redfish Version: 1.0.1

Comment 1 Iury Gregory Melo Ferreira 2020-10-05 18:26:50 UTC
I've never used supermicro before, but I have some questions that may help us understand a bit of what is happening.

1- Just to confirm, you are using the ipmi driver?
2- Have you tried using BIOS instead of UEFI? I'm wondering if the same problem persists or not.
3- This sounds a bit like a firmware version problem; maybe try updating and testing with bios and uefi to see how it goes? 

Thanks!

Comment 2 Sai Sindhur Malleni 2020-10-05 18:54:28 UTC
(In reply to Iury Gregory Melo Ferreira from comment #1)
> I've never used supermicro before, but I have some questions that may help
> us understand a bit of what is happening.
> 
> 1- Just to confirm, you are using the ipmi driver?
Yes, ipmi driver
> 2- Have you tried using BIOS instead of UEFI? I'm wondering if the same
> problem persists or not.
BIOS works
> 3- This sounds a bit like a firmware version problem; maybe try updating and
> testing with bios and uefi to see how it goes? 
> 
> Thanks!

Comment 3 Murali Krishnasamy 2020-10-05 21:10:38 UTC
Openshift-installer 4.6.0 is also failing to boot Supermicro nodes with UEFI mode. I have updated this ticket to OCP 4.6.0

Same behavior as in the previous version: with the IPMI driver, UEFI mode fails but BIOS mode works.

Comment 5 Murali Krishnasamy 2020-10-06 19:26:55 UTC
Failed again with updated firmware and BIOS.

Firmware Revision: 01.71.19
BIOS Version: 3.3

Comment 6 Bob Fournier 2020-10-06 20:13:42 UTC
Looks like an issue installing the whole disk image via UEFI.  On the conductor I see:
2020-10-05 20:27:26.325 1 INFO ironic.drivers.modules.agent_base [req-3e8e212e-e1bc-429f-bbe9-8a77605c9791 - - - - -] Could not install bootloader for whole disk image for node 44ab8813-3d0e-47f8-b3b7-244b4c75b4fa, Error: No partition with UUID 0x00000000 found on device /dev/sda"

This error is repeated for 3 nodes.

In the corresponding ramdisk image for node 44ab8813-3d0e-47f8-b3b7-244b4c75b4fa we get:

Oct 05 16:27:20 master-0 ironic-python-agent[2176]: 2020-10-05 16:27:20.423 2176 ERROR root [-] Command failed: install_bootloader, error: Error finding the disk or partition device to deploy the image onto: No partition with UUID 0x00000000 found on device /dev/sda: ironic_python_agent.errors.DeviceNotFound: Error finding the disk or partition device to deploy the image onto: No partition with UUID 0x00000000 found on device /dev/sda
                                                    2020-10-05 16:27:20.423 2176 ERROR root Traceback (most recent call last):
                                                    2020-10-05 16:27:20.423 2176 ERROR root   File "/usr/lib/python3.6/site-packages/ironic_python_agent/extensions/base.py", line 163, in run
                                                    2020-10-05 16:27:20.423 2176 ERROR root     result = self.execute_method(**self.command_params)
                                                    2020-10-05 16:27:20.423 2176 ERROR root   File "/usr/lib/python3.6/site-packages/ironic_python_agent/extensions/image.py", line 753, in install_bootloader
                                                    2020-10-05 16:27:20.423 2176 ERROR root     target_boot_mode=target_boot_mode)
                                                    2020-10-05 16:27:20.423 2176 ERROR root   File "/usr/lib/python3.6/site-packages/ironic_python_agent/extensions/image.py", line 503, in _install_grub2
                                                    2020-10-05 16:27:20.423 2176 ERROR root     root_partition = _get_partition(device, uuid=root_uuid)
                                                    2020-10-05 16:27:20.423 2176 ERROR root   File "/usr/lib/python3.6/site-packages/ironic_python_agent/extensions/image.py", line 132, in _get_partition
                                                    2020-10-05 16:27:20.423 2176 ERROR root     raise errors.DeviceNotFound(error_msg)
                                                    2020-10-05 16:27:20.423 2176 ERROR root ironic_python_agent.errors.DeviceNotFound: Error finding the disk or partition device to deploy the image onto: No partition with UUID 0x00000000 found on device /dev/sda
                                                    2020-10-05 16:27:20.423 2176 ERROR root


Which is why booting off disk is failing when using UEFI.

What's the version of ironic-conductor pkg being used?  I realize this is 4.5 but can you get the pkg version from the ironic container?

Comment 7 Murali Krishnasamy 2020-10-06 20:29:17 UTC
Thanks, Bob, for the update. These are from the 4.6 build.

openstack-ironic-api-15.1.1-0.20200724075308.3e92fd0.el8ost.noarch
python3-ironic-lib-4.3.0-0.20200605221931.df238ba.el8ost.noarch
openstack-ironic-common-15.1.1-0.20200724075308.3e92fd0.el8ost.noarch
openstack-ironic-conductor-15.1.1-0.20200724075308.3e92fd0.el8ost.noarch
python3-ironic-prometheus-exporter-0.0.1-0.20190712090404.f7e9344.el8ost.noarch

Comment 8 Bob Fournier 2020-10-06 23:05:42 UTC
Talking with Julia, Sai, and Murali - Julia thought based on the errors in Comment 6 that the Supermicro is actually in BIOS mode and ironic-python-agent attempts to boot with UEFI.  We were able to confirm that in the IPA logs here:
Oct 05 16:25:14 master-0 ironic-python-agent[2176]: 2020-10-05 16:25:14.946 2176 DEBUG root [-] The current boot mode is bios get_boot_info /usr/lib/python3.6/site-packages/ironic_python_agent/hardware.py:1149

Which is set here https://github.com/openstack/ironic-python-agent/blob/99dee5067ea4f06d3083170d801e600f46842170/ironic_python_agent/hardware.py#L1189 based on '/sys/firmware/efi' not being present.

We'll need to get into the BIOS configuration on the Supermicro and set it for UEFI. Julia found these notes from Supermicro that may be useful - https://www.supermicro.com/support/faqs/faq.cfm?faq=22208.
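
The check referenced above boils down to testing whether the running kernel exposes the EFI firmware directory. A minimal Python sketch of that idea follows; the function name and the overridable path argument are assumptions added here for illustration and testing, not the ironic-python-agent code itself.

```python
import os


def detect_boot_mode(efi_dir="/sys/firmware/efi"):
    """Return 'uefi' when the EFI firmware directory exists, else 'bios'.

    On Linux the directory is only present when the system was booted
    through UEFI, so its absence implies a legacy (BIOS) boot.
    """
    return "uefi" if os.path.isdir(efi_dir) else "bios"


print("The current boot mode is", detect_boot_mode())
```

Run inside the ramdisk, this reports the mode the node actually booted in, regardless of what the deployment requested.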

Comment 9 Dmitry Tantsur 2020-10-07 09:54:03 UTC
> ironic_python_agent.errors.DeviceNotFound: Error finding the disk or partition device to deploy the image onto: No partition with UUID 0x00000000 found on device /dev/sda

This is a red herring, ignore it. At some point we should just stop calling the code that can never succeed.

> Which is why booting off disk is failing when using UEFI.

Whole disk images can boot even if grub installation fails (it always does exactly the way you show).

Comment 10 Dmitry Tantsur 2020-10-07 10:00:36 UTC
While the error is a red herring, I do agree with Bob's conclusions. OpenShift requests UEFI boot, but the node is actually booted in legacy mode. Since IPMI cannot change the boot mode, it has to be done manually. Could you try it?

If you cannot make such modifications, you need to use legacy (BIOS) boot.

Comment 11 Bob Fournier 2020-10-07 21:39:48 UTC
Created attachment 1719855 [details]
ramdisk log after BOOT Mode changed to UEFI in BIOS config

Comment 12 Bob Fournier 2020-10-07 21:40:59 UTC
So we made some progress changing to UEFI mode in the BIOS configuration and the bootloader install works now, but we still are not booting off the hard drive even though Ironic is setting it via ipmitool and it appears to be set correctly according to ipmitool.

In the BIOS configuration Kambiz changed the Boot Mode from dual to UEFI?

After that we see the current boot mode detected correctly in IPA:
Oct 07 15:10:57 f06-h15-000-1029p.rdu2.scalelab.redhat.com ironic-python-agent[2175]: 2020-10-07 15:10:57.639 2175 DEBUG root [-] The current boot mode is uefi get_boot_info /usr/lib/python3.6/site-packages/ironic_python_agent/hardware.py:1149

Oct 07 15:12:35 f06-h15-000-1029p.rdu2.scalelab.redhat.com ironic-python-agent[2175]: 2020-10-07 15:12:35.261 2175 DEBUG ironic_lib.utils [-] Command stdout is: "Model: ATA INTEL SSDSC2BB48 (scsi)
                                                                                      Disk /dev/sda: 480GB
                                                                                      Sector size (logical/physical): 512B/4096B
                                                                                      Partition Table: gpt
                                                                                      Disk Flags:

                                                                                      Number  Start   End     Size    File system  Name        Flags
                                                                                       1      1049kB  404MB   403MB   ext4         boot
                                                                                       2      404MB   537MB   133MB   fat16        EFI-SYSTEM  boot, esp
                                                                                       3      537MB   538MB   1049kB               BIOS-BOOT   bios_grub
                                                                                       4      538MB   3553MB  3015MB               luks_root
                                                                                       5      480GB   480GB   68.0MB

Ironic sets the boot device by:
2020-10-07 19:12:55.272 1 DEBUG oslo_concurrency.processutils [req-829d1ae7-dc1a-4f3b-b9f2-f5ec45701931 - - - - -] CMD "ipmitool -I lanplus -H mgmt-f06-h15-000-1029p.rdu2.scalelab.redhat.com -L ADMINISTRATOR -p 623 -U quads -R 1 -N 5 -f /tmp/tmpb7xfpvoq raw 0x00 0x08 0x05 0xe0 0x08 0x00 0x00 0x00" returned: 0 in 0.045s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:416

And it looks like it's set correctly:
ipmitool -I lanplus -U quads -P rdu2@479 -H mgmt-f06-h15-000-1029p.rdu2.scalelab.redhat.com chassis bootparam get 5
Boot parameter version: 1
Boot parameter 5 is valid/unlocked
Boot parameter data: a004000000
 Boot Flags :
   - Boot Flag Valid
   - Options apply to only next boot
   - BIOS EFI boot 
   - Boot Device Selector : Force PXE
   - Console Redirection control : System Default
   - BIOS verbosity : Console redirection occurs per BIOS configuration setting (default)
   - BIOS Mux Control Override : BIOS uses recommended setting of the mux at the end of POST

But the next boot is PXE and dnsmasq ends up providing inspector.ipxe.

I've attached the ramdisk log for this node.

Comment 13 Bob Fournier 2020-10-07 21:42:49 UTC
Actually I grabbed the wrong output from ipmitool. It should be this one, which shows it's set to boot off the hard drive:

# ipmitool -I lanplus -U quads -P rdu2@479 -H mgmt-f06-h15-000-1029p.rdu2.scalelab.redhat.com chassis bootparam get 5
Boot parameter version: 1
Boot parameter 5 is valid/unlocked
Boot parameter data: e008000000
 Boot Flags :
   - Boot Flag Valid
   - Options apply to all future boots
   - BIOS EFI boot 
   - Boot Device Selector : Force Boot from default Hard-Drive
   - Console Redirection control : System Default
   - BIOS verbosity : Console redirection occurs per BIOS configuration setting (default)
   - BIOS Mux Control Override : BIOS uses recommended setting of the mux at the end of POST

Comment 14 Bob Fournier 2020-10-07 21:48:22 UTC
Also, we tested a manual reboot via ipmi with the settings as in Comment 13 (Force Boot from default Hard-Drive), and this also resulted in a PXE boot.
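
For anyone repeating the manual test, the raw bytes logged by the conductor can be rebuilt as follows. This is a minimal Python sketch, assuming the standard IPMI encoding (NetFn 0x00 chassis, command 0x08 Set System Boot Options, parameter 5). The helper name is hypothetical and this is not how Ironic constructs the command internally.

```python
def build_set_boot_flags_args(device_byte, persistent=True, efi=True):
    """Return ipmitool arguments for a raw 'set boot flags' request."""
    flags = 0x80              # bit 7: boot flags valid
    if persistent:
        flags |= 0x40         # bit 6: apply to all future boots
    if efi:
        flags |= 0x20         # bit 5: BIOS EFI boot
    # NetFn, command, parameter selector, then five bytes of flag data.
    payload = [0x00, 0x08, 0x05, flags, device_byte, 0x00, 0x00, 0x00]
    return ["raw"] + ["0x%02x" % b for b in payload]


# 0x08 is the device byte for "default hard drive" seen in the logs.
print(" ".join(build_set_boot_flags_args(0x08)))
```

With the defaults this yields the same "raw 0x00 0x08 0x05 0xe0 0x08 0x00 0x00 0x00" sequence shown in Comment 12, which is the request the Supermicro nodes did not honor.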

Comment 15 Ramon Acedo 2020-10-08 11:42:26 UTC
Would replacing IPMI by Redfish to manage this node work?

Comment 16 Iury Gregory Melo Ferreira 2020-10-08 12:00:03 UTC
(In reply to Ramon Acedo from comment #15)
> Would replacing IPMI by Redfish to manage this node work?

I think they don't have the redfish license, but it would be really good to see if it works with redfish + uefi.

Comment 18 Dmitry Tantsur 2020-10-08 15:45:40 UTC
Do you have ramdisk logs from comment 12? The ones in comment 11 still show "The current boot mode is bios".

Comment 19 Murali Krishnasamy 2020-10-08 16:01:30 UTC
Changed BIOS configuration to boot UEFI mode for "mgmt-f06-h15-000-1029p.rdu2.scalelab.redhat.com" host only, other hosts are still in BIOS mode.

Oct 07 15:10:57 f06-h15-000-1029p.rdu2.scalelab.redhat.com ironic-python-agent[2175]: 2020-10-07 15:10:57.639 2175 DEBUG root [-] The current boot mode is uefi get_boot_info /usr/lib/python3.6/site-packages/ironic_python_agent/hardware.py:1149

Comment 20 Bob Fournier 2020-10-08 16:04:29 UTC
Created attachment 1720006 [details]
ramdisk in uefi mode

Comment 21 Dmitry Tantsur 2020-10-08 16:41:28 UTC
The boot order seems correct for the last run:

Oct 07 15:12:36 f06-h15-000-1029p.rdu2.scalelab.redhat.com ironic-python-agent[2175]: 2020-10-07 15:12:36.014 2175 DEBUG ironic_lib.utils [-] Execution completed, command line is "efibootmgr -c -d /dev/sda -p 2 -w -L ironic1 -l \EFI\BOOT\BOOTX64.EFI" execute /usr/lib/python3.6/site-packages/ironic_lib/utils.py:101
Oct 07 15:12:36 f06-h15-000-1029p.rdu2.scalelab.redhat.com ironic-python-agent[2175]: 2020-10-07 15:12:36.027 2175 DEBUG ironic_lib.utils [-] Command stdout is: "BootCurrent: 0007
 Timeout: 1 seconds
 BootOrder: 0000,0004,0005,0006,0007,0008,0009,000A,000B,000C,000D,000E,000F,0003,0001
 Boot0001  Hard Drive
 Boot0003* UEFI: Built-in EFI Shell
 Boot0004* (B28/D0/F0) UEFI: PXE IPv4 Intel(R) Ethernet Connection X722 for 1GbE(MAC:0cc47afa192a)
 Boot0005* (B28/D0/F1) UEFI: PXE IPv4 Intel(R) Ethernet Connection X722 for 1GbE(MAC:0cc47afa192b)
 Boot0006* (B94/D0/F0) UEFI: PXE IPv4 Intel(R) Ethernet Controller X710 for 10GbE SFP+(MAC:ac1f6b2d19d4)
 Boot0007* (B94/D0/F1) UEFI: PXE IPv4 Intel(R) Ethernet Controller X710 for 10GbE SFP+(MAC:ac1f6b2d19d5)
 Boot0008* (B94/D0/F2) UEFI: PXE IPv4 Intel(R) Ethernet Controller X710 for 10GbE SFP+(MAC:ac1f6b2d19d6)
 Boot0009* (B94/D0/F3) UEFI: PXE IPv4 Intel(R) Ethernet Controller X710 for 10GbE SFP+(MAC:ac1f6b2d19d7)
 Boot000A* (B28/D0/F0) UEFI: PXE IPv6 Intel(R) Ethernet Connection X722 for 1GbE(MAC:0cc47afa192a)
 Boot000B* (B28/D0/F1) UEFI: PXE IPv6 Intel(R) Ethernet Connection X722 for 1GbE(MAC:0cc47afa192b)
 Boot000C* (B94/D0/F0) UEFI: PXE IPv6 Intel(R) Ethernet Controller X710 for 10GbE SFP+(MAC:ac1f6b2d19d4)
 Boot000D* (B94/D0/F1) UEFI: PXE IPv6 Intel(R) Ethernet Controller X710 for 10GbE SFP+(MAC:ac1f6b2d19d5)
 Boot000E* (B94/D0/F2) UEFI: PXE IPv6 Intel(R) Ethernet Controller X710 for 10GbE SFP+(MAC:ac1f6b2d19d6)
 Boot000F* (B94/D0/F3) UEFI: PXE IPv6 Intel(R) Ethernet Controller X710 for 10GbE SFP+(MAC:ac1f6b2d19d7)
 Boot0000* ironic1
 " execute /usr/lib/python3.6/site-packages/ironic_lib/utils.py:103

The current hypothesis is that calling `ipmitool bootdev` after that confuses the EFI firmware and resets the boot order to something else. A potential fix is to call ipmitool before efibootmgr, https://review.opendev.org/#/c/756881/ achieves exactly that. We need to find a way to test it.
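
To compare the boot order before and after the ipmitool call, the efibootmgr output can be parsed mechanically. Below is a minimal Python sketch; the parser and its names are illustrative and are not part of ironic-lib, and SAMPLE is an abridged copy of the output above.

```python
import re

SAMPLE = """BootCurrent: 0007
Timeout: 1 seconds
BootOrder: 0000,0004,0003,0001
Boot0001  Hard Drive
Boot0003* UEFI: Built-in EFI Shell
Boot0004* (B28/D0/F0) UEFI: PXE IPv4 Intel(R) Ethernet Connection X722 for 1GbE(MAC:0cc47afa192a)
Boot0000* ironic1
"""

ENTRY_RE = re.compile(r"^Boot([0-9A-Fa-f]{4})(\*?)\s+(.*)$")


def parse_efibootmgr(output):
    """Return (boot_order, entries) parsed from efibootmgr output."""
    order, entries = [], {}
    for raw_line in output.splitlines():
        line = raw_line.strip()
        if line.startswith("BootOrder:"):
            order = [n.strip() for n in line.split(":", 1)[1].split(",") if n.strip()]
            continue
        match = ENTRY_RE.match(line)
        if match:
            number, star, label = match.groups()
            # A trailing '*' marks the entry as active.
            entries[number] = {"active": star == "*", "label": label}
    return order, entries


order, entries = parse_efibootmgr(SAMPLE)
print("first boot entry:", order[0], entries[order[0]]["label"])
```

Running the same parser on output captured after the next reboot would show whether the firmware moved the ironic1 entry away from the front of BootOrder.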

Comment 24 Murali Krishnasamy 2020-10-10 00:20:35 UTC
We tried using redfish on these models and it rebooted properly with disk + efi. Ironic conductor logs:
First PXE boot:
2020-10-09 17:50:32.334 1 DEBUG sushy.connector [req-06df134d-6bef-4af9-9cd4-b4ce00e5a77a - - - - -] HTTP request: PATCH https://mgmt-f06-h15-000-1029p.rdu2.scalelab.redhat.com/redfish/v1/Systems/1; headers: {'OData-Version': '4.0'}; body: {'Boot': {'BootSourceOverrideTarget': 'Pxe', 'BootSourceOverrideEnabled': 'Once'}}; blocking: False; timeout: 60; session arguments: {}; _op /usr/lib/python3.6/site-packages/sushy/connector.py:102
2020-10-09 17:50:32.543 1 DEBUG sushy.connector [req-06df134d-6bef-4af9-9cd4-b4ce00e5a77a - - - - -] HTTP response for PATCH https://mgmt-f06-h15-000-1029p.rdu2.scalelab.redhat.com/redfish/v1/Systems/1: status code: 200 _op /usr/lib/python3.6/site-packages/sushy/connector.py:156

Reboot after deploy:
2020-10-09 18:01:24.471 1 DEBUG sushy.connector [req-9edbf591-f8f5-4ce4-884f-9c391253ddd2 - - - - -] HTTP request: PATCH https://mgmt-f06-h15-000-1029p.rdu2.scalelab.redhat.com/redfish/v1/Systems/1; headers: {'OData-Version': '4.0'}; body: {'Boot': {'BootSourceOverrideTarget': 'Hdd'}}; blocking: False; timeout: 60; session arguments: {}; _op /usr/lib/python3.6/site-packages/sushy/connector.py:102
2020-10-09 18:01:24.716 1 DEBUG sushy.connector [req-9edbf591-f8f5-4ce4-884f-9c391253ddd2 - - - - -] HTTP response for PATCH https://mgmt-f06-h15-000-1029p.rdu2.scalelab.redhat.com/redfish/v1/Systems/1: status code: 200 _op /usr/lib/python3.6/site-packages/sushy/connector.py:156

However, we observed some inconsistency while using redfish + uefi when re-deploying a node (or a node which had the persistent disk option set in BIOS): the first PXE boot set by redfish is not recognized by the server, and it continues to boot via disk. We had to manually reset the boot option to pxe + uefi before re-deployment. It is reproducible on this server model.
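
The two PATCH bodies in the logs above differ only in whether BootSourceOverrideEnabled is sent. A minimal Python sketch that reproduces them follows; the helper name is hypothetical and this only mirrors the logged bodies, it is not the sushy implementation.

```python
import json


def boot_override_body(target, once=False):
    """Build a Redfish boot override body like the ones in the logs."""
    boot = {"BootSourceOverrideTarget": target}
    if once:
        # Only the one-time PXE request in the logs carries this key.
        boot["BootSourceOverrideEnabled"] = "Once"
    return {"Boot": boot}


print(json.dumps(boot_override_body("Pxe", once=True)))
print(json.dumps(boot_override_body("Hdd")))
```

Such a body would be sent as a PATCH to the /redfish/v1/Systems/1 resource shown in the logs. When reporting the re-deployment inconsistency, capturing the Boot section of that resource before and after the request should show whether the BMC accepted the override.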

Comment 25 Julia Kreger 2020-10-13 23:38:23 UTC
(In reply to Murali Krishnasamy from comment #24)
> We tried using redfish on these models and it rebooted properly with disk +
> efi on reboot, Ironic conductor logs,
> first PXE boot,
> 2020-10-09 17:50:32.334 1 DEBUG sushy.connector
> [req-06df134d-6bef-4af9-9cd4-b4ce00e5a77a - - - - -] HTTP request: PATCH
> https://mgmt-f06-h15-000-1029p.rdu2.scalelab.redhat.com/redfish/v1/Systems/1;
> headers: {'OData-Version': '4.0'}; body: {'Boot':
> {'BootSourceOverrideTarget': 'Pxe', 'BootSourceOverrideEnabled': 'Once'}};
> blocking: False; timeout: 60; session arguments: {}; _op
> /usr/lib/python3.6/site-packages/sushy/connector.py:102
> 2020-10-09 17:50:32.543 1 DEBUG sushy.connector
> [req-06df134d-6bef-4af9-9cd4-b4ce00e5a77a - - - - -] HTTP response for PATCH
> https://mgmt-f06-h15-000-1029p.rdu2.scalelab.redhat.com/redfish/v1/Systems/1:
> status code: 200 _op /usr/lib/python3.6/site-packages/sushy/connector.py:156
> 
> reboot after deploy,
> 2020-10-09 18:01:24.471 1 DEBUG sushy.connector
> [req-9edbf591-f8f5-4ce4-884f-9c391253ddd2 - - - - -] HTTP request: PATCH
> https://mgmt-f06-h15-000-1029p.rdu2.scalelab.redhat.com/redfish/v1/Systems/1;
> headers: {'OData-Version': '4.0'}; body: {'Boot':
> {'BootSourceOverrideTarget': 'Hdd'}}; blocking: False; timeout: 60; session
> arguments: {}; _op /usr/lib/python3.6/site-packages/sushy/connector.py:102
> 2020-10-09 18:01:24.716 1 DEBUG sushy.connector
> [req-9edbf591-f8f5-4ce4-884f-9c391253ddd2 - - - - -] HTTP response for PATCH
> https://mgmt-f06-h15-000-1029p.rdu2.scalelab.redhat.com/redfish/v1/Systems/1:
> status code: 200 _op /usr/lib/python3.6/site-packages/sushy/connector.py:156
> 
> But observed some inconsistency while using redfish + uefi on a re-deploying
> node(or a node which had persistent disk option set in BIOS), that first PXE
> boot set by redfish is not recognized by the server, it continued to boot
> via disk.  We had to manually reset the boot option to pxe + uefi before
> re-deployment. It is reproducible in this server model.

Please open a specific bug covering the redfish behavior issues you've observed. Due to the cross-product nature we'll need to duplicate the bugs and track them independently through testing to final resolution. Please also provide a direct curl of https://bmc_ip/redfish/v1/Systems/1 before and after, as well as the exact steps that were taken to reset it to UEFI + PXE. Ideally the BMC should have refused the initial request, but clearly we have some sort of bug in the behavior that we need to sort out.

Comment 26 Bob Fournier 2020-10-14 00:14:41 UTC
Regarding the Redfish issue in Comment 25, I created https://bugzilla.redhat.com/show_bug.cgi?id=1888072 and one upstream: https://storyboard.openstack.org/#!/story/2008252.

Comment 27 Bob Fournier 2020-10-19 12:48:39 UTC
Removing NeedInfo as Redfish bug has been created and this IPMI issue is understood.

Comment 28 Bob Fournier 2020-10-26 11:55:59 UTC
Upstream patch is still under review.

Comment 30 Bob Fournier 2020-12-17 18:19:11 UTC
Backported to Victoria.

Comment 31 Bob Fournier 2021-01-05 16:16:05 UTC
Patch has merged and is included in tagged package openstack-ironic-16.0.3-0.20201219231205.4ae5375.el8
https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=1424819

Comment 35 errata-xmlrpc 2021-02-24 15:23:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

