Created attachment 1955208 [details] UEFI boot issue Description of problem: On a Dell PowerEdge R7525, we cannot boot the hardened overcloud image without secure boot enabled in BIOS. We also had similar issues in the past on RHOSP17.0 based releases. Enabling secure boot allows us to boot the installed image. Version-Release number of selected component (if applicable): rpm -qa | grep rhosp-director rhosp-director-images-base-17.1-20230328.1.test.el9ost.noarch rhosp-director-images-metadata-17.1-20230328.1.test.el9ost.noarch rhosp-director-images-ipa-x86_64-17.1-20230328.1.test.el9ost.noarch rhosp-director-images-x86_64-17.1-20230328.1.test.el9ost.noarch rhosp-director-images-17.1-20230328.1.test.el9ost.noarch rhosp-director-images-minimal-17.1-20230328.1.test.el9ost.noarch rhosp-director-images-uefi-x86_64-17.1-20230328.1.test.el9ost.noarch How reproducible: Always on our hardware. Steps to Reproduce: 1. Deploy RHOSP17.1 Actual results: Overcloud node provisioning fails. Expected results: Overcloud node provisioning passes. Additional info: It happens on two of our Dell AMD systems; we cannot reproduce it on our Dell Intel systems.
On the affected hosts, we have upgraded BIOS/iDRAC/drivers and firmware to the latest up-to-date versions provided by Dell.
Since this appears to be an AMD vs Intel issue, we'll need to gather enough information to assign this to RHEL. There have been a handful of AMD related bugs recently. We need some logs or screenshots of the boot failures, then we can take it from there.
Created attachment 1955685 [details] UEFI Successful Boot Hi Steve, Which information can I provide to help with this? On the AMD side, the issue in the image attached happens 10 seconds after picking the boot entry in the grub2 menu. I am attaching a picture of a successful pre-boot (using secure boot) output pre kernel boot. I will be on PTO until the 13th of April. Please feel free to put a needinfo on me with additional information you need.
Except, what occurs when the boot is unsuccessful because secure boot has been disabled? ... Do we know if console=tty0 is on the kernel command line that grub is passing in? My only concern, though, is that if we go to RHEL, they may kick the bug back, stating that Secure Boot is the intended case to be supported. From their point of view, that would be reasonable, given they would much prefer everyone to be using it by default.
Actually, disregard my last w/r/t the red screen. The issue is that the underlying system is faulting during startup because it is not enforcing secure boot. I suspect it might be a firmware issue which Dell needs to investigate. They, too, may say Secure Boot is the intended/design default.
Thanks, Julia, If the main suspicion is related to Dell firmware, I will ask some folks to attempt to deploy the same release on their AMD platforms, but I am unsure when they will be able to.
*** Bug 2179009 has been marked as a duplicate of this bug. ***
This bug was produced on a Dell PowerEdge R7525 with 2x `AMD EPYC 7702 64-Core Processor`. Nova squad successfully deployed OpenStack platform on a Dell PowerEdge with 2x `AMD EPYC 7402 24-Core Processor`. smooney has mentioned that he also had an issue deploying RHEL 9.2 on top of a host with an `AMD EPYC 7702` CPU, but we haven't followed up on whether this is the same issue as observed in this bug.
A new BIOS update was released today, `2.11.3`, for this PowerEdge model. I will be updating and attempting to redeploy OpenStack on the problematic platform.
(In reply to Vadim Khitrin from comment #23) > A new BIOS update was released today, `2.11.3`, for this PowerEdge model. > I will be updating and attempting to redeploy OpenStack on the problematic > platform. Is there a log from that install of RHEL 9.2 on the Dell R7525? I'm curious to see if there is an attempt to change the EFI boot order using efibootmgr. On the other Dell systems where this install worked, was the EFI boot order also "redhat" last, specifically after a PXE boot that attempts to load snponly.efi? -Lenny.
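For anyone wanting to check a deployment log for efibootmgr activity, here is a minimal Python sketch. It assumes the log contains oslo_concurrency debug lines of the form "Running cmd (subprocess): <command> execute /path/to/processutils.py:NNN", which is what ironic-python-agent emits at debug level. The function name `efibootmgr_commands` is made up for this example.

```python
import re

# Matches the oslo_concurrency debug line that records a subprocess launch.
# The optional " execute /..." tail is the logger's function/file suffix.
_CMD_RE = re.compile(
    r"Running cmd \(subprocess\): (efibootmgr\b.*?)(?: execute /|$)"
)


def efibootmgr_commands(log_lines):
    """Return the efibootmgr command lines found in an iterable of log lines."""
    commands = []
    for line in log_lines:
        match = _CMD_RE.search(line)
        if match:
            commands.append(match.group(1).strip())
    return commands
```

Feeding it the lines of the agent journal (for example, `efibootmgr_commands(open(path))` with a path of your choosing) lists each efibootmgr invocation in order, which makes it easy to see whether a boot entry was deleted and recreated during deployment.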
Here is a status update on my investigation. First off, I started investigating this problem in the context of BZ 2190353. There I explained why enabling secure boot avoids the reported problem: https://bugzilla.redhat.com/show_bug.cgi?id=2190353#c3 https://bugzilla.redhat.com/show_bug.cgi?id=2190353#c9 In short, when UEFI Boot Device Selection attempts a rhel boot, subsequent to a PXE boot that starts up snponly.efi but fails, then the rhel boot faults in UEFI firmware. Enabling Secure Boot prevents the execution of snponly.efi because it correctly fails signature authentication. At the time that BZ 2190353 was closed, I still wasn't sure if the UEFI firmware fault was occurring while the system was still in grub, or if it had transferred into the kernel. I had narrowed it down to right after the initrd had been read into memory. But I didn't know if grub had gotten to the point where it had transferred control to the kernel. So, I don't fully understand why BZ 2190353 was closed. Additionally, my suspicion was that snponly.efi, or boot.ipxe, which appears to be loaded by snponly.efi, were changing the state of the UEFI firmware in some way that would later cause the grub load of the rhel kernel and initrd to not work correctly. Using a debug grub, I now know that grub has transferred control into the kernel image that grub loaded. Additionally, at least as verified by a simple checksum, the loaded kernel image appears to be intact at the point that grub transfers control. So I now need to see how far we get in the EFI stub in the kernel. -Lenny.
Another status update: I have debug output from the RHEL kernel before the UEFI fault: DEBUG: efi_main() 05/23/0023 14:28:43 PowerEdge R7525 - BIOS 2.11.3 A system restart is required. The system detected an exception during the UEFI pre-boot environment. ------------------------------------------------------------------------------- Type: Invalid opcode (06) Source: Software (UEFI0004) on BSP AX=0000000000000000 BX=00000000B5000000 SI=0000000063194740 DI=0000000063192090 CX=000000005493FFE8 DX=0000000000000032 R8=000000005493097E R9=0000000000000001 10=000000005769B000 11=000000005769AF18 12=000000005769B330 13=00000000555AFFA0 14=000000005769B000 15=0000000000000000 BP=0000000063E7CC20 SP=000000005769AF60 IP=0000000061BC15DA Flags=00210282 CurrentTPL = 10, Event Depth 1 LastMsg: LBRfr0 549309AC Unknown(ptdvm) +0409AC LBRto0 61BC15A0 SecurityStubDxe.efi -->RIP 61BC15DA SecurityStubDxe.efi Stack trace not available The kernel makes a few calls into the UEFI firmware this early in the boot. I think it's likely that the fault is occurring on one of those calls. Now that I have debug output from kernel, I should be able to narrow in on it. -Lenny.
(In reply to Lenny Szubowicz from comment #29) > Now that I have debug output from kernel, I should be able to narrow in on > it. Lenny do you want us to open the RHEL bz that was closed?
(In reply to Eran Kuris from comment #30) > Lenny do you want us to open the RHEL bz that was closed? I think we can continue the diagnosis and discussion in this BZ.
Thanks for the installation logs provide in comment 28. The log shows that the "redhat" boot entry (Boot0000) appears to be correctly created and placed first in the boot order: May 22 11:54:02 kodkod01.lab.eng.tlv2.redhat.com ironic-python-agent[5425]: 2023-05-22 11:54:02.779 5425 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): efibootmgr -v -c -d /dev/sda -p 1 -w -L redhat -l \EFI\redhat\shimx64.efi execute /usr/lib/python3.9/site-packages/oslo_concurrency/processutils.py:384 May 22 11:54:02 kodkod01.lab.eng.tlv2.redhat.com ironic-python-agent[5425]: 2023-05-22 11:54:02.848 5425 DEBUG oslo_concurrency.processutils [-] CMD "efibootmgr -v -c -d /dev/sda -p 1 -w -L redhat -l \EFI\redhat\shimx64.efi" returned: 0 in 0.070s execute /usr/lib/python3.9/site-packages/oslo_concurrency/processutils.py:422 May 22 11:54:02 kodkod01.lab.eng.tlv2.redhat.com ironic-python-agent[5425]: 2023-05-22 11:54:02.876 5425 DEBUG ironic_lib.utils [-] Command stdout is: "BootCurrent: 0004 BootOrder: 0000,0005,0004,0003,0001 Boot0001* EFI RAID Disk PlaceHolder 1 PciRoot(0x0)/Pci(0x1,0x1)/Pci(0x0,0x0)/Ctrl(0x0)/SCSI(0,0) Boot0003* Integrated NIC 1 Port 1 Partition 1 VenHw(3a191845-5f86-4e78-8fce-c4cff59f9daa) Boot0004* Integrated NIC 1 Port 2 Partition 1 VenHw(d227c733-f75f-4341-b749-4d1759ec8538) Boot0005* EFI RAID Disk PlaceHolder 1 PciRoot(0x0)/Pci(0x1,0x1)/Pci(0x0,0x0)/Ctrl(0x0)/SCSI(1,0) Boot0000* redhat HD(1,GPT,b84b5ab4-7a90-4916-81c4-ebfbb0d98af6,0x800,0x8000)/File(\EFI\redhat\shimx64.efi) " _log /usr/lib/python3.9/site-packages/ironic_lib/utils.py:99 But then, when the problem reported in this BZ occurs, the redhat boot entry is no longer first in the boot order. Yesterday I found a way to reproduce this problem at will. 1. The system has the "redhat" Boot0000 boot entry as created by: [root@compute-0 ~]# efibootmgr -v -c -p 1 -w -L redhat -l \\EFI\\redhat\\shimx64.efi The system is currently booted using that Boot0000 boot entry. 2. 
I create a second boot entry that is essentially a duplicate of Boot0000, but it has a different label, "redhat1". [root@compute-0 ~]# efibootmgr -v -c -p 1 -w -L redhat1 -l \\EFI\\redhat\\shimx64.efi On this particular system, the Boot0001 boot entry is used, because that's the lowest numbered unused boot variable. I suspect that the assigned boot variable number doesn't matter. This efibootmgr command also makes Boot0001 first in the boot order: [root@compute-0 ~]# efibootmgr -v -c -p 1 -w -L redhat1 -l \\EFI\\redhat\\shimx64.efi BootCurrent: 0000 BootOrder: 0001,0000,0005,0004,0003 Boot0000* redhat HD(1,GPT,b84b5ab4-7a90-4916-81c4-ebfbb0d98af6,0x800,0x8000)/File(\EFI\redhat\shimx64.efi) Boot0003* Integrated NIC 1 Port 1 Partition 1 VenHw(3a191845-5f86-4e78-8fce-c4cff59f9daa) Boot0004* Integrated NIC 1 Port 2 Partition 1 VenHw(d227c733-f75f-4341-b749-4d1759ec8538) Boot0005* EFI RAID Disk PlaceHolder 1 PciRoot(0x0)/Pci(0x1,0x1)/Pci(0x0,0x0)/Ctrl(0x0)/SCSI(1,0) Boot0001* redhat1 HD(1,GPT,b84b5ab4-7a90-4916-81c4-ebfbb0d98af6,0x800,0x8000)/File(\EFI\redhat\shimx64.efi) [root@compute-0 ~]# Why do I create this duplicate boot variable? I found that I had to boot the system from a different boot variable other than the one I'm going to use to reproduce the problem with. On this system, I will reproduce the problem using Boot0000, just like the OpenStack ironic deployment did. 3. I reboot the system back to the installed RHEL, but this time using the "redhat1" Boot0001 boot variable. [root@compute-0 ~]# efibootmgr BootCurrent: 0001 BootOrder: 0001,0000,0005,0004,0003 Boot0000* redhat Boot0001* redhat1 Boot0003* Integrated NIC 1 Port 1 Partition 1 Boot0004* Integrated NIC 1 Port 2 Partition 1 Boot0005* EFI RAID Disk PlaceHolder 1 4. Delete the "redhat" Boot0000, just like the OpenStack deployment does. 
[root@compute-0 ~]# efibootmgr -b 0 -B BootCurrent: 0001 BootOrder: 0001,0005,0004,0003 Boot0001* redhat1 Boot0003* Integrated NIC 1 Port 1 Partition 1 Boot0004* Integrated NIC 1 Port 2 Partition 1 Boot0005* EFI RAID Disk PlaceHolder 1 [root@compute-0 ~]# 5. Recreate the "redhat" boot entry and make it first in the boot order, just like the OpenStack deployment does. [root@compute-0 ~]# efibootmgr -v -c -p 1 -w -L redhat -l \\EFI\\redhat\\shimx64.efi BootCurrent: 0001 BootOrder: 0000,0001,0005,0004,0003 Boot0001* redhat1 HD(1,GPT,b84b5ab4-7a90-4916-81c4-ebfbb0d98af6,0x800,0x8000)/File(\EFI\redhat\shimx64.efi) Boot0003* Integrated NIC 1 Port 1 Partition 1 VenHw(3a191845-5f86-4e78-8fce-c4cff59f9daa) Boot0004* Integrated NIC 1 Port 2 Partition 1 VenHw(d227c733-f75f-4341-b749-4d1759ec8538) Boot0005* EFI RAID Disk PlaceHolder 1 PciRoot(0x0)/Pci(0x1,0x1)/Pci(0x0,0x0)/Ctrl(0x0)/SCSI(1,0) Boot0000* redhat HD(1,GPT,b84b5ab4-7a90-4916-81c4-ebfbb0d98af6,0x800,0x8000)/File(\EFI\redhat\shimx64.efi) [root@compute-0 ~]# As expected, the Boot0000 entry is reused and made first in the boot order. 6. Reboot the system. The expectation is that Boot0000 will persist as first in the boot order and that the system will boot from Boot0000. But instead, after the system reboots and you login, you discover that Boot0001 is first in the boot order and the system booted using Boot0001. [root@compute-0 ~]# efibootmgr BootCurrent: 0001 BootOrder: 0001,0000,0005,0004,0003 Boot0000* redhat Boot0001* redhat1 Boot0003* Integrated NIC 1 Port 1 Partition 1 Boot0004* Integrated NIC 1 Port 2 Partition 1 Boot0005* EFI RAID Disk PlaceHolder 1 This is analogous to what happened during the OpenStack ironic deployment. Except for OpenStack, the deployment boot was via PXE and there was no functionally duplicate "redhat1" that was ahead of the PXE boot entries in the boot order. Therefore, on reboot, the system attempted the PXE boot that loaded snponly.efi before eventually trying "redhat". 
You could now go back and repeat the steps from step 4, and keep reproducing the problem. You could also interrupt the reboot via F11 on the iDRAC vga console and examine the boot order before grub runs and boots the kernel. You'll see that the boot order is incorrect at that point. So that leaves the questions: 1. Did efibootmgr and the kernel fail to change the boot order, even though they appeared to do so successfully? Or: 2. Did the firmware fail to make the change without reporting an error to the kernel? One way to answer that question is to do the equivalent of the efibootmgr operations using the EFI shell command bcfg from the EFI shell on that system and test whether it's reproducible without the use of the RHEL kernel or efibootmgr. I don't see that this system provides a way to boot to the EFI shell. Perhaps there is some BIOS setup option that I didn't notice. So, I will ask Dell about this. However, my suspicion is that it's more likely that this is a UEFI firmware issue on this system. Regardless of the root cause, this non-persistent boot order problem is the first problem that exposes the second problem. The second problem is that a failed PXE boot of snponly.efi causes the booting RHEL kernel to fault in EFI firmware. I'll provide more of an explanation of why that is occurring in a subsequent comment. -Lenny.
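To make the step 5 versus step 6 comparison less error prone, the check can be scripted. This is a minimal Python sketch, assuming the plain (non-verbose) `efibootmgr` output format shown in the steps above, with one item per line. The function names `parse_efibootmgr` and `first_entry_changed` are invented for this example.

```python
import re


def parse_efibootmgr(text):
    """Parse plain `efibootmgr` output into (boot_current, boot_order, labels)."""
    boot_current = None
    boot_order = []
    labels = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("BootCurrent:"):
            boot_current = line.split(":", 1)[1].strip()
        elif line.startswith("BootOrder:"):
            numbers = line.split(":", 1)[1].split(",")
            boot_order = [n.strip() for n in numbers if n.strip()]
        else:
            # Entry lines look like "Boot0000* redhat" (the * marks active).
            match = re.match(r"Boot([0-9A-Fa-f]{4})\*?\s+(.*)", line)
            if match:
                labels[match.group(1)] = match.group(2)
    return boot_current, boot_order, labels


def first_entry_changed(before, after):
    """True if the first BootOrder entry differs between two outputs."""
    return parse_efibootmgr(before)[1][:1] != parse_efibootmgr(after)[1][:1]
```

Save the output of `efibootmgr` right after step 5 and again after the reboot in step 6, then pass both strings to `first_entry_changed()`. On the affected system it should report a change, since BootOrder starts with 0000 before the reboot and with 0001 after it.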
Stuart, is there a way to enable booting to the EFI shell on the PowerEdge R7525? See comment 32 for why I'm asking. -Lenny.
(In reply to Lenny Szubowicz from comment #33) > Stuart, is there a way to enable booting to the EFI shell on the PowerEdge > R7525? > No, Dell systems (generally) don't have EFI shell in BIOS. I put an EFI shell binary on a USB drive (either real or virtual via iDRAC) and boot to that when I need EFI shell.
(In reply to Stuart Hayes from comment #34) > (In reply to Lenny Szubowicz from comment #33) > > Stuart, is there a way to enable booting to the EFI shell on the PowerEdge > > R7525? > > > > No, Dell systems (generally) don't have EFI shell in BIOS. I put an EFI > shell binary on a USB drive (either real or virtual via iDRAC) and boot to > that when I need EFI shell. Ok, I can do the same from the local block storage device. Thanks for the quick response. -Lenny.
In comment 32 I described problem 1 and how to reproduce it. If we hadn't encountered problem 1, a reboot of the OpenStack deployed ironic system would have successfully booted directly to that deployed RHEL system via the "redhat" boot entry without encountering problem 2, the UEFI fault. However, given problem 1, other boot entries get tried before the "redhat" one. In such a case, the PXE boot of snponly.efi appears to set the stage for the RHEL kernel to crash in UEFI firmware. This was the state of the debug at the end of comment 29, where I reported that grub successfully transferred control to the kernel and that the fault occurred somewhere in the kernel. Problem 2, the UEFI firmware fault, occurs when the kernel calls the EFI ExitBootServices() call. It's very likely that the problem is in UEFI firmware on this system. Moreover, it's pretty much impossible for Red Hat to make any further progress towards determining the root cause of the fault in UEFI firmware without assistance from UEFI firmware experts from Dell. The parameters to the ExitBootServices() call are not complex. It's extremely unlikely that the running of snponly.efi could have affected what the kernel does on this call. How do I know that it's ExitBootServices()? First, I used an instrumented kernel with additional debug messages. DEBUG: efi_main, entry DEBUG: efi_main, efi_dxe_table=00000000576e0170 DEBUG: efi_relocate_kernel, entry DEBUG: efi_get_memory_map, entry DEBUG: efi_relocate_kernel, relocated to=0000000004c00000 DEBUG: efi_main, checking secure boot setting DEBUG: efi_main, enabling reset attack mitigation DEBUG: efi_main, getting random seed DEBUG: efi_main, getting tpm2 event log DEBUG: efi_main, setting up graphics DEBUG: efi_main, setting up pci DEBUG: efi_main, setting up quirks DEBUG: efi_main, getting ready to exit boot DEBUG: efi_exit_boot_services, entry DEBUG: efi_get_memory_map, entry 05/24/0023 13:27:27 PowerEdge R7525 - BIOS 2.11.3 A system restart is required. 
The system detected an exception during the UEFI pre-boot environment. ------------------------------------------------------------------------------- Type: Invalid opcode (06) Source: Software (UEFI0004) on BSP AX=0000000000000000 BX=0000000000000000 SI=0000000063194740 DI=0000000063192090 CX=000000005493FFE8 DX=0000000000000032 R8=000000005493097E R9=0000000000000001 10=000000005769ADC0 11=000000005769ACD8 12=000000005769B0F0 13=00000000555AFFA0 14=000000005769ADC0 15=0000000000000000 BP=0000000063E7CC20 SP=000000005769AD20 IP=0000000061BC1529 Flags=00210297 CurrentTPL = 10, Event Depth 1 LastMsg: LBRfr0 549309AC Unknown(ptdvm) +0409AC LBRto0 61BC1520 SecurityStubDxe.efi -->RIP 61BC1529 SecurityStubDxe.efi Stack trace not available Stack Dump: 5769AD20 00000000006C646E ndl..... 5769AD28 0000000000000000 ........ 5769AD30 0000000000000000 ........ 5769AD38 000000005769ADC0 ..iW.... 5769AD40 0000000063192090 . .c.... 5769AD48 0000000055546471 qdTU.... 5769AD50 0000000063E7CC30 0..c.... 5769AD58 0000000000000000 ........ 5769AD60 0000000000002700 .'...... 5769AD68 00000000B5080000 ........ 5769AD70 0000000000000000 ........ 5769AD78 000000005769AD98 ..iW.... 5769AD80 0000000063192020 .c.... ... Secondly, since I'm using EFI boot-time services to print my progress messages, I can't use that mechanism after the call to ExitBootServices(). Additionally, I can't use it immediately before the ExitBootServices() after the required preceding call to GetMemoryMap(). However, as a hack, I can comment out just the call to ExitBootServices(). The kernel will continue booting without encountering the UEFI fault, display normal kernel boot messages, and manage to get a decent way into the RHEL boot. So this proves that the ExitBootServices() call triggers the UEFI fault. Thirdly, there is a plausible rationale why actions taken by snponly.efi could easily affect the actions taken by the UEFI firmware to handle the ExitBootServices() call made by the RHEL kernel. 
ExitBootServices() is responsible for shutting down any active UEFI drivers and protocols that might be active, including ones that were started by snponly.efi. [Note that "snponly.efi" also loads boot.ipxe, which then attempts to load its configuration file over the network, but fails to find it.] So snponly.efi has directly and indirectly caused the activation and usage of various network related services. And these have to be shut down at ExitBootServices() time. Lastly, it's conceivable that the root cause bug is somewhere in snponly.efi or boot.ipxe. It's possible that they destroyed some state that only gets tripped over when UEFI firmware tries to shut things down during the ExitBootServices() call. Thus the UEFI firmware is just an innocent victim. But it's impossible for me to diagnose what the UEFI firmware is tripping over when it faults. I really need Dell's help to go further. If Dell has doubts that grub and the kernel are innocent, it would be possible to write a small UEFI app that just calls GetMemoryMap() and ExitBootServices(). This should encounter the same problem by running it directly from the EFI shell or as an EFI boot entry. This should demonstrate the problem without grub or the kernel in the picture at all. -Lenny.
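For readers less familiar with why no debug print can sit between GetMemoryMap() and ExitBootServices(): ExitBootServices() takes the map key returned by the most recent GetMemoryMap() and rejects the call with EFI_INVALID_PARAMETER if the memory map has changed since. The Python sketch below is only a toy model of that handshake, not firmware code. `FakeFirmware`, `exit_boot`, and the idea that a console print allocates memory are all assumptions made for illustration.

```python
class FakeFirmware:
    """Toy stand-in for UEFI boot services; not real firmware behavior."""

    def __init__(self):
        self._map_key = 1
        self.boot_services_active = True

    def allocate(self):
        # Any allocation changes the memory map, so an old key goes stale.
        self._map_key += 1

    def print_message(self, text):
        # Assumption: console output may allocate memory internally.
        self.allocate()

    def get_memory_map(self):
        # Only the map key matters for this model.
        return self._map_key

    def exit_boot_services(self, map_key):
        if map_key != self._map_key:
            return False  # models EFI_INVALID_PARAMETER
        self.boot_services_active = False
        return True


def exit_boot(fw, max_attempts=2, debug_print=False):
    """Fetch the map key and exit; retry with a fresh key if it went stale."""
    for attempt in range(1, max_attempts + 1):
        key = fw.get_memory_map()
        if debug_print and attempt == 1:
            fw.print_message("about to exit boot services")
        if fw.exit_boot_services(key):
            return attempt
    raise RuntimeError("ExitBootServices kept rejecting the map key")
```

In this model, `exit_boot(FakeFirmware())` succeeds on the first attempt, while `exit_boot(FakeFirmware(), debug_print=True)` needs a second GetMemoryMap() because the print invalidated the first key. The fault reported in this BZ is a different matter: the firmware takes an exception inside the call rather than returning an error status.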
I'll create an issue here and get someone from our firmware team to help.
Hi Stuart, Any updates from Dell?
I have created an issue internally and passed it to a firmware engineer, but I haven't heard anything yet.
The latest upstream kernel, 6.4.0-rc4, also encounters that fault in UEFI firmware on the ExitBootServices() call if snponly.efi (et. al.) run first. DEBUG: efi_main, entry DEBUG: efi_main, efi_dxe_table=0x00000000576e0170 DEBUG: efi_relocate_kernel, entry DEBUG: efi_get_memory_map, trying alloc of 3448. bytes for memory map DEBUG: efi_relocate_kernel, relocated to=0x0000000005800000 DEBUG: efi_main, checking secure boot setting DEBUG: efi_main, enabling reset attack mitigation DEBUG: efi_main, getting random seed DEBUG: efi_main, getting tpm2 event log DEBUG: efi_main, setting up graphics DEBUG: efi_main, setting up pci DEBUG: efi_main, setting up quirks DEBUG: efi_main, getting ready to exit boot DEBUG: efi_exit_boot_services, entry DEBUG: efi_get_memory_map, trying alloc of 3592. bytes for memory map 06/01/0023 02:33:50 PowerEdge R7525 - BIOS 2.11.3 A system restart is required. The system detected an exception during the UEFI pre-boot environment. ------------------------------------------------------------------------------- Type: Invalid opcode (06) Source: Software (UEFI0004) on BSP AX=0000000000000000 BX=0000000000000000 SI=000000006318E140 DI=0000000063193090 CX=000000005493FFE8 DX=0000000000000032 R8=000000005493097E R9=0000000000000001 10=000000005769AE30 11=000000005769AD48 12=000000005769B110 13=00000000555AFFA0 14=000000005769AE30 15=0000000000000000 BP=0000000063E7CC20 SP=000000005769AD90 IP=0000000061C0A729 Flags=00210297 CurrentTPL = 10, Event Depth 1 LastMsg: LBRfr0 549309AC Unknown(ptdvm) +0409AC LBRto0 61C0A720 SecurityStubDxe.efi -->RIP 61C0A729 SecurityStubDxe.efi Stack trace not available Stack Dump: 5769AD90 00000000006C646E ndl..... 5769AD98 0000000000000000 ........ 5769ADA0 0000000000000000 ........ 5769ADA8 000000005769AE30 0.iW.... 5769ADB0 0000000063193090 .0.c.... 5769ADB8 0000000055546471 qdTU.... 5769ADC0 0000000063E7CC30 0..c.... 5769ADC8 0000000000000000 ........ 5769ADD0 0000000000002700 .'...... 
5769ADD8 00000000B5080000 ........ 5769ADE0 0000000000000000 ........ 5769ADE8 000000005769AE08 ..iW.... 5769ADF0 0000000063193020 0.c.... 5769ADF8 0000000055545F6A j_TU.... 5769AE00 0000000063193090 .0.c.... 5769AE08 0000000000000100 ........ 5769AE10 0000000000000000 ........ 5769AE18 000000005769AE38 8.iW.... 5769AE20 0000000000000000 ........ 5769AE28 000000005554333E >3TU.... ... The upstream kernel supports the kernel command line parameter disable_early_pci_dma, which results in a call to efi_pci_disable_bridge_busmaster() before the call to ExitBootServices(). Setting disable_early_pci_dma did not prevent the above fault. -Lenny.
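When comparing exception screens across runs, the absolute addresses move around, so it can help to look at where RIP sits relative to the last branch target. Below is a minimal Python sketch for that. It assumes that "LBRto0" on the Dell exception screen is the destination of the most recent recorded branch, which is my reading of the label and not something Dell has confirmed here. The function name `rip_offset` is invented for this example.

```python
import re


def rip_offset(dump_text):
    """Return (module, RIP - LBRto0) from an exception screen, or None."""
    to_match = re.search(r"LBRto0\s+([0-9A-Fa-f]+)\s+(\S+)", dump_text)
    rip_match = re.search(r"-->RIP\s+([0-9A-Fa-f]+)\s+(\S+)", dump_text)
    if not to_match or not rip_match:
        return None
    if to_match.group(2) != rip_match.group(2):
        # Branch target and RIP are attributed to different modules,
        # so a simple subtraction would not be meaningful.
        return None
    offset = int(rip_match.group(1), 16) - int(to_match.group(1), 16)
    return rip_match.group(2), offset
```

Applied to the dump above (LBRto0 61C0A720, RIP 61C0A729) this gives SecurityStubDxe.efi at offset 9, and the earlier instrumented RHEL kernel dump (LBRto0 61BC1520, RIP 61BC1529) gives the same offset 9 even though the module was loaded at a different address. That is at least consistent with both kernels hitting the same spot in firmware.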
I have set up an R7525 with 2x EPYC 7702 CPUs. I installed RHEL9.2 on an NVMe drive on the system, and I rearranged the boot order to put PXE boot first. It boots to iPXE, which exits, and then it boots to RHEL9.2 successfully, so I am not able to reproduce this yet. What version of iPXE are you booting to? Any other idea what might be different that causes yours to fail while mine doesn't? Our firmware team is asking for BIOS logs--I'm waiting for instructions on how to get what they are looking for. They also asked if the systems that don't have EPYC 7702 CPUs will fail if they are running the same BIOS version that's failing with the EPYC 7702s (the tests posted earlier show that other CPUs were tested with older BIOS versions).
I've reproduced the UEFI fault with a minimal gnu-efi program that calls ExitBootServices(). The boot order is: 1. PXE that loads snponly.efi, which then loads http://192.0.22.1:8088/boot.ipxe 2. Minimal ExitBootServices test: HD(1,GPT,b84b5ab4-7a90-4916-81c4-ebfbb0d98af6,0x800,0x8000)/File(\efi\redhat\exit-boot.efi) Booting from PXE Device 2: Integrated NIC 1 Port 2 Partition 1 >>Start PXE over IPv4. Station IP address is 192.0.22.24 Server IP address is 192.0.22.1 NBP filename is snponly.efi NBP filesize is 219680 Bytes Downloading NBP file... NBP file downloaded successfully. iPXE initialising devices...ok iPXE 1.0.0+ (4bd064de) -- Open Source Network Boot Firmware -- http://ipxe.org Features: DNS HTTP HTTPS iSCSI TFTP VLAN AoE EFI Menu net0: 68:05:ca:bf:e3:83 using NII on NII-0x63168d20 (open) [Link:down, TX:0 TXE:0 RX:0 RXE:0] [Link status: Unknown (http://ipxe.org/1a086194)] Configuring (net0 68:05:ca:bf:e3:83).................. ok net0: 192.0.22.24/255.255.255.0 gw 192.0.22.1 net0: fe80::6a05:caff:febf:e383/64 Next server: 192.0.22.1 Filename: http://192.0.22.1:8088/boot.ipxe http://192.0.22.1:8088/boot.ipxe... ok boot.ipxe : 758 bytes [script] Attempting to boot from MAC 68-05-ca-bf-e3-83 pxelinux.cfg/68-05-ca-bf-e3-83... No such file or directory (http://ipxe.org/2d0c618e) PXE boot failed! No configuration found for any of the present NICs. Press any key to reboot... Could not boot image: Connection timed out (http://ipxe.org/4c22e092) No more network devices Boot Failed: PXE Device 2: Integrated NIC 1 Port 2 Partition 1 Booting from RAID Controller in SL 8: Minimal ExitBootServices test This is a minimal test of the ExitBootServices() call. EFI memory map requires 2640. bytes Allocating a 4176. byte buffer for the EFI memory map This program will loop forever if GetMemoryMap() and ExitBootServices() succeed 06/02/0023 03:35:58 PowerEdge R7525 - BIOS 2.11.3 A system restart is required. The system detected an exception during the UEFI pre-boot environment. 
------------------------------------------------------------------------------- Type: Invalid opcode (06) Source: Software (UEFI0004) on BSP AX=0000000063192088 BX=0000000000000000 SI=0000000063194440 DI=0000000063397A98 CX=000000005493FFE8 DX=0000000000000032 R8=000000005493097E R9=0000000000000001 10=0000000055546309 11=000000005769B578 12=0000000000000000 13=0000000000000000 14=000000005769B660 15=0000000000000000 BP=000000006318E120 SP=000000005769B580 IP=00000000549B200C Flags=00010A17 CurrentTPL = 10, Event Depth 1 LastMsg: LBRfr0 55543242 Unknown(fgwck) +003242 Intel(R) 40GbE 4.7.22 LBRto0 5493097E Unknown(jfxmv) -->RIP 549B200C Unknown(sxhzr) +00000C Stack trace not available Stack Dump: 5769B580 0000000000000000 ........ 5769B588 0000000000000009 ........ 5769B590 0000000063194440 @D.c.... 5769B598 000000005769B608 ..iW.... 5769B5A0 0000000000000000 ........ 5769B5A8 0000000063E7CC20 ..c.... 5769B5B0 0000000000000000 ........ 5769B5B8 00000000555AFFA0 ..ZU.... 5769B5C0 00000000549BEF30 0..T.... 5769B5C8 0000000067BCE020 ..g.... 5769B5D0 0000000000000000 ........ 5769B5D8 000000005769B660 `.iW.... 5769B5E0 0000000063192090 . .c.... 5769B5E8 0000000055546471 qdTU.... 5769B5F0 0000000063E7CC30 0..c.... 5769B5F8 0000000000000000 ........ 5769B600 0000000000002700 .'...... 5769B608 00000000B5080000 ........ 5769B610 0000000000000000 ........ 5769B618 000000005769B638 8.iW.... 5769B620 0000000063192020 .c.... 5769B628 0000000055545F6A j_TU.... 5769B630 0000000063192090 . .c.... 5769B638 0000000000000100 ........ 5769B640 0000000000000000 ........ 5769B648 000000005769B668 h.iW.... 5769B650 0000000000000000 ........ 5769B658 000000005554333E >3TU.... 5769B660 0000000000032000 . ...... 5769B668 0000000000000000 ........ 5769B670 0000000000000001 ........ 5769B678 0000000000000000 ........ 5769B680 00000000549BEF30 0..T.... 5769B688 0000000055540683 ..TU.... 5769B690 00000000549BEF30 0..T.... 5769B698 0000000000000000 ........ 5769B6A0 0000000063A81000 ...c.... 
5769B6A8 0000000063E9B730 0..c.... 5769B6B0 0000000000000000 ........ 5769B6B8 00000000576AFC51 Q.jW.... 5769B6C0 00000000549BEF30 0..T.... 5769B6C8 000000005769B8E0 ..iW.... 5769B6D0 000000005FEA9E20 .._.... 5769B6D8 000000005769E7D0 ..iW.... 5769B6E0 0000000080800000 ........ 5769B6E8 0000000063AD2520 %.c.... 5769B6F0 00000000576E6DA0 .mnW.... 5769B6F8 0000000063AD2520 %.c.... 5769B700 0000000020200001 .. .... ... This eliminates grub and the RHEL kernel from being a cause of the UEFI fault. That means that the problem is either in the UEFI firmware, or in snponly.efi, or in something the snponly.efi loads and runs. For reference, I'll provide the source for exit-boot.efi as an attachment. -Lenny.
(In reply to Stuart Hayes from comment #45) > I have set up an R7525 with 2x EPYC 7702 CPUs. I installed RHEL9.2 on an > NVMe drive on the system, and I rearranged the boot order to put PXE boot > first. It boots to iPXE, which exits, and then it boots to RHEL9.2 > successfully, so I am not able to reproduce this yet. What version of iPXE > are you booting to? Any other idea what might be different that causes > yours to fail while mine doesn't? > > Our firmware team is asking for BIOS logs--I'm waiting for instructions on > how to get what they are looking for. They also asked if the systems that > don't have EPYC 7702 CPUs will fail if they are running the same BIOS > version that's failing with the EPYC 7702s (the tests posted earlier show > that other CPUs were tested with older BIOS versions). Stuart, See the console log in comment 46 for some info about what happens during the PXE boot that sets up the failure in ExitBootServices(). The snponly.efi and the boot.ipxe script that it loads via http can be found here: https://people.redhat.com/~lszubowi/rhel9/.bz2183793/snponly.efi https://people.redhat.com/~lszubowi/rhel9/.bz2183793/boot.ipxe -Lenny.
Created attachment 1968458 [details] Source for exit-boot.efi used in simpler reproducer of UEFI fault
Created attachment 1968459 [details] Makefile for exit-boot.efi
Stuart, The exit-boot.efi image is also available here: https://people.redhat.com/~lszubowi/rhel9/.bz2183793/exit-boot.efi -Lenny.
Created attachment 1968460 [details] Source for exit-boot.efi (exit-boot.c) used in simpler reproducer of UEFI fault
I failed to mention in comment 46 that if I boot directly to: Minimal ExitBootServices test: HD(1,GPT,b84b5ab4-7a90-4916-81c4-ebfbb0d98af6,0x800,0x8000)/File(\efi\redhat\exit-boot.efi) the exit-boot.efi program enters its infinite loop after the call to ExitBootServices(). This shows that the actions of a PXE boot of snponly.efi set up the conditions for the UEFI fault on the ExitBootServices() call. -Lenny.
Hi Stuart, > Our firmware team is asking for BIOS logs--I'm waiting for instructions on how to get what they are looking for. They also asked if the systems that don't have EPYC 7702 CPUs will fail if they are running the same BIOS version that's failing with the EPYC 7702s (the tests posted earlier show that other CPUs were tested with older BIOS versions). Before updating to BIOS version `2.10.2` (and later to `2.11.3`), I tried other versions on the problematic platform (`2.8.4` and `2.9.3`) and still had issues. I can ask the hardware owners of the other (working) platforms to upgrade their BIOS firmware, but I am unsure when or if this will be done.
I'm pretty confident that RHOSP 17.1 would now be able to install and successfully reboot the overcloud image on the particular Dell PowerEdge R7525 where this problem was encountered. In comment 32 I described a reproducible problem on this particular system where the change of EFI boot order, done by efibootmgr, does not appear to be persistent on reboot. There is no visible error on the relevant efibootmgr commands, yet on reboot, the boot order is not what it was set to. This is problem #1 of the failed RHOSP overcloud provisioning. Problem #1 can be avoided by changing a BIOS setting: System Setup -> System BIOS -> Boot Settings -> Hard-disk Drive Placeholder. <Enabled>: boot order problem is reproducible at will. <Disabled>: boot order problem is not reproducible. This is almost certainly a firmware problem. The "Hard-disk Drive Placeholder" on this platform looks like the kind of boot variable that one typically sees on other platforms for any block device that has an EFI system partition. It appears that under some circumstances on reboot the firmware re-evaluates this placeholder and that this re-evaluation can alter the boot order. In any event, disabling the "Hard-disk Drive Placeholder" feature in BIOS boot settings avoids the reproducible boot order problem. Now, onto problem #2: the UEFI fault in ExitBootServices() after a PXE boot of snponly.efi. If problem #1 doesn't occur, then the overcloud image remains first in the boot order across the reboot. Therefore, on reboot there is no intervening attempt to PXE load snponly.efi before the boot of the overcloud image, and problem #2 is not encountered. The underlying problem #2 is certainly still present, and it is almost certainly entirely separate from problem #1. As I wrote earlier in comment 46, problem #2 is either in UEFI firmware, or possibly in snponly.efi. I would welcome a test of my claim that the RHOSP install will now succeed on the system in question. 
I've left "Hard-disk Drive Placeholder" disabled. -Lenny.
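For anyone who wants to check problem #1 on their own system, here is a minimal sketch in Python of the before/after comparison. It assumes you capture the text printed by efibootmgr once after setting the boot order and once after the reboot; the function names are hypothetical and only the `BootOrder:` line is examined.

```python
import re
from typing import List

# Matches the BootOrder line printed by efibootmgr, e.g.
# "BootOrder: 0000,0005,0004,0003,0001"
_BOOT_ORDER = re.compile(r"BootOrder:\s*([0-9A-Fa-f]{4}(?:,[0-9A-Fa-f]{4})*)")


def parse_boot_order(efibootmgr_output: str) -> List[str]:
    """Return the boot variable numbers in boot order, first entry first."""
    match = _BOOT_ORDER.search(efibootmgr_output)
    if match is None:
        raise ValueError("no BootOrder line found in efibootmgr output")
    return [number.upper() for number in match.group(1).split(",")]


def boot_order_persisted(before_reboot: str, after_reboot: str) -> bool:
    """True if the boot order captured after the reboot equals the one set before it."""
    return parse_boot_order(before_reboot) == parse_boot_order(after_reboot)
```

A result of False from `boot_order_persisted` would correspond to the non-persistent boot order described above.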
Sorry I am not able to work on this full time. I have an R7525, but it's in a shared lab to which I don't have physical access, so setting up a PXE server that I can control would be difficult. I have set it up to HTTP boot to snponly.efi... it boots to that, exits, then will boot to your test program or RHEL9.2, no issues. It will also boot to the PXE server in the lab, which has ipxe, and it will boot to that, show a menu, exit, and then boot to your test program or RHEL9--but the ipxe in the lab is "iPXE 1.21.1+ (g323af) -- Open Source Network Boot Firmware -- http://ipxe.org". A firmware engineer here suggested that the RSOD looks like ipxe is installing an exit boot services event handler, and not uninstalling it before it exits. If I could reproduce this here, I'd debug ipxe and see if that's happening.
I have a few more observations about problem #1, the boot order issue. I didn't want to obscure my bottom line conclusion about problem #1 by including this in comment 54. So here it is separately. Near the end of comment 32, I speculated that the boot order problem might be investigated and reproduced using the bcfg command in the EFI shell. Essentially, bcfg is the equivalent of efibootmgr, but it operates entirely at the EFI firmware level. Indeed, I did try to replicate the problem using bcfg, but I was unable to do so despite considerable effort. But, throughout, I could without fail boot RHEL and use efibootmgr to reproduce the issue at will. (This was before I stumbled upon disabling Hard-disk Drive Placeholder.) It still puzzles me why I couldn't reproduce the problem with bcfg. Near the end of the install, the RHOSP installation checks for a prior "redhat" boot entry. It does so because it needs to delete it before it can create a new "redhat" boot entry for the installed image. 
In the installation logs: May 22 11:54:02 kodkod01.lab.eng.tlv2.redhat.com ironic-python-agent[5425]: 2023-05-22 11:54:02.270 5425 DEBUG oslo_concurrency.processutils [-] CMD "efibootmgr -v" returned: 0 in 0.040s execute /usr/lib/python3.9/site-packages/oslo_concurrency/processutils.py:422 May 22 11:54:02 kodkod01.lab.eng.tlv2.redhat.com ironic-python-agent[5425]: 2023-05-22 11:54:02.301 5425 DEBUG ironic_lib.utils [-] Command stdout is: "BootCurrent: 0004 BootOrder: 0005,0004,0003,0000,0001 Boot0000* redhat HD(1,GPT,ed4c38ad-d346-45ea-8937-59078729f832,0x800,0x8000)/File(\EFI\redhat\shimx64.efi) Boot0001* EFI RAID Disk PlaceHolder 1 PciRoot(0x0)/Pci(0x1,0x1)/Pci(0x0,0x0)/Ctrl(0x0)/SCSI(0,0) Boot0003* Integrated NIC 1 Port 1 Partition 1 VenHw(3a191845-5f86-4e78-8fce-c4cff59f9daa) Boot0004* Integrated NIC 1 Port 2 Partition 1 VenHw(d227c733-f75f-4341-b749-4d1759ec8538) Boot0005* EFI RAID Disk PlaceHolder 1 PciRoot(0x0)/Pci(0x1,0x1)/Pci(0x0,0x0)/Ctrl(0x0)/SCSI(1,0) " _log /usr/lib/python3.9/site-packages/ironic_lib/utils.py:99 Shows that Boot0000 is a prior "redhat" boot variable, which is in 4th position in the boot order. The efibootmgr command removes it by referencing the boot variable number, i.e. the hexadecimal 0 in Boot0000. 
May 22 11:54:02 kodkod01.lab.eng.tlv2.redhat.com ironic-python-agent[5425]: 2023-05-22 11:54:02.590 5425 DEBUG oslo_concurrency.processutils [-] CMD "efibootmgr -b 0000 -B" returned: 0 in 0.062s execute /usr/lib/python3.9/site-packages/oslo_concurrency/processutils.py:422 The analogous bcfg command references the boot variable by its position in the boot order: Shell> bcfg boot rm 4 After deleting the prior "redhat" boot variable, the installation creates a new one: May 22 11:54:02 kodkod01.lab.eng.tlv2.redhat.com ironic-python-agent[5425]: 2023-05-22 11:54:02.779 5425 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): efibootmgr -v -c -d /dev/sda -p 1 -w -L redhat -l \EFI\redhat\shimx64.efi execute /usr/lib/python3.9/site-packages/oslo_concurrency/processutils.py:384 May 22 11:54:02 kodkod01.lab.eng.tlv2.redhat.com ironic-python-agent[5425]: 2023-05-22 11:54:02.848 5425 DEBUG oslo_concurrency.processutils [-] CMD "efibootmgr -v -c -d /dev/sda -p 1 -w -L redhat -l \EFI\redhat\shimx64.efi" returned: 0 in 0.070s execute /usr/lib/python3.9/site-packages/oslo_concurrency/processutils.py:422 May 22 11:54:02 kodkod01.lab.eng.tlv2.redhat.com ironic-python-agent[5425]: 2023-05-22 11:54:02.876 5425 DEBUG ironic_lib.utils [-] Command stdout is: "BootCurrent: 0004 BootOrder: 0000,0005,0004,0003,0001 Boot0001* EFI RAID Disk PlaceHolder 1 PciRoot(0x0)/Pci(0x1,0x1)/Pci(0x0,0x0)/Ctrl(0x0)/SCSI(0,0) Boot0003* Integrated NIC 1 Port 1 Partition 1 VenHw(3a191845-5f86-4e78-8fce-c4cff59f9daa) Boot0004* Integrated NIC 1 Port 2 Partition 1 VenHw(d227c733-f75f-4341-b749-4d1759ec8538) Boot0005* EFI RAID Disk PlaceHolder 1 PciRoot(0x0)/Pci(0x1,0x1)/Pci(0x0,0x0)/Ctrl(0x0)/SCSI(1,0) Boot0000* redhat HD(1,GPT,b84b5ab4-7a90-4916-81c4-ebfbb0d98af6,0x800,0x8000)/File(\EFI\redhat\shimx64.efi) The analogous bcfg command is: Shell> bcfg boot add 0 fs0:\efi\redhat\shimx64.efi redhat Target = 0000. 
bcfg: Add Boot0000 as 0 Shell> The only thing that occurs to me as a difference between bcfg and efibootmgr is that, although both use EFI SetVariable(), bcfg uses the Boot Services version of that call while efibootmgr uses the run-time version of EFI SetVariable(). But I have no idea why that would matter, particularly regarding the "Hard-disk Drive Placeholder" setting. Enabling "Hard-disk Drive Placeholder" appeared to have no effect on bcfg, but it certainly does on the actions of efibootmgr. Once I stumbled onto the effect of "Hard-disk Drive Placeholder" on efibootmgr, I went through a few cycles of: disabling, testing efibootmgr, enabling, testing efibootmgr, disabling, testing efibootmgr, etc. to gain confidence that there was a causal linkage. -Lenny.
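The lookup the installer does before deleting the prior "redhat" entry can be sketched roughly as follows, in Python. This is not the ironic-python-agent code; it is a hypothetical helper that assumes multi-line efibootmgr output with one `BootXXXX* label ...` entry per line, and it reports each matching boot variable number together with its 1-based position in BootOrder (the "4th position" in the log above).

```python
import re
from typing import List, Optional, Tuple

# "Boot0000* redhat HD(...)": four hex digits, optional "*" (active), then the label.
_ENTRY = re.compile(r"^Boot([0-9A-Fa-f]{4})\*?\s+(.*)$")
_ORDER = re.compile(r"^BootOrder:\s*(\S+)")


def find_boot_entries(output: str, label: str) -> List[Tuple[str, Optional[int]]]:
    """Return (boot number, 1-based BootOrder position or None) for entries with this label.

    Limitation: a label is matched as a prefix followed by whitespace, so
    "redhat" would also match an entry labelled "redhat something".
    """
    order: List[str] = []
    numbers: List[str] = []
    for raw_line in output.splitlines():
        line = raw_line.strip()
        order_match = _ORDER.match(line)
        if order_match:
            order = [n.upper() for n in order_match.group(1).split(",")]
            continue
        entry_match = _ENTRY.match(line)
        if entry_match:
            rest = entry_match.group(2)
            if rest == label or rest.startswith((label + " ", label + "\t")):
                numbers.append(entry_match.group(1).upper())
    return [(n, order.index(n) + 1 if n in order else None) for n in numbers]
```

The boot number is what `efibootmgr -b <number> -B` takes; the position is only reported for comparison with the boot order, since the two tools reference entries differently.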
(In reply to Stuart Hayes from comment #55) > Sorry I am not able to work on this full time. I have an R7525, but it's in > a shared lab to which i don't have physical access, so setting up a PXE > server that I can control would be difficult. I have set it up HTTP boot to > snponly.efi... it boots to that, exits, then will boot to your test program > or RHEL9.2, no issues. It will also boot to the PXE server in the lab, > which has ipxe, and it will boot to that, show a menu, exit, and then boot > to your test program or RHEL9--but the ipxe in the lab is "iPXE 1.21.1+ > (g323af) -- Open Source Network Boot Firmware -- http://ipxe.org". > > A firmware engineer here suggested that the RSOD looks like ipxe is > installing an exit boot services event handler, and not uninstalling it > before it exits. If I could reproduce this here, I'd debug ipxe and see if > that's happening. Stuart, That's a plausible explanation for the EFI fault on the ExitBootServices() call and would be a bug in snponly.efi. On the separate boot order issue, I wonder if you could reproduce it on your system. You would need to enable the BIOS boot setting for the hard disk placeholder, and use efibootmgr roughly as described in comment 32. At this point, I think that's entirely a firmware issue. -Lenny.
Thanks a lot, Lenny. I will work on deploying OpenStack with the BIOS feature turned off.
I can confirm that I could provision and boot a single-node deployment by turning off the BIOS attribute mentioned by Lenny (on identical hardware that had the same issue described in the bug). ``` [root@compute-1 ~]# cat /proc/cpuinfo | grep 'model name' | tail -1 model name : AMD EPYC 7702 64-Core Processor [root@compute-1 ~]# cat /etc/redhat-release Red Hat Enterprise Linux release 9.2 (Plow) [root@compute-1 ~]# cat /etc/rhosp-release Red Hat OpenStack Platform release 17.1.0 Beta (Wallaby) ```
Lenny-- I don't see the "EFI RAID Disk PlaceHolder" whether I have hard disk drive placeholder enabled or not in BIOS, and the boot order changed for me when I booted to a 2nd redhat entry, then deleted and recreated the first, as described in comment 32. How many drives do you have in your system, and what are they connected to? I have a single NVMe drive in mine, which is the one that has RHEL9 on it.
Hey Stuart, I can answer it for you. The mentioned host has a single PERC H745 controller with two drives attached to it: * SSD: Intel SSDSC2KB960G8R * HDD: Seagate ST2000NM0155 We deploy the operating system on top of the SSD.
[root@compute-0 ~]# lshw -class disk -class storage *-sas description: Serial Attached SCSI controller product: MegaRAID Tri-Mode SAS3516 vendor: Broadcom / LSI physical id: 0 bus info: pci@0000:01:00.0 logical name: scsi0 version: 01 width: 64 bits clock: 33MHz capabilities: sas pm msi pciexpress msix bus_master cap_list rom configuration: driver=megaraid_sas latency=0 resources: irq:95 memory:bac00000-bacfffff memory:bad00000-badfffff memory:bb100000-bb1fffff ioport:1000(size=256) *-disk:0 description: ATA Disk product: SSDSC2KB960G8R physical id: 2.0.0 bus info: scsi@0:2.0.0 logical name: /dev/sdb version: DL69 serial: BTYF015304G5960CGN size: 894GiB (960GB) capabilities: gpt-1.00 partitioned partitioned:gpt configuration: ansiversion=6 guid=433a6cb3-dd15-4061-9a6b-06042c0ffb9c logicalsectorsize=512 sectorsize=4096 *-disk:1 description: SCSI Disk product: ST2000NM0155 vendor: SEAGATE physical id: 2.1.0 bus info: scsi@0:2.1.0 logical name: /dev/sda version: DT34 serial: ZC23HJ9X size: 1863GiB (2TB) capabilities: 7200rpm configuration: ansiversion=6 logicalsectorsize=512 sectorsize=512 *-sata description: SATA controller product: FCH SATA Controller [AHCI mode] vendor: Advanced Micro Devices, Inc. [AMD] physical id: 0 bus info: pci@0000:c3:00.0 version: 51 width: 32 bits clock: 33MHz capabilities: sata pm pciexpress msi ahci_1.0 bus_master cap_list configuration: driver=ahci latency=0 resources: irq:225 memory:a0000000-a00007ff
Created attachment 1969647 [details] EFI boot menu when BIOS boot setting hard disk placeholder is enabled
Created attachment 1969648 [details] EFI boot menu when BIOS boot setting hard disk placeholder is disabled
(In reply to Stuart Hayes from comment #60) > Lenny-- > I don't see the "EFI RAID Disk PlaceHolder" whether I have hard disk drive > placeholder enabled or not in BIOS, and the boot order changed for me when I > booted to a 2nd redhat entry, then deleted and recreated the first, as > described in comment 32. > How many drives do you have in your system, and what are the connected to? > I have a single NVMe drive in mine, which is the one that has RHEL9 on it. Hi Stuart, I've created two attachments with screen shots of the EFI boot menu on Vadim's R7525, one with hard disk placeholder enabled (comment 63) and the other with it disabled (comment 64). -Lenny.
Hi Vadim I've made some suggested edits - see below. Do you have a Dell reference guide we can link to, or should that reference be vague (e.g. see the reference guide for your hardware)? === Doc Type: Known issue Overcloud node provisioning fails for NFV deployments on AMD platforms in UEFI boot mode on Red Hat OpenStack Platform 17.1, when using the following BIOS configuration: * Boot Mode: UEFI * Hard-disk Drive Placeholder: Enabled Workaround: Set `Hard-disk Drive Placeholder` to `Disabled`. For information on how to assess each BIOS attribute for your NFV deployment on AMD platforms in UEFI boot mode, see the reference guide for your hardware. === Thanks I/