Bug 1608955

Summary: EFI stub: ERROR: Failed to alloc highmem for initrd
Product: Red Hat Enterprise Linux 7 Reporter: Pablo Iranzo Gómez <pablo.iranzo>
Component: kernel Assignee: Lenny Szubowicz <lszubowi>
kernel sub component: UEFI QA Contact: Erico Nunes <ernunes>
Status: CLOSED ERRATA Docs Contact:
Severity: urgent    
Priority: urgent CC: aguetta, ahs3, antonio.gianfreda, bhu, dhoward, ealcaniz, fsoppels, lszubowi, mmilgram, nmurray, pablo.iranzo, pbrobinson, prarit
Version: 7.5 Keywords: Reopened, ZStream
Target Milestone: rc   
Target Release: 7.7   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: kernel-3.10.0-958.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1643359 1643361 Environment:
Last Closed: 2019-08-06 12:08:15 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1612033, 1614004, 1643359, 1643361    
Attachments:
Description                                                          Flags
Failed node with test kernel                                         none
Screenshot for good node                                             none
Patch 0 of 3 submitted to rhkernel-list for review (cover letter)    none
Patch 1 of 3 submitted to rhkernel-list for review                   none
Patch 2 of 3 submitted to rhkernel-list for review                   none
Patch 3 of 3 submitted to rhkernel-list for review                   none

Description Pablo Iranzo Gómez 2018-07-26 14:32:13 UTC
Description of problem:

During the OpenStack introspection phase, when booting over UEFI, the customer gets this error message:

EFI stub: ERROR: Failed to alloc highmem for initrd

Booting in legacy mode works fine.

The customer is using OSP13 images with kernel 3.10.0-862.

Comment 9 Irina Petrova 2018-07-31 12:47:40 UTC
The customer has tested with the following kernels:

The 7.5 kernel is: 3.10.0-862  -- NO GO
The 7.3 kernel is: 3.10.0-514  -- NO GO

According to BZ 1387689, in RHEL7.3 we have a fix in version:
kernel-3.10.0-520.el7

I asked if they could test that one as well.

Comment 10 Lenny Szubowicz 2018-07-31 14:41:04 UTC
The 'EFI stub: ERROR: Failed to alloc highmem for initrd' occurs at such an early phase of the boot that the kernel RAID driver is not an active component of the system at that point. In fact it hasn't even been loaded at that point. It's likely that its only impact is that its inclusion made the initramfs/initrd image larger.

The fix associated with BZ-1387689 is still present in the RHEL 7.5 kernel. Moreover, a failure related to that fix is not likely to result in the 'Failed to alloc highmem for initrd' error.

I don't have the environment to reproduce this problem. Therefore I'm going to have to build a diagnostic kernel for you to try. I can essentially use what I did for comment 50 in BZ 1387689 for this problem report.

Please let me know if this is workable.

                              -Lenny.

Comment 11 Lenny Szubowicz 2018-07-31 18:57:28 UTC
In looking at the relevant code, I have a guess as to what is going on:

  1. OSP UEFI iPXE boot uses the kernel's EFI image entry point and relies on
     routine handle_ramdisks() in the EFI stub portion of the kernel to load
     the initrd. Routine handle_ramdisks() uses EFI boot-time services to load
     the initrd.

  2. Routine handle_ramdisks() in RHEL 7.x limits its search for sufficient
     free memory for the initrd to memory below 2 GB even if the platform
     EFI firmware supports loading above 2 GB or above 4 GB. The upstream
     kernel has patches to support loading the initrd above 2 GB and 4 GB if
     the firmware supports this.

  3. My guess is that your initramfs/initrd just got sufficiently large
     such that there is insufficient contiguous free memory below 2 GB
     for this initrd on the systems in question, but there is likely more than
     sufficient free memory above 4 GB.

I'm going to see how difficult it will be to include these upstream patches into my diagnostic kernel, or minimally, make sure that I have code to check for this situation.

                                    -Lenny.
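
The guess in points 1 to 3 above can be illustrated with a minimal Python sketch. It is not the kernel's code: the function name `high_alloc`, the inclusive `max_addr` convention, and the two-region memory map are all assumptions made for illustration. It only shows how a top-down search with an address ceiling can fail even though plenty of memory is free above that ceiling.

```python
PAGE_SIZE = 4096  # EFI page size in bytes


def high_alloc(free_regions, size, max_addr, align=PAGE_SIZE):
    """Return the highest aligned start address at which `size` bytes fit
    inside a single free region without exceeding `max_addr`, or None.

    free_regions: iterable of (start_address, number_of_pages) tuples.
    max_addr: highest usable byte address (inclusive); an assumption of
              this sketch, not necessarily the kernel's convention.
    """
    best = None
    for start, pages in free_regions:
        end = start + pages * PAGE_SIZE      # one past the region's last byte
        end = min(end, max_addr + 1)         # clip the region at the ceiling
        candidate = (end - size) // align * align
        if candidate < start:                # does not fit in this region
            continue
        if best is None or candidate > best:
            best = candidate                 # prefer the highest address
    return best


# Hypothetical memory map: 128 MiB free below 2 GB, 4 GiB free above 4 GB.
hypothetical_map = [(0x100000, 0x8000), (0x100000000, 0x100000)]
initrd_size = 200 * 1024 * 1024              # hypothetical 200 MiB initrd

below_2g = high_alloc(hypothetical_map, initrd_size, 0x7FFFFFFF)
anywhere = high_alloc(hypothetical_map, initrd_size, 2**64 - 1)
print("limited to 2 GB:", below_2g)
print("no limit:", hex(anywhere))
```

With the ceiling at 0x7fffffff the search finds nothing, because the only region below the ceiling is smaller than the image; with the ceiling lifted, the same image fits at the top of the region above 4 GB.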

Comment 13 Lenny Szubowicz 2018-08-02 01:58:22 UTC
I have a diagnostic kernel available for this problem:

http://people.redhat.com/~lszubowi/rhel7/.bz1608955/kernel-3.10.0-862.el7.bz1608955.lss01.x86_64.rpm

It's based on the RHEL 7.5 kernel with the addition of some diagnostic code in the EFI stub routine which allocates space for the initrd/initramfs image.

Ideally, it would be great if you can boot this in the same way with any of the initramfs images which previously resulted in the 'EFI stub: ERROR: Failed to alloc highmem for initrd' error. This diagnostic kernel does not have any additional fixes, so I expect that it will fail in exactly the same way, but provide useful diagnostic output on the console. This additional output should look something like this:

Lenny: make_boot_params: hdr->initrd_addr_max=0x7fffffff
Lenny: Flag allows initrd above 4GB
Lenny: File: efi\redhat\initrd.img size: 0x1597f14
Lenny: EFI avail mem at=0x0, pages=0x58
Lenny: EFI avail mem at=0x5e000, pages=0x40
Lenny: EFI avail mem at=0x879bf000, pages=0xe3f
Lenny: EFI avail mem at=0x888dc000, pages=0xdf
Lenny: EFI avail mem at=0x88b4b000, pages=0xc
Lenny: EFI avail mem at=0x88b60000, pages=0x2
Lenny: EFI avail mem at=0x100000000, pages=0x76f000
Lenny: efi_high_alloc, status=0x5
EFI stub: ERROR: Failed to alloc highmem for initrd

It doesn't matter that the initramfs was built against a different kernel. The kernel never gets to the point of loading it, let alone mounting it and attempting to use any of the modules it contains. I believe that the size of the initramfs and the state of the EFI memory map are the key factors.

                                -Lenny.
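
As a reading aid for the diagnostic output above, the following Python sketch parses the `Lenny:` lines and compares the initrd size with the largest free region below `initrd_addr_max`. The regular expressions assume exactly the line format shown in this comment, and the helper name `summarize` is hypothetical; this is an illustration, not a tool used in this investigation.

```python
import re

PAGE_SIZE = 4096

# Sample diagnostic output copied from the comment above.
SAMPLE = r"""Lenny: make_boot_params: hdr->initrd_addr_max=0x7fffffff
Lenny: Flag allows initrd above 4GB
Lenny: File: efi\redhat\initrd.img size: 0x1597f14
Lenny: EFI avail mem at=0x0, pages=0x58
Lenny: EFI avail mem at=0x5e000, pages=0x40
Lenny: EFI avail mem at=0x879bf000, pages=0xe3f
Lenny: EFI avail mem at=0x888dc000, pages=0xdf
Lenny: EFI avail mem at=0x88b4b000, pages=0xc
Lenny: EFI avail mem at=0x88b60000, pages=0x2
Lenny: EFI avail mem at=0x100000000, pages=0x76f000
"""


def summarize(log_text):
    """Extract the limit, initrd size and free regions from the log text."""
    limit = int(re.search(r"initrd_addr_max=(0x[0-9a-f]+)", log_text).group(1), 16)
    size = int(re.search(r"size: (0x[0-9a-f]+)", log_text).group(1), 16)
    regions = [
        (int(addr, 16), int(pages, 16))
        for addr, pages in re.findall(
            r"avail mem at=(0x[0-9a-f]+), pages=(0x[0-9a-f]+)", log_text
        )
    ]
    pages_needed = -(-size // PAGE_SIZE)     # round up to whole pages
    # Pages of each region that lie at or below the limit.
    usable = [
        max(0, min(start + pages * PAGE_SIZE, limit + 1) - start) // PAGE_SIZE
        for start, pages in regions
    ]
    largest = max(usable, default=0)
    return {
        "regions": regions,
        "pages_needed": pages_needed,
        "largest_pages_below_limit": largest,
        "fits_below_limit": largest >= pages_needed,
    }


summary = summarize(SAMPLE)
print(summary["pages_needed"], summary["largest_pages_below_limit"],
      summary["fits_below_limit"])
```

For the sample above, the image needs far more pages than the largest free region below the limit provides, while the region starting at 0x100000000 is large enough.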

Comment 18 Lenny Szubowicz 2018-08-02 21:22:48 UTC
Irina,

Thank you for providing the results from the lss01 test kernel.

Was this result obtained from an "introspection" using the test kernel but otherwise exactly the same agent.ramdisk and the same hardware environment where previously the 'EFI stub: ERROR: Failed to alloc highmem for initrds' was encountered?

If this result was obtained from exactly the same hardware environment and exactly the same agent.ramdisk image, then the result is quite surprising to me.

The only way to get that error message is if the routine efi_high_alloc() fails to allocate sufficient memory for the initrd. However, the debug output that I added shows that efi_high_alloc() was able to find and allocate the memory it needed. In particular, this message shows that it found one fit:

Lenny: Best fit so far at=0x25749000

And ultimately, efi_high_alloc() returned a success status:

Lenny: efi_high_alloc, status=0x0

I was expecting that this routine would not find any fits and that it would return a failure status as before. The output of the EFI free memory regions would then hopefully confirm my guess in comment 11. But it looks like my guess is not correct if this test was done with the same agent.ramdisk and the same environment that otherwise results in that error. 

If you did not regenerate the agent.ramdisk, then none of the kernel's runtime drivers would load from that initramfs since they are not seen as being compatible with my test kernel. That would very likely result in all sorts of failures to find devices by the code that runs from the agent.ramdisk.

It is possible that my diagnostic output had some unexpected side effect that somehow avoided the original problem. For example, it's certainly possible that the routines I used to do console output did dynamic memory allocation such that the memory map was altered. I think I may need to produce a diagnostic kernel that doesn't do any extra output until the failure is detected.

Please confirm how the test was run so I can figure out my next step.

                                 -Lenny.

Comment 19 Irina Petrova 2018-08-03 15:06:42 UTC
Hey Lenny.

Verbatim:
~~~
We used the same hardware and the same ramfs for this test.
The thing is that it doesn't produce a successful introspection. It keeps running until the server reboots 3 times and fails.
I didn't do anything special to produce this situation, i just unpacked this kernel images using rpm2cpio and copied the vmlinuz file from the boot directory and pushed it to glance.
~~~

They have also attached the driver they're using but the driver's got nothing to do with this. And I don't see them doing anything wrong (judging from the description above).

Comment 20 Lenny Szubowicz 2018-08-03 17:48:54 UTC
Irina,

Thank you for the confirmation of the test conditions.

I probably wasn't clear enough in comment 13.

My test kernel is not run-time compatible with an agent.ramdisk which was built against any different kernel. None of the kernel modules (aka drivers) contained in that agent.ramdisk will match the version of my diagnostic kernel. As a result, they won't successfully load and the introspection is bound to fail.

I alluded to this in comment 13, but did not emphasize it. That's because I did not expect that my test kernel would be able to load the agent.ramdisk. So it would be entirely moot that its contents were incompatible.

My test 3.10.0-862.el7.bz1608955.lss01 kernel has no additional fixes over the RHEL 7.5 GA kernel (3.10.0-862.el7). It just has additional console output.

If the agent.ramdisk was regenerated using my kernel and its kernel modules, then that would be a compatible combination. Potentially, that should complete introspection. Although, that doesn't shed any light on why the diagnostic kernel manages to load the agent.ramdisk but the standard -862 kernel fails to do so.

                                -Lenny.

Comment 21 Irina Petrova 2018-08-05 16:09:20 UTC
Hi Lenny...

Uhmm, after they re-installed the Director node (i.e. the node where introspection was run from; the PXE server, in other words), they do *not* see the alloc highmem error anymore.

They don't see it when the introspection initial RAM disk is left untouched, and they don't see it when they place the driver in it.

~~~
1. We used Lenny's kernel + original initramfs = Introspection fails (no highmem error).
2. We used original kernel + original initramfs = Introspection fails (no highmem error).
3. We used original kernel + edited initramfs = Introspection successful (no high highmem  error), but the out put of 'openstack baremetal introspection data save 1a4e30da-b6dc-499d-ba87-0bd8a3819bc0 | jq ".inventory.disks"' produces 2 disks (/dev/sda and /dev/sdb). Is this a normal situation? or does it need to produce only one disk (/dev/sda)? because the raid controller is enable.
~~~

We'll continue the investigation from Openstack's POV for now (i.e. on how to properly setup the ironic-python-agent to work with SW RAID).


Many thanks for your help!

--Irina

Comment 22 Irina Petrova 2018-08-14 07:56:43 UTC
I think we can CLOSE this Bug with INSUFFICIENT_DATA.

To the best of my knowledge:

1) the problem is not reproducible
2) we don't have the original environment anymore
3) any additional effort spent on this is not proportional to the usefulness of eventually finding the root cause (at least not under the current circumstances)

We can always re-open the bug and continue the investigation if the problem resurfaces.

Thank you, Lenny, for all your help.

Comment 23 Lenny Szubowicz 2018-08-14 14:00:41 UTC
(In reply to Irina Petrova from comment #22)
> I think we can CLOSE this Bug with INSUFFICIENT_DATA.
> 
> To the best of my knowledge:
> 
> 1) the problem is not reproducible
> 2) we don't have the original environment anymore
> 3) any additional effort spent on this is not proportional to the usefulness
> of eventually finding the root cause (at least not under the current
> circumstances)
> 
> We can always re-open the bug and continue the investigation if the problem
> resurfaces.
> 
> Thank you, Lenny, for all your help.

Yes, unfortunately, the original problem is no longer reproducible.
 
I agree with your assessment that without a way of reproducing the conditions which lead to the failure (EFI stub: ERROR: Failed to alloc highmem for initrd) it is very difficult to determine the root cause of the problem and to have strong confidence in the efficacy of any proposed solution.

                              -Lenny.

Comment 24 Edu Alcaniz 2018-10-17 07:47:31 UTC
I am going to reopen this bug because I have a new case with a customer hitting the same issue.

Comment 25 Edu Alcaniz 2018-10-17 07:49:17 UTC
The customer is using RHOSP10 and the latest images:

rhosp-director-images-10.0-20180821.1.el7ost.noarch
rhosp-director-images-ipa-10.0-20180821.1.el7ost.noarch

As we can see, this error message comes from the kernel:

[root@server01 httpboot]# strings  agent.kernel | grep highmem
EFI stub: ERROR: Failed to alloc highmem for initrds

Comment 26 Edu Alcaniz 2018-10-17 09:27:55 UTC
Undercloud Kernel

[ealcaniz@ealcaniz rpm]$ grep kernel sh_-c_rpm_--nodigest_-qa_--qf_NAME_-_VERSION_-_RELEASE_._ARCH_INSTALLTIME_date_awk_-F_printf_-59s_s_n_1_2_sort_-f 
erlang-kernel-18.3.4.7-1.el7ost.x86_64                      Tue May 29 00:42:35 2018
kernel-3.10.0-693.17.1.el7.x86_64                           Tue Mar 13 12:01:51 2018
kernel-3.10.0-862.3.2.el7.x86_64                            Tue May 29 00:43:27 2018
kernel-3.10.0-862.6.3.el7.x86_64                            Tue Oct 16 09:42:09 2018
kernel-headers-3.10.0-862.6.3.el7.x86_64                    Mon Aug  6 13:30:44 2018
kernel-tools-3.10.0-862.6.3.el7.x86_64                      Tue Oct 16 09:43:02 2018
kernel-tools-libs-3.10.0-862.6.3.el7.x86_64                 Tue Oct 16 09:41:37 2018

[ealcaniz@ealcaniz rpm]$ grep rhosp-director sh_-c_rpm_--nodigest_-qa_--qf_NAME_-_VERSION_-_RELEASE_._ARCH_INSTALLTIME_date_awk_-F_printf_-59s_s_n_1_2_sort_-f 
rhosp-director-images-10.0-20180103.3.el7ost.noarch         Tue Mar 13 16:23:33 2018
rhosp-director-images-10.0-20180518.1.el7ost.noarch         Tue May 29 00:45:48 2018
rhosp-director-images-10.0-20180618.1.el7ost.noarch         Tue Oct 16 09:42:57 2018
rhosp-director-images-ipa-10.0-20180103.3.el7ost.noarch     Tue Mar 13 16:22:10 2018
rhosp-director-images-ipa-10.0-20180518.1.el7ost.noarch     Tue May 29 00:44:45 2018
rhosp-director-images-ipa-10.0-20180618.1.el7ost.noarch     Tue Oct 16 09:41:53 2018

Comment 27 Lenny Szubowicz 2018-10-17 21:03:49 UTC
You should be able to use the diagnostic kernel pointed to by comment 13.  It's still there.

It doesn't matter that the customer is using a RHEL 7.5 z-stream kernel and the diagnostic kernel is based on the RHEL 7.5 GA kernel. There are no RHEL 7.5 remedial stream fixes in the very early kernel code that is attempting to find sufficient memory for the initrd.

Note that comment 20 applies. I expect that this diagnostic kernel will not work, but that the console output will help me determine the root cause of the initrd memory allocation failure.

I look forward to getting the diagnostic output from a failure.

                                -Lenny.

Comment 28 Edu Alcaniz 2018-10-18 05:39:08 UTC
Thanks, I will send the test kernel to the customer.

Currently they are using:

Linux 3.10.0-862.14.4.el7.x86_64 #1 SMP Fri Sep 21 09:07:21 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Comment 29 Edu Alcaniz 2018-10-18 09:56:50 UTC
Created attachment 1495232 [details]
Failed node with test kernel

Here is the console output for a failed node.

Comment 30 Edu Alcaniz 2018-10-18 09:57:54 UTC
Created attachment 1495233 [details]
Screenshot for good node

Comment 31 Lenny Szubowicz 2018-10-18 19:58:53 UTC
Thank you for providing the console output from the lss01 diagnostic kernel. It provides pretty good evidence for the diagnosis in comment 11.

I should have a kernel with a test fix available very soon.

                                 -Lenny.

Comment 32 Lenny Szubowicz 2018-10-18 23:36:15 UTC
A kernel with a test fix is available:

http://people.redhat.com/~lszubowi/rhel7/.bz1608955/kernel-3.10.0-862.14.4.el7.bz1608955.lss02.x86_64.rpm

This is based on the RHEL 7.5.z kernel that your customer is using. The only additions are the test fix and essentially the same diagnostic code as in the lss01 kernel.

Please let me know how this works out.

                                 -Lenny.

Comment 33 Edu Alcaniz 2018-10-19 08:24:00 UTC
Hi again, 
We have tried the latest test kernel. As the log below shows, after the initial failure it tried again with a different method, and that attempt seems to be successful.

The current boot option set by BMC is PXE!



EFI PXE        1 : EFI PXE 0 for IPv4 (28-B4-48-66-4F-02)
Dump Memory Map:
Type 		 Start 		 End 		 #page 	 Attributes
available  	          0 	      6DFFF 	 6E 	 F
reserved   	      6E000 	      6FFFF 	 2 	 F
available  	      70000 	      9FFFF 	 30 	 F
available  	     100000 	   2A30EFFF 	 2A20F 	 F
BS_data    	   2A30F000 	   2A40EFFF 	 100 	 F
available  	   2A40F000 	   2A6EEFFF 	 2E0 	 F
BS_data    	   2A6EF000 	   3F050FFF 	 14962 	 F
BS_code    	   3F051000 	   3F069FFF 	 19 	 F
BS_data    	   3F06A000 	   3F2F8FFF 	 28F 	 F
BS_code    	   3F2F9000 	   3F2F9FFF 	 1 	 F
BS_data    	   3F2FA000 	   3F38BFFF 	 92 	 F
available  	   3F38C000 	   426C2FFF 	 3337 	 F
BS_data    	   426C3000 	   429BAFFF 	 2F8 	 F
available  	   429BB000 	   429BCFFF 	 2 	 F
BS_data    	   429BD000 	   4758EFFF 	 4BD2 	 F
available  	   4758F000 	   49A7FFFF 	 24F1 	 F
BS_code    	   49A80000 	   4B58EFFF 	 1B0F 	 F
RT_code    	   4B58F000 	   4B8FEFFF 	 370 	 F
RT_data    	   4B8FF000 	   53CFEFFF 	 8400 	 F
reserved   	   53CFF000 	   61CFEFFF 	 E000 	 F
ACPI_NVS   	   61CFF000 	   76CFEFFF 	 15000 	 F
ACPI_recl  	   76CFF000 	   777FEFFF 	 B00 	 F
BS_data    	   777FF000 	   777FFFFF 	 1 	 F
available  	  100000000 	 407FFFFFFF 	 3F80000 	 F
reserved   	      A0000 	      FFFFF 	 60 	 0
reserved   	   77800000 	   7FFFFFFF 	 8800 	 0
MemMapIO   	   80000000 	   8FFFFFFF 	 10000 	 1
MemMapIO   	   FD000000 	   FE7FFFFF 	 1800 	 1
MemMapIO   	   FEB00000 	   FEB03FFF 	 4 	 1
MemMapIO   	   FEC00000 	   FEC00FFF 	 1 	 1
MemMapIO   	   FEC80000 	   FED00FFF 	 81 	 1
MemMapIO   	   FF000000 	   FFFFFFFF 	 1000 	 1

  reserved  :    16862 Pages(377888768)
  BS_code   :     1B29 Pages(28479488)
  BS_data   :    19C4E Pages(432332800)
  RT_code   :      370 Pages(3604480)
  RT_data   :     8400 Pages(138412032)
  available :  3FAFDB7 Pages(273533333504)
  ACPI_recl :      B00 Pages(11534336)
  ACPI_NVS  :    15000 Pages(352321536)
  MemMapIO  :    12886 Pages(310927360)
Total Memory: 261783 MB (274500018176) Bytes

>>Start PXE over IPv4, Press [ESC] to EXIT...
  Station IP address is 172.16.48.57

  Server IP address is 172.16.48.10
  NBP filename is ipxe.efi
  NBP filesize is 977248 Bytes
 Downloading NBP file...

  Succeed to download NBP file.
iPXE initialising devices...ok



iPXE 1.0.0+ (133f) -- Open Source Network Boot Firmware -- http://ipxe.org
Features: DNS HTTP iSCSI TFTP SRP AoE EFI Menu

net0: 28:b4:48:66:4f:02 using 14e4-1657 on 0001:af:00.2 (open)
  [Link:down, TX:0 TXE:0 RX:0 RXE:0]
  [Link status: Down (http://ipxe.org/38086193)]
Waiting for link-up on net0...... ok
Configuring (net0 28:b4:48:66:4f:02)...... ok
net0: 172.16.48.57/255.255.252.0 gw 172.16.48.10
Next server: 172.16.48.10
Filename: http://172.16.48.10:8088/inspector.ipxe
http://172.16.48.10:8088/inspector.ipxe... ok
inspector.ipxe : 458 bytes [script]
http://172.16.48.10:8088/agent.kernel... ok
http://172.16.48.10:8088/agent.ramdisk... ok
Lenny: make_boot_params: hdr->initrd_addr_max=0x7fffffff
Lenny: Flag allows initrd above 4GB
Lenny: File: agent.ramdisk size: 0x195fb7bc
Lenny: EFI avail mem at=0x0, pages=0x6e
Lenny: EFI avail mem at=0x75000, pages=0x2b
Lenny: EFI avail mem at=0x100000, pages=0xf5b9
Lenny: EFI avail mem at=0x2a40f000, pages=0x2e0
Lenny: EFI avail mem at=0x3f38f000, pages=0x6d
Lenny: EFI avail mem at=0x3f58f000, pages=0x28f8
Lenny: EFI avail mem at=0x424ae000, pages=0x3
Lenny: EFI avail mem at=0x424b2000, pages=0x3
Lenny: EFI avail mem at=0x424b6000, pages=0x2
Lenny: EFI avail mem at=0x424bb000, pages=0x2
Lenny: EFI avail mem at=0x424d5000, pages=0xc
Lenny: EFI avail mem at=0x424e5000, pages=0x4
Lenny: EFI avail mem at=0x424ec000, pages=0x2
Lenny: EFI avail mem at=0x424f1000, pages=0x1
Lenny: EFI avail mem at=0x424f3000, pages=0x13
Lenny: EFI avail mem at=0x42507000, pages=0x21
Lenny: EFI avail mem at=0x4252a000, pages=0x1
Lenny: EFI avail mem at=0x42531000, pages=0x1
Lenny: EFI avail mem at=0x42536000, pages=0x33
Lenny: EFI avail mem at=0x4256b000, pages=0x50
Lenny: EFI avail mem at=0x425bc000, pages=0x1
Lenny: EFI avail mem at=0x425be000, pages=0x100
Lenny: EFI avail mem at=0x432dc000, pages=0x4e
Lenny: EFI avail mem at=0x43341000, pages=0x2
Lenny: EFI avail mem at=0x43346000, pages=0x5
Lenny: EFI avail mem at=0x4758f000, pages=0x24f1
Lenny: EFI avail mem at=0x100000000, pages=0x3f80000
Lenny: efi_high_alloc, no fit found below=0x7fffffff
Lenny: efi_high_alloc, status=0x800000000000000e
EFI stub: ERROR: Failed to alloc highmem for initrds
Trying to load files to higher address
Lenny: File: agent.ramdisk size: 0x195fb7bc
Lenny: EFI avail mem at=0x0, pages=0x6e
Lenny: EFI avail mem at=0x75000, pages=0x2b
Lenny: EFI avail mem at=0x100000, pages=0xf5b9
Lenny: EFI avail mem at=0x2a40f000, pages=0x2e0
Lenny: EFI avail mem at=0x3f38f000, pages=0x6d
Lenny: EFI avail mem at=0x3f58f000, pages=0x28f8
Lenny: EFI avail mem at=0x424ae000, pages=0x3
Lenny: EFI avail mem at=0x424b2000, pages=0x3
Lenny: EFI avail mem at=0x424b6000, pages=0x2
Lenny: EFI avail mem at=0x424bb000, pages=0x2
Lenny: EFI avail mem at=0x424d5000, pages=0xc
Lenny: EFI avail mem at=0x424e5000, pages=0x4
Lenny: EFI avail mem at=0x424ec000, pages=0x2
Lenny: EFI avail mem at=0x424f1000, pages=0x1
Lenny: EFI avail mem at=0x424f3000, pages=0x13
Lenny: EFI avail mem at=0x42507000, pages=0x21
Lenny: EFI avail mem at=0x4252a000, pages=0x1
Lenny: EFI avail mem at=0x42531000, pages=0x1
Lenny: EFI avail mem at=0x42536000, pages=0x33
Lenny: EFI avail mem at=0x4256b000, pages=0x50
Lenny: EFI avail mem at=0x425bc000, pages=0x1
Lenny: EFI avail mem at=0x425be000, pages=0x100
Lenny: EFI avail mem at=0x432dc000, pages=0x4e
Lenny: EFI avail mem at=0x43341000, pages=0x2
Lenny: EFI avail mem at=0x43346000, pages=0x5
Lenny: EFI avail mem at=0x4758f000, pages=0x24f1
Lenny: EFI avail mem at=0x100000000, pages=0x3f80000
Lenny: Best high fit so far at=0x4066a04000
Lenny: efi_high_alloc, status=0x0
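
The numbers in this log can be cross-checked with a few lines of Python. This is a worked calculation under stated assumptions, not the kernel's code: it assumes 4 KiB EFI pages, that the allocator places the image at the page-aligned top of the chosen region, and that the failure status uses the UEFI error encoding (high bit set, code 14 being EFI_NOT_FOUND). All input values are copied from the output above.

```python
PAGE_SIZE = 4096

# Values copied from the diagnostic output above.
initrd_size = 0x195FB7BC                 # "File: agent.ramdisk size"
largest_low_region_pages = 0xF5B9        # region at 0x100000, largest below 2 GB
high_start, high_pages = 0x100000000, 0x3F80000
failure_status = 0x800000000000000E      # status from the first attempt

# Pages needed for the initrd, rounded up.
pages_needed = -(-initrd_size // PAGE_SIZE)

# First attempt: nothing below 0x7fffffff is large enough.
fits_below_2g = largest_low_region_pages >= pages_needed

# Second attempt: place the image at the page-aligned top of the high region
# (assumed rounding rule for this sketch).
high_end = high_start + high_pages * PAGE_SIZE
best_high_fit = (high_end - initrd_size) // PAGE_SIZE * PAGE_SIZE

# UEFI error statuses on 64-bit platforms have the top bit set.
EFI_ERROR_BIT = 1 << 63
is_error = bool(failure_status & EFI_ERROR_BIT)
error_code = failure_status & ~EFI_ERROR_BIT   # 14 is EFI_NOT_FOUND

print(hex(pages_needed), fits_below_2g, hex(best_high_fit), is_error, error_code)
```

Under these assumptions, the image needs 0x195fc pages while the largest free region below 2 GB has only 0xf5b9, which is consistent with the "no fit found below=0x7fffffff" line, and the computed high address agrees with the logged "Best high fit so far at=0x4066a04000".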

Comment 38 Antonio Gianfreda 2018-10-22 07:28:29 UTC
Hi, we are hitting the same error in a similar RHOSP10 environment during overcloud deployment; all servers are stuck at boot with the following error:

EFI stub: ERROR: Failed to alloc highmem for initrd

[stack@director-hu-tovb ~]$ sudo rpm -qa | grep kernel | sort
erlang-kernel-18.3.4.7-1.el7ost.x86_64
kernel-3.10.0-862.14.4.el7.x86_64
kernel-tools-3.10.0-862.14.4.el7.x86_64
kernel-tools-libs-3.10.0-862.14.4.el7.x86_64
[stack@director-hu-tovb ~]$ sudo rpm -qa | grep director | sort
rhosp-director-images-10.0-20180821.1.el7ost.noarch
rhosp-director-images-ipa-10.0-20180821.1.el7ost.noarch

We are going to try with the test kernel. Do you know when the fix will be ported to official images?

Comment 40 Aviv Guetta 2018-10-22 10:57:19 UTC
Antonio,
Thanks for trying the test-kernel, your input is valuable.
A fixed kernel build will be included in the official images only after it passes the QA cycle and is released through an erratum.

Please let me know (via the support case) if there are other considerations to take.


Aviv

Comment 41 Aviv Guetta 2018-10-22 11:35:06 UTC
Antonio,
Regarding my previous comment, I should add that it applies only IF this is confirmed to be a bug. Currently, it is under investigation.


Thanks,
Aviv

Comment 42 Lenny Szubowicz 2018-10-23 01:05:14 UTC
A fix, consisting of a 3-patch set, has been posted for RHEL 7.7.

The exact same patches can be applied for RHEL 7.6.z and 7.5.z.

                                -Lenny.

Comment 43 Lenny Szubowicz 2018-10-23 01:07:08 UTC
Created attachment 1496546 [details]
Patch 0 of 3 submitted to rhkernel-list for review (cover letter)

Comment 44 Lenny Szubowicz 2018-10-23 01:07:58 UTC
Created attachment 1496547 [details]
Patch 1 of 3 submitted to rhkernel-list for review

Comment 45 Lenny Szubowicz 2018-10-23 01:08:33 UTC
Created attachment 1496548 [details]
Patch 2 of 3 submitted to rhkernel-list for review

Comment 46 Lenny Szubowicz 2018-10-23 01:09:07 UTC
Created attachment 1496549 [details]
Patch 3 of 3 submitted to rhkernel-list for review

Comment 52 Bruno Meneguele 2018-10-30 19:45:58 UTC
Patch(es) committed on kernel-3.10.0-958.el7

Comment 57 errata-xmlrpc 2019-08-06 12:08:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:2029