Bug 1869987

Summary: error: ../../grub-core/net/net.c:1795:timeout reading initrd.img
Product: Red Hat Enterprise Linux 7 Reporter: Javier Martinez Canillas <fmartine>
Component: grub2 Assignee: Bootloader engineering team <bootloader-eng-team>
Status: CLOSED ERRATA QA Contact: Release Test Team <release-test-team-automation>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 7.9 CC: bootloader-eng-team, djuarezg, emcnabb, extras-qa, fmartine, gmarr, hartsjc, ktordeur, lkundrak, lzap, pjanda, pjones, pvlasin, pwhalen, robatino, sadas
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Linux   
Whiteboard:
Fixed In Version: grub2-2.02-0.87.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1869335
: 1871034 Environment:
Last Closed: 2020-09-29 20:59:07 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 245418, 1871034    

Description Javier Martinez Canillas 2020-08-19 07:47:48 UTC
+++ This bug was initially created as a clone of Bug #1869335 +++

Description of problem:
Attempting to PXE boot the latest Fedora 33 nightly (Fedora-33-20200816.n.0) fails while reading initrd.img:

error: ../../grub-core/net/net.c:1795:timeout reading
`/fedora/Fedora-33-20200816.n.0/Everything/initrd.img'.

Version-Release number of selected component (if applicable):
grub2-2.04-28.fc33

Additional info:

Replacing grubaa64.efi with the one from an earlier compose (Fedora-Rawhide-20200811.n.0) worked as expected.

--- Additional comment from Paul Whalen on 2020-08-17 15:27:39 UTC ---

Proposing as a blocker for F33 beta, criteria "It must be possible to install by booting the installation kernel directly (including via PXE) and correctly specifying a remote source for the installer itself."

--- Additional comment from Geoffrey Marr on 2020-08-17 23:41:28 UTC ---

Discussed during the 2020-08-17 blocker review meeting: [0]

The decision to classify this bug as an "AcceptedBlocker" was made as it violates the following criterion:

"It must be possible to install by booting the installation kernel directly (including via PXE)...", specifically concerning aarch64.

[0] https://meetbot.fedoraproject.org/fedora-blocker-review/2020-08-17/f33-blocker-review.2020-08-17-16.11.txt

--- Additional comment from Paul Whalen on 2020-08-18 01:51:40 UTC ---

Same error with grub2-2.04-27.fc33

--- Additional comment from Javier Martinez Canillas on 2020-08-18 10:34:54 UTC ---

I was also able to reproduce this issue.

With grub2-2.04-27.fc33:

grub> linux /images/pxeboot/vmlinuz
grub> initrd /images/pxeboot/initrd.img
error: ../../grub-core/net/net.c:1795:timeout reading
`/images/pxeboot/initrd.img'.
grub> echo $?
28

With grub2-2.04-25.fc33:

grub> linux /images/pxeboot/vmlinuz
grub> initrd /images/pxeboot/initrd.img
grub> echo $?
0

--- Additional comment from Javier Martinez Canillas on 2020-08-18 16:02:26 UTC ---

Could you please test using the attached grubaa64.efi binary?

--- Additional comment from Paul Whalen on 2020-08-18 16:18:18 UTC ---

(In reply to Javier Martinez Canillas from comment #5)
> Created attachment 1711746 [details]
> grubaa64.efi
> 
> Could you please test using the attached grubaa64.efi binary?

That works.

--- Additional comment from Javier Martinez Canillas on 2020-08-18 16:50:20 UTC ---

(In reply to Paul Whalen from comment #6)
> (In reply to Javier Martinez Canillas from comment #5)
> > Created attachment 1711746 [details]
> > grubaa64.efi
> > 
> > Could you please test using the attached grubaa64.efi binary?
> 
> That works.

Thanks for testing. It's a build that drops the following patch:

https://src.fedoraproject.org/rpms/grub2/blob/f33/f/0238-tftp-Do-not-use-priority-queue.patch

I will dig deeper into why that patch leads to the TFTP regression.

Comment 2 Javier Martinez Canillas 2020-08-19 07:49:59 UTC
I was also able to reproduce this bug with grub2-2.02-0.86.el7.

Comment 3 Javier Martinez Canillas 2020-08-19 08:13:28 UTC
It also happens on x86_64, so it is not specific to aarch64.

It's easier to reproduce with a large initrd: it worked correctly with a 53M initrd but failed with a 69M one.

Comment 5 Petr Janda 2020-08-19 18:15:05 UTC
Reproduced in a virtual machine with RHEL-7.9-20200810.2-Server-x86_64.
grub version:
grub2-2.02-0.86.el7_8

initrd file created with dd:
$ dd if=/dev/zero of=initrd.big bs=1M count=69


- Start a PXE boot of a UEFI machine using grubx64.efi
- Go to the GRUB command line
grub> linuxefi /images/pxeboot/vmlinuz  # adjust the path to match your setup
grub> initrdefi /images/pxeboot/initrd_big.img
error: timeout reading `/images/pxeboot/initrd_big.img'.
grub> echo $?
28
grub>


The size limit lies somewhere between 66060288 bytes (63 MiB) and 67108864 bytes (64 MiB).
For comparison, the initrd on RHEL-7.9-20200810.2-Server-x86_64 is 51187595 bytes (49 MiB).
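
The 63 to 64 MiB window is consistent with a 16-bit TFTP block number running out, although nothing in this report confirms that as the root cause. A minimal sketch of the arithmetic, assuming a negotiated block size of 1024 bytes (an assumption, not stated in this bug); the function names are hypothetical and only for illustration:

```python
MIB = 1024 * 1024

# Assumption (not stated in this bug): the client negotiates 1024-byte blocks.
ASSUMED_BLOCK_SIZE = 1024
# TFTP block numbers are 16-bit and start at 1 (RFC 1350).
MAX_BLOCK_NUMBER = 0xFFFF


def blocks_needed(file_size: int, block_size: int = ASSUMED_BLOCK_SIZE) -> int:
    """Number of DATA packets, counting the final short (possibly empty) one."""
    return file_size // block_size + 1


def exceeds_block_counter(file_size: int, block_size: int = ASSUMED_BLOCK_SIZE) -> bool:
    """True if the transfer needs more blocks than 16 bits can number without wrapping."""
    return blocks_needed(file_size, block_size) > MAX_BLOCK_NUMBER


if __name__ == "__main__":
    # Sizes quoted in this report, in bytes.
    for size in (51187595, 66060288, 67108864, 69 * MIB):
        wraps = exceeds_block_counter(size)
        print(f"{size} bytes ({size / MIB:.1f} MiB): needs wrap = {wraps}")
```

Under that assumption, the 49 MiB and 63 MiB files fit within one pass of the block counter, while the 64 MiB and 69 MiB files do not, which matches the observed failures.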

Comment 9 Adam Williamson 2020-08-19 19:22:04 UTC
Clearing Fedora metadata.

Comment 26 Petr Janda 2020-08-24 15:50:27 UTC
Verified on the RHEL-7.9-20200821 x86_64 Server compose with the grub2 EFI binary updated to grub2-2.02-0.87.el7.
I did not try an actual boot, as I have no initrd image that large, but with an artificially created file made using dd it works as expected (the "initrd" was transferred; the kernel then fails to unpack it).
Tried files of 70 MB and 400 MB; both seem to work.
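
For anyone repeating this check at several sizes, the dd step can be scripted. A minimal sketch, assuming a Linux host where extending a file with truncate yields zero bytes; the output directory, file names, and the choice of sizes in MiB are hypothetical and not taken from this report:

```python
import os
import tempfile

MIB = 1024 * 1024


def make_zero_file(path: str, size_bytes: int) -> int:
    """Create a file of size_bytes that reads back as all zeros.

    truncate() extends the file with zero bytes; on most Linux filesystems
    the result is sparse, so it is quick to create even at hundreds of MiB.
    """
    with open(path, "wb") as f:
        f.truncate(size_bytes)
    return os.path.getsize(path)


if __name__ == "__main__":
    # Hypothetical output directory and file names, for illustration only.
    out_dir = tempfile.mkdtemp(prefix="initrd-test-")
    for size_mib in (63, 64, 70, 400):
        path = os.path.join(out_dir, f"initrd_{size_mib}M.img")
        print(path, make_zero_file(path, size_mib * MIB))
```

The generated files would still need to be copied to the TFTP root and loaded from the GRUB command line as in the reproduction steps.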

Comment 28 errata-xmlrpc 2020-09-29 20:59:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (grub2 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4046

Comment 29 Daniel Juarez 2021-06-28 13:32:56 UTC
Isn't this issue also present when using the HTTP protocol instead of TFTP?

It seems the issue is fixed for TFTP menu entries, but not for HTTP ones, e.g. (http)/vmlinuz