Bug 1266985

Summary: Patch 138 to grub2 breaks PXE booting in UEFI mode?
Product: Red Hat Enterprise Linux 7 Reporter: Mike Mosley <jmmosley>
Component: grub2 Assignee: Peter Jones <pjones>
Status: CLOSED CURRENTRELEASE QA Contact: Release Test Team <release-test-team-automation>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 7.1 CC: jmmosley, shubham.git, tomek
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-01-19 22:06:20 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
0.23 .. 0.29 none

Description Mike Mosley 2015-09-28 18:07:02 UTC
Description of problem: While moving from RHEL 6.5 to 7.1, we discovered that attempting to PXE boot our Dell servers (PowerEdge R720xd) in UEFI mode fails: the server crashes after grubx64.efi is loaded.


Version-Release number of selected component (if applicable):
grub2-efi-2.02-0.16

How reproducible:
Simply attempt to PXE boot a server in UEFI mode using the grubx64.efi file provided in the RHEL 7.1 distribution.

Steps to Reproduce:
1.  Configure DHCP to supply shim.efi (via TFTP) to the machine attempting to PXE boot.
2.  Watch shim.efi load, then observe it fetch grubx64.efi over TFTP and load it.
3.  The system (R720xd) crashes shortly thereafter.

Actual results:
The grubx64.efi image never requests grub.cfg

Expected results:
In a normal process, I would see grubx64.efi load and then transfer grub.cfg over (via TFTP) and then continue into our Kickstart build process.


Additional info:  I have attempted this same process with a Dell R620.   In this case, we see grubx64.efi load and then numerous ARP requests are sent out and it appears that grub is ignoring the response. Again, it never requests grub.cfg.  Eventually the host drops into a grub prompt and stops.

To troubleshoot the issue, I took just the grubx64.efi binary from the RHEL 7.0 release and dropped it into place, leaving everything else 7.1.    This worked as expected on both systems.

So, I downloaded the source for the RHEL 7.1 version of grub2  (grub2-efi-2.02-0.16), built it, and started eliminating patches until I got a version that would work.   I finally discovered that by eliminating a single patch (listed below) the grubx64.efi file would work as expected.   The patch I excluded is:

Patch0138: 0138-reopen-SNP-protocol-for-exclusive-use-by-grub.patch

That is about as far as I could go with it.
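For anyone repeating this kind of search on a long patch series, the elimination can be done as a bisection instead of one patch at a time. The following is only a minimal sketch under stated assumptions: `first_bad_patch` and the `boots_ok` callback are hypothetical names, `boots_ok` stands in for "build grub2 with this prefix of the patch list and PXE boot a test machine", and the search assumes that once a prefix of the series fails, every longer prefix fails too. It is not necessarily how the patch above was found.

```python
from typing import Callable, Optional, Sequence


def first_bad_patch(
    patches: Sequence[str],
    boots_ok: Callable[[Sequence[str]], bool],
) -> Optional[str]:
    """Return the first patch whose inclusion breaks booting, or None.

    Assumptions (hypothetical, not from the bug report):
    - building with no patches applied boots fine;
    - once a prefix of the series fails, every longer prefix fails too.
    """
    if boots_ok(patches):
        return None  # the full series works, nothing to find

    lo, hi = 0, len(patches)  # prefix of length lo is good, length hi is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if boots_ok(patches[:mid]):
            lo = mid  # failure is introduced later in the series
        else:
            hi = mid  # failure is already present in this prefix
    return patches[hi - 1]


if __name__ == "__main__":
    # Stand-in predicate for illustration only; a real check would
    # rebuild the package and attempt a PXE boot.
    series = ["0137-example.patch",
              "0138-reopen-SNP-protocol-for-exclusive-use-by-grub.patch",
              "0139-example.patch"]
    culprit = series[1]
    print(first_bad_patch(series, lambda applied: culprit not in applied))
```

Each call to `boots_ok` costs a full rebuild and boot test, so halving the candidate range per step keeps the number of builds to roughly log2 of the series length.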

Here is additional info about the two machines I tested with:
Dell PowerEdge R720xd
PXE Boot NIC Info:
Description:		Ethernet Controller 10-Gigabit X540-AT2
Vendor	:		Intel Corporation
Driver Name:		ixgbe
Driver Version: 	3.15.1-k
Firmware Version:	Family 15.0.28 (0x800004cf)


Dell PowerEdge R620
PXE Boot NIC Info:
Description:		PowerEdge R610 BCM5709 Gigabit Ethernet
Vendor	:		Broadcom Corporation
Driver Name:		bnx2
Driver Version:	2.2.5
Firmware Version:	Family 4.6.8 (bc 4.6.4 NCSI 1.0.6)

Comment 2 Peter Jones 2015-10-05 18:06:29 UTC
Can you try this with grub2-2.02-28 ?  We've seen several bugs like this, and the patch you've found as the problem is actually the /fix/ for it on many hardware platforms, but there are a couple of other fixes for the same behavior as well.  So there's a good chance the current build will behave differently.

Comment 3 Mike Mosley 2015-10-05 18:28:37 UTC
Peter,
I would be happy to try it out, but I can't seem to locate that particular version.  The latest I could find is 2.02-0.23.  Where should I download that version from?
Thanks,
Mike

Comment 4 Mike Mosley 2015-10-06 18:47:42 UTC
Peter,
I built 2.02-0.23 and tried it.  Same results.  I have not been able to find the version you referenced (grub2-2.02-28).

Mike

Comment 5 Peter Jones 2015-10-12 15:51:00 UTC
Created attachment 1082049
0.23 .. 0.29

So, 28 was the current beta build for 7.2.  If you don't have access to that, then the attached patch should get you the same thing from 0.23.

Comment 6 Mike Mosley 2015-10-13 18:51:43 UTC
Ok, I checked the RHEL 7.2 Beta DVD and found 0.25.  Still unable to find 0.28 anywhere.  

I'm not familiar with the format of the attachment you provided because I seldom do source code patching.

After saving it as a raw unified diff, I attempted to split it into individual patches using splitpatch but I was unsuccessful.   What is the magic for applying the file of patches provided in the attachment such that I can use it in the rpmbuild environment for 0.23 to get to 0.29?

Comment 7 Mike Mosley 2015-10-14 19:19:31 UTC
Well, after much confusion and some amusing attempts on my part, I finally got things to work using 0.23 and the attachment of patches (to bring it to 0.29).   

I eventually used 'splitdiff' to break the individual patches out of the attachment.  In order to get 'rpmbuild' to accept the patches, I had to remove a couple of leading characters from the lines in each file, as well as the 3 lines preceding the From: lines.  Otherwise, rpmbuild complained that there was no valid email in the patch.
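The cleanup described above can be sketched in Python. This is an illustration only and rests on an assumption the thread does not confirm: that each piece produced by splitdiff is a unified-diff section that adds a whole patch file, so the real patch text is the hunk body with its leading "+" removed. The function name `recover_added_file` is hypothetical.

```python
def recover_added_file(diff_section: str) -> str:
    """Rebuild a newly added file from one unified-diff section.

    Assumes (hypothetically) that the section only adds lines, as in
    a diff that introduces a new patch file into a source tree.
    """
    recovered = []
    in_hunk = False
    for line in diff_section.splitlines():
        if line.startswith("@@"):
            in_hunk = True  # hunk header: file content follows
            continue
        if not in_hunk:
            continue  # skip the 'diff', '---' and '+++' header lines
        if line.startswith("+"):
            recovered.append(line[1:])  # drop the diff's '+' marker
        # anything else (e.g. '\ No newline at end of file') is ignored
    return "\n".join(recovered) + "\n"
```

The recovered text should then begin with the patch's own From: header, which is what the mail-style patch application in the rpmbuild step looks for.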

Next, I had to eliminate some of the patch files I generated with splitdiff, as they were deemed 'empty' or 'not applicable'.  Here is a list of the patches I did get to apply:

Applying: Reverse rpmvercmp return value (#1229329)
Applying: efinet: memory leak on module removal
Applying: efinet: cannot free const char * pointer
Applying: Revert "efinet: memory leak on module removal"
Applying: efinet: handle get_status() on buggy firmware properly
Applying: ppc64le sync mkconfig to disk (#1212114)
Applying: tcp: ack when we get an OOO/lost packet
Applying: tcp: add window scaling support
Applying: Be more aggro about actually using the *configured* network device.
Applying: efinet: add filter for the first exclusive reopen of SNP
Applying: Put the correct .file directives in our .S files.
Applying: Make efi machines load an env block from a variable
Applying: Make it possible to enabled --build-id=sha1
Applying: Add grub_qdprintf() - grub_dprintf() without the file+line number.
Applying: Make a "gdb" dprintf that tells us load addresses.

Anyway, it did actually compile, and I was able to test the resulting grubx64.efi file.  And strangely enough, it worked! :-)

I don't know if any of this information is useful or not.  My goal is to provide you guys with enough info so that we can ensure RHEL 7.2 will have a grubx64.efi that works for us.

Mike