Bug 699313

Summary: Fedora 15 Beta fails to boot on thinkpad w520 in EFI mode
Product: [Fedora] Fedora Reporter: liucougar
Component: kernel Assignee: John Feeney <jfeeney>
Status: CLOSED WONTFIX QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 15 CC: gansalmon, itamar, jfrates, jonathan, kernel-maint, madhu.chinakonda, matej, pjones, vbraun.name
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-06-25 05:04:23 UTC Type: ---
Attachments:
Description Flags
Output of "memmap" from the UEFI Shell
none
Output of "devtree" from the UEFI Shell
none
screenshots of reboot crash message
none
Screenshot of X220 hang on restart with EFI enabled 3.0.0-rc1-00049-g1fa7b6a kernel
none
dmesg from 3.0-rc1-something booting ThinkPad X220 in EFI mode
none
dmesg on thinkpad w520 with 2.6.38.6 with EFI patches
none

Description liucougar 2011-04-25 04:26:54 UTC
Description of problem:
UEFI is set as the first boot option in thinkpad w520 BIOS settings, and Fedora 15 Beta fails to boot

Version-Release number of selected component (if applicable):
Fedora 15 Beta

How reproducible:


Steps to Reproduce:
1. boot from Fedora 15 Beta USB (created from the ISO)
2. select Fedora-15-Beta in the grub menu
3. at the top-left corner, I see:
Trying to allocate 940 pages for VMLINUZ
[Linux-EFI, setup=0x1014, size=0x3ab7d0]
    [Initrd, addr=0x770de000, size=0x8a1e96c]

Actual results:
nothing is printed after these 3 lines, and it hangs there forever
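
A quick sanity check on the loader lines quoted in step 3, as a minimal sketch. It assumes 4 KiB pages and only observes that the numbers agree with each other; it is not a claim about how grub computes the page count.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def pages_needed(size_bytes, page_size=PAGE_SIZE):
    """Round a byte count up to a whole number of pages."""
    return -(-size_bytes // page_size)


kernel_size = 0x3ab7d0   # "size" from the [Linux-EFI ...] line
initrd_size = 0x8a1e96c  # "size" from the [Initrd ...] line

# 0x3ab7d0 bytes rounds up to 940 pages, matching "Trying to allocate 940 pages".
print(pages_needed(kernel_size))

# The initrd is roughly 138 MiB.
print(round(initrd_size / 2**20, 1))
```

So the three printed lines are internally consistent, and the hang happens after the loader has sized both images.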

Expected results:
it boots

Additional info:
specifying noefi on the kernel line in grub makes it boot; with Fedora 15 Alpha, I hit another bug, https://bugzilla.redhat.com/show_bug.cgi?id=683693, where it does not boot with EFI either

Comment 1 liucougar 2011-04-25 16:53:35 UTC
forgot to mention: this thinkpad w520 has 16GB installed memory

Comment 2 Chuck Ebbert 2011-04-27 01:52:25 UTC
In bug 683693 you report that re-adding the EFI physical mode patch makes it boot, is that correct?

Comment 3 Matthew Garrett 2011-04-27 02:05:14 UTC
Boot and then explode on efivars use, so it's a choose your poison kind of exercise. We can't readd the physical mode patch - we need to figure out why SetVirtualAddressMap() is failing.

Comment 4 liucougar 2011-04-27 02:29:46 UTC
yes, with physical mode, I can indeed boot, but I have to not compile in efivars, otherwise I hit bug 683693

but the physical mode patch introduces a side effect: on every reboot, I get this error:
BUG: bug: unable to handle kernel paging request at 0x......

and it does not reboot

I find similar bug reports on ubuntu:
https://bugs.launchpad.net/ubuntu/+source/casper/+bug/635439
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/721576

I'd love to have my machine boot without that physical mode patch under UEFI

let me know if you need anything I can provide

Comment 5 Jarrod Frates 2011-04-27 20:40:09 UTC
Duplicate of bug 659905?  I haven't tried the physical mode patch, but the underlying symptoms look the same.

Comment 6 liucougar 2011-04-27 21:59:35 UTC
(In reply to comment #5)
> Duplicate of bug 659905?  I haven't tried the physical mode patch, but the
> underlying symptoms look the same.

yes, it does look quite similar. Jarrod, do you know your UEFI firmware's make and revision? my thinkpad w520 has the following info (reported by the UEFI Shell "ver" command):
EFI Specification Revision : 2.0
EFI Vendor             : Phoenix Technologies Ltd.
EFI Revision            : 4660.22136

also, if you try the physical mode patch, you have to turn off efivars (not compile it in) and make sure it is not loaded on boot (I think the efivars module is either in the initrd image or compiled into the kernel on the Fedora 15 Alpha installation disk, and there is no way to disable it on the boot command line)

Comment 7 liucougar 2011-04-27 23:44:03 UTC
I also noticed this line in dmesg when I boot with the physical mode patch. Could it be the cause of the crash in SetVirtualAddressMap() without the physical mode patch?

[ 0.000000] Kernel-defined memdesc doesn't match the one from EFI!

Comment 8 Matthew Garrett 2011-04-28 00:12:34 UTC
This turns out to be Linux complaining about something the specification explicitly allows. I've submitted a patch to get rid of it.
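
As an illustration of what the specification allows: the firmware reports its own descriptor size along with the memory map, and that size may be larger than the structure the caller knows about, so the map has to be walked by the reported stride. The sketch below is only an illustration under assumptions. The 48-byte stride and all addresses are hypothetical and not taken from the W520, and the reading that the warning fires on a size mismatch is an assumption as well.

```python
import struct

# EFI_MEMORY_DESCRIPTOR as laid out on x86_64:
# Type (u32), 4 bytes of padding, then PhysicalStart, VirtualStart,
# NumberOfPages and Attribute (u64 each), 40 bytes in total.
DESC_FORMAT = "<IIQQQQ"
KNOWN_DESC_SIZE = struct.calcsize(DESC_FORMAT)


def walk_memmap(buf, desc_size):
    """Yield one dict per descriptor, stepping by the firmware's desc_size."""
    if desc_size < KNOWN_DESC_SIZE:
        raise ValueError("descriptor size is smaller than the known layout")
    for offset in range(0, len(buf) - desc_size + 1, desc_size):
        typ, _pad, phys, virt, pages, attr = struct.unpack_from(
            DESC_FORMAT, buf, offset
        )
        yield {
            "type": typ,
            "phys_start": phys,
            "virt_start": virt,
            "num_pages": pages,
            "attribute": attr,
        }


# Hypothetical two-entry map with 48-byte descriptors
# (8 bytes of trailing padding after each known 40-byte layout).
HYPOTHETICAL_DESC_SIZE = 48
entry_a = struct.pack(DESC_FORMAT, 7, 0, 0x100000, 0, 256, 0xF)
entry_b = struct.pack(DESC_FORMAT, 5, 0, 0xBAE00000, 0, 16, 0x800000000000000F)
pad = bytes(HYPOTHETICAL_DESC_SIZE - KNOWN_DESC_SIZE)
memmap = entry_a + pad + entry_b + pad

for desc in walk_memmap(memmap, HYPOTHETICAL_DESC_SIZE):
    print(hex(desc["phys_start"]), desc["num_pages"])
```

Stepping by the known 40-byte size instead would misread the second entry, which is why a size mismatch by itself is harmless as long as the reported stride is used.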

Comment 9 Jarrod Frates 2011-04-28 00:41:22 UTC
I would love to provide the EFI revision, but AFAICT, this model does not have a UEFI shell available for it.  Even the diagnostics disc does not provide the information, and mechanisms that provide it for workstation and server models do not appear to function on this notebook.  If you know of a bootable environment that would provide it, I will be happy--ecstatic, even--to look it up.

Matthew: I'll watch for updates and try it on my system as soon as it's published.

Comment 10 liucougar 2011-04-28 00:51:32 UTC
(In reply to comment #9)
> I would love to provide the EFI revision, but AFAICT, this model does not have
> a UEFI shell available for it.  Even the diagnostics disc does not provide the
> information, and mechanisms that provide it for workstation and server models
> do not appear to function on this notebook.  If you know of a bootable
> environment that would provide it, I will be happy--ecstatic, even--to look it
> up.
you could install one yourself: download the following file, rename it shellx64.efi and move it to (ESP)/EFI/shellx64.efi
http://tianocore.git.sourceforge.net/git/gitweb.cgi?p=tianocore/edk2;a=blob_plain;f=EdkShellBinPkg/FullShell/X64/Shell_Full.efi;hb=HEAD

then modify your grub-efi configuration to add one entry; take a look at the last menuentry in the grub.conf file attached to the following bug report:
https://savannah.gnu.org/bugs/?33162

> Matthew: I'll watch for updates and try it on my system as soon as it's
> published.
I think Matthew's patch only fixes the warning, it does not make it boot, yet

Comment 11 Jarrod Frates 2011-04-28 16:52:05 UTC
Darn.  I just got a little over-excited.

Unfortunately, at this point, I do not have a functional Linux installation on the notebook.  Are there any other ways of getting it to run, such as a bootable CD?

Comment 12 Matthew Garrett 2011-04-28 17:05:32 UTC
Yes, disable EFI booting in the BIOS.

Comment 13 Volker Braun 2011-05-02 11:37:05 UTC
I ran into the same bug on my W520, also with 16GB RAM. Switching to Legacy boot in the BIOS works (though not with a GPT-formatted disk), but makes the boot initialization process noticeably slower. So I spent some time trying to get UEFI boot working with Fedora 15 beta. 

How to reproduce:

1. I dd'ed the efidisk.img from the F15 beta to a USB stick and booted from there. Boot hangs as in the ticket description. 


2. Editing the kernel command line and adding "noefi" allows the Linux kernel to boot and the F15 installer works fine until it is time for the reboot. System fails to boot. 


3. Anaconda did not create an EFI System Partition (ESP). There is no gdisk in efidisk.img (?!?) and parted is too buggy/featureless to manually create the ESP. I removed the SSD from the laptop and created the ESP on my F14 desktop with gdisk.


4. Cannot run efibootmgr because of "noefi". Fails with

Fatal: Couldn't open either sysfs or procfs directories for accessing EFI
variables.
Try 'modprobe efivars' as root.

And the modprobe efivars doesn't work, of course.


5. I copied grub.efi to (ESP)/EFI/BOOT/BOOTX64.EFI and manually created a grub legacy configuration file at (ESP)/EFI/BOOT/BOOTX64.CONF. Since these are the default EFI boot loader paths, the thinkpad will load them without using efibootmgr.

At this point, I can EFI boot Fedora 15 beta.
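
Step 5 can be sketched as a small helper, assuming the ESP is already mounted somewhere writable. The function name, the source file names and the demo paths are hypothetical; only the two destination paths come from the step above.

```python
import shutil
import tempfile
from pathlib import Path


def install_fallback_loader(esp_root, loader, config):
    """Copy a loader and its config to the default x86_64 fallback paths."""
    boot_dir = Path(esp_root) / "EFI" / "BOOT"
    boot_dir.mkdir(parents=True, exist_ok=True)
    efi_dst = boot_dir / "BOOTX64.EFI"
    conf_dst = boot_dir / "BOOTX64.CONF"
    shutil.copyfile(loader, efi_dst)
    shutil.copyfile(config, conf_dst)
    return efi_dst, conf_dst


if __name__ == "__main__":
    # Demo against a temporary directory instead of a real ESP.
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        (tmp / "grub.efi").write_bytes(b"placeholder loader")
        (tmp / "grub.conf").write_text("default=0\n")
        efi, conf = install_fallback_loader(
            tmp / "esp", tmp / "grub.efi", tmp / "grub.conf"
        )
        print(efi.relative_to(tmp), conf.relative_to(tmp))
```

Because the firmware falls back to this path on its own, no EFI variable has to be written, which is what makes it usable while efibootmgr is unavailable under "noefi".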


I also tried the UEFI Shell from comment 10. I found no way to select the shell during bootup. The grub.cfg that is linked from comment 10 uses grub2 syntax, and I'm pretty sure that won't work with Fedora's legacy (efi-patched) grub. So I overwrote grub at (ESP)/EFI/BOOT/BOOTX64.EFI with the shell binary. The shell then loads as it should. "ver" returns

EFI Specification Revision : 2.0
EFI Vendor                 : Phoenix Technologies Ltd.
EFI Revision               : 4660.22136

Comment 14 Volker Braun 2011-05-02 12:57:57 UTC
Created attachment 496253 [details]
Output of "memmap" from the UEFI Shell

Comment 15 Volker Braun 2011-05-02 12:58:34 UTC
Created attachment 496254 [details]
Output of "devtree" from the UEFI Shell

Comment 16 liucougar 2011-05-08 19:12:44 UTC
a bit more info from https://bugs.launchpad.net/ubuntu/+source/linux/+bug/705588

the same error happens when you enable the VirtualBox EFI boot option (Settings -> System -> Enable EFI) and boot the CD in EFI mode

maybe it's easier to debug this kernel issue in virtualbox?

Comment 17 liucougar 2011-05-08 22:56:16 UTC
Hi Matthew Garrett, I just noticed you submitted 5 patches to lkml (details https://lkml.org/lkml/2011/5/5/246 )

does it fix this issue for you on thinkpad w520?

Comment 18 Matthew Garrett 2011-05-08 23:24:34 UTC
If I had a W520, I could tell you...

I'll try to find out if we have access to any this week.

Comment 19 liucougar 2011-05-09 02:22:31 UTC
I tried these 5 patches against 2.6.38, and unfortunately it still does not boot :(

something else is still not working

Matthew, thanks for working on this

Comment 20 liucougar 2011-05-20 22:53:29 UTC
all the 5 patches in a single file: https://lkml.org/lkml/2011/5/19/616

another patch in the works: https://lkml.org/lkml/2011/5/19/377

Comment 21 liucougar 2011-06-02 03:48:51 UTC
the other patch has landed: https://lkml.org/lkml/2011/6/1/172

haven't yet got around to trying it out

Comment 22 liucougar 2011-06-06 06:55:46 UTC
just applied patch https://lkml.org/lkml/2011/6/1/172 to my 2.6.38 (on top of https://lkml.org/lkml/2011/5/19/616; I had to manually merge one change in the efi_64.c file), and my thinkpad w520 finally boots fine in EFI mode

finally I can remove noefi on my kernel boot line

thanks Matthew

Comment 23 liucougar 2011-06-06 08:16:47 UTC
the 5 patches plus the one in https://lkml.org/lkml/2011/6/1/172 are all required for thinkpad w520 to boot (tested on 2.6.38.4 and 2.6.38.6)

efivars can be loaded successfully (did not actually try using it)

but even with all these patches, the kernel crashes and freezes on reboot; I will attach some screenshots

Comment 24 liucougar 2011-06-06 08:18:48 UTC
Created attachment 503169 [details]
screenshots of reboot crash message

Comment 25 Matěj Laitl 2011-06-06 13:36:29 UTC
Created attachment 503224 [details]
Screenshot of X220 hang on restart with EFI enabled 3.0.0-rc1-00049-g1fa7b6a kernel

I have similar problems, with Linux 3.0-rc1 my ThinkPad X220 laptop boots fine, but often hangs during restarts. See the screenshot.

Comment 26 Matthew Garrett 2011-06-06 13:46:35 UTC
It's jumping to a physical address while in virtual mode. This is so far outside the EFI spec that it's not even funny, so I'm completely unsurprised. Can you attach dmesg from an EFI boot?

Comment 27 Matěj Laitl 2011-06-06 13:50:37 UTC
Created attachment 503228 [details]
dmesg from 3.0-rc1-something booting ThinkPad X220 in EFI mode

Here you go... :)

Comment 28 Matthew Garrett 2011-06-06 14:05:27 UTC
Yup, the address it's trying to access is the physical address of one of the runtime sections. Everything about this is incompetent. Alternatives here are either to set up a physical page table for rebooting or just ignore the EFI reboot and use the existing methods instead. Can you try passing 

reboot=a

to your boot parameters and see if that successfully reboots?
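
The diagnosis above (the faulting address is the physical address of a runtime section, used while the kernel is in virtual mode) can be sketched as a range lookup. Everything here is hypothetical: the region names, the addresses and the mapping are made up for illustration and are not taken from the attached dmesg.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumption: 4 KiB EFI pages


@dataclass
class RuntimeRegion:
    """One EFI runtime region with its physical and virtual base."""
    name: str
    phys_start: int
    virt_start: int
    num_pages: int

    @property
    def size(self):
        return self.num_pages * PAGE_SIZE


def classify_fault(addr, regions):
    """Say whether addr falls in the physical or virtual range of a region."""
    for region in regions:
        if region.phys_start <= addr < region.phys_start + region.size:
            return f"physical address inside {region.name}"
        if region.virt_start <= addr < region.virt_start + region.size:
            return f"virtual address inside {region.name}"
    return "not in any runtime region"


# Hypothetical regions and fault address.
regions = [
    RuntimeRegion("runtime code", 0xBAE00000, 0xFFFF8800BAE00000, 16),
    RuntimeRegion("runtime data", 0xBAF00000, 0xFFFF8800BAF00000, 32),
]
print(classify_fault(0xBAE01234, regions))
```

A fault that lands in the physical range after the switch to virtual mode points at firmware code that kept a physical pointer, which is the situation described in this comment.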

Comment 29 Matěj Laitl 2011-06-06 14:28:04 UTC
(In reply to comment #28)
> Can you try passing  reboot=a to your boot parameters and see if that
> successfully reboots?

With reboot=a my system reboots just fine. (at least 2 times I tested it)
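
To confirm which reboot method a running system was booted with, the kernel command line can be inspected. This is a minimal sketch: the letter table is a subset of the x86 reboot= options, and the sample command line is hypothetical.

```python
# Subset of the x86 "reboot=" letters. Only the first character of each
# comma-separated option is considered here.
REBOOT_METHODS = {
    "a": "acpi",
    "b": "bios",
    "e": "efi",
    "k": "kbd",
    "t": "triple",
}


def reboot_methods(cmdline):
    """Return the reboot methods named by reboot= options in a command line."""
    found = []
    for token in cmdline.split():
        if not token.startswith("reboot="):
            continue
        for option in token[len("reboot="):].split(","):
            if option and option[0] in REBOOT_METHODS:
                found.append(REBOOT_METHODS[option[0]])
    return found


# Hypothetical command line; on a live system read /proc/cmdline instead.
print(reboot_methods("ro root=/dev/sda2 rhgb quiet reboot=a"))
```

With reboot=a the kernel uses the ACPI reset path instead of the EFI runtime call, which is why it sidesteps the crash.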

Comment 30 liucougar 2011-06-06 19:50:54 UTC
Created attachment 503315 [details]
dmesg on thinkpad w520 with 2.6.38.6 with EFI patches

with these EFI patches applied, my thinkpad w520 reboots fine with the reboot=a kernel command line argument

it reboots without crashes even if I disable BIOS boot support (so only pure UEFI boot is enabled).

however, I did notice the following error in my dmesg:

[    0.000591] Call Trace:
[    0.000598]  [<ffffffff810a09b5>] ? bad_page+0x95/0xe0
[    0.000601]  [<ffffffff810a1314>] ? free_pages_prepare+0x94/0xa0
[    0.000605]  [<ffffffff810a2168>] ? free_hot_cold_page+0x48/0x1a0
[    0.000610]  [<ffffffff816088eb>] ? free_bootmem_late+0x37/0x4c
[    0.000613]  [<ffffffff8160466d>] ? efi_enter_virtual_mode+0x24d/0x337
[    0.000618]  [<ffffffff815f3bd2>] ? start_kernel+0x28e/0x2ed
[    0.000621]  [<ffffffff815f33d7>] ? x86_64_start_kernel+0xf3/0xf7

the full dmesg log is attached. the same error is present in Matěj Laitl's X220 dmesg as well

Comment 31 Matthew Garrett 2011-06-06 20:23:03 UTC
Yeah, there's a patch for that heading upstream. Thanks for testing!

Comment 32 liucougar 2011-06-06 20:27:16 UTC
good to know. 

could you add a comment here with the link to the LKML page with the patch when it's sent to upstream? thanks

Comment 33 liucougar 2011-06-06 22:37:37 UTC
oh, Matthew, you were talking about this patch https://lkml.org/lkml/2011/6/6/314 to fix that bad_page, right?

Comment 34 Matthew Garrett 2011-06-06 23:58:12 UTC
That's the one, yes.

Comment 35 Chuck Ebbert 2011-06-25 05:04:23 UTC
There are still problems with EFI in 3.0-rc, and even if we eventually get the fixes into F15 there will not be an official respin of the install disks. Please test rawhide/F16.