Bug 748516

Summary: kernel does not boot with patch to fix invalid EFI remap calls from 2011-10-18
Product: [Fedora] Fedora
Reporter: Jason Montleon <jmontleo>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 16
CC: awilliam, gansalmon, itamar, jfeeney, jonathan, kernel-maint, madhu.chinakonda, mads, matt.fleming, robatino, satellitgo, shneige, sreenivasa-reddy.berahalli, the.ridikulus.rat
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: AcceptedBlocker
Fixed In Version: kernel-2.6.41.1-1.fc15
Doc Type: Bug Fix
Last Closed: 2011-11-02 14:19:36 EDT
Bug Blocks: 713568    
Attachments (Description | Flags):
fix the check for mapping nocache'd | none
don't use ioremap_cache() for the cached case | none
dmesg from working 3.1.0-5.fc16.x86_64 without x86-efi-Calling-__pa-with-an-ioremap-address-is-invalid.patch | none
memmap output from efi shell | none

Description Jason Montleon 2011-10-24 12:36:43 EDT
Description of problem:
One of my systems no longer boots with new kernels starting with 3.1.0.0.rc10.git0.1.fc16. git1.1 also does not boot. All previous Fedora 16 Alpha/Beta kernels worked.

The system shows that it is loading the initial ramdisk and then halts. (This is not due to a missing initrd line; I have been hit by that on multiple systems in the past, so I checked that the grub2 config was correct and that the initramfs file existed before proceeding.)

Removing rhgb and quiet from the kernel line did not provide any additional output.

I have two other physical systems and a vm that boot just fine with the new kernel, but one does not.

Downloading the source, commenting out the patch to fix invalid EFI remap calls, rebuilding, and reinstalling gets it working again.

Version-Release number of selected component (if applicable):
3.1.0.0.rc10.git0.1.fc16

How reproducible:
Always on the affected system

Steps to Reproduce:
1. Install 3.1.0.0.rc10.git0.1.fc16.x86_64
2. Reboot
  
Actual results:
System hangs

Expected results:
System boots.

Additional info:
From the affected system:
dmidecode | grep -i product
	Product Name: MacBookAir1,1
	Product Name: Mac-F42C8CC8

The system boots using grub2-efi.
Comment 1 Josh Boyer 2011-10-24 13:44:59 EDT
Could you install kernel-debug and see if it produces some more information?
Comment 2 Jason Montleon 2011-10-24 14:59:21 EDT
I installed kernel-debug, but it does not give me any more information. It stops at the same point in the same way, even with rhgb and quiet removed from the kernel line. If I add nomodeset it switches to a blank screen instead of pausing after the message 'Loading initial ramdisk ...', but it still does not give me any output.
Comment 3 neige 2011-10-29 07:00:13 EDT
Same problem using 3.1.0-5.fc16.x86_64: EFI boot from grub2-efi hangs.

Booting with grub2 in BIOS mode works.

I've also dropped x86-efi-Calling-__pa-with-an-ioremap-address-is-invalid.patch
and rebuilt the kernel; that kernel works.

Adding 'set debug=all' to grub2-efi.cfg shows lines such as
mmap/efi/mmap.c:65: EFI memory region 0xffed8000-0xfff00000: 11
and then it hangs.
The expected result is that the kernel loads and the plymouth splash starts.

Product Name: MacBookPro8,1
Comment 4 Jason Montleon 2011-10-29 12:33:31 EDT
OK, I get similar output now using 'set debug=all'; in my case it ends with:

mmap/efi/mmap.c:65 EFI memory region 0xfffa0000-0xfffd0000: 11 

then hangs.
Comment 5 Josh Boyer 2011-10-31 14:30:55 EDT
Thanks for the continued debug.  I've added Matt to CC, so hopefully we can get something going here.
Comment 6 Matt Fleming 2011-10-31 17:03:20 EDT
Created attachment 531039
fix the check for mapping nocache'd
Comment 7 Matt Fleming 2011-10-31 17:06:38 EDT
Could someone please revert x86-efi-Calling-__pa-with-an-ioremap-address-is-invalid.patch and apply the patch attached to this bug? I suspect that these MacBooks have an EFI memory descriptor for an I/O memory-mapped region with EFI_MEMORY_WB set in its attribute.
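The suspicion above can be made concrete by decoding a descriptor's type and attribute fields. The following is a minimal, hypothetical helper: the struct and function names are invented for illustration and are not kernel API. The numeric constants are the memory-type and attribute values from the UEFI specification (EfiRuntimeServicesData = 6, EfiMemoryMappedIO = 11, EFI_MEMORY_WB = 0x8, EFI_MEMORY_RUNTIME = bit 63). It also assumes, without confirmation from this report, that the trailing 11 in grub's "EFI memory region ...: 11" debug lines is the descriptor type.

```c
#include <stdbool.h>
#include <stdint.h>

/* Memory type and attribute values from the UEFI specification. */
#define EFI_RUNTIME_SERVICES_DATA 6u
#define EFI_MEMORY_MAPPED_IO      11u

#define EFI_MEMORY_UC      0x0000000000000001ULL
#define EFI_MEMORY_WC      0x0000000000000002ULL
#define EFI_MEMORY_WT      0x0000000000000004ULL
#define EFI_MEMORY_WB      0x0000000000000008ULL
#define EFI_MEMORY_RUNTIME 0x8000000000000000ULL

/* Hypothetical, trimmed-down descriptor: only the fields used here. */
struct efi_desc {
    uint32_t type;
    uint64_t phys_addr;
    uint64_t num_pages;   /* in 4 KiB EFI pages */
    uint64_t attribute;
};

/* The case suspected above: an MMIO region that also advertises
 * write-back (cacheable) access. */
static bool mmio_claims_wb(const struct efi_desc *md)
{
    return md->type == EFI_MEMORY_MAPPED_IO &&
           (md->attribute & EFI_MEMORY_WB) != 0;
}

/* Regions with EFI_MEMORY_RUNTIME must still be mapped by the OS
 * after boot services have exited. */
static bool needs_runtime_mapping(const struct efi_desc *md)
{
    return (md->attribute & EFI_MEMORY_RUNTIME) != 0;
}
```

For example, an attribute of 0x800000000000000f, as on mem190 in the dmesg excerpt quoted in comment 16, decodes to UC | WC | WT | WB plus EFI_MEMORY_RUNTIME.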
Comment 8 Josh Boyer 2011-10-31 20:09:44 EDT
(In reply to comment #7)
> Could someone please revert
> x86-efi-Calling-__pa-with-an-ioremap-address-is-invalid.patch and apply the
> patch attached to this bug? I'm suspecting that these macbooks have an EFI
> memory descriptor for a I/O memory mapped region with EFI_MEMORY_WB set in its
> attribute.

I'll get this done tonight and post a link to some scratch builds.
Comment 9 Josh Boyer 2011-10-31 20:30:16 EDT
http://koji.fedoraproject.org/koji/taskinfo?taskID=3476930

The above scratch build contains the patch in comment #6, replacing x86-efi-Calling-__pa-with-an-ioremap-address-is-invalid.patch.  When it is finished building, please download and test and let us know how it works.
Comment 10 neige 2011-10-31 23:21:39 EDT
Tested the x86_64 RPM; it still does not work on my MacBook Pro. Same problem as above.
Comment 11 Jason Montleon 2011-10-31 23:30:26 EDT
I also just tried and had the same result.
Comment 12 Matt Fleming 2011-11-01 09:13:18 EDT
Created attachment 531126
don't use ioremap_cache() for the cached case

Maybe we can't just ioremap_cache() any memory we want. Revert to the old init_memory_mapping() method.
Comment 13 Matt Fleming 2011-11-01 09:16:57 EDT
How about the patch in comment #12? It would be really useful if someone could attach the dmesg output from a known good kernel. Also, if anybody can get to an EFI shell the output of the 'memmap' command might help shed some light on this bug.
Comment 14 Jason Montleon 2011-11-01 09:36:34 EDT
Created attachment 531130
dmesg from working 3.1.0-5.fc16.x86_64 without x86-efi-Calling-__pa-with-an-ioremap-address-is-invalid.patch
Comment 15 Jason Montleon 2011-11-01 10:26:00 EDT
Created attachment 531140
memmap output from efi shell
Comment 16 Matt Fleming 2011-11-01 11:00:40 EDT
I think I see what's happening now. The key part of the changelog for x86-efi-Calling-__pa-with-an-ioremap-address-is-invalid.patch read,

      "The intention of init_memory_mapping() usage is to make EFI virtual
      address unchanged after kexec.  But in fact, init_memory_mapping()
      can not handle some memory range, so ioremap_xxx() is introduced as
      a fix.  Now we decide to use ioremap_xxx() anyway and use some other
      scheme for kexec support, so init_memory_mapping() here is
      unnecessary.  IMHO, init_memory_mapping() is not as good as
      ioremap_xxx() here."

but that's bogus - init_memory_mapping() is not unnecessary, it plays a vital role when an EFI_RUNTIME_SERVICES_DATA region appears after the last E820_RAM region, which is what is happening in this case.
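The condition described above, a runtime-services region lying beyond the last E820_RAM region, can be checked mechanically from the EFI memory map. This is a hypothetical sketch, not kernel code. It assumes that the loader, boot-services, and conventional-memory UEFI types end up as E820_RAM while all other types, including EfiRuntimeServicesData (type 6), end up reserved, which is consistent with the statement in this comment that runtime data regions are marked E820_RESERVED.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EFI_PAGE_SIZE      4096ULL
#define EFI_MEMORY_RUNTIME 0x8000000000000000ULL

/* Hypothetical trimmed descriptor: only the fields this sketch needs. */
struct efi_desc {
    uint32_t type;
    uint64_t phys_addr;
    uint64_t num_pages;
    uint64_t attribute;
};

/* Assumption: these UEFI types become E820_RAM; everything else
 * (including EfiRuntimeServicesData, type 6) becomes reserved. */
static bool becomes_e820_ram(uint32_t type)
{
    switch (type) {
    case 1: /* EfiLoaderCode */
    case 2: /* EfiLoaderData */
    case 3: /* EfiBootServicesCode */
    case 4: /* EfiBootServicesData */
    case 7: /* EfiConventionalMemory */
        return true;
    default:
        return false;
    }
}

static uint64_t desc_end(const struct efi_desc *md)
{
    return md->phys_addr + md->num_pages * EFI_PAGE_SIZE;
}

/* End address of the highest region that would be treated as RAM. */
static uint64_t last_ram_end(const struct efi_desc *map, size_t n)
{
    uint64_t end = 0;
    for (size_t i = 0; i < n; i++)
        if (becomes_e820_ram(map[i].type) && desc_end(&map[i]) > end)
            end = desc_end(&map[i]);
    return end;
}

/* Count runtime regions that extend past the last RAM region, i.e. the
 * regions the initial direct mapping would not cover. */
static size_t runtime_regions_past_ram(const struct efi_desc *map, size_t n)
{
    uint64_t ram_end = last_ram_end(map, n);
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if ((map[i].attribute & EFI_MEMORY_RUNTIME) &&
            desc_end(&map[i]) > ram_end)
            count++;
    return count;
}
```

Given only the mem188 to mem190 descriptors from the dmesg excerpt in this comment, last_ram_end() returns 0x7eef9000 and runtime_regions_past_ram() reports one region (mem190).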

Looking at the dmesg output, these regions are where the problem occurs:

EFI: mem188: type=2, attr=0xf, range=[0x000000007eef1000-0x000000007eef9000) (0MB)
EFI: mem189: type=0, attr=0xf, range=[0x000000007eef9000-0x000000007eeff000) (0MB)
EFI: mem190: type=6, attr=0x800000000000000f, range=[0x000000007eeff000-0x000000007ef00000) (0MB)

The initial init_memory_mapping() only maps up to 0x7eef9000 (mem188 being the last E820_RAM region, because EFI_RUNTIME_SERVICES_DATA regions are marked as E820_RESERVED):

[    0.000000] init_memory_mapping: 0000000000000000-000000007eef9000
[    0.000000]  0000000000 - 007ee00000 page 2M
[    0.000000]  007ee00000 - 007eef9000 page 4k

Without the patch applied, efi_ioremap() will call init_memory_mapping() for memory descriptor 190 above and extend the direct kernel mapping to include the EFI_RUNTIME_SERVICES_DATA region. With the patch applied, this no longer happens; instead, we try to ioremap_cache() the region, and I don't think mapping that region into vmalloc space will work properly.
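The behavioural difference described above can be summarised as a three-way decision per region. This is a hypothetical model only: the enum and function names are invented, and the real kernel's checks in efi_ioremap() and its caller are not reproduced here. direct_map_end stands for the end of the range covered by the initial init_memory_mapping() call.

```c
#include <stdbool.h>
#include <stdint.h>

#define EFI_PAGE_SIZE 4096ULL

/* Hypothetical labels for the three outcomes described above. */
enum map_path {
    MAP_DIRECT_ALREADY,  /* region already inside the direct mapping */
    MAP_EXTEND_DIRECT,   /* unpatched: init_memory_mapping() extends it */
    MAP_IOREMAP_CACHE    /* patched: ioremap_cache() into vmalloc space */
};

/* phys/num_pages describe one EFI region; direct_map_end is the end of
 * the initial direct mapping; patched says whether
 * x86-efi-Calling-__pa-with-an-ioremap-address-is-invalid.patch is applied. */
static enum map_path choose_path(uint64_t phys, uint64_t num_pages,
                                 uint64_t direct_map_end, bool patched)
{
    uint64_t end = phys + num_pages * EFI_PAGE_SIZE;

    if (end <= direct_map_end)
        return MAP_DIRECT_ALREADY;
    return patched ? MAP_IOREMAP_CACHE : MAP_EXTEND_DIRECT;
}
```

With the values from this report (mem190 at 0x7eeff000, one page, direct mapping ending at 0x7eef9000), the unpatched model extends the direct mapping while the patched model falls through to ioremap_cache(); mem188 (0x7eef1000, eight pages) is already covered either way.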

The debug patch in comment #12 should result in a working machine. If it doesn't then we'll have to investigate further.
Comment 17 Josh Boyer 2011-11-01 11:17:55 EDT
Given the very short runway we have for F16, we've dropped x86-efi-Calling-__pa-with-an-ioremap-address-is-invalid.patch and started a new kernel build.  The intention is to get this into F16 final to avoid a 'can't install' regression.

That build can be found here:
http://koji.fedoraproject.org/koji/taskinfo?taskID=3478121

However, I will create a scratch build with the patch from comment #12 in a bit and put the link here. Once that is started, we'd appreciate it if those having issues could test it as well.
Comment 18 Josh Boyer 2011-11-01 11:19:06 EDT
Marking this as a possible blocker bug for F16, as suggested by Matthew Garrett, given that it would prevent people from installing.
Comment 19 Adam Williamson 2011-11-01 11:27:47 EDT
We don't have time to put a new kernel in. Today is the go/no-go. If we do that, we slip. If someone had told anyone in the release loop about this a week ago, we'd have had a chance.

All reporters appear to be using grub2-efi - or claim to be - which is not supported. Fedora uses grub for EFI installs by default. We also don't support EFI on Macs as their EFI implementation is terrible (though in this particular bug the fault appears to be ours).

Can anyone please confirm or deny whether the issue also occurs if you use grub legacy rather than grub2, i.e., a stock install of F16?



-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers
Comment 20 Matthew Garrett 2011-11-01 11:37:37 EDT
The issue is in the kernel and has nothing to do with using grub 2.
Comment 21 neige 2011-11-01 11:49:31 EDT
The patch in comment #12 works.
Comment 22 Josh Boyer 2011-11-01 11:51:44 EDT
(In reply to comment #21)
> The patch in comment #12 works.

Awesome.  Here's a scratch build with the patch from comment #12 for anyone else.

http://koji.fedoraproject.org/koji/taskinfo?taskID=3478290
Comment 23 Adam Williamson 2011-11-01 12:15:18 EDT
For verification, and to test whether other people are hitting this specific bug, the F16 RC3 netinst ISO is here:

http://dl.fedoraproject.org/pub/alt/stage/16.RC3/Fedora/x86_64/iso/Fedora-16-x86_64-netinst.iso

It can be booted in EFI mode either by writing it to a DVD/CD or by writing it to USB with:

livecd-iso-to-disk --format --reset-mbr --efi Fedora-16-x86_64-netinst.iso /dev/foo

I'll build a boot.iso with the new kernel build and link it here for comparison.
Comment 24 Dennis Gilmore 2011-11-01 12:45:47 EDT
I'd like to see this tested on the same hardware in both BIOS and EFI modes.

If this is only triggered in EFI mode and can be worked around successfully using BIOS mode, it's not a release blocker, especially considering it seems to be on hardware where we do not support EFI mode.
Comment 25 Matthew Garrett 2011-11-01 12:57:32 EDT
It'll obviously work in BIOS mode, but that's not an acceptable workaround. We have no idea how many machines will have this (entirely legitimate) memory layout.
Comment 26 neige 2011-11-01 13:22:31 EDT
BIOS mode works, but the Mac's BIOS mode leaves the SATA controller in IDE mode. On the MacBook Pro with the Intel 6 Series chipset (this happens when using the two SATA 3 ports), there's another bug: the CD-ROM drive is disabled at boot. Here's the dmesg:
[   10.790795] ata2.01: failed to resume link (SControl 0)
[   10.941871] ata2.00: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[   10.941899] ata2.01: SATA link down (SStatus 0 SControl 0)
[   10.941923] ata2.01: link offline, clearing class 3 to NONE
[   10.950890] ata2.00: configured for UDMA/100
[   15.949287] ata2.00: qc timeout (cmd 0xa0)
[   15.949299] ata2.00: TEST_UNIT_READY failed (err_mask=0x4)
[   15.949310] ata2.00: limiting SATA link speed to 1.5 Gbps
[   15.949317] ata2.00: limiting speed to UDMA/100:PIO3
[   17.260786] ata2.01: failed to resume link (SControl 0)
[   17.411798] ata2.00: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[   17.411826] ata2.01: SATA link down (SStatus 0 SControl 0)
[   17.411850] ata2.01: link offline, clearing class 3 to NONE
[   17.420840] ata2.00: configured for UDMA/100
[   22.419237] ata2.00: qc timeout (cmd 0xa0)
[   22.419249] ata2.00: TEST_UNIT_READY failed (err_mask=0x4)
[   22.419255] ata2.00: disabled
[   22.419295] ata2.00: hard resetting link
[   22.723977] ata2.01: hard resetting link
[   23.730735] ata2.01: failed to resume link (SControl 0)
[   23.881715] ata2.00: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[   23.881743] ata2.01: SATA link down (SStatus 0 SControl 0)
[   23.881766] ata2.01: link offline, clearing class 3 to NONE
[   23.881781] ata2: EH complete

So EFI boot might be the only 'easy' way to install Fedora from DVD.

In fact, the two most recent kernels in F15 updates-testing also carry the EFI patch and don't work on my machine, but no one has reported regressions. So currently it seems only Macs are affected.
Comment 27 Josh Boyer 2011-11-01 13:37:21 EDT
(In reply to comment #26)
> In fact, recent two kernels in f15 updates-testing also with the EFI patch and
> doesn't work on my machine ,but no one reported there are regressions. So
> currently, seems only mac be affected.

We'll be dropping the patch in F15 as well.  I'm going to take the updates-testing kernel down and put a fixed one back up.
Comment 28 Jason Montleon 2011-11-01 13:54:45 EDT
http://dl.fedoraproject.org/pub/alt/stage/16.RC3/Fedora/x86_64/iso/Fedora-16-x86_64-netinst.iso fails in the same way at:
mmap/efi/mmap.c:65 EFI memory region 0xfffa0000-0xfffd0000: 11
Comment 29 Adam Williamson 2011-11-01 15:06:40 EDT
Discussed at 2011-11-01 emergency blocker review meeting. Accepted as a blocker per criterion "The installer must boot and run on systems using EFI other than Apple Macs" (we're fairly sure this must affect some systems other than Macs as well). RC4 will have the updated kernel, I'll also try to provide a boot.iso with the updated kernel for confirmation soon.
Comment 30 Jason Montleon 2011-11-01 15:54:36 EDT
The scratch build containing the patch from comment 12 works for me as well.
Comment 31 Fedora Update System 2011-11-01 15:58:11 EDT
kernel-2.6.40.8-4.fc15 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/kernel-2.6.40.8-4.fc15
Comment 32 satellitgo 2011-11-01 18:09:39 EDT
http://dl.fedoraproject.org/pub/alt/stage/16.RC3/Spins/x86_64/Fedora-16-x86_64-Live-SoaS.iso
(01-Nov-2011 03:20, 442M)
still fails to boot from USB on a MacBook Pro i7.

4GB USB stick formatted with the F16 disk utility: MS-DOS partition table on /dev/sdb, FAT on /dev/sdb1.
I tried /dev/sdb1 as a DOS partition,
then /dev/sdb as a GPT partition (same USB stick).
Both times it failed to boot past a blue screen with the cursor in the top-left corner.
Comment 33 Adam Williamson 2011-11-01 18:11:46 EDT
This boot.iso should have the fix for this bug:

http://tflink.fedorapeople.org/iso/20111101_preRc4.x64.boot.iso

Can affected people please grab it and test? Remember, to write to USB, use 'livecd-iso-to-disk --format --reset-mbr --efi'. Thanks!
Comment 34 satellitgo 2011-11-01 18:41:28 EDT
http://tflink.fedorapeople.org/iso/20111101_preRc4.x64.boot.iso

Boots fine from USB in EFI mode on a MacBook Pro i7 and starts anaconda 16.24. I did not test further.
Comment 35 Jason Montleon 2011-11-01 18:43:53 EDT
The kernel from http://tflink.fedorapeople.org/iso/20111101_preRc4.x64.boot.iso works for me.
Comment 36 Fedora Update System 2011-11-01 18:53:21 EDT
kernel-3.1.0-7.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/kernel-3.1.0-7.fc16
Comment 38 Fedora Update System 2011-11-02 02:54:38 EDT
Package kernel-2.6.40.8-4.fc15:
* should fix your issue,
* was pushed to the Fedora 15 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-2.6.40.8-4.fc15'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2011-15230
then log in and leave karma (feedback).
Comment 39 Fedora Update System 2011-11-02 14:19:36 EDT
kernel-3.1.0-7.fc16 has been pushed to the Fedora 16 stable repository.  If problems still persist, please make note of it in this bug report.
Comment 40 Fedora Update System 2011-11-11 19:07:06 EST
kernel-2.6.41.1-1.fc15 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/kernel-2.6.41.1-1.fc15
Comment 41 Fedora Update System 2011-11-17 18:27:50 EST
kernel-2.6.41.1-1.fc15 has been pushed to the Fedora 15 stable repository.  If problems still persist, please make note of it in this bug report.