Bug 698688

Summary: system freeze during boot with IOMMU enabled
Product: [Fedora] Fedora Reporter: Robin Axelsson <gu99roax>
Component: kernel Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED NOTABUG QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 15 CC: gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-08-30 10:48:41 UTC Type: ---

Description Robin Axelsson 2011-04-21 14:31:27 UTC
Description of problem:
Since I updated my BIOS firmware to a revision that supports AMD IOMMU and turned that feature on, my Fedora 14 install freezes during bootup. About half a second after I pass the GRUB bootloader, all I see is a blinking cursor in the upper left corner. The mouse gets disabled (i.e. powered off in spite of being plugged in) and the keyboard doesn't even respond to caps-lock, num-lock and scroll-lock toggles.

This also happens with every x86_64 LiveCD I have tested so far: lubuntu 10.4, ubuntu 10.10, Fedora 14, Fedora 15 beta, FreeBSD 8 (not Linux, but still affected) and the latest Debian Live.

I used a liveUSB stick for most of these attempts. Its LED shuts off in the middle of the boot process when the freeze happens.

I don't have this problem with Windows 7 64-bit, XenServer (both the install CD with Linux tools and an installed partition) or FreeDOS/MS-DOS-based boot CDs.

Version-Release number of selected component (if applicable):
2.6.32.X and 2.6.35.X ...

How reproducible:
On my system, very. The failure rate is 100%.

Steps to Reproduce:
1. Use a computer with an MSI 890FXA-GD70 motherboard, BIOS version 1.9 with IOMMU enabled in the Advanced BIOS feature menu, an XFX ATI5450 GPU, an Intel Server NIC (Copper), a DVD drive and two large hard drives.
2. Insert a Linux liveCD or liveUSB
3. Boot
  
Actual results:
Immediate freeze with no diagnostic output; all peripherals except the display appear to be disconnected.

Expected results:
Successful boot.

Additional info:
I also get this problem with the Fedora 15 beta i686 liveCD/USB, but in this case the screen displays error messages:

[    0.242337]  [<c07cdf2c>] ? panic+0x5x/0x156
[    0.242467]  [<c043d898>] ? do_exit+0x66/0x61c
[    0.242597]  [<c07d57d3>] ? _raw_spin_unlock_irqrestore+0x13/0x15
[    0.242730]  [<c043beda>] ? kmsg_dump+0x3a/0xb8
[    0.242863]  [<c07d6c8b>] ? oops_end+0xa2/0xa8
[    0.242976]  [<c07cd9ec>] ? no_context+0x128/0x130
[    0.243106]  [<c07cdb0e>] ? __bad_area_nosemaphore+0x11a/0x122
[    0.243239]  [<c07cdb2d>] ? bad_area_nosemaphore+0x17/0x19
[    0.243370]  [<c07d8624>] ? do_page_fault+0x159/0x30c
[    0.243500]  [<c07d57d3>] ? _raw_spin_unlock_irqrestore+0x13/0x15
[    0.243634]  [<c0404ed7>] ? do_softirq+0x8c/0x92
[    0.243766]  [<c043fe77>] ? irq_exit+0x4c/0x70
[    0.243896]  [<c0404bff>] ? do_IRQ+0x7e/0x92
[    0.243975]  [<c07d84cb>] ? do_page_fault+0x0/0x30c
[    0.244108]  [<c07d62bf>] ? error_code+0x67/0x6c
[    0.244241]  [<c0a8db0b>] ? pci_pcibios_init+0xe5/0x234
[    0.244373]  [<c0a8e787>] ? __pci_mmcfg_init+0x1b1/0x1ea
[    0.244506]  [<c0a8d9f1>] ? pci_arch_init+0x2e/0x63
[    0.244638]  [<c0a8cb13>] ? dmi_id_init+0x266/0x28e
[    0.244771]  [<c0401194>] ? do_one_initcall+0x8c/0x140
[    0.244975]  [<c0a8d9c3>] ? pci_arch_init+0x0/0x63
[    0.245106]  [<c0a57a08>] ? kernel_init+0x1ec/0x278
[    0.245238]  [<c0a5781c>] ? kernel_init+0x0/0x278
[    0.245369]  [<c040377e>] ? kernel_thread_helper+0x6/0x10

These are the last error message lines shown on the screen before the crash. The mouse and the USB stick are not "powered off" as in the x86_64 cases, but the keyboard is unresponsive as in the prior cases.
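
For anyone retyping a trace like the one above from a screen, a small script can split each line into its parts so that symbols and offsets are easier to compare. The following is only a minimal sketch in Python: the names `FRAME_RE` and `parse_frames` are made up for illustration, and the pattern assumes the line layout shown in this report (timestamp, bracketed address, optional "?" marker, then symbol+offset/size).

```python
import re

# Matches one stack-trace line in the layout shown in this report, e.g.
# "[    0.244241]  [<c0a8db0b>] ? pci_pcibios_init+0xe5/0x234"
# (FRAME_RE and parse_frames are hypothetical names for this sketch.)
FRAME_RE = re.compile(
    r"\[\s*(?P<time>[\d.]+)\]\s+"      # timestamp in seconds since boot
    r"\[<(?P<address>[0-9a-f]+)>\]\s+"  # address printed for the frame
    r"(?P<unreliable>\?\s+)?"           # "?" marks a frame the kernel is unsure about
    r"(?P<symbol>\w+)\+0x(?P<offset>\w+)/0x(?P<size>[0-9a-f]+)"
)

# Three lines copied from the trace in the description.
SAMPLE = """\
[    0.244108]  [<c07d62bf>] ? error_code+0x67/0x6c
[    0.244241]  [<c0a8db0b>] ? pci_pcibios_init+0xe5/0x234
[    0.244373]  [<c0a8e787>] ? __pci_mmcfg_init+0x1b1/0x1ea
"""


def parse_frames(log_text):
    """Return one dict per line that looks like a stack frame; skip the rest."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match is None:
            continue
        frames.append(
            {
                "time": match.group("time"),
                "address": match.group("address"),
                "unreliable": bool(match.group("unreliable")),
                "symbol": match.group("symbol"),
                # Offsets are kept as strings because a hand-copied value
                # such as "5x" would not parse as hexadecimal.
                "offset": match.group("offset"),
                "size": match.group("size"),
            }
        )
    return frames
```

Calling `parse_frames(SAMPLE)` yields the three frames in order, which makes it easy to list just the symbol names from a longer capture.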

Comment 1 Chuck Ebbert 2011-04-27 11:15:31 UTC
(In reply to comment #1)
> Additional info:
> I also get this problem with the Fedora 15 beta i686 liveCD/USB, but in this
> case the screen displays error messages:
> 

Boot with kernel option "vga=1" for 50-line mode and get the entire report.

Comment 2 Robin Axelsson 2011-04-27 16:05:27 UTC
I don't think 50 lines is enough to fit the entire report (by the way, isn't there an even higher resolution available, or a mode with a smaller font?), but here is a longer message with that mode enabled:

[    0.24????]  5a111002 00000001 5f32335f 00000001 00000040 c0a53740 f34a9f94 c0a8d9f1
[    0.24????]  c0a8cb13 c0ad6ab0 f34a9fc4 c0401194 00000000 82bb2757 000008ff 00000040
[    0.24????] Call Trace:
[    0.24????]  [<c0??????>] ? pci_pcibios_init+0xe5/0x234
[    0.24????]  [<c0??????>] ? __pci_mmcfg_init+0x1b1/0x1ea
[    0.24????]  [<c0??????>] pci_arch_init+0x2e/0x63
[    0.24????]  [<c0??????>] ? dmi_id_init+0x266/0x28e
[    0.24????]  [<c0??????>] do_one_initcall+0x8c/0x140
[    0.24????]  [<c0??????>] ? pci_arch_init+0x0/0x63
[    0.24????]  [<c0??????>] kernel_init+0x1ec/0x278
[    0.24????]  [<c0??????>] ? kernel_init+0x0/0x278
[    0.24????]  [<c0??????>] kernel_thread_helper+0x6/0x10
[    0.24????] Code: f8 ff cb 7c 27 ff ff ff ff ff ff ea 53 ff 00 f0 ff ff ff ff ff ff ff ff ff ff ff 5f 33 32 5f 10 00 0f 00 00 01 bd 00 00 00 00 00
[    0.24????] 00 50 43 49 b0 80 75 17 b0 01 0a db 75 11 bb 00 00 0f 00 b9
[    0.24????]  [<c0??????>] EIP [<c00f0010>] 0xc00f0010 SS:ESP 0068:f34a9f4c
[    0.24????]  [<c0??????>] CR2: 000000049435024
[    0.24????]  [<c0??????>] ---[ end trace 44593438a59a9533 ]---
[    0.24????]  [<c0??????>] Kernel panic - not syncing: Attempted to kill init!
[    0.24????]  [<c0??????>] Pid: 1, comm: swapper Tainted: G      D     2.6.38.2-9.fc15.i686 #1
[    0.24????]  Call Trace:
[    0.24????]  [<c0??????>] ? panic+0x5x/0x156
[    0.24????]  [<c0??????>] ? do_exit+0x66/0x61c
[    0.24????]  [<c0??????>] ? _raw_spin_unlock_irqrestore+0x13/0x15
[    0.24????]  [<c0??????>] ? kmsg_dump+0x3a/0xb8
[    0.24????]  [<c0??????>] ? oops_end+0xa2/0xa8
[    0.24????]  [<c0??????>] ? no_context+0x128/0x130
[    0.24????]  [<c0??????>] ? __bad_area_nosemaphore+0x11a/0x122
[    0.24????]  [<c0??????>] ? bad_area_nosemaphore+0x17/0x19
[    0.24????]  [<c0??????>] ? do_page_fault+0x159/0x30c
[    0.24????]  [<c0??????>] ? _raw_spin_unlock_irqrestore+0x13/0x15
[    0.24????]  [<c0??????>] ? do_softirq+0x8c/0x92
[    0.24????]  [<c0??????>] ? irq_exit+0x4c/0x70
[    0.24????]  [<c0??????>] ? do_IRQ+0x7e/0x92
[    0.24????]  [<c0??????>] ? do_page_fault+0x0/0x30c
[    0.24????]  [<c0??????>] ? error_code+0x67/0x6c
[    0.24????]  [<c0??????>] ? pci_pcibios_init+0xe5/0x234
[    0.24????]  [<c0??????>] ? __pci_mmcfg_init+0x1b1/0x1ea
[    0.24????]  [<c0??????>] ? pci_arch_init+0x2e/0x63
[    0.24????]  [<c0??????>] ? dmi_id_init+0x266/0x28e
[    0.24????]  [<c0??????>] ? do_one_initcall+0x8c/0x140
[    0.24????]  [<c0??????>] ? pci_arch_init+0x0/0x63
[    0.24????]  [<c0??????>] ? kernel_init+0x1ec/0x278
[    0.24????]  [<c0??????>] ? kernel_init+0x0/0x278
[    0.24????]  [<c0??????>] ? kernel_thread_helper+0x6/0x10

I have replaced the digits I did not copy down with question marks. I can supply these numbers if needed. They differ on each boot, but at least the last 23 messages are the same every time.
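
Because the timestamps and addresses change from boot to boot while the symbols stay the same, traces from different boots can be compared by masking the varying parts first. The following is only a rough Python sketch of that idea: `normalize` and `common_tail` are hypothetical helper names, and the two sample strings are short excerpts adapted from the traces in this report rather than complete captures.

```python
import re

# Leading "[    0.244506]" or "[    0.24????]" timestamp.
TIMESTAMP_RE = re.compile(r"^\[\s*[\d.?]+\]\s*")
# Bracketed address such as "[<c0a8d9f1>]" or the masked "[<c0??????>]".
ADDRESS_RE = re.compile(r"\[<[0-9a-f?]+>\]\s*")

# Short excerpts adapted from this report; real captures would be longer.
BOOT_A = """\
[    0.244506]  [<c0a8d9f1>] ? pci_arch_init+0x2e/0x63
[    0.244638]  [<c0a8cb13>] ? dmi_id_init+0x266/0x28e
[    0.244771]  [<c0401194>] ? do_one_initcall+0x8c/0x140
"""
BOOT_B = """\
[    0.24????]  [<c0??????>] pci_arch_init+0x2e/0x63
[    0.24????]  [<c0??????>] ? dmi_id_init+0x266/0x28e
[    0.24????]  [<c0??????>] ? do_one_initcall+0x8c/0x140
"""


def normalize(line):
    """Drop the timestamp and address so only the boot-independent text remains."""
    line = TIMESTAMP_RE.sub("", line.strip())
    line = ADDRESS_RE.sub("", line)
    return line.strip()


def common_tail(trace_a, trace_b):
    """Return the normalized lines shared by the end of both traces."""
    a = [normalize(l) for l in trace_a.splitlines() if l.strip()]
    b = [normalize(l) for l in trace_b.splitlines() if l.strip()]
    count = 0
    # Walk backwards from the last line until the two traces disagree.
    for line_a, line_b in zip(reversed(a), reversed(b)):
        if line_a != line_b:
            break
        count += 1
    return a[len(a) - count:]
```

With the two excerpts above, `common_tail(BOOT_A, BOOT_B)` stops at the first line, where one copy carries the "?" marker and the other does not, and returns the two trailing frames that match.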

Comment 3 Josh Boyer 2011-08-29 15:04:10 UTC
Does this still happen with F15 final, or the F16 livecds?

Comment 4 Robin Axelsson 2011-08-30 10:40:05 UTC
These problems disappeared altogether after I RMAed the motherboard and picked up another one. The motherboard I had problems with was the MSI 890FXA-GD70. I switched to the Gigabyte GA-990FXA-UD7 and it now works fine. Since I no longer have the old motherboard in my possession, I'm unable to reproduce these bugs.

Comment 5 Josh Boyer 2011-08-30 10:48:41 UTC
Thanks for letting us know.

Comment 6 Robin Axelsson 2011-08-30 11:43:04 UTC
You're welcome. The reason I RMAed the motherboard was not related to the IOMMU feature. The onboard audio chip was defective. Not that I cared about the audio, but at least it gave me a valid reason to get rid of the board.