Bug 671593

Summary: Boot panic on new i7-2720QM HP DV7 model XN093AV
Product: [Fedora] Fedora
Reporter: Randy <schusr>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED WONTFIX
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 14
CC: gansalmon, itamar, jfeeney, jonathan, kernel-maint, madhu.chinakonda, schusr
Target Milestone: ---
Keywords: Reopened
Target Release: ---
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2012-08-16 21:27:50 UTC
Attachments:
  Description                                            Flags
  screen shot of kernel panic                            none
  iommu=soft                                             none
  acpi=ht irqpool panic screen shot                      none
  acpi=off system boots but WIFI adapter does not work   none

Description Randy 2011-01-21 23:57:56 UTC
Created attachment 474710 [details]
screen shot of kernel panic

Both the install media and an updated Fedora 14 installation panic at boot when no additional kernel parameters are given.

See the attached screen shot.

If noapic is used, the system boots.

All other apic options seem to fail.

Comment 1 Randy 2011-01-22 03:25:05 UTC
Similar problem with the latest rawhide desktop image, kernel 2.6.37-2.fc15.x86_64.

Unfortunately, noapic causes the WIFI adapter to stop working, so I will probably have to use Windows on this box :( and might actually have to return it.

Comment 2 Randy 2011-01-22 03:27:14 UTC
noapic also makes the system show only 4 CPUs instead of 8, since HT is not operational. Not really workable.

Comment 3 Chuck Ebbert 2011-01-22 17:40:47 UTC
Try adding "iommu=soft" to the boot options.

Comment 4 Chuck Ebbert 2011-01-22 19:15:32 UTC
The panic message is:

  Kernel panic - not syncing: No mapping iommu for ioapic 0

From setup_ioapic_entry() in arch/x86/kernel/apic/io_apic.c

Comment 5 Randy 2011-01-23 10:30:00 UTC
Created attachment 474803 [details]
iommu=soft

iommu=soft kernel panic screen shot

Comment 6 Randy 2011-01-23 10:41:36 UTC
Created attachment 474804 [details]
acpi=ht irqpool panic screen shot

acpi=ht irqpool panic screen shot

Comment 7 Randy 2011-01-23 10:52:25 UTC
Created attachment 474807 [details]
acpi=off system boots but WIFI adapter does not work

This should maybe be a separate bug. I am not sure whether the message relates to the WIFI failure or is caused by acpi=off. Same problem with irqpool.

Comment 8 Randy 2011-01-23 11:08:39 UTC
noapic iommu=soft fixes all of the problems in this bug report.

The system boots, WIFI works, and HT works.

There is a new problem, but it's pretty minor: after a reboot, the HP BIOS complains that it thinks the system restarted because of a temperature problem. Let me know if you want a screen shot of that.
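
To confirm which of the parameters tried in this report a running kernel was actually booted with, the command line can be read back from /proc/cmdline. The following is only a minimal sketch: the function name and the parameter list are illustrative assumptions, not part of any Fedora tool.

```python
from pathlib import Path

# Parameters tried in this report; the list is illustrative, not exhaustive.
WORKAROUND_PARAMS = ("noapic", "iommu=soft", "acpi=off", "acpi=ht")


def active_workarounds(cmdline: str) -> list[str]:
    """Return the listed parameters that appear on a kernel command line."""
    tokens = cmdline.split()
    return [param for param in WORKAROUND_PARAMS if param in tokens]


if __name__ == "__main__":
    # /proc/cmdline holds the parameters the running kernel was booted with.
    path = Path("/proc/cmdline")
    if path.exists():
        print(active_workarounds(path.read_text()))
```

Matching whole whitespace-separated tokens avoids false hits on parameters that merely contain one of these strings.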

Comment 9 Chuck Ebbert 2011-01-24 03:52:51 UTC
This is still a bug. Workarounds should not be necessary.

Comment 10 Randy 2011-01-24 19:37:24 UTC
Okay, everything works now except one of the USB controllers, which I think the fingerprint reader is connected to. I was going to create another bug report, but then realised that the non-working USB hub may be related to using noapic. Let me know if I should create another bug report for that, put details about it here, or ignore it for now.

Comment 11 Randy 2011-01-25 01:44:11 UTC
I have been wrestling with the system shutting itself down, claiming to have overheated.

After changing the BIOS "Fan Always On" setting to Disabled, the system worked fine all day, even under load, only to spontaneously reboot later while doing almost nothing :(

Not sure if it's a problem with this make/model, this individual system, or the operating conditions (meaning kernel options).

If it continues, I'll have to return the system.

Comment 12 Chuck Ebbert 2011-01-25 03:00:47 UTC
You should be able to turn off the iommu in the BIOS setup screens. (I assume you checked for an updated BIOS, as this does look like a bug in there.) Look for something like "advanced virtualization" or "VT-d" in the options. Or you can just disable virtualization completely if you're not using that.
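
One way to check whether the firmware still advertises an IOMMU after changing the BIOS setting is to look for the ACPI DMAR table, which describes Intel VT-d remapping hardware. The sketch below assumes the kernel exposes ACPI tables under /sys/firmware/acpi/tables, which may not hold on every kernel or configuration; the function name is hypothetical.

```python
from pathlib import Path


def firmware_exposes_dmar(tables_dir: str = "/sys/firmware/acpi/tables") -> bool:
    """Return True if a DMAR table file is present in the given directory.

    Assumption: the kernel lists ACPI tables as files in tables_dir.
    """
    return (Path(tables_dir) / "DMAR").exists()


if __name__ == "__main__":
    print("DMAR table present:", firmware_exposes_dmar())
```

The directory is a parameter so the check can be pointed at a copy of the tables taken from another machine.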

Comment 13 Randy 2011-01-25 14:51:31 UTC
Ah, I do use virtualization a lot and did have to enable that in the BIOS.

I did check for an updated BIOS, but this is a brand new system (and a brand new model), so the BIOS is up to date.

The system worked fine overnight with 4 guest OSes, email, IM, and automated backup jobs using 7zip that take hours...

Comment 14 Randy 2011-01-25 17:36:41 UTC
Also sensors-detect didn't find anything.

Comment 15 Randy 2011-01-25 23:54:38 UTC
Output of /proc/interrupts (columns CPU1-CPU7 deleted):

[rschuster@ajax ~]$ cat /proc/interrupts
           CPU0
  0:        356    XT-PIC-XT        timer
  1:      58766    XT-PIC-XT        i8042
  2:          0    XT-PIC-XT        cascade
  8:          1    XT-PIC-XT        rtc0
  9:     323654    XT-PIC-XT        acpi
 10:          0    XT-PIC-XT        ehci_hcd:usb2
 11:    1019180    XT-PIC-XT        ehci_hcd:usb1
 12:    4228679    XT-PIC-XT        i8042
 43:    2373437    PCI-MSI-edge      ahci
 44:          0    PCI-MSI-edge      eth0
 45:    1803419    PCI-MSI-edge      iwlagn
 46:      35932    PCI-MSI-edge      hda_intel
 47:         68    PCI-MSI-edge      hda_intel
 48:      99255    PCI-MSI-edge      fglrx[0]@PCI:1:0:0
NMI:          0    Non-maskable interrupts
LOC:   81201965    Local timer interrupts
SPU:          0    Spurious interrupts
PMI:          0    Performance monitoring interrupts
PND:          0    Performance pending work
RES:    3173230    Rescheduling interrupts
CAL:    2316910    Function call interrupts
TLB:     144772    TLB shootdowns
TRM:          0    Thermal event interrupts
THR:          0    Threshold APIC interrupts
MCE:          0    Machine check exceptions
MCP:        234    Machine check polls
ERR:          0
MIS:          0
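
In the listing above, the legacy IRQs (including both ehci_hcd controllers) are handled by XT-PIC, as would be expected when booting with noapic, while the MSI-capable devices are not. A minimal sketch for grouping numbered IRQs by controller follows; it assumes the layout shown here, where the controller name is a single token right after the per-CPU counts, and the function name is hypothetical.

```python
from collections import defaultdict


def irqs_by_controller(text: str) -> dict[str, list[int]]:
    """Group numbered IRQ rows of /proc/interrupts by interrupt controller."""
    lines = text.strip("\n").splitlines()
    ncpu = len(lines[0].split())      # header row: one name per CPU column
    groups: dict[str, list[int]] = defaultdict(list)
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        label = label.strip()
        if not label.isdigit():       # skip NMI, LOC, ERR and similar rows
            continue
        fields = rest.split()
        if len(fields) <= ncpu:       # row has counts but no controller name
            continue
        groups[fields[ncpu]].append(int(label))
    return dict(groups)


sample = """\
           CPU0
 10:          0    XT-PIC-XT        ehci_hcd:usb2
 11:    1019180    XT-PIC-XT        ehci_hcd:usb1
 45:    1803419    PCI-MSI-edge      iwlagn
NMI:          0    Non-maskable interrupts
"""
print(irqs_by_controller(sample))
```

The row is split at the first colon only, so device names such as ehci_hcd:usb1 stay intact.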

Comment 16 Randy 2011-01-25 23:55:48 UTC
dmesg output 


[    8.195910] ehci_hcd 0000:00:1d.0: Unlink after no-IRQ?  Controller is probably using the wrong IRQ.
[   18.718560] usb 2-1: device not accepting address 2, error -110
[   18.820464] usb 2-1: new high speed USB device using ehci_hcd and address 3
[   34.337736] usb 2-1: device not accepting address 3, error -110
[   34.439639] usb 2-1: new high speed USB device using ehci_hcd and address 4
[   44.850408] usb 2-1: device not accepting address 4, error -110
[   44.952317] usb 2-1: new high speed USB device using ehci_hcd and address 5
[   55.363091] usb 2-1: device not accepting address 5, error -110
[   55.363120] hub 2-0:1.0: unable to enumerate USB device on port 1
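
The repeated "device not accepting address" lines above all carry error -110, which on Linux corresponds to ETIMEDOUT. A small sketch for pulling these failures out of a saved dmesg log is shown below; the regular expression is written against the line format in this comment only, and the names are hypothetical.

```python
import re

# Matches lines such as:
#   [   18.718560] usb 2-1: device not accepting address 2, error -110
ADDR_FAIL = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+usb (?P<port>[\d.-]+): "
    r"device not accepting address (?P<addr>\d+), error (?P<err>-?\d+)"
)


def address_failures(dmesg: str) -> list[tuple[float, str, int, int]]:
    """Return (timestamp, port, address, error) for each failed address set."""
    return [
        (float(m["ts"]), m["port"], int(m["addr"]), int(m["err"]))
        for m in ADDR_FAIL.finditer(dmesg)
    ]


if __name__ == "__main__":
    log = (
        "[   18.718560] usb 2-1: device not accepting address 2, error -110\n"
        "[   18.820464] usb 2-1: new high speed USB device using ehci_hcd and address 3\n"
    )
    for failure in address_failures(log):
        print(failure)
```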

Comment 17 Randy 2011-01-30 08:58:24 UTC
The system also has these errors in dmesg. They may be related to the system spontaneously rebooting when idle and later claiming it had to reboot because of overheating.

[  131.561077] ACPI Error: Field [D128] at 1040 exceeds Buffer [NULL] size 160 (bits) (20100428/dsopcode-597)
[  131.561087] ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.WMID.HWMC] (Node ffff880246461680), AE_AML_BUFFER_LIMIT
[  131.561139] ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.WMID.WMAD] (Node ffff880246461920), AE_AML_BUFFER_LIMIT
[  131.561228] ACPI Error: Field [D128] at 1040 exceeds Buffer [NULL] size 160 (bits) (20100428/dsopcode-597)
[  131.561234] ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.WMID.HWMC] (Node ffff880246461680), AE_AML_BUFFER_LIMIT
[  131.561265] ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.WMID.WMAD] (Node ffff880246461920), AE_AML_BUFFER_LIMIT
[  131.561538] ACPI Error: Field [D128] at 1040 exceeds Buffer [NULL] size 160 (bits) (20100428/dsopcode-597)
[  131.561547] ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.WMID.HWMC] (Node ffff880246461680), AE_AML_BUFFER_LIMIT
[  131.561597] ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.WMID.WMAD] (Node ffff880246461920), AE_AML_BUFFER_LIMIT
[  131.561960] ACPI Error: Field [D128] at 1040 exceeds Buffer [NULL] size 160 (bits) (20100428/dsopcode-597)
[  131.561967] ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.WMID.HWMC] (Node ffff880246461680), AE_AML_BUFFER_LIMIT
[  131.562006] ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.WMID.WMAD] (Node ffff880246461920), AE_AML_BUFFER_LIMIT
[  131.562093] ACPI Error: Field [D128] at 1040 exceeds Buffer [NULL] size 160 (bits) (20100428/dsopcode-597)
[  131.562098] ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.WMID.HWMC] (Node ffff880246461680), AE_AML_BUFFER_LIMIT
[  131.562130] ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.WMID.WMAD] (Node ffff880246461920), AE_AML_BUFFER_LIMIT
[  131.562267] ACPI Error: Field [D128] at 1040 exceeds Buffer [NULL] size 160 (bits) (20100428/dsopcode-597)
[  131.562275] ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.WMID.HWMC] (Node ffff880246461680), AE_AML_BUFFER_LIMIT
[  131.562315] ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.WMID.WMAD] (Node ffff880246461920), AE_AML_BUFFER_LIMIT
[  131.573327] lis3lv02d: probe of HPQ0004:00 failed with error -22
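
Because the same ACPI errors repeat many times, it can help to collapse them to one count per failing method. The sketch below does that for the "Method parse/execution failed" lines in the format shown above; the pattern and function name are assumptions made for illustration.

```python
import re
from collections import Counter

# Matches the method path and ACPI status code in lines such as:
#   ACPI Error (psparse-0537): Method parse/execution failed
#   [\_SB_.WMID.HWMC] (Node ffff880246461680), AE_AML_BUFFER_LIMIT
METHOD_FAIL = re.compile(
    r"Method parse/execution failed \[(?P<method>[^\]]+)\].*?,\s*(?P<status>AE_\w+)"
)


def failed_methods(dmesg: str) -> Counter:
    """Count failures per (method path, status code) pair."""
    return Counter((m["method"], m["status"]) for m in METHOD_FAIL.finditer(dmesg))


if __name__ == "__main__":
    log = (
        r"[  131.561087] ACPI Error (psparse-0537): Method parse/execution failed "
        r"[\_SB_.WMID.HWMC] (Node ffff880246461680), AE_AML_BUFFER_LIMIT" "\n"
        r"[  131.561139] ACPI Error (psparse-0537): Method parse/execution failed "
        r"[\_SB_.WMID.WMAD] (Node ffff880246461920), AE_AML_BUFFER_LIMIT" "\n"
    )
    for (method, status), count in failed_methods(log).items():
        print(method, status, count)
```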

Comment 18 Chuck Ebbert 2011-01-31 03:58:10 UTC
> [  131.561077] ACPI Error: Field [D128] at 1040 exceeds Buffer [NULL] size 160
> (bits) (20100428/dsopcode-597)
> [  131.561087] ACPI Error (psparse-0537): Method parse/execution failed
> [\_SB_.WMID.HWMC] (Node ffff880246461680), AE_AML_BUFFER_LIMIT
> [  131.561139] ACPI Error (psparse-0537): Method parse/execution failed
> [\_SB_.WMID.WMAD] (Node ffff880246461920), AE_AML_BUFFER_LIMIT

Interesting, though I have no idea what the consequences of that are.

Comment 19 Fedora End Of Life 2012-08-16 21:27:53 UTC
This message is a notice that Fedora 14 is now at end of life. Fedora 
has stopped maintaining and issuing updates for Fedora 14. It is 
Fedora's policy to close all bug reports from releases that are no 
longer maintained.  At this time, all open bugs with a Fedora 'version'
of '14' have been closed as WONTFIX.

(Please note: Our normal process is to give advance warning of this 
occurring, but we forgot to do that. A thousand apologies.)

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, feel free to reopen 
this bug and simply change the 'version' to a later Fedora version.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we were unable to fix it before Fedora 14 reached end of life. If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora, you are encouraged to click on 
"Clone This Bug" (top right of this page) and open it against that 
version of Fedora.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping