Bug 593003 - intel VT-d bios option causes kernel chaos
Keywords:
Status: CLOSED DUPLICATE of bug 548198
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 13
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2010-05-17 15:16 UTC by Joshua Roys
Modified: 2010-06-15 20:24 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-06-15 20:24:16 UTC
Type: ---


Attachments
/var/log/messages (157.56 KB, text/plain)
2010-05-17 15:16 UTC, Joshua Roys
lspci (3.49 KB, text/plain)
2010-05-17 15:17 UTC, Joshua Roys
dmidecode (31.00 KB, text/plain)
2010-05-17 15:17 UTC, Joshua Roys
kernel patch for F-12 latest (2.6.32.14-127) (1.56 KB, patch)
2010-06-03 21:22 UTC, Don Zickus
kernel patch, cruft removed (1.25 KB, patch)
2010-06-03 21:24 UTC, Don Zickus

Description Joshua Roys 2010-05-17 15:16:57 UTC
Created attachment 414580 [details]
/var/log/messages

Description of problem:
If the Intel VT-d BIOS option is set, the system will fail to boot properly.  Numerous strange error messages are displayed:
DRHD: handling fault status reg 2
DMAR:[DMA Read] Request device [04:00.0] fault addr ffed9000
DMAR:[fault reason 06] PTE Read access is not set
and:
Uhhuh. NMI received for unknown reason b0 on CPU 0.
You have some hardware problem, likely on the PCI bus.
Dazed and confused, but trying to continue
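
To see which PCI devices the DMAR faults point at, the fault lines can be reduced to a list of unique bus:device.function addresses. This is only a minimal sketch: the function name dmar_fault_devices is made up for illustration, and it assumes the log lines have the same "Request device [..]" layout as the messages quoted above.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and print each PCI
# address (bus:device.function) that appears in a DMAR DMA fault line.
# Assumes the "DMAR:[DMA Read] Request device [04:00.0] ..." layout
# shown in this report.
dmar_fault_devices() {
    sed -n 's/.*DMAR:\[DMA [A-Za-z]*] Request device \[\([0-9a-fA-F:.]*\)].*/\1/p' |
        sort -u
}
```

It could be run as "dmar_fault_devices < /var/log/messages", and each address it prints can then be looked up with "lspci -s <address>".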


Version-Release number of selected component (if applicable):
Fedora 13 RC3
2.6.33.3-85.fc13.i686


How reproducible:
boot on this platform with VT-d enabled

  
Actual results:
system fails to boot up and hangs around udev


Expected results:
system boots

Comment 1 Joshua Roys 2010-05-17 15:17:30 UTC
Created attachment 414581 [details]
lspci

Comment 2 Joshua Roys 2010-05-17 15:17:48 UTC
Created attachment 414583 [details]
dmidecode

Comment 3 Anton Arapov 2010-05-18 07:29:48 UTC
Duplicate of bug 573173.

Comment 4 Don Zickus 2010-06-03 20:15:23 UTC
Hi,

I have a kernel patch that walks the PCI tree to determine which device is causing the NMI you are seeing.  If you are comfortable with compiling a kernel, let me know and I will attach the patch.  If you can trigger the NMI after boot-up (though it doesn't seem like you can boot up at all), I also have a kernel module you could try instead.

Cheers,
Don

Comment 5 Joshua Roys 2010-06-03 20:37:40 UTC
Yes, I can do that.  If you could make the patch so it applies cleanly to a f12 src rpm, that would be helpful.

Thanks,

Josh

Comment 6 Don Zickus 2010-06-03 21:22:54 UTC
Created attachment 421037 [details]
kernel patch for F-12 latest (2.6.32.14-127)

Hi,

I have created a kernel patch to locate the device causing the NMI.

In order to use it please follow the instructions below.
(slightly modified from my instructions for the kernel module)

- download the attached patch
- download the kernel src rpm
- add the patch to the kernel.spec file
- rpmbuild -ba <path to spec>/kernel.spec
- install / boot the new kernel
- try to generate the nmi
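
For reference, the download and build steps above might look roughly like the following. This is only a sketch under a few assumptions: the function names, the DRY_RUN switch and the file names passed in are made up for illustration, the build tree is assumed to live in ~/rpmbuild unless another directory is given, and the build uses rpmbuild, which is the build front end on current rpm versions. Adding the patch to kernel.spec is left as a manual step between the two functions because the exact lines depend on the spec file in use.

```shell
#!/bin/sh
# With DRY_RUN=1 the commands are only printed, not executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 1: unpack the kernel source RPM and copy the patch next to the
# other sources. Arguments: source RPM, patch file, optional build tree.
prepare_kernel_sources() {
    srpm=$1
    patch=$2
    topdir=${3:-$HOME/rpmbuild}   # assumed default build tree
    run rpm --define "_topdir $topdir" -ivh "$srpm" || return 1
    run cp "$patch" "$topdir/SOURCES/"
}

# Step 2 (after kernel.spec has been edited by hand to reference the
# patch): build the source and binary packages.
build_kernel() {
    topdir=${1:-$HOME/rpmbuild}
    run rpmbuild --define "_topdir $topdir" -ba "$topdir/SPECS/kernel.spec"
}
```

Running both functions with DRY_RUN=1 first shows the commands without touching anything.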

Once an NMI is generated, some info should appear in the kernel
logs (dmesg and /var/log/messages).  Ignore the WARN for now; it is
misplaced.

Please run the following to gather data:

dmesg | grep RHNMI > /tmp/nmi.txt
echo "LSPCI OUTPUT" >> /tmp/nmi.txt
lspci >> /tmp/nmi.txt
lspci -t >> /tmp/nmi.txt

Then attach the /tmp/nmi.txt to this bugzilla so I can review the data.

It should have enough data to pinpoint the device that is causing the 
problem.  After that is determined, we can decide the next steps (most 
likely a firmware update if possible).

Please let me know if you have issues with the above steps.

Thanks,
Don

Comment 7 Don Zickus 2010-06-03 21:24:57 UTC
Created attachment 421039 [details]
kernel patch, cruft removed

Had some extra cruft in the original patch.  Sorry about that.

Comment 8 Joshua Roys 2010-06-15 20:24:16 UTC
After a number of reboots, I was unable to reproduce the "Uhhuh" error message, although the kernel would still panic almost immediately after bootup.  I was unable to get logs for this.  However, I found rhbz 548198, followed 548198#c11, and updated the P410i to firmware version 3.30.  Early results indicate this has fixed the immediate hang that was seen before.

Thanks!
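
For anyone comparing their own controller against the version mentioned above, a dotted version string can be checked like this. It is only a sketch: the function name fw_at_least is made up, it relies on the GNU "sort -V" extension, and treating 3.30 as the minimum good firmware is an assumption drawn from this one result, not something the thread confirms. How the installed version is read from the controller is left to the vendor's tools.

```shell
#!/bin/sh
# Hypothetical helper: succeed when version $1 is at least version $2.
# Uses GNU sort's version ordering (-V) to find the lower of the two.
fw_at_least() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}
```

For example, fw_at_least "$current" 3.30 returns success only if the value in $current sorts at or above 3.30.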

*** This bug has been marked as a duplicate of bug 548198 ***

