Created attachment 414580
Description of problem:
If the Intel VT-d option is enabled in the BIOS, the system fails to boot. Numerous strange error messages are displayed:
DRHD: handling fault status reg 2
DMAR:[DMA Read] Request device [04:00.0] fault addr ffed9000
DMAR:[fault reason 06] PTE Read access is not set
Uhhuh. NMI received for unknown reason b0 on CPU 0.
You have some hardware problem, likely on the PCI bus.
Dazed and confused, but trying to continue
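A minimal sketch for reading the DMAR fault line above. The bracketed field is the PCI bus:device.function of the device whose DMA request was blocked, and "fault addr" is the address it tried to read. The helper name decode_dmar_fault is hypothetical, and the pattern is based only on the one message quoted in this report, so other kernel versions may word it differently.

```shell
#!/bin/sh
# Sketch: extract the requester's PCI address and the faulting address
# from a DMAR fault message. The pattern is based on the single message
# quoted in this report; other kernels may format it differently.
decode_dmar_fault() {
    # $1: e.g. "DMAR:[DMA Read] Request device [04:00.0] fault addr ffed9000"
    # Prints nothing if the line does not match.
    re='.*Request device \[\([0-9a-fA-F]\{2\}\):\([0-9a-fA-F]\{2\}\)\.\([0-7]\)] fault addr \([0-9a-fA-F]*\).*'
    printf '%s\n' "$1" | sed -n "s/$re/bus=\1 device=\2 function=\3 addr=\4/p"
}

decode_dmar_fault "DMAR:[DMA Read] Request device [04:00.0] fault addr ffed9000"
# prints: bus=04 device=00 function=0 addr=ffed9000
```

The decoded address can then be passed to lspci -s 04:00.0 (add -v for more detail) to see which device sits at that slot.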
Version-Release number of selected component (if applicable):
Fedora 13 RC3
When this platform is booted with VT-d enabled, the system fails to come up and hangs at around the point where udev starts.
Created attachment 414581
Created attachment 414583
Duplicate of bug 573173.
I have a kernel patch that walks the PCI tree to determine which device is causing the NMI you are seeing. If you are comfortable compiling a kernel, let me know and I will attach it. If you can trigger the NMI after boot (though it seems you cannot boot at all), I also have a kernel module you could try instead.
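The idea of walking the PCI tree can be approximated from userspace on a system that does boot (for example with VT-d disabled). sysfs places each device under /sys/devices/pci<domain>:<bus>/ with one path component per bridge between the root complex and the device. The sketch below lists those components; the pci_chain helper and the example path are hypothetical, and this is not the kernel patch: it does not inspect NMI or PCI error status.

```shell
#!/bin/sh
# Userspace sketch (not the kernel patch): list the PCI functions on the
# sysfs path from the root complex down to one device. Each component of
# a resolved path like /sys/devices/pci0000:00/0000:00:03.0/0000:04:00.0
# that looks like domain:bus:device.function is a bridge or the device.
pci_chain() {
    # $1: resolved sysfs path; prints one address per line, root side first
    printf '%s\n' "$1" | tr '/' '\n' |
        grep -E '^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$'
}

# On a running system (address taken from the DMAR message above):
# for d in $(pci_chain "$(readlink -f /sys/bus/pci/devices/0000:04:00.0)"); do
#     lspci -s "$d"
# done
```

The last line printed is the device itself; the lines before it are the bridges it sits behind, which is also what lspci -t shows in tree form.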
Yes, I can do that. If you could make the patch apply cleanly to an F-12 source RPM, that would be helpful.
Created attachment 421037
kernel patch for F-12 latest (220.127.116.11-127)
I have created a kernel patch that locates the device causing the NMI.
To use it, follow the instructions below (slightly modified from my
instructions for the kernel module):
- download the attached patch
- download the kernel src rpm
- add the patch to the kernel.spec file
- rpmbuild -ba <path to spec>/kernel.spec
- install and boot the new kernel
- try to trigger the NMI
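The "add the patch to the kernel.spec file" step can be scripted. This is a minimal sketch, assuming the spec declares patches with PatchNNNN: tags and applies them in %prep with ApplyPatch lines (check that your kernel.spec follows this layout before relying on it). The helper add_patch_to_spec, the patch file name and the patch number 99999 are hypothetical; pick a number that does not clash with an existing tag.

```shell
#!/bin/sh
# Sketch: add one patch to an RPM spec file, assuming the spec uses
# "PatchNNNN: file" tags and "ApplyPatch file" lines in %prep.
# The new tag goes after the last existing Patch tag and the new
# ApplyPatch line after the last existing ApplyPatch line.
add_patch_to_spec() {
    # $1: spec file, $2: patch file name, $3: unused patch number
    if awk -v patch="$2" -v num="$3" '
        NR == FNR {                          # pass 1: find insertion points
            if ($0 ~ /^Patch[0-9]+:/) last_decl = FNR
            if ($0 ~ /^ApplyPatch /)  last_apply = FNR
            next
        }
        { print }                            # pass 2: copy, inserting new lines
        FNR == last_decl  { print "Patch" num ": " patch }
        FNR == last_apply { print "ApplyPatch " patch }
        END { if (!last_decl || !last_apply) exit 1 }  # layout not recognised
    ' "$1" "$1" > "$1.new"; then
        mv "$1.new" "$1"
    else
        rm -f "$1.new"                       # leave the spec untouched
        return 1
    fi
}

# Hypothetical usage:
# add_patch_to_spec ~/rpmbuild/SPECS/kernel.spec nmi-debug.patch 99999
```

The patch file itself still has to be copied into the SOURCES directory next to the spec (by default ~/rpmbuild/SOURCES when the tree was created with rpmdev-setuptree) before building. The function refuses to touch the spec if it finds no Patch or ApplyPatch lines, rather than guessing where to put them.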
Once an NMI is triggered, the patched kernel should write diagnostic
information to the kernel logs (dmesg and /var/log/messages). Ignore the WARN for now, it is
Please run the following to gather data:
dmesg | grep RHNMI > /tmp/nmi.txt
echo "LSPCI OUTPUT" >> /tmp/nmi.txt
lspci >> /tmp/nmi.txt
lspci -t >> /tmp/nmi.txt
Then attach /tmp/nmi.txt to this bug so I can review the data.
It should have enough data to pinpoint the device that is causing the
problem. After that is determined, we can decide the next steps (most
likely a firmware update if possible).
Please let me know if you have issues with the above steps.
Created attachment 421039
kernel patch, cruft removed
The original patch had some extra cruft in it; sorry about that.
After a number of reboots I was unable to reproduce the "Uhhuh" NMI message, although the kernel still panicked almost immediately after boot. I could not capture logs for this. However, I found rhbz 548198, followed the steps in 548198#c11, and updated the P410i to firmware version 3.30. Early results indicate that this fixes the immediate hang seen before.
*** This bug has been marked as a duplicate of bug 548198 ***