Description of problem:
After upgrading to kernel-2.6.32.9-67 or kernel-2.6.32.9-70, X cannot start.

Version-Release number of selected component (if applicable):
Hardware: HP Pavilion dv5. More info is available at the Smolts web page:
http://www.smolts.org/client/show/pub_185f33b5-e602-4fae-8714-fbc22a26e63f

Kernels: kernel-2.6.32.9-67.fc12.i686 or kernel-2.6.32.9-70.fc12.i686
akmod-nvidia-195.36.08-1.fc12.i686
xorg-x11-drv-nvidia-libs-195.36.08-1.fc12.i686
xorg-x11-drv-nvidia-195.36.08-1.fc12.i686
xorg-x11-server-Xorg-1.7.5.901-1.fc12.i686

How reproducible:
Always

Steps to Reproduce:
1. Install the 2.6.32 kernel from Fedora updates
2. Reboot
3. Wait for X to try to start

Actual results:
Black screen, non-responsive keyboard.

Expected results:
kdm login screen

Additional info:
/var/log/messages is flooded with

DRHD: handling fault status reg 2
DMAR:[DMA Read] Request device [01:00.0] fault addr 337b4000
DMAR:[fault reason 01] Present bit in root entry is clear
NVRM: Xid (0001:00): 54, CMDre 00000000 00000000 00000000 00000001 00000001

Also, a strange line appears right at the beginning of kernel boot:

ehci_hcd 0000:00:1d.7: dma_pool_free ehci_qh, c112c060/fffff060 (bad dma)

That did not happen with 2.6.31 kernels.

Kernel command line:
rhgb quiet SYSFONT=latarcyrheb-sun16 LANG=en_US.UTF-8 KEYTABLE=us nouveau.modeset=0 rdblacklist=nouveau

I tried adding iommu=soft to this line for kernel 2.6.32, but that does not help.
Probably only large displays (e.g. larger than 1280x1024) are affected.
Well, mine is 1680x1050...
Try intel_iommu=off.
I have an identical problem with kernels 2.6.32.9-67.fc12.x86_64 and 2.6.32.9-70.fc12.x86_64. "intel_iommu=off" does not help. As soon as X starts, the screen goes black, wireless stops working, and /var/log/messages fills with

Mar 7 20:58:22 lcn kernel: DMAR:[DMA Read] Request device [01:00.0] fault addr 119db6000
Mar 7 20:58:22 lcn kernel: DMAR:[fault reason 01] Present bit in root entry is clear
Mar 7 20:58:22 lcn kernel: DRHD: handling fault status reg 2
Mar 7 20:58:22 lcn kernel: DMAR:[DMA Read] Request device [01:00.0] fault addr 119db6000
Mar 7 20:58:22 lcn kernel: DMAR:[fault reason 01] Present bit in root entry is clear
Mar 7 20:58:22 lcn kernel: DRHD: handling fault status reg 2

This continues for about 6 minutes before a total crash. Here is the start of the same crash when I was next in the same location as my computer:

Mar 12 23:02:24 lcn abrtd: Non-processed crash in /var/cache/abrt/kerneloops-1268013723-3, saving into database
Mar 12 23:02:24 lcn abrtd: RunApp('/var/cache/abrt/kerneloops-1268013723-3','test x"`cat component`" = x"xorg-x11-server-Xorg" && cp /var/log/Xorg.0.log .')
Mar 12 23:02:24 lcn abrtd: Getting local universal unique identification
Mar 12 23:02:25 lcn smbd[2240]: [2010/03/12 23:02:25, 0] smbd/server.c:457(smbd_open_one_socket)
Mar 12 23:02:25 lcn smbd[2240]: smbd_open_once_socket: open_socket_in: Address already in use
Mar 12 23:02:25 lcn smbd[2240]: [2010/03/12 23:02:25, 0] smbd/server.c:457(smbd_open_one_socket)
Mar 12 23:02:25 lcn smbd[2240]: smbd_open_once_socket: open_socket_in: Address already in use
Mar 12 23:02:25 lcn abrtd: Crash is in database already (dup of /var/cache/abrt/kerneloops-1268013723-4)
Mar 12 23:02:25 lcn abrtd: Done checking for unsaved crashes
Mar 12 23:02:25 lcn abrtd: Init complete, entering main loop
Mar 12 23:02:29 lcn kernel: CE: hpet increasing min_delta_ns to 15000 nsec
Mar 12 23:02:29 lcn kernel: CE: hpet increasing min_delta_ns to 22500 nsec
Mar 12 23:02:31 lcn kernel: DRHD: handling fault status reg 3
Mar 12 23:02:31 lcn kernel: DMAR:[DMA Read] Request device [01:00.0] fault addr 110199000
Mar 12 23:02:31 lcn kernel: DMAR:[fault reason 01] Present bit in root entry is clear
Mar 12 23:02:32 lcn kernel: CE: hpet increasing min_delta_ns to 33750 nsec
Mar 12 23:02:36 lcn kernel: DRHD: handling fault status reg 2
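Since the log is flooded with thousands of near-identical lines, a small script can make it easier to see which device and which fault reason are involved. The following is only a sketch: it assumes the syslog line format shown in the comments above, and the function name summarize_dmar_faults and the regular expressions are illustrative, not part of any existing tool.

```python
import re
from collections import Counter

# Matches lines such as:
#   ... kernel: DMAR:[DMA Read] Request device [01:00.0] fault addr 119db6000
REQUEST_RE = re.compile(
    r"DMAR:\[DMA (Read|Write)\] Request device "
    r"\[([0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\] fault addr ([0-9a-f]+)"
)
# Matches lines such as:
#   ... kernel: DMAR:[fault reason 01] Present bit in root entry is clear
REASON_RE = re.compile(r"DMAR:\[fault reason (\d+)\] (.+)$")


def summarize_dmar_faults(lines):
    """Count faults per (device, reason code, reason text).

    A "Request device" line is paired with the next "fault reason" line,
    which is the order they appear in the logs quoted in this report.
    """
    counts = Counter()
    device = None
    for line in lines:
        line = line.rstrip("\n")
        request = REQUEST_RE.search(line)
        if request:
            device = request.group(2)
            continue
        reason = REASON_RE.search(line)
        if reason and device is not None:
            counts[(device, int(reason.group(1)), reason.group(2))] += 1
            device = None
    return counts


# Sample taken from the log excerpt in this report.
SAMPLE = """\
Mar 7 20:58:22 lcn kernel: DMAR:[DMA Read] Request device [01:00.0] fault addr 119db6000
Mar 7 20:58:22 lcn kernel: DMAR:[fault reason 01] Present bit in root entry is clear
Mar 7 20:58:22 lcn kernel: DRHD: handling fault status reg 2
Mar 7 20:58:22 lcn kernel: DMAR:[DMA Read] Request device [01:00.0] fault addr 119db6000
Mar 7 20:58:22 lcn kernel: DMAR:[fault reason 01] Present bit in root entry is clear
Mar 7 20:58:22 lcn kernel: DRHD: handling fault status reg 2
"""

summary = summarize_dmar_faults(SAMPLE.splitlines())
for (dev, code, text), n in summary.items():
    print(f"{dev}  reason {code:02d}  x{n}  {text}")
```

To run it against a real log, pass an open file instead of the sample, for example summarize_dmar_faults(open("/var/log/messages")), which usually requires root.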
(In reply to comment #3)
> Try intel_iommu=off.

OK, on my notebook this option works. Thank you. Could somebody comment on what changed in the kernel between 2.6.31 and 2.6.32? Is there anything to change in the BIOS or somewhere else to fix it completely? Is it a problem with the CPU or some other hardware issue?
intel_iommu=off works here too. My observation that only displays larger than 1280x1024 are affected proved wrong; I ran tests on some more machines today. It looks like hardware support for I/O virtualization is what decides whether a given machine is affected. So the interesting question is: is this a bug in the kernel, in X11, or even in both?
It's likely a bug in the nvidia kernel driver, if you're using it... It's responsible for setting up the DMA mappings, and it appears to be using them incorrectly resulting in the IOMMU catching an illegal access. (It's like your hardware looking up a null pointer.) I'm going to mark this NOTABUG, and we can release note turning off the IOMMU if you want to install the nvidia driver.
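To illustrate the "null pointer" analogy, here is a toy model of the lookup an IOMMU performs for each DMA request. It is a deliberately simplified sketch and not how the hardware or the kernel is implemented: the class ToyIommu and its methods are invented for this example, and the only things taken from real systems are the bus:device.function addressing and the fault reason texts that the kernel prints (01 and 06 appear in this report; 02 is quoted from memory of the kernel's fault table).

```python
def parse_bdf(bdf):
    """Split a PCI address such as '01:00.0' into (bus, device, function)."""
    bus, rest = bdf.split(":")
    dev, fn = rest.split(".")
    return int(bus, 16), int(dev, 16), int(fn, 16)


class ToyIommu:
    """Simplified root table -> context table -> page table walk."""

    PAGE_SIZE = 4096

    def __init__(self):
        # bus -> {(device, function) -> {page frame number -> readable?}}
        self.root_table = {}

    def map_page(self, bdf, addr, readable=True):
        """Make the page containing addr visible to the given device."""
        bus, dev, fn = parse_bdf(bdf)
        context_table = self.root_table.setdefault(bus, {})
        page_table = context_table.setdefault((dev, fn), {})
        page_table[addr // self.PAGE_SIZE] = readable

    def dma_read(self, bdf, addr):
        """Return 'ok' or the fault a read of addr by this device would raise."""
        bus, dev, fn = parse_bdf(bdf)
        context_table = self.root_table.get(bus)
        if context_table is None:
            # Nothing was ever set up for this bus.
            return "fault reason 01: Present bit in root entry is clear"
        page_table = context_table.get((dev, fn))
        if page_table is None:
            return "fault reason 02: Present bit in context entry is clear"
        if not page_table.get(addr // self.PAGE_SIZE, False):
            return "fault reason 06: PTE Read access is not set"
        return "ok"


iommu = ToyIommu()
# The device DMAs to an address that was never mapped for it.
print(iommu.dma_read("01:00.0", 0x337B4000))
# After a mapping is created, the same access succeeds.
iommu.map_page("01:00.0", 0x337B4000)
print(iommu.dma_read("01:00.0", 0x337B4000))
```

In this model, booting with intel_iommu=off corresponds to skipping the lookup entirely, which is why the faults disappear without the underlying access pattern changing.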
(In reply to comment #7)
> It's likely a bug in the nvidia kernel driver, if you're using it... It's
> responsible for setting up the DMA mappings, and it appears to be using them
> incorrectly resulting in the IOMMU catching an illegal access. (It's like your
> hardware looking up a null pointer.)
>
> I'm going to mark this NOTABUG, and we can release note turning off the IOMMU
> if you want to install the nvidia driver.

Yes, I use the nvidia driver because I need VDPAU and 3D. But, please, wait. There is also a strange line that appears at the moment the kernel starts to boot:

ehci_hcd 0000:00:1d.7: dma_pool_free ehci_qh, c112c060/fffff060 (bad dma)

It is not from the nvidia driver. This line appears neither with 2.6.31 kernels nor with 2.6.32 booted with intel_iommu=off. Also, if it is an nvidia driver bug, then why doesn't it show up with earlier kernels?
It's a new feature in recent kernels. Nvidia needs to update their drivers to correctly do DMA.
NVIDIA believes it unlikely that the problem is the result of a bug in the NVIDIA kernel module. Both GT200GL and G71GL, the GPUs on which the Quadro FX4800 and Quadro FX3500 are based and with which we understand the problem has been reproduced, are capable of addressing any page allocated on their behalf on PC hardware. However, even if neither GPU were capable of addressing a given page, the NVIDIA driver would only attempt to remap it if the kernel it was built against did not define the GFP_DMA32 zone; otherwise the IOMMU support code is not built into the NVIDIA kernel module. Note, also, that this only applies to Linux/x86 kernels: the NVIDIA driver only allocates from low memory, i.e. never specifies the __GFP_HIGHMEM flag, and is therefore never built with IOMMU support on x86.
Dear Garrison Wu, thank you for your comment. I think that the bug must be reopened. Just a note for Kyle McMartin: you have not answered my question about this suspicious line, which appears right after grub:

ehci_hcd 0000:00:1d.7: dma_pool_free ehci_qh, c112c060/fffff060 (bad dma)

What does it mean?
Serguei, no problem and just to avoid confusion, my comment above should have read Linux/x86-64
@garrison: Thank you for your comments, Garrison, but you really do need to include IOMMU support in the nvidia kernel module on x86_64, even for addresses < 4GB. For instance, I'm running vmware vmplayer on my laptop to run my Windows XP image, and the nvidia driver to run my graphics. vmware (and xen, and kvm) all use the Intel VT-d that is in my laptop (ICH9 based), which is an IOMMU. The IOMMU means my VM can't mess up my host, and that the host can't mess up the VM. I assume by IOMMU support you mean the pci_*map* calls or dma_*map* calls, right?

@serguei: the ehci_hcd message is from the USB driver, and means there is probably a bug in that driver.

Right now with 2.6.32+ the only way I can get the nvidia driver loaded is with the kernel boot option iommu=soft or intel_iommu=off. Otherwise I just get a black screen as my log fills with errors from the nvidia driver.
Same here with kernel 2.6.32.11-99.fc12.i686.PAE and nvidia module kmod-nvidia-173xx-2.6.32.11-99.fc12.i686.PAE-173.14.25-1.fc12.2.i686.

Adding intel_iommu=off as an argument in /etc/grub.conf appears to fix this. Thank you for the suggestion. Before this I was seeing a log (/var/log/messages) full of the following, repeating constantly:

Apr 17 08:07:13 hsem kernel: DRHD: handling fault status reg 2
Apr 17 08:07:13 hsem kernel: DMAR:[DMA Read] Request device [02:00.0] fault addr 32006000
Apr 17 08:07:13 hsem kernel: DMAR:[fault reason 01] Present bit in root entry is clear
Apr 17 08:07:13 hsem kernel: DRHD: handling fault status reg 102
Apr 17 08:07:13 hsem kernel: DMAR:[DMA Read] Request device [02:00.0] fault addr 32006000
Apr 17 08:07:13 hsem kernel: DMAR:[fault reason 01] Present bit in root entry is clear
Apr 17 08:07:13 hsem kernel: DRHD: handling fault status reg 202
Apr 17 08:07:13 hsem kernel: DMAR:[DMA Read] Request device [02:00.0] fault addr 32006000
Apr 17 08:07:13 hsem kernel: DMAR:[fault reason 01] Present bit in root entry is clear
Apr 17 08:07:13 hsem kernel: DRHD: handling fault status reg 302
Apr 17 08:07:13 hsem kernel: DMAR:[DMA Read] Request device [02:00.0] fault addr 32006000
Apr 17 08:07:13 hsem kernel: DMAR:[fault reason 01] Present bit in root entry is clear
Apr 17 08:07:13 hsem kernel: DRHD: handling fault status reg 402
Apr 17 08:07:13 hsem kernel: DMAR:[DMA Read] Request device [02:00.0] fault addr 32006000
Apr 17 08:07:13 hsem kernel: DMAR:[fault reason 01] Present bit in root entry is clear
Apr 17 08:07:13 hsem kernel: DRHD: handling fault status reg 502
Apr 17 08:07:13 hsem kernel: DMAR:[DMA Read] Request device [02:00.0] fault addr 32006000
Apr 17 08:07:13 hsem kernel: DMAR:[fault reason 01] Present bit in root entry is clear
Apr 17 08:07:13 hsem kernel: DRHD: handling fault status reg 602
Apr 17 08:07:13 hsem kernel: DMAR:[DMA Read] Request device [02:00.0] fault addr 32006000
Apr 17 08:07:13 hsem kernel: DMAR:[fault reason 01] Present bit in root entry is clear
Apr 17 08:07:13 hsem kernel: DRHD: handling fault status reg 702
Apr 17 08:07:13 hsem kernel: DMAR:[DMA Read] Request device [02:00.0] fault addr 32006000
Apr 17 08:07:13 hsem kernel: DMAR:[fault reason 01] Present bit in root entry is clear
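After editing /etc/grub.conf and rebooting, it can be worth confirming that the option actually reached the running kernel. The sketch below parses a kernel command line string and reports whether intel_iommu=off is present. The function names parse_cmdline and intel_iommu_disabled are made up for this example, and the sample string is the command line quoted in the original report.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {option: value}.

    Bare flags such as 'quiet' map to None. If an option is repeated,
    the last occurrence wins in this simplified parser.
    """
    options = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        options[key] = value if sep else None
    return options


def intel_iommu_disabled(cmdline):
    """True if the command line carries intel_iommu=off."""
    value = parse_cmdline(cmdline).get("intel_iommu")
    # The option accepts a comma-separated list, so check each element.
    return value is not None and "off" in value.split(",")


# Command line from the original report, before the workaround.
REPORTED = (
    "rhgb quiet SYSFONT=latarcyrheb-sun16 LANG=en_US.UTF-8 KEYTABLE=us "
    "nouveau.modeset=0 rdblacklist=nouveau"
)

print(intel_iommu_disabled(REPORTED))                       # workaround absent
print(intel_iommu_disabled(REPORTED + " intel_iommu=off"))  # workaround present
```

On a running system the same check can be applied to the live command line with intel_iommu_disabled(open("/proc/cmdline").read()).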
I get these same messages using the nouveau driver, on

01:00.0 VGA compatible controller: nVidia Corporation Quadro FX 770M (rev a1)

These messages repeat constantly (very similar to what you all have):

Apr 21 10:31:06 localhost kernel: DRHD: handling fault status reg 3
Apr 21 10:31:06 localhost kernel: DMAR:[DMA Read] Request device [01:00.0] fault addr 0
Apr 21 10:31:06 localhost kernel: DMAR:[fault reason 06] PTE Read access is not set
I just upgraded to Fedora 13 and am having the exact same problem. I have to boot with the "intel_iommu=off" kernel option so X does not segfault.
There is a patch for the nvidia drivers here: http://www.nvnews.net/vbulletin/showthread.php?s=5508b00020d562e14c1c1f33787f815d&t=151791 With the patched drivers I don't need the "intel_iommu=off" kernel option any more. kernel: 2.6.33.5-112.fc13.x86_64
This message is a reminder that Fedora 12 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 12. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '12'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 12's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 12 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Fedora 12 changed to end-of-life (EOL) status on 2010-12-02. Fedora 12 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. Thank you for reporting this bug and we are sorry it could not be fixed.