Bug 561267
| Field | Value |
|---|---|
| Summary | Nouveau does DMA from invalid addresses |
| Product | [Fedora] Fedora |
| Component | xorg-x11-drv-nouveau |
| Version | 13 |
| Hardware | All |
| OS | Linux |
| Status | CLOSED ERRATA |
| Keywords | Reopened, Triaged |
| Severity | high |
| Priority | low |
| Reporter | Michal Hlavinka <mhlavink> |
| Assignee | Ben Skeggs <bskeggs> |
| QA Contact | Fedora Extras Quality Assurance <extras-qa> |
| CC | airlied, ajax, amluto, anton, awilliam, Bert.Deknuydt, bskeggs, chemobejk, corsac, cra, dougsland, dwmw2, eric.brunet, fkoliver2, gansalmon, itamar, jmoskovc, jonathan, kernel-maint, llg, manisandro, maristgeek, markjx, martin, maurizio.antillon, mbreuer, mcepl, mhlavink, mishu, mpope, murraysj, p.a.crook, selinux, stefanrin, tbzatek, tomek, xgl-maint, zeekec, zing |
| Doc Type | Bug Fix |
| Clone Of | 538163 |
| Last Closed | 2010-09-20 15:22:10 UTC |
Description
Michal Hlavinka
2010-02-03 08:33:00 UTC
Looks like the graphics device is attempting to do DMA from address zero. Is the address _always_ zero? Perhaps the driver needs to allocate (and dma-map) a scratch page and point 'unused' pointers to that page instead of assuming that address zero will be valid?

(In reply to comment #1)
> Looks like the graphics device is attempting to do DMA from address zero.
> Is the address _always_ zero?

yes

Hm, it _does_ use a scratch page. To start with, can you try something like this? Although I don't see how it could ever trigger; we do seem to be initialising the whole page table with the scratch page at startup... unless we calculate the size of the table incorrectly in nouveau_sgdma_init()?

    diff --git a/drivers/gpu/drm/nouveau/nouveau_sgdma.c b/drivers/gpu/drm/nouveau/nouveau_sgdma.c
    index 4c7f1e4..74ab2ce 100644
    --- a/drivers/gpu/drm/nouveau/nouveau_sgdma.c
    +++ b/drivers/gpu/drm/nouveau/nouveau_sgdma.c
    @@ -103,6 +103,11 @@ nouveau_sgdma_bind(struct ttm_backend *be, struct ttm_mem_reg *mem)
     		uint32_t offset_l = lower_32_bits(dma_offset);
     		uint32_t offset_h = upper_32_bits(dma_offset);
     
    +		if (WARN_ON_ONCE(!offset_l && !offset_h)) {
    +			dma_offset = dev_priv->gart_info.sg_dummy_bus;
    +			offset_l = lower_32_bits(dma_offset);
    +			offset_h = upper_32_bits(dma_offset);
    +		}
     		for (j = 0; j < PAGE_SIZE / NV_CTXDMA_PAGE_SIZE; j++) {
     			if (dev_priv->card_type < NV_50)
     				nv_wo32(dev, gpuobj, pte++, offset_l | 3);

I've built a new kernel with this patch and rebooted. What should I look for? The usual kernel oops trace, or just a one-line message in the log?

You'd get a warning, which looks very much like an oops.

It's not a good idea to create a bug as a clone of another bug, generally; it adds a lot of stuff you don't necessarily want (like, everyone who is CCed on the other bug was CCed on this one, and this one depends on that one, which it shouldn't). I've fixed it up now. In future just file a new bug, not a clone :)

thanks!

*** Bug 570142 has been marked as a duplicate of this bug. ***

I've got what I think is exactly the same problem on a DELL Quad Core Xeon box; see the snippet from log/messages below (out of gigabytes of messages). The problem started when I turned on virtualisation in the BIOS, specifically the "Intel VT I/O option" (approximating Dell's phrasing). I turned it on to run virtualised guests, so I don't want to turn it off again or really set intel_iommu=off. The problem is generated by both the proprietary nvidia and the open source nouveau drivers. It has an nVidia Quadro 295. As my best workaround at the moment is to use a low-resolution vesa driver, I'm keen to help find a solution. Let me know what I can do to help.

The box has a Fedora 12 install with this kernel (if needed I'm willing to upgrade it):

    2.6.32.9-70.fc12.x86_64 #1 SMP Wed Mar 3 04:40:41 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux
    Intel(R) Xeon(R) CPU X5550 @ 2.67GHz

    Mar 27 00:11:00 localhost kernel: DRHD: handling fault status reg 302
    Mar 27 00:11:00 localhost kernel: DMAR:[DMA Read] Request device [03:00.0] fault addr 0
    Mar 27 00:11:00 localhost kernel: DMAR:[fault reason 06] PTE Read access is not set
    Mar 27 00:11:00 localhost kernel: DRHD: handling fault status reg 402
    Mar 27 00:11:00 localhost kernel: DMAR:[DMA Read] Request device [03:00.0] fault addr 0
    Mar 27 00:11:00 localhost kernel: DMAR:[fault reason 06] PTE Read access is not set
    Mar 27 00:11:00 localhost kernel: DRHD: handling fault status reg 502
    Mar 27 00:11:00 localhost kernel: DMAR:[DMA Read] Request device [03:00.0] fault addr 0
    Mar 27 00:11:00 localhost kernel: DMAR:[fault reason 06] PTE Read access is not set
    Mar 27 00:11:00 localhost kernel: DRHD: handling fault status reg 602
    Mar 27 00:11:00 localhost kernel: DMAR:[DMA Read] Request device [03:00.0] fault addr 0
    Mar 27 00:11:00 localhost kernel: DMAR:[fault reason 06] PTE Read access is not set
    Mar 27 00:11:00 localhost kernel: DRHD: handling fault status reg 702
    Mar 27 00:11:00 localhost kernel: DMAR:[DMA Read] Request device [03:00.0] fault addr 0
    Mar 27 00:11:00 localhost kernel: DMAR:[fault reason 06] PTE Read access is not set
    Mar 27 00:11:00 localhost kernel: DRHD: handling fault status reg 2
    Mar 27 00:11:00 localhost kernel: DMAR:[DMA Read] Request device [03:00.0] fault addr 0
    Mar 27 00:11:00 localhost kernel: DMAR:[fault reason 06] PTE Read access is not set
    Mar 27 00:11:00 localhost kernel: DRHD: handling fault status reg 102
    Mar 27 00:11:00 localhost kernel: DMAR:[DMA Read] Request device [03:00.0] fault addr 0
    Mar 27 00:11:00 localhost kernel: DMAR:[fault reason 06] PTE Read access is not set
    Mar 27 00:11:00 localhost kernel: DRHD: handling fault status reg 202
    Mar 27 00:11:00 localhost kernel: DMAR:[DMA Read] Request device [03:00.0] fault addr 0
    Mar 27 00:11:00 localhost kernel: DMAR:[fault reason 06] PTE Read access is not set
    Mar 27 00:11:00 localhost kernel: DRHD: handling fault status reg 302
    Mar 27 00:11:00 localhost kernel: DMAR:[DMA Read] Request device [03:00.0] fault addr 0
    Mar 27 00:11:00 localhost kernel: DMAR:[fault reason 06] PTE Read access is not set

FYI the problem still exists with the latest "git" snapshot from nouveau, i.e. Linus's kernel 2.6.34-rc2 git and nouveau git.

This also affects F13-Beta.

The IOMMU fault messages should probably be ratelimited as well -- bogus hardware shouldn't be able to flood the logs that easily. On modern systems we should be able to completely disable broken hardware which is causing such faults. Better to fix the driver though, I suspect.

jftr: I've tried this on an F-13 machine with an xorg-x11-drv-nouveau repository snapshot (20100519) and a rebuilt rawhide kernel (2.6.34-3), and the bug is still there.

That doesn't surprise me; there's been precisely zero activity on the upstream fd.o bug and our own nouveau developers don't seem to have looked at this bug either. Ben?

I've looked at it a bit, and I'm completely confused as to where these are coming from actually. On the machine I had access to we seemed to get one every second or so.
All our page tables are cleared with scratch pages etc, so it's some part of the GPU we don't know anything about that's doing the DMA in all likelihood. But, no real ideas from here.

(In reply to comment #15)
> I've looked at it a bit, and I'm completely confused as to where these are coming
> from actually. On the machine I had access to we seemed to get one every
> second or so.

I don't know if it does matter, but I have 2 LCDs and I'm getting more than one message every second - hundreds. After two weeks the system was unusable - I found out /var/log/messages was 5.6 GB, which exhausted all free space on the root partition.

> All our page tables are cleared with scratch pages etc, so it's some part of
> the GPU we don't know anything about that's doing the DMA in all likelihood.
> But, no real ideas from here.

well, there were probably a lot of changes, but maybe looking into changes between 2.6.31.* (which was working fine) and 2.6.32 (first broken) could help

(In reply to comment #16)
> well, there were probably a lot of changes, but maybe looking into changes
> between 2.6.31.* (which was working fine) and 2.6.32 (first broken) could help

My guess is that 2.6.32 is where VT-d either got added, or turned on, and that it's always been broken.
But regardless, I've double-checked every place we know of that can reference system memory on the GPU, and we're all fine there. It's still a mystery.

To "hide" the issue for now, you can disable VT-d in your BIOS setup.

Does the binary nvidia driver work for you by the way? On the machine I was using, it caused a massive VT-d flood which essentially made the machine appear to be hung.

(In reply to comment #17)
> My guess is that 2.6.32 is where VT-d either got added, or turned on, and that
> it's always been broken. But regardless, I've double-checked every place we
> know of that can reference system memory on the GPU, and we're all fine there.
> It's still a mystery.
>
> To "hide" the issue for now, you can disable VT-d in your BIOS setup.

Is it possible to disable VT-d just for nouveau?

> Does the binary nvidia driver work for you by the way? On the machine I was
> using, it caused a massive VT-d flood which essentially made the machine appear
> to be hung.

I've tried nvidia and nouveau, both with and without VT-d enabled.

nvidia without VT-d works fine.

nvidia with VT-d produces similar DMAR error messages:

    DRHD: handling fault status reg 2
    DMAR:[DMA Read] Request device [02:00.0] fault addr 128785000
    DMAR:[fault reason 01] Present bit in root entry is clear

The addr is different every time I restart the X server. Also, the server won't start; in xorg.log everything seems OK except the last line:

    (EE) NVIDIA(0): WAIT: (E, 0, 0x827d, 0)

nouveau with VT-d = this bug.

nouveau without VT-d seems to be working fine so far (I'm using it right now).

These disappeared for me somewhere around kernel.org 2.6.33 rc5. They're still gone for me in 2.6.34. Unfortunately, with 2.6.34 & rawhide nouveau drm updates I can't log in using Gnome or KDE (dead keyboard after telinit 5... but OK after chvt away from X). Probably something I did, so no bug report for that yet.

-- Fedora Bugzappers volunteer triage team https://fedoraproject.org/wiki/BugZappers

No such luck here.
I've been trying rawhide for a while now and the problem has never gone away. I can see it now even with 2.6.34-2. Typically 3-4 DMA errors per second, though it possibly depends on how much on-screen activity there is. I had to delete some 9G of message logs the other day when hard drive space dried up. When I first came across this problem I was using the nvidia driver, which produced the same DMA errors (and X failed to start), so it looks like the problem is common to both nvidia and nouveau.

Not really a comment on the bug, but to help with the log file size: I've added the following lines to my rsyslog.conf (before the other rules) to stop the log spam. dmesg is still useless, but my log files don't grow out of control.

    # Coverup nouveau log spam.
    :msg, contains, "DMAR" ~
    :msg, contains, "DRHD" ~

Just wanted to add that disabling VT-d worked for me. System configuration:

* Asus P6T Deluxe Motherboard
* Intel i7 920 CPU
* nVidia Corporation GT216 [GeForce GT 220]
* Fedora 13
* Kernel 2.6.33.5-112.fc13.x86_64
* xorg-x11-drv-nouveau-0.0.16-6.20100423git13c1043.fc13.x86_64

Oh, and I have other machines with i7 Extreme Editions on P6T (Supercomputer and P6TD Deluxe) motherboards. The Intel dual-port 10 Gig Ethernet cards didn't work until I disabled VT-d. I did iommu=soft, which also works.

It appears reporter and others using at least F13 now, so re-targeting.

This should be fixed in kernel-2.6.33.5-120.fc13 (http://koji.fedoraproject.org/koji/buildinfo?buildID=177442).

I've installed the latest kernel from koji, kernel-2.6.33.5-122.fc13, and even after re-enabling VT-d in the BIOS I no longer get those DMAR/DRHD error messages in the log. So it's fixed, at least for me.

Yet another `me too'. I have kernel-2.6.33.5-112.fc13.x86_64, xorg-x11-drv-nouveau-0.0.16-6.20100423git13c1043.fc13.x86_64, on Intel Core 2 Duo / NV50. If I boot with VT-d enabled, I get the DMAR spew in messages and a hard lock after about 0-5 minutes. If VT-d is disabled... no sign yet of either.
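The rsyslog rules above discard every DMAR/DRHD line. An alternative closer to the earlier suggestion that IOMMU fault messages be ratelimited is to let a small burst through per time window and count the rest. The following is a minimal user-space sketch of that interval/burst idea, assuming the caller supplies timestamps in seconds; `struct fault_ratelimit`, `ratelimit_init` and `ratelimit_allow` are hypothetical names, not the kernel's ratelimit API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical interval/burst limiter: let at most `burst` messages
 * through per `interval` seconds and count the rest. */
struct fault_ratelimit {
    double interval;      /* window length, seconds */
    int burst;            /* messages allowed per window */
    double window_start;  /* start of the current window, seconds */
    int printed;          /* messages let through in this window */
    long missed;          /* messages suppressed in this window */
};

void ratelimit_init(struct fault_ratelimit *rs, double interval, int burst)
{
    assert(rs != NULL && interval > 0 && burst > 0);
    rs->interval = interval;
    rs->burst = burst;
    rs->window_start = 0.0;
    rs->printed = 0;
    rs->missed = 0;
}

/* Returns true if a message arriving at time `now` may be logged. */
bool ratelimit_allow(struct fault_ratelimit *rs, double now)
{
    if (now - rs->window_start >= rs->interval) {
        /* The old window has expired: report what it dropped, then reset. */
        if (rs->missed > 0)
            fprintf(stderr, "%ld fault messages suppressed\n", rs->missed);
        rs->window_start = now;
        rs->printed = 0;
        rs->missed = 0;
    }
    if (rs->printed < rs->burst) {
        rs->printed++;
        return true;
    }
    rs->missed++;
    return false;
}
```

With, say, `ratelimit_init(&rs, 5.0, 10)`, a device faulting hundreds of times per second (as described in this thread) produces at most 10 log lines per 5-second window plus a one-line count of what was dropped, whereas the rsyslog filter hides even the first occurrence. The count is only reported when the next message arrives after the window ends, which keeps the sketch free of timers.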
Mike, if you read the last few comments here, the issue is fixed in 2.6.33.5-120.fc13.

Installed kernel-2.6.33.5-120.fc13.x86_64 and removed the kernel option "intel_iommu=off". No error messages in the log yet...

Re #27, apologies Ben, I misread the kernel number. -120 does work for me, but alas only keeps the machine up long enough for #566987 to bite.

Actually now I am not so sure this is fully fixed. Since re-enabling VT-d yesterday I have had two hard locks with -120 (rebooted afterward as single user and checked Xorg.0.log but no sign of the #566987 signature), although there were no more spewed DMAR messages. -124 is downloading now.

(In reply to comment #30)
> Actually now I am not so sure this is fully fixed. Since re-enabling VT-d
> yesterday I have had two hard locks with -120 (rebooted afterward as single
> user and checked Xorg.0.log but no sign of the #566987 signature), although
> there were no more spewed DMAR messages. -124 is downloading now.

I had the same experience on my work desktop yesterday. Can you please check /var/log/messages? Do you see this there:

    kernel: [drm] nouveau 0000:02:00.0: PGRAPH_TRAP - Ch 2/5 Class 0x8297 Mthd 0x0f04 Data 0x00000000:0x00000000
    kernel: [drm] nouveau 0000:02:00.0: PGRAPH_TRAP_CCACHE_FAULT - VM: Trapped read at 00412a2000 status 00000560 00000000 channel 2
    kernel: [drm] nouveau 0000:02:00.0: PGRAPH_TRAP_CCACHE_FAULT - 00000000 00000000 00000000 00000000 00000000 00000000 00000000

On my system X was frozen, with the X server eating 100% CPU on one core. Other functionality was still OK, i.e. I was able to log in remotely and initiate a shutdown command. Unfortunately the shutdown got stuck, probably because it couldn't kill the X server.

You might also want to check Bug #566987 if your system supports PCI-E ASPM. My work desktop doesn't have this feature, so pcie_aspm=off won't help.

The hard locks I am seeing leave nothing in /var/log/messages.
AFAICT nouveau initializes happily and runs well, emitting no messages after post-boot settle-down. I am pretty confident this is not #566987, which this machine also suffers from. Here that one is always associated with a PFIFO message, leaves the mouse alive, and the machine still accepts ssh, none of which is the case for the lockups. The box does have PCI-E, and I set pcie_aspm=off this morning, following the first hard lockup but before the second. The only thing I changed before the lockups started was re-enabling VT-d. It's off again now while I try to get some work done, but I will turn it on again later.

Could this be the same bug as 578108, about a problem with nvidia hardware related to the IOMMU, which appeared with kernel 2.6.32 and which is fixed with intel_iommu=off?

(In reply to comment #31)
> kernel: [drm] nouveau 0000:02:00.0: PGRAPH_TRAP - Ch 2/5 Class 0x8297 Mthd
> 0x0f04 Data 0x00000000:0x00000000
> kernel: [drm] nouveau 0000:02:00.0: PGRAPH_TRAP_CCACHE_FAULT - VM: Trapped read
> at 00412a2000 status 00000560 00000000 channel 2
> kernel: [drm] nouveau 0000:02:00.0: PGRAPH_TRAP_CCACHE_FAULT - 00000000
> 00000000 00000000 00000000 00000000 00000000 00000000

The same X freeze happened with -124 just 10 minutes ago. I was able to "telinit 3", kill X and shut down the machine cleanly, so only X is affected by this. I guess if this happens again I'll go back to intel_iommu=off.

Coincidence: I've also just had a freeze of my computer (no answer from mouse or keyboard, not even the CapsLock LED working) with 2.6.33.5-124.fc13.x86_64. I didn't try to log in remotely and just pressed the reset button. The last three lines of /var/log/messages are:

    kernel: [drm] nouveau 0000:0f:00.0: PFIFO_DMA_PUSHER - Ch 2
    kernel: [drm] nouveau 0000:0f:00.0: PFIFO_CACHE_ERROR - Ch 2/0 Mthd 0x0000 Data 0x03000000
    kernel: [drm] nouveau 0000:0f:00.0: PFIFO_DMA_PUSHER - Ch 2

So, what do we lose by going back to intel_iommu=off?
Is there a reason why it shouldn't be the default? What does this stuff do? (Remember, I am coming from bug 578108, which I suspect has the same origin as this bug, but I had different symptoms in previous kernels than everybody here.)

(In reply to comment #35)
> So, what do we lose by going back to intel_iommu=off? Is there a reason why
> it shouldn't be the default? What does this stuff do?

It disables an Intel virtualization feature (same as disabling VT-d in the BIOS). If you don't use virtualization you shouldn't lose anything.

The PFIFO_DMA_PUSHER one sounds like #566987. I see this bug (#561267) quite quickly if I enable VT-d in the BIOS, but if I disable it I can run for some time before eventually seeing #566987. If I also add intel_iommu=off, I have not seen either and now have an uptime of 5 days. (All with 2.6.33.5-124.fc13.x86_64.)

Just updated to the 2.6.34-45.fc14.x86_64 kernel on a machine that is using the rawhide repositories (actually Fedora 14, but I don't think there's a difference as yet). Looks like the problem might have gone away. Not sure about stability though; so far it's been up for a couple of hours and nothing has gone wrong.

Up 1 day, 18 hours and counting. No more DMAR log messages. VT-d enabled and running virtualised machines. This is the same machine as I mentioned in comment #8. Looks like this is fixed in kernel 2.6.34-45.

Spoke too soon. Machine just locked up after 2 days running. X locked up, ssh not working, pings not being answered. Nothing in /var/log/messages. I noticed a similar problem with the previous version 2.6.34 (not sure, but possibly 2.6.34-40). Obviously this is different to the original problem; however, a completely different machine without an nvidia card, but using the same kernel and with VT-d turned on, hasn't locked up, suggesting there's possibly still some linkage to nvidia.

Now trying kernel 2.6.35-0.2.rc3.git0.fc14.x86_64. No DMAR messages. Okay, let's see if this lasts more than 2 days before freezing.
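Several reports above mix the BIOS VT-d setting with the intel_iommu=off, iommu=soft and pcie_aspm=off boot options, so it helps to confirm which options the running kernel actually received. On Linux these are visible in /proc/cmdline. A minimal sketch, assuming options are separated by spaces or tabs; `cmdline_has_option` and `read_proc_cmdline` are hypothetical helper names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Return true if `opt` (e.g. "intel_iommu=off") appears as a complete
 * whitespace-separated token in `cmdline`. A substring hit inside a
 * longer token (e.g. "iommu=off" inside "intel_iommu=off") is rejected. */
bool cmdline_has_option(const char *cmdline, const char *opt)
{
    size_t len = strlen(opt);
    const char *p = cmdline;

    assert(cmdline != NULL && opt != NULL && len > 0);
    while ((p = strstr(p, opt)) != NULL) {
        bool starts_token = (p == cmdline) || p[-1] == ' ' || p[-1] == '\t';
        char end = p[len];
        bool ends_token = end == '\0' || end == ' ' || end == '\t' || end == '\n';
        if (starts_token && ends_token)
            return true;
        p++; /* keep searching after this partial match */
    }
    return false;
}

/* Read the running kernel's command line into buf; returns false on error. */
bool read_proc_cmdline(char *buf, size_t size)
{
    FILE *f = fopen("/proc/cmdline", "r");
    if (f == NULL)
        return false;
    bool ok = fgets(buf, (int)size, f) != NULL;
    fclose(f);
    return ok;
}
```

For example, `cmdline_has_option(buf, "intel_iommu=off")` after a successful `read_proc_cmdline(buf, sizeof buf)` tells you whether the IOMMU was disabled on the kernel command line for this boot. Matching whole tokens avoids counting `intel_iommu=off` as a hit for `iommu=off`. A BIOS-level VT-d setting does not show up here.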
Michal, do you agree with comment 41? Is it fixed in 2.6.34-45? Thank you

Just to clarify this bug, it's referring to nouveau triggering IOMMU faults. This bug *is* fixed. Any lockups (and there are known issues on NV86 and NVA3/NVA5/NVA8) are different bugs.

Fair enough Ben, but where then do you want such reports to go? I have now seen another lockup with 2.6.33.5-124.fc13.x86_64 + intel_iommu=off + NV92. Is there a bug open for mysterious-NV-lockups-which-used-to-DMAR-spew?

Have you seen the bug *with* intel_iommu=off or without? I don't think there's any link between the two bugs; the DMAR spew would happen on *any* NVIDIA chipset that's plugged into a board supporting VT-d.

Yes, with intel_iommu=off. When I first saw this it was a DMAR spew followed quickly by a lockup. Now it's just a rare lockup. It certainly could be different problems, but there is nothing in /var/log/messages or Xorg.0.log to characterize it better.

Okay, then I'm even more convinced now they're completely separate issues :) A new bug report for the lockup would be great, though as with all these random lockups, they're quite hard to track down.

(In reply to comment #42)
> Michal, do you agree with comment 41? Is it fixed in 2.6.34-45?

I didn't try this kernel; it's not available for F-13, but Ben fixed it in another build (see comment #24) and it's working for me (comment #25). So I agree that this bug is fixed.

Just had a hard lockup + reset as well, with kernel 2.6.33.5-124.fc13.x86_64. Hardware: http://www.smolts.org/client/show/pub_dbe294f3-62e0-40f9-b141-547eeb979466

Until a few weeks ago, before the DMA spewing was fixed, the X server would just hang, and I would ssh in and reboot the box. This time, however, it crashed by itself, too quickly for me to bring up the other machine and log in. There is nothing in /var/log/messages. I'm using pcie_aspm=off intel_iommu=off.

Ben, if you would like a distinct bug report for just the hard lockup I am happy to open one.
I just wish there was more useful detail to put in it. All I have right now is `machine <description> hard locks occasionally without warning, did not happen in F11, nothing in logs, pcie_aspm setting does not matter, intel_iommu setting does not matter, VT-d off in BIOS' (actually I need to confirm the latter; it's not explicit in my notes).

That is not giving you much signal. Is there some extra logging you can recommend I turn on to increase the log verbosity?

I was getting similar hard locks with a couple of 2.6.34 kernels that I tried (from rawhide/F14); see comment #38 - comment #41. Now running 2.6.35-0.2.rc3.git0.fc14.x86_64 and so far so good. Uptime so far 4 days 16 hours. I've got VT-d enabled and I'm *not* using pcie_aspm=off intel_iommu=off.

(In reply to comment #50)
> Ben, if you would like a distinct bug report for just the hard lockup I am
> happy to open one. I just wish there was more useful detail to put in it. All
> I have right now is `machine <description> hard locks occasionally without
> warning, did not happen in F11, nothing in logs, pcie_aspm setting does not
> matter, intel_iommu setting does not matter, VT-d off in BIOS' (actually I need
> to confirm the latter; it's not explicit in my notes).
>
> That is not giving you much signal. Is there some extra logging you can
> recommend I turn on to increase the log verbosity?

What *exact* chipset is your card? dmesg or the X log will be useful to know this.

Created attachment 428144
All nouveau traces from /var/log/messages+/var/log/Xorg.0.log
The kernel says: Detected an NV50 generation card (0x092a00a2)
The X server says: NOUVEAU(0): Chipset: "NVIDIA NV92"
See attached.
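The two lines above describe the same GPU. Assuming the register layout nouveau used at the time (bits 27:20 of the PMC_BOOT_0 value hold the chipset number, and chipsets 0x50 and 0x80 through 0xaf make up the NV50 generation), the kernel's 0x092a00a2 decodes to the X server's NV92. A small sketch of that decoding; the function names are hypothetical, and the bit layout is an assumption rather than something stated in this bug.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: bits 27:20 of the PMC_BOOT_0 value hold the chipset
 * number (e.g. 0x92 for NV92). */
unsigned int boot0_chipset(uint32_t boot0)
{
    return (boot0 & 0x0ff00000u) >> 20;
}

/* Assumed grouping: chipsets 0x50 and 0x80-0xaf form the NV50 generation. */
bool chipset_is_nv50_family(unsigned int chipset)
{
    switch (chipset & 0xf0) {
    case 0x50:
    case 0x80:
    case 0x90:
    case 0xa0:
        return true;
    default:
        return false;
    }
}
```

Under these assumptions `boot0_chipset(0x092a00a2)` returns 0x92 and `chipset_is_nv50_family(0x92)` returns true, consistent with both the kernel and X server lines.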
Okay, thanks. Then yeah, definitely file a new bug; that hang isn't a known one.

Done. #609764.

IMHO this should be closed. The (hard) lock issues are reported in several other bug reports already? Does the original reporter concur?

yes, I agree, see comment #48

Then let's close this one.

(In reply to comment #57)
> yes, I agree, see comment #48

Then please, you or bskeggs close it, because I don't have the rights to do it.

I'm sorry to say this, but the bug doesn't seem to be fixed, at least on my HW; I still get those DMAR messages and /var/log/messages is filling my hard drive. Disabling VT-d in the BIOS or using intel_iommu=off helps, so I guess it's the same bug. I'm running kernel-2.6.34.6-54.fc13.x86_64.

(In reply to comment #59)
> I'm sorry to say this, but the bug doesn't seem to be fixed, at least on my HW;
> I still get those DMAR messages and /var/log/messages is filling my hard drive.
> Disabling VT-d in the BIOS or using intel_iommu=off helps, so I guess it's the
> same bug. I'm running kernel-2.6.34.6-54.fc13.x86_64.

When you look at an error message similar to this:

    kernel: DMAR:[DMA Read] Request device [02:00.0] fault addr 0

does the device number (in this case 02:00.0, but yours may be different) match up to an nvidia video adapter in the output of lspci, or is it some other device?
In the original reporter's machine, device 02:00.0 is this:

    02:00.0 VGA compatible controller [0300]: nVidia Corporation Quadro NVS 290 [10de:042f] (rev a1)

In my case it's:

    Sep 20 15:09:13 dhcp-25-200 kernel: DRHD: handling fault status reg 2
    Sep 20 15:09:13 dhcp-25-200 kernel: DMAR:[DMA Read] Request device reg 2
    Sep 20 15:09:13 dhcp-25-200 kernel: DMAR:[DMA Read] Request device [0d:00.0] fault addr fffff000 reg 2
    Sep 20 15:09:13 dhcp-25-200 kernel: DMAR:[DMA Read] Request device [0d:00.0] fault addr fffffreg 2
    Sep 20 15:09:13 dhcp-25-200 kernel: DMAR:[DMA Read] Request device [0d:00.0] fault addr fffff000
    Sep 20 15:09:13 dhcp-25-200 kernel: DMreg 2
    Sep 20 15:09:13 dhcp-25-200 kernel: DMAR:[DMA Read] Requece [0d:00.0] fault addr ffreg 2
    Sep 20 15:09:13 dhcp-25-200 kernel: DMAR:[DMA Read] Request device [0d:00.0] fault addr fffff000
    Sep 20 15:09:13 dhcp-25-200 kernel: <reg 2
    Sep 20 15:09:13 dhcp-25-200 kernel: DMAR:[DMA Read] Request device [0d:00.0] fault addr fffff0reg 2
    Sep 20 15:09:13 dhcp-25-200 kernel: DMAR:[DMA Rereg 2
    Sep 20 15:09:13 dhcp-25-200 kernel: DMAR:[DMA Reace [0d:00.0] fault addr ffreg 2
    Sep 20 15:09:13 dhcp-25-200 kernel: DMAR:[DMA Read] Request device [0d:00.0] fault addr fffff000

and the device seems to be:

    0d:00.0 SD Host controller: Ricoh Co Ltd Device e822 (rev 01)

So it's probably a different component; sorry for the noise then.

(In reply to comment #61)
>
> and the device seems to be:
> 0d:00.0 SD Host controller: Ricoh Co Ltd Device e822 (rev 01)
>
> So it's probably a different component; sorry for the noise then.

That's bug 605888.

(In reply to comment #24)
> It appears reporter and others using at least F13 now, so re-targeting.
>
> This should be fixed in kernel-2.6.33.5-120.fc13
> (http://koji.fedoraproject.org/koji/buildinfo?buildID=177442).

Out of curiosity, which upstream commit fixes this bug?

Fwiw, it's 4eb3033c72099fab3536ed8ac54a5dc99f0832d7

The problem has resurfaced in Fedora 15.
My log is full of these messages:

    Jun 1 11:36:03 murraysj kernel: [ 3.084032] DRHD: handling fault status reg 2
    Jun 1 11:36:03 murraysj kernel: [ 3.084037] DMAR:[DMA Read] Request device [02:00.0] fault addr 0
    Jun 1 11:36:03 murraysj kernel: [ 3.084037] DMAR:[fault reason 06] PTE Read access is not set

The X session is sluggish and sometimes fails to respond completely. I run VT-d with Windows guests. The machine is a Dell Precision T3500 QuadCore.

    [root@murraysj ~]# lspci -v | grep VGA
    02:00.0 VGA compatible controller: nVidia Corporation NV43GL [Quadro FX 550] (rev a2) (prog-if 00 [VGA controller])
    [root@murraysj ~]# lspci -vn | grep VGA
    02:00.0 0300: 10de:014d (rev a2) (prog-if 00 [VGA controller])

The problem did not occur on this machine with Fedora 13 or Fedora 14. When running with nouveau the gnome3 graphics worked correctly (when they worked). I am now running the nVidia driver from rpmfusion; the error has vanished but the 3D graphics don't work, and I'm stuck in gnome3 fallback mode. FWIW, I am running Fedora 15 on a Dell Precision 390 with similar graphics hardware, and it does not have the problem. Is it possible to reopen this bugzilla or should a new one be started?

I disabled VT-d in the BIOS as suggested by an earlier post, and now the nouveau error has gone away. The Windows guest is also performing correctly. As I stated, the error in nouveau did not occur under Fedora 13 or 14, or even 12 I seem to recall. The computer hardware has not changed, just the level of Fedora. The nouveau driver in Fedora 15 appears to have interaction problems with VT-d that the earlier versions did not.
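The device check suggested earlier in the thread (match the [BB:DD.F] in a DMAR line against lspci) can be scripted. The sketch below parses the message format quoted in this thread, builds the sysfs path of the faulting device, and classifies it from the sysfs vendor and class attributes, which hold the same IDs lspci -vn prints (10de and 0300 above). It assumes PCI domain 0000, since the DMAR message does not print a domain; `struct dmar_fault` and all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PCI_VENDOR_NVIDIA 0x10de  /* vendor ID seen as "10de:..." in lspci -vn */
#define PCI_CLASS_VGA     0x0300  /* class "0300": VGA compatible controller */

/* One parsed DMAR fault report (hypothetical helper type). */
struct dmar_fault {
    unsigned int bus, dev, fn;   /* PCI address, e.g. 02:00.0 */
    unsigned long long addr;     /* faulting DMA address (hex in the log) */
};

/* Parse "... Request device [BB:DD.F] fault addr HEX" as printed in this
 * thread. Returns false if the line lacks a complete report. */
bool parse_dmar_fault(const char *line, struct dmar_fault *out)
{
    const char *p = strstr(line, "Request device [");
    if (p == NULL)
        return false;
    return sscanf(p, "Request device [%x:%x.%x] fault addr %llx",
                  &out->bus, &out->dev, &out->fn, &out->addr) == 4;
}

/* Build the sysfs path of one attribute of the faulting device.
 * Assumes PCI domain 0000, which the DMAR message does not print. */
void sysfs_attr_path(const struct dmar_fault *f, const char *attr,
                     char *buf, size_t size)
{
    snprintf(buf, size, "/sys/bus/pci/devices/0000:%02x:%02x.%x/%s",
             f->bus, f->dev, f->fn, attr);
}

/* Classify from the text of the sysfs "vendor" (e.g. "0x10de") and
 * "class" (e.g. "0x030000") files; class/subclass are the top 16 bits. */
bool is_nvidia_vga(const char *vendor_text, const char *class_text)
{
    unsigned long vendor = strtoul(vendor_text, NULL, 16);
    unsigned long cls = strtoul(class_text, NULL, 16);
    return vendor == PCI_VENDOR_NVIDIA && (cls >> 8) == PCI_CLASS_VGA;
}

/* Read a small sysfs attribute into buf; returns false on error. */
bool read_attr(const char *path, char *buf, size_t size)
{
    assert(size > 0);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return false;
    bool ok = fgets(buf, (int)size, f) != NULL;
    fclose(f);
    return ok;
}
```

For the Ricoh report above, `parse_dmar_fault` yields 0d:00.0 and `sysfs_attr_path` gives /sys/bus/pci/devices/0000:0d:00.0/class; reading that file with `read_attr` and passing it to `is_nvidia_vga` would rule out the NVIDIA adapter. Interleaved console output like the Sep 20 excerpt may not parse at all, or may yield a truncated address.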