Bug 541397
| Summary: | DMAR and DRHD errors - sky2 & nouveau vt-d & intel_iommu | ||
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Michael Breuer <mbreuer> |
| Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> |
| Status: | CLOSED UPSTREAM | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | medium | Docs Contact: | |
| Priority: | low | ||
| Version: | rawhide | CC: | david.brown, dougsland, gansalmon, hongjiu.lu, itamar, james, kernel-maint, mhlavink, mike, stefanrin, vedran, zxvdr.au |
| Target Milestone: | --- | Keywords: | Triaged |
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2010-03-12 19:38:47 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||

Description: Michael Breuer, 2009-11-25 18:55:21 UTC
(In reply to comment #0)
> Thought I'd start a new report as this doesn't seem the same as the various bad firmware iommu reports.

I think the ASUS IOMMU errors are all the same issue: bad BIOSes. Both a P45M (ASUS laptop) and a new Core i5 desktop (ASUS motherboard) of mine have had DMAR errors spewing out of the kernel. Turning off VT-d solves the problem. USB is broken on the desktop with VT-d on. It's unfortunate that this issue was not seen on 2.6.30 and earlier kernels. Will a workaround be applied to 2.6.31+ kernels, or are the kernel maintainers going to give us the cold shoulder and tell us to talk to ASUS?

I'd have a look at comment #66 here: https://bugzilla.redhat.com/show_bug.cgi?id=533952

(In reply to comment #1)
> [...] Turning off VT-d solves the problem. USB is broken on the desktop with VT-d on. [...]

Same problem here. I have an ASUS N61V with kernel 2.6.32.7-37.fc12.x86_64, and when I turn VT-d off in the BIOS, the notebook works fine!

FYI: as of kernel 2.6.33-rc6 (git), this appears resolved.
-- Fedora Bugzappers volunteer triage team https://fedoraproject.org/wiki/BugZappers

I'm getting thousands of these:

    Mar 10 12:11:44 localhost kernel: DRHD: handling fault status reg 3
    Mar 10 12:11:44 localhost kernel: DMAR:[DMA Read] Request device [01:00.0] fault addr 0
    Mar 10 12:11:44 localhost kernel: DMAR:[fault reason 06] PTE Read access is not set

This is with kernel-2.6.33-1.fc13.x86_64. The hardware is an HP dc7900 (Intel(R) Core(TM)2 Quad CPU Q9400) with an Intel ICH10 chipset. I'm not sure whether VT-d is on or off, and I don't know how to find out without rebooting.

Forgot about this one... it's solved for me. The sky2 issue was a sky2 driver bug: DMA transmit buffers were never unmapped. This was fixed during the 2.6.33 cycle and backported to 2.6.32. The nouveau issue *may* have been triggered by sky2 consuming and never releasing DMA buffer space, or it may have been something else entirely. Regardless, for me that's also solved (using the nouveau driver in the 2.6.33 kernel git staging tree).

Stefan: as to your issue, I'd suggest that it's not at all the same and should probably get its own report. Include your dmesg output, which shows the status of intel_iommu (i.e., whether or not VT-d is enabled). If you can rebuild your kernel, you can also enable DMA debugging. That should produce a useful stack trace that can be added to your bug report. I'm also closing my report, as it's fixed.

Michael, thanks for your helpful comment. I set out to install a kernel with DMA debugging enabled today, only to find that I'm not sure which knobs to touch. Interestingly, the Fedora kernel is already built with CONFIG_DMA_API_DEBUG=y. Is this the switch you were talking about? If so, how do I enable stack traces?
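With CONFIG_DMA_API_DEBUG=y, the DMA-API debug code exposes its state as small files in a dma-api directory under debugfs. Below is a minimal Python sketch for dumping those files and switching on all_errors. It assumes debugfs is mounted at /sys/kernel/debug and that it runs as root. The helper names (`read_dma_api`, `enable_all_errors`, `table_exhausted`) are hypothetical. The exhaustion check is only a heuristic modelled on the sky2 case described above, where buffers that were never unmapped filled the tracking table.

```python
import os

# Assumption: debugfs is mounted at /sys/kernel/debug and the kernel was
# built with CONFIG_DMA_API_DEBUG=y. Reading/writing these files needs root.
DMA_API_DIR = "/sys/kernel/debug/dma-api"


def read_dma_api(path=DMA_API_DIR):
    """Return {file name: stripped contents} for every file in dma-api.

    Roughly equivalent to: for d in *; do echo $d; cat $d; done
    """
    stats = {}
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        if os.path.isfile(full):
            with open(full) as f:
                stats[name] = f.read().strip()
    return stats


def enable_all_errors(path=DMA_API_DIR):
    """Write 1 to all_errors so every detected error is reported, not just the first."""
    with open(os.path.join(path, "all_errors"), "w") as f:
        f.write("1\n")


def table_exhausted(stats):
    """Hypothetical heuristic: the entry table ran dry if min_free_entries
    reached 0 or the debug code has disabled itself (disabled == "Y")."""
    return stats.get("min_free_entries") == "0" or stats.get("disabled") == "Y"
```

For example, calling `enable_all_errors()` and then re-reading with `read_dma_api()` after the faults appear shows whether error_count and num_errors are moving, and whether num_free_entries is draining.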

You should have (or be able to mount) debugfs and see a dma-api directory. There's a little documentation in the kernel's Documentation directory. For example, my fstab has:

    sysfs /sys sysfs rw,relatime 0 0
    debug /sys/kernel/debug debugfs 0 0

So in this case you should have /sys/kernel/debug/dma-api. Assuming it's enabled, you would get a stack trace on the first detected error. If you set dma-api/all_errors to 1, you'd get repeated traces. There are some other options commented in the source code for DMA API debugging. In my case (sky2), I didn't get stack traces, but debugging was quickly disabled because the table entries ran out (DMA buffers weren't being freed): dma-api/min_free_entries went to zero. No stack trace, but still useful data.

(In reply to comment #8)
Thanks again. Unfortunately, I cannot get stack traces for the life of me. I mounted debugfs and keep getting the errors, but there is no stack trace to be seen. Also, num_free_entries stays the same forever.

    [root@tinker dma-api]# pwd
    /sys/kernel/debug/dma-api
    [root@tinker dma-api]# for d in *; do echo $d; cat $d; done
    all_errors
    0
    disabled
    N
    driver_filter
    error_count
    0
    min_free_entries
    27999
    num_errors
    1
    num_free_entries
    28149

It's not that the kernel can't produce stack traces in principle: at shutdown it always generates a "possible recursive locking" warning, and there it spits out a nice stack trace.

So I've been getting a lot of the same issues as this bug with my ASUS motherboard. However, I'm seeing them with my NVIDIA card instead of the sky2 NIC.

    [drm] nouveau 0000:03:00.0: Allocating FIFO number 2
    [drm] nouveau 0000:03:00.0: nouveau_channel_alloc: initialised FIFO 2
    DRHD: handling fault status reg 2
    DMAR:[DMA Read] Request device [03:00.0] fault addr 0
    DMAR:[fault reason 06] PTE Read access is not set
    DRHD: handling fault status reg 102
    DMAR:[DMA Read] Request device [03:00.0] fault addr 0
    DMAR:[fault reason 06] PTE Read access is not set
    DRHD: handling fault status reg 202
    ... etc.
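Each batch of faults in these logs is introduced by a "DRHD: handling fault status reg N" line. N is 3 in Stefan's log, and 2, 102 and 202 in the one just above. Below is a small Python sketch for decoding N. It assumes the kernel prints the VT-d Fault Status Register in hex, and it uses the bit layout from the Intel VT-d specification, which Linux mirrors in its DMA_FSTS_* masks. `decode_fsts` is a hypothetical helper name.

```python
# Fault Status Register bits per the Intel VT-d specification (assumed
# layout; only the bits relevant to these logs are named here).
FSTS_BITS = {
    0: "PFO (primary fault overflow)",
    1: "PPF (primary pending fault)",
    4: "IQE (invalidation queue error)",
    5: "ICE (invalidation completion error)",
    6: "ITE (invalidation time-out error)",
}


def decode_fsts(hex_value):
    """Decode the hex value from 'DRHD: handling fault status reg N'."""
    value = int(hex_value, 16)
    flags = [name for bit, name in FSTS_BITS.items() if value & (1 << bit)]
    # Bits 15:8 are the Fault Record Index: which fault recording
    # register holds the pending fault.
    fault_record_index = (value >> 8) & 0xFF
    return {"flags": flags, "fault_record_index": fault_record_index}


if __name__ == "__main__":
    for reg in ["3", "2", "102", "202"]:
        print(reg, decode_fsts(reg))
```

Read this way, 2, 102 and 202 are each a pending primary fault, reported through fault recording registers 0, 1 and 2 in turn. The value 3 also sets the overflow bit, meaning more faults arrived than the recording registers could hold. The decode does not say *why* the device faulted; the "fault reason" line that follows does that.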

Eventually the automatic bug reporting tool dumps out a message:

    ------------[ cut here ]------------
    WARNING: at drivers/pci/intel-iommu.c:3791 init_dmars+0x373/0x739()
    Hardware name: System Product Name
    Your BIOS is broken; DMA routed to ISOCH DMAR unit but no TLB space.
    BIOS vendor: American Megatrends Inc.; Ver: 0703 ; Product Version: System Version
    Modules linked in:
    Pid: 1, comm: swapper Not tainted 2.6.33.3-85.fc13.x86_64 #1
    Call Trace:
     [<ffffffff8104b558>] warn_slowpath_common+0x77/0x8f
     [<ffffffff8104b5bd>] warn_slowpath_fmt+0x3c/0x3e
     [<ffffffff81bd0aec>] init_dmars+0x373/0x739
     [<ffffffff81bd113d>] intel_iommu_init+0x28b/0x376
     [<ffffffff81badb69>] ? pci_iommu_init+0x0/0x31
     [<ffffffff81badb73>] pci_iommu_init+0xa/0x31
     [<ffffffff8100205f>] do_one_initcall+0x59/0x154
     [<ffffffff81ba6762>] kernel_init+0x210/0x26a
     [<ffffffff8100a924>] kernel_thread_helper+0x4/0x10
     [<ffffffff81ba6552>] ? kernel_init+0x0/0x26a
     [<ffffffff8100a920>] ? kernel_thread_helper+0x0/0x10

I was curious what the "resolution" for this bug was. Could something similar be going on with the nouveau driver as well? Should I put what I'm describing in a different bug?

    [14380.067103] CE: hpet increased min_delta_ns to 11250 nsec
    [14381.882656] CE: hpet increased min_delta_ns to 16875 nsec
    [14398.012915] CE: hpet increased min_delta_ns to 25312 nsec
    [14811.019640] DRHD: handling fault status reg 2
    [14811.019652] DMAR:[DMA Read] Request device [06:00.0] fault addr fffd8000
    [14811.019655] DMAR:[fault reason 06] PTE Read access is not set
    [14811.052205] DRHD: handling fault status reg 102
    [14811.052217] DMAR:[DMA Read] Request device [06:00.0] fault addr fff8f000
    [14811.052220] DMAR:[fault reason 06] PTE Read access is not set

This is a Lenovo IdeaPad Y550P with an nVidia GeForce 240M, using the binary nvidia driver, not nouveau. The Ethernet is tg3, not sky2. My desktop at home has a sky2 chip and I've never seen this error there; it runs Arch Linux. VT-d is ENABLED on my laptop right now.
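When a log holds "thousands of these", counting faults per device makes it easier to see which PCI function is responsible. In the excerpts above these are 01:00.0, 03:00.0 and 06:00.0. Below is a minimal Python sketch. It assumes the "DMAR:[DMA Read] Request device [...] fault addr ..." and "DMAR:[fault reason NN] ..." line formats quoted in this report. `summarize_faults` is a hypothetical helper, not a kernel or Fedora tool.

```python
import re
from collections import Counter

# Line formats as quoted in this report. A syslog or dmesg timestamp
# prefix is tolerated because re.search scans the whole line.
REQUEST_RE = re.compile(
    r"DMAR:\[DMA (Read|Write)\] Request device "
    r"\[([0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\] fault addr [0-9a-f]+"
)
REASON_RE = re.compile(r"DMAR:\[fault reason \d+\] (.+)$")


def summarize_faults(lines):
    """Count DMAR faults per (PCI device, access type, reason text).

    Each 'fault reason' line is paired with the most recent unpaired
    'Request device' line; DRHD status lines and other noise are skipped.
    """
    counts = Counter()
    pending = None  # (device, access) from the latest request line
    for line in lines:
        request = REQUEST_RE.search(line)
        if request:
            pending = (request.group(2), request.group(1))
            continue
        reason = REASON_RE.search(line)
        if reason and pending:
            counts[pending + (reason.group(1).strip(),)] += 1
            pending = None
    return counts
```

For example, `summarize_faults(open("/var/log/messages")).most_common(5)` would list the five noisiest (device, access, reason) combinations. The device address can then be matched against `lspci` output to identify the card or NIC.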