Thought I'd start a new report as this doesn't seem the same as the various bad firmware iommu reports.

Description of problem:
Recurring DMAR & DRHD errors using an Nvidia graphics card (nouveau driver) with recent kernels. Same issue described in bug 490477 comment #42: https://bugzilla.redhat.com/show_bug.cgi?id=490477#c42.

Note that I've also got an Asus p6t deluxe v2 (1366/X58) as noted in the referenced comment. Also as noted, the fault sequence repeats continuously. System seems stable, but also as noted, the errors do seem to be slowing things down.

I did try both intel_iommu=igfx_on and _off. No change in behaviour.

Log example from video dmar error:
Nov 25 00:06:42 mail kernel: DRHD: handling fault status reg 2
Nov 25 00:06:42 mail kernel: DMAR:[DMA Read] Request device [02:00.0] fault addr 0
Nov 25 00:06:42 mail kernel: DMAR:[fault reason 06] PTE Read access is not set
Nov 25 00:06:42 mail kernel: DRHD: handling fault status reg 102
Nov 25 00:06:42 mail kernel: DMAR:[DMA Read] Request device [02:00.0] fault addr 0
Nov 25 00:06:42 mail kernel: DMAR:[fault reason 06] PTE Read access is not set
Nov 25 00:06:42 mail kernel: DRHD: handling fault status reg 202
Nov 25 00:06:42 mail kernel: DMAR:[DMA Read] Request device [02:00.0] fault addr 0
Nov 25 00:06:42 mail kernel: DMAR:[fault reason 06] PTE Read access is not set
Nov 25 00:06:42 mail kernel: DRHD: handling fault status reg 302
Nov 25 00:06:42 mail kernel: DMAR:[DMA Read] Request device [02:00.0] fault addr 0
Nov 25 00:06:42 mail kernel: DMAR:[fault reason 06] PTE Read access is not set
Nov 25 00:06:42 mail kernel: DRHD: handling fault status reg 402
Nov 25 00:06:42 mail kernel: DMAR:[DMA Read] Request device [02:00.0] fault addr 0
Nov 25 00:06:42 mail kernel: DMAR:[fault reason 06] PTE Read access is not set
Nov 25 00:06:42 mail kernel: DRHD: handling fault status reg 502
Nov 25 00:06:42 mail kernel: DMAR:[DMA Read] Request device [02:00.0] fault addr 0
Nov 25 00:06:42 mail kernel: DMAR:[fault reason 06] PTE Read access is not set

Also seeing same error on the Marvell sky2 driver.

Version-Release number of selected component (if applicable):
kernels at least since 2.6.31.5-127. Currently on 2.6.32rc8 (git) + nouveau from git.

Log from sky2 error:
Nov 22 06:14:01 mail kernel: DRHD: handling fault status reg 302
Nov 22 06:14:01 mail kernel: DMAR:[DMA Read] Request device [06:00.0] fault addr fff742bfe000
Nov 22 06:14:01 mail kernel: DMAR:[fault reason 06] PTE Read access is not set
Nov 22 06:14:01 mail kernel: sky2 0000:06:00.0: error interrupt status=0xc0000000
Nov 22 06:14:01 mail kernel: sky2 0000:06:00.0: PCI hardware error (0x2010)

Followed by kerneloops:
------------[ cut here ]------------
WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0xf3/0x164()
Hardware name: System Product Name
NETDEV WATCHDOG: eth0 (sky2): transmit queue 0 timed out
Modules linked in: tun iptable_raw iptable_mangle ipt_MASQUERADE iptable_nat nf_nat bridge stp appletalk psnap llc nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc hwmon_vid coretemp acpi_cpufreq sit tunnel4 ipt_LOG nf_conntrack_netbios_ns nf_conntrack_ftp nf_conntrack_ipv6 xt_multiport ip6table_filter xt_DSCP xt_dscp xt_MARK ip6table_mangle ip6_tables ipv6 dm_multipath raid1 kvm_intel kvm snd_hda_codec_analog snd_hda_intel snd_ens1371 gameport snd_hda_codec snd_rawmidi snd_ac97_codec ac97_bus snd_hwdep snd_seq snd_seq_device snd_pcm gspca_spca505 gspca_main snd_timer videodev iTCO_wdt snd ata_generic pata_acpi firewire_ohci asus_atk0110 v4l1_compat i2c_i801 sky2 soundcore v4l2_compat_ioctl32 firewire_core pcspkr snd_page_alloc wmi iTCO_vendor_support hwmon crc_itu_t pata_marvell raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx nouveau ttm drm_kms_helper drm agpgart nvidiafb fb fb_ddc i2c_algo_bit cfbcopyarea vgastate i2c_core cfbimgblt cfbfillrect [last unloaded: nbd]
Pid: 14, comm: ksoftirqd/5 Tainted: G W 2.6.32-rc8 #1
Call Trace:
<IRQ> [<ffffffff81055319>] warn_slowpath_common+0x7c/0x94
[<ffffffff81055388>] warn_slowpath_fmt+0x41/0x43
[<ffffffff813f2932>] dev_watchdog+0xf3/0x164
[<ffffffff814803d1>] ? sub_preempt_count+0xe/0x4e
[<ffffffff8147d990>] ? _spin_unlock_irqrestore+0x67/0x69
[<ffffffff814803d1>] ? sub_preempt_count+0xe/0x4e
[<ffffffff81065578>] run_timer_softirq+0x1c6/0x284
[<ffffffff813f283f>] ? dev_watchdog+0x0/0x164
[<ffffffff8105cbcd>] __do_softirq+0x115/0x1f6
[<ffffffff81012f0c>] call_softirq+0x1c/0x30
<EOI> [<ffffffff81014493>] do_softirq+0x4b/0xa6
[<ffffffff8105c7ea>] ksoftirqd+0x96/0x167
[<ffffffff8105c754>] ? ksoftirqd+0x0/0x167
[<ffffffff81074f00>] kthread+0x7f/0x87
[<ffffffff81012e0a>] child_rip+0xa/0x20
[<ffffffff8104b073>] ? finish_task_switch+0x50/0xa8
[<ffffffff81012741>] ? restore_args+0x0/0x30
[<ffffffff81074e81>] ? kthread+0x0/0x87
[<ffffffff81012e00>] ? child_rip+0x0/0x20
---[ end trace 57f7151f6a5def07 ]---
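Since the fault sequence repeats continuously, it can help to tally the faults per device and address before attaching a log. The following is only a rough sketch: the function name `count_faults` and the regular expression are my own, and the pattern is based solely on the "DMAR:[DMA Read] Request device [...] fault addr ..." lines shown in this report (the "Write" variant is an assumption).

```python
import re
from collections import Counter

# Pattern modelled on the lines quoted in this report, e.g.
#   DMAR:[DMA Read] Request device [02:00.0] fault addr 0
# The "Write" alternative is an assumption; only "Read" appears in the logs here.
FAULT_RE = re.compile(
    r"DMAR:\[DMA (?P<kind>Read|Write)\] Request device \[(?P<dev>[0-9a-f:.]+)\] "
    r"fault addr (?P<addr>[0-9a-f]+)"
)


def count_faults(lines):
    """Return a Counter keyed by (device, access kind, fault address as hex text)."""
    counts = Counter()
    for line in lines:
        match = FAULT_RE.search(line)
        if match:
            counts[(match.group("dev"), match.group("kind"), match.group("addr"))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Nov 25 00:06:42 mail kernel: DMAR:[DMA Read] Request device [02:00.0] fault addr 0",
        "Nov 25 00:06:42 mail kernel: DMAR:[fault reason 06] PTE Read access is not set",
        "Nov 22 06:14:01 mail kernel: DMAR:[DMA Read] Request device [06:00.0] fault addr fff742bfe000",
    ]
    for key, n in count_faults(sample).most_common():
        print(key, n)
```

Feeding it the lines of /var/log/messages would show at a glance whether the video card (always fault addr 0) and the sky2 NIC (a real-looking address) are faulting in different ways.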
(In reply to comment #0)
> Thought I'd start a new report as this doesn't seem the same as the various bad
> firmware iommu reports.

I think ASUS iommu errors are all the same issue. Bad BIOSes. I've had a P45M (ASUS laptop) and a new core i5 desktop (ASUS motherboard) have DMAR errors spewing out from the kernel. Turning off VT-d solves the problem. USB becomes broken on the desktop with VT-d on.

It's unfortunate that this issue was not seen on 2.6.30 and lower kernels. Will a work-around be applied to 2.6.31+ kernels or are kernel maintainers going to give us a cold shoulder and say talk to ASUS?
I'd have a look at comment #66 here: https://bugzilla.redhat.com/show_bug.cgi?id=533952
(In reply to comment #1)
> (In reply to comment #0)
> > Thought I'd start a new report as this doesn't seem the same as the various bad
> > firmware iommu reports.
>
> I think ASUS iommu errors are all the same issue. Bad BIOSes. I've had a P45M
> (ASUS laptop) and a new core i5 desktop (ASUS motherboard) have DMAR errors
> spewing out from the kernel. Turning off VT-d solves the problem. USB becomes
> broken on the desktop with VT-d on. It's unfortunate that this issue was not
> seen on 2.6.30 and lower kernels. Will a work-around be applied to 2.6.31+
> kernels or are kernel maintainers going to give us a cold shoulder and say talk
> to ASUS?

Same problem here. I have an Asus N61V with kernel 2.6.32.7-37.fc12.x86_64, and when I turn VT-d off in the BIOS, the notebook works fine.
FYI - as of kernel 2.6.33 rc6 (git) this appears resolved. -- Fedora Bugzappers volunteer triage team https://fedoraproject.org/wiki/BugZappers
I'm getting thousands of these:

Mar 10 12:11:44 localhost kernel: DRHD: handling fault status reg 3
Mar 10 12:11:44 localhost kernel: DMAR:[DMA Read] Request device [01:00.0] fault addr 0
Mar 10 12:11:44 localhost kernel: DMAR:[fault reason 06] PTE Read access is not set

with kernel-2.6.33-1.fc13.x86_64

Hardware is HP dc7900 (Intel(R) Core(TM)2 Quad CPU Q9400), Intel ICH10 chipset

I'm not sure if VT-d is on or off; don't know how to find out without rebooting.
Forgot about this one... it's solved for me.

The sky2 issue was a sky2 driver bug - dma transmit buffers were never unmapped. This was fixed during the 2.6.33 cycle, and backported to 2.6.32.

The Nouveau issue *may* have been triggered by sky2 consuming and not releasing dma buffer space. It may also have been something else entirely. Regardless, for me, that's also solved (using the nouveau driver in the 2.6.33 kernel git staging tree).

Stefan: as to your issue - I'd suggest that it's not at all the same and should probably get its own report. You should include your dmesg, which would show the status of intel_iommu (i.e., whether or not VT-d is enabled). If you can rebuild your kernel, you can also enable dma debugging. That should result in a useful stack trace that can be added to your bug report.

I'm also closing my report, as it's fixed.
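On finding out whether VT-d is on without rebooting: a rough sketch of the checks, assuming Python is available. The helper names are made up for illustration. The `intel_iommu=` boot parameter and the DMAR/DRHD log prefixes come from this thread; the `/sys/firmware/acpi/tables/DMAR` path is my assumption about where the kernel exposes the firmware's DMAR table, and it may not be present on every kernel.

```python
import os
import re


def intel_iommu_option(cmdline):
    """Return the value of intel_iommu= from a kernel command line, or None."""
    match = re.search(r"(?:^|\s)intel_iommu=(\S+)", cmdline)
    return match.group(1) if match else None


def dmesg_mentions_dmar(dmesg_text):
    """True if any log line contains the DMAR or DRHD prefixes seen in this report."""
    return any("DMAR" in line or "DRHD" in line for line in dmesg_text.splitlines())


def firmware_exposes_dmar(acpi_tables_dir="/sys/firmware/acpi/tables"):
    """True if a DMAR table file exists under the (assumed) ACPI tables directory."""
    return os.path.exists(os.path.join(acpi_tables_dir, "DMAR"))


if __name__ == "__main__":
    with open("/proc/cmdline") as fh:
        print("intel_iommu option:", intel_iommu_option(fh.read()))
    print("DMAR table exposed by firmware:", firmware_exposes_dmar())
```

If DMAR/DRHD fault lines are already showing up in the log, as in comment #5, the IOMMU is evidently active; the checks above are mostly useful for confirming that after changing the BIOS setting or the boot parameter.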
Michael, thanks for your helpful comment. I set out to install a kernel with dma debugging enabled today, only to find out that I'm not sure which knobs to touch. Interestingly, the Fedora kernel is already built with CONFIG_DMA_API_DEBUG=y. Is this the switch you were talking about? If it is, then how do I enable stack traces?
You should have (or be able to mount) debugfs and see a dma-api directory. There's a little documentation in the kernel doc directory.

For example, in my fstab I've got:

sysfs /sys sysfs rw,relatime 0 0
debug /sys/kernel/debug debugfs 0 0

So in this case you should have /sys/kernel/debug/dma-api.

Assuming it's enabled, then you would get a stack trace on the first detected error. If you set dma-api/all_errors to 1, then you'd get repeated traces. There are some other options commented in the source code for dma api debugging.

In my case (sky2), I didn't get stack traces, but quickly had debugging disabled as the table entries ran out (dma buffers weren't being freed). dma-api/min_free_entries went to zero... no stack trace, but still useful data.
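To watch those counters over time, something like the sketch below could be run periodically as root. It is only an illustration: `read_dma_debug` and `entries_exhausted` are names I made up, and the default directory is taken from the shell session later in this thread, so pass a different path if your kernel exposes another name.

```python
import os


def read_dma_debug(directory="/sys/kernel/debug/dma-api"):
    """Return {file name: stripped contents} for each regular file in directory."""
    values = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        try:
            with open(path) as fh:
                values[name] = fh.read().strip()
        except OSError:
            # Some debugfs files may not be readable; record that rather than fail.
            values[name] = None
    return values


def entries_exhausted(values):
    """True if min_free_entries reached zero, i.e. the tracking table ran out."""
    return values.get("min_free_entries") == "0"


if __name__ == "__main__":
    snapshot = read_dma_debug()
    for name, value in snapshot.items():
        print(name, value)
    print("table exhausted:", entries_exhausted(snapshot))
```

A min_free_entries that keeps falling between snapshots would point at mappings that are never released, which is what happened with sky2.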
(In reply to comment #8)

Thanks again. Unfortunately, I cannot get stack traces for the life of me. I mounted debugfs and keep getting the errors. However, there is no stack trace to be seen. Also, num_free_entries stays the same forever.

[root@tinker dma-api]# pwd
/sys/kernel/debug/dma-api
[root@tinker dma-api]# for d in *; do echo $d; cat $d; done
all_errors
0
disabled
N
driver_filter
error_count
0
min_free_entries
27999
num_errors
1
num_free_entries
28149

It's also not that the kernel is not able to do stack traces in principle, because at shutdown, it always generates a "possible recursive locking" warning, and there it spits out a nice stack trace.
So I've been getting a lot of the same issues with my ASUS motherboard as this bug. However, I'm seeing it with the nvidia card I have instead of the sky2 nic.

[drm] nouveau 0000:03:00.0: Allocating FIFO number 2
[drm] nouveau 0000:03:00.0: nouveau_channel_alloc: initialised FIFO 2
DRHD: handling fault status reg 2
DMAR:[DMA Read] Request device [03:00.0] fault addr 0
DMAR:[fault reason 06] PTE Read access is not set
DRHD: handling fault status reg 102
DMAR:[DMA Read] Request device [03:00.0] fault addr 0
DMAR:[fault reason 06] PTE Read access is not set
DRHD: handling fault status reg 202
... etc.

Eventually the automatic bug reporting tool dumps out a message...

------------[ cut here ]------------
WARNING: at drivers/pci/intel-iommu.c:3791 init_dmars+0x373/0x739()
Hardware name: System Product Name
Your BIOS is broken; DMA routed to ISOCH DMAR unit but no TLB space.
BIOS vendor: American Megatrends Inc.; Ver: 0703 ; Product Version: System Version
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.33.3-85.fc13.x86_64 #1
Call Trace:
[<ffffffff8104b558>] warn_slowpath_common+0x77/0x8f
[<ffffffff8104b5bd>] warn_slowpath_fmt+0x3c/0x3e
[<ffffffff81bd0aec>] init_dmars+0x373/0x739
[<ffffffff81bd113d>] intel_iommu_init+0x28b/0x376
[<ffffffff81badb69>] ? pci_iommu_init+0x0/0x31
[<ffffffff81badb73>] pci_iommu_init+0xa/0x31
[<ffffffff8100205f>] do_one_initcall+0x59/0x154
[<ffffffff81ba6762>] kernel_init+0x210/0x26a
[<ffffffff8100a924>] kernel_thread_helper+0x4/0x10
[<ffffffff81ba6552>] ? kernel_init+0x0/0x26a
[<ffffffff8100a920>] ? kernel_thread_helper+0x0/0x10

I was curious what the "resolution" for this bug was, and could there be something similar with the nouveau driver as well? Should I put what I'm describing in a different bug?
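For anyone puzzled by the "fault status reg 2, 102, 202 ..." progression: as far as I can tell the value is printed in hex and follows the VT-d fault status register layout, where bit 0 flags a fault overflow, bit 1 flags a pending primary fault, and bits 8-15 hold the index of the fault record. That reading is my own interpretation of the spec, not something confirmed in this thread, and `decode_fault_status` is a made-up helper name.

```python
def decode_fault_status(hex_text):
    """Decode the hex value printed after 'handling fault status reg'.

    Bit positions are an assumption based on the VT-d fault status register
    layout: bit 0 = overflow, bit 1 = primary fault pending, bits 8-15 = index
    of the fault recording register.
    """
    value = int(hex_text, 16)
    return {
        "overflow": bool(value & 0x1),
        "pending": bool(value & 0x2),
        "record_index": (value >> 8) & 0xFF,
    }


if __name__ == "__main__":
    for text in ["2", "102", "202", "3"]:
        print(text, decode_fault_status(text))
```

Under that reading, 2, 102, 202 are just successive fault records being filled, while the "reg 3" seen in comment #5 would additionally mean the records overflowed.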
[14380.067103] CE: hpet increased min_delta_ns to 11250 nsec
[14381.882656] CE: hpet increased min_delta_ns to 16875 nsec
[14398.012915] CE: hpet increased min_delta_ns to 25312 nsec
[14811.019640] DRHD: handling fault status reg 2
[14811.019652] DMAR:[DMA Read] Request device [06:00.0] fault addr fffd8000
[14811.019655] DMAR:[fault reason 06] PTE Read access is not set
[14811.052205] DRHD: handling fault status reg 102
[14811.052217] DMAR:[DMA Read] Request device [06:00.0] fault addr fff8f000
[14811.052220] DMAR:[fault reason 06] PTE Read access is not set

Lenovo IdeaPad Y550P with an nVidia GeForce 240M. Using the binary nvidia driver, not nouveau. The ethernet is tg3, not sky2 (My desktop at home has a sky2 chip and I've never seen this error - running Arch Linux on the desktop). VT-d is ENABLED on my laptop right now.