Description of problem:
From time to time I get this message in /var/log/messages and dmesg:
kernel: [120431.630837] DMA: Out of SW-IOMMU space for 16 bytes at device 0000:00:1d.7
The message repeats thousands of times and the messages file grows to hundreds of megabytes. Eventually I have to reboot, most of the time because a panic occurs and occasionally because I can't use the USB network device.
Version-Release number of selected component (if applicable):
2.6.41.10-3.fc15.x86_64, but the problem has occurred in other kernel versions
How reproducible:
It happens after several hours or days of continuous use (it's a home server)
Steps to Reproduce:
1. Leave the system turned on
Actual results:
The system panics
Expected results:
The system should run as long as required
Additional info:
I think the device is a USB hub
If you install kernel-debug, does it print some warnings?
Sorry, I don't have experience with kernel debugging. What do I have to do? Just install the package and look for warnings when the problem occurs? Thanks
Yes, just do "yum install kernel-debug", boot the newly installed kernel, run the dmesg command from time to time and see if there is a "WARNING: at" message. The warning should also be detected by the ABRT tool.
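The check described above can be scripted. The following is a minimal sketch, not part of any Fedora tooling: it assumes dmesg lines begin with a bracketed timestamp and that kernel warnings follow it directly with "WARNING: at file:line symbol". The function name find_kernel_warnings and the sample text are illustrative only. Lines where a user-space daemon name precedes the word WARNING are deliberately skipped, since they are not kernel warnings.

```python
import re

# Assumed format: "[ timestamp] WARNING: at <file>:<line> <symbol> ...".
# A user-space message has "name[pid]:" between the timestamp and WARNING,
# so it does not match this pattern.
KERNEL_WARNING = re.compile(r"^\[\s*\d+\.\d+\]\s+WARNING: at (\S+?):(\d+)\s+(\S+)")


def find_kernel_warnings(dmesg_text):
    """Return (source_file, line_number, symbol) for each kernel WARNING line."""
    hits = []
    for line in dmesg_text.splitlines():
        m = KERNEL_WARNING.match(line.strip())
        if m:
            hits.append((m.group(1), int(m.group(2)), m.group(3)))
    return hits


# Illustrative input: one user-space warning and one kernel warning.
sample = """\
[ 135.692479] avahi-daemon[1469]: WARNING: No NSS support for mDNS detected, consider installing nss-mdns!
[ 689.292402] WARNING: at drivers/net/wireless/ath/ath9k/htc_drv_txrx.c:501 ath9k_htc_tx_process+0x3bb/0x3d0 [ath9k_htc]()
"""

for src, lineno, symbol in find_kernel_warnings(sample):
    print(f"{src}:{lineno} in {symbol}")
```

To run it against a live system, the text could come from subprocess.run(["dmesg"], capture_output=True, text=True).stdout instead of the sample string.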
Try the kernel currently in updates-testing too. It's been rebased to a newer upstream release, so it may have fixes in this area.
After two days running the debug kernel I found these warnings:
[ 135.692479] avahi-daemon[1469]: WARNING: No NSS support for mDNS detected, consider installing nss-mdns!
[ 689.292402] WARNING: at drivers/net/wireless/ath/ath9k/htc_drv_txrx.c:501 ath9k_htc_tx_process+0x3bb/0x3d0 [ath9k_htc]()
I will try the new kernel tomorrow.
After trying 2.6.42.3-2.fc15.x86_64.debug I got the same warnings:
[ 146.880902] avahi-daemon[1441]: WARNING: No NSS support for mDNS detected, consider installing nss-mdns!
[ 446.314648] WARNING: at drivers/net/wireless/ath/ath9k/htc_drv_txrx.c:501 ath9k_htc_tx_process+0x3bb/0x3d0 [ath9k_htc]()
The error has not appeared yet, but sometimes several days pass without it.
I'm getting something similar too. This is on a Lenovo X220. FC16, kernel:
3.2.7-1.fc16.x86_64 #1 SMP Tue Feb 21 01:40:47 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
This message is repeated in /var/log/messages about 10 times a second:
DMA: Out of SW-IOMMU space for 92 bytes at device 0000:03:00.0
lspci shows the problem appears to be with the Realtek wireless LAN (wifi) adapter, which makes sense as wireless networking stops working at this point:
# lspci
00:00.0 Host bridge: Intel Corporation 2nd Generation Core Processor Family DRAM Controller (rev 09)
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09)
00:16.0 Communication controller: Intel Corporation 6 Series/C200 Series Chipset Family MEI Controller #1 (rev 04)
00:19.0 Ethernet controller: Intel Corporation 82579LM Gigabit Network Connection (rev 04)
00:1a.0 USB Controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2 (rev 04)
00:1b.0 Audio device: Intel Corporation 6 Series/C200 Series Chipset Family High Definition Audio Controller (rev 04)
00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 1 (rev b4)
00:1c.1 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 2 (rev b4)
00:1c.3 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 4 (rev b4)
00:1c.4 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 5 (rev b4)
00:1c.6 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 7 (rev b4)
00:1d.0 USB Controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1 (rev 04)
00:1f.0 ISA bridge: Intel Corporation QM67 Express Chipset Family LPC Controller (rev 04)
00:1f.2 SATA controller: Intel Corporation 6 Series/C200 Series Chipset Family 6 port SATA AHCI Controller (rev 04)
00:1f.3 SMBus: Intel Corporation 6 Series/C200 Series Chipset Family SMBus Controller (rev 04)
03:00.0 Network controller: Realtek Semiconductor Co., Ltd. RTL8188CE 802.11b/g/n WiFi Adapter (rev 01)
0d:00.0 System peripheral: Ricoh Co Ltd Device e823 (rev 07)
0e:00.0 USB Controller: NEC Corporation uPD720200 USB 3.0 Host Controller (rev 04)
There seems to be no way to fix this but to reboot. The problem begins to happen at random times after a reboot.
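Since the message can repeat thousands of times, it can help to condense the log into a per-device tally before matching addresses against lspci. The sketch below is illustrative only: the function name summarize and the sample log text are assumptions, and the pattern is based solely on the message format quoted in this report.

```python
import re
from collections import Counter

# Pattern taken from the message text quoted in this report.
MSG = re.compile(
    r"DMA: Out of SW-IOMMU space for (\d+) bytes at device "
    r"([0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])"
)


def summarize(log_text):
    """Count failed mappings and sum the requested bytes per PCI address."""
    counts, total_bytes = Counter(), Counter()
    for m in MSG.finditer(log_text):
        size, device = int(m.group(1)), m.group(2)
        counts[device] += 1
        total_bytes[device] += size
    return counts, total_bytes


# Illustrative excerpt in the style of /var/log/messages.
sample = """\
kernel: [120431.630837] DMA: Out of SW-IOMMU space for 16 bytes at device 0000:00:1d.7
kernel: DMA: Out of SW-IOMMU space for 92 bytes at device 0000:03:00.0
kernel: DMA: Out of SW-IOMMU space for 92 bytes at device 0000:03:00.0
"""

counts, total_bytes = summarize(sample)
for device, n in counts.most_common():
    # lspci omits the leading "0000:" domain by default, so strip it for lookup.
    print(f"{device} (lspci: {device[5:]}): {n} failures, {total_bytes[device]} bytes requested")
```

The short form printed in parentheses (for example 03:00.0) is the one to look up in the lspci listing.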
Just updated and can confirm the problem persists with: 3.2.9-1.fc16.x86_64 #1 SMP Thu Mar 1 01:41:10 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
It would seem you're losing DMA space either to a leak in the driver or to simple exhaustion of the available space. Can you tell if this exhaustion is coupled with any other sort of behavior (an interruption of wireless service, perhaps, that would cause the tx queue to back up, or something of that nature)? The rtl tx queue is 128 skbs long. If that queue backs up I could imagine how the swiotlb space might get exhausted, especially if you have other devices competing for it.
(In reply to comment #9)
> Can you tell if this exhaustion is coupled with any other sort of behavior...
Thanks for helping, but sadly I have no idea how to check these things. If someone can give me some commands to enter that would produce useful debugging output, I'm happy to try them.
It's going to be tough to check; it's the sort of thing you'd likely want a stap script for. I'll see what I can write up on it.
With 2.6.42.3-2.fc15.x86_64 I have an uptime of 22 days, so maybe the original problem is solved.
Sorry, but I'm still seeing these messages with:
3.2.9-2.fc16.x86_64 #1 SMP Mon Mar 5 20:55:39 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
So if it was fixed in 2.6, it's back in 3.2. Also, I had previously only seen the issue with a WiFi device, but I plugged a USB thumb drive in the other day while watching the log and noticed that it also caused the "DMA: Out of SW-IOMMU space" message, on a different device ID. So it seems the problem is not limited to just WiFi, at least in my setup (Lenovo X220, FC16).
Hm, I assume you mean to indicate that, in comment 13, you didn't have the wifi device in use? If so, that suggests you have some other piece of hardware sucking up your swiotlb space.
Yes, I also saw:
Mar 10 15:04:33 x220 kernel: [ 4502.707910] DMA: Out of SW-IOMMU space for 3 bytes at device 0000:00:1a.0
From my previous lspci you can see that device 1a.0 is a USB controller.
Well, yes, I know about the USB controller from comment 13. What I'm saying is that if you previously saw your wifi NIC getting errors but didn't have this USB device plugged in, and are now seeing this error on your USB device without using your wifi NIC, then you most likely have some third device that is hogging all the space in the software IOMMU. It is possibly allocating a huge chunk of DMA space and never releasing it, causing other devices in your system to run out of DMA-able space in the IOMMU. I'm writing a stap script to track this down now.
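The "third device" hypothesis above can be illustrated with a toy model. This is a sketch only, not the kernel's swiotlb code: the SlotPool class, the device names, and the pool dimensions (64 slots of 2048 bytes) are all hypothetical. It shows how one owner that maps without ever unmapping can make even a 3-byte request from an unrelated device fail.

```python
class SlotPool:
    """Toy model of a fixed-size bounce-buffer pool (not the kernel's code)."""

    def __init__(self, nslots, slot_size):
        self.free = nslots
        self.slot_size = slot_size
        self.owners = {}  # device name -> number of slots currently held

    def _slots_for(self, nbytes):
        # Every mapping occupies at least one whole slot (ceiling division).
        return max(1, -(-nbytes // self.slot_size))

    def map(self, device, nbytes):
        """Reserve slots for nbytes; return False when the pool cannot satisfy it."""
        needed = self._slots_for(nbytes)
        if needed > self.free:
            return False
        self.free -= needed
        self.owners[device] = self.owners.get(device, 0) + needed
        return True

    def unmap(self, device, nbytes):
        """Release the slots a previous map() of the same size reserved."""
        needed = self._slots_for(nbytes)
        self.free += needed
        self.owners[device] -= needed


pool = SlotPool(nslots=64, slot_size=2048)  # hypothetical sizes

# A well-behaved device maps and unmaps, so it never accumulates slots.
for _ in range(1000):
    pool.map("wifi", 92)
    pool.unmap("wifi", 92)

# A leaky device maps and never unmaps, until the pool is empty.
while pool.map("leaky", 1500):
    pass

print(pool.owners)         # shows which owner holds the slots
print(pool.map("usb", 3))  # a tiny request from another device now fails
```

In this model the device that reports the failure is not the one holding the space, which is why per-owner accounting (what the stap script aims to provide) is needed rather than the error message alone.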
Created attachment 570008
stap script to track dma applications
Here you go. This is a stap script to track DMA memory allocations in the software IOMMU. Please boot your system and run this script under systemtap. Please try to keep any hardware that might use DMA dormant until after you start the stap script. If you send me the output I can take a look and see what's eating all your swiotlb memory.
Re: #17, I assume one is meant to use this script with 'stap dma.stap'? I tried that and got:
ERROR: kernel read fault at 0x (null) (addr) near identifier '$hwdev' at dma.stap:4:3
WARNING: Number of errors: 1, skipped probes: 1
WARNING: /usr/bin/staprun exited with status: 1
Pass 5: run failed. Try again with another '--vp 00001' option.
Sorry once more if I've done this wrong - it's all new to me!
Yes, that's all you should have to do. Sounds like you don't have the debuginfo packages for your running kernel installed.
Hmmm... well... call me a n00b at kernel debugging but I just can't get this to work. I have all these packages installed:
kernel.x86_64 3.2.9-2.fc16 @updates
kernel-debug.x86_64 3.2.9-2.fc16 @updates
kernel-debug-debuginfo.x86_64 3.2.9-2.fc16 @updates-debuginfo
kernel-debug-devel.x86_64 3.2.9-2.fc16 @updates
kernel-debuginfo.x86_64 3.2.9-2.fc16 @updates-debuginfo
kernel-debuginfo-common-x86_64.x86_64 3.2.9-2.fc16 @updates-debuginfo
kernel-devel.x86_64 3.2.9-2.fc16 @updates
kernel-headers.x86_64 3.2.9-2.fc16 @updates
kernel-tools.x86_64 3.2.9-2.fc16 @updates
kernel-tools-debuginfo.x86_64 3.2.9-2.fc16 @updates-debuginfo
kernel-tools-devel.x86_64 3.2.9-2.fc16 @updates
It doesn't matter whether I boot into the standard kernel or the debug kernel: I am still seeing that 'read fault' error when trying to execute the stap script :( Any idea what I'm doing wrong?
Not sure, it's working fine for me under only a slightly more recent kernel. You can try running stap with --skip-badvars to see if that helps. It's possible that hwdev is NULL when it's passed into the swiotlb code, but that would be bad for lots of reasons.
Well - I just can't get any sense out of stap :( With '--skip-badvars' some output does appear, but percent placeholders are shown rather than any values. Also, the output doesn't seem to vary based on whether the problem is present or not. I'm now on 3.2.10-3.fc16.x86_64 and the problem persists.
One thing I have noticed is this: I'm using a bridge between the Wifi and wired LAN connections on an X220 laptop. hostapd is running to turn the Wifi into a hotspot and the bridge allows traffic between clients on the hotspot and the main LAN. When clients are near the laptop, data rates would be assumed to be good; when the signal gets weaker you would assume rates drop. Interestingly, it takes a lot longer for the SW-IOMMU issue to occur when the clients (2 smartphones) have a good signal. That is, when one or both are very far from the laptop hotspot with a poor signal, the SW-IOMMU problem occurs quite quickly (within an hour perhaps).
Could this be related to buffering of packets in the bridge? I wouldn't have a clue but it seems very suspicious. To be honest though, if some buffer were filling, how come the system cannot recover after it is drained? When the SW-IOMMU issue occurs, the hotspot stops working and the clients disconnect, but the messages keep flowing into the log. The wired LAN connection carries on functioning just fine, but the only way to get the Wifi adapter to work once more is to reboot.
I'm currently travelling and only have the laptop to use as a Wifi hotspot and this is a bit frustrating. Does this help at all?
Unfortunately no, it doesn't help much. Without any visibility into the problem it's difficult to tell what's going wrong on your system. If stap just isn't working for you, I can find some time to build you a kernel with some extra debug information. In the interim, if you feel like the problem is definitely related to your wireless interface, the above description wouldn't point me to the bridge, but rather to the wireless hardware itself. My guess would be that the wireless NIC you're using is having to retransmit packets as your radio quality degrades, and that extra time (coupled with what is likely a deep hardware queue) is causing the lifetime of a DMA-mapped frame to grow, to the point where we are running out of space. I would recommend using iwconfig to reduce the number of retries per packet to see if dropping those retransmitted frames earlier prevents the problem from occurring.
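The retransmission guess above can be put into rough numbers. The sketch below is a back-of-the-envelope estimate only: it applies Little's law (items in flight = arrival rate x time in system), capped by the 128-entry tx queue mentioned earlier in this report. The function name, the frame rate, and the per-frame mapping times are hypothetical values chosen for illustration, not measurements from this system.

```python
def in_flight_frames(frames_per_second, ms_mapped, queue_depth):
    """Little's law estimate of frames holding a DMA mapping at once,
    capped by the hardware queue depth."""
    return min(queue_depth, frames_per_second * ms_mapped / 1000)


QUEUE_DEPTH = 128  # tx queue length quoted in this report for the rtl driver
RATE = 1000        # hypothetical frames per second offered to the NIC

# Hypothetical per-frame mapping lifetimes in milliseconds.
good_signal = in_flight_frames(RATE, 1, QUEUE_DEPTH)    # few retries
poor_signal = in_flight_frames(RATE, 50, QUEUE_DEPTH)   # many retries
stalled = in_flight_frames(RATE, 500, QUEUE_DEPTH)      # queue fully backed up

print(good_signal, poor_signal, stalled)
```

Under these assumptions, lowering the retry count shortens the time each frame stays mapped, which is what the iwconfig suggestion is meant to test.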
Ping: any feedback on the suggestions in comment 23?
I removed the bridge and used a NAT solution instead but the problem persists. I also note that these issues can still occur (albeit less frequently) when USB devices are used. Am now on 3.3.1-3.fc16.x86_64 #1 SMP Wed Apr 4 18:08:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
That's great, but it does nothing in relation to the suggestions I gave in comment 23. Could you please try using iwconfig to reduce the number of retries per packet on your wireless interface?
I tried this with no noticeable effect:
ifconfig wlan0 txqueuelen 10000
iwconfig wlan0 retry 1 rts 1024
Why did you make the txqueuelen so large? That's going to undo any effects from the retry changes. Having 10000 frames backed up is going to wind up causing similar problems.
The rationale was that if poor signal was causing the network to back up, and the problem only occurred when that backlog overflowed something, then making the queue larger would help. I will try again with the default queue.
Nope. This on its own (without the queuelen change) has no effect on the problem, which still persists:
iwconfig wlan0 retry 1 rts 1024
Hmm, that's odd. That points back to this not being a wireless networking problem. If you don't use the wireless NIC, the problem doesn't occur though, right?
All I can say is that this problem occurs more often when wireless is in use, but I have seen it occur from time to time when using USB devices. It would be true to say that this does not appear to be related solely to the wireless hardware then.
Is this still happening with the 3.4 or 3.5 kernel updates?
# Mass update to all open bugs. Kernel 3.6.2-1.fc16 has just been pushed to updates. This update is a significant rebase from the previous version. Please retest with this kernel, and let us know if your problem has been fixed. In the event that you have upgraded to a newer release and the bug you reported is still present, please change the version field to the newest release you have encountered the issue with. Before doing so, please ensure you are testing the latest kernel update in that release and attach any new and relevant information you may have gathered. If you are not the original bug reporter and you still experience this bug, please file a new report, as it is possible that you may be seeing a different problem. (Please don't clone this bug, a fresh bug referencing this bug in the comment is sufficient).
With no response, we are closing this bug under the assumption that it is no longer an issue. If you still experience this bug, please feel free to reopen the bug report.
Hi. I wanted to create a new bug report but this one seems to suit my problem.
OS: Fedora 18 x86_64
HW: Lenovo ThinkPad E320
RAM: 8G (upgraded from 4)
Linux jarvis 3.8.3-201.fc18.x86_64 #1 SMP Thu Mar 14 21:28:05 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
I am not able to upload any data bigger than a few megs to the target storage over the ethernet (1Gbit NIC, 1Gbit connection). I tried: scp, nfs, sshfs, ftp.
When I start the upload, the speed drops rapidly and dmesg starts to generate thousands of messages like:
DMA: Out of SW-IOMMU space for 1448 bytes at device 0000:08:00.0
The number of bytes changes. The speed goes down from MB to kB and if I let it continue, the system will freeze.
In Fedora kernel 3.7.7 everything is OK. In 3.8.1, 3.8.2 and 3.8.3 (see my uname above) it is not OK.
lspci:
00:00.0 Host bridge: Intel Corporation 2nd Generation Core Processor Family DRAM Controller (rev 09)
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09)
00:16.0 Communication controller: Intel Corporation 6 Series/C200 Series Chipset Family MEI Controller #1 (rev 04)
00:1a.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2 (rev 04)
00:1b.0 Audio device: Intel Corporation 6 Series/C200 Series Chipset Family High Definition Audio Controller (rev 04)
00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 1 (rev b4)
00:1c.1 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 2 (rev b4)
00:1c.2 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 3 (rev b4)
00:1c.5 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 6 (rev b4)
00:1d.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1 (rev 04)
00:1f.0 ISA bridge: Intel Corporation HM65 Express Chipset Family LPC Controller (rev 04)
00:1f.2 SATA controller: Intel Corporation 6 Series/C200 Series Chipset Family 6 port SATA AHCI Controller (rev 04)
00:1f.3 SMBus: Intel Corporation 6 Series/C200 Series Chipset Family SMBus Controller (rev 04)
02:00.0 Network controller: Intel Corporation Centrino Wireless-N 1000 [Condor Peak]
03:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS5209 PCI Express Card Reader (rev 01)
03:00.1 SD Host controller: Realtek Semiconductor Co., Ltd. RTS5209 PCI Express Card Reader (rev 01)
08:00.0 Ethernet controller: Atheros Communications Inc. AR8151 v2.0 Gigabit Ethernet (rev c0)
In kernel 3.7.7 - never happened
In kernels 3.8.1, 3.8.2, 3.8.3 - always happened with high network traffic
Thank you for help.
JJ
Hi, same problem as Jan Jurko.
OS: Fedora 18 x86_64
lspci:
00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor DRAM Controller (rev 09)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor PCI Express Root Port (rev 09)
00:14.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB xHCI Host Controller (rev 04)
00:16.0 Communication controller: Intel Corporation 7 Series/C210 Series Chipset Family MEI Controller #1 (rev 04)
00:1a.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB Enhanced Host Controller #2 (rev 04)
00:1c.0 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 1 (rev c4)
00:1c.3 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 4 (rev c4)
00:1c.5 PCI bridge: Intel Corporation 82801 PCI Bridge (rev c4)
00:1c.6 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 7 (rev c4)
00:1c.7 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 8 (rev c4)
00:1d.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB Enhanced Host Controller #1 (rev 04)
00:1f.0 ISA bridge: Intel Corporation Z77 Express Chipset LPC Controller (rev 04)
00:1f.2 SATA controller: Intel Corporation 7 Series/C210 Series Chipset Family 6-port SATA Controller [AHCI mode] (rev 04)
00:1f.3 SMBus: Intel Corporation 7 Series/C210 Series Chipset Family SMBus Controller (rev 04)
01:00.0 VGA compatible controller: NVIDIA Corporation Device 11c0 (rev a1)
01:00.1 Audio device: NVIDIA Corporation Device 0e0b (rev a1)
03:00.0 PCI bridge: PLX Technology, Inc. PEX8112 x1 Lane PCI Express-to-PCI Bridge (rev aa)
04:04.0 Multimedia audio controller: C-Media Electronics Inc CMI8788 [Oxygen HD Audio]
05:00.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 30)
07:00.0 Ethernet controller: Atheros Communications Inc. AR8151 v2.0 Gigabit Ethernet (rev c0)
08:00.0 USB controller: Etron Technology, Inc. EJ168 USB 3.0 Host Controller (rev 01)
It starts with a message like:
DMA: Out of SW-IOMMU space for 30432 bytes at device 0000:07:00.0
then
DMA: Out of SW-IOMMU space for 2336 bytes at device 0000:07:00.0
and
DMA: Out of SW-IOMMU space for 54 bytes at device 0000:07:00.0
After that, a reboot is needed.