Created attachment 674375 [details]
/var/log/messages ath9k excerpt

Description of problem:
After the latest upgrade to kernel 3.6.11-1 I noticed a problem with the following Wifi card:

03:00.0 Network controller: Atheros Communications Inc. AR9485 Wireless Network Adapter (rev 01)
    Subsystem: Samsung Electronics Co Ltd Device 4105
    Flags: bus master, fast devsel, latency 0, IRQ 17
    Memory at f7c00000 (64-bit, non-prefetchable) [size=512K]
    Expansion ROM at f7c80000 [disabled] [size=64K]
    Capabilities: [40] Power Management version 2
    Capabilities: [50] MSI: Enable- Count=1/4 Maskable+ 64bit+
    Capabilities: [70] Express Endpoint, MSI 00
    Capabilities: [100] Advanced Error Reporting
    Capabilities: [140] Virtual Channel
    Capabilities: [160] Device Serial Number 00-00-00-00-00-00-00-00
    Kernel driver in use: ath9k

/var/log/messages started to fill with messages like the following:

[98739.471337] ath: phy0: Failed to wakeup in 500us
[98739.581268] ath: phy0: Failed to wakeup in 500us
[98739.691165] ath: phy0: Failed to wakeup in 500us
[98739.701760] ath: phy0: Failed to wakeup in 500us
[98739.712877] ath: phy0: DMA failed to stop in 10 ms AR_CR=0xffffffff AR_DIAG_SW=0xffffffff DMADBG_7=0xffffffff
[98739.712897] ath: phy0: Could not stop RX, we could be confusing the DMA engine when we start RX up
[98739.773400] ath: phy0: Failed to stop TX DMA, queues=0x10f!
[98739.884046] ath: phy0: Chip reset failed
[98739.884050] ath: phy0: Unable to reset channel, reset status -22

A more complete log is attached.

Version-Release number of selected component (if applicable):
kernel-PAE-3.6.11-1.fc17.i686

How reproducible:
I couldn't find a reliable way to reproduce it. It happened twice after leaving the laptop for a while without using it directly or remotely.

Steps to Reproduce:
Leave the machine unused for a while.

Actual results:
The Wifi card suddenly stopped working. A reboot cures the problem.

Expected results:
Wifi keeps working without interruption.

Additional info:
It seems to be a similar issue to the one described in bug #755370 or in [1], [2] on various Atheros cards. I've tried to check the differences between kernel versions 3.6.11 and 3.7, and even git {master|wireless-testing}, but couldn't find any patch or fix which might address this issue. The only patch that modifies the area very close to where the problem seems to begin is the patch [3] from the openwrt project.

Rebuilding the kernel sources [4] with just the patch [3] applied has fixed the problem so far. My son has been using the laptop in the same way as before for more than a week without any Wifi problems.

Please check with your deep expertise whether this patch could really address this issue and consider applying it, if it sounds reasonable.

[1]. https://bugs.launchpad.net/ubuntu/+source/linux/+bug/736171
[2]. https://dev.openwrt.org/ticket/9654
[3]. https://dev.openwrt.org/browser/trunk/package/mac80211/patches/552-ath9k_rx_dma_stop_check.patch?rev=34910
[4]. http://koji.fedoraproject.org/koji/buildinfo?buildID=373325
Did the problem occur after a suspend / resume? The log seems to show some indication of this, or was it working fine and then stopped without a suspend / resume?

When it stops it's really, really gone. Reset of chip doesn't help either. If so, avoiding suspend / resume may be a temporary, if unsatisfying, workaround.

Could be related to a PCI bus issue also, some PCI updates may help here too.
(In reply to comment #1)
> Did the problem occur after a suspend / resume?

The last time it happened, the Wifi had worked properly for ~10 hours since the last resume, which was the 6th suspend/resume cycle after boot (uptime was ~3 days).

> When it stops it's really, really gone. Reset of chip doesn't help either.

I didn't try, but the log says:
ath: phy0: Chip reset failed

> Could be related to a PCI bus issue also, some PCI updates may help here too.

Maybe, but I don't observe any problem with other PCI devices. On the other hand, the same symptoms can be observed on completely different hardware and architecture (the aforementioned openwrt issue): it is MIPS based, the suspend/resume functionality is not used there at all, and the PCI subsystem has somewhat different timings.

Additionally, searching the linux sources with:

$ git log --grep "DMA failed to stop" drivers/net/wireless/ath/ath9k

shows that this problem has been known for at least 2 years, and filtering the output of the above command with:

| grep -i PCI

gave no indication that people blame the PCI subsystem for this issue.
Just a guess here from a previous life, but when I see 0xff's from device register reads like this together with the phrase "Failed to wakeup", it can indicate that the wifi chip is in a powered-down state and the driver/firmware gets fouled up trying to wake it at the right time. So the driver tried to read registers thinking the chip was awake, but the chip was still powered down and deaf. If this is the case, reset will be the only way to recover. It is a driver/firmware issue, possibly hardware, since as you point out it also exists in MIPS land on APs.

There could be more than one issue here too: the "can't stop DMA" messages could be caused by a sleepy chip not completing a transfer, leading to a DMA timeout.

To find out, as a debug step, see if you can modify the max platform power-down state for the wifi driver to disallow deep sleep mode, and check whether the problem clears up. This may not be exposed in the driver, so a hack could be needed. I will have to check around on how to do it. If you know how, try that and let me know your results.
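A quick way to look for the "deaf chip" pattern described above is to count log lines where ath register dumps read back as all ones. This is only a minimal sketch, assuming the log format shown in the original report; the function name count_dead_reads is made up for illustration and is not part of any existing tool.

```shell
#!/bin/sh
# Hypothetical helper: count ath log lines whose register dump reads back
# as 0xffffffff, which may hint that the chip was powered down when read.
# Usage: count_dead_reads /var/log/messages
count_dead_reads() {
    # grep -c prints the number of matching lines (0 if none).
    # "|| true" hides grep's non-zero exit status when nothing matches.
    grep -c 'ath: phy[0-9]*: .*=0xffffffff' "$1" || true
}
```

A non-zero count only suggests the chip was unreachable at that moment; it does not say why.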
Having the same problem here with F18.

~$ uname -a
Linux hell-note 3.7.2-201.fc18.x86_64 #1 SMP Fri Jan 11 22:16:23 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

lspci -v
...
08:00.0 Network controller: Atheros Communications Inc. AR9485 Wireless Network Adapter (rev 01)
    Subsystem: Dell Device 0209
    Flags: bus master, fast devsel, latency 0, IRQ 17
    Memory at c1500000 (64-bit, non-prefetchable) [size=512K]
    Expansion ROM at 9fb00000 [disabled] [size=64K]
    Capabilities: [40] Power Management version 2
    Capabilities: [50] MSI: Enable- Count=1/4 Maskable+ 64bit+
    Capabilities: [70] Express Endpoint, MSI 00
    Capabilities: [100] Advanced Error Reporting
    Capabilities: [140] Virtual Channel
    Capabilities: [160] Device Serial Number 00-00-00-00-00-00-00-00
    Kernel driver in use: ath9k

I'm trying the steps from the bug #815377 for F16.

iwconfig wlanX power off
iw dev wlanX set power_save off

This helped me when using Fedora 17
The commands above in comment 4 turn off power management, which should work around the issue, but they keep the device powered on and drain the battery faster (on a laptop, of course). Treat it as a workaround, not a permanent fix. Check it out and let us know if this at least stabilizes your system. The need to do a hard reset is painful, and this should avoid it until a fix for power management is available (there are two mechanisms: 802.11 power_save and platform power management).

Let me know what you find with the above cmds. The issue should go away, but so will some of the power savings.
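The workaround discussed above could be wrapped in a small script so it is easy to re-apply. This is a sketch only: the function name disable_powersave and the DRY_RUN variable are made up for illustration; the iw command itself is the one quoted in comment 4.

```shell
#!/bin/sh
# Hypothetical wrapper around the workaround from comment 4.
# With DRY_RUN=1 it only prints the command instead of running it.
disable_powersave() {
    iface="$1"
    if [ -z "$iface" ]; then
        echo "usage: disable_powersave <interface>" >&2
        return 2
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "iw dev $iface set power_save off"
    else
        # Needs root. This is a runtime setting, so it would have to be
        # re-applied after a reboot.
        iw dev "$iface" set power_save off
    fi
}
```

For example, DRY_RUN=1 disable_powersave wlan0 prints the command that would be run, which is handy for checking the interface name first.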
The WiFi connection is stable now with power management turned off, but the issue still needs a correct fix. Is there any way that I can help?
Ok, good, then we are on the right track. So by power management, do you mean *only*

iwconfig wlanX power off

or did you also do the:

iw dev wlanX set power_save off

I expect it was the first one, but I want to be sure which you mean.

I agree with the need for a real fix also. I will see what might be available in terms of a true fix (firmware and/or driver, probably).
John, I'm using the patched kernel mentioned in the initial report, and since then (currently >3 weeks, 1-2 suspend/resume cycles daily) my son hasn't observed any problems, but I'm not sure whether it could help others. In other words, as long as the patch works I'm not motivated to use a more drastic method like switching off the power management mechanism.
F17 test kernels with the patch[3] from the original description are building here: http://koji.fedoraproject.org/koji/taskinfo?taskID=4894611 Please give them a try when they finish building, and post the results here -- thanks!
(In reply to comment #9)
Wifi is working without any problems.

$ uname -a
Linux samsung 3.7.4-101.bz892811.1.fc17.i686.PAE #1 SMP Tue Jan 22 19:41:17 UTC 2013 i686 i686 i386 GNU/Linux

$ uptime
 15:56:32 up 9 days, 18:31, 8 users, load average: 0.17, 0.18, 0.22

$ grep 'PM: noirq resume' /var/log/messages | wc -l
15

$ ifconfig wlan0 | grep X
        RX packets 3274637 bytes 3466636800 (3.2 GiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 2070892 bytes 1406817546 (1.3 GiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

$ grep ath9k /proc/interrupts
 17: 6367111 3845 2822 2419 17500 11628 8991 7687 IO-APIC-fasteoi ath9k
Created attachment 691989 [details] debug filesystem dump of /sys/kernel/debug/ieee80211/phy/* Please find attached a full dump of the /sys/kernel/debug/ieee80211/phy/ directory in case it provides some useful statistical data.
Added patch [3] (https://dev.openwrt.org/browser/trunk/package/mac80211/patches/552-ath9k_rx_dma_stop_check.patch?rev=34910) to F17-rawhide.
I see the .tar file, described as "debug filesystem dump of /sys/kernel/debug/ieee80211/phy/*", but I don't know what I should do now. I am a new user and I don't know much about Linux. Thanks