Bug 892811 - AR9485: ath: phy0: Failed to wakeup in 500us. DMA failed to stop in 10 ms AR_CR=0xffffffff AR_DIAG_SW=0xffffffff DMADBG_7=0xffffffff
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 17
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: John Greene
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2013-01-07 21:58 UTC by Damian Wrobel
Modified: 2021-06-29 09:46 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-03-11 19:17:49 UTC
Type: Bug
Embargoed:


Attachments
/var/log/messages ath9k excerpt (617.44 KB, text/plain)
2013-01-07 21:58 UTC, Damian Wrobel
debug filesystem dump of /sys/kernel/debug/ieee80211/phy/* (540.00 KB, application/x-tar)
2013-02-02 15:36 UTC, Damian Wrobel

Description Damian Wrobel 2013-01-07 21:58:31 UTC
Created attachment 674375 [details]
/var/log/messages ath9k excerpt

Description of problem:

After the latest upgrade to kernel 3.6.11-1, I noticed a problem with the following Wifi card:

03:00.0 Network controller: Atheros Communications Inc. AR9485 Wireless Network Adapter (rev 01)
        Subsystem: Samsung Electronics Co Ltd Device 4105
        Flags: bus master, fast devsel, latency 0, IRQ 17
        Memory at f7c00000 (64-bit, non-prefetchable) [size=512K]
        Expansion ROM at f7c80000 [disabled] [size=64K]
        Capabilities: [40] Power Management version 2
        Capabilities: [50] MSI: Enable- Count=1/4 Maskable+ 64bit+
        Capabilities: [70] Express Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Virtual Channel
        Capabilities: [160] Device Serial Number 00-00-00-00-00-00-00-00
        Kernel driver in use: ath9k

It started dumping a lot of messages like the following into /var/log/messages:

[98739.471337] ath: phy0: Failed to wakeup in 500us
[98739.581268] ath: phy0: Failed to wakeup in 500us
[98739.691165] ath: phy0: Failed to wakeup in 500us
[98739.701760] ath: phy0: Failed to wakeup in 500us
[98739.712877] ath: phy0: DMA failed to stop in 10 ms AR_CR=0xffffffff AR_DIAG_SW=0xffffffff DMADBG_7=0xffffffff
[98739.712897] ath: phy0: Could not stop RX, we could be confusing the DMA engine when we start RX up
[98739.773400] ath: phy0: Failed to stop TX DMA, queues=0x10f!
[98739.884046] ath: phy0: Chip reset failed
[98739.884050] ath: phy0: Unable to reset channel, reset status -22

A more complete log is attached.
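To gauge how often each failure appears in a long log, the messages quoted above can be tallied with a small filter. This is only a sketch: the function name count_ath_failures is hypothetical, and the patterns are copied from the excerpt in this report, so other kernel versions may word the messages differently.

```shell
#!/bin/sh
# Tally the ath9k failure messages from this report in a kernel log read on stdin.
# count_ath_failures is a hypothetical helper name, not part of any package.
count_ath_failures() {
    awk '
        /ath: phy[0-9]+: Failed to wakeup/   { wakeup++ }
        /ath: phy[0-9]+: DMA failed to stop/ { dma++ }
        /ath: phy[0-9]+: Chip reset failed/  { reset++ }
        # Counters that never matched print as 0.
        END { printf "wakeup=%d dma=%d reset=%d\n", wakeup, dma, reset }
    '
}
```

Example use: count_ath_failures < /var/log/messages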

Version-Release number of selected component (if applicable):
kernel-PAE-3.6.11-1.fc17.i686

How reproducible:
Couldn't find a reproducible path. It happened twice after leaving the laptop for a while without using it directly or remotely.

Steps to Reproduce:
Leave the machine unused for a while.

Actual results:
The Wifi card suddenly stops working. A reboot cures the problem.

Expected results:
Wifi keeps working without interruption.


Additional info:

It seems to be a similar issue to the one described in bug #755370 or [1], [2] on various Atheros cards.

I checked the differences between kernel versions 3.6.11 and 3.7, and also git {master|wireless-testing}, but couldn't find any patch or fix that might address this issue.

The only patch that modifies code very close to where the problem seems to begin is the following patch[3] from the openwrt project.

Rebuilding the kernel sources[4] with just the patch[3] applied has fixed the problem so far. My son has been using the laptop in the same way as before for more than a week without any Wifi problems.

Please check, with your deeper expertise, whether this patch could really address this issue, and consider applying it if it sounds reasonable.

[1]. https://bugs.launchpad.net/ubuntu/+source/linux/+bug/736171
[2]. https://dev.openwrt.org/ticket/9654
[3]. https://dev.openwrt.org/browser/trunk/package/mac80211/patches/552-ath9k_rx_dma_stop_check.patch?rev=34910
[4]. http://koji.fedoraproject.org/koji/buildinfo?buildID=373325

Comment 1 John Greene 2013-01-11 15:49:08 UTC
Did the problem occur after a suspend / resume?  The log seems to give some indication of this. Or was it working fine and then stopped without a suspend / resume?  When it stops it's really, really gone.  Reset of chip doesn't help either.  If suspend / resume is involved, avoiding it may be a temporary, if unsatisfying, workaround.

Could be related to a PCI bus issue also, some PCI updates may help here too.

Comment 2 Damian Wrobel 2013-01-11 18:22:38 UTC
(In reply to comment #1)
> Did the problem occur after a suspend / resume?
The last time it happened, the Wifi had worked properly for ~10 hours since the last resume, which was the 6th suspend/resume cycle after boot (uptime was ~3 days).

> When it stops it's really, really gone.  Reset of chip doesn't help either.
I didn't try, but the log says:

ath: phy0: Chip reset failed

> Could be related to a PCI bus issue also, some PCI updates may help here too.

Maybe, but I don't observe problems with any other PCI device.

On the other hand, the same symptoms can be observed on completely different hardware and architecture (the aforementioned openwrt issue): it is MIPS based, the suspend/resume functionality is not used there at all, and the PCI subsystem has somewhat different timings.

Additionally, searching the linux sources with the following:

$ git log --grep "DMA failed to stop" drivers/net/wireless/ath/ath9k

shows that this problem has been known for at least 2 years, and piping the output of the above command through:

| grep -i PCI

gives no indication that people blame the PCI subsystem for this issue.

Comment 3 John Greene 2013-01-11 20:42:06 UTC
Just a guess here from a previous life, but when I see 0xff's from device register reads like this along with the wakeup failures, it can indicate that the wifi chip is in a powered-down state and the driver/firmware get fouled up trying to wake it at the right time. So the driver tries to read registers thinking the chip is awake, but the chip is still powered down and deaf.  If this is the case, a reset will be the only way to recover. It looks like a driver/firmware issue, possibly hardware, since as you point out it also exists in MIPS land on APs.  There could be more than one issue here too: the "can't stop DMA" messages could come from a sleepy chip not completing its work, causing a DMA timeout.

To find out, as a debug step, try modifying the max platform power-down state for the wifi driver to disallow deep sleep, and see if the problem clears up.  This may not be exposed in the driver, so a hack could be needed.  I will have to check around on how to do it.  If you know how, try that and let me know your results.
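As a rough way to check the "powered-down chip" theory against a log, one can test whether every register value printed in a "DMA failed to stop" line is 0xffffffff, which is what a read typically returns when a PCI device does not respond. The sketch below assumes the message layout shown in the description (NAME=0x... fields in lowercase hex); the function name check_dead_reads is made up for illustration.

```shell
#!/bin/sh
# Classify "DMA failed to stop" lines read on stdin by whether all printed
# register values are 0xffffffff (chip likely not responding) or not.
# check_dead_reads is a hypothetical helper; field layout assumed from this report.
check_dead_reads() {
    awk '
        /DMA failed to stop/ {
            total = 0; ones = 0
            for (i = 1; i <= NF; i++) {
                if ($i ~ /=0x[0-9a-fA-F]+$/) {      # a NAME=0x... register field
                    total++
                    if ($i ~ /=0xffffffff$/) ones++
                }
            }
            if (total > 0 && ones == total) dead++
            else live++
        }
        END { printf "all_ones=%d other=%d\n", dead, live }
    '
}
```

Example use: check_dead_reads < /var/log/messages. A log where every such line counts as all_ones would be consistent with the chip being unreachable rather than with a DMA engine that is merely slow to stop.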

Comment 4 Wellington Poi 2013-01-21 02:16:06 UTC
Having the same problem here with F18.
~$ uname -a
Linux hell-note 3.7.2-201.fc18.x86_64 #1 SMP Fri Jan 11 22:16:23 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

lspci -v
...
08:00.0 Network controller: Atheros Communications Inc. AR9485 Wireless Network Adapter (rev 01)
	Subsystem: Dell Device 0209
	Flags: bus master, fast devsel, latency 0, IRQ 17
	Memory at c1500000 (64-bit, non-prefetchable) [size=512K]
	Expansion ROM at 9fb00000 [disabled] [size=64K]
	Capabilities: [40] Power Management version 2
	Capabilities: [50] MSI: Enable- Count=1/4 Maskable+ 64bit+
	Capabilities: [70] Express Endpoint, MSI 00
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [140] Virtual Channel
	Capabilities: [160] Device Serial Number 00-00-00-00-00-00-00-00
	Kernel driver in use: ath9k


I'm trying the steps from the bug #815377 for F16.

iwconfig wlanX power off
iw dev wlanX set power_save off

This helped me when using Fedora 17

Comment 5 John Greene 2013-01-21 14:17:31 UTC
The commands in comment 4 turn off power management and should work around the issue, but they keep the device on and drain the battery faster (on a laptop, of course).  This is a workaround, not a permanent fix.  Check it out and let us know if it at least stabilizes your system.  The need to do a hard reset is painful, and this should remove that issue until there is a fix for power management (there are two mechanisms, 802.11 power_save and platform power management).  Let me know what you find with the above commands; the issue should go away, but so will some power savings.
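For convenience, the two commands from comment 4 can be wrapped in a small function. This is a minimal sketch, not a supported tool: the function name disable_wifi_powersave, the wlan0 default, and the DRY_RUN switch are assumptions made for illustration, and the real commands typically need root privileges.

```shell
#!/bin/sh
# Turn off 802.11 power save on one interface using the commands from comment 4.
# $1: interface name (defaults to wlan0, an assumption; use your own).
# Set DRY_RUN=1 to print the commands instead of running them.
disable_wifi_powersave() {
    iface=${1:-wlan0}
    for cmd in "iwconfig $iface power off" "iw dev $iface set power_save off"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "$cmd"
        else
            # Unquoted on purpose so the string splits into command and arguments.
            $cmd || echo "failed: $cmd" >&2
        fi
    done
}
```

Example use: DRY_RUN=1 disable_wifi_powersave wlan0 shows what would be run; dropping DRY_RUN applies the change. The resulting state can be inspected with iw dev wlan0 get power_save.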

Comment 6 Wellington Poi 2013-01-22 11:08:01 UTC
The WiFi connection is stable now with power management turned off, but the issue still needs a proper fix.
Is there any way I can help?

Comment 7 John Greene 2013-01-22 13:41:56 UTC
Ok, good, then we are on the right track. So by power management, do you mean *only*

iwconfig wlanX power off
or did you do the:
iw dev wlanX set power_save off

I expect it was the first one, but I want to be sure which you mean.

I agree on the need for a real fix.  I will see what might be available in terms of a true fix (probably firmware and/or driver).

Comment 8 Damian Wrobel 2013-01-22 16:22:09 UTC
John,

I'm using the patched kernel mentioned in the initial report, and since then (currently >3 weeks, 1-2 suspend/resume cycles daily) my son hasn't observed any problems, though I'm not sure whether it would help others. In other words, as long as the patch works I'm not motivated to use a more drastic method like switching off the power management mechanism.

Comment 9 John W. Linville 2013-01-22 18:46:54 UTC
F17 test kernels with the patch[3] from the original description are building here:

http://koji.fedoraproject.org/koji/taskinfo?taskID=4894611

Please give them a try when they finish building, and post the results here -- thanks!

Comment 10 Damian Wrobel 2013-02-02 15:30:34 UTC
(In reply to comment #9)
Wifi is working without any problems.

$ uname -a
Linux samsung 3.7.4-101.bz892811.1.fc17.i686.PAE #1 SMP Tue Jan 22 19:41:17 UTC 2013 i686 i686 i386 GNU/Linux

$ uptime
15:56:32 up 9 days, 18:31,  8 users,  load average: 0.17, 0.18, 0.22

$ grep 'PM: noirq resume' /var/log/messages  | wc -l
15

$ ifconfig wlan0 | grep X
        RX packets 3274637  bytes 3466636800 (3.2 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 2070892  bytes 1406817546 (1.3 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

$ grep ath9k /proc/interrupts
 17:    6367111       3845       2822       2419      17500      11628       8991       7687   IO-APIC-fasteoi   ath9k
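The checks above can be collected into two small filters for anyone repeating the test on another machine. This is a sketch under assumptions: the helper names are hypothetical, the resume marker is the 'PM: noirq resume' string used in this comment, and the error parsing expects the ifconfig output layout shown here (lines beginning with "RX errors" or "TX errors").

```shell
#!/bin/sh
# Count suspend/resume cycles in a kernel log read on stdin,
# using the same marker string as the grep in this comment.
count_resumes() {
    grep -c 'PM: noirq resume'
}

# Sum the RX and TX error counters from ifconfig output read on stdin.
# Assumes the layout shown above: "RX errors N ..." / "TX errors N ...".
sum_iface_errors() {
    awk '$1 ~ /^(RX|TX)$/ && $2 == "errors" { sum += $3 } END { print sum + 0 }'
}
```

Example use: count_resumes < /var/log/messages and ifconfig wlan0 | sum_iface_errors. A non-zero resume count together with an error sum of 0 matches the healthy state reported in this comment.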

Comment 11 Damian Wrobel 2013-02-02 15:36:51 UTC
Created attachment 691989 [details]
debug filesystem dump of /sys/kernel/debug/ieee80211/phy/*

Please find attached a full dump of the /sys/kernel/debug/ieee80211/phy/ directory in case it provides some useful statistical data.

Comment 13 Minh Hieu 2021-06-29 09:46:00 UTC
I see the .tar file, described as a debug filesystem dump of /sys/kernel/debug/ieee80211/phy/*.
I don't know what I should do now. I am a new user and I don't know much about Linux.
Thanks

