Bug 743435

Summary: ath9K: Wireless connection drops under traffic load
Product: [Fedora] Fedora Reporter: George Iosif <giosif>
Component: kernel Assignee: John Greene <jogreene>
Status: CLOSED WONTFIX QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 17 CC: gansalmon, itamar, jonathan, jwboyer, kernel-maint, linville, madhu.chinakonda, mcgrof, rmy, shafi.wireless, statement
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard: wireless ath9k first=2.6.40.4
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-08-01 01:23:19 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Attachments:
Description | Flags
lspci output | none
iwconfig output | none
dmesg output after the initial wireless connection (i.e. before the issue showed up) | none
dmesg output after the issue showed up | none

Description George Iosif 2011-10-04 22:38:34 UTC
Created attachment 526339
lspci output

Description of problem:
I am using a netbook with an Atheros AR928X wireless adapter (using the ath9k driver) and everything seems to work fine:
 * I can connect to my home SSID (using WPA2-PSK).
 * I can send and receive traffic.
However, every time I do a large transfer, the wireless connection eventually drops and NetworkManager reconnects to the SSID. After a few successive drops and reassociations, the wireless connection stays up, but no traffic can be passed.

Version-Release number of selected component (if applicable):
kernel-2.6.40.4-5.fc15.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Connect to SSID.
2. Initiate the transfer of a large file.
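
A minimal sketch of step 2, assuming a second machine reachable over the wireless link. The helper name make_test_file, the file path, and the user@server target are placeholders and not part of the original report.

```shell
# Create a file of random data to use as the transfer payload.
# Usage: make_test_file PATH SIZE_IN_MIB   (hypothetical helper name)
make_test_file() {
    head -c "$(( $2 * 1048576 ))" /dev/urandom > "$1"
}

# Example (host and path are placeholders):
#   make_test_file /tmp/ath9k-test.bin 512
#   scp /tmp/ath9k-test.bin user@server:/tmp/
```

Random data is used so that compression along the path cannot shrink the transfer.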
  
Actual results:
Transfer starts and works for a while, then it stalls. After a few seconds, the transfer resumes, only to stall again. Eventually, after several such cycles, the transfer is interrupted and no further communication can take place over the wireless connection, even though the connection appears to be up.

Expected results:
Transfer proceeds & completes without any interruptions.

Additional info:
Running "dmesg" shows the following errors:
 ath: Failed to stop TX DMA!
 ath: DMA failed to stop in 10 ms AR_CR=0x00000024 AR_DIAG_SW=0x42000020 DMADBG_7=0x000067c0
 ath: Could not stop RX, we could be confusing the DMA engine when we start RX up
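
To check whether a given log contains the three messages quoted above, something like the following could be used. This is only a sketch: the function name count_ath_dma_errors is made up here, and the patterns are taken verbatim from the lines above, so other kernel versions may word them differently.

```shell
# Count kernel log lines that match the ath DMA-stop failures quoted above.
# Reads the log on standard input and prints the number of matching lines.
count_ath_dma_errors() {
    grep -c -E 'ath: (Failed to stop TX DMA|DMA failed to stop|Could not stop RX)'
}

# Example (not run here):
#   dmesg | count_ath_dma_errors
```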

This problem shows up in both Fedora 15 w/ latest updates and Fedora 16 Alpha (I tried the Alpha in both 32 and 64 bit variants, with the same results), as all of these releases use the 2.6.40.x (a.k.a. 3.0.x) kernel.
With Fedora 15, I tried different ath9k module parameters, also installed the wireless-compat-next package, but the result is the same.
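
For readers comparing version numbers: Fedora 15 ships the upstream 3.x kernels under 2.6.4x names (2.6.40.x is 3.0.x, as noted above). A small sketch of that renaming follows, assuming the pattern 2.6.4N.x = 3.N.x holds for the versions of interest; the function name is hypothetical.

```shell
# Translate a Fedora 15 style kernel version (2.6.4N.x) to the upstream
# 3.N.x number. Any other version string is printed unchanged.
fedora_to_upstream() {
    case "$1" in
        2.6.4[0-9]*) printf '3.%s\n' "${1#2.6.4}" ;;
        *)           printf '%s\n' "$1" ;;
    esac
}

# Example:
#   fedora_to_upstream 2.6.40.4
```

Pass only the version part of the package name (2.6.40.4, not 2.6.40.4-5.fc15.x86_64).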

The issue does NOT happen in 2.6.38.6 (the kernel that came initially with Fedora 15 on the day of its release), so I am guessing this is a regression in the ath9k driver (or any of the wireless stack modules on which ath9k depends).

Comment 1 George Iosif 2011-10-04 22:39:17 UTC
Created attachment 526340
iwconfig output

Comment 2 George Iosif 2011-10-04 22:40:32 UTC
Created attachment 526341
dmesg output after the initial wireless connection (i.e. before the issue showed up)

Comment 3 George Iosif 2011-10-04 22:41:17 UTC
Created attachment 526342
dmesg output after the issue showed up

Comment 4 John W. Linville 2011-11-07 19:35:15 UTC
Luis, any chance you can get an ath9k person to look at this?

Comment 5 Ron Yorston 2011-11-27 11:16:25 UTC
I'm seeing a very similar issue, but with ath5k on F16, kernel 3.1.2-1.fc16.i686. My hardware is:

Atheros Communications Inc. AR242x / AR542x Wireless Network Adapter (PCI-Express) (rev 01)

If I try to upload a large file to a local server using sftp the transfer eventually stalls and the network connection becomes unusable until I disable and reenable it.  The only way I can perform an upload is using the -l flag to sftp to limit the bandwidth.
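
For anyone wanting to try the same workaround: sftp's -l option takes the limit in Kbit/s, not KB/s. A tiny sketch of the conversion follows; the helper name, the 256 KB/s figure, and the host are placeholders, not values from this report.

```shell
# Convert a rate in KB/s to the Kbit/s value that "sftp -l" expects.
sftp_limit_kbit() {
    echo $(( $1 * 8 ))
}

# Example (host is a placeholder, not run here):
#   sftp -l "$(sftp_limit_kbit 256)" user@server
```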

Downloading a large file with sftp seems to always fail with the error: Corrupted MAC on input. This doesn't cause the connection to become unusable and the -l flag doesn't help.

Downloading using ftp results in a file of the correct size being delivered, but with an incorrect md5sum.  In none of these cases do I see anything that looks relevant in dmesg or /var/log/messages.

The same hardware works fine with F14.
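
The "correct size, wrong md5sum" symptom described above can be checked with a short comparison like this one. It is a sketch that assumes a copy of the original file is available locally (for example, fetched over a wired link); the function name same_md5 is made up for illustration.

```shell
# Succeed (exit status 0) only if both files have the same MD5 checksum.
# Usage: same_md5 ORIGINAL DOWNLOADED
same_md5() {
    sum_a=$(md5sum < "$1" | cut -d' ' -f1)
    sum_b=$(md5sum < "$2" | cut -d' ' -f1)
    [ "$sum_a" = "$sum_b" ]
}

# Example (paths are placeholders):
#   same_md5 original.iso downloaded.iso || echo "download is corrupted"
```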

Comment 6 Charles Bovy 2012-01-25 10:16:46 UTC
I'm seeing a very similar issue since upgrading to F16.

The following hardware:
02:00.0 Network controller: Intel Corporation Centrino Advanced-N 6200 (rev 35)

[   13.525187] Intel(R) Wireless WiFi Link AGN driver for Linux, in-tree:d
[   13.525190] Copyright(c) 2003-2011 Intel Corporation
[   13.543069] iwlwifi 0000:02:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
[   13.543109] iwlwifi 0000:02:00.0: setting latency timer to 64
[   13.543157] iwlwifi 0000:02:00.0: pci_resource_len = 0x00002000
[   13.543160] iwlwifi 0000:02:00.0: pci_resource_base = f7f20000
[   13.543162] iwlwifi 0000:02:00.0: HW Revision ID = 0x35
[   13.543309] iwlwifi 0000:02:00.0: irq 43 for MSI/MSI-X
[   13.543405] iwlwifi 0000:02:00.0: Detected Intel(R) Centrino(R) Advanced-N 6200 AGN, REV=0x74
[   13.543736] iwlwifi 0000:02:00.0: L1 Enabled; Disabling L0S
[   13.563027] iwlwifi 0000:02:00.0: device EEPROM VER=0x43a, CALIB=0x6
[   13.563031] iwlwifi 0000:02:00.0: Device SKU: 0X1f0
[   13.563075] iwlwifi 0000:02:00.0: Tunable channels: 13 802.11bg, 24 802.11a channels
[   13.656650] iwlwifi 0000:02:00.0: loaded firmware version 9.221.4.1 build 25532

Running kernel: 3.2.1-3.fc16.i686.PAE #1 SMP Mon Jan 23 15:37:21 UTC 2012 i686 i686 i386 GNU/Linux

I see problems while using SSH, SCP, SSL and VPN. Packets get corrupted after a certain period. Very hard to reproduce, but it always happens while connected via wireless.

Any idea how to debug?

Comment 7 Ron Yorston 2012-01-25 10:48:44 UTC
Actually I've noticed that the problems I reported above seem to have gone away recently.  I can't pin down when the improvement happened, but it was in the last month or so.

Comment 8 Josh Boyer 2012-06-07 14:49:26 UTC
Is this still happening with 2.6.43/3.3?

Comment 9 Ron Yorston 2012-06-07 20:34:23 UTC
I'm still seeing wireless connections stalling with the 3.3 kernel in F16 and F17.  2.6.35 in F14 is fine.

Comment 11 Charles Bovy 2012-06-08 09:22:17 UTC
I'm running 3.3.7-1.fc16.i686.PAE now and it seems that problems have gone away.

Comment 12 Dave Jones 2012-06-18 22:20:17 UTC
This bug has started to collect a lot of 'me too's from unrelated problems.
If you're still seeing a problem on a chipset that isn't ath9k, please file a separate bug, and we'll track it there.

George, can you retest with 3.4 please ?

Comment 13 Chris Raess 2012-06-19 18:31:14 UTC
Hello all

I have had the same problem since 3.4 (both) on fc17.
Kernel 3.3.7 was working fine with my WiFi, but now I get a total freeze when I download large files. I tested "pci=nomsi", but it did not help.

Comment 14 George Iosif 2012-07-05 20:49:12 UTC
(In reply to comment #12)
> This bug has started to collect a lot of 'me too's from unrelated problems.
> If you're still seeing a problem on a chipset that isn't ath9k, please file
> a separate bug, and we'll track it there.
> 
> George, can you retest with 3.4 please ?

Hi Dave,

Yes, I would agree that the problems other people describe here don't seem to be related to what I've described.
In regards to re-testing, I'm afraid I moved that laptop to another distro (which doesn't have the problem), so I can't do that anymore.

From my perspective, this bug can be closed.

Regards,
George Iosif

Comment 15 Chris Raess 2012-07-07 09:04:00 UTC
You can work around this bug by compiling compat-wireless-2012-06-29 and using its ath9k driver. I have had no freezes for four days now.

wget http://linuxwireless.org/download/compat-wireless-2.6/compat-wireless.tar.bz2
Unpack the archive and cd into the extracted directory, then:
./scripts/driver-select atheros
make
sudo make install

After that:
sudo make unload
sudo make wlunload
sudo make btunload

Finally, at the "modprobe step" (you will see it in your terminal), choose:
sudo modprobe ath9k


Reboot and test.
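
After the reboot it may be worth confirming that modprobe will pick up the compat-wireless build rather than the stock driver. The sketch below assumes that compat-wireless installs its modules under an "updates" directory inside /lib/modules/$(uname -r), which is not stated in this report; the function name is hypothetical.

```shell
# Succeed if the given module path lies under an "updates" directory,
# which is where compat-wireless is assumed to install its modules.
is_compat_module() {
    case "$1" in
        */updates/*) return 0 ;;
        *)           return 1 ;;
    esac
}

# Example (not run here): modinfo -F filename prints the path of the
# module file that modprobe would load for ath9k.
#   is_compat_module "$(modinfo -F filename ath9k)" && echo "compat ath9k selected"
```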

Comment 16 John Greene 2013-02-05 19:39:49 UTC
Josh Boyer: could you pick up the code in attachment 3 for a Fedora build?

https://dev.openwrt.org/browser/trunk/package/mac80211/patches/552-ath9k_rx_dma_stop_check.patch?rev=34910

A user reports that the Fedora test build works well.  The OpenWrt folks are still testing it before submitting it upstream.  Are you able to put this in Fedora soon?

Comment 17 Josh Boyer 2013-02-05 20:18:51 UTC
(In reply to comment #16)
> Josh Boyer: could you pick up the code in attachment 3 for fedora
> build?
> 
> https://dev.openwrt.org/browser/trunk/package/mac80211/patches/552-ath9k_rx_dma_stop_check.patch?rev=34910
> 
> User reports fedora test build works well.  The openwrt folks are still
> testing it before submitting to upstream.  Are you able to put this in
> fedora soon?

I'm... confused.  The link you pointed to looks like it's been upstream since 2.6.39 as part of:

commit 5882da02e9d9089b7e8c739f3e774aaeeff8b7ba
Author: Felix Fietkau <nbd>
Date:   Fri Apr 8 20:13:18 2011 +0200

    ath9k_hw: fix stopping rx DMA during resets
    
    During PHY errors, the MAC can sometimes fail to enter an idle state on older
    hardware (before AR9380) after an rx stop has been requested.

which is very much already in Fedora.

Is there something else you meant to point to?

Comment 18 John W. Linville 2013-02-06 14:12:51 UTC
Commit 5882da02 does have a hunk that resembles part of the referenced patch, but they are not the same.

More importantly, comment 16 should have been added to bug 892811 instead of here...

Comment 19 Josh Boyer 2013-02-06 14:53:45 UTC
(In reply to comment #18)
> Commit 5882da02 does have a hunk that resembles part of the referenced
> patch, but they are not the same.

OK, I see that now.  My mistake.

> More importantly, comment 16 should have been added to bug 892811 instead of
> here...

Well, then I'll get it applied for that bug.

Comment 20 John W. Linville 2013-02-06 15:39:14 UTC
Cool...thanks, Josh!

Comment 21 John Greene 2013-02-07 15:57:03 UTC
Well, actually, my mistake. Sorry, guys.

Comment 22 Fedora End Of Life 2013-07-03 23:03:15 UTC
This message is a reminder that Fedora 17 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 17. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '17'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 17's end of life.

Bug Reporter:  Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 17 is end of life. If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora, you are encouraged to change the 
'version' to a later Fedora version prior to Fedora 17's end of life.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 23 Fedora End Of Life 2013-08-01 01:23:28 UTC
Fedora 17 changed to end-of-life (EOL) status on 2013-07-30. Fedora 17 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.