Bug 541756 - Bad and buggy ath9k driver in Fedora 12
Summary: Bad and buggy ath9k driver in Fedora 12
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 12
Hardware: i686
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: John W. Linville
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2009-11-27 00:11 UTC by Tomasz Sałaciński
Modified: 2013-01-21 20:27 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-03-08 20:36:36 UTC
Type: ---
Embargoed:


Attachments:
Update initvals for AR9285 based on 2.6.32 work (9.08 KB, patch)
2009-12-04 16:49 UTC, Luis R. Rodriguez
dmesg from 2.6.31.6-145.fc12.i686 with "disassociating" all over the place (44.71 KB, text/plain)
2009-12-07 05:57 UTC, Michal Jaegermann
an output of 'cat /proc/interrupts' (1.40 KB, text/plain)
2009-12-07 05:58 UTC, Michal Jaegermann
layout of PCI buses (1.25 KB, text/plain)
2009-12-07 06:00 UTC, Michal Jaegermann

Description Tomasz Sałaciński 2009-11-27 00:11:23 UTC
Description of problem:
Starting with FC12, Fedora introduced a new kernel which has a buggy ath9k driver. I am using an Acer Aspire 5737Z laptop with an integrated wireless card, and with the default driver from Fedora I cannot use it at all. I am getting 80% packet loss, constant disconnections, very bad link quality reported by iwconfig, and low signal strength reported by NetworkManager (12% when I am standing 2 inches from the router). I've tried disabling power management, setting the network to 802.11b (11 Mbps), and standing close to the AP, but nothing helped.

I've compiled and installed a new version of ath9k driver available on this website:

http://linuxwireless.org/en/users/Drivers/ath9k

Right now the wireless card works perfectly (98% signal strength, 65 link quality, no packet loss at all and superb performance). Please upgrade the ath9k driver to the newer version.

Version-Release number of selected component (if applicable):


How reproducible:
Connect and try to surf.

Steps to Reproduce:
1. Install Fedora 12
2. Update it
3. Connect to internet
  
Actual results:
Packet loss, low performance (1 Mbps, should be 20 Mbps)

Expected results:
Card works fine

Additional info:

Comment 1 Terry Moore 2009-11-30 02:09:55 UTC
I too am having really horrible performance with my Asus Eee 1005ha with the Atheros ath9k driver. Upon a system resume the wireless card connects to my access point but doesn't transmit data. I have to disable the card and re-enable it in order to transmit data.

Also, I sit no more than 6 feet from my access point and these are my statistics:

wlan0     IEEE 802.11bgn  ESSID:"gamehenge"  
          Mode:Managed  Frequency:2.437 GHz  Access Point: 00:1D:7E:38:C2:4E   
          Bit Rate=1 Mb/s   Tx-Power=20 dBm   
          Retry  long limit:7   RTS thr:off   Fragment thr:off
          Power Management:on
          Link Quality=46/70  Signal level=-64 dBm  
          Rx invalid nwid:0  Rx invalid crypt:0  Rx invalid frag:0
          Tx excessive retries:0  Invalid misc:0   Missed beacon:0

I had similar performance issues when I was running Ubuntu 9.04/9.10. 

After following the advice given above (latest ath9k module from 11/29/2009), my wireless card now has the following stats:

wlan0     IEEE 802.11bgn  ESSID:"gamehenge"  
          Mode:Managed  Frequency:2.437 GHz  Access Point: 00:1D:7E:38:C2:4E   
          Bit Rate=1 Mb/s   Tx-Power=20 dBm   
          Retry  long limit:7   RTS thr:off   Fragment thr:off
          Power Management:off
          Link Quality=70/70  Signal level=-38 dBm  
          Rx invalid nwid:0  Rx invalid crypt:0  Rx invalid frag:0
          Tx excessive retries:0  Invalid misc:0   Missed beacon:0

Now after resume the wireless card connects and works right away.  
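
To put numbers on the difference between the two iwconfig readings above, here is a minimal Python sketch. It only looks at three fields, and the dictionary keys (`quality`, `signal_dbm`, `power_management`) are names made up for this illustration. Note that the two readings differ in more than the driver version (power management is also off in the second one), so the comparison is indicative only.

```python
import re


def parse_iwconfig(text):
    """Pull a few fields out of iwconfig output; fields not found are left out."""
    fields = {}
    m = re.search(r"Link Quality=(\d+)/(\d+)", text)
    if m:
        fields["quality"] = int(m.group(1)) / int(m.group(2))
    m = re.search(r"Signal level=(-?\d+) dBm", text)
    if m:
        fields["signal_dbm"] = int(m.group(1))
    m = re.search(r"Power Management:(\w+)", text)
    if m:
        fields["power_management"] = m.group(1)
    return fields


# The relevant lines from the two readings quoted in this comment.
BEFORE = """Power Management:on
Link Quality=46/70  Signal level=-64 dBm"""
AFTER = """Power Management:off
Link Quality=70/70  Signal level=-38 dBm"""

before = parse_iwconfig(BEFORE)
after = parse_iwconfig(AFTER)

# dBm is logarithmic: a difference in dB maps to a power ratio of 10^(dB/10).
gain_db = after["signal_dbm"] - before["signal_dbm"]
power_ratio = 10 ** (gain_db / 10)
```

Going from -64 dBm to -38 dBm is a 26 dB difference, which corresponds to a received power ratio of roughly 400.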

Thanks Tomasz Sałaciński

Comment 2 David Woodhouse 2009-12-02 11:57:32 UTC
I don't believe that F12 has any patches which touch the ath9k driver. If it's broken in F-12, that means it's broken in 2.6.31.6. 

Maybe the ath9k developers are not properly ensuring that necessary bug fixes are (backported and) submitted to the upstream stable releases?

Comment 3 Tomasz Sałaciński 2009-12-02 12:12:41 UTC
(In reply to comment #2)
> I don't believe that F12 has any patches which touch the ath9k driver. If it's
> broken in F-12, that means it's broken in 2.6.31.6. 
> 
> Maybe the ath9k developers are not properly ensuring that necessary bug fixes
> are (backported and) submitted to the upstream stable releases?  

Maybe so, but this still leaves FC12 with a broken ath9k driver, while for example Ubuntu works correctly since it uses an older kernel. Is it possible to replace that driver (a kernel update with a modified ath9k)?

Comment 4 David Woodhouse 2009-12-02 12:41:06 UTC
Yeah, because distributions hacking on wireless drivers to ship something like the 2.6.30 ath9k with 2.6.31 is a great idea :)

Much better, surely, for the ath9k developers not to introduce regressions into the release kernels?

Please could you test the 2.6.32 kernel from rawhide (it should install OK onto F-12), and see whether the problem is fixed there? If so, then we have a good idea where to look to find the fix, and we can chase them to submit it to 2.6.31-stable too.

Otherwise, we'll need to bisect the changes between 2.6.31 and the 'earlier' kernel that works, to find out where they broke it.
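
As a rough illustration of what bisecting means here, the following Python sketch finds the first bad entry in an ordered history with a binary search. The commit labels and the `is_bad` test are hypothetical; in practice `git bisect` drives the same search, and each "test" means building and booting a kernel.

```python
def first_bad(versions, is_bad):
    """Return the first entry of `versions` for which is_bad() is true.

    Assumes versions are ordered oldest to newest, versions[0] is known
    good, versions[-1] is known bad, and once a version is bad every
    later one is bad too.
    """
    lo, hi = 0, len(versions) - 1  # invariant: versions[lo] good, versions[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return versions[hi]


# Hypothetical history of 16 commits where the regression arrived in "c11".
commits = [f"c{i}" for i in range(16)]
tested = []


def is_bad(commit):
    tested.append(commit)  # record how many builds we would have to test
    return int(commit[1:]) >= 11


culprit = first_bad(commits, is_bad)
```

The number of tests grows with the logarithm of the number of commits, which is why bisection is practical even across a whole kernel release.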

Comment 5 Tomasz Sałaciński 2009-12-02 12:47:33 UTC
I don't know how to install a kernel from the rawhide repo. If you can, could you please guide me? I'll do it as soon as I get home from work (which will be in around 5-6 hours from now). Thanks

Comment 6 David Woodhouse 2009-12-02 13:32:35 UTC
You should be able to do it with just 'yum --enablerepo=rawhide update kernel'.

It should only update the kernel and kernel-firmware packages.

Comment 7 Luis R. Rodriguez 2009-12-02 16:18:07 UTC
These two patches are stable fixes:

ath9k: Fix maximum tx fifo settings for single stream devices
ath9k: fix processing of TX PS null data frames

They may need some backport work.

Comment 8 David Woodhouse 2009-12-02 16:48:32 UTC
Thanks, Luis. Please could you provide the commit ID for those? It's much easier to find things unambiguously if we refer to them by their commit ID.

Is anyone currently working on backporting them and submitting them to stable?

Comment 9 Luis R. Rodriguez 2009-12-02 17:04:39 UTC
I didn't provide commit IDs as they are not yet merged even into 2.6.32, only into wireless-testing. Here is another stable fix:

commit 4f432f2cf87ac29f922ad09b139984545c8cd95d
Author: Sujith <Sujith.Manoharan>
Date:   Fri Oct 9 09:51:28 2009 +0530

    ath9k: Fix TX hang poll routine
    
    When TX is hung, the chip is reset. Ensure that
    the chip is awake by using the PS wrappers.
    
    Signed-off-by: Sujith <Sujith.Manoharan>
    Signed-off-by: John W. Linville <linville>


The other ones I was referring to are:

commit ec3659c91be483dc8fd4f7951073e5d2cfe60e9e
Author: Luis R. Rodriguez <lrodriguez>
Date:   Tue Nov 24 21:37:57 2009 -0500

    ath9k: Fix maximum tx fifo settings for single stream devices
    
    Atheros single stream AR9285 and AR9271 have half the PCU TX FIFO
    buffer size of that of dual stream devices. Dual stream devices
    have a max PCU TX FIFO size of 8 KB while single stream devices
    have 4 KB. Single stream devices have an issue though and require
    hardware only to use half of the amount of its capable PCU TX FIFO
    size, 2 KB and this requires a change in software.
    
    Technically a change would not have been required (except for frame
    burst considerations of 128 bytes) if these devices would have been
    able to use the full 4 KB of the PCU TX FIFO size but our systems
    engineers recommend 2 KB to be used only. We enforce this through
    software by reducing the max frame triggger level to 2 KB.
    
    Fixing the max frame trigger level should then have a few benefits:
    
      * The PER will now be adjusted as designed for underruns when the
        max trigger level is reached. This should help alleviate the
        bus as the rate control algorithm chooses a slower rate which
        should ensure frames are transmitted properly under high system
        bus load.
    
      * The poll we use on our TX queues should now trigger and work
        as designed for single stream devices. The hardware passes
        data from each TX queue on the PCU TX FIFO queue respecting each
        queue's priority. The new trigger level ensures this seeding of
        the PCU TX FIFO queue occurs as designed which could mean avoiding
        false resets and actually reseting hw correctly when a TX queue
        is indeed stuck.

      * Some undocumented / unsupported behaviour could have been triggered
        when the max trigger level level was being set to 4 KB on single
        stream devices. Its not clear what this issue was to me yet.
    
    Cc: Kyungwan Nam <kyungwan.nam>
    Cc: Bennyam Malavazi <bennyam.malavazi>
    Cc: Stephen Chen <stephen.chen>
    Cc: Shan Palanisamy <shan.palanisamy>
    Cc: Paul Shaw <paul.shaw>
    Signed-off-by: Vasanthakumar Thiagarajan <vasanth>
    Signed-off-by: Luis R. Rodriguez <lrodriguez>
    Signed-off-by: John W. Linville <linville>
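
The effect of capping the trigger level can be sketched in a few lines of Python. This is only an illustration of the commit log above, not the driver code: the 64-byte step size and the function name `on_underrun` are assumptions, and only the 2 KB usable size for single stream devices comes from the commit message.

```python
TRIGGER_UNIT = 64                 # assumed bytes per trigger-level step
SINGLE_STREAM_USABLE = 2 * 1024   # 2 KB, as recommended in the commit log
MAX_LEVEL = SINGLE_STREAM_USABLE // TRIGGER_UNIT - 1


def on_underrun(level, max_level=MAX_LEVEL):
    """Handle one TX underrun.

    Returns (new_level, at_max). Below the cap the trigger level is raised
    by one step. At the cap it cannot be raised further, and the caller is
    expected to count the frame as failed so that rate control can pick a
    slower rate.
    """
    if level < max_level:
        return level + 1, False
    return level, True
```

With a cap computed from 4 KB instead of 2 KB, the `at_max` branch would be reached much later on a single stream device, which matches the commit log's point that the error accounting did not work as designed.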

commit bda101ab69b4e51f450ca48e85608c293911ca91
Author: Luis R. Rodriguez <lrodriguez>
Date:   Tue Nov 24 02:53:25 2009 -0500

    ath9k: fix processing of TX PS null data frames
    
    When mac80211 was telling us to go into Powersave we listened
    and immediately turned RX off. This meant hardware would not
    see the ACKs from the AP we're associated with and hardware
    we'd end up retransmiting the null data frame in a loop
    helplessly.
    
    Fix this by keeping track of the transmitted nullfunc frames
    and only when we are sure the AP has sent back an ACK do we
    go ahead and shut RX off.
    
    Signed-off-by: Vasanthakumar Thiagarajan <vasanth>
    Signed-off-by: Vivek Natarajan <Vivek.Natarajan>
    Signed-off-by: Luis R. Rodriguez <lrodriguez>
    Signed-off-by: John W. Linville <linville>
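
The idea behind the nullfunc fix can be modelled as a tiny state machine. The Python sketch below is a simplified illustration under my own assumptions; the class and method names are hypothetical and do not correspond to ath9k or mac80211 functions.

```python
class PowerSaveTracker:
    """Keep RX on until the AP has ACKed our powersave nullfunc frame."""

    def __init__(self):
        self.rx_on = True
        self.ps_requested = False
        self.nullfunc_pending = False

    def request_powersave(self):
        # The stack asks for powersave: remember it, but do NOT stop RX yet.
        self.ps_requested = True

    def sent_nullfunc(self):
        # A nullfunc frame announcing powersave has been handed to the hardware.
        self.nullfunc_pending = True

    def on_tx_status(self, acked):
        if not self.nullfunc_pending:
            return
        if acked:
            self.nullfunc_pending = False
            if self.ps_requested:
                self.rx_on = False  # safe now: the AP knows we are asleep
        # Not ACKed: leave RX on so the ACK for a retransmission can be heard.
```

The bug described in the commit log corresponds to turning `rx_on` off inside `request_powersave()`, which makes it impossible to ever see the ACK.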

Comment 11 Terry Moore 2009-12-03 02:56:50 UTC
I loaded the kernel 2.6.32-0.56.rc8.git1.fc13.i686 from rawhide and the link quality was 68/70. I was able to transmit data upon resume. Getting the system to show something other than a black screen upon resume was a chore, though. But that is not an issue for this bug :)

Please let me know if you need more details and I will boot back to that kernel and get them for you.

Comment 12 David Woodhouse 2009-12-03 09:22:20 UTC
Terry, thanks. Please could you also test the 2.6.31.6-160.fc12 kernel which John built at http://koji.fedoraproject.org/koji/buildinfo?buildID=144126 ? 

That kernel is a stable Fedora 12 kernel with just the necessary fixes that Luis identified. If those are sufficient, we'll put them into an F-12 update.

Comment 13 Terry Moore 2009-12-04 02:00:31 UTC
[motersho@minime ~]$ uname -r 
2.6.31.6-160.fc12.i686

Wireless connects and then I could pull up google.com. I started to write this and then tried to browse another page and I couldn't browse anything. I waited a few seconds and was able to connect, but the connection speed was very, very slow.


[motersho@minime ~]$ iwconfig wlan0
wlan0     IEEE 802.11bgn  ESSID:"gamehenge"  
          Mode:Managed  Frequency:2.437 GHz  Access Point: 00:1D:7E:38:C2:4E   
          Bit Rate=1 Mb/s   Tx-Power=20 dBm
          Retry  long limit:7   RTS thr:off   Fragment thr:off
          Power Management:on
          Link Quality=50/70  Signal level=-60 dBm  
          Rx invalid nwid:0  Rx invalid crypt:0  Rx invalid frag:0
          Tx excessive retries:0  Invalid misc:0   Missed beacon:

I noticed the bit rate changes depending on the kernel version that I use. I have seen a bit rate of 0 in the current stable F12 kernel, where the connection speed is awesome but the resume issues remain. What does the 0 mean? Auto?

Could you take the changes that were committed to the kernel that is in rawhide and build them into 2.6.31.x? That module has a great connection speed, and the one time that I was able to get my system to resume, the wireless came right back online.

Upon resume the connection reestablished itself, but the connection was terrible like above. Also check out this ping:

64 bytes from yx-in-f100.1e100.net (74.125.45.100): icmp_seq=19 ttl=53 time=100 ms
64 bytes from yx-in-f100.1e100.net (74.125.45.100): icmp_seq=20 ttl=53 time=987 ms
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
64 bytes from yx-in-f100.1e100.net (74.125.45.100): icmp_seq=49 ttl=53 time=62.2 ms
64 bytes from yx-in-f100.1e100.net (74.125.45.100): icmp_seq=50 ttl=53 time=43.1 ms
64 bytes from yx-in-f100.1e100.net (74.125.45.100): icmp_seq=51 ttl=53 time=63.0 ms
64 bytes from yx-in-f100.1e100.net (74.125.45.100): icmp_seq=52 ttl=53 time=49.2 ms
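
The gap in the ping output above can be quantified by looking at the `icmp_seq` numbers. Here is a minimal Python sketch that lists the sequence numbers that never got a reply; the function name is made up for this example and the sample is abbreviated from the output pasted above.

```python
import re


def missing_sequences(ping_output):
    """Return the icmp_seq values between the first and last reply that are absent."""
    seqs = [int(s) for s in re.findall(r"icmp_seq=(\d+)", ping_output)]
    if not seqs:
        return []
    received = set(seqs)
    return [n for n in range(min(seqs), max(seqs) + 1) if n not in received]


SAMPLE = """\
64 bytes from yx-in-f100.1e100.net (74.125.45.100): icmp_seq=19 ttl=53 time=100 ms
64 bytes from yx-in-f100.1e100.net (74.125.45.100): icmp_seq=20 ttl=53 time=987 ms
ping: sendmsg: No buffer space available
64 bytes from yx-in-f100.1e100.net (74.125.45.100): icmp_seq=49 ttl=53 time=62.2 ms
64 bytes from yx-in-f100.1e100.net (74.125.45.100): icmp_seq=50 ttl=53 time=43.1 ms
"""

lost = missing_sequences(SAMPLE)
```

For this output the replies jump from sequence 20 to 49, so 28 consecutive echo requests went unanswered.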



After writing the above I retried the 2.6.32-x kernel:

[motersho@minime ~]$ uname -r
2.6.32-0.56.rc8.git1.fc13.i686
[motersho@minime ~]$ iwconfig wlan0
wlan0     IEEE 802.11bgn  ESSID:"gamehenge"  
          Mode:Managed  Frequency:2.437 GHz  Access Point: 00:1D:7E:38:C2:4E   
          Bit Rate=0 kb/s   Tx-Power=20 dBm   
          Retry  long limit:7   RTS thr:off   Fragment thr:off
          Power Management:on
          Link Quality=68/70  Signal level=-42 dBm  
          Rx invalid nwid:0  Rx invalid crypt:0  Rx invalid frag:0
          Tx excessive retries:0  Invalid misc:0   Missed beacon:0

[motersho@minime ~]$ 

I managed to get my system to resume without a black screen with this kernel and everything works great. Please take the changes that are in this kernel and place them in 2.6.31.x and I will test that for you. Thanks for all your work, I hope all of my ramblings help fix this.

Comment 14 Luis R. Rodriguez 2009-12-04 02:43:26 UTC
A bit rate of 0 in iwconfig means that wext is really old and crappy; please use iw if you want to actually see MCS index rates.

MCS index rates are for 802.11n.
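
For reference, an 802.11n MCS index maps to a data rate through the modulation, coding rate, channel width, guard interval and number of spatial streams. The Python sketch below computes that rate for the equal-modulation MCS indexes from the standard OFDM parameters; the function and table names are my own.

```python
from fractions import Fraction

# (coded bits per subcarrier, coding rate) for MCS 0-7:
# BPSK 1/2, QPSK 1/2, QPSK 3/4, 16-QAM 1/2, 16-QAM 3/4,
# 64-QAM 2/3, 64-QAM 3/4, 64-QAM 5/6
MCS_TABLE = [
    (1, Fraction(1, 2)),
    (2, Fraction(1, 2)),
    (2, Fraction(3, 4)),
    (4, Fraction(1, 2)),
    (4, Fraction(3, 4)),
    (6, Fraction(2, 3)),
    (6, Fraction(3, 4)),
    (6, Fraction(5, 6)),
]

# Number of data subcarriers per channel width in MHz.
DATA_SUBCARRIERS = {20: 52, 40: 108}


def ht_rate_mbps(mcs, width_mhz=20, short_gi=False):
    """Data rate in Mb/s for an 802.11n MCS index (equal modulation only)."""
    extra_streams, base = divmod(mcs, 8)  # MCS 8-15 repeat MCS 0-7 with 2 streams
    streams = extra_streams + 1
    bits, coding = MCS_TABLE[base]
    # OFDM symbol duration in microseconds: 3.6 with short guard interval, else 4.
    symbol_us = Fraction(36, 10) if short_gi else Fraction(4)
    return float(DATA_SUBCARRIERS[width_mhz] * bits * coding * streams / symbol_us)
```

A single stream device such as the AR9285 tops out at MCS 7, which is 65 Mb/s on a 20 MHz channel with the long guard interval.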

http://wireless.kernel.org/en/users/Documentation/iw

Also check out:

http://wireless.kernel.org/en/users/Documentation

Not sure what other 'stable'-quality fixes can go from ath9k into 2.6.31... See:

http://wireless.kernel.org/en/users/Documentation/Fix_Propagation

Comment 15 Luis R. Rodriguez 2009-12-04 02:44:51 UTC
I also should note we keep track of all known bugs for ath9k here:

http://wireless.kernel.org/en/users/Drivers/ath9k/bugs

If we see anything serious we do ask for stable to pull it.

Comment 16 Luis R. Rodriguez 2009-12-04 03:11:15 UTC
Oh I have one idea.. what card do you have?

rmmod ath9k
dmesg -c > /dev/null
modprobe ath9k
dmesg -c

Paste that last bit of output here. The initvals.h *can* potentially have good fixes too. I forgot to consider that.

Comment 17 Terry Moore 2009-12-04 03:59:04 UTC
[root@minime ~]# uname -r 
2.6.31.6-145.fc12.i686

[root@minime ~]# rmmod ath9k
[root@minime ~]# dmesg -c > /dev/null
[root@minime ~]# modprobe ath9k
[root@minime ~]# dmesg -c
ath9k 0000:02:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
ath9k 0000:02:00.0: setting latency timer to 64
ath: EEPROM regdomain: 0x60
ath: EEPROM indicates we should expect a direct regpair map
ath: Country alpha2 being used: 00
ath: Regpair used: 0x60
phy1: Selected rate control algorithm 'ath9k_rate_control'
Registered led device: ath9k-phy1::radio
Registered led device: ath9k-phy1::assoc
Registered led device: ath9k-phy1::tx
Registered led device: ath9k-phy1::rx
phy1: Atheros AR9285 MAC/BB Rev:2 AR5133 RF Rev:e0: mem=0xf9e40000, irq=17
ADDRCONF(NETDEV_UP): wlan0: link is not ready
[root@minime ~]#
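
The line that answers the "what card do you have?" question is the one starting with `phy1: Atheros`. As a small illustration, this Python sketch pulls the chip names and revisions out of such a line. The regular expression is written against the output above only and may not match every ath9k version; the dictionary keys are made up for this example.

```python
import re

# Matches e.g. "phy1: Atheros AR9285 MAC/BB Rev:2 AR5133 RF Rev:e0: mem=..."
CHIP_RE = re.compile(
    r"(phy\d+): Atheros (AR\w+) MAC/BB Rev:(\w+) (AR\w+) RF Rev:(\w+)"
)


def find_chip(dmesg_text):
    """Return a dict describing the first Atheros chip line, or None if absent."""
    m = CHIP_RE.search(dmesg_text)
    if m is None:
        return None
    phy, mac, mac_rev, rf, rf_rev = m.groups()
    return {"phy": phy, "mac": mac, "mac_rev": mac_rev, "rf": rf, "rf_rev": rf_rev}


LINE = "phy1: Atheros AR9285 MAC/BB Rev:2 AR5133 RF Rev:e0: mem=0xf9e40000, irq=17"
chip = find_chip(LINE)
```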

Comment 18 Terry Moore 2009-12-04 04:04:54 UTC
Just in case it helps I have an Asus Eee 1005ha.  (And for others doing searches)

Comment 19 Jaiv 2009-12-04 08:08:43 UTC
Hi guys,

I have a problem with ath9k as well, but I'm using it in AP mode with hostapd (I tried both 0.7.0 and the latest git version). Kernel kernel-PAE-2.6.30.9-99.fc11.i686 works smoothly, but none of the F12 kernels work OK. I did not try client mode, as this is my AP and I do not have another AP available.

Here are the kernels I tested:
kernel-PAE-2.6.31.5-127.fc12.i686 - not working
kernel-PAE-2.6.31.6-145.fc12.i686 - not working

Your suggested fixed kernel:
kernel-PAE-2.6.31.6-160.fc12.i686 - not working
For this kernel I have logs:

From hostapd:
-----------------------------
STA xx:xx:xx:xx:xx:xx sent probe request for our SSID
STA xx:xx:xx:xx:xx:xx sent probe request for our SSID
MGMT (TX callback) fail
mgmt::proberesp cb
MGMT (TX callback) fail
mgmt::proberesp cb
MGMT
mgmt::auth
authentication: STA=xx:xx:xx:xx:xx:xx auth_alg=0 auth_transaction=1 status_code=0 wep=0
  New STA
wlan0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: authentication OK (open system)
wlan0: STA xx:xx:xx:xx:xx:xx MLME: MLME-AUTHENTICATE.indication(xx:xx:xx:xx:xx:xx, OPEN_SYSTEM)
wlan0: STA xx:xx:xx:xx:xx:xx MLME: MLME-DELETEKEYS.request(xx:xx:xx:xx:xx:xx)
authentication reply: STA=xx:xx:xx:xx:xx:xx auth_alg=0 auth_transaction=2 resp=0 (IE len=0)
MGMT (TX callback) fail
mgmt::auth cb
wlan0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: did not acknowledge authentication response
MGMT
mgmt::assoc_req
association request: STA=xx:xx:xx:xx:xx:xx capab_info=0x411 listen_interval=10
WMM IE - hexdump(len=7): 00 50 f2 02 00 01 00
Validating WMM IE: OUI 00:50:f2  OUI type 2  OUI sub-type 0  version 1  QoS info 0x0
nl80211: Set beacon (beacon_set=1)
  new AID 1
wlan0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: association OK (aid 1)
MGMT (TX callback) fail
mgmt::assoc_resp cb
wlan0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: did not acknowledge association response
STA yy:yy:yy:yy:yy:yy sent probe request for broadcast SSID
MGMT (TX callback) fail
mgmt::proberesp cb
STA yy:yy:yy:yy:yy:yy sent probe request for broadcast SSID
MGMT (TX callback) fail
mgmt::proberesp cb
-----------------------------
It also fired a bug:

BUG: unable to handle kernel paging request at 080b5f80
IP: [<080b5f80>] 0x80b5f80
*pdpt = 000000002593c001 *pde = 0000000025926067 *pte = 0000000000000000
Oops: 0010 [#1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:08.0/net/wlan0/broadcast
Modules linked in: ath9k mac80211 ath cfg80211 rfkill fuse via drm sunrpc padlock_aes aes_i586 aes_generic nf_nat_ftp nf_conntrack_ftp xt_pkttype xt_limit ipt_LOG iptable_mangle iptable_nat nf_nat ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 cpufreq_ondemand acpi_cpufreq dm_multipath uinput arc4 ecb snd_via82xx gameport snd_ac97_codec ac97_bus snd_seq snd_pcm snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd_seq_device i2c_viapro snd via_rhine soundcore i2c_core firewire_ohci firewire_core crc_itu_t r8169 mii raid0 raid1 ata_generic pata_acpi pata_via [last unloaded: rfkill]
Pid: 3573, comm: bluetooth-apple Not tainted (2.6.31.6-160.fc12.i686.PAE #1) 
EIP: 0060:[<080b5f80>] EFLAGS: 00010246 CPU: 0
EIP is at 0x80b5f80
EAX: e59f9680 EBX: e59f9680 ECX: f88fe658 EDX: 00000000
ESI: 00000000 EDI: e5965e74 EBP: e5965f8c ESP: e5965be4
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Process bluetooth-apple (pid: 3573, ti=e5964000 task=e58572c0 task.ti=e5964000)
Stack:
c04d4ee9 00001000 00080097 00080097 e5965ec4 09ce9df0 00000000 00c48740
 00000000 00000000 00000000 00000001 e5965ebc c04d4818 00000019 00000000
 e58572c0 00000000 00000000 00000009 e59cf980 00000019 00000000 e5965c18
Call Trace:
[<c04d4ee9>] ? do_sys_poll+0x213/0x3c8
[<c04d4818>] ? __pollwait+0x0/0xaa
[<c04d48c2>] ? pollwake+0x0/0x65
[<c04d48c2>] ? pollwake+0x0/0x65
[<c04d48c2>] ? pollwake+0x0/0x65
[<c04d48c2>] ? pollwake+0x0/0x65
[<c04d48c2>] ? pollwake+0x0/0x65
[<c04d48c2>] ? pollwake+0x0/0x65
[<c04d48c2>] ? pollwake+0x0/0x65
[<c04d48c2>] ? pollwake+0x0/0x65
[<c04d48c2>] ? pollwake+0x0/0x65
[<c056b979>] ? avc_has_perm+0x41/0x4b
[<c0755c09>] ? scm_recv+0x32/0xa1
[<c0756967>] ? unix_stream_recvmsg+0x391/0x3b0
[<c056be05>] ? selinux_socket_recvmsg+0x1a/0x1c
[<c06e37e2>] ? __sock_recvmsg+0x56/0x60
[<c06e5680>] ? sock_aio_read+0xac/0xb8
[<c04c89e2>] ? do_sync_read+0xae/0xe9
[<c056cffd>] ? file_has_perm+0x89/0xa3
[<c0450d39>] ? autoremove_wake_function+0x0/0x34
[<c056d2ed>] ? selinux_file_permission+0x49/0x4d
[<c0566617>] ? security_file_permission+0x14/0x16
[<c04c8aba>] ? rw_verify_area+0x9d/0xc0
[<c04d51df>] ? sys_poll+0x44/0x8d
[<c0477ce0>] ? audit_syscall_exit+0xff/0x114
[<c0408f9b>] ? sysenter_do_call+0x12/0x28
Code:  Bad EIP value.
EIP: [<080b5f80>] 0x80b5f80 SS:ESP 0068:e5965be4
CR2: 00000000080b5f80

--------------------------------
Other details you requested earlier:

#modprobe ath9k
#dmesg -c
cfg80211: Calling CRDA to update world regulatory domain
cfg80211: World regulatory domain updated:
    (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
    (2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
    (2457000 KHz - 2482000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
    (2474000 KHz - 2494000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
    (5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
    (5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
ath9k 0000:00:08.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
ath: EEPROM regdomain: 0x10
ath: EEPROM indicates we should expect a direct regpair map
ath: Country alpha2 being used: CO
ath: Regpair used: 0x10
phy0: Selected rate control algorithm 'ath9k_rate_control'
cfg80211: Calling CRDA for country: CO
Registered led device: ath9k-phy0::radio
Registered led device: ath9k-phy0::assoc
Registered led device: ath9k-phy0::tx
Registered led device: ath9k-phy0::rx
phy0: Atheros AR5416 MAC/BB Rev:2 AR2133 RF Rev:81: mem=0xf8160000, irq=17
cfg80211: Regulatory domain changed to country: CO
    (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
    (2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2700 mBm)
    (5170000 KHz - 5250000 KHz @ 20000 KHz), (300 mBi, 1700 mBm)
    (5250000 KHz - 5330000 KHz @ 20000 KHz), (300 mBi, 2300 mBm)
    (5735000 KHz - 5835000 KHz @ 20000 KHz), (300 mBi, 3000 mBm)
cfg80211: Calling CRDA for country: GB
cfg80211: Regulatory domain changed to country: GB
    (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
    (2402000 KHz - 2482000 KHz @ 40000 KHz), (N/A, 2000 mBm)
    (5170000 KHz - 5250000 KHz @ 40000 KHz), (N/A, 2000 mBm)
    (5250000 KHz - 5330000 KHz @ 40000 KHz), (N/A, 2000 mBm)
    (5490000 KHz - 5710000 KHz @ 40000 KHz), (N/A, 2700 mBm)
ADDRCONF(NETDEV_UP): wlan0: link is not ready
#

Comment 20 Jaiv 2009-12-04 08:10:07 UTC
I also tested the latest rawhide kernel:

kernel-PAE-2.6.32-0.65.rc8.git5.fc13.i686 - not working:

# uname -a
Linux thsw 2.6.32-0.65.rc8.git5.fc13.i686.PAE #1 SMP Thu Dec 3 01:14:42 EST 2009 i686 i686 i386 GNU/Linux
# /usr/local/bin/hostapd -ddd -K /etc/hostapd.conf
Starting hostapd: Configuration file: /etc/hostapd.conf
ctrl_interface_group=10 (from group name 'wheel')
eapol_version=2
nl80211: Add own interface ifindex 10
nl80211: Add own interface ifindex 13
BSS count 1, BSSID mask 00:00:00:00:00:00 (0 bits)
nl80211: Added 802.11b mode based on 802.11g information
Allowed channel: mode=1 chan=1 freq=2412 MHz max_tx_power=20 dBm
Allowed channel: mode=1 chan=2 freq=2417 MHz max_tx_power=20 dBm
Allowed channel: mode=1 chan=3 freq=2422 MHz max_tx_power=20 dBm
Allowed channel: mode=1 chan=4 freq=2427 MHz max_tx_power=20 dBm
Allowed channel: mode=1 chan=5 freq=2432 MHz max_tx_power=20 dBm
Allowed channel: mode=1 chan=6 freq=2437 MHz max_tx_power=20 dBm
Allowed channel: mode=1 chan=7 freq=2442 MHz max_tx_power=20 dBm
Allowed channel: mode=1 chan=8 freq=2447 MHz max_tx_power=20 dBm
Allowed channel: mode=1 chan=9 freq=2452 MHz max_tx_power=20 dBm
Allowed channel: mode=1 chan=10 freq=2457 MHz max_tx_power=20 dBm
Allowed channel: mode=1 chan=11 freq=2462 MHz max_tx_power=20 dBm
Allowed channel: mode=0 chan=1 freq=2412 MHz max_tx_power=20 dBm
Allowed channel: mode=0 chan=2 freq=2417 MHz max_tx_power=20 dBm
Allowed channel: mode=0 chan=3 freq=2422 MHz max_tx_power=20 dBm
Allowed channel: mode=0 chan=4 freq=2427 MHz max_tx_power=20 dBm
Allowed channel: mode=0 chan=5 freq=2432 MHz max_tx_power=20 dBm
Allowed channel: mode=0 chan=6 freq=2437 MHz max_tx_power=20 dBm
Allowed channel: mode=0 chan=7 freq=2442 MHz max_tx_power=20 dBm
Allowed channel: mode=0 chan=8 freq=2447 MHz max_tx_power=20 dBm
Allowed channel: mode=0 chan=9 freq=2452 MHz max_tx_power=20 dBm
Allowed channel: mode=0 chan=10 freq=2457 MHz max_tx_power=20 dBm
Allowed channel: mode=0 chan=11 freq=2462 MHz max_tx_power=20 dBm
RATE[0] rate=10 flags=0x2
RATE[1] rate=20 flags=0x6
RATE[2] rate=55 flags=0x6
RATE[3] rate=110 flags=0x6
RATE[4] rate=60 flags=0x0
RATE[5] rate=90 flags=0x0
RATE[6] rate=120 flags=0x0
RATE[7] rate=180 flags=0x0
RATE[8] rate=240 flags=0x0
RATE[9] rate=360 flags=0x0
RATE[10] rate=480 flags=0x0
RATE[11] rate=540 flags=0x0
Completing interface initialization
Mode: IEEE 802.11g  Channel: 3  Frequency: 2422 MHz
Flushing old station entries
Deauthenticate all stations
wpa_driver_nl80211_set_key: ifindex=10 alg=0 addr=(nil) key_idx=0 set_tx=1 seq_len=0 key_len=0
wpa_driver_nl80211_set_key: ifindex=10 alg=0 addr=(nil) key_idx=1 set_tx=0 seq_len=0 key_len=0
wpa_driver_nl80211_set_key: ifindex=10 alg=0 addr=(nil) key_idx=2 set_tx=0 seq_len=0 key_len=0
wpa_driver_nl80211_set_key: ifindex=10 alg=0 addr=(nil) key_idx=3 set_tx=0 seq_len=0 key_len=0
Using interface wlan0 with hwaddr hh:hh:hh:hh:hh:hh and ssid '....'
WPA: group state machine entering state GTK_INIT (VLAN-ID 0)
GMK - hexdump(len=32): 9c 37 07 d1 4e b6 cf f8 cf d5 7d e5 0c 4d 49 7b c7 09 e5 7a 82 09 03 71 15 08 e1 da 16 ef d7 a8
GTK - hexdump(len=16): dd 5c 3c f5 60 1a f9 17 77 27 52 a6 93 8a c4 92
WPA: group state machine entering state SETKEYSDONE (VLAN-ID 0)
wpa_driver_nl80211_set_key: ifindex=10 alg=3 addr=(nil) key_idx=1 set_tx=1 seq_len=0 key_len=16
nl80211: Set beacon (beacon_set=0)
wlan0: Setup of interface done.
MGMT (TX callback) ACK

STA xx:xx:xx:xx:xx:xx sent probe request for our SSID
STA xx:xx:xx:xx:xx:xx sent probe request for our SSID
MGMT (TX callback) fail
mgmt::proberesp cb
MGMT (TX callback) fail
mgmt::proberesp cb
STA xx:xx:xx:xx:xx:xx sent probe request for our SSID
MGMT (TX callback) fail
mgmt::proberesp cb
MGMT
mgmt::auth
authentication: STA=xx:xx:xx:xx:xx:xx auth_alg=0 auth_transaction=1 status_code=0 wep=0
  New STA
wlan0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: authentication OK (open system)
wlan0: STA xx:xx:xx:xx:xx:xx MLME: MLME-AUTHENTICATE.indication(xx:xx:xx:xx:xx:xx, OPEN_SYSTEM)
wlan0: STA xx:xx:xx:xx:xx:xx MLME: MLME-DELETEKEYS.request(xx:xx:xx:xx:xx:xx)
authentication reply: STA=xx:xx:xx:xx:xx:xx auth_alg=0 auth_transaction=2 resp=0 (IE len=0)
MGMT (TX callback) fail
mgmt::auth cb
wlan0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: did not acknowledge authentication response
MGMT
mgmt::assoc_req
association request: STA=xx:xx:xx:xx:xx:xx capab_info=0x411 listen_interval=10
WMM IE - hexdump(len=7): 00 50 f2 02 00 01 00
Validating WMM IE: OUI 00:50:f2  OUI type 2  OUI sub-type 0  version 1  QoS info 0x0
  new AID 1

nl80211: Set beacon (beacon_set=1)
wlan0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: association OK (aid 1)
MGMT (TX callback) fail
mgmt::assoc_resp cb
wlan0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: did not acknowledge association response
STA yy:yy:yy:yy:yy:yy sent probe request for broadcast SSID
STA yy:yy:yy:yy:yy:yy sent probe request for broadcast SSID
MGMT (TX callback) fail
mgmt::proberesp cb
MGMT (TX callback) fail
mgmt::proberesp cb
STA yy:yy:yy:yy:yy:yy sent probe request for broadcast SSID
MGMT (TX callback) fail
mgmt::proberesp cb
MGMT
mgmt::deauth
deauthentication: STA=xx:xx:xx:xx:xx:xx reason_code=1
AP-STA-DISCONNECTED xx:xx:xx:xx:xx:xx
wlan0: STA xx:xx:xx:xx:xx:xx WPA: event 3 notification
wpa_driver_nl80211_set_key: ifindex=10 alg=0 addr=0x9593768 key_idx=0 set_tx=1 seq_len=0 key_len=0
   addr=xx:xx:xx:xx:xx:xx
WPA: xx:xx:xx:xx:xx:xx WPA_PTK entering state DISCONNECTED
WPA: xx:xx:xx:xx:xx:xx WPA_PTK entering state INITIALIZE
wpa_driver_nl80211_set_key: ifindex=10 alg=0 addr=0x9593768 key_idx=0 set_tx=1 seq_len=0 key_len=0
   addr=xx:xx:xx:xx:xx:xx
wlan0: STA xx:xx:xx:xx:xx:xx IEEE 802.1X: unauthorizing port
Could not set station xx:xx:xx:xx:xx:xx flags for kernel driver (errno=22).
wlan0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: deauthenticated
wlan0: STA xx:xx:xx:xx:xx:xx MLME: MLME-DEAUTHENTICATE.indication(xx:xx:xx:xx:xx:xx, 1)

wlan0: STA xx:xx:xx:xx:xx:xx MLME: MLME-DELETEKEYS.request(xx:xx:xx:xx:xx:xx)
wpa_driver_nl80211_set_key: ifindex=10 alg=0 addr=0x9593768 key_idx=0 set_tx=1 seq_len=0 key_len=0
   addr=xx:xx:xx:xx:xx:xx

nl80211: Set beacon (beacon_set=1)
STA xx:xx:xx:xx:xx:xx sent probe request for our SSID
STA xx:xx:xx:xx:xx:xx sent probe request for our SSID
MGMT (TX callback) fail
mgmt::proberesp cb
MGMT (TX callback) fail
mgmt::proberesp cb
STA xx:xx:xx:xx:xx:xx sent probe request for our SSID
MGMT (TX callback) fail
mgmt::proberesp cb
MGMT
mgmt::auth
authentication: STA=xx:xx:xx:xx:xx:xx auth_alg=0 auth_transaction=1 status_code=0 wep=0
  New STA
wlan0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: authentication OK (open system)
wlan0: STA xx:xx:xx:xx:xx:xx MLME: MLME-AUTHENTICATE.indication(xx:xx:xx:xx:xx:xx, OPEN_SYSTEM)
wlan0: STA xx:xx:xx:xx:xx:xx MLME: MLME-DELETEKEYS.request(xx:xx:xx:xx:xx:xx)
authentication reply: STA=xx:xx:xx:xx:xx:xx auth_alg=0 auth_transaction=2 resp=0 (IE len=0)
MGMT (TX callback) fail
mgmt::auth cb
wlan0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: did not acknowledge authentication response
MGMT
mgmt::assoc_req
association request: STA=xx:xx:xx:xx:xx:xx capab_info=0x411 listen_interval=10
WMM IE - hexdump(len=7): 00 50 f2 02 00 01 00
Validating WMM IE: OUI 00:50:f2  OUI type 2  OUI sub-type 0  version 1  QoS info 0x0
  new AID 1

nl80211: Set beacon (beacon_set=1)
wlan0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: association OK (aid 1)
MGMT (TX callback) fail
mgmt::assoc_resp cb
wlan0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: did not acknowledge association response
STA xx:xx:xx:xx:xx:xx sent probe request for our SSID
STA xx:xx:xx:xx:xx:xx sent probe request for our SSID
MGMT (TX callback) fail
mgmt::proberesp cb
MGMT (TX callback) fail
mgmt::proberesp cb
MGMT
mgmt::auth
authentication: STA=xx:xx:xx:xx:xx:xx auth_alg=0 auth_transaction=1 status_code=0 wep=0
wlan0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: authentication OK (open system)
wlan0: STA xx:xx:xx:xx:xx:xx WPA: event 0 notification
wpa_driver_nl80211_set_key: ifindex=10 alg=0 addr=0x9593de8 key_idx=0 set_tx=1 seq_len=0 key_len=0
   addr=xx:xx:xx:xx:xx:xx

wlan0: STA xx:xx:xx:xx:xx:xx MLME: MLME-AUTHENTICATE.indication(xx:xx:xx:xx:xx:xx, OPEN_SYSTEM)
wlan0: STA xx:xx:xx:xx:xx:xx MLME: MLME-DELETEKEYS.request(xx:xx:xx:xx:xx:xx)
wpa_driver_nl80211_set_key: ifindex=10 alg=0 addr=0x9593de8 key_idx=0 set_tx=1 seq_len=0 key_len=0
   addr=xx:xx:xx:xx:xx:xx

authentication reply: STA=xx:xx:xx:xx:xx:xx auth_alg=0 auth_transaction=2 resp=0 (IE len=0)
MGMT (TX callback) fail
mgmt::auth cb
wlan0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: did not acknowledge authentication response
MGMT
mgmt::assoc_req
association request: STA=xx:xx:xx:xx:xx:xx capab_info=0x411 listen_interval=10
WMM IE - hexdump(len=7): 00 50 f2 02 00 01 00
Validating WMM IE: OUI 00:50:f2  OUI type 2  OUI sub-type 0  version 1  QoS info 0x0
  old AID 1

wlan0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: association OK (aid 1)
MGMT (TX callback) fail
mgmt::assoc_resp cb
wlan0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: did not acknowledge association response
STA yy:yy:yy:yy:yy:yy sent probe request for broadcast SSID
STA yy:yy:yy:yy:yy:yy sent probe request for broadcast SSID
MGMT (TX callback) fail
mgmt::proberesp cb
MGMT (TX callback) fail
mgmt::proberesp cb
STA yy:yy:yy:yy:yy:yy sent probe request for broadcast SSID
MGMT (TX callback) fail
mgmt::proberesp cb

Comment 21 Luis R. Rodriguez 2009-12-04 16:14:34 UTC
Jaiv, please don't hijack this bug report for your AP issues; instead, please either file a separate bug or consider trying newer drivers and a newer hostapd (git version).

I should also note there are two actual reporters of issues on 2.6.31 on this bug report for ath9k: Terry and Tomasz.

I haven't seen new feedback from Tomasz so I'll continue with Terry for now.

Terry, you have an AR9285, and support for it was added in 2.6.29. I can't really think of many changes that went into 2.6.32 which would help 2.6.31 as critical bug fixes, but we can try upgrading the initvals, as those do change every now and then and it is possible the ones in 2.6.31 went with older values. I'll check.

Comment 22 Luis R. Rodriguez 2009-12-04 16:49:27 UTC
Created attachment 376121 [details]
Update initvals for AR9285 based on 2.6.32 work

Here is a backport of:

commit b264c673a03329b5e5bab79b705b5bb5ab1fe965
Author: Sujith <Sujith.Manoharan>
Date:   Wed Aug 26 08:39:55 2009 +0530

    ath9k: Update INITVALs for AR9285

    Signed-off-by: Sujith <Sujith.Manoharan>
    Signed-off-by: John W. Linville <linville>


Unfortunately that commit log entry is not descriptive, but let me try to explain what these are. Initvals are the initialization values for all the Atheros registers. These registers have some default values which need updating upon hardware initialization to get the hardware fired up. The ath9k hw.c and friend files update the hardware eventually at different points in time but -- prior to mucking around with the hardware we need to get it into a reasonable state. This is where the initvals come in. They are default register settings.

I've actually considered moving this to userspace and updating the registers upon initialization via request_firmware() (although this is not firmware), but we also need these initvals during hardware reset, which is done at many points in the driver, for example when changing channels. The other challenge with putting this into userspace is that initvals typically go hand in hand with specific driver updates, and sometimes an initval update requires some actual driver changes. Nevertheless, one final consideration for moving this to userspace may be that we can kmalloc() the data instead of using the stack on the fly when needed, but so far this hasn't created an issue.

The updates to initvals are done by our systems engineers who preconfigure our hardware to sane defaults.
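
To make the idea concrete, here is a minimal conceptual sketch in Python (not driver code). The register addresses, values, and the FakeHardware class are all hypothetical and only model the concept; the real tables live in the ath9k sources and are written to the device by the C driver.

```python
# Conceptual model only: the addresses and values below are made up.
INITVALS = [
    (0x1030, 0x00000230),
    (0x1070, 0x00000168),
    (0x8014, 0x04900000),
]


class FakeHardware:
    """Stands in for the device's register space (hypothetical)."""

    def __init__(self):
        self.regs = {}

    def write(self, addr, value):
        self.regs[addr] = value

    def read(self, addr):
        return self.regs.get(addr, 0)


def apply_initvals(hw, table):
    """Write every (register, value) pair so the hardware starts from a known state."""
    for addr, value in table:
        hw.write(addr, value)


def reset(hw, table):
    """A reset (e.g. on a channel change) wipes state, so the table is applied again."""
    hw.regs.clear()
    apply_initvals(hw, table)
```

The point of the sketch is that the same table is needed both at first initialization and on every reset, which is why it has to stay available for the lifetime of the driver.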

Anyway -- please test this out. I'm curious if you'll see any enhancements.

Comment 23 Luis R. Rodriguez 2009-12-04 16:53:02 UTC
I just noticed my backport isn't just about that patch but some other ones. I basically just diff'd the 2.6.31 initvals with the 2.6.32 and generated that patch. I did remove the AR9271 hunks though as those are not needed for 2.6.31.

Comment 24 Tomasz Sałaciński 2009-12-04 18:18:35 UTC
(In reply to comment #21)
> Jaiv please don't hijack a bug report for your AP issues and instead please
> either file a separate bug or consider trying newer drivers and newer hostapd
> (git version).
> 
> I should also note there are two actual reporters of issues on 2.6.31 on this
> bug report for ath9k: Terry and Tomasz.
> 
> I haven't seen new feedback from Tomasz so I'll continue with Terry for now.
> 
> Terry, you have an AR9285, and support for it was added in 2.6.29. I can't
> really think of many changes that went into 2.6.32 which would help 2.6.31 as
> critical bug fixes, but we can try upgrading the initvals, as those do
> change every now and then and it is possible the ones in 2.6.31 went with
> older values. I'll check.

Hello,

I haven't written anything new on this bug because I had not tested the new kernel. Right now I am running an upgraded kernel from rawhide, and it works properly:

Linux Aspire 2.6.31.5-127.fc12.i686.PAE #1 SMP Sat Nov 7 21:25:57 EST 2009 i686 i686 i386 GNU/Linux

However, I don't know whether the driver I installed earlier has been removed (it should be, because I am running a new kernel?).

Anyway, after upgrading to the new kernel it works perfectly.

Comment 25 Luis R. Rodriguez 2009-12-04 19:11:58 UTC
Great Tomasz thanks for the feedback. What card do you have? See the comment #16 to see how to determine this.

Comment 26 Terry Moore 2009-12-04 21:07:38 UTC
Luis,
I would love to test the patch from comment 22, but I have no idea how to do this.  Sorry for the ignorance, but can you give a quick step-by-step guide so I can test it?

If I read everything correctly, you are trying to get everything that has changed in the rawhide release into the 2.6.31-x kernel; is that correct?  If so, is this because F12 isn't planning on going to 2.6.32, or at least has no plans as of yet?

On another note, I agree with Tomasz that the kernel-2.6.32-x from Rawhide does work both with good performance and it works after a system resume.  I think my issue with the rawhide kernel has something to do with the intel video drivers which is completely unrelated to this thread.

BTW Luis thanks for all your work on this!!

Comment 27 John W. Linville 2009-12-04 21:21:31 UTC
Build w/ patch from comment 22 here (just started):

   http://koji.fedoraproject.org/koji/taskinfo?taskID=1850537

Luis probably doesn't have a lot of background for answering Fedora policy questions, but you can bet that it will be a little while before Fedora goes to 2.6.32.  Other people/distros may be in the same boat, so it would be nice to identify what parts specifically fix the issue at hand.

Comment 28 Luis R. Rodriguez 2009-12-04 21:22:58 UTC
Terry -- it seems you're likely going to have to wait for someone to build you a kernel for you to test.

As for Fedora 12 stuff, I have no clue.

And no -- I am not backporting all the patches. I actually haven't reviewed all the patches in 2.6.32 since 2.6.31 for ath9k, but I do know we try to cc stable on important fixes. What I was trying to do instead is provide feedback on new patches merged for linux-next (2.6.33) which are likely good fixes for 2.6.31. I also remembered that the initval updates *might* help and we never CC'd stable on those. Unfortunately the commit log entries for the initval updates are not elaborate. Typically the initvals for our drivers are updated internally at Atheros based on new testing or the finding of a bug; in this case it is not yet clear what the initval updates for 2.6.32 were based on. I can go review, but frankly I have other higher priority things to do and a test of this patch will likely be easier.

If I have time I'll try to review all the changes from 2.6.31..2.6.32 and see if there are any other things but I just cannot do that right now.

Comment 29 Luis R. Rodriguez 2009-12-04 21:23:37 UTC
Thanks John!

Comment 30 Luis R. Rodriguez 2009-12-04 21:38:45 UTC
Terry, if your AP is on 2437 MHz, can you also please provide the output of:

iw dev wlan0 scan freq 2427 2432 2437 2442 2447 | grep -c ^BSS

Since your AP's center freq is on 2437 this will scan the APs on your own channel and other channels which could potentially interfere with your own AP's communication to your STA, this would be seen as noise. The more accurate way of doing this is if we can gather noise on each of those channels but I haven't quite figured out how to do this easily yet -- to measure the amount of traffic, not just the APs there.
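
The frequency list in that command is simply the AP's center frequency plus the two neighbouring channels on each side, at 5 MHz spacing. A small Python sketch of that arithmetic follows, assuming the 2.4 GHz band with channels 1 to 13; the function name is made up for illustration.

```python
def neighbour_freqs(center_mhz, span=2, step_mhz=5, low=2412, high=2472):
    """Return center frequencies (MHz) of channels within `span` channels of center_mhz.

    Assumes 2.4 GHz channels 1-13 (2412..2472 MHz, 5 MHz apart); channel 14 is ignored.
    """
    freqs = []
    for offset in range(-span, span + 1):
        f = center_mhz + offset * step_mhz
        if low <= f <= high:
            freqs.append(f)
    return freqs


# Build the argument list to pass after `iw dev wlan0 scan freq`
print(" ".join(str(f) for f in neighbour_freqs(2437)))
```

For an AP on a different channel, pass its center frequency instead of 2437 and use the printed list in the scan command.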

Anyway -- this would help.

iw documentation is here:

http://wireless.kernel.org/en/users/Documentation/iw

I'd think FC12 has it.

Furthermore, please avoid using iwconfig for ... anything. Please try to familiarize yourself with iw, as it provides more information than iwconfig and is required to get actual information for 802.11n. For example, on newer kernels ath9k will support reporting the right MCS rate used on 802.11n networks. That won't be in 2.6.31..2.6.32, though; to see the actual rates used and the packet error rate you can use debugfs:

http://wireless.kernel.org/en/users/Drivers/ath9k/debug#rcstat

Comment 31 Luis R. Rodriguez 2009-12-04 22:59:54 UTC
OK so ... I have done the review of 'fixes' that went into 2.6.32 for ath9k which are not in 2.6.31. Some of them are small enough and indeed should have gone to stable, and I hate to see this. Some of them are a little big...

I count 40 fixes...

I've stashed this on this page:

http://bombadil.infradead.org/~mcgrof/patches/ath9k/fixes-not-in-2.6.31-for-ath9k.txt

John, I've tried to narrow this down, but the only thing I see that is probably a not-so-important fix is the ps enhancements, and I can't untangle which patches those depend on exactly right now.

So, feel free to cherry pick changes... or try all of them. Sorry for the mess.

On the bright side, our team now has synchronized testing for the 2.6.32 kernel, and fixes were propagated for that kernel with a few exceptions which were only recently written; one patch may be too big for 2.6.32.

Comment 32 Jaiv 2009-12-05 00:22:35 UTC
Hi Luis - I'm sorry, I did not want to hijack this bug report. I just had a gut feeling that the issues described in this bug and the issues I experienced with hostapd have the same root cause (also because the symptoms described were the same). And yes, I used the latest git version of hostapd. But never mind, I also sent a note to the hostapd mailing list, and I will search again and open a new bug if needed. Thank you.

Comment 33 Terry Moore 2009-12-05 01:15:19 UTC
Answer to comment 30

[root@minime ~]# iw dev wlan0 scan freq 2427 2432 2437 2442 2447 | grep -c ^BSS
4

Comment 34 Lonni J Friedman 2009-12-05 01:58:42 UTC
I, too, am afflicted by ath9k issues in Fedora12-i686.  I'm seeing the random disassociations problem:
wlan0: no probe response from AP 00:16:b6:da:8a:44 - disassociating

which is always resolved if I manually bring the wlan0 interface down and back up.  It's then a random period of time until it happens again.  I've gone anywhere from a few minutes to a high of 8 days without hitting this.  Most often, it happens when the system is idle, or a short period after booting when I attempt to push a lot of data over wlan0.

I just tried the 2.6.31.6-160.bz541756.1.fc12.i686 test kernel, but wlan0 died a few minutes after booting with the same 'disassociating' failure.  Under that kernel, here's the requested dmesg output:

ath9k 0000:04:00.0: PCI INT A -> Link[LN3A] -> GSI 19 (level, low) -> IRQ 19
ath9k 0000:04:00.0: setting latency timer to 64
ath: EEPROM regdomain: 0x60
ath: EEPROM indicates we should expect a direct regpair map
ath: Country alpha2 being used: 00
ath: Regpair used: 0x60
phy0: Selected rate control algorithm 'ath9k_rate_control'
phy0: Atheros AR9280 MAC/BB Rev:2 AR5133 RF Rev:d0: mem=0xf36c0000, irq=19
Registered led device: ath9k-phy0::radio
Registered led device: ath9k-phy0::assoc
Registered led device: ath9k-phy0::tx
Registered led device: ath9k-phy0::rx
device-mapper: multipath: version 1.1.0 loaded

I just went ahead and installed the latest rawhide kernel (2.6.32-0.65.rc8.git5.fc13.i686), and I'm testing it now.  I'll post an update if/when it fails, but barring a disassociation in the next few hours, it will still be difficult to be confident that the problem is fixed when it could take up to 8 days for it to reappear.  

If someone would prefer that I test any other kernel RPM, I'd be happy to do so, just point me at it.  thanks

Comment 35 Jaiv 2009-12-05 02:30:02 UTC
I opened a new bug for the hostapd and ath9k issue: https://bugzilla.redhat.com/show_bug.cgi?id=544497 You can delete my posts #20 and #21

Comment 36 Trevor Curtis 2009-12-06 16:48:34 UTC
Not surprising given the above, but I just wanted to point out that upgrading to the rawhide kernel fixed my problems with the ath9k driver (on a Compaq CQ61-324CA).

Comment 37 Michal Jaegermann 2009-12-07 05:56:04 UTC
I just upgraded 1002HA ASUS netbook from Fedora 10 to Fedora 12.  Previously I was using various 2.6.29.6-... kernels for f10, which used to show up in updates-testing, and there ath9k worked really nicely (not so well with default F10 kernels).  As a test I attempted to scp some 450+ Megs of data using wlan0 interface from 2.6.31.6-145.fc12.i686 kernel.  That transfer was stalling all the time in unpredictable moments and in the middle of it I got:

irq 18: nobody cared (try booting with the "irqpoll" option)
Pid: 0, comm: swapper Not tainted 2.6.31.6-145.fc12.i686 #1
Call Trace:
 [<c04750cd>] __report_bad_irq+0x33/0x74
 [<c0475208>] note_interrupt+0xfa/0x152
 [<c0475780>] handle_fasteoi_irq+0x83/0xa2
 [<c04056cd>] handle_irq+0x40/0x4b
 [<c0404e91>] do_IRQ+0x46/0x9a
 [<c0403c50>] common_interrupt+0x30/0x38
 [<c044007b>] ? mod_timer_pending+0x14/0x16
 [<c0454bfd>] ? tick_nohz_stop_sched_tick+0x309/0x315
 [<c04026dd>] cpu_idle+0x74/0xaf
 [<c0753f88>] rest_init+0x58/0x5a
 [<c09898b2>] start_kernel+0x32b/0x330
 [<c0989070>] i386_start_kernel+0x70/0x77
handlers:
[<c066e9b6>] (usb_hcd_irq+0x0/0x6f)
[<f89b22bc>] (ath_isr+0x0/0x130 [ath9k])
Disabling IRQ #18

As IRQ #18 is used by the driver in question that was it.  Nothing could
be done before I reloaded ath9k module and restarted NetworkManager. Then the transfer completed but with the next pile of stalls and

wlan0: no probe response from AP 00:13:10:1b:6f:68 - disassociating
wlan0: authenticate with AP 00:13:10:1b:6f:68
wlan0: authenticated

Transfer speed clearly jumped all over the place depending if a stall hit or not.

Regrettably this is quite a regression from the state in F10, where quite sizeable updates over the wlan0 interface were coming through just fine.  It appears that under light usage the connection is usually "good enough", but not when traffic gets serious.
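
For anyone comparing kernels, a rough way to quantify this is to count the disassociation lines per AP in a saved dmesg. Below is a minimal Python sketch, assuming the message format shown above; the function name and the dmesg.txt file name are made up for illustration.

```python
import re
from collections import Counter

# Matches lines like:
#   wlan0: no probe response from AP 00:13:10:1b:6f:68 - disassociating
DISASSOC_RE = re.compile(
    r"(?P<iface>\w+): no probe response from AP "
    r"(?P<bssid>(?:[0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2}) - disassociating"
)


def count_disassociations(log_text):
    """Return a Counter mapping (interface, AP address) to number of disassociations."""
    counts = Counter()
    for line in log_text.splitlines():
        m = DISASSOC_RE.search(line)
        if m:
            counts[(m.group("iface"), m.group("bssid").lower())] += 1
    return counts
```

Typical use would be something like count_disassociations(open("dmesg.txt").read()) after saving the output of dmesg to that file.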

Please find attached a dmesg showing the issue, the content of /proc/interrupts and the output from 'lspci -tv'.  This is a netbook, so I do not have an option to try to rearrange hardware devices in slots.

Comment 38 Michal Jaegermann 2009-12-07 05:57:38 UTC
Created attachment 376588 [details]
dmesg from 2.6.31.6-145.fc12.i686 with "disassociating" all over the place

Comment 39 Michal Jaegermann 2009-12-07 05:58:58 UTC
Created attachment 376589 [details]
an output of 'cat /proc/interrupts'

Comment 40 Michal Jaegermann 2009-12-07 06:00:28 UTC
Created attachment 376590 [details]
layout of PCI buses

Comment 41 Michal Jaegermann 2009-12-07 18:30:20 UTC
An updated kernel-2.6.31.6-162.fc12.i686 showed up today with the following in changelog:

- ath9k: add fixes suggested by upstream maintainer

I repeated the same experiment as yesterday using this.  Not much of improvement although maybe some. Like previously I got a series of
....
wlan0: no probe response from AP 00:13:10:1b:6f:68 - disassociating
wlan0: authenticate with AP 00:13:10:1b:6f:68
wlan0: authenticate with AP 00:13:10:1b:6f:68
wlan0: authenticated
wlan0: associate with AP 00:13:10:1b:6f:68
wlan0: RX ReassocResp from 00:13:10:1b:6f:68 (capab=0x411 status=0 aid=1)
wlan0: associated
.....
After that came 'integrated sync not supported' and ath9k was dead until I reloaded the module.  Even then a restart of NetworkManager was required before it noticed that the connection was back.

I did not see "irq 18: nobody cared" but that is most likely infrequent.

It appears that rsync over ssh does not beat on that connection as hard as a plain 'scp -r ...' with ~180 files to transfer.  I tried such an rsync twice and
both times I got something like:

sent 3386 bytes  received 477362768 bytes  2630116.55 bytes/sec
total size is 477288939  speedup is 1.00

and no additional connection stalls.  I do not know what is in scp which gives ath9k such hard time.
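
For reference, the rsync summary above can be turned into an elapsed time and a link throughput. The Python below only redoes the arithmetic on the numbers rsync printed; the variable names are mine.

```python
# Figures copied from the rsync summary above.
sent = 3386
received = 477362768
rate_bytes_per_sec = 2630116.55
total_size = 477288939

transferred = sent + received
seconds = transferred / rate_bytes_per_sec    # time implied by the reported rate
mbit_per_sec = rate_bytes_per_sec * 8 / 1e6   # bytes/sec -> Mbit/s
speedup = total_size / transferred            # ratio rsync prints as "speedup"

print(f"{seconds:.1f} s, {mbit_per_sec:.1f} Mbit/s, speedup {speedup:.2f}")
```

That works out to about 181.5 seconds at roughly 21 Mbit/s, and a speedup of 1.00 just means nearly the whole data set had to be transferred.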

In case somebody wonders an ethernet cable was unplugged during these trials.

Comment 42 Michal Jaegermann 2009-12-23 00:03:11 UTC
That is what I got with kernel-2.6.31.6-166.fc12.i686:

irq 18: nobody cared (try booting with the "irqpoll" option)
Pid: 1453, comm: nautilus Not tainted 2.6.31.6-166.fc12.i686 #1
Call Trace:
 [<c04750cd>] __report_bad_irq+0x33/0x74
 [<c0475208>] note_interrupt+0xfa/0x152
 [<c0475780>] handle_fasteoi_irq+0x83/0xa2
 [<c04056cd>] handle_irq+0x40/0x4b
 [<c0404e91>] do_IRQ+0x46/0x9a
 [<c0403c50>] common_interrupt+0x30/0x38
handlers:
[<c066eb52>] (usb_hcd_irq+0x0/0x6f)
[<f89ff334>] (ath_isr+0x0/0x130 [ath9k])
Disabling IRQ #18

After that the wlan0 interface, which is so-so although mostly workable in the best of times, became unusable until the ath9k module was removed and loaded once again.  Even after that autofs, for example, lost the ability to mount network file systems,
and restarting that and other services did not help.

Comment 43 Luis R. Rodriguez 2009-12-23 00:41:07 UTC
Please try:

http://wireless.kernel.org/en/users/Download/stable

Comment 44 Andrew Overholt 2010-01-01 17:16:38 UTC
I am on F12's 2.6.31.9-174.fc12 kernel and tried compat-wireless-2.6.31-rc7 and I get the following:

$ make
./scripts/gen-compat-autoconf.sh config.mk > include/linux/compat_autoconf.h
make -C /lib/modules/2.6.31.9-174.fc12.i686/build M=/home/overholt/Downloads/compat-wireless-2.6.31-rc7 modules
make[1]: Entering directory `/usr/src/kernels/2.6.31.9-174.fc12.i686'
  CC [M]  /home/overholt/Downloads/compat-wireless-2.6.31-rc7/drivers/net/b44.o
make[3]: *** No rule to make target `/home/overholt/Downloads/compat-wireless-2.6.31-rc7/drivers/misc/eeprom/max6875.c', needed by `/home/overholt/Downloads/compat-wireless-2.6.31-rc7/drivers/misc/eeprom/max6875.o'.  Stop.
make[2]: *** [/home/overholt/Downloads/compat-wireless-2.6.31-rc7/drivers/misc/eeprom] Error 2
make[1]: *** [_module_/home/overholt/Downloads/compat-wireless-2.6.31-rc7] Error 2
make[1]: Leaving directory `/usr/src/kernels/2.6.31.9-174.fc12.i686'
make: *** [modules] Error 2

What am I missing?

Comment 45 Andrew Overholt 2010-01-03 16:14:26 UTC
I tried compat-wireless-2.6.32.2 on F12's 2.6.31.9-174.fc12 (x86) and it built fine.  It also works great -- thanks!  Is there anything I can do to help get these fixes into an F-12 2.6.31 kernel?

Comment 46 Rudi Salm 2010-01-17 23:16:58 UTC
(In reply to comment #45)

hello,

i have had the same problem since updating my Thinkpad A31p to F12. I also have some customers who use an acer notebook with an ath9k under fedora 12 with no performance problems, so i tried to find out what is happening. i have two access points: 1 TP-Link TL-WR941ND 300N Router and 1 Fritzbox 7150. On the TP-Link i configured wireless to use 11n only, 40 MHz and up to 300Mbps with wpa/wpa2 encryption - this works fine with F11. The Fritzbox is configured to use 802.11g(++) up to 125Mbps with wpa/wpa2 encryption. When i connect the acer notebook to the fritzbox, the connection is established at 54Mbps (NetworkManager says so) and it works fine. when i connect my ThinkPad to my TP-Link, i am connected at 1Mbps!
To solve the bad performance on my A31p i reconfigured the TP-Link to use only 11g - and then it works fine at 54Mbps!

So i will say: not the whole ath9k is bad - only the 11n part / implementation.

Maybe there is someone who can recheck this behavior and fix the 802.11n part of the ath9k.

Comment 47 Luis R. Rodriguez 2010-01-19 18:41:02 UTC
As I noted in comment #31, there are a large number of fixes which went into 2.6.32 which were never propagated down to 2.6.31. This was unfortunate, and because of this I am working hard now to push every possible stable fix down to 2.6.32. If you are having issues with 2.6.31 you can consider reviewing the fixes I have seen in 2.6.32 which could likely be good 2.6.31 fixes that never got propagated:

http://bombadil.infradead.org/~mcgrof/patches/ath9k/fixes-not-in-2.6.31-for-ath9k.txt

If you do not want to deal with that then just use the compat-wireless stable for 2.6.32:

http://wireless.kernel.org/en/users/Download/stable

Comment 48 John W. Linville 2010-01-21 14:57:40 UTC
Have you tried one of the 2.6.32-based builds in Koji?

http://koji.fedoraproject.org/koji/buildinfo?buildID=152098

Comment 49 Rudi Salm 2010-01-21 17:04:04 UTC
(In reply to comment #48)
> Have you tried one of the 2.6.32-based builds in Koji?
> 
> http://koji.fedoraproject.org/koji/buildinfo?buildID=152098    

hello john,

thanx for your answer. based on it i installed all needed packages from Koji.
hey man, what a hard job to solve all these dependencies manually ;-) .

but this work was successful!

i have reconfigured my TP-Link router to

Channel = auto
Mode = 11bgn mixed
Channel Width = auto
Max Tx Rate = 300Mbps

and my thinkpad connects as under F11.

but there is still one little mistake: NetworkManager says:

Connection information speed = unknown

so i post my output of iwconfig:

wlan0     IEEE 802.11bgn  ESSID:"TP-LINK"  
          Mode:Managed  Frequency:2.412 GHz  Access Point: 00:23:CD:CD:3A:D8   
          Bit Rate=0 kb/s   Tx-Power=20 dBm   
          Retry  long limit:7   RTS thr:off   Fragment thr:off
          Power Management:off
          Link Quality=70/70  Signal level=-38 dBm  
          Rx invalid nwid:0  Rx invalid crypt:0  Rx invalid frag:0
          Tx excessive retries:0  Invalid misc:0   Missed beacon:0

it seems the ath9k cannot determine the correct speed of the 802.11n connection.

this behavior is the same as under F11.

so long: the 2.6.32 kernel seems to work well!
now it's time to rollout this kernel asap to support all ath9k users.

greets and thanx
rudi

Comment 50 Luis R. Rodriguez 2010-01-21 17:26:16 UTC
iwconfig is 802.11n un-aware. You want to abandon iwconfig and start becoming comfortable with iw instead.

For 2.6.32, though, ath9k did not report the right MCS rate to mac80211. This was fixed later, so in future kernels you will be able to get the right MCS rate through iw on ath9k.

http://wireless.kernel.org/en/users/Documentation/iw

Network Manager on F11 probably still uses wireless-extensions to get a lot of information. This means it won't be able to get and display 802.11n information.

This may likely be addressed in future versions of Network Manager.

Comment 52 Stefan Assmann 2010-01-23 10:06:59 UTC
Hey John and Luis,

I've used the build from comment #51 for a whole day with several suspend cycles now and so far no problems encountered. Hope 2.6.32 hits F12 soon.

Comment 53 John W. Linville 2010-02-01 19:12:08 UTC
Setting to MODIFIED pending a 2.6.32 kernel in F-12...

