Bug 543719

Summary: AR9285 associates but requires tcpdump to get a DHCP reply
Product: Red Hat Enterprise Linux 5
Reporter: Luis R. Rodriguez <mcgrof>
Component: kernel
Assignee: Stanislaw Gruszka <sgruszka>
Status: CLOSED WONTFIX
QA Contact: desktop-bugs <desktop-bugs>
Severity: medium
Docs Contact:
Priority: low
Version: 5.5
CC: cmeadors, eric, kusjma, linville, simon.matter, vince, zebing86
Target Milestone: rc
Keywords: OtherQA, Reopened
Target Release: ---
Hardware: x86_64
OS: Linux
URL: http://wireless.kernel.org/en/users/Drivers/ath9k/RHEL5
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 630960
Environment:
Last Closed: 2011-09-24 15:32:56 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 630960    
Attachments (description, flags):
Dmesg output after running modprobe ath9k debug=0x00000200 (flags: none)
Patch with additional printk for debug (flags: none)
Debug messages in RHEL5 (not get beacons after scan finished) (flags: none)
Debug messages in 2.6.32.7 with RHEL5 user space (receive beacons after scan) (flags: none)

Description Luis R. Rodriguez 2009-12-02 22:46:00 UTC
Description of problem:

I installed Centos 5.4 and then I installed the kernel-2.6.18-175.el5.jwltest.96.1.i686.rpm from:

http://people.redhat.com/linville/kernels/rhel5/

I then associated to my WRT610n, configured for 802.11bg-mixed (no 802.11n enabled) with no encryption and it associates fine but I get no DHCP replies. It works fine with 2.6.32-rc stuff though.

The AP sits right on top of the box.

How reproducible:

Always

Steps to Reproduce:
1. iwconfig wlan0 essid tesla-2g-bcm
2. dhclient wlan0
3. wait for a DHCP reply
 
Actual results:

No DHCP replies come through

Expected results:

DHCP replies are supposed to come through.

Comment 1 John W. Linville 2009-12-03 14:28:11 UTC
Odd...I've been using an ath9k card as one of my primary test devices.  Does wireshark (or tcpdump) running on the client show other traffic arriving?  Does use of WEP or WPA change the situation?

Comment 2 Luis R. Rodriguez 2009-12-03 14:37:16 UTC
Oh neat, didn't know wireshark would be available, 'yum install wireshark' I take it? I'll try it out when I get to work.

Comment 3 John W. Linville 2009-12-03 15:07:09 UTC
Yes, that should work.

Comment 4 Luis R. Rodriguez 2009-12-03 17:09:02 UTC
OK I ended up sniffing from another box and I see the DHCP offers coming from the AP sent to the broadcast ff:ff:ff:ff:ff:ff. I figure this is likely a filter issue so I tested running tcpdump on another window while requesting the IP address via DHCP and that worked. So setting promiscuous lifted the flags up on the device so it can receive anything and it received the broadcast replies.

So this can be a filter issue or not setting the bssid of the AP found appropriately. Prior to running tcpdump or dhclient I do see the AP's BSS on the iwconfig output though and it also comes up in the kernel logs.
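The promiscuous-mode theory can be checked on the client itself. The following is a minimal sketch, assuming the kernel exposes the interface flags in /sys/class/net/<iface>/flags and that IFF_PROMISC is 0x100 as in linux/if.h; the helper names has_promisc and iface_is_promisc are hypothetical.

```shell
#!/bin/sh
# IFF_PROMISC as defined in <linux/if.h>.
IFF_PROMISC=0x100

# has_promisc FLAGS: succeed if the promiscuous bit is set in FLAGS,
# a hex value such as the content of /sys/class/net/wlan0/flags.
has_promisc() {
    [ $(( $1 & $IFF_PROMISC )) -ne 0 ]
}

# iface_is_promisc IFACE: read the flags from sysfs and test them.
# (Assumes the sysfs "flags" attribute is available on this kernel.)
iface_is_promisc() {
    has_promisc "$(cat "/sys/class/net/$1/flags")"
}

# Example on literal values: 0x1003 has no promiscuous bit, 0x1103 does.
has_promisc 0x1003 || echo "0x1003: not promiscuous"
has_promisc 0x1103 && echo "0x1103: promiscuous"
```

A related check: `tcpdump -p -i wlan0` asks tcpdump not to put the interface into promiscuous mode. If DHCP works with plain tcpdump but not with `-p`, that would point at the receive filter rather than at the BSSID setup.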

For what it's worth I now have on this box an AR5416 and AR9280. It is able to receive the DHCP replies on both only if I run tcpdump as described above.

I now have the AP configured to 802.11n-only and HT40, seems to ping fine, no iperf package available through yum but I am able to scp over to another box.

With AR5416 and AR9280 I get around 2.0 MB/s, so I suppose this is around 16 Mbit/s. Not bad for this being at the Atheros office with all the APs and noise traffic around. So awesome job!
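The conversion behind that estimate is just bytes to bits. A minimal sketch, with a hypothetical helper name mbytes_to_mbits:

```shell
#!/bin/sh
# mbytes_to_mbits RATE: convert a rate in MB/s to Mbit/s (1 byte = 8 bits).
mbytes_to_mbits() {
    awk -v rate="$1" 'BEGIN { printf "%.1f\n", rate * 8 }'
}

# scp rate reported for AR5416 and AR9280.
mbytes_to_mbits 2.0
```

Note that an scp transfer measures payload throughput, so the figure is only a rough lower bound on what the link itself carries.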

I was unable to check the actual rates used to TX as debugfs for ath9k was not enabled. If it's not too much trouble, mind adding a build that enables it? Or are there some debugfs backport considerations?

Comment 5 Luis R. Rodriguez 2009-12-03 17:10:17 UTC
I should mention the AR5416 PCI card I used is D-Link DWA-542. It comes up under lspci as AR5008.

Comment 6 Luis R. Rodriguez 2009-12-03 17:11:31 UTC
BTW John which card are you testing with so I can try other devices instead. I'm about to test AR9285.

Comment 7 Luis R. Rodriguez 2009-12-03 17:46:37 UTC
OK tested AR9285 *and* AR9287. AR9285 is our single stream device, all others are dual stream; as such it was expected to do half as much throughput, as it uses up to MCS7, and indeed it did. So AR9285 gets around 8 Mbit/s over the air here on HT40 in a busy place.

AR9287 does just as AR9280 and AR5416 do.

All of these require running tcpdump to get a DHCP reply from the AP.

BTW -- noticed lspci wasn't spitting out the right device name for AR9287 so I just added this to pciids.sf.net:

https://pci-ids.ucw.cz/read/PC/168c/002e

Comment 8 John W. Linville 2009-12-03 22:53:43 UTC
I've got a 168c:002a (rev 01) which lspci calls an "AR928X Wireless Network Adapter (PCI-Express) (rev 01)".

Re: iperf -- RPMForge (http://rpmforge.net/) seems to have it packaged for RHEL5

Are you running NetworkManager?  Or doing ifconfig/iwconfig/dhclient directly?  I'm not sure why that would make a difference, but worth noting...

Is there any chance this filtering issue could be device specific?  I am definitely getting addresses over DHCP w/o having to run any sort of sniffer or doing any other special thing.

Comment 9 Luis R. Rodriguez 2009-12-03 23:49:42 UTC
I'm running Network Manager but when it comes up I disable wireless. I then associate manually and then dhcp. I sometimes have to pkill dhclient as upon bootup dhclient would have started for the wireless interface as Network Manager was running from the start (as I enabled it to do so).

So.... I just killed everything that has to do with Network Manager:

/etc/init.d/NetworkManager stop
pkill nm-sy
pkill dhclient

Then brought the interface up and essid and tried DHCP and it worked just fine.

I'm disabling the service now and rebooting and will give this a spin with each card.

Comment 10 Luis R. Rodriguez 2009-12-04 00:07:38 UTC
So it turns out that although the network manager applet thingy does not come up on a default installation, nm-system-settings *is* running, and if you don't kill that, manual dhcp won't work.

Killing that fixes my woes.
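The manual sequence from the last two comments can be collected into one helper. This is a sketch only: the function names run and manual_dhcp and the DRY_RUN switch are hypothetical, and the commands themselves are the ones quoted above.

```shell
#!/bin/sh
# run CMD...: execute CMD, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# manual_dhcp IFACE ESSID: stop the NetworkManager pieces, then associate
# and request a lease by hand.
manual_dhcp() {
    iface=$1
    essid=$2
    # pkill exits non-zero when nothing matched; that is not an error here.
    run /etc/init.d/NetworkManager stop || true
    run pkill nm-sy || true
    run pkill dhclient || true
    run ip link set dev "$iface" up
    run iwconfig "$iface" essid "$essid"
    run dhclient "$iface"
}

# Print the commands without touching the system:
DRY_RUN=1 manual_dhcp wlan0 tesla-2g-bcm
```

Running it without DRY_RUN (as root) would execute the same steps for real.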

Comment 11 John W. Linville 2009-12-04 14:19:16 UTC
OK, that sounds promising... :-)  So what about if you leave NM alone?  Does that connect successfully?

Comment 12 Luis R. Rodriguez 2009-12-04 15:31:48 UTC
Well no, you see when you actually *do* log in to GNOME on RHEL5.4 nm-applet is fired off. I spoke to Dan yesterday about this and he explained the whole 3-piece split of Network Manager in the old age. It seems that although nm-applet is fired off, Network Manager is still not started by default and *should not* affect the wireless interfaces. But I did notice some issues.

I did some more testing yesterday and stopped logging in through GNOME to the desktop -- just to be safe -- and ensured (although not required) even nm-applet was not fired off yet. This allows me to test AR5416 and AR9287 with DHCP fine. But I then noticed AR9285 is not working with dhcp, unless I run tcpdump... so it is likely that what I was seeing was a combination of Network Manager actually running and, in other cases, the hardware not getting broadcast replies due to some actual driver issue.

I'll do another round of testing today but it does seem at least AR9285 *does not* get dhcp replies.

Comment 13 John W. Linville 2009-12-04 15:35:47 UTC
Yeah, I think there is still something here...for example, if I put iperf
server on ath9k box, I can't connect from iperf client unless I ping the client
from the server first (and recently)...

Comment 14 Luis R. Rodriguez 2009-12-10 01:00:56 UTC
John, a few patches which you might find useful for this release for ath9k:

http://kernel.org/pub/linux/kernel/people/mcgrof/patches/ath9k/2009-12-08/for-2.6.32/

Here is one pending an upstream sha1sum:

http://kernel.org/pub/linux/kernel/people/mcgrof/patches/ath9k/2009-12-08/for-2.6.32/pending-sha1sum/0005-mac80211-Fix-dynamic-power-save-for-scanning.patch

but likely desirable as well.

Comment 15 John W. Linville 2009-12-10 19:28:06 UTC
The commit ID for "mac80211: Fix dynamic power save for scanning" is (or will be) 7c3f4bbedc241ddcd3abe1f419c356e625231da1.

Comment 16 John W. Linville 2010-01-06 16:58:36 UTC
Any interest in trying jwltest.99?  It has the applicable wireless fixes from 2.6.32.2 and 2.6.32.3 applied.  It seems to resolve the issue I mentioned in comment 13.

Comment 17 Luis R. Rodriguez 2010-01-06 18:01:56 UTC
I just tested this. Oddly enough, now even AR5416 requires tcpdump to be running to get a reply back. This also still applies to AR9285. I tried setting the SSID and then ran dhclient against each interface. To run these commands I didn't log in to GNOME, I just ssh'd into the box upon bootup.

[root@pogo ~]# uname -rm
2.6.18-183.el5.jwltest.99 x86_64

Comment 18 Luis R. Rodriguez 2010-01-06 18:14:32 UTC
I just got a panic after halting the box and then swapping a card. I can only see the tail of the oops. It does not relate to ath9k though:

[0...] driver_probe_device+0x52/0xaa
...
[<0xf..801c7745>] bus_add_driver+0x76/0x110
[<0xf..8015e38c>] __pci_register_driver+0x51/0xa6
[<0xf..800a7c83>] sys_init_module+0xaf/0x1f2
[<0xf..8005e116>] system_call+0x7e/0x83

RIP [<ffffffff8844d145>] :snd_hda_intel:via_build_pcms+0x70/0x11f
 RSP <ffff81021dd73cd8>
CR2: 000000000000000
<0>Kernel panic - not synching: Fatal exception

I should note upon early boot I get:

BIOS Bug: MCFG area at e0000 is not E820-reserved

After a reboot the panic did not happen. Seems sporadic.

Comment 19 Luis R. Rodriguez 2010-01-06 18:45:52 UTC
OK -- I have narrowed this issue down to my WRT610N v1. If I try to connect against my work Cisco AP using WPA-EAP it associates and gets an IP address fine through DHCP on both cards.

I reconfigure the WRT610N to legacy mode operation (802.11bg) and it doesn't make a difference.

I'm using WRT610 Firmware version 1.00.00 B18 August 16, 2008

Going to try the new 07/28/2009 Ver.1.00.03.15

Comment 20 Luis R. Rodriguez 2010-01-06 18:56:56 UTC
After trying to rmmod ath9k after a connection I got:

unregister_netdevice: waiting for wlan0 to become free. Usage count = 5

This message goes in a loop, coming up every 6 seconds or so.

Anyway, rebooting will try the new firmware then.

Comment 21 Luis R. Rodriguez 2010-01-06 19:02:56 UTC
OK tried shiniest new firmware for this AP and no luck. Unfortunately I have no other APs I can use to test against dhcp. My work Cisco AP works fine though. I get an IP address upon the first DHCP request, immediately.

Can you get a WRT610n? :)

Comment 22 Luis R. Rodriguez 2010-01-06 19:15:16 UTC
FWIW I ran this in a loop without issue:

while true; do rmmod ath9k; modprobe ath9k; ip link set dev wlan0 up ; iwconfig wlan0 essid tesla-2g-bcm ; sleep 2; done

It always associated and didn't get an oops or that issue of unregister_netdevice I reported above. Not sure what could have caused that.

Let me know if you have other ideas about what this issue may be. I am able to connect against the same AP with wireless-testing, so although this seems specific right now to the WRT610N, it also seems specific to the backport.

Comment 23 John W. Linville 2010-01-18 18:53:21 UTC
Doubt if it makes a difference, but I updated my test kernels w/ the wireless fixes from 2.6.32.4...

Comment 24 John W. Linville 2010-03-31 16:35:59 UTC
RHEL5.5 is released now.  I'm pretty sure it shipped with the kernel available here:

http://people.redhat.com/jwilson/el5/194.el5/

That should include fixes from 2.6.32.7, although I don't recall anything that would specifically address this issue.  Luis, when you get a chance please attempt to recreate with those kernels.  If you can, we should probably change the summary of this bug to reflect a released RHEL kernel.

Stanislaw has agreed to take over wireless LAN maintenance for RHEL5, so I'm going to assign this to him.  I'll stay Cc'ed as well.  If any potential fixes come to mind, please let us know! :-)

Comment 25 Luis R. Rodriguez 2010-03-31 16:50:42 UTC
Sure, understood, any chance I can get a shiny RHEL5.5 box?

Comment 26 John W. Linville 2010-03-31 17:00:11 UTC
Are you talking hardware?  Or access to the release?  I doubt if I can help w/ the former, and would have to talk to someone else for the latter. :-(

Weren't you testing on a CentOS box before?  That should still suffice, if you still have that available. :-)

Comment 27 Stanislaw Gruszka 2010-04-06 09:09:24 UTC
I have WRT610N V1 and just ordered AR9285, so soon I will be able to reproduce.

Comment 28 Eric Levinson 2010-04-12 22:32:13 UTC
Not sure if this is the correct place, please feel free to delete this and let me know where the proper place is to put this.
I have a DWA-556 PCIe wireless card and despite everything I have tried, under RHEL 5.5, cannot get the wireless link to come up.

I've tried both Madwifi and ath9k.  I also tried wpa_supplicant.  What happens is the card (wlan0) gets everything else _but_ an IP address.  I see the Mac address, the Mode, the ssid.  I even set my access point to use NO key, but no go.  In rhel 5.4, there wasn't even a wlan0 device.  At least the card is being detected properly in rhel5.5.

Here are my details:

 Linux kernel version 2.6.18-194.el5

 Here are some details:

 dmesg:

 ath: EEPROM regdomain: 0x10
 ath: EEPROM indicates we should expect a direct regpair map
 ath: Country alpha2 being used: CO
 ath: Regpair used: 0x10
 GSI 23 sharing vector 0x4A and IRQ 23
 ACPI: PCI Interrupt 0000:00:1b.0[A] -> GSI 22 (level, low) -> IRQ 74
 PCI: Setting latency timer of device 0000:00:1b.0 to 64
 hda_codec: ALC888: BIOS auto-probing.
 phy0: Selected rate control algorithm 'ath9k_rate_control'
 cfg80211: Calling CRDA for country: CO
 cfg80211: Regulatory domain changed to country: CO
        (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
        (2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2700 mBm)
        (5170000 KHz - 5250000 KHz @ 20000 KHz), (300 mBi, 1700 mBm)
        (5250000 KHz - 5330000 KHz @ 20000 KHz), (300 mBi, 2300 mBm)
        (5735000 KHz - 5835000 KHz @ 20000 KHz), (300 mBi, 3000 mBm)
 Registered led device: ath9k-phy0::radio
 Registered led device: ath9k-phy0::assoc
 Registered led device: ath9k-phy0::tx
 Registered led device: ath9k-phy0::rx
 phy0: Atheros AR5418 MAC/BB Rev:2 AR2133 RF Rev:81: mem=0xffffc20000040000, irq=169
 ADDRCONF(NETDEV_UP): wlan0: link is not ready


 lspci:

 03:00.0 Network controller: Atheros Communications Inc. AR5008 Wireless Network Adapter (rev 01)

 The iwconfig:

 wlan0     IEEE 802.11bgn  ESSID:"levinsong"
          Mode:Managed  Frequency:2.462 GHz  Access Point: 00:22:75:62:3D:B0
          Bit Rate=54 Mb/s   Tx-Power=27 dBm
          RTS thr:off   Fragment thr:off
          Encryption key:XXXX-XXXX-XX
          Power Management:off
          Link Quality=61/70  Signal level=-49 dBm
          Rx invalid nwid:0  Rx invalid crypt:0  Rx invalid frag:0
          Tx excessive retries:0  Invalid misc:0   Missed beacon:0

eric

Comment 29 John W. Linville 2010-04-13 13:49:36 UTC
Eric, that does sound like the same issue Luis reported.  Does running 'tcpdump -i wlan0' in another window before/during the connection attempt (as Luis describes above) make things work for you?

Comment 30 Eric Levinson 2010-04-13 18:05:38 UTC
I tried the tcpdump -i wlan0 but it didn't help.
FYI the wpa_supplicant doesn't start at startup.  There is an error in the boot log regarding a file it can't find.

I have to manually go into the services and start it.

When I start it, the supplicant shows all wifi sites near my house, but no matter what I do with any of them, click Connect does nothing.


In addition, in the Network Control Panel (gnome) I see the wlan0 adaptor in the Inactive status.  If I try to activate it, it times out with an "Unable to get IP address" message after about 30 seconds and it stays in the inactive status.

Comment 31 John W. Linville 2010-04-13 18:20:04 UTC
It sounds like you are not using NetworkManager -- it isn't enabled by default in RHEL5, but I recommend it.

   chkconfig NetworkManager on
   service NetworkManager start

This should add a little icon to the bar at the top of the screen.  This should show any available networks and allow you to select them, enter security information, etc.  If you do that while running tcpdump as described above, do you see frames being received by tcpdump?

Comment 32 Eric Levinson 2010-04-13 18:27:27 UTC
I'll try this tonight.  I don't have remote access to the system.  So should I keep wpa_supplicant not started when I start the network manager?

Comment 33 John W. Linville 2010-04-13 18:58:10 UTC
Correct -- NetworkManager will start wpa_supplicant for itself.

Comment 34 Eric Levinson 2010-04-14 00:40:19 UTC
That worked.  I needed to start network manager, and then same issue.

When I started tcpdump -i wlan0 everything fell into place and I am _finally_ on wifi!  Thanks.

I'd be interested in learning about a patch for this when it is released.

I'd also be happy to help test too.

Thanks!

Comment 35 John W. Linville 2010-04-14 13:37:48 UTC
Certainly seems like a filtering problem -- Luis, are you aware of any pertinent patches relating to filtering and/or these devices since 2.6.32 (i.e. the origin of the ath9k driver in RHEL5)?

Comment 36 Luis R. Rodriguez 2010-04-14 17:20:21 UTC
John, I had reviewed this issue and concluded this is very likely a backward compatibility bug since if you try the same driver on the same device with a vanilla 2.6.32.y (even an old one) the cards work.

I don't imagine the RX filter path was modified in any way on the ath9k driver, or mac80211/cfg80211 so I couldn't figure out where to look. A way to debug this would be to compile a kernel with debugging enabled for ath9k and then

modprobe ath9k debug=0x00000200

This will enable debug prints masked with ATH_DBG_CONFIG, and the line we want to verify is:

DPRINTF(sc, ATH_DBG_CONFIG, "Set HW RX filter: 0x%x\n", rfilt);

called on ath9k_configure_filter(). If I enable network manager I get:


$ dmesg | grep  "RX filter"
[ 1335.061829] ath: Set HW RX filter: 0x2207
[ 1335.064620] ath: Set HW RX filter: 0x2207
[ 1335.165433] ath: Set HW RX filter: 0x17
[ 1338.251359] ath: Set HW RX filter: 0x207
[ 1343.259071] ath: Set HW RX filter: 0x17
[ 1346.416966] ath: Set HW RX filter: 0x207
[ 1347.020106] ath: Set HW RX filter: 0x207
[ 1348.332275] ath: Set HW RX filter: 0x207

Your debug print should look similar.

If we are setting the right filters then the issue is elsewhere.
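To read those values, each one can be split into its individual flag bits. The following is a minimal sketch: the bit values are assumed from enum ath9k_rx_filter in the 2.6.32-era driver and should be verified against the tree actually in use, and decode_rx_filter is a hypothetical helper name.

```shell
#!/bin/sh
# decode_rx_filter VALUE: print the names of the bits set in an ath9k
# "Set HW RX filter" value. Bit values are assumed from enum
# ath9k_rx_filter in the 2.6.32 driver; verify against your tree.
decode_rx_filter() {
    value=$1
    names=""
    for entry in \
        0x0001:UCAST 0x0002:MCAST 0x0004:BCAST 0x0008:CONTROL \
        0x0010:BEACON 0x0020:PROM 0x0080:PROBEREQ 0x0100:PHYERR \
        0x0200:MYBEACON 0x2000:PHYRADAR 0x4000:PSPOLL \
        0x8000:MCAST_BCAST_ALL
    do
        bit=${entry%%:*}
        name=${entry#*:}
        if [ $(( $value & $bit )) -ne 0 ]; then
            names="${names:+$names }$name"
        fi
    done
    echo "$names"
}

# The three distinct values from the log above.
for v in 0x2207 0x17 0x207; do
    echo "$v: $(decode_rx_filter "$v")"
done
```

Under these assumed bit values, none of the three logged filters has the PROM bit set, which is what one would expect when no sniffer is running.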

Comment 37 John W. Linville 2010-04-14 17:27:39 UTC
Luis, thanks for the suggestion!  FWIW, I don't recall mucking around during the backport in anything that should affect filtering.  Of course, there could be some subtle thing in the core networking code that has changed upstream but not in the RHEL5 2.6.18-based kernels.

Eric, could you try the debug option Luis describes above and post the results here?  Thanks!

Comment 38 Luis R. Rodriguez 2010-04-14 17:40:02 UTC
John, I figured nothing was changed in the backport for RX filtering which is why this is a funky issue.. Oh BTW you do need to recompile the kernel with ATH9K_DEBUG.

Comment 39 Eric Levinson 2010-04-14 17:52:49 UTC
At my next opportunity, which will most likely be tonight, I will follow the instructions for recompiling the kernel with the option ATH9K_DEBUG set, and I'll dump out the filters.

BTW where is the best place to set this flag before I rebuild the kernel?

Thanks,
Eric

Comment 40 Luis R. Rodriguez 2010-04-14 17:57:05 UTC
For 2.6.32 it was under ath9k, so I suspect your 2.6.18 kernel will have it there as well.

Comment 41 John W. Linville 2010-04-14 18:13:42 UTC
Eric, the .config file in the root of the kernel tree needs to have a line that says "CONFIG_ATH9K_DEBUG=y".  The best way to accomplish that might depend on exactly what instructions you are following.  If you need more information, please provide a reference to the instructions you have.
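One way to make that edit repeatable is a small helper that rewrites the option in place. This is a sketch under the usual .config conventions ("SYMBOL=value" or "# SYMBOL is not set"); the function name enable_config_option is hypothetical.

```shell
#!/bin/sh
# enable_config_option FILE SYMBOL: make FILE contain exactly one
# "SYMBOL=y" line, replacing "SYMBOL=..." or "# SYMBOL is not set".
enable_config_option() {
    file=$1
    symbol=$2
    tmp=$(mktemp) || return 1
    # grep -v exits non-zero when it prints nothing; that is fine here.
    grep -v -e "^$symbol=" -e "^# $symbol is not set\$" "$file" > "$tmp" || true
    echo "$symbol=y" >> "$tmp"
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Typical use from the root of the kernel tree:
#   enable_config_option .config CONFIG_ATH9K_DEBUG
#   make oldconfig
```

Running `make oldconfig` afterwards lets the build system resolve any options that depend on the new setting.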

Comment 42 Eric Levinson 2010-04-15 17:48:40 UTC
Okay, here is a newbie question.  Where do I download the Linux Kernel for 2.6.18-194.el5 for compiling?

The /usr/src/kernel doesn't make correctly.  I've read several documents on google that say that I need to download the kernel separately into my own directory and build it there instead of using the /usr/src directory, unless you have suggestions on getting it to work.

Thanks,
Eric

Comment 43 John W. Linville 2010-04-15 17:55:36 UTC
Might be easier just to start here:

   http://people.redhat.com/linville/kernels/rhel5/

Otherwise, I _think_ RHEL5 has yumdownloader available:

   yum install yum-utils
   yumdownloader --source kernel

Hopefully one of the options above is helpful!

Comment 44 Eric Levinson 2010-04-15 18:26:49 UTC
Yea, I saw the kernels in your directory, but they look like they are for specific purposes.  I didn't see a plain vanilla 2.6.18-194.el5, which is distributed with 5.5?
I'll play with the yum-utils and see if I can get it that way.

Thanks again,
Eric

Comment 45 Eric Levinson 2010-04-16 05:08:41 UTC
If I use yumdownloader --source kernel

I get a bunch of messages like 

No source RPM found for kernel-2.6.18-164.el5.x86_64
No source RPM found for kernel-2.6.18-164.11.1.el5.x86_64
No source RPM found for kernel-2.6.18-128.1.1.el5.x86_64
No source RPM found for kernel-2.6.18-8.1.1.el5.x86_64
No source RPM found for kernel-2.6.18-128.1.14.el5.x86_64
No source RPM found for kernel-2.6.18-128.1.16.el5.x86_64
Nothing to download

I then downloaded your kernel and installed it and it works great.  I downloaded your source, but the rpm won't install, so I extracted it to a folder. 

It looks like it is a bunch of patch files.  How do I build it?

Thanks,
Eric

Comment 46 John W. Linville 2010-04-16 12:55:05 UTC
"The rpm won't install"?  What sort of error does it produce?

The sequence should look something like this:

rpm -ihv http://people.redhat.com/linville/kernels/rhel5/rpms/kernel-2.6.18-195.el5.jwltest.107.src.rpm
cd /usr/src/redhat/SPECS
rpmbuild -bp kernel.spec
cd ../BUILD/kernel-2.6.18/linux-2.6.18
# edit Makefile to change EXTRAVERSION to something you will recognize
# edit .config to add "CONFIG_ATH9K_DEBUG=y"
make oldconfig
make
make modules_install install
# reboot into the kernel you just built

Comment 47 Eric Levinson 2010-04-17 02:57:26 UTC
When I launch the RPM file, I get a dialog, "Cannot install source packages"
and then it says "No packages were given for installation" and then it exits.

I can open it with Archive Manager and extract all the files.

If I do an rpm --install I get:

warning: group mockbuild does not exist - using root
warning: user mockbuild does not exist - using root

over and over again, I don't see anything extracted.

Comment 48 Eric Levinson 2010-04-17 02:57:56 UTC
I should add that I redownloaded the file to make sure that wasn't the issue.

Comment 49 Eric Levinson 2010-04-17 05:11:47 UTC
Following your instructions I get:

[root@localhost SPECS]# rpmbuild -bp kernel-2.6.spec
error: Failed build dependencies:
        unifdef is needed by kernel-2.6.18-195.el5.jwltest.107.x86_64
[root@localhost SPECS]# 


There was no kernel.spec, only kernel-2.6.spec

Eric

Comment 50 John W. Linville 2010-04-18 18:37:23 UTC
Re: comment 47 -- the rpm --install behavior sounds normal.  The bits should end up in the directories under /usr/src/redhat.

Re: comment 49 -- "yum install unifdef" should install the unifdef package.  You will probably need that whether you follow the recipe from comment 46 or whatever other resource you might have.

Comment 51 Eric Levinson 2010-04-19 08:27:41 UTC
Thanks for all your help.

I got the kernel to build.

If ATH9K debug is off the kernel builds and I can boot with it.

If the ATH9K debug is set then I get an error during building:

CC [M]  drivers/net/wireless/ath/ath9k/debug.o
drivers/net/wireless/ath/ath9k/debug.c: In function ‘read_file_wiphy’:
drivers/net/wireless/ath/ath9k/debug.c:379: error: implicit declaration of function ‘put_unaligned_le32’
drivers/net/wireless/ath/ath9k/debug.c:380: error: implicit declaration of function ‘put_unaligned_le16’
make[5]: *** [drivers/net/wireless/ath/ath9k/debug.o] Error 1
make[4]: *** [drivers/net/wireless/ath/ath9k] Error 2
make[3]: *** [drivers/net/wireless/ath] Error 2
make[2]: *** [drivers/net/wireless] Error 2
make[1]: *** [drivers/net] Error 2
make: *** [drivers] Error 2

Comment 52 John W. Linville 2010-04-19 22:16:52 UTC
I took the liberty of building test kernels w/ CONFIG_ATH9K_DEBUG enabled:

http://people.redhat.com/linville/kernels/rhel5/

If you would still rather build your own, you can apply the patch from here:

http://people.redhat.com/linville/kernels/rhel5/patches/jwltest-ath9k-debug.patch

Hth! :-)

Comment 53 Eric Levinson 2010-04-20 05:36:49 UTC
Thanks!  That makes it easier.

I booted with your 108 kernel, ran the following:

modprobe ath9k debug=0x00000200

The command

dmesg | grep "RX filter"


Doesn't return anything.  Did I get the right kernel?

Comment 54 Luis R. Rodriguez 2010-04-20 05:46:28 UTC
Can you attach your full log anyway

Comment 55 Eric Levinson 2010-04-20 07:59:39 UTC
Created attachment 407751 [details]
Dmesg output after running modprobe ath9k debug=0x00000200

Comment 56 Eric Levinson 2010-04-20 08:00:44 UTC
I've uploaded the output.  I booted your kernel:

kernel-2.6.18-195.el5.jwltest.108.x86_64.rpm

Thanks!

Comment 57 John W. Linville 2010-04-20 13:50:07 UTC
"ath9k: DMA failed to stop" -- maybe this isn't the same as the issue Luis reported after all...

Comment 58 Eric Levinson 2010-04-30 05:42:48 UTC
Oh, okay.  Did you want me to do anything else?

Eric

Comment 59 Stanislaw Gruszka 2010-04-30 13:03:19 UTC
I did some debugging, and to me it looks like we do not get beacons from the device after association. This makes mod_beacon_timer call ieee80211_beacon_loss_work() and send the next probe request to the AP. After "tcpdump -i wlan0" we get beacons from the device.

I don't know why things work with other APs and not with the WRT610N, nor why it works on vanilla 2.6.32.

I plan to change ath9k_bss_info_changed() in the same way as ath5k currently does:

ath5k_bss_info_changed:
        if (changes & BSS_CHANGED_ASSOC) {
                sc->assoc = bss_conf->assoc;
                if (sc->opmode == NL80211_IFTYPE_STATION)
                        set_beacon_filter(hw, sc->assoc);
                ath5k_hw_set_ledstate(sc->ah, sc->assoc ?
                        AR5K_LED_ASSOC : AR5K_LED_INIT);
        }

set_beacon_filter:
        rfilt = ath5k_hw_get_rx_filter(ah);
        if (enable)
                rfilt |= AR5K_RX_FILTER_BEACON;
        else
                rfilt &= ~AR5K_RX_FILTER_BEACON;
        ath5k_hw_set_rx_filter(ah, rfilt);


but I have no time for that now.
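The set_beacon_filter logic above is a plain set-or-clear of one bit in the current filter word. A minimal shell sketch of that arithmetic, assuming AR5K_RX_FILTER_BEACON is 0x10 (check ath5k's headers); it models only the bit manipulation, not the hardware access:

```shell
#!/bin/sh
# Assumed value of AR5K_RX_FILTER_BEACON; check ath5k's headers.
RX_FILTER_BEACON=0x10

# set_beacon_filter RFILT ENABLE: print RFILT with the beacon bit set
# (ENABLE=1) or cleared (ENABLE=0), mirroring the ath5k helper above.
set_beacon_filter() {
    if [ "$2" -eq 1 ]; then
        printf '0x%x\n' $(( $1 | $RX_FILTER_BEACON ))
    else
        printf '0x%x\n' $(( $1 & ~$RX_FILTER_BEACON ))
    fi
}

set_beacon_filter 0x07 1   # associated: let beacons through
set_beacon_filter 0x17 0   # not associated: filter them again
```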

Comment 60 Stanislaw Gruszka 2010-05-20 11:40:14 UTC
We really do not get beacons from the device. However, this seems to be unrelated to the filter settings and related to some other settings. I changed the code to permanently set the ATH9K_RX_FILTER_MYBEACON and then the ATH9K_RX_FILTER_BEACON filter flags, and that did not help with the issue. Setting ATH9K_RX_FILTER_PROM gives positive results, but that is not the right solution.

The difference between upstream and RHEL5 is that in RHEL we do not use hardware encryption (we do not call ath9k_set_key()). I checked upstream: in vanilla 2.6.32 the situation is the same, we do not get beacons from the device when using the nohwcrypt=1 module parameter, and the device does not work with the WRT610N AP.

The good news is that nohwcrypt=1 works in wireless-next. However, I have no idea which commit fixes that. Luis, do you have any hint?

Comment 61 Stanislaw Gruszka 2010-05-21 12:53:43 UTC
(In reply to comment #60)
> The change between upstream is RHEL5 that in rhel we do not use hardware
> encryption (we do not call ath9k_set_key()). I checked upstream, in vanilla
> 2.6.32 situation is the same, we do not get beacons from device when using
> nohwcrypt=1 module parameter, and device not work with WRT610N AP.

That's not true. I think I booted a different kernel during the tests. 2.6.32 works well with nohwcrypt, sorry for the bad information.

The difference that makes us not get beacons on RHEL5 but get them in 2.6.32 must be somewhere else.

Comment 62 Stanislaw Gruszka 2010-05-26 11:08:49 UTC
Created attachment 416766 [details]
Patch with additional printk for debug

Comment 63 Stanislaw Gruszka 2010-05-26 11:10:53 UTC
Created attachment 416769 [details]
Debug messages in RHEL5 (not get beacons after scan finished)

Comment 64 Stanislaw Gruszka 2010-05-26 11:12:16 UTC
Created attachment 416770 [details]
Debug messages in 2.6.32.7 with RHEL5 user space (receive beacons after scan)

Comment 65 Stanislaw Gruszka 2010-05-26 11:32:15 UTC
I tried to debug, but so far I cannot find where the problem is. What is certain is that we do not get beacons after sw_scan_complete in the RHEL5 kernel, whereas we get them in the upstream 2.6.32 kernel.

One of the differences is that setting the operational channel (called from user space: cfg80211_wext_siwfreq -> cfg80211_mgd_wext_siwfreq -> rdev_set_freq -> ath9k_config) happens before ath9k_sw_scan_complete() in the upstream kernel and after ath9k_sw_scan_complete() in the RHEL5 kernel. (In the attached logs, the frequency of the operational channel differs between RHEL5 and upstream because we receive beacons during the scan on different channels: upstream it is 2437 MHz, in RHEL5 it is 2462 MHz.) I am not sure, however, whether the order of channel change and scan completion is the reason for the lack of beacons.

I tried some hacks/workarounds to make the device have similar settings regardless of the ordering of the operational channel setup.  The workarounds are included in the attached patch as comments. First, I changed ath_update_chainmask(sc, conf_is_ht(conf)) to ath_update_chainmask(sc, true) to have "ath9k: tx chmask: 1, rx chmask: 3" in both cases. Second, I called ath_beacon_config(sc, NULL) just after setting the channel. Neither workaround changes the situation. So far I have no other ideas :(

Comment 66 John W. Linville 2010-06-18 14:12:02 UTC
http://marc.info/?l=linux-wireless&m=127653628301304&w=2

Not obviously related, since that is for iwl3945.  But perhaps there is some insight there that could be useful here?  Just a thought...

Comment 67 John W. Linville 2010-06-18 14:16:20 UTC
Sorry, wrong link...

http://marc.info/?l=linux-wireless&m=127685767502521&w=2

Comment 68 Stanislaw Gruszka 2010-06-18 15:25:47 UTC
I tried something similar on ath9k. The patch below fixes the problem on my local setup, though I am still not sure if this is the right solution:

@@ -401,7 +401,7 @@ u32 ath_calcrxfilter(struct ath_softc *sc)
 
        rfilt = (ath9k_hw_getrxfilter(sc->sc_ah) & RX_FILTER_PRESERVE)
                | ATH9K_RX_FILTER_UCAST | ATH9K_RX_FILTER_BCAST
-               | ATH9K_RX_FILTER_MCAST;
+               | ATH9K_RX_FILTER_MCAST | ATH9K_RX_FILTER_MCAST_BCAST_ALL;


Luis, what do you think?
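The effect of the one-line change on the base filter value can be worked out by hand. A minimal sketch, assuming the bit values below match enum ath9k_rx_filter in the tree being patched; calc_base_filter is a hypothetical name that models only the OR of the constants, not the rest of ath_calcrxfilter():

```shell
#!/bin/sh
# Bit values assumed from enum ath9k_rx_filter; verify against your tree.
ATH9K_RX_FILTER_UCAST=0x0001
ATH9K_RX_FILTER_MCAST=0x0002
ATH9K_RX_FILTER_BCAST=0x0004
ATH9K_RX_FILTER_MCAST_BCAST_ALL=0x8000

# calc_base_filter PRESERVED PATCHED: the base value the filter starts
# from, without (PATCHED=0) or with (PATCHED=1) the extra flag.
calc_base_filter() {
    rfilt=$(( $1 | $ATH9K_RX_FILTER_UCAST | $ATH9K_RX_FILTER_BCAST | $ATH9K_RX_FILTER_MCAST ))
    if [ "$2" -eq 1 ]; then
        rfilt=$(( $rfilt | $ATH9K_RX_FILTER_MCAST_BCAST_ALL ))
    fi
    printf '0x%x\n' "$rfilt"
}

calc_base_filter 0 0   # unpatched
calc_base_filter 0 1   # with ATH9K_RX_FILTER_MCAST_BCAST_ALL added
```

With a kernel built with CONFIG_ATH9K_DEBUG, the same difference should then be visible in the "Set HW RX filter" debug lines.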

Comment 69 Stanislaw Gruszka 2010-08-04 09:23:22 UTC
I think I will not be able to find a better fix, so I will post it to RKML. Since upstream works fine, I'm not posting the patch upstream.

Comment 70 Luis R. Rodriguez 2010-08-04 19:00:51 UTC
What does "post it to RKML" mean? Posting your issue to the red hat kernel mailing list? Is this public or private?

Comment 72 Stanislaw Gruszka 2010-08-05 09:25:07 UTC
(In reply to comment #70)
> What does "post it to RKML" mean? Posting your issue to the red hat kernel
> mailing list? 
Posting a patch to the Red Hat kernel mailing list; that's the way we commit patches into the RH kernel.

> Is this public or private?  
Private.

Comment 73 RHEL Program Management 2010-08-06 05:50:00 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 75 RHEL Program Management 2010-09-02 19:15:09 UTC
Quality Engineering Management has reviewed and declined this request.  You may
appeal this decision by reopening this request.

Comment 76 Eric Levinson 2010-09-02 20:16:53 UTC
Forgive me, I am new to the Redhat Bugzilla community.

I realize this will not be included in any future revisions, but is there something I can do to fix this on my instance?  Is there any particular reason why the bug was closed as Wontfix?  This is a pretty critical issue with wireless.


Right now I have a crontab entry that runs tcpdump every minute and captures only 50 packets of data.  If I am lucky I get a connection within 2-3 minutes, although sometimes, because of timing, it doesn't happen until 5 minutes after I give my password to unlock the keyring.

What are others doing?

I have a slightly different issue, but the symptoms and result are the same.  My debugging information didn't turn up filter issues.

I have to do a tcpdump before the wireless will get an IP address.

Thanks,
Eric

Comment 77 Stanislaw Gruszka 2010-09-03 06:26:25 UTC
(In reply to comment #76)
> Right now I have a crontab entry to run tcpdump every 1 minute and only capture
> 50 packets of data.  If I am lucky I get a connection within 2-3 minutes

Could you please try the kernel at
http://people.redhat.com/sgruszka/rhel5/bz621105/
It has the patch from comment 68 included and should fix the problem.

I believe this bug will be fixed ...

Comment 79 Eric Levinson 2010-09-04 21:22:47 UTC
That kernel didn't work for me.  Same issue.  dmesg:

ADDRCONF(NETDEV_UP): wlan0: link is not ready
[drm] Initialized drm 1.0.1 20051102
ACPI: PCI Interrupt 0000:00:02.0[A] -> GSI 16 (level, low) -> IRQ 177
[drm] Initialized i915 1.8.0 20060929 on minor 0
mtrr: type mismatch for e0000000,10000000 old: write-back new: write-combining
set status page addr 0x02900000
wlan0: direct probe to AP 00:22:75:62:3d:b0 (try 1)
wlan0: deauthenticating from 00:22:75:62:3d:b0 by local choice (reason=3)
wlan0: direct probe to AP 00:22:75:62:3d:b0 (try 1)
wlan0: direct probe responded
wlan0: authenticate with AP 00:22:75:62:3d:b0 (try 1)
wlan0: authenticated
wlan0: associate with AP 00:22:75:62:3d:b0 (try 1)
wlan0: RX AssocResp from 00:22:75:62:3d:b0 (capab=0x411 status=0 aid=2)
wlan0: associated
ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
wlan0: no IPv6 routers present


It stayed stuck there until I ran tcpdump; then it associated and got an IP.

Comment 80 Stanislaw Gruszka 2010-09-06 12:58:52 UTC
Eric, what is your AP?

Comment 81 Eric Levinson 2010-09-06 21:46:38 UTC
I have a Belkin Wireless N Router.

My Linux box has a D-Link DWA-556, which comes up as an ath9k device.

John Linville looked over my logs and we saw:

ath9k: DMA failed to stop

so he concluded it might not be the same problem.

Eric

Comment 83 Stanislaw Gruszka 2010-09-07 13:32:04 UTC
Eric, I will clone that bug to track your problem.

The patch from comment 68 fixes the issue that I can reproduce on my desk, which is most likely the same problem that Luis described in comment 0 and the few comments after it. It's worth having the patch included in the upcoming release (and continuing to work
on the remaining problems :-)

Comment 85 Luis R. Rodriguez 2010-09-07 16:40:26 UTC
I should note that the core of the issue must be different since a vanilla 2.6.32 kernel works fine. The issue at hand is due to backports down to 2.6.18. I was unable to find the reason.

Comment 86 Linda Wang 2010-11-08 22:40:55 UTC

*** This bug has been marked as a duplicate of bug 621105 ***

Comment 87 Stanislaw Gruszka 2010-11-09 06:14:14 UTC
This is not a duplicate!

Since this bug did not make RHEL 5.6, I am moving it to 5.7. I definitely would like to have this fixed.

Comment 88 Vince Spinelli 2010-11-18 18:51:23 UTC
Same issue here.  

Brand new Hewlett Packard Pavilion dm1z with an Atheros AR9285 chipset.

I actually came across this thread a long time after I'd tried 'everything else'.  So I can give a bit of 'cross-testing' feedback...

1- 5.4 won't even pick the card up with stock kernel.  Yum update to kernel x194-26 picks it up as a 9285 (which is good), but can't fire it up with Network Settings.

2- 5.5 sees it as a 9285 with stock kernel, Gnome Network Manager Applet 'sits forever' and then fails to gain an IP address.  Same behavior with yum updated x194-26 kernel.

3- 6.0 Beta crashes (locks the GUI up, but I think it kills the whole OS because if I'm jacked in with a hardline, I can't even ping it).

4- Fedora Core 12 through 14 with the latest Fedora spun kernels all pick up the adapter properly as a 9285 - AND THEY CONNECT.  However, here's the kicker, they all also result in the same lockup / crash behavior as 6.0 Beta.  I've tried with 3D ATI video drivers, with the stock RHEL packaged video drivers, with different memory chipsets (all of which pass memtest86 boot disc test) and with a different hard drive (the original passed Smart tests [one long and two short tests]).  

I am a bit beside myself here.

Also tried the /usr/sbin/tcpdump -i wlan0 command while at the same time having NetworkManager-applet connect to the network.  This method does work.  However it is obviously not an acceptable workaround, since I've got to leave the terminal open in order to maintain a connection (it drops out if I close the tcpdump window).

Unless RH is abandoning the 5.x series OS, then this bug shouldn't be closed until it's resolved.  My personal opinions of 6 aside (and I haven't vented them online as of yet because it's still a very 'young' version), 5.x is still supposed to be supported for some time now, at least that was my understanding of the life-cycle??

Comment 89 Vince Spinelli 2010-11-18 18:53:51 UTC
Apologies, didn't mention and wanted to be clear on this, the tcpdump 'trick' yielded a connection on version 5.5 with kernel x194-26 for me.

Comment 90 Vince Spinelli 2010-11-19 15:11:11 UTC
Ok, so I've got a functional workaround.

I post this only for the 'usability' factor - that others with the same problem may employ this work around in order to make use of their devices without pulling their hair out.  This is by no means a 'fix'...

On RHEL or clone systems (tested on versions 5.4 and 5.5 with latest x194-26 kernel), add the following line to your /etc/rc.d/rc.local file...

/usr/bin/nohup /usr/sbin/tcpdump -i wlan0 2>/dev/null 1>/dev/null &

This will allow tcpdump to run, in effect, indefinitely on wlan0 in the background, pushing all output to the proverbial bit bucket rather than clogging up your desktop or any disk-based file.  

On the tested AMD Turion processor, it registers "0%" CPU and "0 bytes" memory usage, so I'm willing (at least in a preliminary sense) to call the resource usage negligible.

Comment 91 Vince Spinelli 2010-11-19 22:28:57 UTC
Scratch that... Just adding the line from comment 90 doesn't do the job.  If your wireless adapter has an on/off switch, such as on a laptop, or if tcpdump drops out for any reason, then you're back at square one.

So I put together this python daemon that sits in the background.  It can be launched from the command line or by inclusion in /etc/rc.d/rc.local.

It uses Python 2.4, which is included by default in the RHEL 5.x series.

Call it with...
[path-to-script]/workaround_bugzilla_543719.py [wifi-name]

So, if it's in your home directory and you are after wlan0, then let's say...
[root@localhost]# /home/vince/workaround_bugzilla_543719.py wlan0

(or just add that string to rc.local)

It's on Sourceforge at... 
https://sourceforge.net/projects/wbz543719/

It's on my website at...
http://download.spinellicreations.com/workaround_bugzilla_543719/

From my perspective, it's effectively worked around at this point.  An actual bugfix would be preferable, but if we don't get one, this will do.  It's just a matter of getting it into the hands of the people that need it.  Hopefully Google will take care of that (lead them here).
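
For anyone who only wants the idea, here is a minimal sketch of such a watchdog. It is not the script linked above: it assumes Python 3 (the linked script targets Python 2.4), and the names build_command and keep_running are made up for illustration. It simply restarts the tcpdump call from comment 90 whenever that process exits.

```python
import subprocess
import time


def build_command(interface):
    """Return the capture command from comment 90 as an argument list."""
    return ["/usr/sbin/tcpdump", "-i", interface]


def keep_running(command, max_restarts=None, delay=5.0, sleep=time.sleep):
    """Run command and start it again whenever it exits.

    Returns the number of times the command was started. With
    max_restarts=None the loop never ends, which is what a daemon wants.
    """
    starts = 0
    while max_restarts is None or starts <= max_restarts:
        try:
            # Discard all output, as the rc.local line does.
            subprocess.call(command,
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
        except OSError:
            # Binary missing or not executable; try again after the delay.
            pass
        starts += 1
        # Wait a little so a failing command is not respawned in a tight loop.
        sleep(delay)
    return starts
```

To use it as a daemon, one would call keep_running(build_command("wlan0")) as root and leave max_restarts at None. The delay between restarts is an arbitrary choice here, not something measured on the affected hardware.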

Comment 92 Simon Matter 2010-12-09 12:51:19 UTC
(In reply to comment #68)
> I tried something similar on ath9k. The patch below fixes the problem on my local
> setup, but I'm still not sure it is the right solution:
> 
> @@ -401,7 +401,7 @@ u32 ath_calcrxfilter(struct ath_softc *sc)
> 
>         rfilt = (ath9k_hw_getrxfilter(sc->sc_ah) & RX_FILTER_PRESERVE)
>                 | ATH9K_RX_FILTER_UCAST | ATH9K_RX_FILTER_BCAST
> -               | ATH9K_RX_FILTER_MCAST;
> +               | ATH9K_RX_FILTER_MCAST | ATH9K_RX_FILTER_MCAST_BCAST_ALL;
> 

That seems to fix it for me too. I don't know what the patch breaks, but it seems to fix my issues with not getting an IP address from certain DHCP servers.

Note, we have several of the exact same systems with different APs, and only one device shows the problem, for whatever reason.

Simon

Comment 94 RHEL Program Management 2011-06-20 21:31:27 UTC
This request was evaluated by Red Hat Product Management for inclusion in Red Hat Enterprise Linux 5.7, and Red Hat does not plan to fix this issue in the currently developed update.

Contact your manager or support representative in case you need to escalate this bug.

Comment 95 Stanislaw Gruszka 2011-09-24 15:32:56 UTC
I thought I would manage to find some time to look at this, but unfortunately I have more capacity problems than I expected, so this bug will not be fixed. The good news is that we have released RHEL 6, where ath9k should work well.

Comment 96 moonshine 2012-02-24 15:07:28 UTC
oh,,,
mark up for centos6