After updating to kernel version 3.7.6-102, I began experiencing spotty WiFi perfromance on one of my laptops that has an Atheros AR5009. Pinging my wireless router it starts out fine like this: PING 192.168.1.28 (192.168.1.28) 56(84) bytes of data. 64 bytes from 192.168.1.28: icmp_req=1 ttl=64 time=1.96 ms 64 bytes from 192.168.1.28: icmp_req=2 ttl=64 time=1.57 ms 64 bytes from 192.168.1.28: icmp_req=3 ttl=64 time=1.58 ms 64 bytes from 192.168.1.28: icmp_req=4 ttl=64 time=2.87 ms 64 bytes from 192.168.1.28: icmp_req=5 ttl=64 time=1.61 ms 64 bytes from 192.168.1.28: icmp_req=6 ttl=64 time=1.59 ms 64 bytes from 192.168.1.28: icmp_req=7 ttl=64 time=1.90 ms 64 bytes from 192.168.1.28: icmp_req=8 ttl=64 time=1.88 ms 64 bytes from 192.168.1.28: icmp_req=9 ttl=64 time=1.93 ms But then you see things like this: 64 bytes from 192.168.1.28: icmp_req=328 ttl=64 time=1.80 ms 64 bytes from 192.168.1.28: icmp_req=329 ttl=64 time=1.59 ms 64 bytes from 192.168.1.28: icmp_req=330 ttl=64 time=10107 ms 64 bytes from 192.168.1.28: icmp_req=331 ttl=64 time=9107 ms 64 bytes from 192.168.1.28: icmp_req=332 ttl=64 time=8107 ms 64 bytes from 192.168.1.28: icmp_req=333 ttl=64 time=7107 ms 64 bytes from 192.168.1.28: icmp_req=334 ttl=64 time=6107 ms 64 bytes from 192.168.1.28: icmp_req=335 ttl=64 time=5107 ms 64 bytes from 192.168.1.28: icmp_req=336 ttl=64 time=4107 ms 64 bytes from 192.168.1.28: icmp_req=337 ttl=64 time=3111 ms 64 bytes from 192.168.1.28: icmp_req=338 ttl=64 time=2114 ms 64 bytes from 192.168.1.28: icmp_req=339 ttl=64 time=1114 ms 64 bytes from 192.168.1.28: icmp_req=340 ttl=64 time=114 ms 64 bytes from 192.168.1.28: icmp_req=341 ttl=64 time=1.63 ms 64 bytes from 192.168.1.28: icmp_req=342 ttl=64 time=1.64 ms And this: 64 bytes from 192.168.1.28: icmp_req=447 ttl=64 time=2.82 ms 64 bytes from 192.168.1.28: icmp_req=448 ttl=64 time=1.88 ms 64 bytes from 192.168.1.28: icmp_req=450 ttl=64 time=9953 ms 64 bytes from 192.168.1.28: icmp_req=451 ttl=64 time=8953 ms 64 bytes from 192.168.1.28: icmp_req=452 ttl=64 time=7963 ms 64 bytes from 192.168.1.28: icmp_req=453 ttl=64 time=6963 ms 64 bytes from 192.168.1.28: icmp_req=454 ttl=64 time=5963 ms 64 bytes from 192.168.1.28: icmp_req=455 ttl=64 time=4963 ms 64 bytes from 192.168.1.28: icmp_req=456 ttl=64 time=3964 ms 64 bytes from 192.168.1.28: icmp_req=457 ttl=64 time=2965 ms 64 bytes from 192.168.1.28: icmp_req=458 ttl=64 time=1965 ms 64 bytes from 192.168.1.28: icmp_req=459 ttl=64 time=965 ms 64 bytes from 192.168.1.28: icmp_req=460 ttl=64 time=1.44 ms 64 bytes from 192.168.1.28: icmp_req=461 ttl=64 time=1.88 ms Notice that the ~10 second responses occur exactly 120 seconds apart. I see this problem also on another laptop with an Atheros AR9285, but it is not quite a severe as this. Ping response only goes to about 3 seconds and recovers quicker. I do not see this issue on 2 other laptops that use different WiFi drivers. I see this issue with both kernel version 3.7.6-102 and 3.7.9-101. I do not see this issue with kernel version 3.7.3-101.
Because my previous issues with this driver had all been related to my use of WEP, I took the extra step of switching to WPA/WPA-2 and also encryption disabled and the problem persisted regardless of the type of encryption being used (or not used).
I can confirm that this issue is still present in kernel-3.7.9-205.fc18.x86_64 , but not yet present in kernel-3.7.5-201.fc18.x86_64 (although connectivity with this kernel seems to be a lot more erratic, but I didn't do a proper comparison).
Still present in the latest kernel 3.8.1-201.fc18.x86_64 , every 120 seconds like clockwork. 64 bytes from 192.168.1.1: icmp_seq=111 ttl=64 time=2.45 ms 64 bytes from 192.168.1.1: icmp_seq=112 ttl=64 time=10.6 ms 64 bytes from 192.168.1.1: icmp_seq=124 ttl=64 time=536 ms 64 bytes from 192.168.1.1: icmp_seq=232 ttl=64 time=20.1 ms 64 bytes from 192.168.1.1: icmp_seq=244 ttl=64 time=365 ms 64 bytes from 192.168.1.1: icmp_seq=245 ttl=64 time=5.13 ms
I'm not sure if this is related, but when pinging I see the following: 64 bytes from 192.168.178.1: icmp_seq=136 ttl=64 time=5.55 ms 64 bytes from 192.168.178.1: icmp_seq=137 ttl=64 time=4.30 ms 64 bytes from 192.168.178.1: icmp_seq=138 ttl=64 time=4.42 ms 64 bytes from 192.168.178.1: icmp_seq=139 ttl=64 time=4.30 ms 64 bytes from 192.168.178.1: icmp_seq=140 ttl=64 time=5.54 ms 64 bytes from 192.168.178.1: icmp_seq=146 ttl=64 time=4.38 ms 64 bytes from 192.168.178.1: icmp_seq=147 ttl=64 time=4.68 ms 64 bytes from 192.168.178.1: icmp_seq=148 ttl=64 time=5.92 ms 64 bytes from 192.168.178.1: icmp_seq=149 ttl=64 time=6.17 ms 64 bytes from 192.168.178.1: icmp_seq=150 ttl=64 time=4.18 ms 64 bytes from 192.168.178.1: icmp_seq=152 ttl=64 time=402 ms 64 bytes from 192.168.178.1: icmp_seq=158 ttl=64 time=5.25 ms basically there are gaps in the icmp sequence of 5 packages every 5 seconds. This behaviour is quite noticeable when loading a website or viewing movies.
Additional info: kernel version: 3.8.1-201.fc18.i686.PAE wifi driver: rt2800usb.
On these 120sec gaps, interesting. Can someone please post the dmesg log? Interesting its with WPA/WPA2 or WEP. Perhaps something in the logs will help isolate.
Created attachment 710599 [details] dmesg log Attaching dmesg log as requested. Nothing was logged during my ping test. This is basically log messages from before i started the ping command and represents messages form boot up and log in.
Hmm..Nothing really obvious there. Are you using the bluetooth at the same time? Try to disable it and retest if so. BT and wifi share antenna, lets eliminate that if so.
My hardware does not support Bluetooth. I use a USB Bluetooth adapter when I need to do Bluetooth, so no shared antenna. I did not have the Bluetooth dongle inserted during this.
I did some more troubleshooting of this. I realized that if instead of pinging the gateway, I instead pinged my server, I could run tcpdump there to see if the delay was on the transmit or receive side of things. I also did pings with an interval of 2 seconds instead of one to see if this was a 120 pings or 120 seconds issue. The results were: 1. It is definitely a 120 second interval issue. 2. It is a problem with data being received. tcpdumps on the server showed the echo requests being received at 1 second intervals (after i switched back to that interval) and the echo replies were transmitted at exactly 1 second intervals. 3. a tcpdump on the affected laptop showed the same delay in the tcpdump data as in the ping output, so whatever the issue is it happens before tcpdump sees the data also. I should point out that this is obviously not restricted to pings. The original issue was that it became impossible to use the laptop to ssh to my server and edit files using vi. I should probably also state that this still fails with linux-kernel-3.8.3-103.fc17.x86_64.
Oh, and it seems that if you are unfortunate in timing and run into this delay during an arp request, you can end up with commands failing with "no route to host" error.
I see this in the log: ath9k 0000:09:00.0 wlan0: disabling HT/VHT due to WEP/TKIP use Which in itself isn't too weird. But made me wonder: can you post output of: modinfo ath9k If hardware encryption is on, let's eliminate that part with s/w encryption and see if that helps. Try this: try the test loading ath9k with hardware encrytption off I think you need modprobe ath9k nohwcrypt=1 parm: nohwcrypt:Disable hardware encryption (int) Can you test this?
(In reply to comment #12) > I see this in the log: > ath9k 0000:09:00.0 wlan0: disabling HT/VHT due to WEP/TKIP use > > Which in itself isn't too weird. But made me wonder: can you post output of: > modinfo ath9k Here is the output of the command: modinfo ath9k filename: /lib/modules/3.8.4-102.fc17.x86_64/kernel/drivers/net/wireless/ath/ath9k/ath9k.ko license: Dual BSD/GPL description: Support for Atheros 802.11n wireless LAN cards. author: Atheros Communications alias: platform:qca955x_wmac alias: platform:ar934x_wmac alias: platform:ar933x_wmac alias: platform:ath9k alias: pci:v0000168Cd00000036sv*sd*bc*sc*i* alias: pci:v0000168Cd00000037sv*sd*bc*sc*i* alias: pci:v0000168Cd00000034sv*sd*bc*sc*i* alias: pci:v0000168Cd00000033sv*sd*bc*sc*i* alias: pci:v0000168Cd00000032sv*sd*bc*sc*i* alias: pci:v0000168Cd00000030sv*sd*bc*sc*i* alias: pci:v0000168Cd0000002Esv*sd*bc*sc*i* alias: pci:v0000168Cd0000002Dsv*sd*bc*sc*i* alias: pci:v0000168Cd0000002Csv*sd*bc*sc*i* alias: pci:v0000168Cd0000002Bsv*sd*bc*sc*i* alias: pci:v0000168Cd0000002Asv*sd*bc*sc*i* alias: pci:v0000168Cd00000029sv*sd*bc*sc*i* alias: pci:v0000168Cd00000027sv*sd*bc*sc*i* alias: pci:v0000168Cd00000024sv*sd*bc*sc*i* alias: pci:v0000168Cd00000023sv*sd*bc*sc*i* depends: ath9k_hw,ath9k_common,mac80211,ath,cfg80211 intree: Y vermagic: 3.8.4-102.fc17.x86_64 SMP mod_unload parm: debug:Debugging mask (uint) parm: nohwcrypt:Disable hardware encryption (int) parm: blink:Enable LED blink on activity (int) parm: btcoex_enable:Enable wifi-BT coexistence (int) parm: enable_diversity:Enable Antenna diversity for AR9565 (int) As you can see hardware encryption is already disabled, and that could not be an issue anyway because if you read comment #1 I said I tried turning off encryption on my WiFi and that did not help either.
Oh I should mention that I have other systems with non Atheros WiFi and this problem is ONLY in the ath9k driver as far as I can determine.
So not hardware encrypt issue it would seem. This device is a/g/n: do you know if you are connecting to 802.11a or 802.11g? Not obvious from dmesg. I'm wondering if the scan for APs (a lot of channels on 802.11a and can take a long time in certain cases). Might be off-channel too long. So a couple more ideas for now: I know you said your hardware doesn't support BT but it appears the driver still has it enabled, i.e. parm: btcoex_enable:Enable wifi-BT coexistence (int) Maybe the driver is getting long timeout waiting for no hardware. It should always be off if you have no hardware. Can you try btcoex_enable=0? Also, a supplicant log would likely help here if it's a scan issue. /var/log/wpa_supplicant.log. May need to add a -d to command line to get helpful level of detail. Can you please add this file? I will mention I have seen something of this sort on RHEL with another driver using WEP, on my list to solve still. May not be relevant, but interesting. If nothing pops out of this, next step is likely a look at Peter's comment: >I can confirm that this issue is still present in kernel-3.7.9-205.fc18.x86_64 , >but not yet present in kernel-3.7.5-201.fc18.x86_64 (although connectivity with >this kernel seems to be a lot more erratic, but I didn't do a proper comparison).
So far all i have is the wpa_supplicant.log file. This seems to be the relevant thing. I was doing a ping in one window and tailing the log in a second one, the long ping response delay corresponds to when the supplicant was doing a periodic scan for available networks. I will restrict my access point for wirelss modes to see if the mode a/b/g/n makes any difference. As far as the bluetooth parameter goes, I am clearly doing something wrong. I tried doing "modprobe ath9k btcoex_enable=0" and modinfo still showed it enabled. So, I made up an ath9k file in /etc/modprobe.d which contained the line: options ath9k btcoex_enable=0 and rebooted. Still no joy. Can you tell me what I am doing wrong here?
Created attachment 736400 [details] wpa-supplicant.log file
My access point only does b g and n modes, it does not do A. I locked the access point to each mode only and retried the ping test. The mode at which I was associated made no difference.
The log file helps, thanks for that. Your are on wpa_supplicant 1.0_rc3 All through the log I see this stuff: dbus: flush_object_timeout_handler: Timeout - sending changed properties of object /fi/w1/wpa_supplicant1/Interfaces/1/BSSs/11 nl80211: Event message available Which gets more invasive as the log goes on, crazy at the end..The log doesn't have timestamps, but all this stuff may be a big part of your 120 delays I'm thinking. BTW: (can add timestamps to supplicant -t option, might circle back to this.) There is an issue on the web about this problem from a while back. Background if you care: https://mail.gnome.org/archives/networkmanager-list/2011-February/msg00027.html which mentions a fix, I don't find it in latest supplicant 1.1_rc3, I need to do some checking to get updated state of this. Till then, would you be willing to update your supplicant package to that or later version?
Quick update: got some info that will help us decide if issue is in driver or above (supplicant/NM). In your wireless profile editor, set the BSSID to the one of the AP you are currently connected to. In your supplicant file its: Current BSS: 10:6f:3f:63:81:b4 level=-52 This should eliminate much of the roaming scanning it the background likely the cause of issue you see. Can you check that and let me know if the 120 delay improves?
Created attachment 737462 [details] New wpa_supplicant.log Information you requested. This time it includes timestamps and the line you asked me to add to wpa-supplicant.conf. I also changed form -d to -dd to get more detail. HTis is a log from the entire run of the ping -c 1000 192.168.1.28 command Note that the times that the issue occurs corresponds to these 2 lines appearing in the log file. 1366327957.788401: nl80211: Scan SSID - hexdump_ascii(len=0): [NULL] 1366327957.796279: Scan requested (ret=0) - scan timeout 30 seconds I don't think the other errors relate to this issue. It just seems that the periodic scanning for new SSID's by Network manger triggers this issue.
Oh and just in case you missed earlier comments. I have several other configurations using different wireless devices and it is only dives using the ath9k driver that experience this issue.
(In reply to comment #21) > Created attachment 737462 [details] > New wpa_supplicant.log > > Information you requested. This time it includes timestamps and the line > you asked me to add to wpa-supplicant.conf. I also changed form -d to -dd > to get more detail. > > HTis is a log from the entire run of the ping -c 1000 192.168.1.28 command > > Note that the times that the issue occurs corresponds to these 2 lines > appearing in the log file. > > 1366327957.788401: nl80211: Scan SSID - hexdump_ascii(len=0): [NULL] > 1366327957.796279: Scan requested (ret=0) - scan timeout 30 seconds > > I don't think the other errors relate to this issue. It just seems that the > periodic scanning for new SSID's by Network manger triggers this issue. Thanks for the log. You are correct about scan timeout 30 seconds. That is just the timer doing the periodic scan starting the scan. Scans are taking about 4 seconds in this log, way too little to cause much pain. The most concerning thing is the stuff at the very end of the log (flurry of nl80211 message apparently not handled or recognized). The log terminates but takes 94 seconds for what is there. This is in the realm of the delay you see. Have you been able to try the test in comment 20? It doesn't have to be that exact AP, any single AP will do. Set the BSSID to any single one in the profile, then run your ping test. This eliminates the scanning part of this issue, one way or other. Can you do that please?
(In reply to comment #23) > (In reply to comment #21) > > Created attachment 737462 [details] > > New wpa_supplicant.log > > > > Information you requested. This time it includes timestamps and the line > > you asked me to add to wpa-supplicant.conf. I also changed form -d to -dd > > to get more detail. > > > > HTis is a log from the entire run of the ping -c 1000 192.168.1.28 command > > > > Note that the times that the issue occurs corresponds to these 2 lines > > appearing in the log file. > > > > 1366327957.788401: nl80211: Scan SSID - hexdump_ascii(len=0): [NULL] > > 1366327957.796279: Scan requested (ret=0) - scan timeout 30 seconds > > > > I don't think the other errors relate to this issue. It just seems that the > > periodic scanning for new SSID's by Network manger triggers this issue. > > Thanks for the log. You are correct about scan timeout 30 seconds. That is > just the timer doing the periodic scan starting the scan. Scans are taking > about 4 seconds in this log, way too little to cause much pain. > > The most concerning thing is the stuff at the very end of the log (flurry of > nl80211 message apparently not handled or recognized). The log terminates > but takes 94 seconds for what is there. This is in the realm of the delay > you see. > > Have you been able to try the test in comment 20? > > It doesn't have to be that exact AP, any single AP will do. Set the BSSID > to any single one in the profile, then run your ping test. > This eliminates the scanning part of this issue, one way or other. > > Can you do that please? This is part of what I don't understand. I had that line in the wpa_supplicant.conf file during that last log collection, and it is still doing the scan. I am not sure why. I added that line then rebooted with the affected kernel and captured that log.
I just double checked by trying again and with this configuration in /etc/wpa_supplicant/wpa_supplicant.conf: ctrl_interface=/var/run/wpa_supplicant ctrl_interface_group=wheel Current BSS: 10:6f:3f:63:81:b4 level=-52 It still scans every 120 seconds.
Is there something I also need to put in the NetowrkManager.conf file to stop it from initiating WIFI scans?
For our purposes you you should be able to do it where you create the connection in the GUI. (Assuming you are using one.) In KDE for instance, click network manager icon, select wireless. There should be several profiles in there (which NM will use). Edit the one of interest, you should see a BSSID field. Just add the BSSID of the AP you are connected to in this field. You may need to reconnect afterward. Gnome has similar setup I believe. Does this help?
Created attachment 739388 [details] wpa_supplicant.log with BSSID specified in Network Manager With the BSSID specified in Network Manger, it does not do the scans and the performance issue disappears. Here are the statistics for 1000 pings. As you can see the longest time was ~17 milliseconds as opposed to the previous ~10 seconds. --- 192.168.1.28 ping statistics --- 1000 packets transmitted, 1000 received, 0% packet loss, time 1000276ms rtt min/avg/max/mdev = 1.369/1.984/17.282/0.771 ms
Created attachment 739389 [details] wwpa_supplicant.log while doing "iwlist scanning" Here is a log file for a period during which I was using the "iwlist scanning" command to force scans, which caused the ping issue every time I forced a scan.
As to the stopping scans you really can't stop them or you will never find or connect to anything, the trick is the balance of how many. From the log it appears the scan occur about every 23 seconds. If you aren't concerned about being mobile at this location this solution would work fine it appears. Those ping times are reasonable for most thing: the question is does that fit the need of your real work, not just pings now? If so, we can close your issue. I'll check around for new fixes at this area when I get time.
I am running fine using kernel 3.8.4-102.fc17.x86_64 with this workaround. This works for me now.
Good deal. Would like a cleaner solution that would focus on less invasive scanning by NM, etc. That said, I'll look at the flurry of nl80211 msgs I found at the end of logs more from kernel perspective.. Not sure driver/kernel is totally innocent either. Bill: is the workaround good enough that this issue can be closed for you?
Well you are asking 2 different questions here. This does fix the issue for me because the 2 devices i have the issue with are not really mobile so they are at 2 different locations where they only ever have to connect to a single AP. The problem is for the general case where people using this driver actually depend on being able to roam to varyi9ng public WiFi hotspots. In that scenario this is entirely unworkable.
Yes, it is two different questions. I intended that ;) I agree it's only a workaround for mobile users and an irritating one at that. Mostly wanted to get you workable and shrink the problem domain. Which it did.. Will leave this open a place holder for the possible kernel issue issue for me. Re-read this today: you might consider updating the wpa_supplicant, NetworkManager, crda packages: they are all outside the kernel and may give you some better relief. Did you try this and I just missed it?
Problem is if it is not a package I will then get updates for is maybe something i can figure out how to do once, but if no one is maintaining packages that auto update for fedora 17 I might be stuck with things breaking later on. I am very busy at this point on my real job and really don't have a lot of time to spend on figuring out how to keep this updated versus other updates that get auto-updated.
There seem to be new packages in testing for fedora 18 i will try upgrading to fc18 and then loading the ones from testing
I did look at Network manager config file, no much jumped out to help you there either. As to figuring out what is out there, it's pretty easy for these guys: wanna take a shot at it? here you go. I'm on F17 still, latest: yum list crda Installed Packages crda.x86_64 1.1.3_2013.02.13-2.fc17 yum list NetworkManager NetworkManager.x86_64 1:0.9.6.4-3.fc17 $yum list wpa_supplicant wpa_supplicant.x86_64 1:1.0-1.fc17 What is available for you? yum list available NetworkManager wpa_supplicant crda and what ever suits you yum update <package> Again, this might help: still plan work to see if driver might have issues "soon".
This message is a reminder that Fedora 17 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 17. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '17'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 17's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 17 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged change the 'version' to a later Fedora version prior to Fedora 17's end of life. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Anything new Bill? Have you tried new packages? It's been a couple months, will close this if it's fixed.
(In reply to John Greene from comment #39) > Anything new Bill? Have you tried new packages? It's been a couple > months, will close this if it's fixed. I am unable to reproduce this issue under fedora 19 with kernel 3.9.9-301.fc19.x86_64.
That's great news.. I'll close the bug as fixed in 3.9.9-301.fc19. Can you let me know for the record whether you believe any user side packages above were updated and had some bearing on the solution or was it just a kernel update? There are 213 changes in ath9k from 3.7 to 3.9 alone, 105 fromn 3.8 forward.. It's hard to guess what might have addressed this kernel side without more testing. Can you let me know for the record whether you believe any user side packages above were updated and had some bearing on the solution or was it just a kernel update? Thanks Bill.
(In reply to John Greene from comment #41) > That's great news.. I'll close the bug as fixed in 3.9.9-301.fc19. Can you > let me know for the record whether you believe any user side packages above > were updated and had some bearing on the solution or was it just a kernel > update? > > There are 213 changes in ath9k from 3.7 to 3.9 alone, 105 fromn 3.8 > forward.. > It's hard to guess what might have addressed this kernel side without more > testing. > > Can you let me know for the record whether you believe any user side > packages above were updated and had some bearing on the solution or was it > just a kernel update? Thanks Bill. OK more information. It is definitely just a difference in kernel versions. With the same tools I tested with the 3 kernels I still had available on the most affected system and narrowed this down a bit further. All testing was using the same tools. The only change was the kernel version. kernel-3.9.6-200.fc18.x86_64 - affected kernel-3.9.8-300.fc19.x86_64 - not affected
Bill, That's great work! Thanks for the info. I'll certainly look into this. I'm curious. Closing Fixed->next release. Changing the fixed version to kernel-3.9.8-300.fc19.x86_64, which is earlier than the 3.9.9-301 originally thought, should help isolate issues for future interests.