Description of problem: On *some* networks, NM seems to have problems ever associating. The machine works fine on other wireless networks, so I'm not sure what triggers the problem. For example, I can connect to my Android phone hotspot (WPA2) fine, but when I try to connect to the particular network in the hotel room I am right now, it never successfully completes. It keeps disassociating, and asking for the WPA password over and over again. And I know the wireless network works fine. My phone can connect to it, and I can even connect to it on the same laptop if I kill NetworkManager and then just do it manually with wpa_supplicant. Version-Release number of selected component (if applicable): NetworkManager-0.9.8.0-1.fc18.x86_64 How reproducible: 100% reproducible on that particular network (essid: "lodgecondo", password "condo000", WPA2/psk). But it works on my home network, and with my phone hotspot. Steps to Reproduce: 1. Find a network that shows this issue 2. Try to connect to it 3. Run around naked with a chainsaw to let out your frustrations Actual results: This is from dmesg: wlan0: authenticate with 50:46:5d:02:85:08 wlan0: send auth to 50:46:5d:02:85:08 (try 1/3) wlan0: authenticated wlan0: associate with 50:46:5d:02:85:08 (try 1/3) wlan0: RX AssocResp from 50:46:5d:02:85:08 (capab=0x411 status=0 aid=16) wlan0: associated wlan0: disassociating from 50:46:5d:02:85:08 by local choice (reason=3) cfg80211: Calling CRDA to update world regulatory domain cfg80211: World regulatory domain updated: cfg80211: (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp) cfg80211: (2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2000 mBm) cfg80211: (2457000 KHz - 2482000 KHz @ 40000 KHz), (300 mBi, 2000 mBm) cfg80211: (2474000 KHz - 2494000 KHz @ 20000 KHz), (300 mBi, 2000 mBm) cfg80211: (5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 2000 mBm) cfg80211: (5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 2000 mBm) cfg80211: Calling CRDA for country: US cfg80211: Regulatory domain changed to country: US cfg80211: (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp) cfg80211: (2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2700 mBm) cfg80211: (5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 1700 mBm) cfg80211: (5250000 KHz - 5330000 KHz @ 40000 KHz), (300 mBi, 2000 mBm) cfg80211: (5490000 KHz - 5600000 KHz @ 40000 KHz), (300 mBi, 2000 mBm) cfg80211: (5650000 KHz - 5710000 KHz @ 40000 KHz), (300 mBi, 2000 mBm) cfg80211: (5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 3000 mBm) cfg80211: (57240000 KHz - 63720000 KHz @ 2160000 KHz), (N/A, 4000 mBm) Rinse and repeat. Notice the "wlan0: disassociating from..". Expected results: This is dmesg with just wpa_supplicant run by hand: wlan0: authenticate with 50:46:5d:02:85:08 wlan0: send auth to 50:46:5d:02:85:08 (try 1/3) wlan0: authenticated wlan0: associate with 50:46:5d:02:85:08 (try 1/3) wlan0: RX AssocResp from 50:46:5d:02:85:08 (capab=0x411 status=0 aid=16) wlan0: associated IPv6: ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready The wireless works fine, I'm doing this bug-report with that wireless network right now. Just not with NetworkManager running. Additional info: This is a ChromeBook pixel with the ath9k chipset, with up-to-date Fedora18 as of today (except I obviously have my own kernel - but I checked, and it fails the same way with the currenf Fedora kernel too).
Created attachment 715950 [details] wpa_supplicant.log for failure cases This is the wpa_supplicant.log file with extra debug output as documented at https://live.gnome.org/NetworkManager/Debugging for a failed event (well, several failures, as far as I can tell. Does this make anybody go "oh, I know what's up?"
Created attachment 716083 [details] working trace from wpa_supplicant --ddKt It's cut off in the middle of a line because apparently wpa_supplicant ends up buffering standard output to a file, but I suspect nobody cares. It has successfully authenticated already and finished the connection. This comes with the PSK key visible, this is not some super-secure installation, it's a rental condo.
Created attachment 716290 [details] nonworking NM -> wpa_supplicant trace with -ddKt Ok, I couldn't get NetworkManager or dbus debugging to work, so I brute-forced it by renaming "wpa_supplicant" as "wpa_supplicant.real" and then have just a shell-script wpa_supplicant that added the -ddKt flags and then ran the real one. That worked, and here's the wpa_supplicant log with key data and timestamps for the non-working NetworkManager case. It definitely uses the right passphrase, and the resulting binary PSK key seems to be identical too. Any ideas?
Observations: In the working trace, wpa_supplicant is using WEXT, whereas the other is using nl80211. For some reason, when under nl80211, wpa_supplicant doesn't get a response for its EAPOL-Key 2/4 and the connection times out.
As on my reply to your private mail thread, the WEXT vs. nl80211 thing could well be an issue here, we need to figure out why the supplicant/kernel aren't connecting correctly with nl80211. If you haven't already, make sure you move NetworkManager out of the way so that systemd doesn't respawn it after you've stopped it (annoying, I know). Then run your successful supplicant test from comment #2, but when spawning the supplicant, use: wpa_supplicant -dddtuK -Dnl80211 -C /home/torvalds/wpa_conf.txt -i wlan0 and lets see if that works; that'll help point the blame finger in the right direction.
Created attachment 716576 [details] failing wpa_supplicant log (-Dnl80211) Yup, it seems to be related to -Dnl80211. This log was created with wpa_supplicant -dddtuK -Dnl80211 -i wlan0 -c wpa_conf.txt > nl80211.log
Created attachment 716586 [details] Same log, working, and without -Dnl80211 This should be very similar to the 716083 attachement, but I made it right after the failing case with the exact same flags except for dropping the -Dnl80211 part. Just in case the extra -d and -u flags make a difference.
I'm seeing this bug too. I'm using Fedora 18 with an Asus netbook. The campus wireless uses WPA2. The network connection works with this machine and Windows XP, Fedora 17 (before I upgraded to F18), and Ubuntu 12.10.
(In reply to comment #8) > I'm seeing this bug too. I'm using Fedora 18 with an Asus netbook. The > campus wireless uses WPA2. The network connection works with this machine > and Windows XP, Fedora 17 (before I upgraded to F18), and Ubuntu 12.10. Garrett, what kind of wifi card do you have? Running 'lspci' and 'lsusb' would certainly tell us; you could attach the output of those commands here.
Created attachment 736457 [details] Mitchener: output from lspci Output from lspci on my asus EEEPC netbook
Created attachment 736458 [details] Mitchener: output from lsusb Output from lsusb on my asus EEEPC netbook
(In reply to Garrett Mitchener from comment #8) > I'm seeing this bug too. I'm using Fedora 18 with an Asus netbook. The > campus wireless uses WPA2. The network connection works with this machine > and Windows XP, Fedora 17 (before I upgraded to F18), and Ubuntu 12.10. Garrett, are you still seeing this issue ?
So Garrett's machine has ath9k, which isn't as mature as other drivers, so that could be the cause of the problems, also the nl80211/wext thing. Logs from 'dmesg' would be helpful to see if the device simply loses sync with the AP or what.
Yes, I'd like ath9k logs along with some debugfs statistics to debug this issue.
Ok, I got a copy of the packet capture that Linus sent to Jouni and here are some observations: In the success case, MSG 2/4 is sent at MCS 3. In the failure case, MSG 2/4 is sent at MCS 13. From the wpa_s logs, MSG 2/4 appears to cause the timeout and eventual disconnect. This is a problem with the ath9k rate control, where it sends EAPOL packets at really high rates even in very congested environments for the reason that it starts from the *highest* rate and then starts ramping down based on the packet error rate. But, this will cause authentication timeouts because the algorithm needs time to come down to lower rates and a failed EAPOL packet breaks things. minstrel doesn't have this problem because it starts from the lowest rate and goes upward. I am not sure about the difference between nl80211/WEXT and why that would make a difference in rate selection. This has to be verified - so if someone who can reproduce this uses minstrel to check if the issue is still seen, that'd give us a clue.
(In reply to Sujith from comment #15) > Ok, I got a copy of the packet capture that Linus sent to Jouni and here are > some observations: > > In the success case, MSG 2/4 is sent at MCS 3. > In the failure case, MSG 2/4 is sent at MCS 13. Ah, great. That makes things easier to understand. Based on this, I'm now easily able to reproduce the issue and get capture files that look very similar to the one from Linus. > This is a problem with the ath9k rate control, where it sends EAPOL packets > at really high rates even in very congested environments for the reason that > it starts from the *highest* rate and then starts ramping down based on the > packet error rate. > > But, this will cause authentication timeouts because the algorithm needs > time to come down to lower rates and a failed EAPOL packet breaks things. Yes, that needs to be fixed. The behavior seemed to be to try two first data frames after association with MCS 13 and only then drop to MCS 3. Guess how many times the AP that Linus tried to use transmitted EAPOL-Key 1/4.. Just twice. The last success case worked due to another bug (in mac80211, I'd assume) that leaked a data frame that should not have been there. That was enough to convince rate control to drop TX rate for the second (and final with this AP) EAPOL-Key msg 2/4 attempt. I did not check what rates were included in the retransmit sequence, but apparently not low enough to make this work. This seems to match quite well with the description of WPA2-Personal working with phone hotspot or home network and not with a hotel network. I'd assume the first two have much better signal strength while some hotels I've been to would be in quite the opposite end of the scale.. ;-) (The one in the capture file was not really that bad, but it was not good enough to get anything like MCS 13 going through.) > minstrel doesn't have this problem because it starts from the lowest rate > and goes upward. I am not sure about the difference between nl80211/WEXT and > why that would make a difference in rate selection. I'd assume the nl80211 vs. WEXT part is unrelated. > This has to be verified - so if someone who can reproduce this uses minstrel > to check if the issue is still seen, that'd give us a clue. I can confirm that this resolves the issue that I'm seeing. For others who would like to test this, disabling CONFIG_ATH9K_RATE_CONTROL=y allows this to be worked around until ath9k rate control can be improved or the default rate control algorithm for ath9k gets changed.
(In reply to Jouni Malinen from comment #16) > The last success case worked due to another bug (in mac80211, I'd > assume) that leaked a data frame that should not have been there. That was > enough to convince rate control to drop TX rate for the second (and final > with this AP) EAPOL-Key msg 2/4 attempt. > I'd assume the nl80211 vs. WEXT part is unrelated. Actually, I think I need to take that back.. That "leaked data frame" is something that I think was recently identified as a bug with WEXT. So that bug may actually be enough to explain why the WEXT case was more likely to work here since there may be extra data frames before the 4-way handshake has been completed and with nl80211, those are dropped before given to the driver (which is the correct behavior).
(In reply to Jouni Malinen from comment #16) > I can confirm that this resolves the issue that I'm seeing. For others who > would like to test this, disabling CONFIG_ATH9K_RATE_CONTROL=y allows this > to be worked around until ath9k rate control can be improved or the default > rate control algorithm for ath9k gets changed. Is this an option for building the ath9k driver ...? I'll try to get some logs & such next time I'm on campus. What should I try to get from debugfs?
(In reply to Jouni Malinen from comment #17) > Actually, I think I need to take that back.. That "leaked data frame" is > something that I think was recently identified as a bug with WEXT. So that > bug may actually be enough to explain why the WEXT case was more likely to > work here since there may be extra data frames before the 4-way handshake > has been completed and with nl80211, those are dropped before given to the > driver (which is the correct behavior). Yes, frames 1257, 1268 and 1277 are sent before the handshake and they force the rate control to use lower rates based on rising packet error rate. They are NullFunc frames with the Power Management bit set, btw.
(In reply to Garrett Mitchener from comment #18) > Is this an option for building the ath9k driver ...? Yes, if CONFIG_ATH9K_RATE_CONTROL is not enabled in the kernel config minstrel is chosen. > I'll try to get some logs & such next time I'm on campus. What should I try > to get from debugfs? /sys/kernel/debug/ieee80211/phy#/xmit /sys/kernel/debug/ieee80211/phy#/netdev:wlan#/stations/<associated_ap>/rc_stats
For getting the debugfs stats, please use the default ath9k rate control so we know if excessive retries are happening.
Should I just make CONFIG_ATH9K_RATE_CONTROL have a "depends on BROKEN" or at least make it "default n" in the meantime. Right now it defaults to on, and is clearly not working well. I don't have access to a problematic network at the moment for testing, but I had issues at the hotel during my recent trip to Japan (sitting at Narita airport right now on the way back), being forced to use my *phone* for internet (USB tethering!) because the ath9k driver was unusable. I've changed my personal config to not have CONFIG_ATH9K_RATE_CONTROL enabled, but since the problem only shows up in certain situations, I don't know if I can verify that it fixes it. The wireless network in the United lounge at Narita doesn't have the problem, for example..
It can be changed to "default n". Do you want me to send a patch or would you prefer to just change it in mainline ?
(In reply to Sujith from comment #23) > It can be changed to "default n". Do you want me to send a patch or would > you prefer to just change it in mainline ? So the problem I see with just changing it to "default n" is that it won't fix people who already have it set because of legacy reasons. Which is why making it "depends on BROKEN" would actually be better. Alternatively, we could rename the option (so that old configs having the broken default don't trigger) and make the new name have "default n". Hmm?
> So the problem I see with just changing it to "default n" is that it won't > fix people who already have it set because of legacy reasons. > > Which is why making it "depends on BROKEN" would actually be better. > > Alternatively, we could rename the option (so that old configs having the > broken default don't trigger) and make the new name have "default n". Hmm? That's a better approach - we can use CONFIG_ATH9K_LEGACY_RATE_CONTROL and set it "n" by default.
Patch posted: https://patchwork.kernel.org/patch/2676971/
Also, this hasn't made it to 3.10 yet: https://patchwork.kernel.org/patch/2646451/ So, for now "iw dev wlan# set power_save off" can be used to improve link stability.
I merged the patches in question to the wireless.git tree and sent them along to Dave Miller a few minutes ago.
I updated my netbook today (2013-06-08) I tried connecting to the campus network again, and the network connection is a lot more stable than it has been. There have been a few updates to NetworkManager and the kernel, so maybe one of them helped. Also, our IT crew has been doing some work on the campus wireless network so maybe they changed something. Anyway, the connection worked for a just few minutes, then started going up and down again, and finally stabilized. I'm attaching various log files...
Created attachment 758623 [details] output of lsmod
Created attachment 758624 [details] output of lspci
Created attachment 758625 [details] output of dmesg
Created attachment 758626 [details] /var/log/messages
Created attachment 758627 [details] copy of /sys/kernel/debug/ieee80211/phy0/ath9k/xmit
Created attachment 758628 [details] copy of /sys/kernel/debug/ieee80211/phy0/netdev:wlan0/stations/<station address>/rc_stats
Created attachment 758629 [details] copy of /var/log/wpa_supplicant.log
Created attachment 758630 [details] rc_stats after the network connection stabilized
Created attachment 758631 [details] wpa_supplicant.log after the network connection stabilized
Created attachment 758632 [details] xmit after the network connection stabilized
By the way, the rc_stats one is hard to get because the subdirectory in /sys/kernel/debug/ieee80211/phy0/netdev:wlan0/stations/ corresponding to the hex address of the station keeps appearing and disappearing when the connection is unstable.
(In reply to Garrett Mitchener from comment #40) > By the way, the rc_stats one is hard to get because the subdirectory in > /sys/kernel/debug/ieee80211/phy0/netdev:wlan0/stations/ corresponding to the > hex address of the station keeps appearing and disappearing when the > connection is unstable. rc_stats shows that ath9k RC is still being used, please switch to minstrel.
I recompiled the kernel configured with ATH9K_RATE_CONTROL=no, and it didn't help. I'm about to attach the log files. I can still connect to the problem network if I do 'systemctl stop NetworkManager' and then run wpa_supplicant from a script, so I'm not convinced this is (just) a kernel driver problem.
Created attachment 763954 [details] output of lsmod kernel 3.9.5, with Fedora 18 patches, ATH9K_RATE_CONTROL=no
Created attachment 763955 [details] output of dmesg kernel 3.9.5, with Fedora 18 patches, ATH9K_RATE_CONTROL=no
Created attachment 763956 [details] /var/log/messages kernel 3.9.5, with Fedora 18 patches, ATH9K_RATE_CONTROL=no
Created attachment 763967 [details] copy of /sys/kernel/debug/ieee80211/phy0/netdev:wlan0/stations/<station address>/rc_stats kernel 3.9.5, with Fedora 18 patches, ATH9K_RATE_CONTROL=no
Created attachment 763968 [details] copy of /sys/kernel/debug/ieee80211/phy0/ath9k/xmit kernel 3.9.5, with Fedora 18 patches, ATH9K_RATE_CONTROL=no
Is "swerdna" the problematic network ?
The one I have the most trouble with is 'cofc-secure', the main campus network at the College of Charleston. It's WPA2 enterprise, with individual ids and passwords. 'swerdna' is one of the access points at my church, and I do sometimes have trouble there. I think it's WPA with one password. I'll collect some data next time I'm there.
This appears to be a problem with NM. The following snippet appears many times in the log: Jun 21 12:23:54 falkor NetworkManager[1309]: <info> Activation (wlan0/wireless): access point 'cofc-secure' has security, but secrets are required. Jun 21 12:23:54 falkor NetworkManager[1309]: <info> (wlan0): device state change: config -> need-auth (reason 'none') [50 60 0] Jun 21 12:23:54 falkor NetworkManager[1309]: <info> Activation (wlan0) Stage 2 of 5 (Device Configure) complete. Jun 21 12:23:54 falkor NetworkManager[1309]: <warn> No agents were available for this request. Jun 21 12:23:54 falkor NetworkManager[1309]: <info> (wlan0): device state change: need-auth -> failed (reason 'no-secrets') [60 120 7] Jun 21 12:23:54 falkor NetworkManager[1309]: <info> Marking connection 'cofc-secure' invalid. Jun 21 12:23:54 falkor NetworkManager[1309]: <warn> Activation (wlan0) failed for connection 'cofc-secure' Jun 21 12:23:54 falkor NetworkManager[1309]: <info> (wlan0): device state change: failed -> disconnected (reason 'none') [120 30 0] Jun 21 12:23:54 falkor NetworkManager[1309]: <info> (wlan0): deactivating device (reason 'none') [0]
It's July 30 and I'm still seeing this problem with a fully updated Fedora 19 netbook. The College of Charleston is hosting Flock to Fedora in just a couple of weeks and I can't connect to our campus network... What more can I do to help debug this?
When NM says "no agents were available for this request", that means nothing is listening for password requests to show you the dialog box. On Fedora 18, GNOME Shell does this, but it has a bug where if you restart NetworkManager for any reason, it does not reconnect to NetworkManager to listen for these requests. So if you've restarted NM anywhere, you also have to restart the shell with "Alt+F2" then type 'r' and hit return. Then it will reconnect to NM and start providing secrets.
No, that's not the problem, I'm seeing something else. It happens on a fresh boot, fresh log in, I get the gnome password dialog, type in my password, and it just won't connect. It repeatedly asks for my password, and yes I'm typing it correctly.
maybe this is the same.. https://bugzilla.kernel.org/show_bug.cgi?id=60690 but could be wrong
I think this bug might actually be a problem in wpa_supplicant Possibly related: bug #950518 here This next one is an old bug by date, but it's not marked fixed in the w1.fi bugzilla, so it might still be present: wpa_supplicant responds to all requests for identity, not just the ones relevant to this machine: http://w1.fi/bugz/show_bug.cgi?id=455 In working with our network engineers, we found that my laptop was generating lots of EAP requests, and it was enough to trigger some intrusion detection mechanism that deauthenticated it. So that might be a source of problems.
Also possibly related, #981679 but that one was apparently resolved in fedora 19, where my bug was not
minor correction, let's get the automagic link to bug #981679 this time
tough to say if this is wpa_supplicant(s) issue. should kernel developers break userspace? in anycase the bisect can be done killing nm-applet(NM) and manually using wpa_supplicant. good kernel will connect reliably, bad kernel connects once then never connects until you rmmod your module and reload it.
Possibly related: bug #929217 This one may be related or may be another problem inside anaconda: bug #892896
I'm seeing this on one notebook with an ath9k wireless card, and another with intel wireless, and another with something else (don't have access to that one right now). I don't think it's (just) a kernel problem.
Additional info: The glitch http://w1.fi/bugz/show_bug.cgi?id=455 shows up when I start wireshark without any special settings: Unless you change a setting, wireshark puts the wifi device into promiscuous mode, which triggers the messy behavior where wpa_supplicant replies to requests for identity intended for other machines, which results in lots of EAP packets, which triggers the intrusion detection mechanism. Now that I figured that out, I'm running wireshark in not-promiscuous mode, trying to get better packet captures and track down the problem. So maybe it's not in wpa_supplicant. *grr*
I guess a good rule is: https://lkml.org/lkml/2012/12/23/75 from the master!! anyways that's the once reliable connect, then after that no connection. to see this you need to : kill NM: service network-manager stop && killall nm-applet then wpa_supplicant -Dwext -I wlan* -c /donde/esta use CTRL-C to start and stop wpa_supplicant there you will see a reliable connection and an unreliable connection depending on the kernel version. on the current mainline here I get no connection at all until I apply my patch. if I git reset --hard 6565ec then I can toggle wpa_supplicant on and off
Ok, so this seems to be a kernel issue, right?
unless the author of the changes had decided to change the API _knowing_ that wpa_supplicant will break and informed them of the changes, then I think its safe to say its a kernel issue.
Still not sure where my particular wifi problems are coming from: I'm using a different computer, an Asus UX31A which has Intel centrino wifi. I'm using Fedora 19. I have to use kernel 3.9.5 because of a bug in 3.19. Using Gnome and NetworkManager, this notebook connects to the campus network (WPA2 enterprise) more or less okay, but then every few minutes, it disconnects and requests my password again. I managed to configure the problematic network manually, running wpa_supplicant without NetworkManager, and it seemed a lot more stable, but I haven't run it long enough to know for sure. That would indicate that my problem is caused by some problem between NetworkManager and my campus wifi network, and is probably not just a kernel issue, although it may have something to do with signal strength as reported by the kernel. Possibly related: bug #864906 bug #975244 (another ath9k weak signal report) bug #929217 (another ath9k weak signal report) bug #490493 (disconnections caused by automatic scanning for BSSIDS, and old bug report, but with recent comments indicating that it hasn't been resolved) http://askubuntu.com/questions/224999/ubuntu-wifi-roaming-problem-how-do-i-set-wireless-sensitivity-on-ubuntu https://bugs.launchpad.net/ubuntu/+source/network-manager/+bug/1057582 https://bugs.launchpad.net/ubuntu/+source/network-manager/+bug/1080634 https://bugs.launchpad.net/ubuntu/+source/network-manager/+bug/1091610 https://bugs.launchpad.net/ubuntu/+source/network-manager/+bug/1144720 https://bugs.launchpad.net/ubuntu/+source/network-manager/+bug/1154559 https://bugs.launchpad.net/ubuntu/+source/network-manager/+bug/1157601 One of the symptoms I sometimes see that relates to several of the above: I sign on to our network using NetworkManager, the icon in Gnome shows high strength, then immediately it roams to another access point with a much weaker signal. I'm working with campus network engineers-- there may be some other intrusion detection feature or load-balancing feature that's complicating things. Maybe related but not as promising / informative: bug #982429 (maybe a problem with where passwords are stored rather than my problem) bug #675226 (problem with NM and IPv6, which I believe we have on campus) https://bugs.launchpad.net/ubuntu/+source/network-manager/+bug/1174259 (about WPA personal) bug #965791 https://lkml.org/lkml/2013/5/22/115 (another WPA2 network, but this one is LEAP, mine is PEAP; intel wifi; something about where secrets are stored; supposedly fixed in kernel-3.9.4, so it shouldn't be affecting me, but just in case) https://bugs.launchpad.net/ubuntu/+source/network-manager/+bug/1176359 https://bugs.launchpad.net/ubuntu/+source/network-manager/+bug/1182182 (Whatever is going on here must be pretty subtle. It seems to have a long history of being discovered, reported, not fixed, and put on the back burner, rediscovered, reported again, not fixed, put on the back burner.....)
By the way, we have a *lot* of access points around here, and NM seems to be responsible for a lot of roaming between them. Is NM possibly searching for the best bandwidth and somehow ends up preferring distant access points that have less load but too weak of a signal?
over here the problem is temporarily fixed with using my patch from: https://bugzilla.kernel.org/show_bug.cgi?id=60690 machine runs flawlessly with wireless. hopefully the experts come in and review their code.
should be fixed on the current Mainline or rawhid fedora 18 main thread here: https://bugzilla.redhat.com/show_bug.cgi?id=981445
In reply to comment 67 and comment 68: I'm stuck with kernel 3.9.5 for the moment because of bug #989763 I'll try the rawhide kernel (link in that bug report, supposedly fixes it) when I get a chance and see if it fixes this bug too.
Created attachment 789310 [details] /var/log/messages from 2013-08-22, my Asus Zenbook Part of /var/log/messages: I booted my Asus Zenbook UX31A and signed into Gnome. The dialog window appeared asking me to confirm my wifi password (WPA2-PEAP-MSCHAPv2). I clicked Ok, but it couldn't connect to the wifi. Then I stopped NetworkManager and ran wpa_supplicant manually, and was able to connect.
So far no luck with trying a 3.10 kernel -- until the graphics bug is fixed I can't tell if it fixes NetworkManager FYI: I tried some Ubuntu live images. The same problem is present in Ubuntu 13.04, except that the reconnection after deauthentication is smoother. However, I see no deauthentications at all with Ubuntu 12.10 live, and the network connection seems to be completely steady. So something happened to introduce this bug in between those versions of NetworkManager and/or the kernel?
Trying fedora 19 with kernel 3.11.1-200 -- Connection is slightly more stable at times, but I had trouble getting it to connect, and it still deauthenticates and disconnects, asks for my password again, same problem... Still no problems running wpa_supplicant manually with no NetworkManager
Aug 22 14:10:09 falkor NetworkManager[584]: <info> (wlp2s0): supplicant interface state: associating -> associated Aug 22 14:10:33 falkor NetworkManager[584]: <warn> Activation (wlp2s0/wireless): association took too long. Aug 22 14:10:33 falkor kernel: [ 120.077269] wlp2s0: deauthenticating from 00:0f:7d:3b:ed:23 by local choice (reason=3) At this point, we need the supplicant logs to see what's going on here. Can you: mv /usr/sbin/wpa_supplicant / killall -TERM wpa_supplicant /wpa_supplicant -dddtu and then reconnect to the network and wait for the problem to occur. Then please attach the supplicant output. You can get things back to normal by "mv /wpa_supplicant /usr/sbin/" when you're done. (There may be private information in these logs, so feel free to sanitize them or send them directly to my email address if you'd prefer them not to be public.)
Garrett: this is a wpa_supplicant bug: 1380298605.254235: RSN: PMK from pre-auth - hexdump(len=32): [REMOVED] 1380298605.254245: RSN: Replace PMKSA entry for the current AP 1380298605.254247: nl80211: Delete PMKID for 00:0f:7d:xx:xx:xx 1380298605.254345: wlp2s0: RSN: PMKSA cache entry free_cb: 00:0f:7d:xx:xx:xx reason=1 1380298605.254346: RSN: Flush PMKSA cache entry for 00:0f:7d:xx:xx:xx 1380298605.254347: nl80211: Delete PMKID for 00:0f:7d:xx:xx:xx 1380298605.254396: wlp2s0: RSN: PMKSA cache entry free_cb: 00:0f:7d:xx:xx:xx reason=0 1380298605.254397: wlp2s0: RSN: removed current PMKSA entry 1380298605.254400: wlp2s0: Request to deauthenticate - bssid=00:0f:7d:xx:xx:xx pending_bssid=00:00:00:00:00:00 reason=1 state=COMPLETED 1380298605.254401: wpa_driver_nl80211_deauthenticate(addr=00:0f:7d:xx:xx:xx reason_code=1) 1380298605.260269: wlp2s0: Event DEAUTH (12) received 1380298605.260274: wlp2s0: Deauthentication notification 1380298605.260275: wlp2s0: * reason 1 (locally generated) 1380298605.260276: Deauthentication frame IE(s) - hexdump(len=0): [NULL] 1380298605.260280: wlp2s0: CTRL-EVENT-DISCONNECTED bssid=00:0f:7d:xx:xx:xx reason=1 locally_generated=1 and happens when you have a WPA Enterprise network with many access points, that supports RSN Preauthentication, which yours does. The supplicant mishandles the PMKSA cache and flushes the entry for the current access point, which triggers a disconnect. What specific version of wpa_supplicant are you using?
That's with: wpa_supplicant-2.0-3.fc19.x86_64
(In reply to Garrett Mitchener from comment #75) > That's with: > > wpa_supplicant-2.0-3.fc19.x86_64 So there's really two bugs in this one here. First is the bug that Linus is describing, which could be fixed already; his report was about WPA-PSK networks. Second is your bug, which we've already identified as a supplicant bug, but that is specific to WPA-Enterprise networks (which use PMK caching). So best I think is to clone this bug for wpa_supplicant; use the "Clone this bug" link at the top-right of this page and set the component to wpa_supplicant. Thanks! We'll leave this bug open and try to see if Linus' bug has been fixed by now.
(In reply to Dan Williams from comment #76) > (In reply to Garrett Mitchener from comment #75) > > That's with: > > > > wpa_supplicant-2.0-3.fc19.x86_64 > > So there's really two bugs in this one here. First is the bug that Linus > is describing, which could be fixed already; his report was about WPA-PSK > networks. Second is your bug, which we've already identified as a > supplicant bug, but that is specific to WPA-Enterprise networks (which use > PMK caching). So best I think is to clone this bug for wpa_supplicant; use > the "Clone this bug" link at the top-right of this page and set the > component to wpa_supplicant. Thanks! Since I'm here, I'll just clone it. New supplicant bug is bug #1016707.
I'm going to go ahead and close this bug, since the original problem was an issue with ath9k rate control and has since been fixed in the kernel. NetworkManager asking the supplicant to use nl80211 instead of WEXT is the right thing to do, since WEXT is effectively dead, and any issues with nl80211 should get fixed kernel side. Lastly, Garrett's problem is a separate supplicant issue with PMKSA cache eviction, which is being tracked in another bug.