Bug 1102365

Summary:

Qualcomm Atheros AR93xx disconnects immediately

Product:

[Fedora] Fedora

Reporter:

Eric Griffith <EGriffith92>

Component:

NetworkManager

Assignee:

fedora-kernel-wireless-ath

Status:

CLOSED EOL

QA Contact:

Fedora Extras Quality Assurance <extras-qa>

Severity:

unspecified

Docs Contact:

Priority:

unspecified

Version:

CC:

dcbw, EGriffith92, gansalmon, itamar, jogreene, jonathan, kernel-maint, madhu.chinakonda, mchehab, sujith

Target Milestone:

---

Keywords:

Desktop

Target Release:

---

Hardware:

x86_64

OS:

Linux

Whiteboard:

Fixed In Version:

Doc Type:

Bug Fix

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2015-06-29 20:51:14 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Attachments:

Description	Flags
First journal log file.	none
"-dd" output of wpa_supplicant	none
"-dd -t" output of wpa_supplicant	none
"-dd -t" Output of wpa_supplicant with a Network Key Rotation Time of 0	none
INTEL: "-dd -t" output of wpa_supplicant, with a Network Key Rotation Time of 3600	none

Description Eric Griffith 2014-05-28 20:20:32 UTC

Description of problem:


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Eric Griffith 2014-05-28 20:35:21 UTC

Accidentally hit enter. Lets try this again... 

==VERSIONS==
Kernel version: Linux erics-desktop 3.14.4-200.fc20.x86_64 #1 SMP Tue May 13 13:51:08 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

NetworkManager version: 0.9.9.0-38.git20131003.fc20

WPA Supplicant version: wpa_supplicant v2.0

==HARDWARE IN QUESTION==
02:00.0 Network controller: Qualcomm Atheros AR93xx Wireless Network Adapter (rev 01)
        Subsystem: Qualcomm Atheros Device 3112
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 36
        Region 0: Memory at fe900000 (64-bit, non-prefetchable) [size=128K]
        Expansion ROM at fe920000 [disabled] [size=64K]
        Capabilities: <access denied>
        Kernel driver in use: ath9k
        Kernel modules: ath9k

==MODULE OPTIONS== 
options ath9k nohwcrypt=1
options ath9k nohwcrypt=0

Neither setting changes outcome of problem. 

==Problem==
Finally got around to installing Fedora 20 on a spare hard drive for my desktop. I'm currently connected to the network via tethering from my phone via usb. If I unplug my phone NetworkManager jumps in and tries to connect to wifi.. it eventually does. It will stay connected (visibly) but in the background its total chaos and dmesg is getting flooded. 

Proof of connection: 

[egriffith@erics-desktop ~]$ iwconfig
wlp2s0    IEEE 802.11abgn  ESSID:"WhoKnows-5Ghz"  
          Mode:Managed  Frequency:5.785 GHz  Access Point: 40:16:7E:59:A7:44   
          Bit Rate=6.5 Mb/s   Tx-Power=30 dBm   
          Retry short limit:7   RTS thr:off   Fragment thr:off
          Power Management:off
          Link Quality=59/70  Signal level=-51 dBm  
          Rx invalid nwid:0  Rx invalid crypt:0  Rx invalid frag:0
          Tx excessive retries:25  Invalid misc:1   Missed beacon:0

p3p1      no wireless extensions.

lo        no wireless extensions.

enp0s16f0u1  no wireless extensions.



And a few seconds later, if I re-run it... 



Proof of disconnect:

[egriffith@erics-desktop ~]$ iwconfig
wlp2s0    IEEE 802.11abgn  ESSID:off/any  
          Mode:Managed  Access Point: Not-Associated   Tx-Power=30 dBm   
          Retry short limit:7   RTS thr:off   Fragment thr:off
          Power Management:off
          
p3p1      no wireless extensions.

lo        no wireless extensions.

enp0s16f0u1  no wireless extensions.


Meanwhile in dmesg...


[ 1402.417110] wlp2s0: deauthenticated from 40:16:7e:59:a7:44 (Reason: 15)
[ 1402.436890] cfg80211: Calling CRDA to update world regulatory domain
[ 1402.441232] cfg80211: World regulatory domain updated:
[ 1402.441238] cfg80211:  DFS Master region: unset
[ 1402.441241] cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
[ 1402.441244] cfg80211:   (2402000 KHz - 2472000 KHz @ 40000 KHz), (N/A, 2000 mBm)
[ 1402.441247] cfg80211:   (2457000 KHz - 2482000 KHz @ 40000 KHz), (N/A, 2000 mBm)
[ 1402.441250] cfg80211:   (2474000 KHz - 2494000 KHz @ 20000 KHz), (N/A, 2000 mBm)
[ 1402.441252] cfg80211:   (5170000 KHz - 5250000 KHz @ 80000 KHz), (N/A, 2000 mBm)
[ 1402.441254] cfg80211:   (5735000 KHz - 5835000 KHz @ 80000 KHz), (N/A, 2000 mBm)
[ 1402.441257] cfg80211:   (57240000 KHz - 63720000 KHz @ 2160000 KHz), (N/A, 0 mBm)
[ 1402.441290] cfg80211: Calling CRDA for country: US
[ 1402.443796] cfg80211: Regulatory domain changed to country: US
[ 1402.443800] cfg80211:  DFS Master region: unset
[ 1402.443802] cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
[ 1402.443804] cfg80211:   (2402000 KHz - 2472000 KHz @ 40000 KHz), (N/A, 3000 mBm)
[ 1402.443806] cfg80211:   (5170000 KHz - 5250000 KHz @ 80000 KHz), (N/A, 1700 mBm)
[ 1402.443807] cfg80211:   (5250000 KHz - 5330000 KHz @ 80000 KHz), (N/A, 2300 mBm)
[ 1402.443809] cfg80211:   (5735000 KHz - 5835000 KHz @ 80000 KHz), (N/A, 3000 mBm)
[ 1402.443810] cfg80211:   (57240000 KHz - 63720000 KHz @ 2160000 KHz), (N/A, 4000 mBm)
[ 1404.194907] wlp2s0: authenticate with 40:16:7e:59:a7:44
[ 1404.200138] wlp2s0: send auth to 40:16:7e:59:a7:44 (try 1/3)
[ 1405.204415] wlp2s0: authenticated
[ 1405.219087] wlp2s0: associate with 40:16:7e:59:a7:44 (try 1/3)
[ 1406.214360] wlp2s0: associate with 40:16:7e:59:a7:44 (try 2/3)
[ 1407.219249] wlp2s0: associate with 40:16:7e:59:a7:44 (try 3/3)
[ 1407.222431] wlp2s0: RX AssocResp from 40:16:7e:59:a7:44 (capab=0x11 status=0 aid=2)
[ 1407.222487] wlp2s0: associated
[ 1415.409345] wlp2s0: deauthenticated from 40:16:7e:59:a7:44 (Reason: 15)


And it loops like that, over and over. Always "Reason: 15"

Attached is a log file of everything "journalctl -b" gave. 


Thoughts? "Use Ethernet" isn't a valid solution. 

For the record, same issues happened on 3.11 (default kernel for installation media) so this is not a recent regression unless it got fixed and then re-broken sometime inbetween.

All currently available updates are installed.

Comment 2 Eric Griffith 2014-05-28 20:36:17 UTC

Created attachment 900132 [details]
First journal log file.

Comment 3 John Greene 2014-05-29 13:02:28 UTC

Reason 15 is 4 way handshake timeout.  THere are several places and reasons why this could happen.  Are you willing to generate a debug log?  If so, here's how:


/etc/sysconfig/wpa_supplicant:OTHER_ARGS="-u -f /var/log/wpa_supplicant.log -P /var/run/wpa_supplicant.pid"
Can you could add -dd to the wpa_supplicant command line during the issue:
it's usually in:
/etc/sysconfig/wpa_supplicant
add -dd to the OTHER_ARGS list then either reboot, or disable wifi and restart.

and then send me a copy of /var/log/messages/wpa_supplicant.log while the connection is made and then drops off.

Be sure to go back and take the -dd out after as the log will get huge. We can at least see more where the issue is.

Comment 4 Eric Griffith 2014-05-30 17:18:21 UTC

Sorry, been busy the last couple days. I've got no problem debugging this as much as needed, will post the requested logs late tonight (EST). 

I will say: Whatever is going on is a driver / firmware issue of some kind that is restricted to Linux. Under Windows 7 this adapter is working perfectly fine and I couldn't be happier. So its not buggy hardware, buggy router, or interference.

Comment 5 Eric Griffith 2014-05-31 05:56:40 UTC

Created attachment 901030 [details]
"-dd" output of wpa_supplicant

Requested Info.

Comment 6 John Greene 2014-06-02 15:20:54 UTC

Thanks for the log.  And the info that hardware seems fine under windows.  
Digesting..
So far it seems to be connecting then just redoing the group key periodically which may be normal if you have a key rotation enabled for the AP.  Without timestamps, it's a bit hard to tell.

And an error on my part: you have good logs here with -dd.  We *may* need to redo this with  -dd -t to get timestamps in the logs to help decipher them with time.  Unless you have nothing better to do... :)

Sorry to miss that.

Comment 7 Eric Griffith 2014-06-03 02:34:27 UTC

Created attachment 901574 [details]
"-dd -t" output of wpa_supplicant

Requested (redone) log file is below, I let it run for about 20 seconds I think. If thats not enough time just give a shout and I can let it run longer again. 

I care much more about getting this resolved than I do having to redo log files, so its not a big deal, John :)

Comment 8 John Greene 2014-06-03 14:29:29 UTC

Looks like the system gets all the way through the EAP 4 way handshake succesfully. 4-WAY-HANDSHAKE -> GROUP-HANDSHAKE -> Completed state. Link is up.

Then
1401761119.836062: wlp2s0: State: GROUP_HANDSHAKE -> COMPLETED
1401761119.836069: wlp2s0: CTRL-EVENT-CONNECTED - Connection to 40:16:7e:59:a7:40 completed [id=0 id_str=]
1401761119.836075: wpa_driver_nl80211_set_operstate: operstate 0->1 (UP)
1401761119.836082: netlink: Operstate: linkmode=-1, operstate=6
1401761119.836136: EAPOL: External notification - portValid=1
1401761119.836144: EAPOL: External notification - EAP success=1
1401761119.836150: EAPOL: SUPP_PAE entering state AUTHENTICATING
1401761119.836157: EAPOL: SUPP_BE entering state SUCCESS
1401761119.836163: EAP: EAP entering state DISABLED
1401761119.836168: EAPOL: SUPP_PAE entering state AUTHENTICATED
1401761119.836174: EAPOL: Supplicant port status: Authorized
1401761119.836202: EAPOL: SUPP_BE entering state IDLE
1401761119.836208: EAPOL authentication completed successfully
1401761119.836234: RTM_NEWLINK: operstate=1 ifi_flags=0x11043 ([UP][RUNNING][LOWER_UP])
1401761119.836241: RTM_NEWLINK, IFLA_IFNAME: Interface 'wlp2s0' added
1401761119.836257: nl80211: if_removed already cleared - ignore event

/// LOOKS HAPPY Till here then...
1401761119.840911: dbus: flush_object_timeout_handler: Timeout - sending changed properties of object /fi/w1/wpa_supplicant1/Interfaces/1
// Unless this is a normal key rotation interval, not sure why we'd go back to
// RX EAPOL frames this quickly: just got done and this is less than a second // later.
1401761120.837078: wlp2s0: RX EAPOL from 40:16:7e:59:a7:40

The system just keeps redoing the key phase and as a consequence, timeout is likely the result.

This cycle repeats over and over.

dbus: flush_object_timeout_handler: Timeout ...
not sure what this is doing, but some sort of timeout is engaging that appears out of sorts.
Some sort of network mananger -> supplicant issue? Not my best space here..so I'll reassign to netmanager/wpa_supplicant folks to assess. If I can help further, please LMK.

You can try a WEP AP setup to validate the hardware/driver isn't the problem (WEP doesn't use the supplicant). But at this point I think the issue is above driver level.

Comment 9 Eric Griffith 2014-06-04 20:46:03 UTC

I'm testing out various network settings to see what works and what doesn't.

WPA2-Personal. Key lasts 3600secs: Current situation. Key exchange is messed up.

WPA2-Personal. Key lasts 0secs: About to test. 

Open system. ath9k hwcrypt=1: This is interesting. Open system I stay connected but something is throttling my network connection. My card is getting 6.5 Mb/s. My laptop next to it, on the same network, is getting 144.5 Mb/s. (Speeds come via iwconfig, also tested by trying to load some websites... im gonna say 6.5 Mb/s is accurate.)

Open system. ath9k hwcrypt=0: Constantly disconnects, LIKE the current situation.

WEP: Cannot test. Router depreciated WEP. 



If my problem with the key exchange is higher in the stack, what are the chances that the throttling is also higher in the stack vs in-driver?

Comment 10 John Greene 2014-06-05 13:24:49 UTC

Open systems does no keys at all. wpa_supplicant not involved nor is hw encyption.  This data point helps though.

On the AP, try turning of key rotation on WPA/WPA2 if its an option.  Then what do you get?

As to speed issue, nothing should be throttling it at this point.  Let's defer that for now, focus on the connection stability first.

Comment 11 Eric Griffith 2014-06-09 07:05:51 UTC

A thousand apologies John, I haven't been home much lately. I will post the necessary info in the morning. Hopefully that will bring us one step closer to resolving this. Once more, very sorry

Comment 12 Dan Williams 2014-06-09 19:40:50 UTC

(In reply to John Greene from comment #8)
> dbus: flush_object_timeout_handler: Timeout ... 
> not sure what this is doing, but some sort of timeout is engaging that
> appears out of sorts.
> Some sort of network mananger -> supplicant issue?   Not my best space
> here..so I'll reassign to netmanager/wpa_supplicant folks to assess. If I
> can help further, please LMK.

This is just an internal timeout for coalescing property change notifications and doesn't affect actual operation.  It's only a debug message.

The issue would most likely be in the kernel or in the supplicant.

Comment 13 Dan Williams 2014-06-09 19:49:31 UTC

Some stuff that looks odd to me:

1401761118.757169: wlp2s0: RX EAPOL from 40:16:7e:59:a7:40
1401761118.757211: RX EAPOL - hexdump(len=121): <stuff>
1401761118.757256: wlp2s0: Not associated - Delay processing of received EAPOL frame (state=ASSOCIATING bssid=00:00:00:00:00:00)
<short time passes>
1401761118.760649: nl80211: MLME event 38
1401761118.760668: nl80211: MLME event frame - hexdump(len=145): <stuff>
1401761118.760722: nl80211: Associate event
1401761118.760741: wlp2s0: Event ASSOC (0) received

The supplicant is missing the first few RX EAPOL frames because mac80211 or the driver isn't passing up the ASSOC event before it allows RX through? Curious, but it ultimately doesn't matter because another RX EAPOL comes through after the ASSOC event at 1401761118.761591.

But it looks like the real problem is that that AP's WPA rekey time is like 2 seconds???? Every two seconds another RX EAPOL frame comes in and the supplicant has to process a rekey. The AP is clearly processing the rekeys correctly, so it clearly can hear the STA, but it still requests a continuous rekey.

This continues until 1401761125.859718 when the supplicant sends a rekey response, but something doesn't get through and the AP drops the client because it couldn't complete rekey within the timeout.

So I think that's the basic problem: something in the supplicant or driver stack isn't fast enough.

But the real question is, why is the AP requesting rekeys every 2 seconds? That's the actual problem.

Comment 14 Eric Griffith 2014-06-09 19:59:50 UTC

Created attachment 906124 [details]
"-dd -t" Output of wpa_supplicant with a Network Key Rotation Time of 0

Time on this one was a bit longer, near the end you might see some ethernet / tethering events because I plugged my phone in before I killed wifi.

Comment 15 Dan Williams 2014-06-10 14:26:28 UTC

Does the network intentionally use a short key rotation time?

Comment 16 Eric Griffith 2014-06-10 17:30:48 UTC

My router lists the adjustable Network Key Rotation Time as 3600seconds-- one hour-- by default, and it has only had 2 settings applied: 3600seconds, and 0 seconds (for testing purposes). Both settings result in the above situation.  

No other clients are having problems with connecting, whether they be iOS, Android, Windows 7, or Fedora (Intel wireless on that machine). It is only that specific machine under Linux-- under Windows it's fine.

Comment 17 Dan Williams 2014-06-13 19:54:22 UTC

When setting it to 3600 seconds, do you reboot the router to make sure it starts using the new interval?

Comment 18 Eric Griffith 2014-06-13 21:07:16 UTC

2 points on your question...

1) The answer: As soon as I hit apply the router pops up a page saying it is detecting a reboot-requiring change and is automatically rebooting to make it take effect. So -personally-, no I did not reboot the router myself, I've assumed that it actually IS rebooting like its saying it is. Though, given that all wireless clients disconnect during the time it says its rebooting, I'm tempted to believe that it did infact reboot.

2) Its been set to 3600 seconds from the beginning and I have only, momentarily and by request set it to 0 seconds. Whether its set to 3600 seconds or 0 seconds doesn't seem to change anything.

The only way I've been able to stay connected (excluding tethering) is by setting my wireless network to open, but then something in the stack is throttling my connection down to 6.5Mb/s. If the card has to process even a SINGLE key request.. everythings crap.

To clarify: The router is supposed to be sending out rekey requests every hour. For whatever reason the router seems to be sending out multiple rekeys initially, OR the networking stack is getting one rekey request and processing it multiple times. (Those are strictly my thoughts at this point).

The problem is.. if the router is sending out multiple rekeys, why isn't anyone else on the network affected? Is there a part of the spec that says "If the client receives multiple rekeys within xyz seconds, they should be disregarded?" that isn't implemented on the Atheros driver?

It seems odd that an equally updated Fedora 20 x64 machine on the same network, but with an Intel wireless card, isn't having problems if this is something router-side.

Comment 19 Eric Griffith 2014-06-18 05:42:58 UTC

Due to me having to RMA this particular computer's RAM, I will not be able to provide any further logs or updates until I receive the new modules. 

This bug report's been quiet for a few days, if you guys have just been busy thats totally fine, just don't want to see this forgotten about. Even if the answer is "We don't have a damn clue why this is happening", it is at least AN answer.

Comment 20 John Greene 2014-06-18 13:36:04 UTC

Eric,

You make so good points in comment 18.
When you get your system restored, can you upload timestamped supplicant log with the Intel card that is working ok?  It would interesting to see the rekey interval being so short and compare that with the ath one.  I'm wondering a couple things:

>2) Its been set to 3600 seconds from the beginning and I have only, >momentarily >and by request set it to 0 seconds. Whether its set to 3600 >seconds or 0 >seconds doesn't seem to change anything. 

Is AP firmware up to date?  If setting rekey interval to 3600 generated rekeys at same 1 or 2 second rate (please confirm my understand that right).  If the setting doesn't change the observed rate in the log, that's on  the AP. Might be an easy fix for you. 

>The problem is.. if the router is sending out multiple rekeys, why isn't anyone >else on the network affected? Is there a part of the spec that says "If the >client receives multiple rekeys within xyz seconds, they should be >disregarded?" that isn't implemented on the Atheros driver? 

A guess: give the delayed processing of the EAPOL frame noted in comment 13 it kinda begs the question if there is other delay in the process.  It's not likely as the rekeys come after associated and port is open.  But if Intel works.. something is different with the driver, not sure what.

Comment 21 John Greene 2014-06-18 14:18:59 UTC

One further question if Intel works and atheros doesn't..and throughput is slow.  Have / Can you tried a different card?  Might be bad card.

Comment 22 Eric Griffith 2014-06-18 16:48:24 UTC

Just so we're all on the same page to avoid any confusion. 

Desktop system: Atheros PCI-E card (http://www.amazon.com/dp/B007GMPZ0A/ref=wl_it_dp_o_pC_nS_ttl?_encoding=UTF8&colid=2TNVXPD6PTYFU&coliid=IAPP9VDETHW07&psc=1)

Laptop system: Network controller: Intel Corporation Centrino Advanced-N 6230 [Rainbow Peak] (rev 34)

Both are / are capable of Fedora 20 x64. The laptop is the one thats working totally fine, so I can throw the timestamped wpa-supplicant log from that system in a little bit. 

The desktop system is dualbooted between Windows 7 x64 and Fedora 20 x64. Under Windows 7 the card is totally fine with perfect speeds and the best connectivity in the entire house. The card is ONLY buggy as far as connection speed & connectivity when I am booted into Fedora. 

So the odds of it being buggy hardware.... eeeeeeh. Possible? Sure, if the driver is hitting different codepaths, but it seems unlikely, no?

Will post back in a bit with the answer to the other questions / logs.

Comment 23 Eric Griffith 2014-06-18 17:22:20 UTC

Created attachment 910108 [details]
INTEL: "-dd -t" output of wpa_supplicant, with a Network Key Rotation Time of 3600

The attachment above is from the Intel based laptop. Its wifi is working perfectly.

Comment 24 Eric Griffith 2014-06-18 17:24:58 UTC

Router is an ASUS-RTN66U with firmware version: 3.0.0.4.374_5517-g302e4dc. According to the router, that firmware is the latesta available version.

Comment 25 Eric Griffith 2014-06-18 17:43:32 UTC

As far as trying a different card... I have a small USB wireless-N adapter that i can plug into the desktop-- once I get the new RAM. Last time I checked the adapter was working "fine." The chipset in the mini USB adapter is a Ralink chipset.


Just to recap... 


Desktop system: 

--->Ethernet: Perfectly working

--->Tethering over USB: Pefectly working

--->Mini-USB Receiver (This is IIRC, I can retest once I get the new RAM)

------> Windows: Doesn't always connect on boot, but ONCE connected: works fine.

------> Fedora: Doesn't always connect on boot, but ONCE connected: works fine.

--->Atheros PCIE Card: 

------> Windows 7: Works beyond expectations. 

------> Fedora Can't get/stay connected, and when it can... throttled.


Laptop System: 

---> Ethernet: N/A

---> Tethered over USB: Perfectly working

---> Mini USB Receiver:

------> Windows: Doesn't always connect on boot, but ONCE connected: works fine.

------> Fedora: Doesn't always connect on boot, but ONCE connected: works fine.

---> Internal Wireless (Intel) Card: 

------> Windows: Works perfectly

------> Fedora: Works perfectly. 

Systems are on the same networks, in the same rooms.

Comment 26 Eric Griffith 2014-07-09 20:18:53 UTC

I'm back in town from vacation, new RAM is installed on the desktop so I'm back in business to debug this further. 

Given the comments I posted above and the info I provided in the Intel log file-- any thoughts?

Comment 27 John Greene 2014-07-10 13:06:20 UTC

Eric, 
Hope vacation was fun.  I'm envious.

The intel log is perfect, as it should be.

In review, it appears that the failing device gets associated, and then has either 4 way handshake failure: it occurs to me it isn't even starting the 4 way handshake: it starts it but shuts it down with error code before anything in the handshake even occurs.


May 28 16:31:01 erics-desktop NetworkManager[690]: <info> (wlp2s0): supplicant interface state: authenticating -> associating
May 28 16:31:01 erics-desktop kernel: wlp2s0: associate with 40:16:7e:59:a7:44 (try 1/3)
May 28 16:31:02 erics-desktop kernel: wlp2s0: associate with 40:16:7e:59:a7:44 (try 2/3)
May 28 16:31:03 erics-desktop kernel: wlp2s0: associate with 40:16:7e:59:a7:44 (try 3/3)
May 28 16:31:03 erics-desktop kernel: wlp2s0: RX AssocResp from 40:16:7e:59:a7:44 (capab=0x11 status=0 aid=2)
May 28 16:31:03 erics-desktop kernel: wlp2s0: associated

>>>ASSOCIATION to start 4 way.
May 28 16:31:03 erics-desktop NetworkManager[690]: <info> (wlp2s0): supplicant interface state: associating -> 4-way handshake
May 28 16:31:04 erics-desktop NetworkManager[690]: <info> (wlp2s0): supplicant interface state: 4-way handshake -> completed
<<<4 way done, should have 4 parts, but we don't even do that: WHY?

May 28 16:31:04 erics-desktop NetworkManager[690]: <info> (wlp2s0): roamed from BSSID (none) ((none)) to 40:16:7E:59:A7:44 (WhoKnows-5Ghz)
May 28 16:31:11 erics-desktop kernel: wlp2s0: deauthenticated from 40:16:7e:59:a7:44 (Reason: 15)

May 28 16:31:11 erics-desktop NetworkManager[690]: <warn> Connection disconnected (reason 15)

Not sure what occurs with that. Smacks of a either a frame delayed as Dan noted in Comment 13.  I'll look upstream.

Have you tried different kernels?  Is this a regression? Did it work at one point and an upgrade killed something?

Comment 28 Eric Griffith 2014-07-16 04:57:18 UTC

I love how prompt Comcast is to fix an area-wide service outage... NOT. -.- (On again / off again over the last week...)


As far as I know, this has "always been." I'd be happy to test a few kernels now that I have internet access restored but I can't go too far back as my APU is an AMD Kaveri, so too far back and I won't even have a working desktop to test -with.- 

How far back does Fedora keep kernels in-repo? I can always just snag the earliest available kernel and do a 3.x -> 3.x+1 over and over and see if / where it starts working.

Other than kernels, is there anything else I can help test? Can't use the Intel system as its currently of of commission, so don't ask for any logs off that.

Comment 29 Eric Griffith 2014-07-29 00:59:09 UTC

Alright so I finally had a day to sit down and test various kernels for a couple hours. I grabbed a few kernels from every major kernel series (3.12.x, 3.13.x, 3.14.x, 3.15.x, 3.16.x)

And this is what I got so far... 

3.11: Kaveri support shot. Can't hit desktop. Calling it an N/A.

3.12.x: No change, still broken wireless

3.13.x: No change, still broken wireless.

3.14.x: No change, still broken wireless.

3.15.x: No change, still broken wireless.


None of them are particularly better or worse than any others, they all still fail in seemingly the same ways. I can provide logs if you really need them, 

Any luck with looking upstream, John?

Comment 30 Sujith 2014-10-28 03:54:04 UTC

(In reply to Eric Griffith from comment #29)
> Alright so I finally had a day to sit down and test various kernels for a
> couple hours. I grabbed a few kernels from every major kernel series
> (3.12.x, 3.13.x, 3.14.x, 3.15.x, 3.16.x)
> 
> None of them are particularly better or worse than any others, they all
> still fail in seemingly the same ways. I can provide logs if you really need
> them. 

Can you try with the latest kernel ?

These options need to be enabled:

CONFIG_ATH_DEBUG=y
CONFIG_ATH9K_DEBUGFS=y

Please load the ath9k driver with the module param "debug=0xf01" and post the kernel log.

Also, the output of these files would be good to have:

/sys/kernel/debug/ieee80211/phy*/ath9k/base_eeprom
/sys/kernel/debug/ieee80211/phy*/ath9k/modal_eeprom
/sys/kernel/debug/ieee80211/phy*/ath9k/interrupt
/sys/kernel/debug/ieee80211/phy*/ath9k/recv
/sys/kernel/debug/ieee80211/phy*/ath9k/phy_err
/sys/kernel/debug/ieee80211/phy*/ath9k/xmit
/sys/kernel/debug/ieee80211/phy*/ath9k/reset
/sys/kernel/debug/ieee80211/phy*/ath9k/misc

Comment 31 Eric Griffith 2014-12-13 03:47:22 UTC

Apologies for being quiet on this bug. Midterms, Thanksgiving and Finals all hit pretty much back to back for me so it was kind of a mess and I didn't have time to be able to mess with my desktop for free of breaking it. Now that I'm on break I can give it a go and we'll see what happens, im gonna be switching over to F21 probably this weekend anyway so that'll get me latest.

Comment 32 Eric Griffith 2015-02-14 16:41:58 UTC

So after quite a bit of fighting with this over the last month and a half I think I figured out whats going on... Faulty antennae. In moving into my apartment I had to disassemble my desktop, in the process of doing so I noticed one of the antennae's didn't and would not screw down all the way. Swapped antennae's around and no matter which of the three I used it would always fail on that slot. 

I left that antennae unscrewed, rebooted into Fedora... lo and behold it worked perfectly fine. Now, the question is, why does Windows work with the broken antennae in but Linux does not? Does the Linux stack not have fault tolerance? Can it not test & mark antennae's before using them? Why does Windows just work around the problem mean while Linux just keeps trying something that doesn't work?

Comment 33 John Greene 2015-02-16 14:25:38 UTC

> I left that antennae unscrewed, rebooted into Fedora... lo and behold it
> worked perfectly fine. Now, the question is, why does Windows work with the
> broken antennae in but Linux does not? Does the Linux stack not have fault
> tolerance? Can it not test & mark antennae's before using them? Why does
> Windows just work around the problem mean while Linux just keeps trying
> something that doesn't work?

Nice bit of detective work.  I don't know the exact details of this implementation but the one's I've dealt with often support a number of ways to do antenna management, but it's often left up to the firmware to control with no exposed API to driver to make changes to how that's done.  A quick look at your driver cmd parameters with modinfo ath9k show these parameters for you:

parm:           debug:Debugging mask (uint)
parm:           nohwcrypt:Disable hardware encryption (int)
parm:           blink:Enable LED blink on activity (int)
parm:           btcoex_enable:Enable wifi-BT coexistence (int)
parm:           bt_ant_diversity:Enable WLAN/BT RX antenna diversity (int)
parm:           ps_enable:Enable WLAN PowerSave (int)
parm:           use_chanctx:Enable channel context for concurrency (int)

*If* it's just an antenna diversity problem (it might be something else) changing the bt_ant_diversity cmd  line variable may help.  Bluetooth, if supported and enabled, on your machine, will time share the antenna with wifi.
If you need/use bt this doesn't appear to be something doable (see below). 
You'd have to turn off blue tooth coexistence with  btcoex=0.  Then add bt_ant_diversity=1 to enable the driver or firmware to -hopefully- work around a bad antenna, without making your range more limited.  


  I don't think you'll damage anything, but you might also get things in a worse state than now too.

Here's a quick cod note I saw in regards to this: there are a number of gotcha's here: does your machine support this in it's antenna design?  I dunno.

        /*
         * Enable WLAN/BT RX Antenna diversity only when:
         *
         * - BTCOEX is disabled.
         * - the user manually requests the feature.
         * - the HW cap is set using the platform data.
         */
But you seem to want to play with this stuff. It appears to be a bit of a swamp, but might give your a workaround also.  Good luck.

Comment 34 Fedora End Of Life 2015-05-29 11:58:18 UTC

This message is a reminder that Fedora 20 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 20. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora  'version'
of '20'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 20 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged  change the 'version' to a later Fedora 
version prior this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 35 Fedora End Of Life 2015-06-29 20:51:14 UTC

Fedora 20 changed to end-of-life (EOL) status on 2015-06-23. Fedora 20 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.