Bug 519158

Summary: ath5k: wireless scanning after suspend/resume works fine going between some networks but not others
Product: [Fedora] Fedora Reporter: Kieran Clancy <clancy.kieran+redhat>
Component: kernel Assignee: John W. Linville <linville>
Status: CLOSED UPSTREAM QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: low    
Version: 11 CC: itamar, kernel-maint, linville, mcgrof, me
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Last Closed: 2010-02-22 19:10:53 UTC Type: ---
Attachments:
Description                 Flags
dmesg boot log from work    none
dmesg boot log from home    none

Description Kieran Clancy 2009-08-25 13:14:21 UTC
Description of problem:
I have two wireless networks; one at home (WPA2) and one at work (web login based, no encryption). Wireless works well at either network. If I suspend/resume at home everything is fine. If I suspend/resume at work it's also fine. If I suspend at home and resume at work it will find the work wireless network and that will work too. The problem is after I have associated with my work's wireless network, wireless doesn't turn back on after coming home.

Version-Release number of selected component (if applicable):
kernel-2.6.29.6-217.2.8.fc11.x86_64

How reproducible:
Always. It has never worked when going from work to home, despite several weeks of trying.

Steps to Reproduce:
1. Associate with work wireless
2. Suspend
3. Resume at home
4. Wait for wireless connection
  
Actual results:
No wireless connection. dmesg reports only:
ADDRCONF(NETDEV_UP): wlan0: link is not ready

Expected results:
Wireless connection resumes at home.

Please let me know what I can test to sort this problem out. It's really frustrating because every day I come home and have to close all my documents and restart the laptop.

Additional info:
02:00.0 Ethernet controller: Atheros Communications Inc. AR242x 802.11abg Wireless PCI Express Adapter (rev 01)
        Subsystem: Hewlett-Packard Company Device 137b
        Flags: bus master, fast devsel, latency 0, IRQ 16
        Memory at d3500000 (64-bit, non-prefetchable) [size=64K]
        Capabilities: [40] Power Management version 2
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit-
        Capabilities: [60] Express Legacy Endpoint, MSI 00
        Capabilities: [90] MSI-X: Enable- Count=1 Masked-
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Virtual Channel <?>
        Kernel driver in use: ath5k
        Kernel modules: ath5k

Comment 1 John W. Linville 2009-08-25 15:05:58 UTC
Wow...weird...

Can you replicate with a 2.6.30-based kernel?

   yum --enablerepo=updates-testing update kernel

Comment 2 Bob Copeland 2009-08-25 15:31:54 UTC
When it fails, do you see NetworkManager trying to reconnect and not succeeding, or does the hardware not show up at all?  If the hardware completely fails to resume, I would expect something more in dmesg.

Is your home network using CCMP/AES, or TKIP?  Can you post the line from dmesg containing "Atheros ARXXXX chip found (MAC:...".  If you have a /var/log/wpa_supplicant.log from when it isn't working, that would be useful too.

Other things to try in addition to John's request:
 - try ifconfig wlan0 down; ifconfig wlan0 up
 - try a second suspend/resume cycle to see if that corrects the issue
 - start from home, suspend at home for 30 minutes or so, then resume to see
   if it breaks

Comment 3 Kieran Clancy 2009-08-25 23:53:06 UTC
Hi guys, thanks for the suggestions.

@John: Yes, I have the same problem with 2.6.30.5-28

@Bob:
I think the hardware is still working, because I can go back to work and it will find the network there again.

NetworkManager doesn't try and fail to connect to the network because it never finds the AP by scanning. That seems to be the problem; if I run `iwlist wlan0 scanning` at home it doesn't find anything.

I'm not sure what my home network is using regarding TKIP/CCMP - how would I find out? Later today I will do some experiments to see if I can find my home network after configuring the access point to not use encryption etc.

ath5k phy0: Atheros AR2425 chip found (MAC: 0xe2, PHY: 0x70)

Here are some lines from my wpa_supplicant.log:
Trying to associate with 00:0f:b5:39:f3:5b (SSID='home' freq=2472 MHz)
Associated with 00:0f:b5:39:f3:5b
WPA: Key negotiation completed with 00:0f:b5:39:f3:5b [PTK=CCMP GTK=TKIP]
CTRL-EVENT-CONNECTED - Connection to 00:0f:b5:39:f3:5b completed (auth) [id=0 id_str=]
CTRL-EVENT-DISCONNECTED - Disconnect event - remove keys
Failed to initiate AP scan.
Failed to initiate AP scan.
Trying to associate with 00:14:1c:00:6f:91 (SSID='work' freq=2412 MHz)
Associated with 00:14:1c:00:6f:91
CTRL-EVENT-CONNECTED - Connection to 00:14:1c:00:6f:91 completed (auth) [id=0 id_str=]
CTRL-EVENT-DISCONNECTED - Disconnect event - remove keys
... (suspend/resume several times at work)
Failed to initiate AP scan.
Failed to initiate AP scan.
Failed to initiate AP scan.
Failed to initiate AP scan.
Failed to initiate AP scan.
(this is all that is in the log after this).

If I do ifconfig wlan0 down and up again, I just get the following in dmesg:
ADDRCONF(NETDEV_UP): wlan0: link is not ready

Subsequent suspend/resume cycles do not seem to help.

Suspend/resume works fine for long periods at home provided that I don't visit work.

-------------

Final note: I have a suspicion that this is due to my home network being on a higher frequency than the card thinks it can do:
# iwlist wlan0 frequency
wlan0     12 channels in total; available frequencies :
          Channel 01 : 2.412 GHz
          Channel 02 : 2.417 GHz
          Channel 03 : 2.422 GHz
          Channel 04 : 2.427 GHz
          Channel 05 : 2.432 GHz
          Channel 06 : 2.437 GHz
          Channel 07 : 2.442 GHz
          Channel 08 : 2.447 GHz
          Channel 09 : 2.452 GHz
          Channel 10 : 2.457 GHz
          Channel 11 : 2.462 GHz
          Channel 12 : 2.467 GHz
          Current Frequency:2.437 GHz (Channel 6)

My work network is channel 1, but my home network is channel 13 (not listed here). I'll check later to see if it lists channel 13 at home when I haven't been at work.
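Channel 13 is centered on 2472 MHz, one 5 MHz step past the last entry in the list above. A minimal sketch of the usual 2.4 GHz mapping (channel n at 2407 + 5n MHz for channels 1 to 13, with channel 14 special-cased at 2484 MHz) makes the gap explicit; channel_to_freq_mhz() is a hypothetical helper, not a kernel or iw function.

```python
def channel_to_freq_mhz(channel: int) -> int:
    """Center frequency in MHz of a 2.4 GHz 802.11 channel."""
    if channel == 14:
        return 2484                    # channel 14 sits off the 5 MHz grid
    if 1 <= channel <= 13:
        return 2407 + 5 * channel      # channel 1 = 2412 MHz, then 5 MHz apart
    raise ValueError(f"not a 2.4 GHz channel: {channel}")

offered = {channel_to_freq_mhz(ch) for ch in range(1, 13)}  # channels 1-12, as iwlist listed
home = channel_to_freq_mhz(13)
print(home, home in offered)  # 2472 False
```

The home AP's frequency is simply not among the ones the card currently offers, so a scan never visits it.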

Comment 4 Bob Copeland 2009-08-26 11:51:39 UTC
That's a very good guess -- chances are the regulatory system has decided from beacons received in your work environment that you are in a regulatory domain where channel 13 is disallowed (e.g. US).  Look in your dmesg for any messages pertaining to CRDA.

I would normally suggest to check the output of "iw phy phy0 info" to confirm, but I believe in 2.6.30 and below, ath5k still exports too many channels for that command to work properly.

Comment 5 Bob Copeland 2009-08-26 12:11:36 UTC
By the way:

$ sudo modprobe -r ath5k && sudo modprobe ath5k

Should be enough to get it working again, I think.

Comment 6 John W. Linville 2009-08-26 14:24:18 UTC
OK, at least that makes some sense!

I wonder if we need some sort of "regulatory reset" API functionality to drop any regulatory hints from IEs...

Comment 7 John W. Linville 2009-08-26 14:40:38 UTC
BTW, where are you located?  Are you crossing international borders between home and work? :-)  If not, you probably want to get the configuration changed on one of those APs...

Comment 8 John W. Linville 2009-08-26 14:53:36 UTC
Looking at the code in reg.c, it seems this situation has been contemplated but not yet resolved -- adding Luis to Cc: list...

Comment 9 Kieran Clancy 2009-08-26 16:17:54 UTC
Okay, so today I restarted my laptop at home (so no work connections yet):
$ iwlist wlan0 frequency
wlan0     13 channels in total; available frequencies :
          Channel 01 : 2.412 GHz
          Channel 02 : 2.417 GHz
          Channel 03 : 2.422 GHz
          Channel 04 : 2.427 GHz
          Channel 05 : 2.432 GHz
          Channel 06 : 2.437 GHz
          Channel 07 : 2.442 GHz
          Channel 08 : 2.447 GHz
          Channel 09 : 2.452 GHz
          Channel 10 : 2.457 GHz
          Channel 11 : 2.462 GHz
          Channel 12 : 2.467 GHz
          Channel 13 : 2.472 GHz
          Current Frequency:2.472 GHz (Channel 13)

$ iw reg get
country AU:
	(2402 - 2482 @ 40), (N/A, 20)
	(5170 - 5250 @ 40), (3, 23)
	(5250 - 5330 @ 40), (3, 23), DFS
	(5735 - 5835 @ 40), (3, 30)

dmesg shows:
cfg80211: Calling CRDA to update world regulatory domain
cfg80211: Calling CRDA for country: AU
cfg80211: Regulatory domain changed to country: AU
        (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
        (2402000 KHz - 2482000 KHz @ 40000 KHz), (N/A, 2000 mBm)
        (5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 2300 mBm)
        (5250000 KHz - 5330000 KHz @ 40000 KHz), (300 mBi, 2300 mBm)
        (5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 3000 mBm)
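The dmesg rules above and the `iw reg get` output describe the same AU domain in different units: dmesg uses kHz plus mBi/mBm (hundredths of dBi/dBm), while iw prints MHz plus dBi/dBm. A small converter makes the correspondence explicit; to_iw_units() and its regular expression are hypothetical and only cover the rule format shown here.

```python
import re

# One rule as printed by cfg80211 in dmesg; the antenna gain may be "N/A".
RULE_RE = re.compile(
    r"\((\d+) KHz - (\d+) KHz @ (\d+) KHz\), \((?:N/A|(\d+) mBi), (\d+) mBm\)"
)

def to_iw_units(line: str) -> str:
    """Convert a dmesg rule (kHz, mBi, mBm) to the (MHz, dBi, dBm) form iw prints."""
    m = RULE_RE.search(line)
    if m is None:
        raise ValueError(f"unrecognised rule: {line!r}")
    start, end, bw, gain, eirp = m.groups()
    # 1 MHz = 1000 kHz; mBi and mBm are hundredths of dBi and dBm.
    # Whole-number division is enough for the rules in this report.
    gain_str = "N/A" if gain is None else str(int(gain) // 100)
    return (f"({int(start) // 1000} - {int(end) // 1000} @ {int(bw) // 1000}), "
            f"({gain_str}, {int(eirp) // 100})")

print(to_iw_units("(2402000 KHz - 2482000 KHz @ 40000 KHz), (N/A, 2000 mBm)"))
print(to_iw_units("(5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 2300 mBm)"))
```

The two printed lines are `(2402 - 2482 @ 40), (N/A, 20)` and `(5170 - 5250 @ 40), (3, 23)`, i.e. the first two rules of the `iw reg get` output.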

I'll need to do some more testing once I can get back to work, but these are the important bits from my messages file when I have connected at work:
cfg80211: Current regulatory domain updated by AP to: AU
   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
   (2402000 KHz - 2477000 KHz @ 40000 KHz), (N/A, 2000 mBm)

Does the AP at work just tell my hardware that it's in AU, or does it send those frequencies too? I ask because they differ from the ones in effect after the card was initialised to AU at startup.

If my work AP is saying a maximum of channel 14 (2.477 GHz), why can't I connect to a network on channel 13?

Will test more tomorrow. Thanks again for looking at this.

Comment 10 Luis R. Rodriguez 2009-08-26 21:37:51 UTC
Please install iw:

http://wireless.kernel.org/en/users/Documentation/iw

And then provide the output of:

iw list

From a fresh boot (no suspend) at home, and then at work.

Also attach the full dmesg from both boots.

Comment 11 John W. Linville 2009-08-27 12:55:23 UTC
FWIW, iw is already packaged in Fedora:

   yum install iw

Comment 12 Kieran Clancy 2009-08-27 13:43:54 UTC
Created attachment 358871 [details]
dmesg boot log from work

# iw list
Wiphy phy0
	Band 1:
		Frequencies:
			* 2412 MHz [1] (20.0 dBm)
			* 2417 MHz [2] (20.0 dBm)
			* 2422 MHz [3] (20.0 dBm)
			* 2427 MHz [4] (20.0 dBm)
			* 2432 MHz [5] (20.0 dBm)
			* 2437 MHz [6] (20.0 dBm)
			* 2442 MHz [7] (20.0 dBm)
			* 2447 MHz [8] (20.0 dBm)
			* 2452 MHz [9] (20.0 dBm)
			* 2457 MHz [10] (20.0 dBm)
			* 2462 MHz [11] (20.0 dBm)
			* 2467 MHz [12] (20.0 dBm)
			* 2472 MHz [13] (disabled)
			* 2484 MHz [14] (disabled)
			* 2512 MHz [-498] (disabled)
			* 2532 MHz [-494] (disabled)
			* 2552 MHz [-490] (disabled)
			* 2572 MHz [-486] (disabled)
			* 2592 MHz [-482] (disabled)
			* 2612 MHz [-478] (disabled)
			* 2632 MHz [-474] (disabled)
			* 2652 MHz [-470] (disabled)
			* 2672 MHz [-466] (disabled)
			* 2692 MHz [-462] (disabled)
			* 2712 MHz [-458] (disabled)
			* 2732 MHz [-454] (disabled)
		Bitrates (non-HT):
			* 1.0 Mbps
			* 2.0 Mbps (short preamble supported)
			* 5.5 Mbps (short preamble supported)
			* 11.0 Mbps (short preamble supported)
			* 6.0 Mbps
			* 9.0 Mbps
			* 12.0 Mbps
			* 18.0 Mbps
			* 24.0 Mbps
			* 36.0 Mbps
			* 48.0 Mbps
			* 54.0 Mbps
	max # scan SSIDs: 4
	Supported interface modes:
		 * IBSS
		 * managed
		 * monitor
		 * mesh point
# iw reg get
country 98:
	(2402 - 2477 @ 40), (N/A, 20)
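The negative channel numbers in the listing (2512 MHz [-498] and so on) belong to the surplus channels that, per Comment 4, ath5k exported at the time. They are consistent with a conversion that special-cases 2484 MHz, uses the 2.4 GHz grid below it, and applies the 5 GHz formula freq/5 - 1000 to everything above. The sketch below assumes that conversion; freq_to_channel() is modelled on what iw appears to do, not copied from its source.

```python
def freq_to_channel(freq_mhz: int) -> int:
    """Assumed iw-style conversion from a center frequency to a channel number."""
    if freq_mhz == 2484:
        return 14                       # channel 14 is special-cased
    if freq_mhz < 2484:
        return (freq_mhz - 2407) // 5   # 2.4 GHz grid
    return freq_mhz // 5 - 1000         # 5 GHz formula, applied to anything higher

for freq in (2467, 2472, 2484, 2512, 2732):
    print(freq, freq_to_channel(freq))
```

For 2512 MHz this gives 502 - 1000 = -498, matching the listing; since those entries are all disabled, the odd numbers are cosmetic.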

Comment 13 Kieran Clancy 2009-08-27 13:51:58 UTC
Created attachment 358874 [details]
dmesg boot log from home

# iw list
Wiphy phy0
	Band 1:
		Frequencies:
			* 2412 MHz [1] (20.0 dBm)
			* 2417 MHz [2] (20.0 dBm)
			* 2422 MHz [3] (20.0 dBm)
			* 2427 MHz [4] (20.0 dBm)
			* 2432 MHz [5] (20.0 dBm)
			* 2437 MHz [6] (20.0 dBm)
			* 2442 MHz [7] (20.0 dBm)
			* 2447 MHz [8] (20.0 dBm)
			* 2452 MHz [9] (20.0 dBm)
			* 2457 MHz [10] (20.0 dBm)
			* 2462 MHz [11] (20.0 dBm)
			* 2467 MHz [12] (20.0 dBm)
			* 2472 MHz [13] (20.0 dBm)
			* 2484 MHz [14] (disabled)
			* 2512 MHz [-498] (disabled)
			* 2532 MHz [-494] (disabled)
			* 2552 MHz [-490] (disabled)
			* 2572 MHz [-486] (disabled)
			* 2592 MHz [-482] (disabled)
			* 2612 MHz [-478] (disabled)
			* 2632 MHz [-474] (disabled)
			* 2652 MHz [-470] (disabled)
			* 2672 MHz [-466] (disabled)
			* 2692 MHz [-462] (disabled)
			* 2712 MHz [-458] (disabled)
			* 2732 MHz [-454] (disabled)
		Bitrates (non-HT):
			* 1.0 Mbps
			* 2.0 Mbps (short preamble supported)
			* 5.5 Mbps (short preamble supported)
			* 11.0 Mbps (short preamble supported)
			* 6.0 Mbps
			* 9.0 Mbps
			* 12.0 Mbps
			* 18.0 Mbps
			* 24.0 Mbps
			* 36.0 Mbps
			* 48.0 Mbps
			* 54.0 Mbps
	max # scan SSIDs: 4
	Supported interface modes:
		 * IBSS
		 * managed
		 * monitor
		 * mesh point
# iw reg get
country AU:
	(2402 - 2482 @ 40), (N/A, 20)
	(5170 - 5250 @ 40), (3, 23)
	(5250 - 5330 @ 40), (3, 23), DFS
	(5735 - 5835 @ 40), (3, 30)

--------

By the way, doing modprobe -r ath5k; modprobe ath5k did seem to fix it today, although it didn't work when I tried it before. I will try it again over the next few working days to see how reliable it is.

Comment 14 Luis R. Rodriguez 2009-08-27 16:31:06 UTC
What channel is your AP at home and what channel is your AP at work?

Comment 15 Kieran Clancy 2009-08-28 05:47:20 UTC
Work channel is 1, home is 13. I could change the channel at home, but I used 13 because it seems to suffer less interference than lower channels.

Comment 16 John W. Linville 2009-12-21 16:04:17 UTC
Hmmm...I guess we never really resolved this...

Luis, correct me if I'm wrong...the issue here is that regulatory enforcements are all cumulative. So once the rules for Kieran's work are applied, it no longer matters what the rules are for his home w.r.t. channel 13: it will already be disabled. This should be true whether he first boots at work, or boots at home, suspends at home and resumes at work, then suspends at work and resumes again at home.

Should we have an "iw reg reset" command that drops all current regulatory info and starts over?  Or perhaps we should even just do that by default when resuming?
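John's point about cumulative enforcement can be illustrated with a toy model that intersects each new set of rules with the one already in force. This is a simplification: intersect() and usable() are hypothetical helpers, the ranges are the AU and work rules reported in this bug, and the home AP re-asserting AU is assumed purely for illustration. The `country 98` that `iw reg get` printed at work is, as far as I know, cfg80211's marker for such an intersected domain.

```python
Rule = tuple[int, int]  # (start_mhz, end_mhz)

def intersect(current: list[Rule], hint: list[Rule]) -> list[Rule]:
    """Keep only the frequency ranges allowed by both rule sets."""
    out = []
    for s1, e1 in current:
        for s2, e2 in hint:
            start, end = max(s1, s2), min(e1, e2)
            if start < end:
                out.append((start, end))
    return out

def usable(domain: list[Rule], center_mhz: int, width_mhz: int = 20) -> bool:
    """A channel is usable only if its full width fits inside one rule."""
    half = width_mhz // 2
    return any(s <= center_mhz - half and center_mhz + half <= e for s, e in domain)

AU = [(2402, 2482)]         # `iw reg get` after a fresh boot at home
WORK_HINT = [(2402, 2477)]  # rule logged after associating at work

domain = AU                            # boot at home
domain = intersect(domain, WORK_HINT)  # associate at work
domain = intersect(domain, AU)         # back home: re-applying AU cannot widen it
print(domain, usable(domain, 2472))    # [(2402, 2477)] False
```

Once the work range is in force, channel 13 (2472 MHz) stays unusable; only discarding the learned rules would bring it back.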

Comment 17 Luis R. Rodriguez 2009-12-21 17:19:07 UTC
Yeah, that's right John, I think it's sensible to reset regulatory domain information upon resume by default. The reason I didn't implement this immediately is that it is a bit more complicated and requires more thought and testing.

A reset of cfg80211's regulatory domain can be done with reset_regdomains(). But we not only have to deal with the central cfg80211 regulatory domain, but also with the fact that each wiphy can have its own assigned regulatory domain struct, see wiphy->regd. There is also the pre-wiphy-registration call wiphy_apply_custom_regulatory() that drivers can use to modify the *_orig parameters. Upon resume we just need to reset the wiphy's non-orig parameters to match the orig parameters; this clears any 11d information while still respecting the driver's own *_orig customizations made before wiphy registration. Then we'd apply wiphy->regd on the wiphy, and the last step would be to apply the cfg80211 regulatory domain in case the user wants to be more compliant.

Another thing to consider is the learned beacon hints. If a user was world roaming, got a beacon hint and then enabled AP mode, do we want to ensure they can still use that mode of operation after a suspend/resume cycle? I think so. We keep beacon hints on a linked list, so technically this bank of information would be remembered upon resume; we would just have to call wiphy_update_beacon_reg() for each existing wiphy upon resume, while ensuring that no TX/RX kicks off until all of this is done.

Provided all of the above is taken into consideration, I think it would work.
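The ordering Luis describes can be sketched as a toy model. Everything in it is hypothetical: the flag values, Channel, restrict() and reset_on_resume() are illustrative names, not cfg80211 code; only the order of the steps (restore from *_orig, apply wiphy->regd, apply the cfg80211 domain, replay beacon hints, then allow TX/RX) comes from the comment above. Beacon hints are modelled as lifting passive-scan and no-IBSS restrictions on channels where a beacon was seen, which is an assumption about what a hint does.

```python
from dataclasses import dataclass

DISABLED, PASSIVE_SCAN, NO_IBSS = 1, 2, 4  # hypothetical flag bits

@dataclass
class Channel:
    center_mhz: int
    orig_flags: int   # set by the driver before wiphy registration
    flags: int = 0    # current flags, possibly narrowed by 11d hints

def restrict(channels, rules, width_mhz=20):
    """Disable every channel whose full width does not fit inside some rule."""
    half = width_mhz // 2
    for ch in channels:
        if not any(s <= ch.center_mhz - half and ch.center_mhz + half <= e
                   for s, e in rules):
            ch.flags |= DISABLED

def reset_on_resume(channels, wiphy_regd, core_regd, beacon_hints):
    """Apply the steps in the order proposed above."""
    for ch in channels:              # 1. back to *_orig: drops 11d info
        ch.flags = ch.orig_flags
    if wiphy_regd is not None:       # 2. the driver's own wiphy->regd
        restrict(channels, wiphy_regd)
    restrict(channels, core_regd)    # 3. the central cfg80211 domain
    for ch in channels:              # 4. replay remembered beacon hints
        if ch.center_mhz in beacon_hints and not ch.flags & DISABLED:
            ch.flags &= ~(PASSIVE_SCAN | NO_IBSS)
    # 5. only after this would TX/RX be allowed to start

# Channels 12 and 13 as the work AP left them: channel 13 disabled.
chans = [Channel(2467, PASSIVE_SCAN), Channel(2472, PASSIVE_SCAN)]
chans[0].flags = PASSIVE_SCAN
chans[1].flags = PASSIVE_SCAN | DISABLED
reset_on_resume(chans, None, [(2402, 2482)], beacon_hints={2472})
print([(c.center_mhz, c.flags) for c in chans])  # [(2467, 2), (2472, 0)]
```

Restoring from the *_orig values first is what makes the later steps safe to replay: they start from the driver's baseline instead of narrowing whatever the previous network left behind.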

Comment 18 Kieran Clancy 2009-12-21 23:10:57 UTC
Can anyone tell me why my work's regulatory domain is even stopping me from connecting to channel 13?

If my work advertises a domain of (2402000 KHz - 2477000 KHz @ 40000 KHz), (N/A, 2000 mBm), then shouldn't I still be able to connect to channel 13 which is 2472 MHz?

Comment 19 John W. Linville 2009-12-22 00:22:15 UTC
The channels are 20MHz wide. With a center frequency of 2472MHz you will emit radiation from 2462MHz up to 2482MHz, the latter being higher than the 2477MHz limit specified by the work AP.
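John's arithmetic as a small check, assuming a fixed 20 MHz channel width as in his comment; channel_fits() is a hypothetical helper. It also shows that 2477 MHz is exactly the upper edge of channel 12 (2467 + 10), which is why channel 12 survives the work AP's rule and channel 13 does not.

```python
def channel_fits(center_mhz: int, rule_start_mhz: int, rule_end_mhz: int,
                 width_mhz: int = 20) -> bool:
    """True if the channel's whole width, not just its center, lies inside the rule."""
    half = width_mhz // 2
    return rule_start_mhz <= center_mhz - half and center_mhz + half <= rule_end_mhz

AU_AT_BOOT = (2402, 2482)  # `iw reg get` after a fresh boot at home
WORK_AP = (2402, 2477)     # rule logged after associating at work

for channel, center in ((12, 2467), (13, 2472)):
    print(channel, channel_fits(center, *AU_AT_BOOT), channel_fits(center, *WORK_AP))
# 12 True True
# 13 True False
```

This matches the `iw list` output from work, where channel 12 is enabled and channel 13 is disabled.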

Comment 20 Kieran Clancy 2009-12-22 05:31:29 UTC
Thanks John, that makes sense. It's a shame that my work's AP sends the wrong frequency range for an AU domain; it must cause problems for other people too.

By the way, now that I've found the `modprobe -r ath5k && modprobe ath5k` command works reliably, I made a setuid program which runs those and I just execute that whenever I come home from work. So while it'd still be good to fix this, it's no longer really affecting me badly.

Thanks to everyone else that has helped with this issue too.

Comment 21 John W. Linville 2010-02-22 19:10:53 UTC
The following upstream commit addresses this issue -- unfortunately, we won't see it in Fedora until at least F13...

commit 09d989d179d0c679043556dda77c51b41a2dae7e
Author: Luis R. Rodriguez <lrodriguez>
Date:   Fri Jan 29 19:58:57 2010 -0500

    cfg80211: add regulatory hint disconnect support
    
    This adds a new regulatory hint to be used when we know all
    devices have been disconnected and idle. This can happen
    when we suspend, for instance. When we disconnect we can
    no longer assume the same regulatory rules learned from
    a country IE or beacon hints are applicable so restore
    regulatory settings to an initial state.
    
    Since driver hints are cached on the wiphy that called
    the hint, those hints are not reproduced onto cfg80211
    as the wiphy will respect its own wiphy->regd regardless.
    
    Signed-off-by: Luis R. Rodriguez <lrodriguez>
    Signed-off-by: John W. Linville <linville>
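The commit's idea (forget learned rules once every device is disconnected and idle) can be illustrated with a toy model. RegState, associate() and disconnect() are hypothetical names, not cfg80211 API; "idle" is simplified to "no interface associated", and driver hints are left out because, per the commit message, the wiphy keeps respecting its own wiphy->regd anyway.

```python
class RegState:
    """Toy model of the disconnect hint; all names are hypothetical."""

    def __init__(self, initial_rules):
        self.initial = list(initial_rules)   # rules before anything was learned
        self.rules = list(initial_rules)
        self.connected = set()               # interfaces currently associated

    def associate(self, iface, country_ie_rules):
        self.connected.add(iface)
        # Rules learned from a country IE can only narrow the current set.
        self.rules = [(max(s1, s2), min(e1, e2))
                      for s1, e1 in self.rules
                      for s2, e2 in country_ie_rules
                      if max(s1, s2) < min(e1, e2)]

    def disconnect(self, iface):
        self.connected.discard(iface)
        if not self.connected:
            # Nothing is associated any more, so learned rules no longer apply.
            self.rules = list(self.initial)

reg = RegState([(2402, 2482)])
reg.associate("wlan0", [(2402, 2477)])  # at work
reg.disconnect("wlan0")                 # e.g. when suspending
print(reg.rules)                        # [(2402, 2482)]
```

With a second interface still associated, the learned range would be kept until that one disconnects as well.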