Bug 893580 - WiFi network connection keeps seeing old APs on resume from suspend (F-17 -> F-18 regression)
Summary: WiFi network connection keeps seeing old APs on resume from suspend (F-17 -> ...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: NetworkManager
Version: 18
Hardware: x86_64
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Stanislaw Gruszka
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
: 902664 (view as bug list)
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2013-01-09 14:03 UTC by Milan Bouchet-Valat
Modified: 2013-03-26 11:56 UTC (History)
4 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-03-26 11:56:38 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
/Var/log/messages after resume (17.89 KB, text/plain)
2013-01-11 09:47 UTC, Milan Bouchet-Valat
no flags Details
dmesg after resume (8.68 KB, text/plain)
2013-01-11 09:48 UTC, Milan Bouchet-Valat
no flags Details
Output of "nmcli dev wifi list" before and after suspend (46.88 KB, application/octet-stream)
2013-01-11 09:49 UTC, Milan Bouchet-Valat
no flags Details
Relevant output of journalctl (24.21 KB, text/plain)
2013-02-11 09:47 UTC, Milan Bouchet-Valat
no flags Details

Description Milan Bouchet-Valat 2013-01-09 14:03:42 UTC
I'm not sure that's a NetworkManager bug since I'm aware of [1] (and especially of the final remark about Broadcom), but...

Since I upgraded to F18, my wireless connection does not drop after suspending: when I resume, I'm already connected, while in F17 I had to reconnect. This is probably a feature, not a bug -- but I'm not completely sure. The problem is, if I changed place during suspend, the old connection is no longer valid. It takes some time for the system to realize that I'm no longer connected to the original AP. Sometimes I even need to select the new AP manually.

/var/log/messages does not contain anything about WiFi or network on resume, which is consistent with the fact that the system considers the connection is still alive.

I don't know what information I should provide, so please ask if you need more testing. If it's a driver bug, I'll be happy to report it to Broadcom (they are making efforts, as the driver is now in-tree! ;-).


Kernel is 3.6.11-3, module brcmsmac.
NetworkManager is 0.9.7.0-12.git20121004.
Wireless card is a Broadcom Corporation BCM4313 [14e4:4727] (rev 01).


1: http://blogs.gnome.org/dcbw/2009/02/26/suspendresume-vs-networkmanager/

Comment 1 Milan Bouchet-Valat 2013-01-09 14:34:13 UTC
FWIW, it also happens with kernel 3.7.1-2.

Comment 2 Jirka Klimes 2013-01-10 10:05:40 UTC
Please attach /var/log/messages so that we can see what happens on suspend/resume. Output of 'dmesg' will help to determine possible driver problems.

And get available APs known to NM (both before sleep and after and at a different place):
$ nmcli dev wifi list

Comment 3 Milan Bouchet-Valat 2013-01-11 09:47:36 UTC
Created attachment 676743 [details]
/Var/log/messages after resume

To reproduce the bug, I simply turned off my WiFi AP during suspend, and resumed the machine. NUMERICABLE-760E is the AP that should not be visible after resume: the machine "disconnects" (detects the AP is gone) about one minute after resuming (delay varies).

Comment 4 Milan Bouchet-Valat 2013-01-11 09:48:11 UTC
Created attachment 676744 [details]
dmesg after resume

Comment 5 Milan Bouchet-Valat 2013-01-11 09:49:20 UTC
Created attachment 676745 [details]
Output of "nmcli dev wifi list" before and after suspend

See comments inline: there are three points before, after suspend, and after the AP vanishes.

Comment 6 Stanislaw Gruszka 2013-01-29 14:00:07 UTC
*** Bug 902664 has been marked as a duplicate of this bug. ***

Comment 7 Stanislaw Gruszka 2013-01-29 14:09:51 UTC
I'm able to reproduce this problem locally. Looks like we Auto-activete connection again and again, where as we should just stop after first failure. I think I could cook a patch for this problem ...

Comment 8 Stanislaw Gruszka 2013-02-04 12:51:42 UTC
That's odd, I'm not able to reproduce the problem any longer. Due to some other changes I broke gdm somehow and was not able to login into graphical environment on my F-18 installation. I reinstalled the system and problem gone :-/ I'm testing that by going to suspend and then turning off AP.

Looking at log from comment 3 there is:

nm-dispatcher.action: Script '/etc/NetworkManager/dispatcher.d/10-sendmail' took too long; killing it.

perhaps this cause troubles, if you do 

systemctl disable sendmail.service

does that prevent the issue ?

Comment 9 Pierre Ossman 2013-02-04 17:40:39 UTC
Not here at least. I also didn't have any complaints from sendmail in my log. In fact, this is all I get when grepping for NetworkManager:

Feb  4 18:35:06 mjolnir NetworkManager[673]: <info> (wlan1): supplicant interface state: completed -> disconnected
Feb  4 18:35:06 mjolnir NetworkManager[673]: <info> (wlan1): supplicant interface state: disconnected -> scanning
Feb  4 18:35:22 mjolnir NetworkManager[673]: <warn> (wlan1): link timed out.
Feb  4 18:35:22 mjolnir NetworkManager[673]: <info> (wlan1): device state change: activated -> failed (reason 'SSID not found') [100 120 53]
...

Comment 10 Milan Bouchet-Valat 2013-02-11 09:47:06 UTC
Created attachment 695984 [details]
Relevant output of journalctl

Here sendmail.service was in Failed state, I disabled it and maybe it made things quicker, I'm not sure at all.


In the attached log, I resumed the system in a different place at 10:17:44. You can see that the brcmsmac driver noticed the AP was gone immediately. What is weird is that it seems the driver then believes it is associated again, and disassociated once more. It looks like the system is "restarting tasks" twice, which is weird...

All of this happens in 1 second only, so it does not generate the delay, but maybe it confuses NetworkManager?

févr. 11 10:17:44 milan kernel: ieee80211 phy0: brcms_ops_config: change power-save mode: false (implement)
févr. 11 10:17:44 milan kernel: ieee80211 phy0: brcmsmac: brcms_ops_bss_info_changed: associated
févr. 11 10:17:44 milan kernel: ieee80211 phy0: changing basic rates failed: -22
févr. 11 10:17:44 milan kernel: ieee80211 phy0: brcms_ops_bss_info_changed: cqm change: threshold 0, hys 0  (implement)
févr. 11 10:17:44 milan kernel: ieee80211 phy0: brcms_ops_bss_info_changed: arp filtering: enabled true, count 1 (implement)
févr. 11 10:17:44 milan kernel: ieee80211 phy0: brcms_ops_bss_info_changed: qos enabled: true (implement)
févr. 11 10:17:44 milan kernel: PM: resume of devices complete after 2710.517 msecs
févr. 11 10:17:44 milan kernel: PM: Finishing wakeup.
févr. 11 10:17:44 milan kernel: Restarting tasks ... done.
févr. 11 10:17:44 milan kernel: video LNXVIDEO:00: Restoring backlight state
févr. 11 10:17:44 milan kernel: ieee80211 phy0: brcmsmac: brcms_ops_bss_info_changed: disassociated
févr. 11 10:17:44 milan kernel: ieee80211 phy0: brcms_ops_bss_info_changed: arp filtering: enabled false, count 1 (implement)
févr. 11 10:17:44 milan kernel: ieee80211 phy0: brcms_ops_bss_info_changed: qos enabled: false (implement)
févr. 11 10:17:44 milan kernel: cfg80211: Calling CRDA to update world regulatory domain
févr. 11 10:17:44 milan kernel: cfg80211: World regulatory domain updated:
févr. 11 10:17:44 milan kernel: cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
févr. 11 10:17:45 milan kernel: cfg80211:   (2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
févr. 11 10:17:45 milan kernel: cfg80211:   (2457000 KHz - 2482000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
févr. 11 10:17:45 milan kernel: cfg80211:   (2474000 KHz - 2494000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
févr. 11 10:17:45 milan kernel: cfg80211:   (5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
févr. 11 10:17:44 milan kernel: sd 0:0:0:0: [sda] Starting disk
févr. 11 10:17:44 milan kernel: Extended CMOS year: 2000
févr. 11 10:17:44 milan kernel: ieee80211 phy0: brcms_ops_config: change power-save mode: false (implement)
févr. 11 10:17:44 milan kernel: ieee80211 phy0: brcmsmac: brcms_ops_bss_info_changed: associated
févr. 11 10:17:44 milan kernel: ieee80211 phy0: changing basic rates failed: -22
févr. 11 10:17:44 milan kernel: ieee80211 phy0: brcms_ops_bss_info_changed: cqm change: threshold 0, hys 0  (implement)
févr. 11 10:17:44 milan kernel: ieee80211 phy0: brcms_ops_bss_info_changed: arp filtering: enabled true, count 1 (implement)
févr. 11 10:17:44 milan kernel: ieee80211 phy0: brcms_ops_bss_info_changed: qos enabled: true (implement)
févr. 11 10:17:44 milan kernel: PM: resume of devices complete after 2710.517 msecs
févr. 11 10:17:44 milan kernel: PM: Finishing wakeup.
févr. 11 10:17:44 milan kernel: Restarting tasks ... done.
févr. 11 10:17:44 milan kernel: video LNXVIDEO:00: Restoring backlight state
févr. 11 10:17:44 milan kernel: ieee80211 phy0: brcmsmac: brcms_ops_bss_info_changed: disassociated
févr. 11 10:17:44 milan kernel: ieee80211 phy0: brcms_ops_bss_info_changed: arp filtering: enabled false, count 1 (implement)
févr. 11 10:17:44 milan kernel: ieee80211 phy0: brcms_ops_bss_info_changed: qos enabled: false (implement)


This time, the connection to the new AP happened quite quickly, but there still is an unexplained delay (14 seconds):
févr. 11 10:17:59 milan dbus[768]: [system] Activating service name='org.freedesktop.nm_dispatcher' (using servicehelper)
févr. 11 10:17:59 milan NetworkManager[820]: <info> (wlan0): device state change: failed -> disconnected (reason 'none') [120 30 0]
févr. 11 10:17:59 milan NetworkManager[820]: <info> (wlan0): deactivating device (reason 'none') [0]
févr. 11 10:17:59 milan dbus-daemon[768]: dbus[768]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
févr. 11 10:17:59 milan dbus[768]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'

Shouldn't nm-dispatcher be kept alive across suspends?

Comment 11 Stanislaw Gruszka 2013-03-18 12:50:56 UTC
Does it still happen on updated system with:
NetworkManager-0.9.8.0-1.fc18.x86_64
kernel-3.8.3-201.fc18.x86_64

Comment 12 Stanislaw Gruszka 2013-03-18 12:51:52 UTC
I mean, does it still happen for you ? I'm not able to reproduce this problem any longer.

Comment 13 Pierre Ossman 2013-03-18 15:21:42 UTC
Possibly not. At least I haven't had any of the really persistent ghost entries in a while. It does still take a little while before it gets its bearings after resume though.

Comment 14 Stanislaw Gruszka 2013-03-21 12:32:14 UTC
Milan, ping ?

Comment 15 Dan Williams 2013-03-21 19:06:59 UTC
In NM < 0.9.8 in F18, NM wasn't listening to systemd for suspend/resume notifications, which is how the AP list is cleared out.  But with 0.9.8 and later that should be happening.  That change was made in mid-February, so I'd expect that to be the original reason why this bug report was filed.

If this still happens, then look into the journal or /var/log/messages for notes about NM "sleeping" and "waking up".  If you don't see those, then NM isn't properly suspending/resuming.  If you do see those, then the problem is likely in the wifi stack and driver showing old APs from an old scan.

Comment 16 Stanislaw Gruszka 2013-03-26 11:56:38 UTC
Ok, I assume this issue was caused by lack of suspend/resume notification and is now fixed.


Note You need to log in before you can comment on or make changes to this bug.