Bug 1119524 - Hangs without (and even with) hardware crypto
Summary: Hangs without (and even with) hardware crypto
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: wpa_supplicant
Version: 24
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Lubomir Rintel
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2014-07-15 01:25 UTC by Stuart D Gathman
Modified: 2017-08-08 11:47 UTC (History)
3 users (show)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2017-08-08 11:47:30 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
sanitized teaser of context around hang (5.56 KB, text/plain)
2015-04-16 03:10 UTC, Stuart D Gathman
no flags Details

Description Stuart D Gathman 2014-07-15 01:25:44 UTC
Description of problem:
When connected to an encrypted network using a wireless card with no hardware crypto,  wpa_supplicant hangs after 100 Megs or so of transfer

Version-Release number of selected component (if applicable):
wpa_supplicant-2.0-8.fc19.i686
kernel-3.14.8-100.fc19.i686
Network controller: Broadcom Corporation BCM4311 802.11a/b/g (rev 01)

How reproducible:
Always

Steps to Reproduce:
1. Use nocrypto wireless, e.g. bcm4311
2. Connect to encrypted network
3. Attempt to download a CD's worth of data

Actual results:
wpa_supplicant hangs.  

Expected results:
wpa_supplicant continues to operate.

Additional info:
A "killall wpa_supplicant" within 30 seconds of it hanging (because I'm watching gkrellm during my download), will restart it without losing the session.  This clearly marks this as a software problem.  Other indications:

1.  It doesn't happen on unencrypted connections
2.  It only happens when the wireless driver reports no hardware crypto

Comment 1 Stuart D Gathman 2014-07-15 01:31:15 UTC
annotated wpa_supplicant.log

wlan0: SME: Trying to authenticate with 00:1d:7e:2f:bd:0e (SSID='Gathman Home' freq=2462 MHz)
wlan0: Trying to associate with 00:1d:7e:2f:bd:0e (SSID='Gathman Home' freq=2462 MHz)
wlan0: Associated with 00:1d:7e:2f:bd:0e
wlan0: WPA: Key negotiation completed with 00:1d:7e:2f:bd:0e [PTK=CCMP GTK=CCMP]
wlan0: CTRL-EVENT-CONNECTED - Connection to 00:1d:7e:2f:bd:0e completed [id=0 id_str=]
************working away***************
dbus: wpas_dbus_bss_signal_prop_changed: Unknown Property value 7
dbus: wpas_dbus_bss_signal_prop_changed: Unknown Property value 7
dbus: wpas_dbus_bss_signal_prop_changed: Unknown Property value 7
.... repeat hundreds of times 
dbus: wpas_dbus_bss_signal_prop_changed: Unknown Property value 7
dbus: wpas_dbus_bss_signal_prop_changed: Unknown Property value 7
*********hangs*************************
************killall wpa_supplicant*****
wlan0: CTRL-EVENT-DISCONNECTED bssid=00:1d:7e:2f:bd:0e reason=3 locally_generated=1
wlan0: CTRL-EVENT-TERMINATING

Successfully initialized wpa_supplicant
dbus: wpas_dbus_bss_signal_prop_changed: Unknown Property value 7
wlan0: SME: Trying to authenticate with 00:1d:7e:2f:bd:0e (SSID='Gathman Home' freq=2462 MHz)
wlan0: Trying to associate with 00:1d:7e:2f:bd:0e (SSID='Gathman Home' freq=2462 MHz) 
wlan0: Associated with 00:1d:7e:2f:bd:0e 
wlan0: WPA: Key negotiation completed with 00:1d:7e:2f:bd:0e [PTK=CCMP GTK=CCMP]
wlan0: CTRL-EVENT-CONNECTED - Connection to 00:1d:7e:2f:bd:0e completed [id=0 id_str=]
dbus: wpas_dbus_bss_signal_prop_changed: Unknown Property value 7
dbus: wpas_dbus_bss_signal_prop_changed: Unknown Property value 7
dbus: wpas_dbus_bss_signal_prop_changed: Unknown Property value 7
************working away again****************

Comment 2 Dan Williams 2014-07-16 16:34:33 UTC
When this happens, could you 'gdb attach <pid>' and then run "backtrace" and attach the backtrace so we can see where it hangs?

Comment 3 Stuart D Gathman 2014-07-16 18:27:37 UTC
Here is a first cut.  I'll see about downloading some debuginfo

(gdb) backtrace
#0  0xb7597424 in __kernel_vsyscall ()
#1  0xb720427d in ___newselect_nocancel () from /lib/libc.so.6
#2  0xb75d22f6 in eloop_run ()
#3  0xb768efe2 in wpa_supplicant_run ()
#4  0xb75c740d in main ()

Comment 4 Stuart D Gathman 2014-07-16 19:27:47 UTC
Hung it again after getting first round of debuginfo installed:

Missing separate debuginfos, use: debuginfo-install keyutils-libs-1.5.9-1.fc19.i686 krb5-libs-1.11.3-21.fc19.i686 libcom_err-1.42.7-2.fc19.i686 libselinux-2.1.13-15.fc19.i686 pcre-8.32-9.fc19.i686 zlib-1.2.7-10.fc19.i686
(gdb) backtrace
#0  0xb7671424 in __kernel_vsyscall ()
#1  0xb72de27d in ___newselect_nocancel ()
    at ../sysdeps/unix/syscall-template.S:81
#2  0xb76ac2f6 in eloop_run () at ../src/utils/eloop.c:736
#3  0xb7768fe2 in wpa_supplicant_run (global=global@entry=0xb878d6f8)
    at wpa_supplicant.c:3322
#4  0xb76a140d in main (argc=11, argv=0xbf971454) at main.c:297

Comment 5 Stuart D Gathman 2014-10-27 20:14:42 UTC
I've actually seen this happen twice *with* hardware crypto.  But it is so rare as to be essentially unreproducable.  This does indicate, however, that it might be a timing problem.  And even though rare on modern Wifi hardware, fixing it will increase reliability.

Comment 6 Fedora End Of Life 2015-01-09 21:27:23 UTC
This message is a notice that Fedora 19 is now at end of life. Fedora 
has stopped maintaining and issuing updates for Fedora 19. It is 
Fedora's policy to close all bug reports from releases that are no 
longer maintained. Approximately 4 (four) weeks from now this bug will
be closed as EOL if it remains open with a Fedora 'version' of '19'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 19 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged  change the 'version' to a later Fedora 
version prior this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 7 Stuart D Gathman 2015-01-10 00:20:45 UTC
Still happens in F20 *with* hardware crypto.  However, it takes several days.  (Vs less than an hour without hardware crypto).

Comment 8 Stuart D Gathman 2015-01-10 00:23:49 UTC
I can reproduce by leaving a laptop up for 3 days and 2 nights.  It seems to be related to bytes transferred, rather than elapsed time, however.

Comment 9 Stuart D Gathman 2015-02-03 17:28:46 UTC
Suspending the laptop, moving to another WAP, rebooting - all reset the byte count for when wpa_supplicant is going to fail.  Since people don't run into it very often, maybe an easier fix is a watchdog to just restart wpa_supplicant when no more byte are getting transferred.  That is what I end up doing manually anyway.

Comment 10 Stuart D Gathman 2015-02-19 16:10:01 UTC
Still getting this on F21, *with* hardware crypto.

Comment 11 Stuart D Gathman 2015-03-25 21:05:41 UTC
bug#1192550 makes it difficult to keep my laptop up for the required number of days.  I'll leave the extended logging on, but don't want to leave my screen on all night as it is fluorescent.  So this is not likely to reproduce until the display sleep bug is fixed, or I get a new LED laptop (with SSD!)

Comment 12 Stuart D Gathman 2015-04-16 02:44:29 UTC
Ok, wpa_supplicant hung with hardware crypto, and the extended logging on.  The log file seems to have key info, like:

WPA: TX EAPOL-Key - hexdump(len=99): < hex data >

Any pointers on sanitizing the log?  Or are those only temporary keys?

Comment 13 Stuart D Gathman 2015-04-16 03:10:04 UTC
Created attachment 1015010 [details]
sanitized teaser of context around hang

Comment 14 Stuart D Gathman 2015-04-18 22:17:41 UTC
Got another hang.  

1429394776.993464: IEs - hexdump(len=83): 00 05 4a 5a 36 59 34 01 08 82 84 8b 96 0c 12 18 24 03 01 01 05 04 00 01 00 00 07 06 55 53 20 01 0b 1b 20 01 00 2a 01 00 32 04 30 48 60 6c dd 18 00 50 f2 02 01 01 83 00 03 a4 00 00 27 a4 00 00 42 43 5e 00 62 32 2f 00 dd 09 00 03 7f 01 01 00 20 00 00
1429394776.993544: Beacon IEs - hexdump(len=83): 00 05 4a 5a 36 59 34 01 08 82 84 8b 96 0c 12 18 24 03 01 01 05 04 00 01 00 00 07 06 55 53 20 01 0b 1b 20 01 00 2a 01 00 32 04 30 48 60 6c dd 18 00 50 f2 02 01 01 83 00 03 a4 00 00 27 a4 00 00 42 43 5e 00 62 32 2f 00 dd 09 00 03 7f 01 01 00 20 00 00
1429394776.993608: wlp11s0: BSS: Start scan result update 266
1429394776.993677: BSS: last_scan_res_used=5/32 last_scan_full=0
1429394776.993701: Add randomness: count=1410 entropy=969
1429394776.993725: random pool - hexdump(len=128): [REMOVED]
1429394776.993746: random_mix_pool - hexdump(len=8): [REMOVED]
1429394776.993766: random_mix_pool - hexdump(len=5): [REMOVED]
1429394776.993787: random pool - hexdump(len=128): [REMOVED]
1429394776.993806: Add randomness: count=1411 entropy=970
1429394776.993827: random pool - hexdump(len=128): [REMOVED]
1429394776.993849: random_mix_pool - hexdump(len=8): [REMOVED]
1429394776.993871: random_mix_pool - hexdump(len=5): [REMOVED]
1429394776.993891: random pool - hexdump(len=128): [REMOVED]
1429394776.993911: Add randomness: count=1412 entropy=971
1429394776.993931: random pool - hexdump(len=128): [REMOVED]
1429394776.993952: random_mix_pool - hexdump(len=8): [REMOVED]
1429394776.993974: random_mix_pool - hexdump(len=5): [REMOVED]
1429394776.993996: random pool - hexdump(len=128): [REMOVED]
1429394776.994017: Add randomness: count=1413 entropy=972
1429394776.994037: random pool - hexdump(len=128): [REMOVED]
1429394776.994056: random_mix_pool - hexdump(len=8): [REMOVED]
1429394776.994075: random_mix_pool - hexdump(len=5): [REMOVED]
1429394776.994097: random pool - hexdump(len=128): [REMOVED]
1429394776.994118: Add randomness: count=1414 entropy=973
1429394776.994139: random pool - hexdump(len=128): [REMOVED]
1429394776.994158: random_mix_pool - hexdump(len=8): [REMOVED]
1429394776.994178: random_mix_pool - hexdump(len=5): [REMOVED]
1429394776.994198: random pool - hexdump(len=128): [REMOVED]
1429394776.994221: wlp11s0: New scan results available
1429394776.994335: wlp11s0: Selecting BSS from priority group 0
1429394776.994376: wlp11s0: 0: xx:xx:xx:xx:xx:xx ssid='Gathman Home' wpa_ie_len=0 rsn_ie_len=20 caps=0x411 level=-61
1429394776.994405: wlp11s0:    selected based on RSN IE
1429394776.994432: wlp11s0:    selected BSS xx:xx:xx:xx:xx:xx ssid='Gathman Home'
1429394776.994459: wlp11s0: Checking for other virtual interfaces sharing same radio (phy0) in event_scan_results
1429394776.997434: dbus: flush_object_timeout_handler: Timeout - sending changed properties of object /fi/w1/wpa_supplicant1/Interfaces/3
1429394776.998769: dbus: flush_object_timeout_handler: Timeout - sending changed properties of object /fi/w1/wpa_supplicant1/Interfaces/3/BSSs/507
1429394776.999015: dbus: flush_object_timeout_handler: Timeout - sending changed properties of object /fi/w1/wpa_supplicant1/Interfaces/3/BSSs/508
1429394776.999210: dbus: flush_object_timeout_handler: Timeout - sending changed properties of object /fi/w1/wpa_supplicant1/Interfaces/3/BSSs/497
********HUNG*********
1429394890.724255: dbus: fi.w1.wpa_supplicant1.Interface.Scan (/fi/w1/wpa_supplicant1/Interfaces/3)

Comment 15 Stuart D Gathman 2015-09-10 20:11:44 UTC
Still a problem in Fedora 22 with wpa_supplicant-2.4-4.fc22.i686

Since wireless connections need to be redone on resume anyway, restarting wpa_supplicant beforehand would go a long way to making the problem unnoticeable.  Trying to figure out how to do that with systemd...

Comment 16 Fedora Admin XMLRPC Client 2015-10-14 14:50:19 UTC
This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

Comment 17 Stuart D Gathman 2016-01-28 15:27:28 UTC
This workaround is painless and mostly avoids the problem on a laptop which gets frequently suspended:

/etc/systemd/system/wpa_supplicant-resume.service
----%<-------%<--------%<---------
[Unit]
Description=Restart wpa_supplicant on resume from sleep
After=sleep.target

[Service]
Type=oneshot
ExecStart=/usr/bin/killall wpa_supplicant

[Install]
WantedBy=sleep.target
----%<-------%<--------%<---------


# systemctl enable wpa_supplicant-resume

Comment 18 Stuart D Gathman 2016-01-28 15:29:38 UTC
I'm not sure what the outstanding NEEDINFO request is for.  I've provided all the logs and stuff asked for.

Comment 19 Stuart D Gathman 2016-06-28 23:03:35 UTC
Still broken on F24.
wpa_supplicant-2.5-5.fc24.x86_64

Comment 20 Stuart D Gathman 2016-08-20 03:22:44 UTC
On F24, killing wpa_supplicant does not automatically restart it.  You have to use "systemctl restart wpa_supplicant".

Comment 21 Stuart D Gathman 2016-08-20 03:26:42 UTC
This problem persists on a variety of laptop models and wifi chipsets.  Setting wpa_supplicant to restart on resume from suspend is the best way to make the problem mostly invisible for chipsets with hardware encryption (all modern ones).  I turn that off for new Fedora versions to check whether the problem has been fixed. 

For chipsets without hardware assist, wpa_supplicant dies much more quickly.

Comment 22 Stuart D Gathman 2016-08-24 14:38:33 UTC
This problem also manifests on marginal connections.  This might be an easier way to reproduce.  When the signal level is at something like 30%, wpa_supplicant will hang in less than an hour.

Comment 23 Fedora End Of Life 2017-07-25 18:41:00 UTC
This message is a reminder that Fedora 24 is nearing its end of life.
Approximately 2 (two) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 24. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora  'version'
of '24'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 24 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged  change the 'version' to a later Fedora
version prior this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

Comment 24 Stuart D Gathman 2017-07-25 19:44:19 UTC
Looks like this will never get fixed.  At least I have a decent workaround.

Comment 25 Fedora End Of Life 2017-08-08 11:47:30 UTC
Fedora 24 changed to end-of-life (EOL) status on 2017-08-08. Fedora 24 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.


Note You need to log in before you can comment on or make changes to this bug.