Bug 1119524
Summary: | Hangs without (and even with) hardware crypto | ||||||
---|---|---|---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Stuart D Gathman <stuart> | ||||
Component: | wpa_supplicant | Assignee: | Lubomir Rintel <lkundrak> | ||||
Status: | CLOSED EOL | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||
Severity: | unspecified | Docs Contact: | |||||
Priority: | unspecified | ||||||
Version: | 24 | CC: | dcbw, dpal, stuart | ||||
Target Milestone: | --- | ||||||
Target Release: | --- | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2017-08-08 11:47:30 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Attachments: |
|
Description
Stuart D Gathman
2014-07-15 01:25:44 UTC
annotated wpa_supplicant.log wlan0: SME: Trying to authenticate with 00:1d:7e:2f:bd:0e (SSID='Gathman Home' freq=2462 MHz) wlan0: Trying to associate with 00:1d:7e:2f:bd:0e (SSID='Gathman Home' freq=2462 MHz) wlan0: Associated with 00:1d:7e:2f:bd:0e wlan0: WPA: Key negotiation completed with 00:1d:7e:2f:bd:0e [PTK=CCMP GTK=CCMP] wlan0: CTRL-EVENT-CONNECTED - Connection to 00:1d:7e:2f:bd:0e completed [id=0 id_str=] ************working away*************** dbus: wpas_dbus_bss_signal_prop_changed: Unknown Property value 7 dbus: wpas_dbus_bss_signal_prop_changed: Unknown Property value 7 dbus: wpas_dbus_bss_signal_prop_changed: Unknown Property value 7 .... repeat hundreds of times dbus: wpas_dbus_bss_signal_prop_changed: Unknown Property value 7 dbus: wpas_dbus_bss_signal_prop_changed: Unknown Property value 7 *********hangs************************* ************killall wpa_supplicant***** wlan0: CTRL-EVENT-DISCONNECTED bssid=00:1d:7e:2f:bd:0e reason=3 locally_generated=1 wlan0: CTRL-EVENT-TERMINATING Successfully initialized wpa_supplicant dbus: wpas_dbus_bss_signal_prop_changed: Unknown Property value 7 wlan0: SME: Trying to authenticate with 00:1d:7e:2f:bd:0e (SSID='Gathman Home' freq=2462 MHz) wlan0: Trying to associate with 00:1d:7e:2f:bd:0e (SSID='Gathman Home' freq=2462 MHz) wlan0: Associated with 00:1d:7e:2f:bd:0e wlan0: WPA: Key negotiation completed with 00:1d:7e:2f:bd:0e [PTK=CCMP GTK=CCMP] wlan0: CTRL-EVENT-CONNECTED - Connection to 00:1d:7e:2f:bd:0e completed [id=0 id_str=] dbus: wpas_dbus_bss_signal_prop_changed: Unknown Property value 7 dbus: wpas_dbus_bss_signal_prop_changed: Unknown Property value 7 dbus: wpas_dbus_bss_signal_prop_changed: Unknown Property value 7 ************working away again**************** When this happens, could you 'gdb attach <pid>' and then run "backtrace" and attach the backtrace so we can see where it hangs? Here is a first cut. I'll see about downloading some debuginfo (gdb) backtrace #0 0xb7597424 in __kernel_vsyscall () #1 0xb720427d in ___newselect_nocancel () from /lib/libc.so.6 #2 0xb75d22f6 in eloop_run () #3 0xb768efe2 in wpa_supplicant_run () #4 0xb75c740d in main () Hung it again after getting first round of debuginfo installed: Missing separate debuginfos, use: debuginfo-install keyutils-libs-1.5.9-1.fc19.i686 krb5-libs-1.11.3-21.fc19.i686 libcom_err-1.42.7-2.fc19.i686 libselinux-2.1.13-15.fc19.i686 pcre-8.32-9.fc19.i686 zlib-1.2.7-10.fc19.i686 (gdb) backtrace #0 0xb7671424 in __kernel_vsyscall () #1 0xb72de27d in ___newselect_nocancel () at ../sysdeps/unix/syscall-template.S:81 #2 0xb76ac2f6 in eloop_run () at ../src/utils/eloop.c:736 #3 0xb7768fe2 in wpa_supplicant_run (global=global@entry=0xb878d6f8) at wpa_supplicant.c:3322 #4 0xb76a140d in main (argc=11, argv=0xbf971454) at main.c:297 I've actually seen this happen twice *with* hardware crypto. But it is so rare as to be essentially unreproducable. This does indicate, however, that it might be a timing problem. And even though rare on modern Wifi hardware, fixing it will increase reliability. This message is a notice that Fedora 19 is now at end of life. Fedora has stopped maintaining and issuing updates for Fedora 19. It is Fedora's policy to close all bug reports from releases that are no longer maintained. Approximately 4 (four) weeks from now this bug will be closed as EOL if it remains open with a Fedora 'version' of '19'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 19 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged change the 'version' to a later Fedora version prior this bug is closed as described in the policy above. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. Still happens in F20 *with* hardware crypto. However, it takes several days. (Vs less than an hour without hardware crypto). I can reproduce by leaving a laptop up for 3 days and 2 nights. It seems to be related to bytes transferred, rather than elapsed time, however. Suspending the laptop, moving to another WAP, rebooting - all reset the byte count for when wpa_supplicant is going to fail. Since people don't run into it very often, maybe an easier fix is a watchdog to just restart wpa_supplicant when no more byte are getting transferred. That is what I end up doing manually anyway. Still getting this on F21, *with* hardware crypto. bug#1192550 makes it difficult to keep my laptop up for the required number of days. I'll leave the extended logging on, but don't want to leave my screen on all night as it is fluorescent. So this is not likely to reproduce until the display sleep bug is fixed, or I get a new LED laptop (with SSD!) Ok, wpa_supplicant hung with hardware crypto, and the extended logging on. The log file seems to have key info, like: WPA: TX EAPOL-Key - hexdump(len=99): < hex data > Any pointers on sanitizing the log? Or are those only temporary keys? Created attachment 1015010 [details]
sanitized teaser of context around hang
Got another hang. 1429394776.993464: IEs - hexdump(len=83): 00 05 4a 5a 36 59 34 01 08 82 84 8b 96 0c 12 18 24 03 01 01 05 04 00 01 00 00 07 06 55 53 20 01 0b 1b 20 01 00 2a 01 00 32 04 30 48 60 6c dd 18 00 50 f2 02 01 01 83 00 03 a4 00 00 27 a4 00 00 42 43 5e 00 62 32 2f 00 dd 09 00 03 7f 01 01 00 20 00 00 1429394776.993544: Beacon IEs - hexdump(len=83): 00 05 4a 5a 36 59 34 01 08 82 84 8b 96 0c 12 18 24 03 01 01 05 04 00 01 00 00 07 06 55 53 20 01 0b 1b 20 01 00 2a 01 00 32 04 30 48 60 6c dd 18 00 50 f2 02 01 01 83 00 03 a4 00 00 27 a4 00 00 42 43 5e 00 62 32 2f 00 dd 09 00 03 7f 01 01 00 20 00 00 1429394776.993608: wlp11s0: BSS: Start scan result update 266 1429394776.993677: BSS: last_scan_res_used=5/32 last_scan_full=0 1429394776.993701: Add randomness: count=1410 entropy=969 1429394776.993725: random pool - hexdump(len=128): [REMOVED] 1429394776.993746: random_mix_pool - hexdump(len=8): [REMOVED] 1429394776.993766: random_mix_pool - hexdump(len=5): [REMOVED] 1429394776.993787: random pool - hexdump(len=128): [REMOVED] 1429394776.993806: Add randomness: count=1411 entropy=970 1429394776.993827: random pool - hexdump(len=128): [REMOVED] 1429394776.993849: random_mix_pool - hexdump(len=8): [REMOVED] 1429394776.993871: random_mix_pool - hexdump(len=5): [REMOVED] 1429394776.993891: random pool - hexdump(len=128): [REMOVED] 1429394776.993911: Add randomness: count=1412 entropy=971 1429394776.993931: random pool - hexdump(len=128): [REMOVED] 1429394776.993952: random_mix_pool - hexdump(len=8): [REMOVED] 1429394776.993974: random_mix_pool - hexdump(len=5): [REMOVED] 1429394776.993996: random pool - hexdump(len=128): [REMOVED] 1429394776.994017: Add randomness: count=1413 entropy=972 1429394776.994037: random pool - hexdump(len=128): [REMOVED] 1429394776.994056: random_mix_pool - hexdump(len=8): [REMOVED] 1429394776.994075: random_mix_pool - hexdump(len=5): [REMOVED] 1429394776.994097: random pool - hexdump(len=128): [REMOVED] 1429394776.994118: Add randomness: count=1414 entropy=973 1429394776.994139: random pool - hexdump(len=128): [REMOVED] 1429394776.994158: random_mix_pool - hexdump(len=8): [REMOVED] 1429394776.994178: random_mix_pool - hexdump(len=5): [REMOVED] 1429394776.994198: random pool - hexdump(len=128): [REMOVED] 1429394776.994221: wlp11s0: New scan results available 1429394776.994335: wlp11s0: Selecting BSS from priority group 0 1429394776.994376: wlp11s0: 0: xx:xx:xx:xx:xx:xx ssid='Gathman Home' wpa_ie_len=0 rsn_ie_len=20 caps=0x411 level=-61 1429394776.994405: wlp11s0: selected based on RSN IE 1429394776.994432: wlp11s0: selected BSS xx:xx:xx:xx:xx:xx ssid='Gathman Home' 1429394776.994459: wlp11s0: Checking for other virtual interfaces sharing same radio (phy0) in event_scan_results 1429394776.997434: dbus: flush_object_timeout_handler: Timeout - sending changed properties of object /fi/w1/wpa_supplicant1/Interfaces/3 1429394776.998769: dbus: flush_object_timeout_handler: Timeout - sending changed properties of object /fi/w1/wpa_supplicant1/Interfaces/3/BSSs/507 1429394776.999015: dbus: flush_object_timeout_handler: Timeout - sending changed properties of object /fi/w1/wpa_supplicant1/Interfaces/3/BSSs/508 1429394776.999210: dbus: flush_object_timeout_handler: Timeout - sending changed properties of object /fi/w1/wpa_supplicant1/Interfaces/3/BSSs/497 ********HUNG********* 1429394890.724255: dbus: fi.w1.wpa_supplicant1.Interface.Scan (/fi/w1/wpa_supplicant1/Interfaces/3) Still a problem in Fedora 22 with wpa_supplicant-2.4-4.fc22.i686 Since wireless connections need to be redone on resume anyway, restarting wpa_supplicant beforehand would go a long way to making the problem unnoticeable. Trying to figure out how to do that with systemd... This package has changed ownership in the Fedora Package Database. Reassigning to the new owner of this component. This workaround is painless and mostly avoids the problem on a laptop which gets frequently suspended: /etc/systemd/system/wpa_supplicant-resume.service ----%<-------%<--------%<--------- [Unit] Description=Restart wpa_supplicant on resume from sleep After=sleep.target [Service] Type=oneshot ExecStart=/usr/bin/killall wpa_supplicant [Install] WantedBy=sleep.target ----%<-------%<--------%<--------- # systemctl enable wpa_supplicant-resume I'm not sure what the outstanding NEEDINFO request is for. I've provided all the logs and stuff asked for. Still broken on F24. wpa_supplicant-2.5-5.fc24.x86_64 On F24, killing wpa_supplicant does not automatically restart it. You have to use "systemctl restart wpa_supplicant". This problem persists on a variety of laptop models and wifi chipsets. Setting wpa_supplicant to restart on resume from suspend is the best way to make the problem mostly invisible for chipsets with hardware encryption (all modern ones). I turn that off for new Fedora versions to check whether the problem has been fixed. For chipsets without hardware assist, wpa_supplicant dies much more quickly. This problem also manifests on marginal connections. This might be an easier way to reproduce. When the signal level is at something like 30%, wpa_supplicant will hang in less than an hour. This message is a reminder that Fedora 24 is nearing its end of life. Approximately 2 (two) weeks from now Fedora will stop maintaining and issuing updates for Fedora 24. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '24'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 24 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged change the 'version' to a later Fedora version prior this bug is closed as described in the policy above. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. Looks like this will never get fixed. At least I have a decent workaround. Fedora 24 changed to end-of-life (EOL) status on 2017-08-08. Fedora 24 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed. |