Bug 523069 - wpa_supplicant segfaulted shortly after resume
wpa_supplicant segfaulted shortly after resume
Status: CLOSED CURRENTRELEASE
Product: Fedora
Classification: Fedora
Component: wpa_supplicant (Show other bugs)
12
All Linux
low Severity medium
: ---
: ---
Assigned To: Dan Williams
Fedora Extras Quality Assurance
: Reopened, Triaged
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2009-09-13 17:31 EDT by James
Modified: 2010-02-10 17:16 EST (History)
1 user (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2010-02-10 17:16:58 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:


Attachments (Terms of Use)

  None (edit)
Description James 2009-09-13 17:31:40 EDT
Description of problem:
wpa_supplicant died in mysterious circumstances shortly after resuming from sleep. Had to restart NetworkManager:

wpa_supplicant[1680]: segfault at 18 ip 0000003c5aa75bd0 sp 00007fff573a0050 error 4 in libc-2.10.1.so[3c5aa00000+164000]

Here is the relevant syslog excerpt:

Sep 13 20:00:35 localhost NetworkManager: <info>  Waking up...
Sep 13 20:00:35 localhost NetworkManager: <info>  (eth0): now managed
Sep 13 20:00:35 localhost NetworkManager: <info>  (eth0): device state change: 1 -> 2 (reason 2)
Sep 13 20:00:35 localhost NetworkManager: <info>  (eth0): bringing up device.
Sep 13 20:00:35 localhost NetworkManager: <info>  (eth0): preparing device.
Sep 13 20:00:35 localhost NetworkManager: <info>  (eth0): deactivating device (reason: 2).
Sep 13 20:00:35 localhost kernel: r8169: eth0: link down
Sep 13 20:00:35 localhost kernel: ADDRCONF(NETDEV_UP): eth0: link is not ready
Sep 13 20:00:35 localhost NetworkManager: <info>  (wlan0): now managed
Sep 13 20:00:35 localhost NetworkManager: <info>  (wlan0): device state change: 1 -> 2 (reason 2)
Sep 13 20:00:35 localhost NetworkManager: <info>  (wlan0): bringing up device.
Sep 13 20:00:35 localhost kernel: Registered led device: iwl-phy0::radio
Sep 13 20:00:35 localhost kernel: Registered led device: iwl-phy0::assoc
Sep 13 20:00:35 localhost kernel: Registered led device: iwl-phy0::RX
Sep 13 20:00:35 localhost kernel: Registered led device: iwl-phy0::TX
Sep 13 20:00:35 localhost kernel: ADDRCONF(NETDEV_UP): wlan0: link is not ready
Sep 13 20:00:35 localhost NetworkManager: <info>  (wlan0): preparing device.
Sep 13 20:00:35 localhost NetworkManager: <info>  (wlan0): deactivating device (reason: 2).
Sep 13 20:00:35 localhost kernel: wpa_supplicant[1680]: segfault at 18 ip 0000003c5aa75bd0 sp 00007fff573a0050 error 4 in libc-2.10.1.so[3c5aa00000+164000]
Sep 13 20:00:35 localhost NetworkManager: <WARN>  nm_call_store_remove(): Trying to move a non-existant call id.

This is repeated quite a lot until:

Sep 13 20:00:35 localhost NetworkManager: <WARN>  nm_call_store_remove(): Trying to move a non-existant call id.
Sep 13 20:00:36 localhost dbus: Rejected send message, 3 matched rules; type="method_return", sender=":1.19" (uid=0 pid=1930 comm="/usr/libexec/devkit-power-daemon ") interface="(unset)" member="(unset)" error name="(unset)" requested_reply=0 destination=":1.31" (uid=500 pid=2172 comm="gnome-power-manager "))
Sep 13 20:00:36 localhost ntpd[1722]: sendto(87.232.1.41) (fd=22): Invalid argument
Sep 13 20:00:36 localhost ntpd[1722]: sendto(83.229.210.18) (fd=22): Invalid argument
Sep 13 20:00:36 localhost ntpd[1722]: sendto(91.121.45.45) (fd=22): Invalid argument
Sep 13 20:00:36 localhost ntpd[1722]: Deleting interface #7 wlan0, fe80::21d:e0ff:feb5:201d#123, interface stats: received=0, sent=0, dropped=0, active_time=26535 secs
Sep 13 20:00:36 localhost ntpd[1722]: Deleting interface #8 wlan0, 192.168.1.65#123, interface stats: received=3, sent=3, dropped=3, active_time=26533 secs
Sep 13 20:00:36 localhost NetworkManager: <WARN>  nm_call_store_remove(): Trying to move a non-existant call id.

And there's a lot of the last line following that.

Version-Release number of selected component (if applicable):
NetworkManager-0.7.1-8.git20090708.fc11.x86_64
wpa_supplicant-0.6.8-4.fc11.x86_64

How reproducible:
Don't know, only seen once. Possible spin on bug 516585?
Comment 1 James 2009-09-26 14:35:06 EDT
Apologies for closing this by mistake (wrong browser tab...).
Comment 2 James 2009-10-27 16:36:37 EDT
Additional note:

wpa_supplicant[1159]: segfault at 20000c2 ip 000000000043d107 sp 00007fff86372c80 error 4 in wpa_supplicant[400000+64000]

seen in rawhide, wpa_supplicant-0.6.8-6.fc12.x86_64.
Comment 3 James 2009-11-15 16:12:17 EST
And another: wpa_supplicant[1102]: segfault at 2cd ip 000000000043d124 sp 00007fff9b0428e0 error 4 in wpa_supplicant[400000+64000]. Any idea how I can get a coredump on these?
Comment 4 Dan Williams 2009-11-16 01:58:13 EST
Better yet...  lets just debug it with gdb.  I'm curious to know the cause too.  So in a root terminal or two:

1) service NetworkManager stop
2) killall -TERM wpa_supplicant
3) gdb /usr/sbin/wpa_supplicant
4) handle SIGTERM nostop pass
5) handle SIGCHLD nostop pass
6) run -u
7) service NetworkManager start

then leave that running in the terminal and then try to get it to segfaul.  Just check the terminal after each resume to see if gdb has dropped you back to the (gdb) prompt and said something about a SIGSEGV.  Then, when you're at the gdb prompt, type:

backtrace

and hit return.  Copy all the wpa_supplicant output and paste it (or attach it) to this bug.  Apologies if you're already awesome with gdb and didnt' need this step-by-step.  Thanks!
Comment 5 Bug Zapper 2009-11-16 07:20:26 EST
This bug appears to have been reported against 'rawhide' during the Fedora 12 development cycle.
Changing version to '12'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Comment 6 James 2009-11-16 14:56:53 EST
OK, I've got something. A couple of debuginfos were missing, if you want me to run it again with more installed, let me know.

[root@rhapsody james]#  gdb /usr/sbin/wpa_supplicant
GNU gdb (GDB) Fedora (7.0-3.fc12)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/sbin/wpa_supplicant...Reading symbols from /usr/lib/debug/usr/sbin/wpa_supplicant.debug...done.
done.
(gdb) handle SIGTERM nostop pass
Signal        Stop	Print	Pass to program	Description
SIGTERM       No	Yes	Yes		Terminated
(gdb) handle SIGCHLD nostop pass
Signal        Stop	Print	Pass to program	Description
SIGCHLD       No	No	Yes		Child status changed
(gdb) run -u
Starting program: /usr/sbin/wpa_supplicant -u
[Thread debugging using libthread_db enabled]

ioctl[SIOCSIWSCAN]: Device or resource busy
Failed to initiate AP scan.
ioctl[SIOCGIWSCAN]: Network is down
l2_packet_receive - recvfrom: Network is down
ioctl[SIOCSIWSCAN]: Device or resource busy
Failed to initiate AP scan.
l2_packet_receive - recvfrom: Network is down
ioctl[SIOCSIWSCAN]: Device or resource busy
Failed to initiate AP scan.
CTRL-EVENT-DISCONNECTED - Disconnect event - remove keys
l2_packet_receive - recvfrom: Network is down
ioctl[SIOCSIWMLME]: Invalid argument
ioctl[SIOCGIWMODE]: Bad file descriptor
ioctl[SIOCSIWAP]: Bad file descriptor
ioctl[SIOCSIWESSID]: Bad file descriptor
ioctl[SIOCSIWSCAN]: Device or resource busy
Failed to initiate AP scan.

Program received signal SIGSEGV, Segmentation fault.
wpa_supplicant_scan (eloop_ctx=0x66b400, timeout_ctx=0x0) at scan.c:88
88			if (!ssid->disabled) {
Missing separate debuginfos, use: debuginfo-install keyutils-libs-1.2-6.fc12.x86_64 krb5-libs-1.7-8.fc12.x86_64 libcap-ng-0.6.2-3.fc12.x86_64 libcom_err-1.41.9-5.fc12.x86_64 libselinux-2.0.87-1.fc12.x86_64
(gdb) 
(gdb) backtrace
#0  wpa_supplicant_scan (eloop_ctx=0x66b400, timeout_ctx=0x0) at scan.c:88
#1  0x000000000040b72b in eloop_run () at ../src/utils/eloop.c:496
#2  0x0000000000436807 in wpa_supplicant_run (global=0x668730)
    at wpa_supplicant.c:2155
#3  0x000000000043d8a8 in main (argc=<value optimized out>, 
    argv=<value optimized out>) at main.c:253
(gdb) quit
A debugging session is active.

	Inferior 1 [process 12496] will be killed.

Quit anyway? (y or n) y
Comment 7 Dan Williams 2009-11-29 18:03:44 EST
Any chance you could do this again and see if it consistently crashes at the same place?  And if you can get it to crash in the same place again, can you do this right after you do the 'backtrace' command?

print ssid

Thanks!
Comment 8 James 2009-12-30 16:05:16 EST
Dan, I'm not seeing the crash much at the moment; in fact I've seen nothing about wpa_supplicant segfaulting in the recent logs (they go back to 29 Nov 2009). I know it was updated to version 0.6.8-7 from -6 on 13 Dec 2009, could the changes therein have solved this? The only other thing I can think of is kernel updates.
Comment 9 James 2010-02-10 17:16:58 EST
I've not seen this in a while, I'll close it for the time being.

Note You need to log in before you can comment on or make changes to this bug.