Bug 538717 - NetworkManager crashes on resume
Summary: NetworkManager crashes on resume
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: NetworkManager
Version: 12
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Dan Williams
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 541476 579821
Depends On:
Blocks:
Reported: 2009-11-19 07:25 UTC by Gordon Messmer
Modified: 2013-01-11 04:54 UTC
CC List: 6 users

Fixed In Version: NetworkManager-0.8.0-7.git20100422.fc13
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-04-23 06:03:26 UTC
Type: ---
Embargoed:


Attachments
crash backtrace (20.09 KB, text/plain)
2009-11-19 07:25 UTC, Gordon Messmer
Syslog output from when my laptop was resumed (13.89 KB, text/plain)
2009-11-19 16:31 UTC, Tore Anderson
Backtrace as generated by abrt (168.07 KB, text/plain)
2010-04-12 01:29 UTC, Jonathan Larmour

Description Gordon Messmer 2009-11-19 07:25:18 UTC
Created attachment 370288
crash backtrace

Description of problem:
When resuming from suspend, NetworkManager sometimes crashes. The following line appears in /var/log/messages roughly 100,000 times:

Nov 18 10:30:57 vagabond NetworkManager: <WARN>  nm_call_store_remove(): Trying to remove a non-existant call id.

The backtrace appears to show NetworkManager recursing until it crashes, but I haven't examined it closely enough to be sure.

Version-Release number of selected component (if applicable):
NetworkManager-0.7.996-6.git20091021.fc12.x86_64

As far as I know, this problem affects all of the NetworkManager releases under F11 and F12.

How reproducible:
Frequent.

Steps to Reproduce:
1. Suspend
2. Resume
3. Repeat
  
Actual results:
Eventually NetworkManager crashes and the nm-applet icon disappears from the GNOME panel. Restarting the NetworkManager service brings the icon back.

More info:
I've attached a gdb backtrace with about 41 MB of repetition removed; every removed line is identical to the repeating lines still shown in the backtrace. If it would help, I can provide the full backtrace (41 MB) and the core file (26 MB).

Comment 1 Tore Anderson 2009-11-19 16:30:06 UTC
I've also seen this crash a few times since upgrading from F11 to F12. I never saw it under F11, but I also switched from x86 to x86_64 during the upgrade, so it may affect only 64-bit systems. The hardware did not change; see my Smolt profile at <http://www.smolts.org/show?uuid=pub_42d1d513-8f1c-4746-9d6b-7301d0ba27aa>.

The system log also reports a crash in wpa_supplicant. I don't know whether it is a cause or a consequence of the NM crash.

Tore

Comment 2 Tore Anderson 2009-11-19 16:31:54 UTC
Created attachment 371261
Syslog output from when my laptop was resumed

The log has been filtered through "uniq -c" to reduce its size and make it more readable.

Comment 3 Gordon Messmer 2009-11-20 07:05:39 UTC
Thanks, Tore. I'd completely missed that in the messages log. Now that you point it out, I see the same thing. abrt also saved the core file from wpa_supplicant. I've opened bug 539438 to track that one.

Comment 4 Matthias Clasen 2009-12-07 05:24:14 UTC
Looks like unexpected recursion in nm_supplicant_info_destroy?

Comment 5 Dan Williams 2010-02-09 02:03:08 UTC
*** Bug 541476 has been marked as a duplicate of this bug. ***

Comment 6 Jonathan Larmour 2010-04-12 01:29:41 UTC
Created attachment 405876
Backtrace as generated by abrt

I've seen this as well and, like the others, can confirm that the trigger seems to be wpa_supplicant dying (due to a separate wpa_supplicant bug). NM should cope with that, but doesn't.

And like the others, /var/log/messages has many lines of the following form:
Apr 12 01:47:51 lert NetworkManager: <WARN>  nm_call_store_remove(): Trying to remove a non-existant call id.

For reference, here is what /var/log/messages showed while NM was bringing up the connection (before wpa_supplicant died); the state changes are similar to Tore's:
Apr 12 01:47:49 lert NetworkManager: <info>  Waking up...
Apr 12 01:47:49 lert NetworkManager: <info>  (eth1): now managed
Apr 12 01:47:49 lert NetworkManager: <info>  (eth1): device state change: 1 -> 2 (reason 2)
Apr 12 01:47:49 lert NetworkManager: <info>  (eth1): bringing up device.
Apr 12 01:47:49 lert NetworkManager: <info>  (eth1): preparing device.
Apr 12 01:47:49 lert NetworkManager: <info>  (eth1): deactivating device (reason: 2).
Apr 12 01:47:49 lert NetworkManager: <info>  (eth0): now managed
Apr 12 01:47:49 lert NetworkManager: <info>  (eth0): device state change: 1 -> 2 (reason 2)
Apr 12 01:47:49 lert NetworkManager: <info>  (eth0): bringing up device.
Apr 12 01:47:49 lert NetworkManager: <info>  (eth0): preparing device.
Apr 12 01:47:49 lert NetworkManager: <info>  (eth0): deactivating device (reason: 2).
Apr 12 01:47:49 lert kernel: ADDRCONF(NETDEV_UP): eth0: link is not ready
Apr 12 01:47:49 lert kernel: wpa_supplicant[1469]: segfault at 18 ip 00007fa1a3a70b94 sp 00007fffccdb5390 error 4 in libc-2.11.1.so[7fa1a39fc000+16f000]
Apr 12 01:47:50 lert abrt[3804]: saved core dump of pid 1469 (/usr/sbin/wpa_supplicant) to /var/cache/abrt/ccpp-1271033269-1469.new/coredump (897024 bytes)
Apr 12 01:47:50 lert abrtd: Directory 'ccpp-1271033269-1469' creation detected
Apr 12 01:47:50 lert NetworkManager: <WARN>  nm_call_store_remove(): Trying to remove a non-existant call id.
[ repeated many times ]

In normal operation, I would instead have expected:
NetworkManager: <info>  (eth1): supplicant interface state:  starting -> ready

In case it helps, I am also attaching the backtrace generated by abrt. It has the slight advantage of including some local variables, although to my untutored eye they may not be useful in practice.

Comment 7 Dan Williams 2010-04-14 22:24:32 UTC
Upstream fix is 5a01a0b39e634e2cf3c378deb73f15b16645b76e.

Comment 8 Fedora Update System 2010-04-22 23:55:06 UTC
NetworkManager-0.8.0-7.git20100422.fc13 has been submitted as an update for Fedora 13.
http://admin.fedoraproject.org/updates/NetworkManager-0.8.0-7.git20100422.fc13

Comment 9 Fedora Update System 2010-04-23 06:03:13 UTC
NetworkManager-0.8.0-7.git20100422.fc13 has been pushed to the Fedora 13 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 10 Dan Williams 2010-04-26 06:10:31 UTC
*** Bug 579821 has been marked as a duplicate of this bug. ***

Comment 11 Fedora Update System 2010-05-07 05:35:08 UTC
NetworkManager-0.7.2.997-1.fc11 has been submitted as an update for Fedora 11.
http://admin.fedoraproject.org/updates/NetworkManager-0.7.2.997-1.fc11

Comment 12 Jonathan Larmour 2010-05-18 19:45:05 UTC
Hmm, since both F13 and F11 have been fixed, has F12 been forgotten? I'm writing because it has just happened again (NetworkManager-0.8.0-6.git20100408.fc12.x86_64). I can send an abrt log if this version is supposed to contain the fix.

Comment 13 Fedora Update System 2010-05-18 21:48:35 UTC
NetworkManager-0.7.2.997-1.fc11 has been pushed to the Fedora 11 stable repository.  If problems still persist, please make note of it in this bug report.

