Bug 865009

Summary: GString mem alloc crashes after dbus op
Product: [Fedora] Fedora
Reporter: Tomáš Bžatek <tbzatek>
Component: NetworkManager
Assignee: Dan Williams <dcbw>
Status: CLOSED CURRENTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 18
CC: awilliam, danw, dcbw, dcharlespyle, jklimes, kparal, mikhail.v.gavrilov, psimerda, robatino, sergei.litvinenko, tsmetana
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard: AcceptedNTH
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-10-22 12:41:41 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 752660, 752664    
Attachments:
Description: backtrace; Flags: none
Description: backtrace; Flags: none
Description: valgrind log, try #1; Flags: none
Description: valgrind log, try #2, with G_SLICE=always-malloc; Flags: none
Description: Free GValueArray with g_value_array_free (); Flags: none

Description Tomáš Bžatek 2012-10-10 15:39:19 UTC
This might not be a NetworkManager issue after all; take it as a starting point. Potential candidates include dbus-glib and glib2.

Description of problem:
NetworkManager segfaults.

Version-Release number of selected component (if applicable):
NetworkManager-0.9.7.0-4.git20121004.fc18.i686
dbus-glib-0.100-1.fc18.i686
glib2-2.34.0-1.fc18.i686
kernel-PAE-3.6.1-1.fc18.i686

How reproducible:
always

Steps to Reproduce:
1. Start GNOME with GNOME Shell
2. Run: gnome-control-center network
3. Watch the NM daemon crash

Comment 1 Tomáš Bžatek 2012-10-10 15:39:42 UTC
Created attachment 624964 [details]
backtrace

Comment 2 Tomáš Bžatek 2012-10-10 15:40:05 UTC
Created attachment 624965 [details]
backtrace

Comment 3 Tomáš Bžatek 2012-10-10 15:48:22 UTC
Forgot to add this is a clean F18 (Alpha) install, i686 32-bit PAE mode, SELinux in enforcing mode.

One physical Ethernet NIC.

> # ip link sh
> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN mode DEFAULT 
>     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> 2: p2p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000
>     link/ether 00:xx:xx:xx:xx:92 brd ff:ff:ff:ff:ff:ff

Comment 4 Tomáš Bžatek 2012-10-10 15:50:12 UTC
Both traces are from /usr/sbin/NetworkManager processes. Since the daemon is started on demand, I used the following script to attach to a newly started instance:

> (while [[ -z $(pidof NetworkManager) ]]; do sleep 1; done) && gdb /usr/sbin/NetworkManager "$(pidof NetworkManager)"

Comment 5 Tomáš Bžatek 2012-10-10 15:57:24 UTC
Created attachment 624981 [details]
valgrind log, try #1

Valgrind log, still using GSlice; I will try to capture a better one.

Comment 6 Tomáš Bžatek 2012-10-10 16:04:18 UTC
Okay, when running the daemon with G_SLICE=always-malloc set, it no longer seems to crash. An allocator fault, perhaps?

Comment 7 Tomáš Bžatek 2012-10-10 16:09:02 UTC
Created attachment 624985 [details]
valgrind log, try #2, with G_SLICE=always-malloc

Another valgrind log; this time the process didn't crash and was terminated manually with Ctrl+C. Attaching just for curiosity and comparison.

Comment 8 Dan Williams 2012-10-10 19:59:11 UTC
Thanks; likely a mixup of g_array_unref() used on a GValueArray instead of g_value_array_unref() in src/nm-dispatcher.c.

Comment 9 Dan Williams 2012-10-10 19:59:32 UTC
(In reply to comment #8)
> Thanks; likely a mixup of g_array_unref() used on a GValueArray instead of
> g_value_array_unref() in src/nm-dispatcher.c.

By which I mean g_value_array_free().

Comment 10 Jirka Klimes 2012-10-11 12:32:46 UTC
Created attachment 625498 [details]
Free GValueArray with g_value_array_free ()

Use g_value_array_free () as Dan suggests. g_array_unref () seems to corrupt the heap.

Comment 11 Jirka Klimes 2012-10-11 12:43:04 UTC
I'm not sure about the other "Invalid read" in foreach_route_cb (nm-netlink-utils.c:410). The code looks OK at first glance. Maybe nl_addr_get_binary_addr () returns a bad addr?
Anyway, this one shouldn't cause any harm.

Comment 12 Dan Winship 2012-10-11 12:45:01 UTC
looks good

Comment 13 Tomáš Bžatek 2012-10-11 13:39:18 UTC
(In reply to comment #10)
> Created attachment 625498 [details]
> Free GValueArray with g_value_array_free ()
> 
> Use g_value_array_free () as Dan suggests. g_array_unref () seems to make
> harm to heap.

Looks like this did the trick, thanks! I can't seem to make it crash again with the patch applied.

Comment 14 Jirka Klimes 2012-10-11 13:43:59 UTC
Patch from comment #10 pushed to upstream master:
b95b6c8aa1b2e2d6a662e93843e50b50d5a9c6c6

Comment 15 Fedora Update System 2012-10-15 12:39:40 UTC
NetworkManager-0.9.7.0-6.git20121004.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/NetworkManager-0.9.7.0-6.git20121004.fc18

Comment 16 Dan Winship 2012-10-15 15:40:41 UTC
*** Bug 865042 has been marked as a duplicate of this bug. ***

Comment 17 Fedora Update System 2012-10-15 17:36:54 UTC
Package NetworkManager-0.9.7.0-6.git20121004.fc18:
* should fix your issue,
* was pushed to the Fedora 18 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing NetworkManager-0.9.7.0-6.git20121004.fc18'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-16127/NetworkManager-0.9.7.0-6.git20121004.fc18
then log in and leave karma (feedback).

Comment 18 Jirka Klimes 2012-10-16 06:56:40 UTC
*** Bug 863544 has been marked as a duplicate of this bug. ***

Comment 19 Jirka Klimes 2012-10-16 10:58:57 UTC
*** Bug 866434 has been marked as a duplicate of this bug. ***

Comment 20 Kamil Páral 2012-10-16 11:36:28 UTC
Jirka, please transfer the Blocks field. Doing that now.

Comment 21 Jirka Klimes 2012-10-17 08:20:52 UTC
*** Bug 864300 has been marked as a duplicate of this bug. ***

Comment 22 Kamil Páral 2012-10-17 10:59:23 UTC
For anyone reproducing this - please note that the issue might be tied just to i386 systems.

Comment 23 Adam Williamson 2012-10-17 18:07:05 UTC
Discussed at 2012-10-17 blocker review meeting: http://meetbot.fedoraproject.org/fedora-qa/2012-10-17/f18beta-blocker-review-4.2012-10-17-16.00.log.txt . Note that this bug can potentially affect installation - see #866434 - which is why it's proposed as a blocker.

We agreed the bug constitutes a conditional violation of the criteria: when it affects installation, the install crashes, which is obviously against the criteria. However, it seems to affect only 32-bit installs and does so only occasionally: 2 in 5 tries for kparal, but 1 in 20 tries for Jirka. On the basis that you can just restart and try again, and you should be able to get it to go through after a couple of tries, this is rejected as a blocker. It is accepted as NTH, as install crashers are obviously worth fixing post-freeze.

If we get data indicating this might be more than just an occasional problem, it can be re-proposed as a blocker, but it's very likely to be fixed in future builds anyway since it's been accepted as NTH.

Comment 24 Kamil Páral 2012-10-18 08:50:38 UTC
I'm sorry I couldn't attend yesterday's blocker bug meeting. But I have to re-propose this as a Beta blocker, because this is one of the biggest blockers I have ever seen. I have learned that this manifests only on 32-bit systems, and that's the reason why my colleagues weren't able to reproduce it. But I performed about 30 Anaconda boots yesterday on two different 32-bit bare metal machines and I see an 80-90% failure rate. Moreover, in the remaining 10-20% of cases where it successfully boots to the installer, it's even worse, because it causes so many weird things to happen. Just look at the duplicates of bug 866434: no network devices, installation hangs, invalid hostname, etc.

Comment 25 Adam Williamson 2012-10-18 21:21:59 UTC
On that basis I'm changing my vote to +1 blocker.

Comment 26 Adam Williamson 2012-10-18 21:23:33 UTC
Well, the update has gone stable now, so we could in fact close this bug. But let's wait for TC5 to verify that the fix is good. Kamil, can you re-test with TC5 - which should include the fixed NM - and close this bug if it works OK? Thanks!

Comment 27 D. Charles Pyle 2012-10-20 02:35:22 UTC
"For anyone reproducing this - please note that the issue might be tied just to i386 systems."

My system is x86_64. I have seen this problem and was added to the CC list by abrt accordingly.

Comment 28 Kamil Páral 2012-10-22 12:41:41 UTC
I have done many boots of Beta TC6 and I'm pretty sure this bug no longer affects the Anaconda installer. Closing.

Comment 29 Jirka Klimes 2012-10-31 13:08:10 UTC
*** Bug 863554 has been marked as a duplicate of this bug. ***