Bug 825375

Summary: dbus-related crash in rgmanager
Product: Red Hat Enterprise Linux 6
Reporter: Ryan McCabe <rmccabe>
Component: rgmanager
Assignee: Ryan McCabe <rmccabe>
Status: CLOSED ERRATA
QA Contact: Cluster QE <mspqa-list>
Severity: high
Docs Contact:
Priority: high
Version: 6.3
CC: cluster-maint, jruemker, mjuricek, pmoravec
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: rgmanager-3.0.12.1-13.el6
Doc Type: Bug Fix
Last Closed: 2013-02-21 10:18:16 UTC
Type: Bug
Attachments:
  Description: fix
  Flags: none

Description Ryan McCabe 2012-05-25 20:16:03 UTC
When running rgmanager without the -q flag (the default), rgmanager can crash inside dbus library functions as a result of unlocked access to internal dbus data structures from different rgmanager threads.

I've observed the following crash (and others similar to it):

process 26806: The last reference on a connection was dropped without closing the connection. This is a bug in an application. See dbus_connection_unref() documentation for details.
Most likely, the application was supposed to call dbus_connection_close(), since this is a private connection.
  D-Bus not built with -rdynamic so unable to print a backtrace
Aborted (core dumped)

(gdb) bt
#0  0x00007ff38ceed8a5 in raise () from /lib64/libc.so.6
#1  0x00007ff38ceef085 in abort () from /lib64/libc.so.6
#2  0x00007ff38d47f975 in _dbus_abort () at dbus-sysdeps.c:88
#3  0x00007ff38d47b845 in _dbus_warn_check_failed (
    format=0x7ff38d484388 "The last reference on a connection was dropped without closing the connection. This is a bug in an application. See dbus_connection_unref() documentation for details.\n%s") at dbus-internals.c:283
#4  0x00007ff38d465c62 in _dbus_connection_read_write_dispatch (
    connection=0xe573a0, timeout_milliseconds=500, 
    dispatch=<value optimized out>) at dbus-connection.c:3512
#5  0x000000000041b261 in ?? ()
#6  0x00007ff38d8a1851 in start_thread (arg=0x7ff38bea3700)
    at pthread_create.c:301
#7  0x00007ff38cfa267d in clone () from /lib64/libc.so.6

This can be reproduced by repeatedly relocating and restarting services. I was able to reproduce it fairly reliably (albeit after a couple of hours in some cases) by using the following configuration snippet:
	<rm>
		<service name="a"/>
		<service name="b"/>
	</rm>

and running the following commands at the same time on two nodes:
 while [ 1 ] ; do clusvcadm -r a ; done
 while [ 1 ] ; do clusvcadm -R a ; done
 while [ 1 ] ; do clusvcadm -r b ; done
 while [ 1 ] ; do clusvcadm -R b ; done
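
For illustration only, the kind of unlocked cross-thread access described above can be avoided by serializing every use of the shared handle behind one mutex, so that no thread can release the handle while another is still dispatching on it. The sketch below is not the attached fix and does not use libdbus: `struct fake_conn`, `dispatch_once`, `teardown`, and `run_demo` are hypothetical stand-ins built only on POSIX threads.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a shared connection handle (not a libdbus type). */
struct fake_conn {
    int open;        /* 1 while the connection is usable */
    long dispatched; /* number of dispatch calls made on it */
};

static pthread_mutex_t conn_lock = PTHREAD_MUTEX_INITIALIZER;
static struct fake_conn *conn;   /* shared between threads; guarded by conn_lock */

/* Dispatch once under the lock. Returns 1 on success, 0 if the handle is gone. */
static int dispatch_once(void)
{
    int ok = 0;

    pthread_mutex_lock(&conn_lock);
    if (conn != NULL && conn->open) {
        conn->dispatched++;
        ok = 1;
    }
    pthread_mutex_unlock(&conn_lock);
    return ok;
}

/* Close the handle first, then release it, all under the same lock. */
static void teardown(void)
{
    pthread_mutex_lock(&conn_lock);
    if (conn != NULL) {
        conn->open = 0;
        free(conn);
        conn = NULL;
    }
    pthread_mutex_unlock(&conn_lock);
}

/* Worker: keep dispatching until the handle has been torn down. */
static void *dispatch_thread(void *arg)
{
    long *count = arg;   /* per-thread counter, not shared */

    while (dispatch_once())
        (*count)++;
    return NULL;
}

/* Run two dispatchers against one handle, then tear it down. Returns 0 on success. */
int run_demo(void)
{
    pthread_t workers[2];
    long counts[2] = {0, 0};
    int started = 0;
    int i;

    conn = calloc(1, sizeof(*conn));
    if (conn == NULL)
        return -1;
    conn->open = 1;

    for (i = 0; i < 2; i++) {
        if (pthread_create(&workers[i], NULL, dispatch_thread, &counts[i]) != 0)
            break;
        started++;
    }

    /* The calling thread also uses the handle, as a second code path would. */
    for (i = 0; i < 1000; i++)
        dispatch_once();

    teardown();
    for (i = 0; i < started; i++)
        pthread_join(workers[i], NULL);

    if (started != 2)
        return -1;
    /* After teardown, further dispatch attempts must fail cleanly. */
    return dispatch_once() ? -1 : 0;
}
```

Calling `run_demo()` from a `main` function and linking with `-pthread` exercises the pattern. Because the open check, the dispatch, and the release all happen under `conn_lock`, a dispatcher either sees a live handle or sees none at all.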

Comment 1 Ryan McCabe 2012-05-25 20:17:58 UTC
Created attachment 586943
fix

Comment 6 errata-xmlrpc 2013-02-21 10:18:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0409.html

Comment 7 Ryan McCabe 2013-06-03 17:17:31 UTC
*** Bug 970017 has been marked as a duplicate of this bug. ***

Comment 8 Ryan McCabe 2013-06-04 13:09:26 UTC
*** Bug 970550 has been marked as a duplicate of this bug. ***

Comment 9 Ryan McCabe 2013-06-12 12:09:11 UTC
*** Bug 970018 has been marked as a duplicate of this bug. ***