Bug 983296

Summary: rgmanager segfault while starting service
Product: Red Hat Enterprise Linux 6 Reporter: Chester Knapp <cknapp>
Component: rgmanager Assignee: Ryan McCabe <rmccabe>
Status: CLOSED ERRATA QA Contact: Cluster QE <mspqa-list>
Severity: high Docs Contact:
Priority: high    
Version: 6.2 CC: cknapp, cluster-maint, cphillip, djansa, mjuricek, rmccabe, wbirkhea
Target Milestone: rc Keywords: OtherQA
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: rgmanager-3.0.12.1-18.el6 Doc Type: Bug Fix
Doc Text:
* Previously, attempts to start an MRG Messaging (MRG-M) broker caused rgmanager to terminate unexpectedly with a segmentation fault. This was caused by subtle memory corruption introduced by calling pthread_mutex_unlock() on a mutex that was not locked. This update addresses scenarios where memory could be corrupted when calling pthread_mutex_unlock(), and crashes no longer occur in the described scenario.
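The failure mode described above can be illustrated with a minimal sketch. It is not taken from rgmanager or from the actual fix; the function names are hypothetical and exist only for illustration. With a default mutex, unlocking a mutex that is not locked is undefined behavior under POSIX, which is how silent corruption can arise. An error-checking mutex (PTHREAD_MUTEX_ERRORCHECK) instead reports the unbalanced unlock with an error code, which is one way such bugs might be caught during testing:

```c
#define _XOPEN_SOURCE 700   /* exposes PTHREAD_MUTEX_ERRORCHECK under strict -std modes */
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper: initialise an error-checking mutex.
 * Returns 0 on success, -1 if any setup step fails. */
static int init_errorcheck_mutex(pthread_mutex_t *mutex)
{
    pthread_mutexattr_t attr;
    int rc;

    if (pthread_mutexattr_init(&attr) != 0)
        return -1;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0)
        rc = pthread_mutex_init(mutex, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc == 0 ? 0 : -1;
}

/* Unlock a mutex that was never locked and return what
 * pthread_mutex_unlock() reports (-1 if setup failed). */
int unlock_without_lock(void)
{
    pthread_mutex_t mutex;
    int rc;

    if (init_errorcheck_mutex(&mutex) != 0)
        return -1;
    rc = pthread_mutex_unlock(&mutex);   /* the unbalanced unlock */
    pthread_mutex_destroy(&mutex);
    return rc;
}

/* Balanced lock/unlock on the same kind of mutex, for comparison. */
int lock_then_unlock(void)
{
    pthread_mutex_t mutex;
    int rc;

    if (init_errorcheck_mutex(&mutex) != 0)
        return -1;
    rc = pthread_mutex_lock(&mutex);
    if (rc == 0)
        rc = pthread_mutex_unlock(&mutex);
    pthread_mutex_destroy(&mutex);
    return rc;
}

/* Small self-check: the unbalanced unlock is rejected with EPERM,
 * while the balanced pair succeeds. */
void self_check(void)
{
    assert(unlock_without_lock() == EPERM);
    assert(lock_then_unlock() == 0);
}
```

Calling self_check() from a small main() and building with something like `cc -pthread` should be enough to try this. The error-checking type is a diagnostic aid only; the underlying remedy is still to make every unlock match a preceding lock by the same thread.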
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-11-21 10:56:10 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description               Flags
abrt output               none
another abrt core dump    none

Description Chester Knapp 2013-07-10 22:13:56 UTC
Created attachment 771875 [details]
abrt output (core dump)

Description of problem: rgmanager dumps core while starting up an HA MRG-M broker


Version-Release number of selected component (if applicable):
rgmanager-3.0.12.1-17.el6.x86_64

How reproducible:
Only once, so far


Steps to Reproduce:
1. Attempt to start MRG broker (e.g.  /usr/sbin/clusvcadm -e BR.0 -m hostname)

Actual results:
rgmanager crashes

Expected results:
rgmanager starts services normally

Additional info:
uname -a: 
Linux omhq1adf 2.6.32-220.7.1.el6.x86_64 #1 SMP Fri Feb 10 15:22:22 EST 2012 x86_64 x86_64 x86_64 GNU/Linux

Comment 2 Ryan McCabe 2013-07-12 19:50:31 UTC
Hopefully this is fixed by cluster.git commit 156a32063baa99fefdb445c68111b174dc06753e

Comment 3 Chester Knapp 2013-07-16 13:57:33 UTC
Created attachment 774266 [details]
abrt output

Comment 4 Chester Knapp 2013-07-16 13:58:52 UTC
Created attachment 774281 [details]
another abrt core dump

I've added two additional core dumps after seeing this behavior repeated. Consider raising the reported frequency of the defect.

Comment 5 Ryan McCabe 2013-07-16 15:21:20 UTC
Are you able to try a test package with a proposed fix?

Comment 6 Chester Knapp 2013-07-16 16:43:44 UTC
I'll contact the customer to determine whether this is feasible, although I don't have a reliable reproducer either way. rgmanager has cored 5 times out of several hundred over the last week, so it is still intermittent at best.

Comment 7 Chester Knapp 2013-07-16 16:44:54 UTC
(In reply to Ryan McCabe from comment #5)
> Are you able to try a test package with a proposed fix?

Yes, the customer is agreeable. We'll try out the test package you provide.

Comment 8 Ryan McCabe 2013-07-17 15:22:34 UTC
(In reply to Chester Knapp from comment #7)
> (In reply to Ryan McCabe from comment #5)
> > Are you able to try a test package with a proposed fix?
> 
> Yes, the customer is agreeable. We'll try out the test package you provide.

You can grab the build from https://brewweb.devel.redhat.com/buildinfo?buildID=282334

Let me know how it goes.


Thanks for testing!

Comment 11 Chester Knapp 2013-08-27 15:15:16 UTC
Customer opted not to use the test package.

Comment 16 errata-xmlrpc 2013-11-21 10:56:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1600.html