Bug 983296

Summary:

rgmanager segfault while starting service

Product:

Red Hat Enterprise Linux 6

Reporter:

Chester Knapp <cknapp>

Component:

rgmanager

Assignee:

Ryan McCabe <rmccabe>

Status:

CLOSED ERRATA

QA Contact:

Cluster QE <mspqa-list>

Severity:

high

Docs Contact:

Priority:

high

Version:

6.2

CC:

cknapp, cluster-maint, cphillip, djansa, mjuricek, rmccabe, wbirkhea

Target Milestone:

Keywords:

OtherQA

Target Release:

---

Hardware:

x86_64

OS:

Linux

Whiteboard:

Fixed In Version:

rgmanager-3.0.12.1-18.el6

Doc Type:

Bug Fix

Doc Text:

* Previously, attempts to start an MRG Messaging (MRG-M) broker caused rgmanager to terminate unexpectedly with a segmentation fault. The crash was caused by subtle memory corruption introduced by calling pthread_mutex_unlock() on a mutex that was not locked. This update addresses scenarios where memory could be corrupted when calling pthread_mutex_unlock(), and crashes no longer occur in the described scenario.
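
As an illustration of the class of bug described in the Doc Text, here is a minimal sketch of how an unlock of a mutex that is not locked can be made detectable. It is not the rgmanager fix and is not taken from the referenced commit. The helper names (init_errorcheck_mutex, unlock_without_lock, lock_then_unlock) are hypothetical. The sketch relies on the POSIX PTHREAD_MUTEX_ERRORCHECK mutex type, for which pthread_mutex_unlock() returns an error instead of having undefined behavior when the calling thread does not hold the mutex.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h> /* only needed when checking results with assert(), as in the note below */
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper: initialise *m as an error-checking mutex.
 * Returns 0 on success or a pthread error number on failure. */
int init_errorcheck_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;

    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);

    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Unlock a mutex that was never locked and return what
 * pthread_mutex_unlock() reports. With an error-checking mutex the
 * call fails cleanly; with a default mutex the behavior is undefined. */
int unlock_without_lock(void)
{
    pthread_mutex_t m;
    int rc = init_errorcheck_mutex(&m);
    if (rc != 0)
        return rc;

    rc = pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return rc;
}

/* The correct pairing: lock, then unlock. Returns 0 on success. */
int lock_then_unlock(void)
{
    pthread_mutex_t m;
    int rc = init_errorcheck_mutex(&m);
    if (rc != 0)
        return rc;

    rc = pthread_mutex_lock(&m);
    if (rc == 0)
        rc = pthread_mutex_unlock(&m);

    pthread_mutex_destroy(&m);
    return rc;
}
```

A caller could check assert(unlock_without_lock() == EPERM) and assert(lock_then_unlock() == 0). On older glibc versions the program may need to be built with -pthread. Error-checking mutexes add a small cost per operation, so this is more of a debugging aid than a substitute for correct lock/unlock pairing.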

Story Points:

---

Clone Of:

Environment:

Last Closed:

2013-11-21 10:56:10 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Attachments:

Description	Flags
abrt output	none
another abrt core dump	none

Description Chester Knapp 2013-07-10 22:13:56 UTC

Created attachment 771875 [details]
abrt output (core dump)

Description of problem: rgmanager dumps core while starting up an HA MRG-M broker


Version-Release number of selected component (if applicable):
rgmanager-3.0.12.1-17.el6.x86_64

How reproducible:
Only once, so far


Steps to Reproduce:
1. Attempt to start the MRG broker (e.g. /usr/sbin/clusvcadm -e BR.0 -m hostname)

Actual results:
rgmanager crashes

Expected results:
rgmanager starts services normally

Additional info:
uname -a: 
Linux omhq1adf 2.6.32-220.7.1.el6.x86_64 #1 SMP Fri Feb 10 15:22:22 EST 2012 x86_64 x86_64 x86_64 GNU/Linux

Comment 2 Ryan McCabe 2013-07-12 19:50:31 UTC

Hopefully this is fixed by cluster.git commit 156a32063baa99fefdb445c68111b174dc06753e

Comment 3 Chester Knapp 2013-07-16 13:57:33 UTC

Created attachment 774266 [details]
abrt output

Comment 4 Chester Knapp 2013-07-16 13:58:52 UTC

Created attachment 774281 [details]
another abrt core dump

I've added two additional core dumps after seeing this behavior repeated. Consider raising the recorded frequency of this defect.

Comment 5 Ryan McCabe 2013-07-16 15:21:20 UTC

Are you able to try a test package with a proposed fix?

Comment 6 Chester Knapp 2013-07-16 16:43:44 UTC

I'll contact the customer to determine whether this is feasible, although I don't have a reliable reproducer either way. rgmanager has dumped core 5 times out of several hundred over the last week, so it is still intermittent at best.

Comment 7 Chester Knapp 2013-07-16 16:44:54 UTC

(In reply to Ryan McCabe from comment #5)
> Are you able to try a test package with a proposed fix?

Yes, the customer is agreeable. We'll try out the test package you provide.

Comment 8 Ryan McCabe 2013-07-17 15:22:34 UTC

(In reply to Chester Knapp from comment #7)
> (In reply to Ryan McCabe from comment #5)
> > Are you able to try a test package with a proposed fix?
> 
> Yes, the customer is agreeable. We'll try out the test package you provide.

You can grab the build from https://brewweb.devel.redhat.com/buildinfo?buildID=282334

Let me know how it goes.


Thanks for testing!

Comment 11 Chester Knapp 2013-08-27 15:15:16 UTC

Customer opted not to use the test package.

Comment 16 errata-xmlrpc 2013-11-21 10:56:10 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1600.html