Bug 619476

Summary: clurgmgrd segfaults with error 6
Product: [Retired] Red Hat Cluster Suite
Reporter: Shane Bradley <sbradley>
Component: rgmanager
Assignee: Lon Hohberger <lhh>
Status: CLOSED DUPLICATE
QA Contact: Cluster QE <mspqa-list>
Severity: medium
Priority: low
Version: 4
CC: cluster-maint, djansa, edamato, kurt, tao
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2010-10-22 14:32:44 UTC

Description Shane Bradley 2010-07-29 15:56:32 UTC
Description of problem:
rgmanager is generating segfaults during service state changes:


Jul  7 14:10:04 nodeA clurgmgrd[500]: <notice> Service Q01STRS_TH started 
....
Jul  8 05:39:27 nodeA clurgmgrd[500]: <notice> Service Q01STRS_MY started
Jul  8 05:44:45 nodeA clurgmgrd[500]: <err> #48: Unable to obtain cluster lock: Unknown error 65539
Jul  8 05:44:45 nodeA clurgmgrd[500]: <notice> Stopping service Q01STRS_AU
Jul  8 05:44:55 nodeA clurgmgrd[500]: <notice> Service Q01STRS_AU is recovering
Jul  8 05:44:55 nodeA clurgmgrd[500]: <notice> Recovering failed service Q01STRS_AU
Jul  8 05:45:16 nodeA clurgmgrd[500]: <notice> Service Q01STRS_AU started
Jul  8 05:56:06 nodeA kernel: clurgmgrd[15838]: segfault at 000000c000000010 rip 0000003000269b40 rsp 000000007204e900 error 6
Jul  8 05:56:06 nodeA clurgmgrd[499]: <crit> Watchdog: Daemon died, rebooting...
Jul  8 05:56:06 nodeA kernel: md: stopping all md devices.
Jul  8 05:56:06 nodeA kernel: md: md0 switched to read-only mode.
Jul  8 05:59:25 nodeA syslogd 1.4.1: restart (remote reception).
....
Jul  8 06:01:12 nodeA clurgmgrd[506]: <notice> Starting stopped service Q01STRS_TH 
Jul  8 06:01:33 nodeA clurgmgrd[506]: <notice> Service Q01STRS_TH started 

The segfault backtrace looks like:
Program terminated with signal 11, Segmentation fault.
#0  _int_malloc (av=0x3000434640, bytes=<value optimized out>) at malloc.c:4181
4181            bck->fd = bin;

Thread 1 (process 15838):
#0  _int_malloc (av=0x3000434640, bytes=<value optimized out>) at malloc.c:4181
#1  0x000000300026b6d2 in *__GI___libc_malloc (bytes=32) at malloc.c:3346
#2  0x0000000000425028 in clist_insert ()
#3  0x00000000004216bf in msg_open ()
#4  0x000000000041efc6 in vf_write (membership=0x657850, flags=2, keyid=0x7204ec60 "usrm::rg=\"Q01STRS_TH\"", data=0x7204ef20, datalen=104) at vft.c:1315
#5  0x000000000040b515 in set_rg_state (rgname=0x7204efd8 "Q01STRS_TH", svcblk=0x7204ef20) at rg_state.c:306
#6  0x000000000040b595 in init_rg (name=0x7204efd8 "Q01STRS_TH", svcblk=0x7204ef20) at rg_state.c:323
#7  0x000000000040b688 in get_rg_state (rgname=0x7204efd0 "service:Q01STRS_TH", svcblk=0x7204ef20) at rg_state.c:353
#8  0x000000000040c3c7 in svc_status (svcName=0x7204efd0 "service:Q01STRS_TH") at rg_state.c:877
#9  0x0000000000404f10 in resgroup_thread_main (arg=0x414620c0) at rg_thread.c:384
#10 0x0000003527d06137 in start_thread (arg=<value optimized out>) at pthread_create.c:274
#11 0x00000030002c9883 in ?? () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 from /lib64/tls/libc.so.6
Current language:  auto; currently c
#1  0x000000300026b6d2 in *__GI___libc_malloc (bytes=32) at malloc.c:3346
3346      victim = _int_malloc(ar_ptr, bytes);
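
A crash at malloc.c:4181 (bck->fd = bin) is the write glibc performs while
unlinking a chunk from its free lists, so faulting there almost always
means the heap was corrupted earlier by unrelated code (for example, a
write past the end of an allocation, or a write to memory that was already
freed) rather than a bug in malloc itself. As a minimal illustrative
sketch, not the actual rgmanager defect, the following self-contained C
program shows how a write-after-free that tramples a freed chunk's fd/bk
list pointers surfaces later as a segfault inside _int_malloc:

#include <stdlib.h>
#include <string.h>

int main(void)
{
    char *a = malloc(128);
    char *b = malloc(128); /* keeps 'a' from merging with the top chunk */

    /* 'a' goes onto a free list; glibc now keeps its fd/bk list
       pointers in the first 16 bytes of the old user data. */
    free(a);

    /* Bug: write-after-free tramples those fd/bk pointers. */
    memset(a, 0, 16);

    /* A later, unrelated allocation walks the free list, follows the
       poisoned bk pointer, and faults on the bck->fd write, the same
       crash site as frame #0 above. */
    char *c = malloc(128);

    (void)b;
    (void)c;
    return 0;
}

The exact symptom varies with the glibc build: the RHEL 4-era glibc here
typically dies with exactly this kind of user-mode write fault, while
newer glibc releases may abort with a heap-corruption diagnostic instead.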

Version-Release number of selected component (if applicable):
rgmanager-1.9.87-1.el4_8.1-x86_64 


How reproducible:
Not easily; it has only happened a couple of times.

Steps to Reproduce:
1. Appears to happen when a service is changing states
  
Actual results:
clurgmgrd segfaults with error 6

Expected results:
No segfault

Additional info:
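For reference, the "error 6" in the kernel's segfault message is the x86
page-fault error code: bit 0 set means a protection violation (clear means
the page was not present), bit 1 set means the faulting access was a
write, and bit 2 set means it happened in user mode. 6 (binary 110) is
therefore a user-mode write to an unmapped page, which fits malloc writing
through a corrupted free-list pointer at bck->fd = bin. A small decoder
sketch:

#include <stdio.h>

/* Decode the x86 page-fault error code that the kernel prints in
   "segfault at <addr> rip <rip> rsp <rsp> error <N>" messages. */
static void decode_segv_error(unsigned int err)
{
    printf("error %u: %s, %s access, %s mode\n", err,
           (err & 1) ? "protection violation" : "page not present",
           (err & 2) ? "write" : "read",
           (err & 4) ? "user" : "kernel");
}

int main(void)
{
    decode_segv_error(6); /* prints: error 6: page not present, write access, user mode */
    return 0;
}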

Comment 3 Lon Hohberger 2010-09-28 16:10:01 UTC
*** Bug 637263 has been marked as a duplicate of this bug. ***

Comment 5 Lon Hohberger 2010-10-22 14:32:44 UTC
This was fixed some time ago by bug 572695.

Furthermore, it was copied into the z-stream (EUS) as bug 572792.

https://rhn.redhat.com/errata/RHBA-2010-0404.html

*** This bug has been marked as a duplicate of bug 572695 ***