619476 – clurgmgrd segfaults with error 6

Summary: clurgmgrd segfaults with error 6

Keywords:
Status:	CLOSED DUPLICATE of bug 572695
Alias:	None
Product:	Red Hat Cluster Suite
Classification:	Retired
Component:	rgmanager
Sub Component:
Version:	4
Hardware:	All
OS:	Linux
Priority:	low
Severity:	medium
Target Milestone:	---
Assignee:	Lon Hohberger
QA Contact:	Cluster QE
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	637263
Depends On:
Blocks:

Reported:	2010-07-29 15:56 UTC by Shane Bradley
Modified:	2018-10-27 13:34 UTC
CC List:	5 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2010-10-22 14:32:44 UTC
Embargoed:

Description Shane Bradley 2010-07-29 15:56:32 UTC

Description of problem:
rgmanager is generating segfaults on service state change:


Jul  7 14:10:04 nodeA clurgmgrd[500]: <notice> Service Q01STRS_TH started 
....
Jul  8 05:39:27 nodeA clurgmgrd[500]: <notice> Service Q01STRS_MY started
Jul  8 05:44:45 nodeA clurgmgrd[500]: <err> #48: Unable to obtain cluster lock: Unknown error 65539
Jul  8 05:44:45 nodeA clurgmgrd[500]: <notice> Stopping service Q01STRS_AU
Jul  8 05:44:55 nodeA clurgmgrd[500]: <notice> Service Q01STRS_AU is recovering
Jul  8 05:44:55 nodeA clurgmgrd[500]: <notice> Recovering failed service Q01STRS_AU
Jul  8 05:45:16 nodeA clurgmgrd[500]: <notice> Service Q01STRS_AU started
Jul  8 05:56:06 nodeA kernel: clurgmgrd[15838]: segfault at 000000c000000010 rip 0000003000269b40 rsp 000000007204e900 error 6
Jul  8 05:56:06 nodeA clurgmgrd[499]: <crit> Watchdog: Daemon died, rebooting...
Jul  8 05:56:06 nodeA kernel: md: stopping all md devices.
Jul  8 05:56:06 nodeA kernel: md: md0 switched to read-only mode.
Jul  8 05:59:25 nodeA syslogd 1.4.1: restart (remote reception).
....
Jul  8 06:01:12 nodeA clurgmgrd[506]: <notice> Starting stopped service Q01STRS_TH 
Jul  8 06:01:33 nodeA clurgmgrd[506]: <notice> Service Q01STRS_TH started 

The segfault backtrace looks like:
Program terminated with signal 11, Segmentation fault.
#0  _int_malloc (av=0x3000434640, bytes=) at malloc.c:4181
4181            bck->fd = bin;

Thread 1 (process 15838):
#0  _int_malloc (av=0x3000434640, bytes=) at malloc.c:4181
#1  0x000000300026b6d2 in *__GI___libc_malloc (bytes=32) at malloc.c:3346
#2  0x0000000000425028 in clist_insert ()
#3  0x00000000004216bf in msg_open ()
#4  0x000000000041efc6 in vf_write (membership=0x657850, flags=2, keyid=0x7204ec60 "usrm::rg=\"Q01STRS_TH\"", data=0x7204ef20, datalen=104) at vft.c:1315
#5  0x000000000040b515 in set_rg_state (rgname=0x7204efd8 "Q01STRS_TH", svcblk=0x7204ef20) at rg_state.c:306
#6  0x000000000040b595 in init_rg (name=0x7204efd8 "Q01STRS_TH", svcblk=0x7204ef20) at rg_state.c:323
#7  0x000000000040b688 in get_rg_state (rgname=0x7204efd0 "service:Q01STRS_TH", svcblk=0x7204ef20) at rg_state.c:353
#8  0x000000000040c3c7 in svc_status (svcName=0x7204efd0 "service:Q01STRS_TH") at rg_state.c:877
#9  0x0000000000404f10 in resgroup_thread_main (arg=0x414620c0) at rg_thread.c:384
#10 0x0000003527d06137 in start_thread (arg=) at pthread_create.c:274
#11 0x00000030002c9883 in ?? () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 from /lib64/tls/libc.so.6
Current language:  auto; currently c
#1  0x000000300026b6d2 in *__GI___libc_malloc (bytes=32) at malloc.c:3346
3346      victim = _int_malloc(ar_ptr, bytes);

Version-Release number of selected component (if applicable):
rgmanager-1.9.87-1.el4_8.1-x86_64 


How reproducible:
Not easily; it has only happened a couple of times.

Steps to Reproduce:
1. Appears to happen when a service is changing state
  
Actual results:
clurgmgrd segfaults with error 6

Expected results:
No segfault

Additional info:
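The "error 6" in the kernel segfault line above is the x86 page-fault error code. Its low three bits say whether the page was present, whether the access was a write, and whether the fault came from user mode. The following is a minimal sketch for decoding it, not part of rgmanager or any Red Hat tool. The helper names decode_error and parse_segfault are hypothetical, and the regular expression assumes the message format shown in the log above, with the error code printed in hexadecimal.

```python
import re

# Low bits of the x86 page-fault error code reported in the
# kernel's "segfault at ... error N" message.
PF_PRESENT = 0x1  # 0: page not present, 1: protection violation
PF_WRITE = 0x2    # 0: read access,      1: write access
PF_USER = 0x4     # 0: kernel mode,      1: user mode

# Assumed format, taken from the log line in this report.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"rip (?P<rip>[0-9a-f]+) rsp (?P<rsp>[0-9a-f]+) error (?P<code>[0-9a-f]+)"
)


def decode_error(code):
    """Translate the low three error-code bits into readable fields."""
    return {
        "cause": "protection violation" if code & PF_PRESENT else "page not present",
        "access": "write" if code & PF_WRITE else "read",
        "mode": "user" if code & PF_USER else "kernel",
    }


def parse_segfault(line):
    """Return decoded fields for a kernel segfault line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    info = decode_error(int(m.group("code"), 16))
    info["comm"] = m.group("comm")
    info["pid"] = int(m.group("pid"))
    info["addr"] = int(m.group("addr"), 16)
    info["rip"] = int(m.group("rip"), 16)
    return info


line = (
    "Jul  8 05:56:06 nodeA kernel: clurgmgrd[15838]: segfault at "
    "000000c000000010 rip 0000003000269b40 rsp 000000007204e900 error 6"
)
info = parse_segfault(line)
print(info["comm"], info["pid"], info["mode"], info["access"], info["cause"])
```

For this report the code decodes to a user-mode write to a page that is not present. That matches the faulting statement in the backtrace, the store "bck->fd = bin;" at malloc.c:4181.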

Comment 3 Lon Hohberger 2010-09-28 16:10:01 UTC

*** Bug 637263 has been marked as a duplicate of this bug. ***

Comment 5 Lon Hohberger 2010-10-22 14:32:44 UTC

This was fixed some time ago by bug 572695.

Furthermore, it was copied into the z-stream (EUS) as bug 572792.

https://rhn.redhat.com/errata/RHBA-2010-0404.html

*** This bug has been marked as a duplicate of bug 572695 ***
