Description of problem:
rgmanager is generating segfaults on service state change:

Jul 7 14:10:04 nodeA clurgmgrd[500]: <notice> Service Q01STRS_TH started
....
Jul 8 05:39:27 nodeA clurgmgrd[500]: <notice> Service Q01STRS_MY started
Jul 8 05:44:45 nodeA clurgmgrd[500]: <err> #48: Unable to obtain cluster lock: Unknown error 65539
Jul 8 05:44:45 nodeA clurgmgrd[500]: <notice> Stopping service Q01STRS_AU
Jul 8 05:44:55 nodeA clurgmgrd[500]: <notice> Service Q01STRS_AU is recovering
Jul 8 05:44:55 nodeA clurgmgrd[500]: <notice> Recovering failed service Q01STRS_AU
Jul 8 05:45:16 nodeA clurgmgrd[500]: <notice> Service Q01STRS_AU started
Jul 8 05:56:06 nodeA kernel: clurgmgrd[15838]: segfault at 000000c000000010 rip 0000003000269b40 rsp 000000007204e900 error 6
Jul 8 05:56:06 nodeA clurgmgrd[499]: <crit> Watchdog: Daemon died, rebooting...
Jul 8 05:56:06 nodeA kernel: md: stopping all md devices.
Jul 8 05:56:06 nodeA kernel: md: md0 switched to read-only mode.
Jul 8 05:59:25 nodeA syslogd 1.4.1: restart (remote reception).
....
Jul 8 06:01:12 nodeA clurgmgrd[506]: <notice> Starting stopped service Q01STRS_TH
Jul 8 06:01:33 nodeA clurgmgrd[506]: <notice> Service Q01STRS_TH started

The segfault backtrace looks like:

Program terminated with signal 11, Segmentation fault.
#0 _int_malloc (av=0x3000434640, bytes=) at malloc.c:4181
4181 bck->fd = bin;

Thread 1 (process 15838):
#0 _int_malloc (av=0x3000434640, bytes=) at malloc.c:4181
#1 0x000000300026b6d2 in *__GI___libc_malloc (bytes=32) at malloc.c:3346
#2 0x0000000000425028 in clist_insert ()
#3 0x00000000004216bf in msg_open ()
#4 0x000000000041efc6 in vf_write (membership=0x657850, flags=2, keyid=0x7204ec60 "usrm::rg=\"Q01STRS_TH\"", data=0x7204ef20, datalen=104) at vft.c:1315
#5 0x000000000040b515 in set_rg_state (rgname=0x7204efd8 "Q01STRS_TH", svcblk=0x7204ef20) at rg_state.c:306
#6 0x000000000040b595 in init_rg (name=0x7204efd8 "Q01STRS_TH", svcblk=0x7204ef20) at rg_state.c:323
#7 0x000000000040b688 in get_rg_state (rgname=0x7204efd0 "service:Q01STRS_TH", svcblk=0x7204ef20) at rg_state.c:353
#8 0x000000000040c3c7 in svc_status (svcName=0x7204efd0 "service:Q01STRS_TH") at rg_state.c:877
#9 0x0000000000404f10 in resgroup_thread_main (arg=0x414620c0) at rg_thread.c:384
#10 0x0000003527d06137 in start_thread (arg=) at pthread_create.c:274
#11 0x00000030002c9883 in ?? () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 from /lib64/tls/libc.so.6

Current language: auto; currently c
#1 0x000000300026b6d2 in *__GI___libc_malloc (bytes=32) at malloc.c:3346
3346 victim = _int_malloc(ar_ptr, bytes);

Version-Release number of selected component (if applicable):
rgmanager-1.9.87-1.el4_8.1-x86_64

How reproducible:
Not easily; it has only happened a couple of times.

Steps to Reproduce:
1. Appears to happen when a service is changing states

Actual results:
clurgmgrd segfaults with error 6

Expected results:
No segfault

Additional info:
*** Bug 637263 has been marked as a duplicate of this bug. ***
This was fixed some time ago by bug 572695. Furthermore, it was copied into the z-stream (EUS) as bug 572792.

https://rhn.redhat.com/errata/RHBA-2010-0404.html

*** This bug has been marked as a duplicate of bug 572695 ***