Description of problem:
I'm trying to set up a simple two-node cluster on RHEL 5.3 ppc64 (IBM JS20 blades). I'm using only manual fencing for simplicity's sake and have no resources defined. "service cman start" was spitting out errors, and when I invoked the startup commands manually, it turned out that groupd is segfaulting and not starting up properly.

Version-Release number of selected component (if applicable):
cman-2.0.98-1.el5_3.1

How reproducible:
Always

Steps to Reproduce:
1. Set up a cluster with the attached cluster.conf file on a RHEL 5.3 ppc64 machine.
2. Execute the following commands:
   # mount -t configfs none /sys/kernel/config
   # ccsd
   # cman_tool join
   # groupd
3. Execute: ps afx | grep groupd

Actual results:
groupd isn't running.

Expected results:
groupd should be running.

Additional info:
The following shows up in /var/log/groupd:

1241754782 cman: our nodeid 2 name domusB.esri.com quorum 1
1241754782 groupd segfault log follows:

(Nothing further)

A gdb run and backtrace on the groupd process is as follows:

# gdb groupd
GNU gdb Fedora (6.8-27.el5)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "ppc64-redhat-linux-gnu"...
(gdb) set follow-fork-mode child
(gdb) r
Starting program: /sbin/groupd

Program received signal SIGSEGV, Segmentation fault.
[Switching to process 671]
0x0ff238b0 in __new_semctl (semid=<value optimized out>, semnum=<value optimized out>, cmd=<value optimized out>)
    at ../sysdeps/unix/sysv/linux/semctl.c:122
122       arg = va_arg (ap, union semun);
(gdb) bt
#0  0x0ff238b0 in __new_semctl (semid=<value optimized out>, semnum=<value optimized out>, cmd=<value optimized out>)
    at ../sysdeps/unix/sysv/linux/semctl.c:122
#1  0x0fe1314c in openais_service_connect (service=<value optimized out>, shmseg=<value optimized out>) at util.c:339
#2  0x0fe13d60 in cpg_initialize (handle=<value optimized out>, callbacks=<value optimized out>) at cpg.c:108
#3  0x1000c96c in setup_cpg () at cpg.c:638
#4  0x10012930 in loop () at main.c:816
#5  0x1001357c in main (argc=1, argv=0xff9efb14) at main.c:1054
(gdb) bt full
#0  0x0ff238b0 in __new_semctl (semid=<value optimized out>, semnum=<value optimized out>, cmd=<value optimized out>)
    at ../sysdeps/unix/sysv/linux/semctl.c:122
        arg = Could not find the frame base for "__new_semctl".
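A note on step 3: "ps afx | grep groupd" also lists the grep process itself, so a matching line on its own doesn't prove groupd is up. Below is a minimal C sketch of a stricter check. count_processes_named() is a hypothetical helper, not part of cman. It assumes a Linux /proc where each /proc/<pid>/stat line starts with "pid (comm) ...". The kernel truncates comm to 15 characters, which is fine for "groupd".

```c
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: count processes whose command name (the
   "(comm)" field of /proc/<pid>/stat) equals name exactly.
   Returns -1 if /proc cannot be opened. */
int count_processes_named(const char *name)
{
    DIR *proc = opendir("/proc");
    struct dirent *de;
    int count = 0;

    if (proc == NULL)
        return -1;

    while ((de = readdir(proc)) != NULL) {
        char path[300], buf[1024];
        char *lp, *rp;
        size_t n;
        FILE *f;

        /* Only the numeric entries under /proc are processes. */
        if (!isdigit((unsigned char)de->d_name[0]))
            continue;

        snprintf(path, sizeof path, "/proc/%s/stat", de->d_name);
        f = fopen(path, "r");
        if (f == NULL)
            continue;               /* the process may have exited already */
        n = fread(buf, 1, sizeof buf - 1, f);
        fclose(f);
        buf[n] = '\0';

        /* The line reads "pid (comm) state ...". comm may itself contain
           spaces or ')', so take everything up to the LAST ')'. */
        lp = strchr(buf, '(');
        rp = strrchr(buf, ')');
        if (lp == NULL || rp == NULL || rp < lp)
            continue;
        *rp = '\0';
        if (strcmp(lp + 1, name) == 0)
            count++;
    }
    closedir(proc);
    return count;
}
```

In the failing case above, count_processes_named("groupd") would be expected to return 0 after the groupd command exits. Unlike the grep pipeline, it does not count the checking process itself.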
Created attachment 342986: cluster configuration file in use
Thanks for debugging this for us. Looks like more regression from openais ipc.
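Some context on the crash site: frame #0 of the backtrace is glibc's semctl() fetching its optional fourth argument with va_arg(ap, union semun). The call comes from openais_service_connect() at util.c:339. The report doesn't show what that openais call actually passes, so what follows is only a sketch of the calling convention documented in the semctl(2) man page. It is not the openais fix. Under that convention, the program defines union semun itself and passes the union, not a bare int and not nothing, as the fourth argument.

One plausible reason a mismatch there would crash only on this hardware is an assumption, not confirmed here. The 32-bit PowerPC ELF ABI passes unions by reference, and the "-ppc" package names in this report suggest the 32-bit userland. Under that ABI, va_arg dereferences whatever sits in that argument slot. On x86, by contrast, a small int in the same slot tends to be read harmlessly. make_semaphore(), get_semaphore() and remove_semaphore() are hypothetical helpers.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* glibc leaves union semun undefined (_SEM_SEMUN_UNDEFINED);
   semctl(2) says the calling program must define it. */
union semun {
    int              val;    /* SETVAL */
    struct semid_ds *buf;    /* IPC_STAT, IPC_SET */
    unsigned short  *array;  /* GETALL, SETALL */
};

/* Hypothetical helper: create a private one-semaphore set
   initialised to value. Returns the set id, or -1 on error. */
int make_semaphore(int value)
{
    union semun arg;
    int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (semid == -1)
        return -1;

    arg.val = value;
    /* Pass the union itself, so the argument's type matches the
       va_arg(ap, union semun) that semctl uses to fetch it. */
    if (semctl(semid, 0, SETVAL, arg) == -1) {
        semctl(semid, 0, IPC_RMID, arg);
        return -1;
    }
    return semid;
}

/* GETVAL and IPC_RMID ignore the fourth argument. Passing a dummy
   union anyway keeps every call well-formed regardless of how a
   given C library decides whether to read it. */
int get_semaphore(int semid)
{
    union semun dummy = { .val = 0 };
    return semctl(semid, 0, GETVAL, dummy);
}

int remove_semaphore(int semid)
{
    union semun dummy = { .val = 0 };
    return semctl(semid, 0, IPC_RMID, dummy);
}
```

Extra variadic arguments are simply ignored by commands that don't use them. Passing the dummy union therefore costs nothing and avoids relying on a particular libc's handling of the missing argument.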
(In reply to comment #2)
> Thanks for debugging this for us. Looks like more regression from openais ipc.

Would an older version potentially work for me in the interim? (Perhaps the dependency chain hassle would make this not worth the effort.)
FYI, reverting to openais-0.80.3-22.el5 appears to have resolved the problem. I guess the component on this BZ should be changed to openais.
Ray,

which version of openais provides this error? openais-0.80.3-22.el5_3.4?
Works for me on x86_64, i386, maybe a ppc64 only issue.

Regards
-steve
(In reply to comment #5)
> Ray,
>
> which version of openais provides this error? openais-0.80.3-22.el5_3.4?
> Works for me on x86_64, i386, maybe a ppc64 only issue.
>
> Regards
> -steve

openais-0.80.3-22.el5_3.4
- x86_64: Works fine
- ppc64: segfaults

openais-0.80.3-22.el5
- x86_64: Works fine
- ppc64: Works fine

I've reproduced this on two ppc64 systems (both JS20s).
Created attachment 343606: groupd core
Created attachment 343607: another groupd core
The customer who opened this BZ also has a support case open with us and has provided us with two core files. These were generated on:

openais-0.80.3-22.el5_3.4-ppc
cman-2.0.98-1.el5_3.1-ppc

The groupd log for both shows messages such as:

1241755835 cman: our nodeid 1 name domusA.esri.com quorum 1
1241755835 groupd segfault log follows:
*** Bug 501062 has been marked as a duplicate of this bug. ***
Verified with openais-0.80.6-7.el5 and cman-2.0.108-1.el5 on ppc.
Updated to openais-0.80.3-22.el5_3.8 and this appears to be working now. Thanks.
Happy to be of service. Thanks.
An advisory has been issued which should help with the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1341.html